Sampling and Data

  • population: the sample space, i.e. the entire set of items under study (all clients, all guests)

  • data: the actual recorded values of the variables (1 hour, 3 days, etc.)

    • set of values of qualitative or quantitative variables
    • lowest level of abstraction, from which information and knowledge are derived
    • a statistic is calculated from sample data
  • variable: the characteristic being measured, recorded in units such as hours, minutes, pieces, …

  • statistic:

    • a number that represents a property of the sample
    • an estimate of a population parameter
    • what can be obtained from the sample to represent the parameter (the goal of measuring)
  • parameter

    • numerical characteristic of the whole population, estimated by a statistic
    • goal of measuring / surveying in this case
  • Attribute and Variable - “measurable?”

    • Attribute: characteristic of an object that cannot be measured
      • Example: sensibility
    • Variable: something that may or does vary and can be measured
      • Example: height
  • Discrete and Continuous Variables - “precisely countable?”

    • Discrete: variable that can only take a countable number of values
      • Example: number of employees
    • Continuous: variable that may take on any value within a range
      • Example: height of people
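
The distinction between a parameter and a statistic can be sketched in a few lines of Python. This is only an illustration: the population of heights is simulated, and the names `population`, `population_mean`, and `sample_mean` are assumptions made for the example, not part of the notes above.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: simulated heights (cm) of 10,000 people.
population = [random.gauss(170, 10) for _ in range(10_000)]

# Parameter: a numerical characteristic of the whole population.
population_mean = statistics.mean(population)

# Statistic: the same quantity computed from a sample of 100 people,
# used as an estimate of the parameter.
sample = random.sample(population, 100)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

In practice the parameter is usually unknown, because measuring the whole population would require a census; the statistic is what a survey can actually compute.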

Types of Data Collection

  • Census: data collection about everyone or everything in a population
    • Advantages: high accuracy
    • Disadvantages: expensive, time-consuming, results may be out of date by the time they are available
  • Sample survey: data collection from a part of the population
    • Advantages: less expensive, faster
    • Disadvantages: less accurate, depends on sample size and methods
  • Primary and Secondary data
    • Primary: data collected directly for the purpose of the survey
    • Secondary: data collected for some other purpose, but can be used for the survey
  • Methods of Obtaining Sample Data
    • Observation: gathering data by watching behavior, events, or characteristics in their natural setting
      • Advantages: understand ongoing process or situation, gather data on individual behaviors or interactions, know about physical setting
      • Disadvantages: data collected from individuals may not be realistic (people may behave differently when observed)
    • Experimentation: testing competing models or hypotheses, or testing existing theories or new hypotheses
    • Qualitative techniques: investigating the why and how of decision making, using smaller but focused samples
    • Questionnaires: series of questions for gathering information from respondents
      • Advantages: quick, cheap, standardized answers
      • Disadvantages: may frustrate respondents, may lead to biased results
      • Methods: phone or personal interviews, postal surveys, self-completion

Sampling

  • Process of selecting a sample of items from a population
  • Why use a sample instead of a census: completeness, cost, time, accuracy
  • Sampling frame: list of all those within a population who can be sampled
    • Characteristics: completeness, accuracy, up to date, non-duplication
  • Random sampling: every item in the population has an equal chance of being included
    • Drawbacks: the sample may be unrepresentative; selected items may be scattered across the population (spread out rather than concentrated)
  • Quasi-random sampling (method of selecting samples combining random and non-random sampling): approximation to random sampling, includes systematic, stratified, and multistage sampling
    • Systematic: select a starting element from the list at random, then take every kth element after it
    • Stratified: divide population into homogeneous subgroups (subgroup sharing similar traits) and then sample within each subgroup
      • Advantages: representative, precise
      • Disadvantages: not useful when the population cannot be partitioned
    • Multistage: divide the population into groups and then sample within selected groups
      • Advantages: cost, speed, convenience
      • Disadvantages: bias, not truly random
  • Non-random sampling: used when the sampling frame cannot be established, includes quota and cluster sampling
    • Quota: segment the population into subgroups, then select subjects from specific subgroups rather than from the whole population
      • Advantages: time, budget, accuracy
      • Disadvantages: unreliable, biased
    • Cluster: divide the population into groups and then sample all the elements in one or more groups
      • Advantages: convenient, practical
      • Disadvantages: less accurate, less representative
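
The selection rules above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions: the sampling frame of 100 employees, the `dept` field, and all function names are hypothetical, and the cluster example picks its group at random, which is one possible choice rather than something the notes prescribe.

```python
import random
from collections import defaultdict

def simple_random_sample(frame, n, rng):
    # Every item in the frame has an equal chance of being included.
    return rng.sample(frame, n)

def systematic_sample(frame, k, rng):
    # Random starting point among the first k items, then every kth item.
    start = rng.randrange(k)
    return frame[start::k]

def group_by(frame, key):
    # Partition the frame into subgroups according to key(item).
    groups = defaultdict(list)
    for item in frame:
        groups[key(item)].append(item)
    return groups

def stratified_sample(frame, stratum_of, fraction, rng):
    # Sample the same fraction at random within each homogeneous subgroup.
    sample = []
    for members in group_by(frame, stratum_of).values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample

def cluster_sample(frame, cluster_of, n_clusters, rng):
    # Pick whole groups, then take every element in the chosen groups.
    clusters = group_by(frame, cluster_of)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [item for name in chosen for item in clusters[name]]

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: 100 employees spread over 4 departments.
departments = ["sales", "ops", "it", "hr"]
frame = [{"id": i, "dept": departments[i % 4]} for i in range(100)]

def dept_of(employee):
    return employee["dept"]

srs = simple_random_sample(frame, 10, rng)
systematic = systematic_sample(frame, 10, rng)
stratified = stratified_sample(frame, dept_of, 0.2, rng)
cluster = cluster_sample(frame, dept_of, 1, rng)

print(len(srs), len(systematic), len(stratified), len(cluster))
```

The stratified sample keeps each department's share of the frame, while the cluster sample contains a single department in full, which is why it is convenient but less representative.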

Survey Methods

  • Two main categories: questionnaire and interview
  • Two types of questionnaires: postal and group administered
    • Postal: questionnaire sent by mail to respondents
      • Advantages: inexpensive, same questionnaire for every respondent, completed at respondents' convenience, time to read and consider the questions
      • Disadvantages: low response rate
    • Group administered: questionnaire filled by respondents in a group setting under supervision
      • Advantages: higher response rate
  • Interviews: may be qualitative or quantitative
    • Quantitative: personal and telephone interview
      • Personal: face-to-face conversation between interviewer and respondent
        • Advantages: high response rate, low response errors
        • Disadvantages: time-consuming, expensive, simple questions
      • Telephone: voice conversation between interviewer and respondent
        • Advantages: rapid, no travel, sensitive questions
        • Disadvantages: high refusal rate, short interview
    • Qualitative: focus group
      • Focus group: group of people asked about their perceptions, opinions, beliefs, and attitudes towards a product, service, concept, etc.
        • Advantages: data and insights from group interaction, common language
        • Disadvantages: scheduling difficulty

Questionnaire Design

  • Basic rules for questionnaire construction:
    • Each question should be clear, unambiguous, and easy to understand
    • Every respondent should be able to answer every question
    • Each question should relate directly to survey objectives
    • Questions should not be biased or make assumptions
    • Do not use double-barreled, long, negative, or guessing questions