Descriptive Statistics

frequency distributions

  • records the number of times each value occurs
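
A minimal Python sketch of a frequency distribution, using the standard library's `collections.Counter`. The sample data is made up for illustration.

```python
from collections import Counter

# Hypothetical sample data, chosen only for illustration.
data = [2, 3, 3, 5, 2, 3, 7]

# Counter maps each distinct value to the number of times it occurs.
frequency = Counter(data)

# Print the distribution as "value count", in ascending order of value.
for value in sorted(frequency):
    print(value, frequency[value])
```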

arithmetic mean

  • sum of values of items / number of items
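
As a quick check of the definition, the mean can be computed directly in Python. The numbers are hypothetical.

```python
# Hypothetical sample data, chosen only for illustration.
data = [4, 8, 15, 16, 23, 42]

# Arithmetic mean: sum of the values divided by the number of values.
mean = sum(data) / len(data)
print(mean)  # 18.0
```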

the mode

  • Definition: value that appears most often in a set of data
  • In a grouped frequency distribution
    • define the modal class (class with largest frequency)
    • use the formula: \begin{equation} \text{Mode} = L + \frac{d_1}{d_1 + d_2} \cdot h \end{equation}
      $L$: the lower boundary of the modal class
      $d_1$ ($d_2$): absolute value of the difference between the frequency of the modal class and that of the class before (after) it
      $h$: length of the interval of the modal class
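
The grouped-data mode formula can be sketched in Python as follows. The function name `grouped_mode`, the equal-width classes, and the choice to treat a missing neighbouring class as frequency 0 are assumptions made for this example, not part of the notes.

```python
def grouped_mode(lower_bounds, width, freqs):
    """Estimate the mode of a grouped frequency distribution.

    lower_bounds: lower boundary L of each class, in ascending order
    width: class width h (assumed equal for all classes)
    freqs: frequency of each class
    """
    # Modal class: the class with the largest frequency (first one on ties).
    i = freqs.index(max(freqs))
    # Assumption: a missing neighbour (first or last class) counts as 0.
    before = freqs[i - 1] if i > 0 else 0
    after = freqs[i + 1] if i < len(freqs) - 1 else 0
    d1 = abs(freqs[i] - before)
    d2 = abs(freqs[i] - after)
    if d1 + d2 == 0:
        raise ValueError("formula undefined: modal class equals both neighbours")
    return lower_bounds[i] + d1 / (d1 + d2) * width


# Hypothetical classes 0-10, 10-20, 20-30, 30-40 with frequencies 5, 12, 8, 3.
mode = grouped_mode([0, 10, 20, 30], 10, [5, 12, 8, 3])
print(round(mode, 2))  # 16.36
```

Here the modal class is 10-20, so $L = 10$, $d_1 = 12 - 5 = 7$, $d_2 = 12 - 8 = 4$, and $h = 10$.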

the median

  • Definition: the number in the middle of a sorted data set.
    • even-size data set: average of 2 middle numbers
    • odd-size data set: middle number
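
The two cases above (odd and even size) can be sketched in Python. The function is written out by hand to mirror the definition; the standard library's `statistics.median` behaves the same way.

```python
def median(values):
    """Median of a non-empty list of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd size: the single middle number.
        return ordered[mid]
    # Even size: average of the two middle numbers.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 5]))     # 5
print(median([7, 1, 5, 3]))  # 4.0
```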

Range:

  • the difference between the largest and the smallest value

Percentile:

  • a measure indicating the value BELOW which a given percentage of the observations in a group falls
    • e.g. 20th percentile = value below which 20% of the observations are found
  • Find location of the Pth percentile: the position in the ordered array of the desired percentile, computed from the number of observations and the desired percentile P
  • Find percentile based on location:
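
A hedged Python sketch of the two steps (find the location, then read off the value). It assumes the location convention $(n + 1) \cdot P / 100$ and linear interpolation between neighbouring values for fractional locations. Both are common textbook choices, but the notes do not confirm which convention they use; other conventions give slightly different results.

```python
def percentile(values, p):
    """Value at the p-th percentile (0 < p < 100) of a non-empty list.

    Assumption: location = (n + 1) * p / 100 is a 1-based position in the
    sorted data; fractional locations are resolved by linear interpolation.
    """
    ordered = sorted(values)
    n = len(ordered)
    location = (n + 1) * p / 100
    # Clamp so the location stays inside the data.
    location = min(max(location, 1), n)
    lower = int(location)        # 1-based index of the value at or below
    fraction = location - lower  # how far to move towards the next value
    if lower == n:
        return ordered[-1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


# Hypothetical data with n = 9 observations.
data = [10, 20, 30, 40, 50, 60, 70, 80, 90]
print(percentile(data, 50))  # 50.0 (location 5)
print(percentile(data, 25))  # 25.0 (location 2.5, halfway between 20 and 30)
```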

Quartiles

  • Definition: three points that divide the dataset into 4 equal groups, each comprising a quarter of the data
  • $Q_1$: middle number between the smallest value and the median; $Q_2$: the median; $Q_3$: middle value between the median and the highest value
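
One way to sketch the quartile definition in Python is the "median of halves" convention: the first quartile is the median of the values below the median position, and the third quartile is the median of the values above it. This is only one of several conventions, and the helper name `quartiles` is chosen for this example. The last line uses the standard definition of the inter-quartile range as the difference between the third and first quartiles.

```python
from statistics import median


def quartiles(values):
    """Return the three quartiles using the median-of-halves convention.

    Requires at least two values.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]          # values below the median position
    upper = ordered[half + n % 2:]  # values above it (median skipped if n is odd)
    return median(lower), median(ordered), median(upper)


# Hypothetical sample data, chosen only for illustration.
q1, q2, q3 = quartiles([1, 3, 5, 7, 9, 11, 13])
print(q1, q2, q3)  # 3 7 11
print(q3 - q1)     # 8 (inter-quartile range)
```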

Deciles

  • any of the nine values that divide sorted data into 10 equal parts, each representing 10% of the sample / population
  • just like percentiles, e.g. 8th decile = 80th percentile
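
The link between deciles and percentiles can be checked with `statistics.quantiles` from the Python standard library (Python 3.8+). The data set is hypothetical, and the result relies on that function's default "exclusive" method.

```python
from statistics import quantiles

# Hypothetical sample: the integers 1..99.
data = list(range(1, 100))

deciles = quantiles(data, n=10)       # 9 cut points
percentiles = quantiles(data, n=100)  # 99 cut points

# The 8th decile and the 80th percentile are the same cut point.
print(deciles[7], percentiles[79])  # 80.0 80.0
```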

Percentile with grouped data

  • calculate from the frequency table: first calculate the location → then determine the class from the cumulative frequency (round up)
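
A rough Python sketch of the "location → class" step for grouped data. It assumes the location is $(n + 1) \cdot P / 100$ rounded up to a whole observation; the notes do not state the location formula, so treat that as an assumption. The function name `percentile_class` is also made up for this example.

```python
import math
from itertools import accumulate


def percentile_class(freqs, p):
    """Index of the class that contains the p-th percentile.

    Assumption: location = (n + 1) * p / 100, rounded up.
    """
    n = sum(freqs)
    location = math.ceil((n + 1) * p / 100)
    # Walk the cumulative frequencies until they reach the location.
    for index, cumulative in enumerate(accumulate(freqs)):
        if cumulative >= location:
            return index
    # A location beyond the last observation falls in the last class.
    return len(freqs) - 1


# Hypothetical class frequencies (n = 30); cumulative: 5, 17, 25, 28, 30.
freqs = [5, 12, 8, 3, 2]
print(percentile_class(freqs, 50))  # 1 (location 16 falls in the second class)
print(percentile_class(freqs, 90))  # 3 (location 28 falls in the fourth class)
```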

Inter-quartile range (IQR)

Variance

  • the average of the squared deviations from the mean, taken over all values in a distribution
  • normal variance
    $x_i$: individual observation $\bar{x}$ : population mean $n$: number of observations
  • Variance with grouped data
    $f$: class frequency $M$: class midpoint $n$: number of observations
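
A Python sketch of both variants, using the symbols listed above. It follows the "average of squared deviations" wording literally and divides by $n$ (the population form); that divisor is an assumption here, since a sample variance would divide by $n - 1$ instead. The function names are made up for this example.

```python
def variance(values):
    """Population variance: mean of the squared deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def grouped_variance(midpoints, freqs):
    """Variance estimated from class midpoints M and class frequencies f."""
    n = sum(freqs)
    # Grouped mean: each midpoint weighted by its class frequency.
    mean = sum(f * m for f, m in zip(freqs, midpoints)) / n
    return sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints)) / n


# Hypothetical data, chosen only for illustration.
print(variance([2, 4, 4, 4, 5, 5, 7, 9]))        # 4.0
print(grouped_variance([5, 15, 25], [2, 4, 2]))  # 50.0
```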

Standard deviation:

  • square root of the variance: $$\sigma = \sqrt{\sigma^2}$$
  • in finance, the standard deviation can measure the risk associated with various investment opportunities: the higher the standard deviation, the greater the risk
  • two investments with similar standard deviations may have different distributions of returns

symmetry

  • a symmetric distribution is one that can be divided into two halves that mirror each other, with the same arithmetic mean, median, and mode ("bell curve")

skewness

  • the asymmetry of a frequency distribution curve, with different mean, mode, and median
  • positively skewed distribution: leans left (just like holding up the left thumb), $mean > median$
  • negatively skewed distribution: leans right (just like holding up the right thumb), $mean < median$

hard parts -> cheat sheet

  • interpretation of the quartiles
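
Going back to the standard deviation and skewness notes, here is a small Python illustration using the standard library's `statistics` module. All numbers are invented for the example, and `pstdev` is the population form (square root of the variance divided by $n$).

```python
from statistics import mean, median, pstdev

# Hypothetical returns (in %) of two investments, for illustration only.
steady = [4, 5, 5, 6, 5]
volatile = [-10, 20, 5, -5, 15]

# Same mean return, but the larger standard deviation signals more risk.
print(mean(steady) == mean(volatile))     # True
print(pstdev(steady) < pstdev(volatile))  # True

# A long right tail pulls the mean above the median (positive skew).
right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]
print(mean(right_tailed) > median(right_tailed))  # True
```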