Correlation refers to any of a broad class of statistical relationships involving dependence
Dependence is any statistical relationship between two random variables or two sets of data
A typical way of showing the correlation between two related variables: a scatter diagram
Correlation, in other words:
This is a statistical measure that indicates the extent to which two or more variables fluctuate together
A positive correlation indicates the extent to which those variables increase or decrease in parallel; a negative correlation indicates the extent to which one variable increases as the other decreases
For example, let's consider the relationship between the amount of time spent studying and the grades obtained. These two variables are likely to be positively correlated because as the amount of time spent studying increases, the grades obtained also tend to increase.
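To make the study-time example concrete, here is a minimal Python sketch that computes the Pearson correlation coefficient (the usual sample version of the correlation described below) for a small data set. The numbers in `hours` and `grades` are made up for illustration, not measurements, and `pearson_r` is a hypothetical helper, not a library function.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences (hypothetical helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Cross-products and squared deviations; the 1/n factors cancel in the ratio
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has zero variance")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied and grade obtained for six students
hours = [1, 2, 3, 4, 5, 6]
grades = [52, 55, 61, 64, 70, 74]

r = pearson_r(hours, grades)
print(f"r = {r:.3f}")
```

With these made-up numbers r comes out close to +1, matching the expected positive correlation; pairing the same hours with the grades in reverse order makes r negative.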
Description:
The correlation of two random variables X and Y, denoted by $\rho(X,Y)$,
is defined as long as $\operatorname{Var}(X)\operatorname{Var}(Y)$ is positive
It measures the degree of linearity of the relationship:
Near +1 or -1: a high degree of linearity
Near 0: little or no linear relationship (a nonlinear dependence may still be present)
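A value near 0 rules out a linear relationship, not dependence in general. In this small sketch y is completely determined by x (y = x²), yet the correlation is exactly 0 because x is symmetric around 0. The data and the `pearson_r` helper are hypothetical, chosen only to illustrate the point.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences (hypothetical helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# y depends entirely on x, but the relationship is a parabola, not a line
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

print(pearson_r(xs, ys))  # 0.0: no linear relationship despite full dependence
```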
Calculation:
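$\rho(X,Y) = \dfrac{\operatorname{Cov}(X,Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}}$
$-1 \le \rho(X,Y) \le 1$
$\rho(X,Y) = 1$ implies $m = \sigma_y/\sigma_x > 0$ for $y = mx + c$
$\rho(X,Y) = -1$ implies $m = -\sigma_y/\sigma_x < 0$ for $y = mx + c$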
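The formula for ρ can be checked numerically. This sketch assumes population (divide-by-n) covariance and variance, computes ρ directly from Cov and Var, and uses points lying exactly on a line with slope m = -2 to illustrate the ρ = -1 case. The helper names `cov`, `var` and `rho` are hypothetical.

```python
import math

def mean(v):
    return sum(v) / len(v)

def cov(xs, ys):
    """Population covariance E[(X - mu_X)(Y - mu_Y)] (hypothetical helper)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def var(xs):
    """Population variance, i.e. Cov(X, X)."""
    return cov(xs, xs)

def rho(xs, ys):
    """rho(X,Y) = Cov(X,Y) / sqrt(Var(X) * Var(Y)); needs Var(X) * Var(Y) > 0."""
    denom = var(xs) * var(ys)
    if denom <= 0:
        raise ValueError("rho is undefined unless Var(X) * Var(Y) > 0")
    return cov(xs, ys) / math.sqrt(denom)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [-2.0 * x + 5.0 for x in xs]  # exact line y = mx + c with m = -2, c = 5

sigma_x = math.sqrt(var(xs))
sigma_y = math.sqrt(var(ys))
print(rho(xs, ys))         # -1, up to floating-point rounding
print(-sigma_y / sigma_x)  # recovers the slope m = -2, up to rounding
```

Using n - 1 instead of n in both Cov and Var gives the same ρ, because the factor cancels in the ratio.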
Correlation: measures the degree of association between two variables
Regression: predicts the value of one variable y given the other variable x → find the line of best fit
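The notes do not say how the line of best fit is found; a common choice, assumed here, is ordinary least squares. For y = mx + c it gives m = Cov(x, y)/Var(x) and c = ȳ - m·x̄, which can also be written m = ρ·σ_y/σ_x, consistent with the ρ = ±1 cases above. `fit_line` is a hypothetical helper and the data are made up.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + c; returns (m, c). Hypothetical helper."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x must not be constant")
    m = sxy / sxx    # slope = Cov(x, y) / Var(x)
    c = my - m * mx  # the fitted line passes through (mean of x, mean of y)
    return m, c

# Hypothetical data: hours studied -> grade
hours = [1, 2, 3, 4, 5, 6]
grades = [52, 55, 61, 64, 70, 74]

m, c = fit_line(hours, grades)
predicted = m * 4.5 + c  # predicted grade for 4.5 hours (inside the observed range)
print(f"y = {m:.3f}x + {c:.3f}; prediction at x = 4.5: {predicted:.1f}")
```

Predicting inside the range of observed x values is deliberate: the fitted line says nothing reliable about x values far outside the data.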
Coefficient of determination ($r^2$):
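For simple linear regression (one predictor, least-squares fit with an intercept), the coefficient of determination equals the square of the correlation coefficient r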
Measures the explanatory power of the regression model, i.e. how well the regression model fits the data
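Ranges from 0 to 1 (for a least-squares fit with an intercept, evaluated on the data it was fitted to)
0: the model explains none of the variability of the response data around its mean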
1: model explains all the variability of the response data around its mean
e.g. $r^2 = 0.85$ → 85% of the variation in the dependent variable is explained by the independent variables in the model; the remaining 15% is not explained by the model
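The "fraction of variation explained" reading comes from the usual definition r² = 1 - SS_res/SS_tot, where SS_tot is the variation of y around its mean and SS_res is the variation left over after the fit. This sketch, assuming an ordinary least-squares line and made-up data, computes r² that way and checks that it matches the squared correlation coefficient for a one-predictor fit. The helper names are hypothetical.

```python
import math

def simple_regression(xs, ys):
    """OLS fit of y = m*x + c; returns (m, c, r) with r = Pearson's r. Hypothetical helper."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    m = sxy / sxx
    c = my - m * mx
    r = sxy / math.sqrt(sxx * syy)
    return m, c, r

def r_squared(xs, ys, m, c):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))  # variation left unexplained
    ss_tot = sum((y - my) ** 2 for y in ys)                        # total variation around the mean
    return 1 - ss_res / ss_tot

# Hypothetical data: hours studied -> grade
hours = [1, 2, 3, 4, 5, 6]
grades = [52, 55, 61, 64, 70, 74]

m, c, r = simple_regression(hours, grades)
r2 = r_squared(hours, grades, m, c)
print(f"r^2 from residuals = {r2:.4f}, square of r = {r * r:.4f}")
```

The two printed values agree here because the model has a single predictor and an intercept; with several predictors, r² is instead the square of the correlation between observed and fitted values.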