Unit IV
Correlation
“Correlation analysis deals with the association between two or more variables.” —Simpson and Kafka
“Correlation is an analysis of the co-variation between two variables.” —A.M.Tuttle
Correlation measures the nature and strength of relationship between two variables. Correlation lies between +1 to -1. A correlation of +1 indicates a perfect positive correlation between two variables. A zero correlation indicates that there is no relationship between the variables. A correlation of -1 indicates a perfect negative correlation.
Karl Pearson’s Coefficient of Correlation is widely used mathematical method is used to calculate the degree and direction of the relationship between linear related variables. The coefficient of correlation is denoted by “r”.
Direct method
Shortcut method –
The value of the coefficient of correlation (r) always lies between ±1. Such as:
- r=+1, perfect positive correlation
- r=-1, perfect negative correlation
- r=0, no correlation
Example 1 - Compute Pearson’s coefficient of correlation between advertisement cost and sales as per the data given below:
Advertisement cost | 39 | 65 | 62 | 90 | 82 | 75 | 25 | 98 | 36 | 78 |
Sales | 47 | 53 | 58 | 86 | 62 | 68 | 60 | 91 | 51 | 84 |
Solution
X | Y | X - X | (X - X)2 | Y - Y | (Y - Y)2 |
|
39 | 47 | -26 | 676 | -19 | 361 | 494 |
65 | 53 | 0 | 0 | -13 | 169 | 0 |
62 | 58 | -3 | 9 | -8 | 64 | 24 |
90 | 86 | 25 | 625 | 20 | 400 | 500 |
82 | 62 | 17 | 289 | -4 | 16 | -68 |
75 | 68 | 10 | 100 | 2 | 4 | 20 |
25 | 60 | -40 | 1600 | -6 | 36 | 240 |
98 | 91 | 33 | 1089 | 25 | 625 | 825 |
36 | 51 | -29 | 841 | -15 | 225 | 435 |
78 | 84 | 13 | 169 | 18 | 324 | 234 |
650 | 660 |
| 5398 |
| 2224 | 2704 |
|
|
|
|
|
|
|
r = (2704)/√5398 √2224 = (2704)/(73.2*47.15) = 0.78
Thus Correlation coefficient is positively correlated
Example 2
Compute correlation coefficient from the following data
Hours of sleep (X) | Test scores (Y) |
8 | 81 |
8 | 80 |
6 | 75 |
5 | 65 |
7 | 91 |
6 | 80 |
X | Y | X - X | (X - X)2 | Y - Y | (Y - Y)2 |
|
8 | 81 | 1.3 | 1.8 | 2.3 | 5.4 | 3.1 |
8 | 80 | 1.3 | 1.8 | 1.3 | 1.8 | 1.8 |
6 | 75 | -0.7 | 0.4 | -3.7 | 13.4 | 2.4 |
5 | 65 | -1.7 | 2.8 | -13.7 | 186.8 | 22.8 |
7 | 91 | 0.3 | 0.1 | 12.3 | 152.1 | 4.1 |
6 | 80 | -0.7 | 0.4 | 1.3 | 1.8 | -0.9 |
40 | 472 |
| 7 |
| 361 | 33 |
X = 40/6 =6.7
Y = 472/6 = 78.7
r = (33)/√7 √361 = (33)/(2.64*19) = 0.66
Thus Correlation coefficient is positively correlated
Example 3
Calculate coefficient of correlation between X and Y series using Karl Pearson shortcut method
X | 14 | 12 | 14 | 16 | 16 | 17 | 16 | 15 |
Y | 13 | 11 | 10 | 15 | 15 | 9 | 14 | 17 |
Solution
Let assumed mean for X = 15, assumed mean for Y = 14
X | Y | Dx | Dx2 | Dy | Dy2 | Dxdy |
14 | 13 | -1.0 | 1.0 | -1.0 | 1.0 | 1.0 |
12 | 11 | -3.0 | 9.0 | -3.0 | 9.0 | 9.0 |
14 | 10 | -1.0 | 1.0 | -4.0 | 16.0 | 4.0 |
16 | 15 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
16 | 15 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
17 | 9 | 2.0 | 4.0 | -5.0 | 25.0 | -10.0 |
16 | 14 | 1 | 1 | 0 | 0 | 0 |
15 | 17 | 0 | 0 | 3 | 9 | 0 |
120 | 104 | 0 | 18 | -8 | 62 | 6 |
r = 8 *6 – (0)*(-8)
√8*18-(0)2 √8*62 – (-8)2
r = 48/√144*√432 = 0.19
Example 4 - Calculate coefficient of correlation between X and Y series using Karl Pearson shortcut method
X | 1800 | 1900 | 2000 | 2100 | 2200 | 2300 | 2400 | 2500 | 2600 |
F | 5 | 5 | 6 | 9 | 7 | 8 | 6 | 8 | 9 |
Solution
Assumed mean of X and Y is 2200, 6
X | Y | Dx | Dx (i=100) | Dx2 | Dy | Dy2 | Dxdy |
1800 | 5 | -400 | -4 | 16 | -1.0 | 1.0 | 4.0 |
1900 | 5 | -300 | -3 | 9 | -1.0 | 1.0 | 3.0 |
2000 | 6 | -200 | -2 | 4 | 0.0 | 0.0 | 0.0 |
2100 | 9 | -100 | -1 | 1 | 3.0 | 9.0 | -3.0 |
2200 | 7 | 0 | 0 | 0 | 1.0 | 1.0 | 0.0 |
2300 | 8 | 100 | 1 | 1 | 2.0 | 4.0 | 2.0 |
2400 | 6 | 200 | 2 | 4 | 0 | 0 | 0.0 |
2500 | 8 | 300 | 3 | 9 | 2 | 4 | 6.0 |
2600 | 9 | 400 | 4 | 16 | 3 | 9 | 12.0 |
|
|
|
|
|
|
|
|
|
|
| 0 | 60 | 9 | 29 | 24 |
Note – we can also proceed dividing x/100
r = (9)(24) – (0)(9)
√9*60-(0)2 √9*29– (9)2
r = 0.69
Example 5 –
X | 28 | 45 | 40 | 38 | 35 | 33 | 40 | 32 | 36 | 33 |
Y | 23 | 34 | 33 | 34 | 30 | 26 | 28 | 31 | 36 | 35 |
Solution
X | Y | X - X | (X - X)2 | Y - Y | (Y - Y)2 |
|
28 | 23 | -8 | 64 | -8.0 | 64.0 | 64.0 |
45 | 34 | 9 | 81 | 3.0 | 9.0 | 27.0 |
40 | 33 | 4 | 16 | 2.0 | 4.0 | 8.0 |
38 | 34 | 2 | 4 | 3.0 | 9.0 | 6.0 |
35 | 30 | -1 | 1 | -1.0 | 1.0 | 1.0 |
33 | 26 | -3 | 9 | -5.0 | 25.0 | 15.0 |
40 | 28 | 4 | 16 | -3 | 9 | -12.0 |
32 | 31 | -4 | 16 | 0 | 0 | 0.0 |
36 | 36 | 0 | 0 | 5 | 25 | 0.0 |
33 | 35 | -3 | 9 | 4 | 16 | -12 |
360 | 310 | 0 | 216 | 0 | 162 | 97 |
X = 360/10 = 36
Y = 310/10 = 31
r = 97/(√216 √162 = 0.51
Spearman’s Rank Correlation Coefficient - The Spearman’s Rank Correlation Coefficient is the non-parametric statistical measure used to study the strength of association between the two ranked variables. This method is used for ordinal set of numbers, which can be arranged in order.
Where, P = Rank coefficient of correlation
D = Difference of ranks
N = Number of Observations
The Spearman’s Rank Correlation coefficient lies between +1 to -1.
- +1 indicates perfect association of rank
- 0 indicates no association between the rank
- -1 indicates perfect negative association between the ranks
When ranks are not given - Rank by taking the highest value or the lowest value as 1
Equal Ranks or Tie in Ranks – in this case ranks are assigned on an average basis. For ex – if three students score of 5, at 5th, 6th, 7th ranks ach one of them will be assigned a rank of 5 + 6 + 7/3= 6.
If two individual ranked equal at third position, then the rank is calculates as (3+4)/2 = 3.5
Example 1 –
Test 1 | 8 | 7 | 9 | 5 | 1 |
Test 2 | 10 | 8 | 7 | 4 | 5 |
Solution
Here, highest value is taken as 1
Test 1 | Test 2 | Rank T1 | Rank T2 | d | d2 |
8 | 10 | 2 | 1 | 1 | 1 |
7 | 8 | 3 | 2 | 1 | 1 |
9 | 7 | 1 | 3 | -2 | 4 |
5 | 4 | 4 | 5 | -1 | 1 |
1 | 5 | 5 | 4 | 1 | 1 |
|
|
|
|
| 8 |
R = 1 – (6*8)/5(52 – 1) = 0.60
Example 2 -
Calculate Spearman rank-order correlation
English | 56 | 75 | 45 | 71 | 62 | 64 | 58 | 80 | 76 | 61 |
Maths | 66 | 70 | 40 | 60 | 65 | 56 | 59 | 77 | 67 | 63 |
Solution
Rank by taking the highest value or the lowest value as 1.
Here, highest value is taken as 1
English | Maths | Rank (English) | Rank (Math) | d | d2 |
56 | 66 | 9 | 4 | 5 | 25 |
75 | 70 | 3 | 2 | 1 | 1 |
45 | 40 | 10 | 10 | 0 | 0 |
71 | 60 | 4 | 7 | -3 | 9 |
62 | 65 | 6 | 5 | 1 | 1 |
64 | 56 | 5 | 9 | -4 | 16 |
58 | 59 | 8 | 8 | 0 | 0 |
80 | 77 | 1 | 1 | 0 | 0 |
76 | 67 | 2 | 3 | -1 | 1 |
61 | 63 | 7 | 6 | 1 | 1 |
|
|
|
|
| 54 |
R = 1-(6*54)
10(102-1)
R = 0.67
Therefore this indicates a strong positive relationship between the rank’s individuals obtained in the math and English exam.
Example 3 –
Find Spearman's rank correlation coefficient between X and Y for this set of data:
X | 13 | 20 | 22 | 18 | 19 | 11 | 10 | 15 |
Y | 17 | 19 | 23 | 16 | 20 | 10 | 11 | 18 |
Solution
X | Y | Rank X | Rank Y | d | d2 |
13 | 17 | 3 | 4 | -1 | 1 |
20 | 19 | 7 | 6 | 1 | 1 |
22 | 23 | 8 | 8 | 0 | 0 |
18 | 16 | 5 | 3 | 2 | 2 |
19 | 20 | 6 | 7 | -1 | 1 |
11 | 10 | 2 | 1 | 1 | 1 |
10 | 11 | 1 | 2 | -1 | 1 |
15 | 18 | 4 | 5 | -1 | 1 |
|
|
|
|
| 8 |
R =
R = 1 – 6*8/8(82 – 1) = 1 – 48 = 0.90
504
Example 4 – calculation of equal ranks or tie ranks
Find Spearman's rank correlation coefficient:
Commerce | 15 | 20 | 28 | 12 | 40 | 60 | 20 | 80 |
Science | 40 | 30 | 50 | 30 | 20 | 10 | 30 | 60 |
Solution
C | S | Rank C | Rank S | d | d2 |
15 | 40 | 2 | 6 | -4 | 16 |
20 | 30 | 3.5 | 4 | -0.5 | 0.25 |
28 | 50 | 5 | 7 | -2 | 4 |
12 | 30 | 1 | 4 | -3 | 9 |
40 | 20 | 6 | 2 | 4 | 16 |
60 | 10 | 7 | 1 | 6 | 36 |
20 | 30 | 3.5 | 4 | -0.5 | 0.25 |
80 | 60 | 8 | 8 | 0 | 0 |
|
|
|
|
| 81.5 |
R = 1 – (6*81.5)/8(82 – 1) = 0.02
Example 5 –
X | 10 | 15 | 11 | 14 | 16 | 20 | 10 | 8 | 7 | 9 |
Y | 16 | 16 | 24 | 18 | 22 | 24 | 14 | 10 | 12 | 14 |
Solution
X | Y | Rank X | Rank Y | d | d2 |
10 | 16 | 6.5 | 5.5 | 1 | 1 |
15 | 16 | 3 | 5.5 | -2.5 | 6.25 |
11 | 24 | 5 | 1.5 | 3.5 | 12.25 |
14 | 18 | 4 | 4 | 0 | 0 |
16 | 22 | 2 | 3 | -1 | 1 |
20 | 24 | 1 | 1.5 | -0.5 | 0.25 |
10 | 14 | 6.5 | 7.5 | -1 | 1 |
8 | 10 | 9 | 10 | -1 | 1 |
7 | 12 | 10 | 9 | 1 | 1 |
9 | 14 | 8 | 7.5 | 0.5 | 0.25 |
|
|
|
|
| 24 |
R = 1 – (6*24)/10(102 – 1) = 0.85
The correlation between X and Y is positive and very high.
Regression analysis is a technique of studying the dependence of one variable called dependent variable, on one or more variable called explanatory variable, with a view to estimate or predict the average value of the dependent variables in terms of the known or fixed values of the independent variables.
Regression analysis includes several variations, such as linear, multiple linear, and nonlinear. The most common models are simple linear and multiple linear.
Nonlinear regression analysis is commonly used for more complicated data sets in which the dependent and independent variables show a nonlinear relationship.
Linear model assumption
- The dependent and independent variables show a linear relationship between the slope and intercept
- The independent variable is not random
- The value of the residual (error) is zero
- The value of the residual (error) is constant across all observations
- The value of the residual (error) is not correlated across all observations
- The residual (error) values follow the normal distribution
Simple linear regression
Simple linear regression is a model that assesses the relationship between a dependent variable and an independent variable.
Y = a + bX + ϵ
Where:
Y – Dependent variable
X – Independent (explanatory) variable
a – Intercept
b – Slope
ϵ – Residual (error)
Multiple linear regressions
Multiple linear regression analysis is essentially similar to the simple linear model, with the exception that multiple independent variables are used in the model.
Y = a + bX1 + cX2 + dX3 + ϵ
Where:
Y – Dependent variable
X1, X2, X3 – Independent (explanatory) variables
a – Intercept
b, c, d – Slopes
ϵ – Residual (error)
Example
How to find a linear regression equation
Subject | X | Y |
1 | 43 | 99 |
2 | 21 | 65 |
3 | 25 | 79 |
4 | 42 | 75 |
5 | 57 | 87 |
6 | 59 | 81 |
|
|
|
Solution
Subject | X | Y | Xy | X2 | Y2 |
1 | 43 | 99 | 4257 | 1849 | 9801 |
2 | 21 | 65 | 1365 | 441 | 4225 |
3 | 25 | 79 | 1975 | 625 | 6241 |
4 | 42 | 75 | 3150 | 1764 | 5625 |
5 | 57 | 87 | 4959 | 3249 | 7569 |
6 | 59 | 81 | 4779 | 3481 | 6521 |
Total | 247 | 486 | 20485 | 11409 | 40022 |
To find a and b, use the following equation
Find a:
((486 × 11,409) – ((247 × 20,485)) / 6 (11,409) – 247*247)
484979 / 7445
=65.14
Find b:
(6(20,485) – (247 × 486)) / (6 (11409) – 247*247)
(122,910 – 120,042) / 68,454 – 2472
2,868 / 7,445
= .385225
y’ = a + bx
y’ = 65.14 + .385225x
Example
Calculate linear regression analysis
Students | X | Y |
1 | 95 | 85 |
2 | 85 | 95 |
3 | 80 | 70 |
4 | 70 | 65 |
5 | 60 | 70 |
Solution
Students | X | Y | X2 | y2 | Xy |
1 | 95 | 85 | 9025 | 7225 | 8075 |
2 | 85 | 95 | 7225 | 9025 | 8075 |
3 | 80 | 70 | 6400 | 4900 | 5600 |
4 | 70 | 65 | 4900 | 4225 | 4550 |
5 | 60 | 70 | 3600 | 4900 | 4200 |
Total | 390 | 385 | 31150 | 30275 | 30500 |
To find a and b, use the following equation
Find a:
((385 × 31150) – ((390 × 30500)) / 5 (31150) – 152100)
97750 / 3650
=26.78
Find b:
(5(30500) – (390 × 385)) / (5 (31150) – 152100)
2,350 / 3650
= .0.64
y’ = a + bx
y’ = 26.78 + .0.64x
Example
Given below are five observation collected in simple regression. Calculate the intercept, slope and write down the estimated regression equation
X | Y |
2 | 7 |
4 | 5 |
6 | 4 |
8 | 2 |
10 | 1 |
Solution
X | Y | X2 | y2 | Xy |
2 | 7 | 4 | 49 | 14 |
4 | 5 | 16 | 25 | 20 |
6 | 4 | 36 | 16 | 24 |
8 | 2 | 64 | 4 | 16 |
10 | 1 | 100 | 1 | 10 |
30 | 19 | 220 | 95 | 84 |
To find a and b, use the following equation
Find a:
((19 × 220) – ((30 × 84)) / 5 (220) – 900)
1660/ 200
=8.3
Find b:
(5(84) – (30 × 19)) / (5 (220) – 900)
-150 / 200
= -0.75
y’ = a + bx
y’ = 8.3 + (-0.75)x
Definition
“A time series is a set of observation taken at specified times, usually at equal intervals”
“A time series may be defined as a collection of reading belonging to different time periods of some economic or composite variables”..........by Ya Lun Chau
Time series establish relationship between cause and effects.
One variable is Time which is independent variable and the second data is the dependent variable.
Time series examples
- Stock price and Sensex
- Exchange rate, interest rate, inflation rate, national GDP
- Retail sales
- Electric power consumption
- Number of accident facilities
Components of time series
The change in time series is affected by economic, social, natural, industrial and political reasons. These reasons are called components of time series.
- Secular trend
- Seasonal variations
- Cyclical variation
- Irregular variation
Secular trend – the increase or decrease in the movements of a time series is called secular trend. The time series may show upward trend or downward trend for a period of years.
Examples- increase in population over a period of time. Price increase over a period of time. Sale of commodity decrease over a period of time.
Seasonal variations – seasonal variations are short term fluctuations in a time series which occur periodically in a year. This continues to repeat year after year. The major factors are weather conditions and customs of people.
Examples – woollen clothes are sold more in winter than in summer season, price increases during festivals,
Cyclical variations– Cyclical variations refers to recurrent upward or downward movement in a time series but the period of cycle is greater than a year. Also these variations are not regular as seasonal variations.
A business cycle has four phases which are prosperity, recession, depression and recovery. These four phases in a business are completed by passing one to another in this order.
Irregular variation –irregular variations are fluctuation in time series that are short in duration, erratic in nature, and follow no regularity in the occurrence pattern. This is also refers to as residual variations as it represents what is left out in time series after trend, cyclical and seasonal variations. Irregular fluctuations results due to the occurrence of unforeseen events like floods, earthquakes, famines, war.
Time series model
Additional model
Y = T+S+C+I, Where
Y = original data
T= trend value
S= seasonal fluctuations
C= cyclical fluctuations
Multiplication model
Y = T*S*C*I OR TCSI