UNIT 5
Correlation
Correlation describes the linear relationship between two continuous variables (e.g., height and weight). In general, correlation is used when there is no identified response variable. It measures, as a single number, the strength and direction of the linear relationship between two or more variables.
Definition
“Correlation analysis deals with the association between two or more variables.” —Simpson and Kafka
“Correlation is an analysis of the co-variation between two variables.” —A.M. Tuttle
Uses and types
Uses
1. Prognosis: The coefficient of correlation is used profitably in prediction. For example, it is used to predict how successful a student is likely to be in his or her further educational career.
2. Reliability: The coefficient of correlation is very often used to test reliability. By calculating this statistic, one can ascertain whether a test measures the same thing on two successive occasions.
3. Validity: A test's validity can be assessed through correlation. Whenever a test is constructed, it must be checked whether it tests what it claims to test.
4. Test construction: The coefficient of correlation is also used in test construction. Whenever a new test is constructed, there are always questions as to whether each element of the test is related to the other elements or to the test as a whole, and whether each element is related to the chosen criteria. These relationships are all examined through the technique of correlation.
Types
Correlation measures the nature and strength of the relationship between two variables. The coefficient of correlation lies between -1 and +1. A correlation of +1 indicates a perfect positive correlation between two variables, a correlation of -1 indicates a perfect negative correlation, and a zero correlation indicates that there is no linear relationship between the variables.
Degree of Correlation and its Nature
1. Perfect correlation: If two variables change in the same proportion (increase or decrease), the correlation between them is perfect. Perfect correlation can be positive or negative.
- Coefficient of correlation (r) = +1: If there is a perfect positive relationship between two variables, the value of the correlation is +1.
- Coefficient of correlation (r) = -1: If there is a perfect negative relationship between two variables, the value of the correlation is -1.
2. Zero correlation: The correlation is said to be zero when two variables have no linear relationship between them. It implies that a change in the value of one variable is not accompanied by any systematic linear change in the value of the other variable.
- Coefficient of correlation (r) = 0: If there is no linear relationship between the two variables, the value of the correlation is zero. However, this does not imply that the two variables are independent. It only indicates the non-existence of a linear relation between them.
3. Limited degree of correlation: A limited degree of correlation lies between perfect correlation and zero correlation, i.e. the coefficient of correlation lies strictly between 0 and +1 or between 0 and -1. This limited degree of correlation may be high, moderate or low.
- High degree of correlation: The correlation of the two series of data is close to +1 or -1.
- Medium degree of correlation: The correlation of the two series of data is neither large nor small.
- Low degree of correlation: The correlation of the two series of data is close to 0.
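The point that r = 0 rules out only a linear relationship can be illustrated with a minimal Python sketch. The data and the helper name `pearson_r` are illustrative assumptions, not taken from the text: Y = X² is completely determined by X, yet the coefficient is zero because the relationship is not linear.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product-moment correlation computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sqrt(sum_xx * sum_yy)

# Hypothetical data: Y depends on X exactly, but not linearly.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys_squared = [x ** 2 for x in xs]
# Hypothetical data: Y is an exact increasing linear function of X.
ys_linear = [2 * x + 1 for x in xs]

r_nonlinear = pearson_r(xs, ys_squared)
r_linear = pearson_r(xs, ys_linear)
print(r_nonlinear)         # 0.0
print(round(r_linear, 6))  # 1.0
```

The positive and negative products of deviations cancel exactly for the symmetric X values, which is why the coefficient misses the dependence.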
Product moment correlation (Karl Pearson's coefficient of correlation) is a widely used mathematical method for calculating the degree and direction of the relationship between linearly related variables. The coefficient of correlation is denoted by “r”.
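Direct method
$$ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2}\,\sqrt{\sum (Y - \bar{Y})^2}} $$
Shortcut method
$$ r = \frac{N \sum D_x D_y - \sum D_x \sum D_y}{\sqrt{N \sum D_x^2 - (\sum D_x)^2}\,\sqrt{N \sum D_y^2 - (\sum D_y)^2}} $$
Here D_x and D_y are the deviations of X and Y from their assumed means, and N is the number of pairs of observations.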
The value of the coefficient of correlation (r) always lies between -1 and +1:
- r=+1, perfect positive correlation
- r=-1, perfect negative correlation
- r=0, no correlation
Example 1 - Compute Pearson's coefficient of correlation between advertisement cost and sales from the data given below:
Advertisement cost | 39 | 65 | 62 | 90 | 82 | 75 | 25 | 98 | 36 | 78 |
Sales | 47 | 53 | 58 | 86 | 62 | 68 | 60 | 91 | 51 | 84 |
Solution
X | Y | X - X̄ | (X - X̄)² | Y - Ȳ | (Y - Ȳ)² | (X - X̄)(Y - Ȳ) |
39 | 47 | -26 | 676 | -19 | 361 | 494 |
65 | 53 | 0 | 0 | -13 | 169 | 0 |
62 | 58 | -3 | 9 | -8 | 64 | 24 |
90 | 86 | 25 | 625 | 20 | 400 | 500 |
82 | 62 | 17 | 289 | -4 | 16 | -68 |
75 | 68 | 10 | 100 | 2 | 4 | 20 |
25 | 60 | -40 | 1600 | -6 | 36 | 240 |
98 | 91 | 33 | 1089 | 25 | 625 | 825 |
36 | 51 | -29 | 841 | -15 | 225 | 435 |
78 | 84 | 13 | 169 | 18 | 324 | 234 |
650 | 660 | 0 | 5398 | 0 | 2224 | 2704 |
X̄ = 650/10 = 65
Ȳ = 660/10 = 66
$$ r = \frac{2704}{\sqrt{5398}\,\sqrt{2224}} = \frac{2704}{73.47 \times 47.16} \approx 0.78 $$
Thus advertisement cost and sales are strongly positively correlated.
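The direct method used in Example 1 can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation, and the function name `pearson_direct` is an assumption introduced here.

```python
from math import sqrt

def pearson_direct(xs, ys):
    """Direct method: deviations are taken about the actual means.

    Returns the three column totals of the working table and r.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    sum_xx = sum(a * a for a in dev_x)
    sum_yy = sum(b * b for b in dev_y)
    return sum_xy, sum_xx, sum_yy, sum_xy / sqrt(sum_xx * sum_yy)

# Data of Example 1.
advertisement_cost = [39, 65, 62, 90, 82, 75, 25, 98, 36, 78]
sales = [47, 53, 58, 86, 62, 68, 60, 91, 51, 84]

sum_xy, sum_xx, sum_yy, r = pearson_direct(advertisement_cost, sales)
print(sum_xy, sum_xx, sum_yy)  # 2704.0 5398.0 2224.0
print(round(r, 2))             # 0.78
```

The three totals match the last row of the working table, so the code can also be used to check a hand-built table column by column.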
Example 2
Compute the correlation coefficient from the following data:
Hours of sleep (X) | Test scores (Y) |
8 | 81 |
8 | 80 |
6 | 75 |
5 | 65 |
7 | 91 |
6 | 80 |
X | Y | X - X̄ | (X - X̄)² | Y - Ȳ | (Y - Ȳ)² | (X - X̄)(Y - Ȳ) |
8 | 81 | 1.3 | 1.8 | 2.3 | 5.4 | 3.1 |
8 | 80 | 1.3 | 1.8 | 1.3 | 1.8 | 1.8 |
6 | 75 | -0.7 | 0.4 | -3.7 | 13.4 | 2.4 |
5 | 65 | -1.7 | 2.8 | -13.7 | 186.8 | 22.8 |
7 | 91 | 0.3 | 0.1 | 12.3 | 152.1 | 4.1 |
6 | 80 | -0.7 | 0.4 | 1.3 | 1.8 | -0.9 |
40 | 472 | | 7.3 | | 361.3 | 33.3 |
X̄ = 40/6 = 6.7
Ȳ = 472/6 = 78.7
$$ r = \frac{33.3}{\sqrt{7.3}\,\sqrt{361.3}} = \frac{33.3}{2.70 \times 19.01} \approx 0.65 $$
Thus hours of sleep and test scores are positively correlated.
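Because the means in Example 2 are rounded to one decimal place, the hand-computed totals carry some rounding error. As a check, the following Python sketch repeats the direct method at full precision. The variable names are illustrative assumptions.

```python
from math import sqrt

# Data of Example 2.
hours_of_sleep = [8, 8, 6, 5, 7, 6]
test_scores = [81, 80, 75, 65, 91, 80]

n = len(hours_of_sleep)
mean_x = sum(hours_of_sleep) / n  # 40/6, kept unrounded
mean_y = sum(test_scores) / n     # 472/6, kept unrounded

sum_xx = sum((x - mean_x) ** 2 for x in hours_of_sleep)
sum_yy = sum((y - mean_y) ** 2 for y in test_scores)
sum_xy = sum((x - mean_x) * (y - mean_y)
             for x, y in zip(hours_of_sleep, test_scores))

r_unrounded = sum_xy / sqrt(sum_xx * sum_yy)
print(round(sum_xx, 3), round(sum_yy, 3), round(sum_xy, 3))  # 7.333 361.333 33.333
print(round(r_unrounded, 3))                                 # 0.648
```

Rounding the column totals to whole numbers (7, 361, 33) shifts the result slightly upward, so it is safer to keep at least one decimal place in the totals.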
Example 3
Calculate the coefficient of correlation between the X and Y series using Karl Pearson's shortcut method.
X | 14 | 12 | 14 | 16 | 16 | 17 | 16 | 15 |
Y | 13 | 11 | 10 | 15 | 15 | 9 | 14 | 17 |
Solution
Let the assumed mean for X be 15 and the assumed mean for Y be 14, so that Dx = X - 15 and Dy = Y - 14.
X | Y | Dx | Dx² | Dy | Dy² | DxDy |
14 | 13 | -1 | 1 | -1 | 1 | 1 |
12 | 11 | -3 | 9 | -3 | 9 | 9 |
14 | 10 | -1 | 1 | -4 | 16 | 4 |
16 | 15 | 1 | 1 | 1 | 1 | 1 |
16 | 15 | 1 | 1 | 1 | 1 | 1 |
17 | 9 | 2 | 4 | -5 | 25 | -10 |
16 | 14 | 1 | 1 | 0 | 0 | 0 |
15 | 17 | 0 | 0 | 3 | 9 | 0 |
120 | 104 | 0 | 18 | -8 | 62 | 6 |
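$$ r = \frac{8 \times 6 - (0)(-8)}{\sqrt{8 \times 18 - (0)^2}\,\sqrt{8 \times 62 - (-8)^2}} = \frac{48}{\sqrt{144}\,\sqrt{432}} = \frac{48}{12 \times 20.78} \approx 0.19 $$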
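The shortcut method of Example 3 can be sketched in Python as follows. The function name `pearson_shortcut` and the second pair of assumed means (10, 10) are assumptions added for illustration. The second call suggests that the choice of assumed means does not affect r.

```python
from math import sqrt

def pearson_shortcut(xs, ys, assumed_x, assumed_y):
    """Shortcut method: deviations are taken about assumed means."""
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dx2 = sum(d * d for d in dx)
    sum_dy2 = sum(d * d for d in dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    numerator = n * sum_dxdy - sum_dx * sum_dy
    denominator = (sqrt(n * sum_dx2 - sum_dx ** 2)
                   * sqrt(n * sum_dy2 - sum_dy ** 2))
    return numerator / denominator

# Data of Example 3.
xs = [14, 12, 14, 16, 16, 17, 16, 15]
ys = [13, 11, 10, 15, 15, 9, 14, 17]

r_example = pearson_shortcut(xs, ys, assumed_x=15, assumed_y=14)
# Hypothetical alternative assumed means, to show r does not depend on them.
r_other = pearson_shortcut(xs, ys, assumed_x=10, assumed_y=10)
print(round(r_example, 2))  # 0.19
print(round(r_other, 2))    # 0.19
```

In practice an assumed mean close to the centre of the data is chosen only because it keeps the deviations small and the hand arithmetic short.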
Example 4 - Calculate the coefficient of correlation between the X and Y series using Karl Pearson's shortcut method.
X | 1800 | 1900 | 2000 | 2100 | 2200 | 2300 | 2400 | 2500 | 2600 |
Y | 5 | 5 | 6 | 9 | 7 | 8 | 6 | 8 | 9 |
Solution
The assumed means of X and Y are 2200 and 6 respectively.
X | Y | X - 2200 | Dx = (X - 2200)/100 | Dx² | Dy | Dy² | DxDy |
1800 | 5 | -400 | -4 | 16 | -1 | 1 | 4 |
1900 | 5 | -300 | -3 | 9 | -1 | 1 | 3 |
2000 | 6 | -200 | -2 | 4 | 0 | 0 | 0 |
2100 | 9 | -100 | -1 | 1 | 3 | 9 | -3 |
2200 | 7 | 0 | 0 | 0 | 1 | 1 | 0 |
2300 | 8 | 100 | 1 | 1 | 2 | 4 | 2 |
2400 | 6 | 200 | 2 | 4 | 0 | 0 | 0 |
2500 | 8 | 300 | 3 | 9 | 2 | 4 | 6 |
2600 | 9 | 400 | 4 | 16 | 3 | 9 | 12 |
19800 | 63 | 0 | 0 | 60 | 9 | 29 | 24 |
Note: the deviations of X are divided by the common factor i = 100 to simplify the arithmetic. This does not change the value of r.
$$ r = \frac{9 \times 24 - (0)(9)}{\sqrt{9 \times 60 - (0)^2}\,\sqrt{9 \times 29 - (9)^2}} = \frac{216}{\sqrt{540}\,\sqrt{180}} \approx 0.69 $$
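Example 4 divides the deviations of X by a common factor before applying the shortcut formula. The Python sketch below reproduces that step and compares it with the undivided deviations. The function name and its `factor_x` and `factor_y` parameters are assumptions introduced for illustration.

```python
from math import sqrt

def pearson_step_deviation(xs, ys, assumed_x, assumed_y, factor_x=1, factor_y=1):
    """Shortcut method with deviations divided by common factors."""
    n = len(xs)
    dx = [(x - assumed_x) / factor_x for x in xs]
    dy = [(y - assumed_y) / factor_y for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dx2 = sum(d * d for d in dx)
    sum_dy2 = sum(d * d for d in dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    numerator = n * sum_dxdy - sum_dx * sum_dy
    denominator = (sqrt(n * sum_dx2 - sum_dx ** 2)
                   * sqrt(n * sum_dy2 - sum_dy ** 2))
    return numerator / denominator

# Data of Example 4.
xs = [1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600]
ys = [5, 5, 6, 9, 7, 8, 6, 8, 9]

r_scaled = pearson_step_deviation(xs, ys, 2200, 6, factor_x=100)
r_unscaled = pearson_step_deviation(xs, ys, 2200, 6)  # no common factor
print(round(r_scaled, 2))    # 0.69
print(round(r_unscaled, 2))  # 0.69
```

The factor multiplies the numerator and the denominator by the same amount, so it cancels. This is why the common factor never appears in the final formula.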
Example 5 –
X | 28 | 45 | 40 | 38 | 35 | 33 | 40 | 32 | 36 | 33 |
Y | 23 | 34 | 33 | 34 | 30 | 26 | 28 | 31 | 36 | 35 |
Solution
X | Y | X - X̄ | (X - X̄)² | Y - Ȳ | (Y - Ȳ)² | (X - X̄)(Y - Ȳ) |
28 | 23 | -8 | 64 | -8 | 64 | 64 |
45 | 34 | 9 | 81 | 3 | 9 | 27 |
40 | 33 | 4 | 16 | 2 | 4 | 8 |
38 | 34 | 2 | 4 | 3 | 9 | 6 |
35 | 30 | -1 | 1 | -1 | 1 | 1 |
33 | 26 | -3 | 9 | -5 | 25 | 15 |
40 | 28 | 4 | 16 | -3 | 9 | -12 |
32 | 31 | -4 | 16 | 0 | 0 | 0 |
36 | 36 | 0 | 0 | 5 | 25 | 0 |
33 | 35 | -3 | 9 | 4 | 16 | -12 |
360 | 310 | 0 | 216 | 0 | 162 | 97 |
X̄ = 360/10 = 36
Ȳ = 310/10 = 31
$$ r = \frac{97}{\sqrt{216}\,\sqrt{162}} = \frac{97}{14.70 \times 12.73} \approx 0.52 $$
Spearman’s Rank Correlation Coefficient - Spearman’s rank correlation coefficient is a non-parametric statistical measure of the strength of association between two ranked variables. It is used for ordinal data, i.e. values that can be arranged in order.
$$ R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} $$
Where, R = Rank coefficient of correlation (also written ρ)
D = Difference of ranks
N = Number of Observations
Spearman’s rank correlation coefficient lies between -1 and +1.
- +1 indicates perfect positive association between the ranks
- 0 indicates no association between the ranks
- -1 indicates perfect negative association between the ranks
When ranks are not given - Assign ranks by taking either the highest value or the lowest value as 1, using the same convention for both series.
Equal ranks or tie in ranks - In this case ranks are assigned on an average basis. For example, if three students obtain the same score and occupy the 5th, 6th and 7th ranks, each of them is assigned the rank (5 + 6 + 7)/3 = 6.
If two individuals are ranked equal at third position, each is assigned the rank (3 + 4)/2 = 3.5.
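The averaging rule for tied ranks can be sketched in Python as below. The function name `average_ranks` and the two score lists are hypothetical, chosen only to reproduce the two tie situations just described.

```python
def average_ranks(values, highest_first=True):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    ordered = sorted(values, reverse=highest_first)
    # Collect the 1-based positions occupied by each distinct value.
    positions = {}
    for position, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]

# Hypothetical scores: three students tie for the 5th, 6th and 7th places.
three_way_tie = [90, 85, 80, 75, 70, 70, 70, 60]
# Hypothetical scores: two students tie at the third position.
two_way_tie = [50, 40, 30, 30, 20]

ranks_three = average_ranks(three_way_tie)
ranks_two = average_ranks(two_way_tie)
print(ranks_three)  # [1.0, 2.0, 3.0, 4.0, 6.0, 6.0, 6.0, 8.0]
print(ranks_two)    # [1.0, 2.0, 3.5, 3.5, 5.0]
```

Note that the rank following a tie skips ahead (8 after the three 6s), so the ranks still add up to 1 + 2 + ... + n.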
Example 1 –
Test 1 | 8 | 7 | 9 | 5 | 1 |
Test 2 | 10 | 8 | 7 | 4 | 5 |
Solution
Here, highest value is taken as 1
Test 1 | Test 2 | Rank T1 | Rank T2 | D | D² |
8 | 10 | 2 | 1 | 1 | 1 |
7 | 8 | 3 | 2 | 1 | 1 |
9 | 7 | 1 | 3 | -2 | 4 |
5 | 4 | 4 | 5 | -1 | 1 |
1 | 5 | 5 | 4 | 1 | 1 |
ΣD² = 8
$$ R = 1 - \frac{6 \times 8}{5(5^2 - 1)} = 1 - \frac{48}{120} = 0.60 $$
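The calculation in Example 1 can be reproduced with a short Python sketch. The helper names are assumptions introduced here, and `rank_descending` assumes there are no tied values, as is the case in this example.

```python
def rank_descending(values):
    """Rank distinct values with the highest value as 1 (assumes no ties)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def spearman_from_ranks(rank_a, rank_b):
    """Spearman's coefficient: 1 - 6 * sum(D^2) / (N * (N^2 - 1))."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

# Data of Example 1.
test_1 = [8, 7, 9, 5, 1]
test_2 = [10, 8, 7, 4, 5]

ranks_1 = rank_descending(test_1)
ranks_2 = rank_descending(test_2)
rho = spearman_from_ranks(ranks_1, ranks_2)
print(ranks_1, ranks_2)  # [2, 3, 1, 4, 5] [1, 2, 3, 5, 4]
print(round(rho, 2))     # 0.6

# Completely reversed ranks give perfect negative association.
rho_reversed = spearman_from_ranks([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
print(rho_reversed)      # -1.0
```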
Example 2 -
Calculate Spearman rank-order correlation
English | 56 | 75 | 45 | 71 | 62 | 64 | 58 | 80 | 76 | 61 |
Math’s | 66 | 70 | 40 | 60 | 65 | 56 | 59 | 77 | 67 | 63 |
Solution
Rank by taking the highest value or the lowest value as 1.
Here, highest value is taken as 1
English | Math’s | Rank (English) | Rank (Math) | D | D² |
56 | 66 | 9 | 4 | 5 | 25 |
75 | 70 | 3 | 2 | 1 | 1 |
45 | 40 | 10 | 10 | 0 | 0 |
71 | 60 | 4 | 7 | -3 | 9 |
62 | 65 | 6 | 5 | 1 | 1 |
64 | 56 | 5 | 9 | -4 | 16 |
58 | 59 | 8 | 8 | 0 | 0 |
80 | 77 | 1 | 1 | 0 | 0 |
76 | 67 | 2 | 3 | -1 | 1 |
61 | 63 | 7 | 6 | 1 | 1 |
ΣD² = 54
$$ R = 1 - \frac{6 \times 54}{10(10^2 - 1)} = 1 - \frac{324}{990} \approx 0.67 $$
Therefore, this indicates a strong positive relationship between the ranks the individuals obtained in the Maths and English exams.
Example 3 –
Find Spearman's rank correlation coefficient between X and Y for this set of data:
X | 13 | 20 | 22 | 18 | 19 | 11 | 10 | 15 |
Y | 17 | 19 | 23 | 16 | 20 | 10 | 11 | 18 |
Solution
Here, the lowest value is taken as 1.
X | Y | Rank X | Rank Y | D | D² |
13 | 17 | 3 | 4 | -1 | 1 |
20 | 19 | 7 | 6 | 1 | 1 |
22 | 23 | 8 | 8 | 0 | 0 |
18 | 16 | 5 | 3 | 2 | 4 |
19 | 20 | 6 | 7 | -1 | 1 |
11 | 10 | 2 | 1 | 1 | 1 |
10 | 11 | 1 | 2 | -1 | 1 |
15 | 18 | 4 | 5 | -1 | 1 |
ΣD² = 10
$$ R = 1 - \frac{6 \times 10}{8(8^2 - 1)} = 1 - \frac{60}{504} \approx 0.88 $$
Example 4 – calculation of equal ranks or tie ranks
Find Spearman's rank correlation coefficient:
Commerce | 15 | 20 | 28 | 12 | 40 | 60 | 20 | 80 |
Science | 40 | 30 | 50 | 30 | 20 | 10 | 30 | 60 |
Solution
Here, the lowest value is taken as 1, and tied values share the average of the ranks they occupy.
C | S | Rank C | Rank S | D | D² |
15 | 40 | 2 | 6 | -4 | 16 |
20 | 30 | 3.5 | 4 | -0.5 | 0.25 |
28 | 50 | 5 | 7 | -2 | 4 |
12 | 30 | 1 | 4 | -3 | 9 |
40 | 20 | 6 | 2 | 4 | 16 |
60 | 10 | 7 | 1 | 6 | 36 |
20 | 30 | 3.5 | 4 | -0.5 | 0.25 |
80 | 60 | 8 | 8 | 0 | 0 |
ΣD² = 81.5
$$ R = 1 - \frac{6 \times 81.5}{8(8^2 - 1)} = 1 - \frac{489}{504} \approx 0.03 $$
Example 5 –
X | 10 | 15 | 11 | 14 | 16 | 20 | 10 | 8 | 7 | 9 |
Y | 16 | 16 | 24 | 18 | 22 | 24 | 14 | 10 | 12 | 14 |
Solution
Here, the highest value is taken as 1, and tied values share the average of the ranks they occupy.
X | Y | Rank X | Rank Y | D | D² |
10 | 16 | 6.5 | 5.5 | 1 | 1 |
15 | 16 | 3 | 5.5 | -2.5 | 6.25 |
11 | 24 | 5 | 1.5 | 3.5 | 12.25 |
14 | 18 | 4 | 4 | 0 | 0 |
16 | 22 | 2 | 3 | -1 | 1 |
20 | 24 | 1 | 1.5 | -0.5 | 0.25 |
10 | 14 | 6.5 | 7.5 | -1 | 1 |
8 | 10 | 9 | 10 | -1 | 1 |
7 | 12 | 10 | 9 | 1 | 1 |
9 | 14 | 8 | 7.5 | 0.5 | 0.25 |
ΣD² = 24
$$ R = 1 - \frac{6 \times 24}{10(10^2 - 1)} = 1 - \frac{144}{990} \approx 0.85 $$
The correlation between X and Y is positive and very high.
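Example 5 can be reproduced end to end with the Python sketch below. The function names are assumptions introduced here. As in the worked examples, tied values receive average ranks and the simple formula is applied without any further adjustment for ties.

```python
def average_ranks(values, highest_first=True):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    ordered = sorted(values, reverse=highest_first)
    positions = {}
    for position, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def spearman_simple(xs, ys, highest_first=True):
    """Return (sum of D^2, R) using average ranks and the simple formula."""
    rank_x = average_ranks(xs, highest_first)
    rank_y = average_ranks(ys, highest_first)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return sum_d2, 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

# Data of Example 5, ranked with the highest value as 1.
x = [10, 15, 11, 14, 16, 20, 10, 8, 7, 9]
y = [16, 16, 24, 18, 22, 24, 14, 10, 12, 14]

sum_d2, r_rank = spearman_simple(x, y)
print(average_ranks(x))  # [6.5, 3.0, 5.0, 4.0, 2.0, 1.0, 6.5, 9.0, 10.0, 8.0]
print(average_ranks(y))  # [5.5, 5.5, 1.5, 4.0, 3.0, 1.5, 7.5, 10.0, 9.0, 7.5]
print(sum_d2, round(r_rank, 2))  # 24.0 0.85
```

Ranking both series from the lowest value instead (`highest_first=False`) reverses every rank but leaves each difference D unchanged in size, so R is the same under either convention.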
Sources
- B.N. Gupta, Statistics
- S.P. Singh, Statistics
- Gupta and Kapoor, Statistics
- Yule and Kendall, Statistics Method