Unit – 5
Applied Statistics
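Method of Least Squares:
(a) To fit the straight line:
Let $y = a + bx$ … (1)
be the straight line to be fitted to the given data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.
Let $y_{t_1}$ be the theoretical value for $x_1$, i.e., $y_{t_1} = a + bx_1$, so that the error at $x_1$ is $e_1 = y_1 - y_{t_1} = y_1 - (a + bx_1)$.
Then $S = \sum e_i^2 = \sum (y_i - a - bx_i)^2$.
For S to be minimum,
$\frac{\partial S}{\partial a} = -2\sum (y_i - a - bx_i) = 0$ or $\sum (y - a - bx) = 0$ … (2)
[To generalise, $y_i$ is written as y and $x_i$ as x.]
$\frac{\partial S}{\partial b} = -2\sum x_i (y_i - a - bx_i) = 0$ or $\sum (xy - ax - bx^2) = 0$ … (3)
On simplification, equations (2) and (3) become
$\sum y = na + b\sum x$ … (4)
$\sum xy = a\sum x + b\sum x^2$ … (5)
Equations (4) and (5) are known as the normal equations.
On solving equations (4) and (5), we get the values of a and b.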
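The two normal equations for a straight line can be solved in closed form. The following is a minimal sketch, assuming the line is written as y = a + bx so that the normal equations read Σy = na + bΣx and Σxy = aΣx + bΣx²; the function name `fit_line` and the sample data are made up for illustration.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x obtained from the two normal equations."""
    n = len(xs)
    sx = sum(xs)                                  # sum of x
    sy = sum(ys)                                  # sum of y
    sxy = sum(x * y for x, y in zip(xs, ys))      # sum of x*y
    sxx = sum(x * x for x in xs)                  # sum of x^2
    # Normal equations:  sy = n*a + b*sx   and   sxy = a*sx + b*sxx
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values are equal; the line is not determined")
    b = (n * sxy - sx * sy) / det
    a = (sy - b * sx) / n
    return a, b


# Illustrative data lying exactly on y = 1 + 2x
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

The closed form is simply Cramer's rule applied to the 2 × 2 system; for larger systems a general linear solver is more convenient.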
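(b) To fit the parabola: $y = a + bx + cx^2$ … (6)
The normal equations are
$\sum y = na + b\sum x + c\sum x^2$ … (7)
$\sum xy = a\sum x + b\sum x^2 + c\sum x^3$ … (8)
$\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4$ … (9)
On solving these three equations, we get the values of a, b and c.
Note:
1. The normal equation (4) is obtained by putting $\sum$ on both sides of equation (1). Equation (5) is obtained by multiplying both sides of (1) by x and then putting $\sum$.
2. The normal equations (7), (8), (9) are obtained by multiplying both sides of equation (6) by 1, x and $x^2$ respectively and then putting $\sum$.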
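The three normal equations of the parabola form a 3 × 3 linear system in a, b and c. The sketch below assumes the parabola is written as y = a + bx + cx² and solves the system exactly with rational arithmetic; the name `fit_parabola` and the choice of Gauss-Jordan elimination are illustrative, not part of the notes.

```python
from fractions import Fraction


def fit_parabola(xs, ys):
    """Least-squares parabola y = a + b*x + c*x**2 via the three normal equations."""
    xs = [Fraction(str(x)) for x in xs]
    ys = [Fraction(str(y)) for y in ys]
    n = len(xs)

    def s(k):          # sum of x^k
        return sum(x ** k for x in xs)

    def t(k):          # sum of x^k * y
        return sum((x ** k) * y for x, y in zip(xs, ys))

    # Augmented matrix of the normal equations, unknowns (a, b, c)
    m = [[Fraction(n), s(1), s(2), t(0)],
         [s(1), s(2), s(3), t(1)],
         [s(2), s(3), s(4), t(2)]]

    # Gauss-Jordan elimination; raises StopIteration if the system is singular
    for i in range(3):
        pivot = next(r for r in range(i, 3) if m[r][i] != 0)
        m[i], m[pivot] = m[pivot], m[i]
        m[i] = [v / m[i][i] for v in m[i]]
        for r in range(3):
            if r != i:
                factor = m[r][i]
                m[r] = [vr - factor * vi for vr, vi in zip(m[r], m[i])]
    return tuple(row[3] for row in m)


# Illustrative data lying exactly on y = 1 + 2x + 3x^2
coeffs = fit_parabola([-1, 0, 1, 2], [2, 1, 6, 17])
print(coeffs)
```

Exact fractions avoid rounding error in hand-checkable examples; with real measurements, floating-point arithmetic is normally sufficient.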
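Example: Find the best values of a and b so that $y = a + bx$ fits the data given in the table.
x | 0 | 1 | 2 | 3 | 4 |
y | 1 | 2.9 | 4.8 | 6.7 | 8.6 |
Solution: Let the line be $y = a + bx$ …. (1)
$x$ | $y$ | $xy$ | $x^2$ |
0 | 1 | 0 | 0 |
1 | 2.9 | 2.9 | 1 |
2 | 4.8 | 9.6 | 4 |
3 | 6.7 | 20.1 | 9 |
4 | 8.6 | 34.4 | 16 |
$\sum x = 10$ | $\sum y = 24$ | $\sum xy = 67$ | $\sum x^2 = 30$ |
Normal equations: $\sum y = na + b\sum x$ …. (2)
$\sum xy = a\sum x + b\sum x^2$ …. (3)
On putting the values of $\sum x$, $\sum y$, $\sum xy$, $\sum x^2$ in (2) and (3), we have
$24 = 5a + 10b$ …. (4)
$67 = 10a + 30b$ …. (5)
On solving (4) and (5), we get $a = 1$, $b = 1.9$.
On substituting the values of a and b in (1), we get $y = 1 + 1.9x$.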
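Example: By the method of least squares, find the straight line that best fits the following data:
x | 1 | 2 | 3 | 4 | 5 |
y | 14 | 27 | 40 | 55 | 68 |
Solution: Let the equation of the straight line of best fit be $y = a + bx$ …. (1)
$x$ | $y$ | $xy$ | $x^2$ |
1 | 14 | 14 | 1 |
2 | 27 | 54 | 4 |
3 | 40 | 120 | 9 |
4 | 55 | 220 | 16 |
5 | 68 | 340 | 25 |
$\sum x = 15$ | $\sum y = 204$ | $\sum xy = 748$ | $\sum x^2 = 55$ |
Normal equations are
$\sum y = na + b\sum x$ …. (2)
$\sum xy = a\sum x + b\sum x^2$ …. (3)
On putting the values of $\sum x$, $\sum y$, $\sum xy$, $\sum x^2$ in (2) and (3), we have
$204 = 5a + 15b$ …. (4)
$748 = 15a + 55b$ …. (5)
On solving (4) and (5), we get $a = 0$, $b = 13.6$.
On substituting the values of a and b in (1), we get $y = 13.6x$.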
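Example: Find the least squares approximation of second degree for the discrete data.
x | -2 | -1 | 0 | 1 | 2 |
y | 15 | 1 | 1 | 3 | 19 |
Solution:
Let the equation of the second degree polynomial be $y = a + bx + cx^2$ …. (1)
$x$ | $y$ | $xy$ | $x^2$ | $x^2 y$ | $x^3$ | $x^4$ |
-2 | 15 | -30 | 4 | 60 | -8 | 16 |
-1 | 1 | -1 | 1 | 1 | -1 | 1 |
0 | 1 | 0 | 0 | 0 | 0 | 0 |
1 | 3 | 3 | 1 | 3 | 1 | 1 |
2 | 19 | 38 | 4 | 76 | 8 | 16 |
$\sum x = 0$ | $\sum y = 39$ | $\sum xy = 10$ | $\sum x^2 = 10$ | $\sum x^2 y = 140$ | $\sum x^3 = 0$ | $\sum x^4 = 34$ |
Normal equations are
$\sum y = na + b\sum x + c\sum x^2$ …. (2)
$\sum xy = a\sum x + b\sum x^2 + c\sum x^3$ …. (3)
$\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4$ …. (4)
On putting the values of the sums in equations (2), (3), (4), we have
$39 = 5a + 10c$ …. (5)
$10 = 10b$ …. (6)
$140 = 10a + 34c$ …. (7)
On solving (5), (6), (7), we get $a = -\frac{37}{35}$, $b = 1$, $c = \frac{31}{7}$.
The required polynomial of second degree is $y = -\frac{37}{35} + x + \frac{31}{7}x^2$.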
Change of Scale
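If the values of x are equally spaced and large in magnitude, we change the scale by putting $u = \frac{x - x_0}{h}$, where $x_0$ is a convenient origin (usually a middle value of x) and h is the common interval. The curve is fitted in terms of u and then converted back to x.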
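Once a parabola has been fitted in the scaled variable, its coefficients have to be converted back to the original variable. The sketch below assumes the substitution has the form u = (x - x0)/h and that the fitted curve is y = A + Bu + Cu²; the function name `unscale` and the coefficient values are illustrative only.

```python
def unscale(A, B, C, x0, h):
    """Convert y = A + B*u + C*u**2, with u = (x - x0)/h, into y = a + b*x + c*x**2."""
    # Expand A + B*(x - x0)/h + C*(x - x0)**2 / h**2 and collect powers of x
    c = C / h ** 2
    b = B / h - 2 * C * x0 / h ** 2
    a = A - B * x0 / h + C * x0 ** 2 / h ** 2
    return a, b, c


# Illustrative coefficients: origin shifted to x0 = 2 with unit interval
a, b, c = unscale(1.48, 1.13, 0.55, 2, 1)

# Both forms must agree at any x, for example x = 3 (u = 1)
x = 3
u = (x - 2) / 1
assert abs((a + b * x + c * x * x) - (1.48 + 1.13 * u + 0.55 * u * u)) < 1e-9
```

Working in u keeps the sums in the normal equations small, and when the u values are symmetric about zero the odd-power sums vanish, which decouples the equation for B from the other two.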
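Example: Fit a second degree parabola to the following data by the least squares method.
x | 1929 | 1930 | 1931 | 1932 | 1933 | 1934 | 1935 | 1936 | 1937 |
y | 352 | 356 | 357 | 358 | 360 | 361 | 361 | 360 | 359 |
Solution: Taking $u = x - 1933$
and taking $v = y - 357$,
the equation $y = a + bx + cx^2$ is transformed to $v = A + Bu + Cu^2$.
$x$ | $u = x - 1933$ | $y$ | $v = y - 357$ | $uv$ | $u^2$ | $u^2 v$ | $u^3$ | $u^4$ |
1929 | -4 | 352 | -5 | 20 | 16 | -80 | -64 | 256 |
1930 | -3 | 356 | -1 | 3 | 9 | -9 | -27 | 81 |
1931 | -2 | 357 | 0 | 0 | 4 | 0 | -8 | 16 |
1932 | -1 | 358 | 1 | -1 | 1 | 1 | -1 | 1 |
1933 | 0 | 360 | 3 | 0 | 0 | 0 | 0 | 0 |
1934 | 1 | 361 | 4 | 4 | 1 | 4 | 1 | 1 |
1935 | 2 | 361 | 4 | 8 | 4 | 16 | 8 | 16 |
1936 | 3 | 360 | 3 | 9 | 9 | 27 | 27 | 81 |
1937 | 4 | 359 | 2 | 8 | 16 | 32 | 64 | 256 |
Total | $\sum u = 0$ | | $\sum v = 11$ | $\sum uv = 51$ | $\sum u^2 = 60$ | $\sum u^2 v = -9$ | $\sum u^3 = 0$ | $\sum u^4 = 708$ |
Normal equations are
$\sum v = 9A + B\sum u + C\sum u^2$, i.e., $11 = 9A + 60C$
$\sum uv = A\sum u + B\sum u^2 + C\sum u^3$, i.e., $51 = 60B$
$\sum u^2 v = A\sum u^2 + B\sum u^3 + C\sum u^4$, i.e., $-9 = 60A + 708C$
On solving these equations, we get $A = \frac{694}{231}$, $B = \frac{17}{20}$, $C = -\frac{247}{924}$, so that
$y - 357 = \frac{694}{231} + \frac{17}{20}(x - 1933) - \frac{247}{924}(x - 1933)^2$.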
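Example: Fit a second-degree parabola to the following data:
x | 0 | 1 | 2 | 3 | 4 |
y | 1 | 1.8 | 1.3 | 2.5 | 6.3 |
Solution: Let $X = x - 2$ and $Y = y$ so that the parabola of fit $y = a + bx + cx^2$ becomes
$Y = A + BX + CX^2$ …. (i)
Here $n = 5$, $\sum X = 0$, $\sum Y = 12.9$, $\sum XY = 11.3$, $\sum X^2 = 10$, $\sum X^2 Y = 33.5$, $\sum X^3 = 0$, $\sum X^4 = 34$.
The normal equations are
$5A + 10C = 12.9$, $10B = 11.3$, $10A + 34C = 33.5$.
Solving these as simultaneous equations, we get $A = 1.48$, $B = 1.13$, $C = 0.55$.
(i) becomes $Y = 1.48 + 1.13X + 0.55X^2$
or $y = 1.48 + 1.13(x - 2) + 0.55(x - 2)^2$.
Hence $y = 1.42 - 1.07x + 0.55x^2$.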
Example: Fit a second-degree parabola to the following data:
x | 1.0 | 1.5 | 2.0 | 2.5 | 3.0 | 3.5 | 4.0 |
y | 1.1 | 1.3 | 1.6 | 2.0 | 2.7 | 3.4 | 4.1 |
Solution: We shift the origin to (2.5, 0) and take 0.5 as the new unit. This amounts to changing the variable x to X by the relation $X = \frac{x - 2.5}{0.5} = 2x - 5$.
Let the parabola of fit be $y = a + bX + cX^2$. The values of $\sum X$, $\sum y$, $\sum Xy$, etc., are calculated below:
$x$ | $X$ | $y$ | $Xy$ | $X^2$ | $X^2 y$ | $X^3$ | $X^4$ |
1.0 | -3 | 1.1 | -3.3 | 9 | 9.9 | -27 | 81 |
1.5 | -2 | 1.3 | -2.6 | 4 | 5.2 | -8 | 16 |
2.0 | -1 | 1.6 | -1.6 | 1 | 1.6 | -1 | 1 |
2.5 | 0 | 2 | 0 | 0 | 0 | 0 | 0 |
3.0 | 1 | 2.7 | 2.7 | 1 | 2.7 | 1 | 1 |
3.5 | 2 | 3.4 | 6.8 | 4 | 13.6 | 8 | 16 |
4.0 | 3 | 4.1 | 12.3 | 9 | 36.9 | 27 | 81 |
Total | 0 | 16.2 | 14.3 | 28 | 69.9 | 0 | 196 |
The normal equations are
$7a + 28c = 16.2$, $28b = 14.3$, $28a + 196c = 69.9$.
Solving these as simultaneous equations, we get $a = 2.07$, $b = 0.511$, $c = 0.061$, so that $y = 2.07 + 0.511X + 0.061X^2$.
Replacing X by $2x - 5$ in the above equation, we get $y = 2.07 + 0.511(2x - 5) + 0.061(2x - 5)^2$,
which simplifies to $y = 1.04 - 0.198x + 0.244x^2$. This is the required parabola of best fit.
Test of Significance
A test which enables us to decide whether to accept or to reject the null hypothesis is called a test of significance. If the difference between the sample values and the population values is so large that it lies in the critical region, the hypothesis is rejected.
Test of Significance of Large Samples (N>30)
The normal distribution is the limiting case of the binomial distribution when n is large enough. For a normal distribution, 5% of the items lie outside $\mu \pm 1.96\sigma$ while only 1% of the items lie outside $\mu \pm 2.58\sigma$.
$z = \frac{x - \mu}{\sigma}$, where z is the standard normal variate and x is the observed number of successes; for n trials with probability of success p, $\mu = np$ and $\sigma = \sqrt{npq}$. First, we find the value of z. The test of significance depends upon the value of z.
(i) (a) If $|z| < 1.96$, the difference between the observed and expected number of successes is not significant at the 5% level of significance.
(b) If $|z| > 1.96$, the difference is significant at the 5% level of significance.
(ii) (a) If $|z| < 2.58$, the difference between the observed and expected number of successes is not significant at the 1% level of significance.
(b) If $|z| > 2.58$, the difference is significant at the 1% level of significance.
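The large-sample test above reduces to computing one number and comparing it with two thresholds. The sketch below assumes the binomial mean np and standard deviation √(npq) for the number of successes, and uses the thresholds 1.96 and 2.58 for the 5% and 1% levels; the function names are illustrative.

```python
from math import sqrt


def z_successes(x, n, p):
    """Standard normal variate z = (x - n*p) / sqrt(n*p*q) for x successes in n trials."""
    q = 1 - p
    return (x - n * p) / sqrt(n * p * q)


def verdict(z):
    """Classify |z| against the 5% (1.96) and 1% (2.58) thresholds."""
    if abs(z) > 2.58:
        return "significant at 1% level"
    if abs(z) > 1.96:
        return "significant at 5% level"
    return "not significant at 5% level"


# 216 heads in 400 tosses of a coin assumed unbiased (p = 1/2)
z = z_successes(216, 400, 0.5)
print(round(z, 2), verdict(z))
```

A result that is significant at the 1% level is necessarily significant at the 5% level as well, so the stricter threshold is checked first.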
Example: A cubical die was thrown 9,000 times and 1 or 6 was obtained 3120 times. Can the deviation from the expected value be due to fluctuations of sampling?
Solution: Let us consider the hypothesis that the die is an unbiased one and hence the probability of obtaining 1 or 6 is $p = \frac{2}{6} = \frac{1}{3}$, so that $q = \frac{2}{3}$.
The expected value of the number of successes $= np = 9000 \times \frac{1}{3} = 3000$.
Also $\sigma = \sqrt{npq} = \sqrt{9000 \times \frac{1}{3} \times \frac{2}{3}} = \sqrt{2000} = 44.72$, so that $3\sigma = 134.16$.
Actual number of successes = 3120
Difference between the actual number of successes and the expected number of successes = 3120 - 3000 = 120, which is $< 3\sigma$.
Hence the hypothesis is correct and the deviation is due to fluctuations of sampling arising from random causes.
Example: A coin was tossed 400 times and the head turned up 216 times. Test the hypothesis that the coin is unbiased at the 5% level of significance.
Solution: Suppose the coin is unbiased.
Then the probability of getting a head in a toss $= \frac{1}{2}$.
Expected number of successes $= \frac{1}{2} \times 400 = 200$.
Thus the excess of the observed value over the expected value = 216 – 200 = 16.
Also S.D. of simple sampling $= \sqrt{npq} = \sqrt{400 \times \frac{1}{2} \times \frac{1}{2}} = 10$.
Hence $z = \frac{x - np}{\sqrt{npq}} = \frac{16}{10} = 1.6$.
As $z < 1.96$, the hypothesis is accepted at the 5% level of significance, i.e., we conclude that the coin is unbiased at the 5% level of significance.
Example: A die was thrown 9000 times and a throw of 5 or 6 was obtained 3240 times. On the assumption of random throwing, do the data indicate an unbiased die?
Solution: Suppose the die is unbiased.
Then the probability of throwing 5 or 6 with one die $= \frac{1}{6} + \frac{1}{6} = \frac{1}{3}$.
The expected number of successes $= \frac{1}{3} \times 9000 = 3000$.
And the observed number of successes = 3240.
Thus the excess of the observed value over the expected value = 3240 - 3000 = 240.
Also S.D. of simple sampling $= \sqrt{npq} = \sqrt{9000 \times \frac{1}{3} \times \frac{2}{3}} = 44.72$.
Hence $z = \frac{x - np}{\sqrt{npq}} = \frac{240}{44.72} = 5.4$ nearly.
As $z > 2.58$, the hypothesis has to be rejected at the 1% level of significance and we conclude that the die is biased.
Comparison of Large Samples
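Two large samples of sizes $n_1$, $n_2$ are taken from two populations giving proportions of attributes A’s as $p_1$, $p_2$ respectively.
(a) On the hypothesis that the populations are similar as regards the attribute A, we combine the two samples to estimate the common proportion of A’s, $p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$, with $q = 1 - p$.
If $e_1$, $e_2$ be the standard errors in the two samples then
$e_1^2 = \frac{pq}{n_1}$ and $e_2^2 = \frac{pq}{n_2}$.
If e be the standard error of the difference between $p_1$ and $p_2$, then
$e^2 = e_1^2 + e_2^2 = pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$ and $z = \frac{p_1 - p_2}{e}$.
If $z > 3$, the difference between $p_1$ and $p_2$ is a real one.
If $z < 2$, the difference may be due to fluctuations of simple sampling.
But if z lies between 2 and 3, then the difference is significant at the 5% level of significance.
(b) If the proportions of A’s are not the same in the two populations from which the samples are drawn, but $p_1$ and $p_2$ are the true values of the proportions, then the S.E. e of the difference $p_1 - p_2$ is given by
$e^2 = \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}$.
If $z = \frac{p_1 - p_2}{e} < 3$, the difference could have arisen due to fluctuations of simple sampling.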
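The two cases above differ only in how the standard error of the difference is formed. The sketch below assumes the pooled form e² = pq(1/n1 + 1/n2) when the populations are taken to be alike, and e² = p1q1/n1 + p2q2/n2 when the two proportions are treated as true values; the function names and the sample figures are illustrative.

```python
from math import sqrt


def z_pooled(n1, p1, n2, p2):
    """z for the difference of proportions, assuming a common population proportion."""
    p = (n1 * p1 + n2 * p2) / (n1 + n2)     # combined estimate of the proportion
    q = 1 - p
    e = sqrt(p * q * (1 / n1 + 1 / n2))     # S.E. of the difference
    return (p1 - p2) / e


def z_unpooled(n1, p1, n2, p2):
    """z for the difference of proportions when p1 and p2 are the true values."""
    e = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / e


# Illustrative figures: 5 defectives out of 100 versus 7 out of 200
z = z_pooled(100, 5 / 100, 200, 7 / 200)
print(round(z, 3))
```

The pooled form is the natural choice when testing whether two samples could have come from similar populations; the unpooled form applies when the population proportions are already given.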
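Example: In a city A, 20% of a random sample of 900 school boys had a certain slight physical defect. In another city B, 18.5% of a random sample of 1600 school boys had the same defect. Is this difference between the proportions significant?
Solution: We have $n_1 = 900$, $n_2 = 1600$
and $p_1 = 0.2$, $p_2 = 0.185$
and $p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{180 + 296}{2500} = 0.1904$, $q = 1 - p = 0.8096$.
Thus $e^2 = pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right) = 0.1904 \times 0.8096 \times \left(\frac{1}{900} + \frac{1}{1600}\right) = 0.000268$,
giving $e = 0.0164$ nearly.
Also $p_1 - p_2 = 0.2 - 0.185 = 0.015$, so that $z = \frac{p_1 - p_2}{e} = \frac{0.015}{0.0164} = 0.91$.
As $z < 2$, the difference between the proportions is not significant.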
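Example: In two large populations there are 30% and 25% respectively of fair-haired people. Is this difference likely to be hidden in samples of 1200 and 900 respectively from the two populations?
Solution:
Here $p_1 = 0.3$, $p_2 = 0.25$ so that $p_1 - p_2 = 0.05$.
$e^2 = \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2} = \frac{0.3 \times 0.7}{1200} + \frac{0.25 \times 0.75}{900} = 0.000383$,
so that $e = 0.0196$ and $z = \frac{p_1 - p_2}{e} = \frac{0.05}{0.0196} = 2.55 > 2$.
Hence it is unlikely that the real difference will be hidden.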
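Example: One type of aircraft is found to develop engine trouble in 5 flights out of a total of 100, and another type in 7 flights out of a total of 200 flights. Is there a significant difference between the two types of aircraft so far as engine defects are concerned?
Solution: $n_1 = 100$ flights, number of troubled flights = 5, so $p_1 = \frac{5}{100} = 0.05$.
$n_2 = 200$ flights, number of troubled flights = 7, so $p_2 = \frac{7}{200} = 0.035$.
$p = \frac{5 + 7}{100 + 200} = 0.04$, $q = 0.96$, $e^2 = pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right) = 0.04 \times 0.96 \times \left(\frac{1}{100} + \frac{1}{200}\right) = 0.000576$, so that $e = 0.024$.
$z = \frac{p_1 - p_2}{e} = \frac{0.015}{0.024} = 0.625 < 2$. The difference is not significant.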
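Example: In a sample of 600 men from a certain city, 450 are found to be smokers. In another sample of 900 men from another city, 450 are smokers. Do the data indicate that the cities are significantly different with respect to the habit of smoking among men?
Solution: $n_1 = 600$ men, number of smokers = 450, so $p_1 = \frac{450}{600} = 0.75$.
$n_2 = 900$ men, number of smokers = 450, so $p_2 = \frac{450}{900} = 0.5$.
$p = \frac{450 + 450}{600 + 900} = 0.6$, $q = 0.4$, $e^2 = pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right) = 0.6 \times 0.4 \times \left(\frac{1}{600} + \frac{1}{900}\right) = 0.000667$, so that $e = 0.0258$.
$z = \frac{p_1 - p_2}{e} = \frac{0.25}{0.0258} = 9.7 > 3$, so that the difference is significant.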
Significance Test of a sample mean
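Given a random small sample $x_1, x_2, \ldots, x_n$ from a normal population, we have to test the hypothesis that the mean of the population is $\mu$. For this, we first calculate $t = \frac{(\bar{x} - \mu)\sqrt{n}}{\sigma_s}$, where $\bar{x} = \frac{1}{n}\sum x_i$ and $\sigma_s^2 = \frac{1}{n - 1}\sum (x_i - \bar{x})^2$.
Then we find the value of P for the given degrees of freedom (df), $\nu = n - 1$, from the table.
If the calculated value of $t > t_{0.05}$, the difference between $\bar{x}$ and $\mu$ is said to be significant at the 5% level of significance.
If $t > t_{0.01}$, the difference is said to be significant at the 1% level of significance.
If $t < t_{0.05}$, the data is said to be consistent with the hypothesis that $\mu$ is the mean of the population.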
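The statistic for a single sample mean can be computed directly from the observations. The sketch below assumes the form t = (x̄ - μ)√n / s with s² = Σ(x - x̄)²/(n - 1) and n - 1 degrees of freedom; the function name and the sample values are illustrative, and the critical value still has to be read from a t table.

```python
from math import sqrt


def t_one_sample(xs, mu):
    """Return (t, degrees of freedom) for testing that the population mean is mu."""
    n = len(xs)
    mean = sum(xs) / n
    # Sample variance with divisor n - 1
    s2 = sum((x - mean) ** 2 for x in xs) / (n - 1)
    t = (mean - mu) * sqrt(n) / sqrt(s2)
    return t, n - 1


# Illustrative sample of nine values tested against an assumed mean of 47.5
t, df = t_one_sample([45, 47, 50, 52, 48, 47, 49, 53, 51], 47.5)
print(round(t, 2), df)
```

An equivalent form uses the standard deviation with divisor n together with √(n - 1) in place of √n; both give the same value of t.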
Example: A certain stimulus administered to each of 12 patients resulted in the following increases of blood pressure: 5, 2, 8, -1, 3, 0, 4, 6. Can it be concluded that the stimulus will, in general, be accompanied by an increase in blood pressure?
Solution:
Let us assume that the stimulus does not, in general, increase the B.P., i.e., taking the population of increases to be normal with mean $\mu = 0$ and S.D. $\sigma$,
$t = \frac{(\bar{d} - 0)\sqrt{n}}{\sigma_s}$, where $\bar{d}$ and $\sigma_s$ are the mean and S.D. of the observed increases.
Here $n = 12$, so that $\nu = n - 1 = 11$.
For $\nu = 11$, $t_{0.05} = 2.2$ from table IV.
Since the calculated $|t| > t_{0.05}$, our assumption is rejected, i.e., the stimulus does, in general, increase the B.P.
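Example: The nine items of a sample have the following values: 45, 47, 50, 52, 48, 47, 49, 53, 51. Does the mean of these differ significantly from the assumed mean of 47.5?
Solution: We find the mean and standard deviation of the sample as follows:
$x$ | $d = x - 48$ | $d^2$ |
45 | -3 | 9 |
47 | -1 | 1 |
50 | 2 | 4 |
52 | 4 | 16 |
48 | 0 | 0 |
47 | -1 | 1 |
49 | 1 | 1 |
53 | 5 | 25 |
51 | 3 | 9 |
Total | 10 | 66 |
mean $\bar{x} = 48 + \frac{\sum d}{n} = 48 + \frac{10}{9} = 49.1$, and $\sigma_s^2 = \frac{1}{n - 1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right] = \frac{1}{8}\left[66 - \frac{100}{9}\right] = 6.86$, so that $\sigma_s = 2.62$.
Hence $t = \frac{(\bar{x} - \mu)\sqrt{n}}{\sigma_s} = \frac{(49.11 - 47.5) \times 3}{2.62} = 1.85$ nearly.
Here $\nu = n - 1 = 8$.
For $\nu = 8$, we get from table IV, $t_{0.05} = 2.31$.
As the calculated value of $t < t_{0.05}$, the value of t is not significant at the 5% level of significance, which implies that there is no significant difference between $\bar{x}$ and $\mu$. Thus the test provides no evidence against the population mean being 47.5.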
Example: A machinist is making engine parts with axle diameter of 0.7 inch. A random sample of 10 parts shows mean diameter 0.742 inch with
Degrees of freedom $\nu = 10 - 1 = 9$.
For $\nu = 9$, we get from table IV, $t_{0.05} = 2.262$.
As the calculated value of $t > t_{0.05}$, the value of t is significant at the 5% level of significance. This implies that $\bar{x}$ differs significantly from $\mu$ and the hypothesis is rejected. Hence the work is inferior. In fact, the work is inferior even at the 2% level of significance.
Significance Test of difference between sample means:
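Given two independent samples $x_1, x_2, \ldots, x_{n_1}$ and $y_1, y_2, \ldots, y_{n_2}$ with means $\bar{x}$ and $\bar{y}$ and standard deviations $\sigma_x$ and $\sigma_y$ from a normal population with the same variance, we have to test the hypothesis that the population means $\mu_1$ and $\mu_2$ are the same.
For this, we calculate $t = \frac{\bar{x} - \bar{y}}{\sigma_s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$ …. (1)
where $\bar{x} = \frac{1}{n_1}\sum x_i$, $\bar{y} = \frac{1}{n_2}\sum y_i$
and $\sigma_s^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum (x_i - \bar{x})^2 + \sum (y_i - \bar{y})^2\right]$.
It can be shown that the variate t defined by (1) follows the t-distribution with $n_1 + n_2 - 2$ degrees of freedom.
If the calculated value of $t > t_{0.05}$, the difference between the sample means is said to be significant at the 5% level of significance.
If $t > t_{0.01}$, the difference is said to be significant at the 1% level of significance.
If $t < t_{0.05}$, the data is said to be consistent with the hypothesis that $\mu_1 = \mu_2$.
Cor: If the two samples are of the same size and the data are paired, then t is defined by
$t = \frac{(\bar{d} - 0)\sqrt{n}}{\sigma_s}$, where $\sigma_s^2 = \frac{1}{n - 1}\sum (d_i - \bar{d})^2$,
$d_i$ = difference of the ith members of the samples;
$\bar{d}$ = mean of the differences $= \frac{1}{n}\sum d_i$; and the number of degrees of freedom $= n - 1$.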
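Both forms of the test, for independent and for paired samples, can be computed in a few lines. The sketch below assumes the pooled variance with divisor n1 + n2 - 2 for independent samples, and the variance of the differences with divisor n - 1 for paired data; the function names and the small data sets are made up for illustration.

```python
from math import sqrt


def t_two_sample(xs, ys):
    """Return (t, df) for two independent samples with a common variance."""
    n1, n2 = len(xs), len(ys)
    mx, my = sum(xs) / n1, sum(ys) / n2
    # Pooled sum of squared deviations about each sample's own mean
    ss = sum((x - mx) ** 2 for x in xs) + sum((y - my) ** 2 for y in ys)
    s = sqrt(ss / (n1 + n2 - 2))
    t = (mx - my) / (s * sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def t_paired(first, second):
    """Return (t, df) for paired data, testing that the mean difference is zero."""
    d = [b - a for a, b in zip(first, second)]
    n = len(d)
    dbar = sum(d) / n
    s = sqrt(sum((v - dbar) ** 2 for v in d) / (n - 1))
    return dbar * sqrt(n) / s, n - 1


# Illustrative data only
t_ind, df_ind = t_two_sample([10, 12, 14], [11, 15, 16, 18])
t_pair, df_pair = t_paired([20, 22, 19, 24], [23, 22, 21, 27])
print(round(t_ind, 3), df_ind, round(t_paired([20, 22, 19, 24], [23, 22, 21, 27])[0], 3), df_pair)
```

The paired form should be used only when each observation in one sample is matched with a specific observation in the other, such as marks of the same student before and after coaching.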
Example: Eleven students were given a test in statistics. They were given a month’s further tuition and a second test of equal difficulty was held at the end of it. Do the marks give evidence that the students have benefited by extra coaching?
Boys | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
Marks I test | 23 | 20 | 19 | 21 | 18 | 20 | 18 | 17 | 23 | 16 | 19 |
Marks II test | 24 | 19 | 22 | 18 | 20 | 22 | 20 | 20 | 23 | 20 | 17 |
Solution: We compute the mean and the S.D. of the differences between the marks of the two tests as under:
Students | $x_1$ | $x_2$ | $d = x_2 - x_1$ | $d - \bar{d}$ | $(d - \bar{d})^2$ |
1 | 23 | 24 | 1 | 0 | 0 |
2 | 20 | 19 | -1 | -2 | 4 |
3 | 19 | 22 | 3 | 2 | 4 |
4 | 21 | 18 | -3 | -4 | 16 |
5 | 18 | 20 | 2 | 1 | 1 |
6 | 20 | 22 | 2 | 1 | 1 |
7 | 18 | 20 | 2 | 1 | 1 |
8 | 17 | 20 | 3 | 2 | 4 |
9 | 23 | 23 | 0 | -1 | 1 |
10 | 16 | 20 | 4 | 3 | 9 |
11 | 19 | 17 | -2 | -3 | 9 |
Total | | | $\sum d = 11$ | 0 | $\sum (d - \bar{d})^2 = 50$ |
$\bar{d}$ = mean of d’s $= \frac{11}{11} = 1$; $\sigma_s^2 = \frac{\sum (d - \bar{d})^2}{n - 1} = \frac{50}{10} = 5$, i.e., $\sigma_s = 2.24$.
Assuming that the students have not benefited by extra coaching, it follows that the mean of the differences between the marks of the two tests is zero, i.e., $\mu = 0$.
Then $t = \frac{(\bar{d} - \mu)\sqrt{n}}{\sigma_s} = \frac{(1 - 0)\sqrt{11}}{2.24} = 1.48$ nearly and $\nu = 11 - 1 = 10$.
From table IV, we find that $t_{0.05} = 2.228$ (for $\nu = 10$). As the calculated value of $t < t_{0.05}$, the value of t is not significant at the 5% level of significance, i.e., the test provides no evidence that the students have benefited by extra coaching.
Example: From a random sample of 10 pigs fed on diet A, the increases in weight in a certain period were 10, 6, 16, 17, 13, 12, 8, 14, 15, 9 lbs. For another random sample of 12 pigs fed on diet B, the increases in the same period were 7, 13, 22, 15, 12, 14, 18, 8, 21, 23, 10, 17 lbs. Test whether diets A and B differ significantly as regards their effect on increase in weight.
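Solution: We calculate the means and standard deviations of the samples as follows:
Diet A: $x$ | $x - \bar{x}$ | $(x - \bar{x})^2$ | Diet B: $y$ | $y - \bar{y}$ | $(y - \bar{y})^2$ |
10 | -2 | 4 | 7 | -8 | 64 |
6 | -6 | 36 | 13 | -2 | 4 |
16 | 4 | 16 | 22 | 7 | 49 |
17 | 5 | 25 | 15 | 0 | 0 |
13 | 1 | 1 | 12 | -3 | 9 |
12 | 0 | 0 | 14 | -1 | 1 |
8 | -4 | 16 | 18 | 3 | 9 |
14 | 2 | 4 | 8 | -7 | 49 |
15 | 3 | 9 | 21 | 6 | 36 |
9 | -3 | 9 | 23 | 8 | 64 |
 | | | 10 | -5 | 25 |
 | | | 17 | 2 | 4 |
120 | 0 | 120 | 180 | 0 | 314 |
$\bar{x} = \frac{120}{10} = 12$ lbs, $\bar{y} = \frac{180}{12} = 15$ lbs, and $\sigma_s^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum (x - \bar{x})^2 + \sum (y - \bar{y})^2\right] = \frac{120 + 314}{20} = 21.7$, so that $\sigma_s = 4.66$.
Assuming that the samples do not differ in weight so far as the two diets are concerned, i.e., $\mu_1 = \mu_2$,
$t = \frac{\bar{x} - \bar{y}}{\sigma_s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{12 - 15}{4.66\sqrt{\frac{1}{10} + \frac{1}{12}}}$.
Hence $t = -1.5$ nearly.
Here $\nu = n_1 + n_2 - 2 = 20$.
For $\nu = 20$, we find $t_{0.05} = 2.09$ [From table IV].
The calculated value of $|t| < t_{0.05}$.
Hence the difference between the sample means is not significant, i.e., the two diets do not differ significantly as regards their effect on increase in weight.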