Unit – 5 | unit 5 applied statistics


Mathematics-III

Unit – 5

Applied Statistics

5.1 Curve fitting by the method of least squares: fitting of straight lines

Method of Least Squares:

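Let $y = a + bx$ … (1)

be the straight line to be fitted to the given data points $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$.

Let $y_{t_1}$ be the theoretical value for $x_1$, so that $y_{t_1} = a + bx_1$.

Then the error at the first point is $e_1 = y_1 - y_{t_1} = y_1 - (a + bx_1)$, and similarly for the other points. The sum of the squares of the errors is

$S = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - a - bx_i)^2$

For S to be minimum,

$\dfrac{\partial S}{\partial a} = -2\sum (y - a - bx) = 0$ … (2)

$\dfrac{\partial S}{\partial b} = -2\sum x\,(y - a - bx) = 0$ … (3)

[To generalise, $y_i$ is written as $y$ and $x_i$ as $x$.]

On simplification, equations (2) and (3) become

$\sum y = na + b\sum x$ … (4)

$\sum xy = a\sum x + b\sum x^2$ … (5)

Equations (4) and (5) are known as the normal equations.

On solving equations (4) and (5), we get the values of a and b.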
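The two normal equations for the line $y = a + bx$, namely $\sum y = na + b\sum x$ and $\sum xy = a\sum x + b\sum x^2$, can be solved directly for a and b. The following is a minimal sketch in plain Python; the function name `fit_line` and the sample data are illustrative assumptions, not part of the original notes.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least-squares line y = a + b*x."""
    n = len(xs)
    sx = sum(xs)                                  # sum of x
    sy = sum(ys)                                  # sum of y
    sxy = sum(x * y for x, y in zip(xs, ys))      # sum of x*y
    sxx = sum(x * x for x in xs)                  # sum of x^2
    # Normal equations:  sy = n*a + b*sx   and   sxy = a*sx + b*sxx
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values are equal; the line is not determined")
    b = (n * sxy - sx * sy) / det
    a = (sy - b * sx) / n
    return a, b


# Illustrative data lying exactly on y = 1 + 2x
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)
```

Eliminating a between the two equations gives the closed form for b used above; a then follows from the first normal equation.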

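(b) To fit the parabola: $y = a + bx + cx^2$ … (6)

The normal equations are

$\sum y = na + b\sum x + c\sum x^2$ … (7)

$\sum xy = a\sum x + b\sum x^2 + c\sum x^3$ … (8)

$\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4$ … (9)

On solving these three equations, we get the values of a, b and c.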
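For the parabola $y = a + bx + cx^2$ the three normal equations form a 3 x 3 linear system in a, b and c whose coefficients are the power sums $\sum x^k$ and whose right-hand sides are $\sum y$, $\sum xy$, $\sum x^2y$. A small sketch, assuming plain Python and a hand-written Gauss-Jordan elimination (the helper names `solve3` and `fit_parabola` are hypothetical), could look like this:

```python
def solve3(m, r):
    """Solve the 3x3 system m * sol = r by Gauss-Jordan elimination with pivoting."""
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(m, r)]
    for col in range(3):
        # choose the row with the largest entry in this column as pivot
        pivot = max(range(col, 3), key=lambda i: abs(aug[i][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(3):
            if i != col:
                f = aug[i][col] / aug[col][col]
                aug[i] = [p - f * q for p, q in zip(aug[i], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]


def fit_parabola(xs, ys):
    """Return (a, b, c) of the least-squares parabola y = a + b*x + c*x^2."""
    sx = [sum(x ** k for x in xs) for k in range(5)]                  # sums of x^k, k = 0..4
    sxy = [sum((x ** k) * y for x, y in zip(xs, ys)) for k in range(3)]
    m = [[sx[0], sx[1], sx[2]],
         [sx[1], sx[2], sx[3]],
         [sx[2], sx[3], sx[4]]]
    return tuple(solve3(m, sxy))


# Illustrative data lying exactly on y = 2 - x + 3x^2
a, b, c = fit_parabola([0, 1, 2, 3], [2, 4, 12, 26])
print(round(a, 6), round(b, 6), round(c, 6))
```

Note that `sx[0]` equals n, so the first row of the matrix is exactly the first normal equation.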

Note:

1. The normal equation (4) is obtained by putting $\sum$ on both sides of equation (1), i.e. by summing (1) over all the data points. Equation (5) is obtained by multiplying both sides of (1) by $x$ and then summing.

2. The normal equations (7), (8) and (9) are obtained by multiplying both sides of equation (6) by 1, $x$ and $x^2$ respectively and then summing.

Example: Find the best values of a and b so that $y = a + bx$ fits the data given in the table.

X	0	1	2	3	4
Y	1	2.9	4.8	6.7	8.6

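Solution: Let the line of best fit be $y = a + bx$ …. (1)

$x$	$y$	$xy$	$x^2$
0	1	0	0
1	2.9	2.9	1
2	4.8	9.6	4
3	6.7	20.1	9
4	8.6	34.4	16
$\sum x = 10$	$\sum y = 24$	$\sum xy = 67$	$\sum x^2 = 30$

Normal equations are

$\sum y = na + b\sum x$ …. (2)

$\sum xy = a\sum x + b\sum x^2$ …. (3)

On putting the values of $\sum x$, $\sum y$, $\sum xy$, $\sum x^2$ and $n = 5$ in (2) and (3), we have

$24 = 5a + 10b$ …. (4)

$67 = 10a + 30b$ …. (5)

On solving (4) and (5), we get $a = 1$, $b = 1.9$.

On substituting the values of a and b in (1), we get $y = 1 + 1.9x$.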

Example: By the method of least squares, find the straight line that best fits the following data:

x	1	2	3	4	5
y	14	27	40	55	68

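Solution: Let the equation of the straight line of best fit be $y = a + bx$ …. (1)

$x$	$y$	$xy$	$x^2$
1	14	14	1
2	27	54	4
3	40	120	9
4	55	220	16
5	68	340	25
$\sum x = 15$	$\sum y = 204$	$\sum xy = 748$	$\sum x^2 = 55$

Normal equations are

$\sum y = na + b\sum x$ …. (2)

$\sum xy = a\sum x + b\sum x^2$ …. (3)

On putting the values of $\sum x$, $\sum y$, $\sum xy$, $\sum x^2$ and $n = 5$ in (2) and (3), we have

$204 = 5a + 15b$ …. (4)

$748 = 15a + 55b$ …. (5)

On solving (4) and (5), we get $a = 0$, $b = 13.6$.

On substituting the values of a and b in (1), we get $y = 13.6x$.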

Example: Find the least squares approximation of second degree for the discrete data.

x	-2	-1	0	1	2
f(x)	15	1	1	3	19

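Solution:

Let the equation of the second degree polynomial be $y = a + bx + cx^2$ …. (1)

$x$	$y$	$xy$	$x^2$	$x^2y$	$x^3$	$x^4$
-2	15	-30	4	60	-8	16
-1	1	-1	1	1	-1	1
0	1	0	0	0	0	0
1	3	3	1	3	1	1
2	19	38	4	76	8	16
$\sum x = 0$	$\sum y = 39$	$\sum xy = 10$	$\sum x^2 = 10$	$\sum x^2y = 140$	$\sum x^3 = 0$	$\sum x^4 = 34$

Normal equations are

$\sum y = na + b\sum x + c\sum x^2$ …. (2)

$\sum xy = a\sum x + b\sum x^2 + c\sum x^3$ …. (3)

$\sum x^2y = a\sum x^2 + b\sum x^3 + c\sum x^4$ …. (4)

On putting the values of the sums and $n = 5$ in equations (2), (3), (4), we have

$39 = 5a + 10c$ …. (5)

$10 = 10b$ …. (6)

$140 = 10a + 34c$ …. (7)

On solving (5), (6), (7), we get $a = -\dfrac{37}{35}$, $b = 1$, $c = \dfrac{31}{7}$.

The required polynomial of second degree is $y = -\dfrac{37}{35} + x + \dfrac{31}{7}x^2$.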

5.2 Second degree parabolas and more general curves

Change of Scale

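If the values of x are equally spaced and numerically large, we change the origin and scale by putting $u = \dfrac{x - x_0}{h}$, where $x_0$ is a convenient origin (usually the middle value of x) and $h$ is the common interval.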

Example: Fit a second degree parabola to the following data by the least squares method.

x	1929	1930	1931	1932	1933	1934	1935	1936	1937
y	352	356	357	358	360	361	361	360	359

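Solution: Taking $x_0 = 1933$ and $y_0 = 357$, i.e. putting $u = x - 1933$ and $v = y - 357$,

the equation $y = a + bx + cx^2$ is transformed to $v = A + Bu + Cu^2$ …. (1)

$x$	$u = x - 1933$	$y$	$v = y - 357$	$uv$	$u^2$	$u^2v$	$u^3$	$u^4$
1929	-4	352	-5	20	16	-80	-64	256
1930	-3	356	-1	3	9	-9	-27	81
1931	-2	357	0	0	4	0	-8	16
1932	-1	358	1	-1	1	1	-1	1
1933	0	360	3	0	0	0	0	0
1934	1	361	4	4	1	4	1	1
1935	2	361	4	8	4	16	8	16
1936	3	360	3	9	9	27	27	81
1937	4	359	2	8	16	32	64	256
Total	0	3224	11	51	60	-9	0	708

Normal equations are

$\sum v = 9A + B\sum u + C\sum u^2$, i.e. $11 = 9A + 60C$

$\sum uv = A\sum u + B\sum u^2 + C\sum u^3$, i.e. $51 = 60B$

$\sum u^2v = A\sum u^2 + B\sum u^3 + C\sum u^4$, i.e. $-9 = 60A + 708C$

On solving these equations, we get $A = 3$, $B = 0.85$, $C = -0.27$ (to two decimal places), so that $v = 3 + 0.85u - 0.27u^2$, i.e. $y = 360 + 0.85(x - 1933) - 0.27(x - 1933)^2$.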
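The benefit of the change of scale can be seen numerically: with $u = x - 1933$ the odd power sums $\sum u$ and $\sum u^3$ vanish, so the middle normal equation gives B on its own and the other two form a 2 x 2 system in A and C. The sketch below assumes plain Python and uses the data of this example (y for 1930 taken as 356, which is consistent with $v = -1$); the variable names are illustrative only.

```python
years = [1929, 1930, 1931, 1932, 1933, 1934, 1935, 1936, 1937]
values = [352, 356, 357, 358, 360, 361, 361, 360, 359]

x0, y0 = 1933, 357                       # new origin: middle year, convenient y
u = [x - x0 for x in years]
v = [y - y0 for y in values]
n = len(u)

sv = sum(v)
suv = sum(p * q for p, q in zip(u, v))
su2 = sum(p ** 2 for p in u)
su2v = sum(p ** 2 * q for p, q in zip(u, v))
su4 = sum(p ** 4 for p in u)

# Because sum(u) = sum(u^3) = 0, the normal equations reduce to
#   sv = n*A + C*su2,   suv = B*su2,   su2v = A*su2 + C*su4
B = suv / su2
det = n * su4 - su2 ** 2
A = (sv * su4 - su2 * su2v) / det        # Cramer's rule on the 2x2 system
C = (n * su2v - su2 * sv) / det


def fitted(x):
    """Fitted value of y for a year x, using v = A + B*u + C*u^2."""
    t = x - x0
    return y0 + A + B * t + C * t * t


print(round(A, 2), round(B, 2), round(C, 2))
```

Working in u and v also keeps the sums small; in the original x the fourth powers would be of order $10^{13}$.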

Example: Fit a second-degree parabola to the following data:

x	0	1	2	3	4
y	1	1.8	1.3	2.5	6.3

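Solution: Let $u = x - 2$ and $v = y$, so that the parabola of fit $y = a + bx + cx^2$ becomes

$v = A + Bu + Cu^2$ …. (i)

Here $\sum u = 0$, $\sum u^2 = 10$, $\sum u^3 = 0$, $\sum u^4 = 34$, $\sum v = 12.9$, $\sum uv = 11.3$ and $\sum u^2v = 33.5$.

The normal equations are

$\sum v = 5A + B\sum u + C\sum u^2$, i.e. $12.9 = 5A + 10C$

$\sum uv = A\sum u + B\sum u^2 + C\sum u^3$, i.e. $11.3 = 10B$

$\sum u^2v = A\sum u^2 + B\sum u^3 + C\sum u^4$, i.e. $33.5 = 10A + 34C$

Solving these as simultaneous equations, we get $A = 1.48$, $B = 1.13$, $C = 0.55$.

(i) becomes $v = 1.48 + 1.13u + 0.55u^2$, i.e. $y = 1.48 + 1.13(x - 2) + 0.55(x - 2)^2$

Hence $y = 1.42 - 1.07x + 0.55x^2$.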

Example: Fit a second-degree parabola to the following data:

x	1.0	1.5	2.0	2.5	3.0	3.5	4.0
y	1.1	1.3	1.6	2.0	2.7	3.4	4.1

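Solution: We shift the origin to (2.5, 0) and take 0.5 as the new unit. This amounts to changing the variable x to X by the relation $X = \dfrac{x - 2.5}{0.5} = 2x - 5$.

Let the parabola of fit be $y = a + bX + cX^2$. The values of $\sum X$, $\sum y$, etc., are calculated below:

$x$	$X$	$y$	$Xy$	$X^2$	$X^2y$	$X^3$	$X^4$
1.0	-3	1.1	-3.3	9	9.9	-27	81
1.5	-2	1.3	-2.6	4	5.2	-8	16
2.0	-1	1.6	-1.6	1	1.6	-1	1
2.5	0	2.0	0	0	0	0	0
3.0	1	2.7	2.7	1	2.7	1	1
3.5	2	3.4	6.8	4	13.6	8	16
4.0	3	4.1	12.3	9	36.9	27	81
Total	0	16.2	14.3	28	69.9	0	196

The normal equations are

$7a + 28c = 16.2$, $28b = 14.3$, $28a + 196c = 69.9$

Solving these as simultaneous equations, we get $a = 2.07$, $b = 0.511$, $c = 0.061$, so that $y = 2.07 + 0.511X + 0.061X^2$.

Replacing X by $2x - 5$ in the above equation, we get $y = 2.07 + 0.511(2x - 5) + 0.061(2x - 5)^2$,

which simplifies to $y = 1.04 - 0.198x + 0.244x^2$. This is the required parabola of best fit.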
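The last step, going back from the scaled variable $X = (x - x_0)/h$ to the original x, is a plain substitution and can be checked mechanically. The sketch below assumes plain Python; the function name `unscale_parabola` is hypothetical, and the coefficients 2.07, 0.511, 0.061 are the rounded solution of the normal equations $7a + 28c = 16.2$, $28b = 14.3$, $28a + 196c = 69.9$ for this data.

```python
def unscale_parabola(a, b, c, x0, h):
    """Given y = a + b*X + c*X^2 with X = (x - x0)/h,
    return (p, q, r) such that y = p + q*x + r*x^2."""
    r = c / h ** 2
    q = b / h - 2 * c * x0 / h ** 2
    p = a - b * x0 / h + c * x0 ** 2 / h ** 2
    return p, q, r


# Rounded coefficients in X for the example, with origin 2.5 and unit 0.5
p, q, r = unscale_parabola(2.07, 0.511, 0.061, x0=2.5, h=0.5)
print(round(p, 3), round(q, 3), round(r, 3))
```

Expanding $(x - x_0)^2/h^2$ and collecting powers of x gives the three expressions used in the function.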

5.3 Test of significance: Large sample test for single proportion

Test of Significance

The tests which enable us to decide whether to accept or to reject the null hypothesis are called tests of significance. If the difference between the sample values and the population values is so large that it lies in the critical area, the hypothesis is rejected.

Test of Significance of Large Samples (N > 30)

The normal distribution is the limiting case of the binomial distribution when n is large enough. For a normal distribution, 5% of the items lie outside $\mu \pm 1.96\sigma$, while only 1% of the items lie outside $\mu \pm 2.58\sigma$.

$z = \dfrac{x - \mu}{\sigma}$

where z is the standard normal variate and x is the observed number of successes; for n trials with probability of success p and $q = 1 - p$, $\mu = np$ and $\sigma = \sqrt{npq}$. First, we find the value of z. The test of significance depends upon the value of z.

(i) (a) If $|z| < 1.96$, the difference between the observed and expected number of successes is not significant at the 5% level of significance.

(b) If $|z| > 1.96$, the difference is significant at the 5% level of significance.

(ii) (a) If $|z| < 2.58$, the difference between the observed and expected number of successes is not significant at the 1% level of significance.

(b) If $|z| > 2.58$, the difference is significant at the 1% level of significance.
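As a rough illustration of the large-sample test for a single proportion, the sketch below computes $z = (x - np)/\sqrt{npq}$ and compares $|z|$ with the usual two-sided critical values 1.96 (5% level) and 2.58 (1% level). It assumes plain Python; the function names `proportion_z` and `verdict` are illustrative and not taken from the notes.

```python
import math


def proportion_z(x, n, p):
    """z for x observed successes in n trials when the hypothesised
    probability of success is p (mean n*p, S.D. sqrt(n*p*q))."""
    q = 1.0 - p
    return (x - n * p) / math.sqrt(n * p * q)


def verdict(z):
    """Classify |z| against the 5% (1.96) and 1% (2.58) critical values."""
    size = abs(z)
    if size > 2.58:
        return "significant at 1% level"
    if size > 1.96:
        return "significant at 5% level"
    return "not significant at 5% level"


# 216 heads in 400 tosses of a coin assumed unbiased (p = 1/2)
z = proportion_z(216, 400, 0.5)
print(round(z, 2), verdict(z))
```

The same two functions apply to any count of successes, for instance 3240 throws of 5 or 6 in 9000 throws of a die with p = 1/3.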

Example: A cubical die was thrown 9,000 times and 1 or 6 was obtained 3120 times. Can the deviation from the expected value be due to fluctuations of sampling?

Solution: Let us consider the hypothesis that the die is an unbiased one, and hence the probability of obtaining 1 or 6 is $p = \dfrac{2}{6} = \dfrac{1}{3}$, so that $q = 1 - p = \dfrac{2}{3}$.

The expected value of the number of successes $= np = 9000 \times \dfrac{1}{3} = 3000$

Also $\sigma = \sqrt{npq} = \sqrt{9000 \times \dfrac{1}{3} \times \dfrac{2}{3}} = \sqrt{2000} = 44.72$, so that $3\sigma = 134.16$

Actual number of successes = 3120

Difference between the actual number of successes and the expected number of successes = 3120 - 3000 = 120, which is $< 3\sigma$.

Hence the hypothesis is correct and the deviation is due to fluctuations of sampling arising from random causes.

Example: A coin was tossed 400 times and the head turned up 216 times. Test the hypothesis that the coin is unbiased at 5% level of significance.

Solution: Suppose the coin is unbiased.

Then the probability of getting the head in a toss $= \dfrac{1}{2}$

Expected number of successes $= \dfrac{1}{2} \times 400 = 200$

Thus the excess of the observed value over the expected value = 216 - 200 = 16

Also S.D. of simple sampling $= \sqrt{npq} = \sqrt{400 \times \dfrac{1}{2} \times \dfrac{1}{2}} = 10$

Hence $z = \dfrac{x - np}{\sqrt{npq}} = \dfrac{16}{10} = 1.6$

As $z < 1.96$, the hypothesis is accepted at 5% level of significance, i.e., we conclude that the coin is unbiased at 5% level of significance.

Example: A die was thrown 9000 times and a throw of 5 or 6 was obtained 3240 times. On the assumption of random throwing, do the data indicate an unbiased die?

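Solution: Suppose the die is unbiased.

Then the probability of throwing 5 or 6 with one die $= \dfrac{2}{6} = \dfrac{1}{3}$

The expected number of successes $= \dfrac{1}{3} \times 9000 = 3000$

And the observed value of successes = 3240

Thus the excess of the observed value over the expected value = 3240 - 3000 = 240

Also S.D. of simple sampling $= \sqrt{npq} = \sqrt{9000 \times \dfrac{1}{3} \times \dfrac{2}{3}} = 44.72$

Hence $z = \dfrac{x - np}{\sqrt{npq}} = \dfrac{240}{44.72} = 5.4$ nearly.

As $z > 2.58$, the hypothesis has to be rejected at 1% level of significance and we conclude that the die is biased.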

5.4 Difference of proportions

Comparison of Large Samples

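Two large samples of sizes $n_1, n_2$ are taken from two populations, giving proportions of attribute A's as $p_1, p_2$ respectively.

(a) On the hypothesis that the populations are similar as regards the attribute A, the two samples are combined to estimate the common proportion $p = \dfrac{n_1p_1 + n_2p_2}{n_1 + n_2}$, with $q = 1 - p$.

If $e_1, e_2$ be the standard errors in the two samples, then

$e_1^2 = \dfrac{pq}{n_1}$ and $e_2^2 = \dfrac{pq}{n_2}$

If e be the standard error of the difference between $p_1$ and $p_2$, then

$e^2 = e_1^2 + e_2^2 = pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)$ and $z = \dfrac{p_1 - p_2}{e}$

If $|z| > 3$, the difference between $p_1$ and $p_2$ is a real one.

If $|z| < 2$, the difference may be due to fluctuations of simple sampling.

But if $|z|$ lies between 2 and 3, then the difference is significant at 5% level of significance.

(b) If the proportions of A's are not the same in the two populations from which the samples are drawn, but $p_1$ and $p_2$ are the true values of the proportions, then the S.E. e of the difference $p_1 - p_2$ is given by

$e^2 = \dfrac{p_1q_1}{n_1} + \dfrac{p_2q_2}{n_2}$

If $|z| = \dfrac{|p_1 - p_2|}{e} < 3$, the difference could have arisen due to fluctuations of simple sampling.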
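The two standard errors used for a difference of proportions, the pooled one $e^2 = pq(1/n_1 + 1/n_2)$ with $p = (n_1p_1 + n_2p_2)/(n_1 + n_2)$ and the unpooled one $e^2 = p_1q_1/n_1 + p_2q_2/n_2$, can be compared side by side. This is a minimal sketch in plain Python; the function names `z_pooled` and `z_unpooled` are assumptions made for illustration.

```python
import math


def z_pooled(p1, n1, p2, n2):
    """z when both samples are assumed to come from similar populations:
    e^2 = p*q*(1/n1 + 1/n2) with p the combined proportion."""
    p = (n1 * p1 + n2 * p2) / (n1 + n2)
    e = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / e


def z_unpooled(p1, n1, p2, n2):
    """z when p1 and p2 are treated as the true (different) proportions:
    e^2 = p1*q1/n1 + p2*q2/n2."""
    e = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / e


# 20% of 900 versus 18.5% of 1600 (populations assumed similar)
z_a = z_pooled(0.20, 900, 0.185, 1600)
# 30% and 25% known population proportions, samples of 1200 and 900
z_b = z_unpooled(0.30, 1200, 0.25, 900)
print(round(z_a, 2), round(z_b, 2))
```

The resulting z is then compared with 2 and 3 as described above.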

Example: In a city A 20% of a random sample of 900 school boys had a certain slight physical defect. In another city B, 18.5% of random sample of 1600 school boys had the same defect. Is this difference between the proportions significant?

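Solution: We have $n_1 = 900$, $n_2 = 1600$, $p_1 = 0.20$ and $p_2 = 0.185$

And $p = \dfrac{n_1p_1 + n_2p_2}{n_1 + n_2} = \dfrac{180 + 296}{2500} = 0.1904$, $q = 1 - p = 0.8096$

Thus $e^2 = pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right) = 0.1904 \times 0.8096 \times \left(\dfrac{1}{900} + \dfrac{1}{1600}\right) = 0.0002676$

Giving $e = 0.0164$ nearly

Also $p_1 - p_2 = 0.20 - 0.185 = 0.015$, so that $z = \dfrac{p_1 - p_2}{e} = \dfrac{0.015}{0.0164} = 0.92$ nearly

As $z < 2$, the difference between the proportions is not significant.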

Example: In two large populations there are 30% and 25% respectively of fair haired people. Is this difference likely to be hidden in samples of 1200 and 900 respectively from the two populations?

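Solution:

Here $p_1 = 0.30$, $p_2 = 0.25$ so that $p_1 - p_2 = 0.05$.

$e^2 = \dfrac{p_1q_1}{n_1} + \dfrac{p_2q_2}{n_2} = \dfrac{0.30 \times 0.70}{1200} + \dfrac{0.25 \times 0.75}{900} = 0.000383$

So that $e = 0.0196$ and $z = \dfrac{p_1 - p_2}{e} = \dfrac{0.05}{0.0196} = 2.55$ nearly, which is greater than 2.

Hence it is unlikely that the real difference will be hidden.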

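Example: One type of aircraft is found to develop engine trouble in 5 flights out of a total of 100, and another type in 7 flights out of a total of 200 flights. Is there a significant difference between the two types of aircraft so far as engine defects are concerned?

Solution: $n_1 = 100$ flights, number of troubled flights = 5, so $p_1 = \dfrac{5}{100} = 0.05$

$n_2 = 200$ flights, number of troubled flights = 7, so $p_2 = \dfrac{7}{200} = 0.035$

$p = \dfrac{n_1p_1 + n_2p_2}{n_1 + n_2} = \dfrac{12}{300} = 0.04$, $q = 0.96$, $e^2 = pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right) = 0.04 \times 0.96 \times \left(\dfrac{1}{100} + \dfrac{1}{200}\right) = 0.000576$, so that $e = 0.024$

$z = \dfrac{0.05 - 0.035}{0.024} = 0.625 < 2$. The difference is not significant.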

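Example: In a sample of 600 men from a certain city, 450 are found to be smokers. In another sample of 900 men from another city, 450 are smokers. Do the data indicate that the cities are significantly different with respect to the habit of smoking among men?

Solution: $n_1 = 600$ men, number of smokers = 450, so $p_1 = \dfrac{450}{600} = 0.75$

$n_2 = 900$ men, number of smokers = 450, so $p_2 = \dfrac{450}{900} = 0.5$

$p = \dfrac{n_1p_1 + n_2p_2}{n_1 + n_2} = \dfrac{900}{1500} = 0.6$, $q = 0.4$, $e^2 = pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right) = 0.6 \times 0.4 \times \left(\dfrac{1}{600} + \dfrac{1}{900}\right) = 0.000667$, so that $e = 0.0258$

$z = \dfrac{0.75 - 0.5}{0.0258} = 9.68 > 3$, so that the difference is significant.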

5.5 Test for sample mean

Significance Test of a sample mean

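Given a random small sample $x_1, x_2, \dots, x_n$ from a normal population, we have to test the hypothesis that the mean of the population is $\mu$. For this, we first calculate $t = \dfrac{(\bar{x} - \mu)\sqrt{n}}{\sigma_s}$, where $\bar{x} = \dfrac{1}{n}\sum x_i$ and $\sigma_s^2 = \dfrac{1}{n-1}\sum (x_i - \bar{x})^2$. If the sample standard deviation $s$ is computed with divisor $n$, i.e. $s^2 = \dfrac{1}{n}\sum (x_i - \bar{x})^2$, the same statistic is $t = \dfrac{(\bar{x} - \mu)\sqrt{n-1}}{s}$. The number of degrees of freedom is $n - 1$.

Then find the value of P for the given d.f. from the table.

If the calculated value of $|t| > t_{0.05}$, the difference between $\bar{x}$ and $\mu$ is said to be significant at 5% level of significance.

If $|t| > t_{0.01}$, the difference is said to be significant at 1% level of significance.

If $|t| < t_{0.05}$, the data is said to be consistent with the hypothesis that $\mu$ is the mean of the population.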
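A short sketch of the one-sample statistic may help when checking hand calculations. It computes $t = (\bar{x} - \mu)\sqrt{n-1}/s$ with $s^2 = \frac{1}{n}\sum (x_i - \bar{x})^2$, which is algebraically the same as dividing the sum of squares by $n - 1$ and multiplying by $\sqrt{n}$. Plain Python is assumed, the function name `one_sample_t` is hypothetical, and the critical value still has to be read from a t-table.

```python
import math


def one_sample_t(sample, mu):
    """Return (t, degrees of freedom) for testing that the population mean is mu."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)     # sum of squared deviations
    if ss == 0:
        raise ValueError("sample has zero variance; t is undefined")
    s = math.sqrt(ss / n)                         # S.D. with divisor n
    t = (mean - mu) * math.sqrt(n - 1) / s
    return t, n - 1


# Nine sample values tested against an assumed mean of 47.5
t, df = one_sample_t([45, 47, 50, 52, 48, 47, 49, 53, 51], 47.5)
print(round(t, 2), df)
```

The returned t is compared with the tabulated $t_{0.05}$ for the returned degrees of freedom.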

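Example: A certain stimulus administered to each of 12 patients resulted in the following increases of blood pressure: 5, 2, 8, -1, 3, 0, -2, 1, 5, 0, 4, 6. Can it be concluded that the stimulus will, in general, be accompanied by an increase in blood pressure?

Solution:

Let us assume that the stimulus is not accompanied by any increase in B.P., i.e. taking the population to be normal with mean $\mu = 0$ and S.D. $\sigma$,

$\bar{x} = \dfrac{31}{12} = 2.583$ and $s^2 = \dfrac{\sum x^2}{n} - \bar{x}^2 = \dfrac{185}{12} - (2.583)^2 = 8.743$, so that $s = 2.957$

Now $t = \dfrac{(\bar{x} - \mu)\sqrt{n-1}}{s} = \dfrac{(2.583 - 0)\sqrt{11}}{2.957} = 2.9$ nearly

Here d.f. $\nu = 12 - 1 = 11$

For $\nu = 11$, $t_{0.05} = 2.2$ from table IV.

Since $|t| > t_{0.05}$, our assumption is rejected, i.e., the stimulus is, in general, accompanied by an increase in B.P.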

Example: The nine items of a sample have the following values: 45, 47, 50, 52, 48, 47, 49, 53, 51. Does the mean of these differ significantly from the assumed mean of 47.5?

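Solution: We find the mean and standard deviation of the sample as follows:

$x$	$d = x - 48$	$d^2$
45	-3	9
47	-1	1
50	2	4
52	4	16
48	0	0
47	-1	1
49	1	1
53	5	25
51	3	9
Total	10	66

mean $\bar{x} = 48 + \dfrac{\sum d}{n} = 48 + \dfrac{10}{9} = 49.11$

$s^2 = \dfrac{\sum d^2}{n} - \left(\dfrac{\sum d}{n}\right)^2 = \dfrac{66}{9} - \dfrac{100}{81} = 6.1$, so that $s = 2.47$

Hence $t = \dfrac{(\bar{x} - \mu)\sqrt{n-1}}{s} = \dfrac{(49.11 - 47.5)\sqrt{8}}{2.47} = 1.85$ nearly

Here d.f. $\nu = 9 - 1 = 8$

For $\nu = 8$, we get from table IV, $t_{0.05} = 2.31$

As the calculated value of $t < t_{0.05}$, the value of t is not significant at 5% level of significance, which implies that there is no significant difference between $\bar{x}$ and $\mu$. Thus the test provides no evidence against the population mean being 47.5.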

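Example: A machinist is making engine parts with axle diameter of 0.7 inch. A random sample of 10 parts shows mean diameter 0.742 inch with a standard deviation of 0.04 inch. On the basis of this sample, would you say that the work is inferior?

Solution: Here $\mu = 0.700$, $\bar{x} = 0.742$, $s = 0.040$ and $n = 10$. Taking the hypothesis that the product is not inferior, i.e. that there is no significant difference between $\bar{x}$ and $\mu$,

$t = \dfrac{(\bar{x} - \mu)\sqrt{n-1}}{s} = \dfrac{(0.742 - 0.700)\sqrt{9}}{0.040} = 3.15$

Degrees of freedom $\nu = 10 - 1 = 9$

For $\nu = 9$ we get from table IV, $t_{0.05} = 2.262$.

As the calculated value of $t > t_{0.05}$, the value of t is significant at 5% level of significance. This implies that $\bar{x}$ differs significantly from $\mu$ and the hypothesis is rejected. Hence the work is inferior. In fact, the work is inferior even at 2% level of significance.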

5.6 Difference of means and difference of standard deviations

Significance Test of difference between sample means:

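Given two independent samples $x_1, x_2, \dots, x_{n_1}$ and $y_1, y_2, \dots, y_{n_2}$ with means $\bar{x}$ and $\bar{y}$ and standard deviations $\sigma_x$ and $\sigma_y$ from a normal population with the same variance, we have to test the hypothesis that the population means $\mu_1$ and $\mu_2$ are the same.

For this, we calculate $t = \dfrac{\bar{x} - \bar{y}}{\sigma_s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$ …. (1)

Where $\bar{x} = \dfrac{1}{n_1}\sum x_i$, $\bar{y} = \dfrac{1}{n_2}\sum y_i$

And $\sigma_s^2 = \dfrac{1}{n_1 + n_2 - 2}\left[\sum (x_i - \bar{x})^2 + \sum (y_i - \bar{y})^2\right]$

It can be shown that the variate t defined by (1) follows the t-distribution with $n_1 + n_2 - 2$ degrees of freedom.

If the calculated value of $|t| > t_{0.05}$, the difference between the sample means is said to be significant at 5% level of significance.

If $|t| > t_{0.01}$, the difference is said to be significant at 1% level of significance.

If $|t| < t_{0.05}$, the data is said to be consistent with the hypothesis that $\mu_1 = \mu_2$.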
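The two-sample statistic with a pooled variance, $t = (\bar{x} - \bar{y}) / \big(\sigma_s\sqrt{1/n_1 + 1/n_2}\big)$ with $\sigma_s^2$ equal to the combined sum of squared deviations divided by $n_1 + n_2 - 2$, can be sketched as follows. Plain Python is assumed and the function name `two_sample_t` is illustrative; the data are the weight increases of two groups of pigs on diets A and B.

```python
import math


def two_sample_t(xs, ys):
    """Return (t, degrees of freedom) for two independent samples
    assumed to come from normal populations with equal variance."""
    n1, n2 = len(xs), len(ys)
    mean_x = sum(xs) / n1
    mean_y = sum(ys) / n2
    ss = (sum((x - mean_x) ** 2 for x in xs)
          + sum((y - mean_y) ** 2 for y in ys))   # combined sum of squares
    df = n1 + n2 - 2
    sigma_s = math.sqrt(ss / df)                  # pooled standard deviation
    t = (mean_x - mean_y) / (sigma_s * math.sqrt(1 / n1 + 1 / n2))
    return t, df


diet_a = [10, 6, 16, 17, 13, 12, 8, 14, 15, 9]
diet_b = [7, 13, 22, 15, 12, 14, 18, 8, 21, 23, 10, 17]
t, df = two_sample_t(diet_a, diet_b)
print(round(t, 2), df)
```

The sign of t only reflects which sample has the larger mean; $|t|$ is what is compared with the tabulated value.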

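Cor: If the two samples are of the same size and the data are paired, then t is defined by

$t = \dfrac{\bar{d}\sqrt{n}}{\sigma_s}$, where $\sigma_s^2 = \dfrac{1}{n-1}\sum (d_i - \bar{d})^2$,

$d_i$ = difference of the ith members of the samples;

$\bar{d}$ = mean of the differences $= \dfrac{1}{n}\sum d_i$; and the number of d.f. $= n - 1$.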
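For paired data the test reduces to a one-sample test on the differences, $t = \bar{d}\sqrt{n}/\sigma_s$ with $\sigma_s^2 = \frac{1}{n-1}\sum (d_i - \bar{d})^2$ and $n - 1$ degrees of freedom. A minimal sketch, assuming plain Python and a hypothetical function name `paired_t`, using the marks of eleven students in two tests:

```python
import math


def paired_t(first, second):
    """Return (t, degrees of freedom) for paired samples,
    testing that the mean difference (second - first) is zero."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same size")
    d = [b - a for a, b in zip(first, second)]
    n = len(d)
    d_bar = sum(d) / n
    sigma_s = math.sqrt(sum((di - d_bar) ** 2 for di in d) / (n - 1))
    return d_bar * math.sqrt(n) / sigma_s, n - 1


test_1 = [23, 20, 19, 21, 18, 20, 18, 17, 23, 16, 19]
test_2 = [24, 19, 22, 18, 20, 22, 20, 20, 23, 20, 17]
t, df = paired_t(test_1, test_2)
print(round(t, 2), df)
```

Pairing removes the student-to-student variation, which is why the differences, not the two sets of marks, are analysed.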

Example: Eleven students were given a test in statistics. They were given a month's further tuition and a second test of equal difficulty was held at the end of it. Do the marks give evidence that the students have benefitted by extra coaching?

Boys	1	2	3	4	5	6	7	8	9	10	11
Marks I test	23	20	19	21	18	20	18	17	23	16	19
Marks II test	24	19	22	18	20	22	20	20	23	20	17

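Solution: We compute the mean and the S.D. of the differences between the marks of the two tests as under:

Student	Marks in I test ($x_1$)	Marks in II test ($x_2$)	$d = x_2 - x_1$	$d - \bar{d}$	$(d - \bar{d})^2$
1	23	24	1	0	0
2	20	19	-1	-2	4
3	19	22	3	2	4
4	21	18	-3	-4	16
5	18	20	2	1	1
6	20	22	2	1	1
7	18	20	2	1	1
8	17	20	3	2	4
9	23	23	0	-1	1
10	16	20	4	3	9
11	19	17	-2	-3	9
Total			11		50

$\bar{d}$ = mean of d's $= \dfrac{11}{11} = 1$; $\sigma_s^2 = \dfrac{\sum (d - \bar{d})^2}{n - 1} = \dfrac{50}{10} = 5$, i.e. $\sigma_s = 2.24$

Assuming that the students have not benefited by extra coaching, it implies that the mean of the differences between the marks of the two tests is zero, i.e., $\mu = 0$.

Then $t = \dfrac{(\bar{d} - \mu)\sqrt{n}}{\sigma_s} = \dfrac{(1 - 0)\sqrt{11}}{2.24} = 1.48$ nearly and d.f. $\nu = 11 - 1 = 10$

From table IV, we find that $t_{0.05} = 2.228$ (for $\nu = 10$). As the calculated value of $t < t_{0.05}$, the value of t is not significant at 5% level of significance, i.e., the test provides no evidence that the students have benefited by extra coaching.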

Example: From a random sample of 10 pigs fed on diet A, the increases in weight in a certain period were 10, 6, 16, 17, 13, 12, 8, 14, 15, 9 lbs. For another random sample of 12 pigs fed on diet B, the increases in the same period were 7, 13, 22, 15, 12, 14, 18, 8, 21, 23, 10, 17 lbs. Test whether diets A and B differ significantly as regards their effect on increases in weight.

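Solution: We calculate the means and standard deviations of the samples as follows:

	Diet A			Diet B
$x$	$x - \bar{x}$	$(x - \bar{x})^2$	$y$	$y - \bar{y}$	$(y - \bar{y})^2$
10	-2	4	7	-8	64
6	-6	36	13	-2	4
16	4	16	22	7	49
17	5	25	15	0	0
13	1	1	12	-3	9
12	0	0	14	-1	1
8	-4	16	18	3	9
14	2	4	8	-7	49
15	3	9	21	6	36
9	-3	9	23	8	64
			10	-5	25
			17	2	4
120	0	120	180	0	314

$\bar{x} = \dfrac{120}{10} = 12$ lbs, $\bar{y} = \dfrac{180}{12} = 15$ lbs and

$\sigma_s^2 = \dfrac{\sum (x - \bar{x})^2 + \sum (y - \bar{y})^2}{n_1 + n_2 - 2} = \dfrac{120 + 314}{20} = 21.7$, so that $\sigma_s = 4.66$ nearly.

Assuming that the samples do not differ in weight so far as the two diets are concerned, i.e., $\mu_1 = \mu_2$,

$t = \dfrac{\bar{x} - \bar{y}}{\sigma_s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{12 - 15}{4.66\sqrt{\dfrac{1}{10} + \dfrac{1}{12}}}$

Hence $t = -1.5$ nearly

Here d.f. $\nu = 10 + 12 - 2 = 20$

For $\nu = 20$ we find $t_{0.05} = 2.09$ [From table IV]

$\therefore$ the calculated value of $|t| < t_{0.05}$.

Hence the difference between the sample means is not significant, i.e., the two diets do not differ significantly as regards their effect on increase in weight.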

