

UNIT 4

Interpolation and Extrapolation

4.1 Interpolation and extrapolation: meaning, methods – binomial expansion, Newton’s method

Meaning

Interpolation is the process of finding a value between two points on a line or curve. To help remember what it means, think of the prefix ‘inter-’, which means ‘between’: it reminds us to look ‘inside’ the range of the data we originally had. Interpolation is useful not only in statistics but also in science, business, or any situation where we need to estimate values that fall between two existing data points.

Extrapolation is the estimation of a value by extending a known series or set of factors beyond the range over which they are known. In other words, the known data values are treated as points x1, x2, ..., xn, and a value outside this range is estimated from them. It is common with statistical data that are sampled periodically, where it is used to approximate the next data point. An everyday example is driving: you usually extrapolate the road conditions beyond your sight.

Extrapolation is a statistical method aimed at understanding unknown data from known data. It tries to predict future data based on historical data. For example, the size of a population a few years in the future can be estimated from the current population size and its rate of growth.

Methods

Binomial expansion method –

This is the simplest of the algebraic methods and can be used for both interpolation and extrapolation. If n values of Y are known, the method assumes that the nth leading differences are zero, i.e. $(y-1)^n = 0$, where after expansion each power $y^k$ is read as the value $y_k$. It is applicable under the following conditions:

The values of the independent variable X should be in arithmetic progression, i.e., there should be a common difference between successive values of X.

The value of X for which the value of Y is to be estimated must be one of the listed values of X.

Example 1

Write down the formula of Binomial expansion method for 4 values of y
Answer:
If the number of known values is 4, the fourth leading difference is zero:
$$\Delta^4 y_0 = (y-1)^4 = y_4 - 4y_3 + 6y_2 - 4y_1 + y_0 = 0$$

Example 2

Expand (y – 1)5
Answer:
$$(y-1)^5 = y_5 - 5y_4 + 10y_3 - 10y_2 + 5y_1 - y_0 = 0$$
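The coefficients in Examples 1 and 2 are the binomial coefficients of $(y-1)^n$ with alternating signs. A minimal Python sketch that writes out the expansion for any n, reading each power $y^k$ as the subscript $y_k$; the function names `binomial_expansion` and `format_expansion` are illustrative, not from any library.

```python
from math import comb


def binomial_expansion(n):
    """Terms of (y - 1)^n = 0 as (coefficient, subscript) pairs,
    highest subscript first; each power y^k is read as the value y_k."""
    return [((-1) ** j * comb(n, j), n - j) for j in range(n + 1)]


def format_expansion(n):
    """Render the expansion as text, e.g. 'y4 - 4y3 + 6y2 - 4y1 + y0 = 0'."""
    parts = []
    for i, (coeff, sub) in enumerate(binomial_expansion(n)):
        magnitude = abs(coeff)
        term = f"y{sub}" if magnitude == 1 else f"{magnitude}y{sub}"
        if i == 0:
            parts.append(term if coeff > 0 else f"-{term}")
        else:
            parts.append(("- " if coeff < 0 else "+ ") + term)
    return " ".join(parts) + " = 0"


print(format_expansion(4))  # y4 - 4y3 + 6y2 - 4y1 + y0 = 0
print(format_expansion(5))  # y5 - 5y4 + 10y3 - 10y2 + 5y1 - y0 = 0
```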

Example 3

From the following data, estimate the production for the years 2001 and 2005 by the binomial expansion method.

Year	1995	1997	1999	2001	2003	2005
Production(000’s tons)	20	40	70	?	130	?

Solution

Let X and Y be year and production.

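Year (X)	1995	1997	1999	2001	2003	2005
Production (Y)	y0 = 20	y1 = 40	y2 = 70	y3 = ?	y4 = 130	y5 = ?

Since the known values of Y are four, the fourth-order leading differences will be zero, i.e. $\Delta^4 y_0 = 0$ or $(y-1)^4 = 0$:
$$y_4 - 4y_3 + 6y_2 - 4y_1 + y_0 = 0 \qquad \text{(i)}$$

The second equation is obtained by increasing the suffix of each term of y by one and keeping the coefficients the same. We get
$$y_5 - 4y_4 + 6y_3 - 4y_2 + y_1 = 0 \qquad \text{(ii)}$$

From equation (i),
$$130 - 4y_3 + 6(70) - 4(40) + 20 = 0$$

On simplification, $410 - 4y_3 = 0$, so $y_3 = 102.5$.

i.e., production for the year 2001 is 102.5 (000’s tons).

From (ii), with $y_3 = 102.5$,
$$y_5 - 4(130) + 6(102.5) - 4(70) + 40 = 0 \;\Rightarrow\; y_5 - 145 = 0 \;\Rightarrow\; y_5 = 145$$

The production for the year 2005 is 145 (000’s tons).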

Example 4

From the following data, estimate the probable life expectancy of an average Indian for the years 1980 and 2010.

Birth year	1950	1960	1970	1980	1990	2000	2010
Life expectancy (in years)	68.2	69.7	70.8	?	75.4	77	?

Solution

Let X and Y be birth year and life expectancy.

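Birth year (X)	1950	1960	1970	1980	1990	2000	2010
Life expectancy (Y)	y0 = 68.2	y1 = 69.7	y2 = 70.8	y3 = ?	y4 = 75.4	y5 = 77	y6 = ?

Since the known values are 5, the estimation is based on the expansion of $(y-1)^5 = 0$:
$$y_5 - 5y_4 + 10y_3 - 10y_2 + 5y_1 - y_0 = 0 \qquad \text{(i)}$$

We have to determine the value of $y_3$. From (i),
$$77 - 5(75.4) + 10y_3 - 10(70.8) + 5(69.7) - 68.2 = 0$$

On simplification, $10y_3 - 727.7 = 0$, so $y_3 = 72.77$.

Hence the probable life expectancy for the year 1980 is 72.77 years.

Now expand with the suffixes increased by one, keeping the coefficients as they are:
$$y_6 - 5y_5 + 10y_4 - 10y_3 + 5y_2 - y_1 = 0 \qquad \text{(ii)}$$
$$y_6 - 5(77) + 10(75.4) - 10(72.77) + 5(70.8) - 69.7 = 0 \;\Rightarrow\; y_6 - 74.4 = 0 \;\Rightarrow\; y_6 = 74.4$$

Hence the probable life expectancy for the year 2010 is 74.4 years.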

Example 5

Working-class cost of living indices of a certain place for some years are given below. Interpolate the missing index numbers for 1995 and 1999.

Year	1993	1994	1995	1996	1997	1998	1999
Index No.	320	300	?	280	278	250	?

Solution

Let X and Y be year and index number.

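Year (X)	1993	1994	1995	1996	1997	1998	1999
Index No. (Y)	y0 = 320	y1 = 300	y2 = ?	y3 = 280	y4 = 278	y5 = 250	y6 = ?

Since the known values are 5, the fifth leading differences will be zero, i.e., $(y-1)^5 = 0$:
$$y_5 - 5y_4 + 10y_3 - 10y_2 + 5y_1 - y_0 = 0 \qquad \text{(i)}$$

The second equation is obtained by increasing the suffix of each term of y by one, keeping the coefficients the same:
$$y_6 - 5y_5 + 10y_4 - 10y_3 + 5y_2 - y_1 = 0 \qquad \text{(ii)}$$

We have to determine the value of $y_2$ from equation (i):
$$250 - 5(278) + 10(280) - 10y_2 + 5(300) - 320 = 0$$

On simplification, $2840 - 10y_2 = 0$, so $y_2 = 284$. Hence the missing index number for 1995 is 284.

From (ii), with $y_2 = 284$:
$$y_6 - 5(250) + 10(278) - 10(280) + 5(284) - 300 = 0 \;\Rightarrow\; y_6 - 150 = 0 \;\Rightarrow\; y_6 = 150$$

Hence the index number for 1999 is 150.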
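Examples 3 to 5 all follow the same recipe: count the n known values, write $(y-1)^n = 0$ for each window of n + 1 consecutive terms, and solve each equation for its single unknown. The sketch below automates that recipe under the method's own assumptions (equally spaced X, missing values listed in the series). The name `binomial_fill` and the use of `None` for a missing value are illustrative choices, and the sketch only handles cases where the equations can be solved one unknown at a time, as in these examples.

```python
from math import comb


def binomial_fill(values):
    """Fill missing (None) entries of an equally spaced series by the
    binomial expansion method (hypothetical helper).

    With n known values, the n-th leading differences are taken as zero,
    i.e. (y - 1)^n = 0 for every window y_k, ..., y_{k+n}. Each equation is
    solved once it contains exactly one unknown."""
    y = list(values)
    n = sum(v is not None for v in y)            # number of known values
    # Coefficient of y_{k+n-j} in (y - 1)^n is (-1)^j * C(n, j)
    coeffs = [(-1) ** j * comb(n, j) for j in range(n + 1)]
    windows = len(y) - n                          # one equation per window
    progress = True
    while progress and None in y:
        progress = False
        for k in range(windows):
            terms = [(coeffs[j], k + n - j) for j in range(n + 1)]
            unknown = [(c, i) for c, i in terms if y[i] is None]
            if len(unknown) == 1:
                c_u, i_u = unknown[0]
                known_sum = sum(c * y[i] for c, i in terms if y[i] is not None)
                y[i_u] = -known_sum / c_u         # solve c_u * y_u + known_sum = 0
                progress = True
    if None in y:
        raise ValueError("equations could not be solved one unknown at a time")
    return y


print(binomial_fill([20, 40, 70, None, 130, None]))             # Example 3
print(binomial_fill([68.2, 69.7, 70.8, None, 75.4, 77, None]))  # Example 4
print(binomial_fill([320, 300, None, 280, 278, 250, None]))     # Example 5
```

For Example 3 this gives 102.5 for 2001 and 145.0 for 2005; for Example 4 it gives approximately 72.77 and 74.4 (small floating-point noise is possible), and for Example 5 it gives 284.0 and 150.0, matching the hand calculations.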

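Newton’s method

This method is applicable in those cases where the independent variable X increases by equal intervals. But, unlike the binomial expansion method, it is not necessary that the value of X for which Y is to be interpolated be one of the given values of X.

Formula

If the X series is in descending order, convert it to ascending order and then apply Newton’s formula.

The formula for finding the missing value of Y using Newton’s method of interpolation is
$$y_x = y_0 + x\,\Delta y_0 + \frac{x(x-1)}{2!}\Delta^2 y_0 + \frac{x(x-1)(x-2)}{3!}\Delta^3 y_0 + \frac{x(x-1)(x-2)(x-3)}{4!}\Delta^4 y_0 + \cdots$$
where $x = \dfrac{X - X_0}{h}$, $X_0$ is the first (origin) value of X, $h$ is the common interval between the values of X, and $y_0, \Delta y_0, \Delta^2 y_0, \ldots$ are the leading differences.

For time-series data,
$$x = \frac{\text{year of interpolation} - \text{year of origin}}{\text{interval between the years}}$$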
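Newton’s formula translates directly into code: build the leading differences, compute $x = (X - X_0)/h$, and add the terms one at a time. This is a sketch assuming an ascending X series with a common interval h, as the text requires; `forward_differences` and `newton_forward` are illustrative names.

```python
def forward_differences(ys):
    """Return the leading differences [y0, Δy0, Δ²y0, ...] of a series."""
    leading = [ys[0]]
    row = list(ys)
    while len(row) > 1:
        row = [b - a for a, b in zip(row, row[1:])]   # next order of differences
        leading.append(row[0])
    return leading


def newton_forward(xs, ys, target):
    """Estimate y at `target` by Newton's forward (advancing) difference formula.
    Assumes xs are in ascending order with a common interval h."""
    h = xs[1] - xs[0]
    x = (target - xs[0]) / h          # x = (X - X0) / h, as in the formula
    result = 0.0
    coeff = 1.0                       # running value of x(x-1)...(x-k+1) / k!
    for k, delta in enumerate(forward_differences(ys)):
        if k > 0:
            coeff = coeff * (x - (k - 1)) / k
        result += coeff * delta
    return result


# Example 1 below: employees at a wage of 600 per day
print(newton_forward([300, 500, 700, 900, 1100], [36, 31, 24, 22, 18], 600))  # 26.984375
```

The same function reproduces Examples 2 and 4 below (about 26.16 years and 1616.2).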

Example 1

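Use Newton’s method to find the number of employees whose wage is ₹600 per day.

Wages (₹ per day)	300	500	700	900	1100
No. of employees	36	31	24	22	18

Solution

Here $X_0 = 300$, $h = 200$ and $x = (600 - 300)/200 = 1.5$. The leading differences are $y_0 = 36$, $\Delta y_0 = -5$, $\Delta^2 y_0 = -2$, $\Delta^3 y_0 = 7$ and $\Delta^4 y_0 = -14$, so
$$y_{1.5} = 36 + 1.5(-5) + \frac{1.5(0.5)}{2}(-2) + \frac{1.5(0.5)(-0.5)}{6}(7) + \frac{1.5(0.5)(-0.5)(-1.5)}{24}(-14) \approx 26.98$$

Hence about 27 employees earn ₹600 per day.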

Example 2

The following table shows the expectation of life at different ages. Find the expectation of life at age 26.

Age	15	20	25	30	35
Expectation of life	30	29	27	22	20

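Solution

The first row of the difference table gives the leading differences.

Age (X)	Expectation of life (y)	Δy	Δ²y	Δ³y	Δ⁴y
15	30	-1	-1	-2	8
20	29	-2	-3	6
25	27	-5	3
30	22	-2
35	20

Here $X_0 = 15$, $h = 5$ and $x = (26 - 15)/5 = 2.2$, so
$$y_{2.2} = 30 + 2.2(-1) + \frac{2.2(1.2)}{2}(-1) + \frac{2.2(1.2)(0.2)}{6}(-2) + \frac{2.2(1.2)(0.2)(-0.8)}{24}(8) \approx 26.16$$

The expectation of life at age 26 is about 26 years.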

Example 3

Find the number of persons below the age of 70 years from the following data

Age in years	0-20	20-40	40-60	60-80	80-100
No of persons	333	160	135	67	65

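Solution

Converting the class frequencies into ‘below’ (cumulative) frequencies:

Age below (X)	20	40	60	80	100
No. of persons (y)	333	493	628	695	760

Here $X_0 = 20$, $h = 20$, $x = (70 - 20)/20 = 2.5$, and the leading differences are $y_0 = 333$, $\Delta y_0 = 160$, $\Delta^2 y_0 = -25$, $\Delta^3 y_0 = -43$, $\Delta^4 y_0 = 109$. Hence
$$y_{2.5} = 333 + 2.5(160) + \frac{2.5(1.5)}{2}(-25) + \frac{2.5(1.5)(0.5)}{6}(-43) + \frac{2.5(1.5)(0.5)(-0.5)}{24}(109) \approx 668.43$$

There are about 668 people below the age of 70 years.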

Example 4

Following are data on annual net life insurance premiums. Using Newton’s method, estimate the premium at the age of 26 years.

Age(years)	20	25	30	35
Annual net premium (in Rs)	1426	1581	1771	1996

Solution

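Let X and y be the age and annual net premium.
The number of known values of y is n = 4, so prepare leading differences up to Δ³.

Age (X)	Premium (y)	Δy	Δ²y	Δ³y
20	1426	155	35	0
25	1581	190	35
30	1771	225
35	1996

Here $X_0 = 20$, $h = 5$ and $x = (26 - 20)/5 = 1.2$.

The Newton’s equation of interpolation is
$$y_{1.2} = 1426 + 1.2(155) + \frac{1.2(0.2)}{2}(35) + \frac{1.2(0.2)(-0.8)}{6}(0) = 1426 + 186 + 4.2 = 1616.2$$

Hence the premium at the age of 26 years is Rs 1616.2.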

Example 5

From the following data estimate the number of persons earning wages below Rs. 90 per day.

Wages per day	Below 40	40-60	60-80	80-100	100-120
No. of persons	500	240	200	140	100

Solution

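Let X and y be the wages per day and the number of persons. Since the value of y to be interpolated is for wages below Rs. 90, the class intervals are first converted into ‘less than’ (below) type. The leading difference table is then prepared up to Δ⁴, since there are n = 5 known values of y.

Wages below (X)	No. of persons (y)	Δy	Δ²y	Δ³y	Δ⁴y
40	500	240	-40	-20	40
60	740	200	-60	20
80	940	140	-40
100	1080	100
120	1180

Here $X_0 = 40$, $h = 20$ and $x = (90 - 40)/20 = 2.5$, so
$$y_{2.5} = 500 + 2.5(240) + \frac{2.5(1.5)}{2}(-40) + \frac{2.5(1.5)(0.5)}{6}(-20) + \frac{2.5(1.5)(0.5)(-0.5)}{24}(40) \approx 1017.19$$

Hence about 1017 persons earn wages below Rs. 90 per day.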
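For grouped data (Examples 3 and 5), the only extra step is turning class frequencies into ‘below’ totals at each upper class limit, which is a running sum. This sketch repeats the Newton helpers so that it runs on its own; treating the open class ‘Below 40’ as ending at 40 follows the solution above, and `below_counts` is an illustrative name.

```python
from itertools import accumulate


def forward_differences(ys):
    """Return the leading differences [y0, Δy0, Δ²y0, ...] of a series."""
    leading = [ys[0]]
    row = list(ys)
    while len(row) > 1:
        row = [b - a for a, b in zip(row, row[1:])]
        leading.append(row[0])
    return leading


def newton_forward(xs, ys, target):
    """Newton's forward difference formula; xs ascending with a common interval."""
    x = (target - xs[0]) / (xs[1] - xs[0])
    result, coeff = 0.0, 1.0
    for k, delta in enumerate(forward_differences(ys)):
        if k > 0:
            coeff = coeff * (x - (k - 1)) / k
        result += coeff * delta
    return result


def below_counts(frequencies):
    """Convert class frequencies into cumulative 'below the upper limit' counts."""
    return list(accumulate(frequencies))


# Example 5: wages per day (upper class limits) and number of persons
limits = [40, 60, 80, 100, 120]
below = below_counts([500, 240, 200, 140, 100])   # [500, 740, 940, 1080, 1180]
print(newton_forward(limits, below, 90))           # 1017.1875

# Example 3: persons below a given age
print(newton_forward([20, 40, 60, 80, 100], below_counts([333, 160, 135, 67, 65]), 70))  # 668.4296875
```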

Key takeaways –

Interpolation is the technique of estimating the value of the dependent variable (Y) for any intermediate value of the independent variable (X).

Extrapolation is the technique of estimating the value of the dependent variable (Y) for any value of the independent variable (X) that is outside the range of the given series.

4.2 Analysis of time series

Introduction:

A time series is a set of data collected at successive points in time or over successive periods of time. It is a collection of observations made sequentially through time. The interval between observations can be any time interval (hours within days, weeks, months, years, etc.).

Some examples of time series are:

Malaria incidence or deaths over calendar years.

Daily maximum temperatures.

Hourly records of babies born at a maternity hospital.

Monthly unemployment.

Weekly measures of money supply.

Daily closing prices of stock indices, and so on.

a) An analysis of a single sequence of data is called univariate time-series analysis.

b) An analysis of several sets of data for the same sequence of time periods is called multivariate time-series analysis or, more simply, multiple time-series analysis.

Utilities of time series:

1. It helps in understanding past behaviour and is useful for prediction of future.

2. It facilitates comparison.

3. Studying the components of a time series separately helps measure the effect of each component on the changes in the series.

4. The reasons for variation can be studied by comparing actual with expected results.

COMPONENTS OF TIME SERIES:

Fluctuations in a time series are mainly due to four basic components.

1. Secular trend or trend (T).

2. Seasonal variation (S).

3. Cyclical variation or cyclic fluctuation (C).

4. Irregular or random movements (I).

Secular trend or trend (T):

Trend is the long-term change in a recorded data series, generally in the same direction throughout the span of the series.

1. A sequence plot of a time series (the time-series values plotted vertically against time on the horizontal axis) will usually reveal the presence of trend as a gentle upward or downward “drift” of the data path.

2. An upward-sloping trend path in a real-value time series may indicate growth, while a downward-sloping path suggests contraction.

3. In a money-value time series, an upward-sloping path may represent some combination of real growth and inflation; a downward-sloping trend path might indicate contraction with deflation.

4. Trend is usually the result of long-term factors such as changes in population, demographics, technology, or consumer preferences.

Seasonal Variation:

This is the pattern of variation within a time series that repeats itself from year to year.

Seasonality may be associated with agricultural activities, seasonal weather patterns, custom and convention, or religious or secular holidays.

It is important to remember that the seasonal pattern in one time series may or may not resemble that in another time series.

For example, sales of fans and air conditioners are high in the summer months, agricultural sales are high at harvest time, and sales of raincoats and umbrellas are high during the monsoon.

Cyclic Components:

Any regular pattern of sequences of values above and below the trend line lasting more than one year can be attributed to the cyclical component. Usually, this component is due to multiyear cyclical movements in the economy.

Cyclical variations are recurrent upward or downward movements in a time series, but the period of a cycle is greater than a year. Also, these variations are not as regular as seasonal variations.

A business cycle showing these oscillatory movements passes through four phases: prosperity, recession, depression and recovery. In business, the four phases follow one another in this order.

Irregular Variation:

Irregular variations are fluctuations in a time series that are short in duration, erratic in nature and follow no regular pattern of occurrence. They are also referred to as residual variations, since they are what remains after the trend, cyclical and seasonal variations have been accounted for. Irregular fluctuations result from unforeseen events such as floods, earthquakes, wars and famines.

Key takeaways –

A time series is a set of data collected at successive points in time or over successive periods of time.

4.3 Decomposition of time series: moving average method and method of least squares; business forecasting

A moving average is a technique for getting an overall idea of the trend in a data set; it is the average of successive subsets of the numbers. Because it smooths out short-term fluctuations, the moving average is useful for identifying long-term trends and as a basis for forecasting them. It can be calculated for any period. For example, with sales data for a twenty-year period, you can calculate a five-year, four-year or three-year moving average, and so on. Stock market analysts often use a 50-day or 200-day moving average to help them see trends in the stock market and (hopefully) forecast where stocks are headed.

An average represents the “middling” value of a set of numbers. The moving average is the same idea, but the average is calculated several times for several subsets of the data. For example, for a two-year moving average of data for 2000, 2001, 2002 and 2003, you would find the averages for the subsets 2000/2001, 2001/2002 and 2002/2003. Moving averages are usually plotted, as they are easiest to interpret visually.

Calculate three-yearly moving averages of the number of students studying in a higher secondary school in a particular village from the following data.

Year	1995	1996	1997	1998	1999	2000	2001	2002	2003	2004
Number of students	332	317	357	392	402	405	410	427	435	438

Solution:

Computation of three-yearly moving averages.

Year	Number of students	3-yearly moving total	3-yearly moving averages
1995	332	---	---
1996	317	1006	335.33
1997	357	1066	355.33
1998	392	1151	383.67
1999	402	1199	399.67
2000	405	1217	405.67
2001	410	1242	414.00
2002	427	1272	424.00
2003	435	1300	433.33
2004	438	---	---
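The table above can be reproduced with a few lines of Python. This is a minimal sketch for an odd period (here 3), where each moving total and average is placed against the middle year of its window; `moving_average` is an illustrative name.

```python
def moving_average(values, period):
    """Return (moving totals, moving averages) for windows of `period` items."""
    totals = [sum(values[i:i + period]) for i in range(len(values) - period + 1)]
    return totals, [t / period for t in totals]


students = [332, 317, 357, 392, 402, 405, 410, 427, 435, 438]   # 1995-2004
totals, averages = moving_average(students, 3)
# With a 3-year window, the first average belongs to the middle year 1996
for year, total, avg in zip(range(1996, 2004), totals, averages):
    print(year, total, round(avg, 2))
```

Rounded to two decimals, the 1999 average is 1199 / 3 = 399.67.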

Example 9.5

Calculate four-yearly centered moving averages from the following sales data.

Year	2001	2002	2003	2004	2005	2006	2007	2008	2009
Sales	124	120	135	140	145	158	162	170	175

Solution:

Computation of four-yearly moving averages.

Year	Sales	4-yearly moving total	4-yearly moving average	4-yearly centered moving average
2001	124	---	---	---
2002	120	---	---	---
		519	129.75
2003	135			132.38
		540	135.00
2004	140			139.75
		578	144.50
2005	145			147.88
		605	151.25
2006	158			155.00
		635	158.75
2007	162			162.50
		665	166.25
2008	170	---	---	---
2009	175	---	---	---
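For an even period, the plain moving averages fall between two years, so a second two-item average centers them on a year. A minimal sketch using the sales figures from the worked table, including the 2009 value of 175 that appears there; `centered_moving_average` is an illustrative name.

```python
def centered_moving_average(values, period):
    """Centered moving averages for an even period: average each pair of
    adjacent `period`-item moving averages so the result lines up with a year."""
    if period % 2:
        raise ValueError("for an odd period, use a plain moving average")
    simple = [sum(values[i:i + period]) / period
              for i in range(len(values) - period + 1)]
    return [(a + b) / 2 for a, b in zip(simple, simple[1:])]


sales = [124, 120, 135, 140, 145, 158, 162, 170, 175]   # 2001-2009
period = 4
# The first centered value belongs to the year period // 2 positions after 2001
for offset, value in enumerate(centered_moving_average(sales, period)):
    print(2001 + period // 2 + offset, value)
```

This prints 132.375, 139.75, 147.875, 155.0 and 162.5 for 2003 to 2007.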

Method of Least Square

The line of best fit is a line from which the sum of the deviations of the various points is zero. The method of least squares is the most widely used mathematical method for measuring trend and gives a convenient basis for calculating the line of best fit for a time series. Further, the sum of the squares of these deviations is the least when compared with any other fitted line. Hence the method is known as the Method of Least Squares, and it satisfies the following conditions:

(i) The sum of the deviations of the actual values of Y from Ŷ (the estimated value of Y) is zero, that is, Σ(Y − Ŷ) = 0.

(ii) The sum of the squares of the deviations of the actual values of Y from Ŷ is the least, that is, Σ(Y − Ŷ)² is a minimum.

Procedure:

(i) The straight-line trend is represented by the equation Y = a + bX …(1)

where Y is the trend (estimated) value, X is time, and a, b are constants.

(ii) The constants ‘a’ and ‘b’ are estimated by solving the following two normal equations:

ΣY = na + bΣX ...(2)

ΣXY = aΣX + bΣX² ...(3)

where ‘n’ = number of years given in the data.

(iii) By taking the mid-point of the time period as the origin, we get ΣX = 0.

(iv) When ΣX = 0, the two normal equations reduce to
$$\Sigma Y = na \;\Rightarrow\; a = \frac{\Sigma Y}{n}, \qquad \Sigma XY = b\,\Sigma X^2 \;\Rightarrow\; b = \frac{\Sigma XY}{\Sigma X^2}$$

The constant ‘a’ gives the mean of Y and ‘b’ gives the rate of change (slope).

(v) By substituting the values of ‘a’ and ‘b’ in the trend equation (1), we get the Line of Best Fit.
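Steps (i) to (v) can be sketched in Python. This sketch assumes consecutive yearly data; it codes time so that ΣX = 0 (X = Year − middle year for an odd number of years, X = 2(Year − middle) for an even number, giving ..., −3, −1, 1, 3, ...) and then uses a = ΣY/n and b = ΣXY/ΣX². The function name `least_squares_trend` is illustrative.

```python
def least_squares_trend(years, ys):
    """Fit the straight-line trend Y = a + bX by least squares with ΣX = 0.

    Assumes consecutive yearly data. Odd n: X = year - middle year.
    Even n: X = 2 * (year - middle), giving ..., -3, -1, 1, 3, ...
    Returns a, b, the coded X values and the trend values."""
    n = len(years)
    middle = (years[0] + years[-1]) / 2       # origin at the middle of the period
    scale = 1 if n % 2 else 2                 # half-year units for an even n
    xs = [scale * (year - middle) for year in years]
    a = sum(ys) / n                                                   # a = ΣY / n
    b = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)   # b = ΣXY / ΣX²
    trend = [a + b * x for x in xs]
    return a, b, xs, trend


years = list(range(2000, 2007))
a, b, xs, trend = least_squares_trend(years, [40, 45, 46, 42, 47, 50, 46])
print(round(a, 3), round(b, 3))        # 45.143 1.036
print([round(t, 2) for t in trend])
```

With unrounded coefficients the 2005 trend value comes out as 47.21, while a worked table that uses a and b rounded to three decimals may show 47.22. Extending the fitted line beyond the data is extrapolation: with these coefficients, the trend value for 2008 (X = 5) would be about 45.143 + 1.036 × 5 ≈ 50.32.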

1. Given below are the data relating to the production of sugarcane in a district.

Fit a straight-line trend by the method of least squares and tabulate the trend values.

Year	2000	2001	2002	2003	2004	2005	2006
Production of sugarcane	40	45	46	42	47	50	46

Solution:

Computation of trend values by the method of least squares (odd number of years).

Year	Production of sugarcane (Y)	X = Year − 2003	X²	XY	Trend values (Ŷ)
2000	40	-3	9	-120	42.04
2001	45	-2	4	-90	43.07
2002	46	-1	1	-46	44.11
2003	42	0	0	0	45.14
2004	47	1	1	47	46.18
2005	50	2	4	100	47.22
2006	46	3	9	138	48.25
Total	316	0	28	29

Here n = 7, so a = ΣY/n = 316/7 = 45.143 and b = ΣXY/ΣX² = 29/28 = 1.036.

Therefore, the required equation of the straight-line trend is given by

Ŷ = a + bX;

Ŷ = 45.143 + 1.036(Year − 2003)

The trend values can be obtained by

When Year = 2000, Ŷ = 45.143 + 1.036(2000 − 2003) = 42.035

When Year = 2001, Ŷ = 45.143 + 1.036(2001 − 2003) = 43.071

Similarly, the other trend values can be obtained.

2. Given below are the data relating to the sales of a product in a district.

Fit a straight-line trend by the method of least squares and tabulate the trend values.

Year	1995	1996	1997	1998	1999	2000	2001	2002
Sales	6.7	5.3	4.3	6.1	5.6	7.9	5.8	6.1

Solution:

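Computation of trend values by the method of least squares.

In the case of an EVEN number of years, the origin falls between the two middle years (1998 and 1999), i.e., at 1998.5. To avoid fractions, let us take
X = (Year − 1998.5)/0.5 = 2(Year − 1998.5),
so that X takes the values −7, −5, −3, −1, 1, 3, 5, 7 (each unit of X is half a year).

Year	Sales (Y)	X	XY	X²	Trend value (Ŷ)
1995	6.7	-7	-46.9	49	5.6167
1996	5.3	-5	-26.5	25	5.7190
1997	4.3	-3	-12.9	9	5.8214
1998	6.1	-1	-6.1	1	5.9238
1999	5.6	1	5.6	1	6.0262
2000	7.9	3	23.7	9	6.1286
2001	5.8	5	29.0	25	6.2310
2002	6.1	7	42.7	49	6.3333
Total	47.8	0	8.6	168

Here n = 8, so a = ΣY/n = 47.8/8 = 5.975 and b = ΣXY/ΣX² = 8.6/168 = 0.0512.

Therefore the required equation of the straight-line trend is given by
Ŷ = 5.975 + 0.0512X, where X = 2(Year − 1998.5).

When Year = 1995, X = −7 and Ŷ = 5.975 + (8.6/168)(−7) = 5.975 − 0.3583 = 5.6167.

Similarly, the other trend values can be obtained.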

Business forecasting

Forecasting is a method or technique for estimating future aspects of a business or its operations. It translates past data or experience into estimates of the future, and it helps management cope with the uncertainty of the future. Forecasts are important for both short-term and long-term decisions. Businesses may use forecasts in several areas, for example technological forecasts, economic forecasts and demand forecasts. There are two broad categories of forecasting techniques: quantitative methods (objective approach) and qualitative methods (subjective approach). Quantitative forecasting methods are based on analysis of historical data and assume that past patterns in the data can be used to forecast future data points. Qualitative forecasting techniques employ the judgment of experts in a specified field to generate forecasts; they are based on the educated guesses or opinions of experts in that area. There are two types of quantitative methods: time-series methods and explanatory methods.

Time-series methods make forecasts based solely on historical patterns in the data. They use time as the independent variable to forecast demand. In a time series, measurements are taken at successive points or over successive periods: every hour, day, week, month or year, or at any other regular (or irregular) interval. The first step in using the time-series approach is to gather historical data, which are assumed to be representative of the conditions expected in the future. Time-series models are adequate forecasting tools if demand has shown a consistent pattern in the past that is expected to recur in the future. For example, new-home builders in the US may see variation in sales from month to month, but analysis of several past years of data may reveal that sales of new homes have increased gradually over time. In this case, the trend is an increase in new-home sales.

Key Takeaways:

A time series is a collection of observations made sequentially through time.

The interval between observations can be any time interval (hours within days, weeks, months, years, etc).

There are four components of a time series: secular trend, seasonal, cyclical and irregular.

