4.1 Frequency distribution Raw data | unit 4 measures of central tendency and measures of dispersion

BMS 1

Unit 4

Measures of Central Tendency and Measures of Dispersion

After data collection, you may face the problem of arranging them into a format from which you will be able to draw some conclusions. The arrangement of these data in different groups on the basis of some similarities is known as classification. According to Tuttle, “ A classification is a scheme for breaking a category into a set of parts, called classes, according to some precisely defined differing characteristics possessed by all the elements of the category” Thus classification, is the process of grouping data into sequences according to their common characteristics, which separate them into different but related parts.

Such classification facilitates analysis of the data and consequently prepares a foundation for absolute interpretation of the obtained scores. The prime objective of the classification of data is concerned with reducing complexities with raw scores by grouping them into some classes. This will provides a comprehensive insight into the data.

The classification procedure in statistics enables the investigators to manage the raw scores in such a way that they can proceed with ease in a systematic and scientific manner.

Key takeaways –

Raw data are classified in such a way its easy to understand through statistical methods

4.2 Attributes and variables

Both variable and attribute data measure the state of an object or a process, but the kind of information that each describes differs. Variable data involve numbers measured on a continuous scale, while attribute data involve characteristics or other information that you we quantify. Each has its own benefits over the others. Attributes are closely related to variables.

Example – age is an attribute that can be operationalised in many ways. It can represent two values old and young. In this case the attribute age is operationalised as a binary variable. If more than two values are possible and they can be ordered, the attribute is represented by ordinal variable such as young, middle and old.

Benefits

Variable data provide detailed and concrete information about a product.
Attribute data are often more helpful when qualitative information is needed.

Key takeaways –

Variable means the measured values can vary anywhere along a given scale. Attribute data, on the other hand, is qualitative data that have a quality characteristic or attribute that is described in terms of measurements.

4.3 Classification of data frequency distribution

Classification of data is the process of arranging the data into homogenous groups according to their common characteristics. Raw data cannot be easily understood and not fit for analysis and interpretation. Therefore, arrangement of data helps the user in comparison and analysis.

Example- population of a state can be grouped according to sex, age, etc

Definition

“Classification is the process of arranging data into sequences according to their common characteristics or separating them into different related parts.” - Prof. Secrist

Objectives of data classification

To consolidate the huge data in such a way that similarities and differences are easily understood.
It helps in comparison and analysis of data
Classification of data ensures prominent data are collected and optional data are separated
To allow a statistical method of the material gathered.
To study relationships

Types of classification

Geographical classification – when the data classified according to the geographical location or regions (like states, cities, regions, zones, areas, etc). It is called geographical classification. It is also known as a real or spatial classification.

Ex- production of food grains are classified in different states in India

S.No	Name of states	Total food grains (000’ tones)
1	Andhra Pradesh	1093.00
2	Bihar	12899.09
3	Karnataka	1834.70
4	Punjab	41289.00
5	Orissa	3600

2. Chronological classification – classification of data on the basis of time (like months, years, etc) of their occurrence are called chronological classification. This type of classification is suitable for data which takes place in course of time such as population, production, sales, etc.

Ex – profit of a company from 2001 to 2005

S.No	Year	Profits (in 000 Rs)
1	2001	77
2	2002	88
3	2003	89
4	2004	94
5	2005	99

3. Qualitative classification – under this classification, the data are classified on the basis of some attributes or quality such as sex, color, literacy, honesty, intelligence, religion, etc. In this the attributes cannot be measured. This sort of classification is known as descriptive classification.

For example, Population can be divided on the basis of marital status as married or unmarried etc.

4. Quantitative classification – quantitative classification states that classification of data according to some characteristics that can be measured such as height, weight, income, sales, profit, etc.

Ex – students are classified according to weights

S.No	Weight	No. of students
1	30-40	77
2	40-50	60
3	50-60	50
4	60 – 70	20

5. Alphabetical classification – when data are arranged according to alphabetical order is called alphabetical classification

Ex – state wise classification of population in alphabetical order

S.No	Name of states	Population
1	Andhra Pradesh	157
2	Bihar	150
3	Karnataka	200
4	Punjab	700
5	Orissa	450

Tabulation is a systematic & logical presentation of numeric data in rows and columns, to facilitate comparison and statistical analysis. The method of placing organized data in tabular form is known as tabulation. Tabulation simplifies complex data and facilitates comparison.

Definition

“Table involves the orderly and systematic presentation of numerical data in a form designed to elucidate the problem under consideration” – According prof. L.R Connor

“Table in its broadest sense is an orderly arrangement of data in column and rows” – According to Prof M.M.Blaire

Objectives of tabulation

It simplifies the raw data in meaningful form so that common man can easily understand in less time
It brings essential facts in clear and precise manner
Data presented in rows and columns helps in detailed comparison
Tables serve as the best source of organized data for further statistical analysis
Table saves the space without sacrificing the quality and quantity of data.

Parts of table

Table number	A serial number should be allotted to a table
Title of the table	Every table should be given a title. The title is the description of the content of table.
Caption	Caption refers to the column headings. Caption should be clearly defined and placed at the top of the column.
Stub	Stub refers to row headings. Stubs heading perform for the horizontal rows of the numbers in the table
Body	Body contains the information. Data presented in the body should be arranged according to the classification of stubs and caption.
Head note	Something that is not explained in the title, caption, and stubs can be explained in the head notes on the top of the table below the title. It is generally given in brackets
Source note	It is a brief statement indicating the source of data presented in the table.
Footnote	Some exception in the data can be given in the foot notes. Footnotes are placed below the body of the table.

Types of tabulation

Kinds of Tables
According to
Purpose
According to
Originality
According to
Construction
General
Purpose
Table
Sp...

According to purpose
1. General purpose table – general purpose table is a table which is of general use. It does not serve any specific purpose under consideration
2. Special purpose table – special purpose table is prepared with some specific purpose in mind.

2. According to originality

Original table – an original table is that table in which data are presented in the same manner in which they are collected.
Derived table – a derived table is that in which data is not presented in same manner in which they are collected. Here the data are first converted into ratio or percentage and then presented.

3. According to construction

Simple table – simple table also known as one-way table. Under this data are presented based on one characteristic

Faculty wise library user

b. Complex tables – in complex table data are presented according to two or more characteristics simultaneously.

The complex tables are

two way,
three-way table and
manifold table

Two-way table – Under this the variable under study is divided into two characteristics

Three-way table - Under this the variable under study is divided into three characteristics

Manifold table - Under this the variable under study is divided into large number of characteristics.

Key Takeaways:

Table in its broadest sense is an orderly arrangement of data in column and rows.
It simplifies the raw data in meaningful form so that common man can easily understand in less time.

4.4 Cumulative frequency distribution, Histogram and ogive curves.

Cumulative frequency is used to determine the number of observations that lie above (or below) a particular value in a data set. The cumulative frequency is calculated using a frequency distribution table, which can be constructed from stem and leaf plots or directly from the data.

The cumulative frequency is calculated by adding each frequency from a frequency distribution table to the sum of its predecessors. The last value will always be equal to the total for all observations, since all frequencies will already have been added to the previous total.

Histogram – histogram is a bar graph representing the frequency of occurrence by classes of data. In histogram data are plotted as a series of rectangle. ‘X axis’ consist of class intervals and ‘Y axis’ shows the frequencies. It is also called stair case or block diagram. Histogram is not suitable for open ended classes.

Ogive – An ogive graph shows cumulative frequency in statistics. It estimates the number of observations less than a given value or more than a given value. Cumulative frequency is obtained by adding to the given value

'Less than and equal to ogive: This consists of plotting the 'less than ' cumulative frequencies against the upper class boundaries of the respective classes. The points so obtained are joined by a smooth free hand curve to give 'less than ' ogive . Obviously, 'less than ' ogive is an increasing curve , sloping upwards from left to right .

‘More than and equal to’ ogive: This consists of plotting the ' more than ' cum. frequencies against the lower class boundaries of the respective classes . The points so obtained are joined by a smooth free hand curve to give ' equal give . Obviously, 'more than ' ogive is a decreasing curve , slopes downwards from left to right .

Key takeaways –

The cumulative frequency is calculated by adding each frequency from a frequency distribution table to the sum of its predecessors.
An ogive graph shows cumulative frequency in statistics

4.5 Requisites of ideal measures of central tendency

A measure of central tendency is a statistical summary that represents the center point of the dataset. It indicates where most values in a distribution fall. It is also called as measure of central location.

The three most common measure of central tendency are Mean, Median, and Mode.

Definition

According to Prof Bowley “Measures of central tendency (averages) are statistical constants which enable us to comprehend in a single effort the significance of the whole.”

Requisites of ideal measures of central tendency are as follows

It should be rigidly defined
It should be simple to understand and easy to calculate
It should be based upon all values of given data.
It should be capable of further mathematical treatment.
It should have sampling stability.
It should be not be unduly affected by extreme values.

4.6 Arithmetic Mean, Median and Mode for ungrouped and grouped data

Arithmetic mean

The mean is the arithmetic average, also called as arithmetic mean.
Mean is very simple to calculate and is most commonly used measure of the center of data.
Means is calculated by adding up all the values and divided by the number of observations.

Merits of Mean

1)Arithmetic mean rigidly defined by Algebraic Formula.

2) It is easy to calculate and simple to understand.

3) It is based on all observations of the given data.

4) It is capable of being treated mathematically hence it is widely used in statistical analysis.

5) Arithmetic mean can be computed even if the derailed distribution is not known but some of the observation and number of the observation are known.

6) It is least affected by the fluctuation of sampling.

7) For every kind of data mean can be calculated.

Demerits of Arithmetic mean:

1) It can neither be determined by inspection or by graphical location.

2) Arithmetic mean cannot be computed for qualitative data like data on intelligence honesty and smoking habit etc.

3) It is too much affected by extreme observations and hence it is not adequately represent data consisting of some extreme point.

4) Arithmetic mean cannot be computed when class intervals have open ends.

5) If any one of the data is missing then mean cannot be calculated.

Computation of sample mean -

If X1, X2, ………………Xn are data values then arithmetic mean is given by

Computation of the mean for ungrouped data

Example 1 – The marks obtained in 10 class test are 25, 10, 15, 30, 35

The mean = X = 25+10+15+30+35 = 115 =23

5 5

Analysis – The average performance of 5 students is 23. The implication is that students who got below 23 did not perform well. The students who got above 23 performed well in exam.

Example 2 – Find the mean

Xi	9	10	11	12	13	14	15
Freq (Fi)	2	5	12	17	14	6	3

Xi	Freq (Fi)	XiFi
9	2	18
10	5	50
11	12	132
12	17	204
13	14	182
14	6	84
15	3	45
	Fi = 59	XiFi= 715

Then, N = ∑ fi = 59, and ∑fi Xi=715

X = 715/59 = 12.11

Mean for grouped data/ Weighted Arithmetic Mean

Grouped data are the data that are arranged in a frequency distribution

Frequency distribution is the arrangement of scores according to category of classes including the frequency.

Frequency is the number of observations falling in a category

The formula in solving the mean for grouped data is called midpoint method. The formula is

Where, X = Mean

Xm = midpoint of each class or category

f = frequency in each class or category

∑f Xm = summation of the product of fXm

Example 3 – the following data represent the income distribution of 100 families. Calculate mean income of 100 families?

Income	30-40	40-50	50-60	60-70	70-80	80-90	90-100
No. of families	8	12	25	22	16	11	6

Solution:

Income	No. of families	Xm (Mid point)	fXm
30-40	8	35	280
40-50	12	34	408
50-60	25	55	1375
60-70	22	65	1430
70-80	16	75	1200
80-90	11	85	935
90-100	6	95	570
	n = 100		∑f Xm = 6198

X = ∑f Xm/n = 6330/100 = 63.30

Mean = 63.30

Example 4 – Calculate the mean number of hours per week spent by each student in texting message.

Time per week	0 – 5	5 – 10	10 - 15	15 - 20	20 – 25	25 – 30
No. of students	8	11	15	12	9	5

Solution:

Time per week (X)	No. of students (F)	Midpoint X	XF
0 - 5	8	2.5	20
5 – 10	11	7.5	82.5
10 - 15	15	12.5	187.5
15 - 20	12	17.5	210
20 - 25	9	22.5	202.5
25 – 30	5	27.5	137.5
	60		840

Mean = 840/60 = 14

Example 5 –

The following table of grouped data represents the weights (in pounds) of all 100 babies born at a local hospital last year.

Weight (pounds)	Number of Babies
[3−5)	8
[5−7)	25
[7−9)	45
[9−11)	18
[11−13)	4

Solution:

Weight (pounds)	Number of Babies	Midpoint X	XF
[3−5)	8	4	32
[5−7)	25	6	150
[7−9)	45	8	360
[9−11)	18	10	180
[11−13)	4	12	48
	100		770

Mean = 770/100 = 7.7

Mode

The mode is denoted Mo, is the value which occurs most frequently in a set of values.

Croxton and Cowden defined it as “the mode of a distribution is the value at the point armed with the item tends to most heavily concentrated. It may be regarded as the most typical of a series of value”

Mode for ungrouped data

Example 1- Find the mode of scores of section A

Scores = 25, 24, 24, 20, 17, 18, 10, 18, 9, 7

Solution – Mode is 24, 18 as both have occurred twice.

Mode for grouped data

Mode = L1 + (L2 – L1) d1

d1 +d2

L1= lower limit of the modal class,

L2= upper limit of the modal class‟

d1 =fm-f0 and d2=fm-f1

Where fm= frequency of the modal class,

f0 = frequency of the class preceding to the modal class,

f1= frequency of the class succeeding to the modal class.

Example 2 – Find the mode

Seconds	Frequency
51 - 55	2
56 - 60	7
61 - 65	8
66 - 70	4

Solution

The group with the highest frequency is the modal group: - 61-65

D1 = 8-7 = 1

D2 = 8-4 = 4

Mode = L1 + (L2 – L1) d1

d1 +d2

mode = 61 + (65-61) 1 = 61+4 (1/5) = 61.8

1+4

Mode = 61.8

Example 3 - In a class of 30 students marks obtained by students in science out of 50 is tabulated below. Calculate the mode of the given data

Marks obtained	No. of students
10 -20	5
20 – 30	12
30 – 40	8
40 – 50	5

Solution:

The group with the highest frequency is the modal group: - 20 -30

D1 = 12 - 5 = 7

D2 = 12 - 8 = 4

Mode = L1 + (L2 – L1) d1

d1 +d2

mode = 20 + (30-20) 7 = 20+10 (7/11) = 26.36

7+4

Mode = 61.8

Example 4- Based on the group data below, find the mode

Time to travel to work	Frequency
1 – 10	8
11 -20	14
21 – 30	12
31 – 40	9
41 – 50	7

Solution:

The group with the highest frequency is the modal group: - 11 - 20

D1 = 14 - 8 = 6

D2 = 14 - 12 = 2

Mode = L1 + (L2 – L1) d1

d1 +d2

mode = 11 + (20-11) 6 = 11+9 (6/8) = 17.75

6+2

Example 5 –

Compute the mode from the following frequency distribution

CI	F
70-71	2
68-69	2
66-67	3
64-65	4
62-63	6
60-61	7
58-59	5

Solution:

The group with the highest frequency is the modal group: - 60 - 61

D1 = 7 - 6 = 1

D2 = 7 - 5 = 2

Mode = L1 + (L2 – L1) d1

d1 +d2

mode = 60 + (61-60) 1 = 60+1 (1/3) 60.85

1+2

Merits of mode

It is easy to understand & easy to calculate
It is not affected by extreme values or sampling fluctuations.
Even if extreme values are not known mode can be calculated.
It can be located just by inspection in many cases.
It is always present within the data.

Demerits of mode

It is not rigidly defined.
It is not based upon all values of the given data.
It is not capable of further mathematical treatment.

Median

The points or value that divides the data into two equal parts
Firstly , the data are arranged in ascending or descending order .
The median is the middle number depending on the data size.
When the data size is odd, the median is the middle value
When the data size is even, median is the average of the middle two values
It is also known as middle score or 50th percentile

For ungrouped data median is calculated by (n+1)th value

Example 1 – find the median score of 7 students in science class

Score = 19, 17, 16, 15, 12, 11, 10

Median = (7+1)/2 = 4th value

Median = 15

Find the median score of 8 students in science class

Score = 19, 17, 16, 15, 12, 11, 10, 9

Median = (8+1)/2 = 4.5th value

Median = (15+12)/2 = 13.5

Example 2 – find the median of the table given below

Marks obtained	No. of students
20	6
25	20
28	24
29	28
33	15
38	4
42	2
43	1

Solution:

Marks obtained	No. of students	cf
20	6	6
25	20	26 (20+6)
28	24	50 (26+24)
29	28	78
33	15	93
38	4	97
42	2	99
43	1	100

Median = (n+1)/2 = 100+1/2 = 50.5

Median = (28+29)/2 = 28.5

Median of grouped data

Formula

MC = median class is a category containing the n/2

Lb = lower boundary of the median class

Cfp = cumulative frequency before the median class if the scores are arranged from lowest to highest value

Fm = frequency of the median class

c.i = size of the class interval

Ex- calculate the median

Example 3-

Calculate the median

Marks	No. of students
0-4	2
5-9	8
10-14	14
15-19	17
20-24	9

Solution:

Marks	No. of students	CF
0-4	2	2
5-9	8	10
10-14	14	24
15-19	17	41
20-24	9	50
	50

n = 50

n = 50/2= 25

The category containing n/2 is 15 -19

Lb = 15

Cfp = 24

f = 17

ci = 4

Median = 15 + 25-24 *4 = 15.23

Example 4 - Given the below frequency table calculate median

X	60 – 70	70 – 80	80- 90	90-100
F	4	5	6	7

Solution:

X	F	CF
60 - 70	4	4
70 - 80	5	9
80 - 90	6	15
90 - 100	7	22

n = 22

n = 22/2= 11

The category containing n+1/2 is 80 - 90

Lb = 80

Cfp = 9

f = 6

ci = 10

Median = 80 + 11-9 *10 = 83.33

Example 5– calculate the median of grouped data

Class interval	1-3	3-5	5-7	7-9	9-11	11-13
Frequency	4	12	13	19	7	5

Solution:

CI	F	CF
1-3	4	4
3-5	12	16
5-7	13	29
7-9	19	48
9-11	7	55
11-13	5	60

n = 60

n = 60/2= 30

The category containing n+1/2 is 7-9

Lb = 7

Cfp = 29

f = 19

ci = 2

Median = 7 + 30-29 *2 = 7.105

Merits of median

It is rigidly defined
It is easy to understand and easy to calculate
It is not affected by extreme values
It is not much affected by sampling fluctuation
It can be located graphically

Demerits of median

It is not based upon all values of the given data
It is difficult to calculate increasing order data size
It is not capable of further mathematical treatment.

Key takeaways –

Means is calculated by adding up all the values and divided by the number of observations
The mode is the value that appears most frequently in a data set.
the median is the value separating the higher half from the lower half of a data sample

4.7 Combined mean, Merits and demerits of measures of central tendency

If there are two groups containing n1 and n2 observations with means x1 and x2 respectively, then the combined arithmetic mean of two groups is given by

The above formula can be generalized for more than two groups. If n1 ,n2 ,……,nk are sizes of k groups with means x1, x2 ……., xk respectively then the mean x of the combined group is given by

Example 1

If average salaries of two groups of employees are Rs . 1500 and Rs . 2200 and there are 80 and 70 employees in the two groups. Find the mean of the combined group.
Solution

Given : Group I Group II

n1 =80 n2 =70

x1= 1500 x2= 2200

= 80 (1500 ) + 70 (2200)/ 80 +70

= 120000+ 154000/ 150

= 274000/ 150

= 1826.67
The average monthly salary of the combined group of 150 employees is Rs . 1826 .67

Example 2

The mean weight of a group of 50 workers is 58 kgs. The second group consists of 60 workers with average weight 62 kgs. and there are 90 workers in the third group with average weight 56 kgs. Find the average weight of the combined group

Solution

Given Group I Group II Group III

No. of workers n1 =50 n2 =60 n3= 90

Mean weight x1 =58 x2 =62 x3=56

= 50 (58) + 60 (62) + 90 (56)/50 + 60 + 90

= 11660 = 58.3/ 200

∴ The average weight of the combined group of 200 workers is 58.3 kgs

Merits and demerits of measures of central tendency

Merits of Arithmetic Mean.

(1) It is rigidly defined .

(2) It is easy to understand and easy to calculate.

(3) It is based on each and every observation of the series.

(4) It is capable for further mathematical treatment.

(5) It is least affected by sampling fluctuations.

Demerits of Arithmetic Mean.

(1) It is very much affected by extreme observations.

(2) It cannot be used in case of open end classes.

(3) It cannot be determined by inspection nor it can be located graphically.

(4) It cannot be obtained if a single observation is missing .

(5) It is a value which may not be present in the data .

Merits of Median:

It is rigidly defined.
It is easy to understand and easy to calculate.
It is not affected by extreme observations as it is a positional average.
It can be calculated , even if the extreme values are not known .
It can be located by mere inspection and can also be located graphically.
It is the only average to be used while dealing with qualitative characteristics which cannot be measured numerically.

Demerits of Median:

It is not a good representative in many cases.
It is not based on all observations.
It is not capable of further mathematical treatment .
It is affected by sampling fluctuations.
For continuous data case , the formula is obtained on the assumption of uniform distribution of frequencies over the class intervals. This assumption may not be true.

Merits of mode

6. It is easy to understand & easy to calculate

7. It is not affected by extreme values or sampling fluctuations.

8. Even if extreme values are not known mode can be calculated.

9. It can be located just by inspection in many cases.

10. It is always present within the data.

Demerits of mode

4. It is not rigidly defined.

5. It is not based upon all values of the given data.

6. It is not capable of further mathematical treatment.

Key takeaways –

There is a tendency in almost every statistical data that most of the values concentrate at the centre which is referred as ‘central tendency’. The typical values which measure the central tendency are called measures of central tendency.

4.8 Geometric mean: Definition, Merits and demerits

Geometric mean

Geometric mean is a type of mean or average, which indicates the central tendency of a set of numbers by using the product of their values.

Definition

The Geometric Mean (G.M) of a series containing n observations is the nth root of the product of the values.

For ungrouped data

Geometric Mean, GM = Antilog ∑logxi

Merits of Geometric Mean

It is rigidly defined.
It is based on all the items.
It is capable of further algebraic treatment.
It gives less weight to large items and more to small items.

Demerits of Geometric Mean

It is difficult to compute.
It is not easy to understand.
If there are negative values in the series, it cannot be computed.

Example 1 – find the G.M of the values

X	Log X
45	1.653
60	1.778
48	1.681
65	1.813
Total	6.925

GM = Antilog ∑logxi

= Antilog 6.925/4

= Antilog 1.73

= 53.82

For grouped data

Geometric Mean, GM = Antilog ∑ f logxi

Example 2 – calculate the geometric mean

X	F
60 – 80	22
80 – 100	38
100 – 120	45
120 – 140	35

Solution

X	f	Mid X	Log X	f log X
60 – 80	22	70	1.845	40.59
80 – 100	38	90	1.954	74.25
100 – 120	45	110	2.041	91.85
120 – 140	35	130	2.114	73.99
Total	140			280.68

GM = Antilog ∑ f logxi

= antilog 280.68/140

= antilog 2.00

GM = 100

Example 3 – calculate geometric mean

Class	Frequency
2-4	3
4-6	4
6-8	2
8-10	1

Solution

Class	frequency	x	Log x	flogx
2-4	3	3	1.0986	3.2958
4-6	4	5	1.2875	6.4378
6-8	2	7	0.5559	3.8918
8-10	1	9	0.2441	2.1972
	10			15.8226

GM = Antilog ∑ f logxi

= antilog 15.8226/10

= antilog 1.5823

GM = 4.866

Key takeaways –

The geometric mean is a mean or average, which indicates the central tendency or typical value of a set of numbers by using the product of their values

4.9 Harmonic mean: Definition, Merits and demerits

Harmonic mean

Harmonic mean is quotient of “number of the given values” and “sum of the reciprocals of the given values

For ungrouped data

Merits of Harmonic Mean

It is rigidly defined.
It is based on all the observations of a series i.e.
It cannot be calculated ignoring any item of a series.
It is capable of further algebraic treatment.

Demerits of Harmonic Mean

It is not easy to understand by a man of ordinary prudence.
Its calculation is cumbersome as it involves finding out of the reciprocals of the numbers.
It does not give better and accurate results when the means adopted are the same for the different ends achieved.
Its algebraic treatment is very much limited and not far and wide as that of the arithmetic mean.
It is greatly affected by the values of the extreme items.
It cannot be calculated, if any, of the items is zero.

Example 1 - Calculate the harmonic mean of the numbers 13.2, 14.2, 14.8, 15.2 and 16.1

Solution

X	1/X
13.2	0.0758
14.2	0.0704
14.8	0.0676
15.2	0.0658
16.1	0.0621
Total	0.3147

H.M of X = 5/0.3147 = 15.88

Example 2 - Find the harmonic mean of the following data {8, 9, 6, 11, 10, 5} ?

X	1/X
8	0.125
9	0.111
6	0.167
11	0.091
10	0.100
5	0.200
Total	0.794

H.M of X = 6/0.794 = 7.560

For grouped data

Example 3 - Calculate the harmonic mean for the below data

Marks	30-39	40-49	50-59	60-69	70-79	80-89	90-99
F	2	3	11	20	32	25	7

Solution

Marks	X	F	F/X
30-39	34.5	2	0.0580
40-49	44.5	3	0.0674
50-59	54.4	11	0.2018
60-69	64.5	20	0.3101
70-79	74.5	32	0.4295
80-89	84.5	25	0.2959
90-99	94.5	7	0.0741
Total		100	1.4368

HM = 100/1.4368 = 69.59

Example 4 – find the harmonic mean of the given class

Ages	4	5	6	7
No. of students	6	4	10	9

Solution

X	F	f/x
4	6	1.50
5	4	0.80
6	10	1.67
7	9	1.29
	29.00	5.25

HM = 29/5.25 = 5.5

Example 5 – calculate harmonic mean

Class	Frequency
2-4	3
4-6	4
6-8	2
8-10	1

Solution

Class	frequency	X	f/x
2-4	3	3	1
4-6	4	5	0.8
6-8	2	7	0.28
8-10	1	9	0.11
	10		2.19

Harmonic mean = 10/2.19 = 4.55

Merits of mean

It is rigidly defined
It is easy to understand and easy to calculate
It is based upon all values of the given data
It is capable of future mathematical treatment
It is not much affected by sampling fluctuation

Demerits of mean

It cannot be calculated if any observation are missing
It cannot be calculated for open end classes
It is effected by extreme values
It cannot be located graphically
It may be number which is not present in the data

Key takeaways –

The harmonic mean is a type of numerical average. It is calculated by dividing the number of observations by the reciprocal of each number in the series.

4.10 Choice of A.M., G.M. and H.M.

Arithmetic Mean

Arithmetic mean represents a number that is achieved by dividing the sum of the values of a set by the number of values in the set. If a1, a2, a3,….,an, is a number of group of values or the Arithmetic Progression, then;

AM=(a1+a2+a3+….,+an)/n

Geometric Mean

The Geometric Mean for a given number of values containing n observations is the nth root of the product of the values.

GM = n√(a1a2a3….an)

GM = (a1a2a3….an)1/n

Harmonic Mean

HM is defined as the reciprocal of the arithmetic mean of the given data values. It is represented as:

HM = n/[(1/a1) + (1/a2) + (1/a3) + ….+ (1/an)]

The relationship between AM, GM and HM is given by:

AM x HM = GM2

Now let us understand how this relation is derived;

First, consider a, AM, b is an Arithmetic Progression.

Now the common difference of Arithmetic Progression will be;

AM – a = b – AM

a + b = 2 AM …………..(1)

Secondly, let a, GM, b is a Geometric Progression. Then, the common ratio of this GP is;

GM/a = b/GM

ab = GM2……………(2)

Third, is the case of harmonic progression, a, HM, b, where the reciprocals of each term will form an arithmetic progression, such as:

1/a, 1/HM, 1/b is an AP.

Now the common difference of the above AP is;

1/HM – 1/a = 1/b – 1/HM

2/HM = 1/b + 1/a

2/HM = (a + b)/ab ………….(3)

Substituting eq. 1 and eq.2 in eq. 3 we get;

2/HM = 2AM/GM2

GM2 = AM x HM

Hence, this is the relation between Arithmetic, Geometric and Harmonic mean.

Key takeaways –

The relationship between AM, GM and HM is given by:

AM x HM = GM2

4.11 Concept of dispersion

Dispersion is the state of getting dispersed or spread. Statistical dispersion means the extent to which a numerical data is likely to vary about an average value. In other words, dispersion helps to understand the distribution of the data.

Definition

Dispersion is the measures of the variation of the items --- A.L.Bowly

Dispersion is a measure of extent to which the individual items vary --- L.R.Connor

Importance of measuring variation or dispersion

Testing the Reliability of the Measures of Central Tendency
Comparing two or more series on the basis of the invariability
Enabling to control the variability
Facilitating as a Basis for further statistical Analysis

Characteristics of a Measure of Variation

It is easy to understand and simple to calculate
It should be rigidly defined
It should be based on all observations and it should not be affected by extreme observations
It should be amenable to further algebraic treatment
It should have sampling stability

Methods of Measuring Dispersion

Range
Inter Quartile range
Quartile Deviation
Mean Deviation
Standard Deviation
Lorenz Curve

Key takeaways –

Dispersion is the measures of the variation of the items

4.12 Measures of dispersion: Range, Variance, Standard deviation (SD) for grouped and ungrouped data

Range – Range defines the difference between the maximum value and the minimum value given in a data set. More the range , group is more variable. The smaller the range the more homogenous is the group.

R = H – L

Example 1 – 5, 10, 15, 20, 7, 9, 17, 13, 12, 16, 8, 6

Range = H-L

=20 – 5 = 15

Coefficient of range –

Coefficient of range = (15/(20+5))*100 = 60

Example 2 – what is the range for the following set of numbers?

15,21,57,43,11,39,56,83,77,11,64,91,18,37

Solution

Range = H-L

= 91 – 11 = 80

Therefore the range is 80

Example 3 – the frequency table shows the number of goals the lakers scored in their last twenty matches. What was the range

No. of goals	Frequency
0	2
1	3
2	3
3	6
4	3
5	1
6	1
7	1

Solution

The range is the difference between the lowest and highest values.

The highest value was 7 (They scored 7 goals on 1 occasion)

The lowest value was 0 (They scored 0 goals on 2 occasions)

Therefore the range = 7 - 0 = 7

Example 4 – the following table shows the sales of DVD players made by a retail store each month last year

Month	No. of sales
January	25
Feb	43
March	39
April	28
May	29
June	35
July	32
August	46
September	28
October	43
November	51
December	63

Solution

The range is the difference between the lowest and highest values.

The lowest number of sales = 25 in January

The highest number of sales = 63 in December

So the range = 63 - 25 = 38

Example 5 – what is the range for the following set of numbers?

57, -5, 11, 39, 56, 82, -2, 11, 64, 18, 37, 15, 68

Solution

The range is the difference between the lowest and highest values.

The highest value is 82.

The lowest value is -5.

Therefore the range = 82 - (-5) = 82+5 = 87

Merits

Simple and easy to understand
It gives a quick answer

Demerits

It is not based on all observation
Affected by sampling fluctuations
It cannot be calculated in open ended distributions

2. Interquartile range - the interquartile range measures the range of the middle 50% of the values only. It is calculated as the difference between the upper and lower quartile.

Interquartile range = upper quartile – lower quartile

= Q3 – Q1

Example 1 – find the interquartile range for 1, 2, 18, 6, 7, 9, 27, 15, 5, 19, 12.

Solution

Arrange the numbers in ascending order

1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27

Find the median

Median = 9

(1, 2, 5, 6, 7), 9, (12, 15, 18, 19, 27)

Q1 as median in the lower half and Q3 as median in the upper half

Q1 = median in (1, 2, 5, 6, 7)

Q1 = 5

Q3 = median in (12, 15, 18, 19, 27)

Q3 = 18

Interquartile range = 18 – 5 = 13

Example 2 – find the interquartile for the following data set: 3, 5, 7, 8, 9, 11, 15, 16, 20, 21.

Solution

Arrange the numbers in ascending order

3, 5, 7, 8, 9, 11, 15, 16, 20, 21

Make a mark in the center of the data:

(3, 5, 7, 8, 9,) | (11, 15, 16, 20, 21)

Find the median

Q1 = 7

Q3 = 16

Interquartile range = 16 – 7 = 9

Example 3 - find the interquartile for the following data set: 1, 3, 4, 5, 5, 6, 7, 11

Make a mark in the center of the data:

(1, 3, 4, 5,) (5, 6, 7, 11)

Find the median

Q1 = (3+4)/2 = 3.5

Q3 = (6+7)/2 = 6.5

Interquartile range = 6.5 – 3.5 = 3

Example 4 -

Find the interquartile range for odd sample size

63,64,64,70,72,76,77,81,81

Solution

Make a mark in the center of the data:

(63,64,64,70,)72,(76,77,81,81)

Find the median

Q1 = (64+64)/2 = 64

Q3 = (77+81)/2 = 79

Interquartile range = 79 – 64 = 15

3. Quartile deviation

Quartile deviation is the product of half of the difference between the upper and the lower quartiles.

QD = (Q3 - Q1) / 2

Coefficient of Quartile Deviation = (Q3 – Q1) / (Q3 + Q1)

Quartile deviation for ungrouped data

Examples 1

Day	Frequency
1	20
2	35
3	25
4	12
5	10
6	23
7	18
8	14
9	30
10	40

Solution

Arrange the frequency data in ascending order

Day	Frequency
1	10
2	12
3	14
4	18
5	20
6	23
7	25
8	30
9	35
10	40

First quartile (Q1)

Qi= [i * (n + 1) /4] th observation

Q1= [1 * (10 + 1) /4] th observation

Q1 = 2.75 th observation

Thus, 2.75 th observation lies between the 2nd and 3rd value in the ordered group, between frequency 12 and 14

First quartile (Q1) is calculated as

Q1 = 2nd observation +0.75 * (3rd observation - 2nd observation)

Q1 = 12 + 0.75 * (14 – 12) = 13.50

Third quartile (Q3)

Qi= [i * (n + 1) /4] th observation

Q3= [3 * (10 + 1) /4] th observation

Q3 = 8.25 th observation

So, 8.25 th observation lies between the 8th and 9th value in the ordered group, between frequency 30 and 35

Third quartile (Q3) is calculated as

Q3 = 8th observation +0.25 * (9th observation – 8th observation)

Q3 = 30 + 0.25 * (35 – 30) = 31.25

Now using the quartiles values Q1 and Q3, we will calculate the quartile deviation.

QD = (Q3 - Q1) / 2

QD = (31.25 – 13.50) / 2 = 8.875

Coefficient of Quartile Deviation = (Q3 – Q1) / (Q3 + Q1)

= (31.25 – 13.50) /(31.25 + 13.50) = 0.397

Example 2 – calculate quartile deviation from the following test scores

Sl. N o	Test scores
1	17
2	17
3	26
4	27
5	30
6	30
7	31
8	37

Solution

First quartile (Q1)

Qi= [i * (n + 1) /4] th observation

Q1= [1 * (8 + 1) /4] th observation

Q1 = 2.25 th observation

Thus, 2.25 th observation lies between the 2nd and 3rd value in the ordered group, between frequency 17 and 26

First quartile (Q1) is calculated as

Q1 = 2nd observation +0.75 * (3rd observation - 2nd observation)

Q1 = 17 + 0.75 * (26 – 17) = 23.75

Third quartile (Q3)

Qi= [i * (n + 1) /4] th observation

Q3= [3 * (8 + 1) /4] th observation

Q3 = 6.75 th observation

So, 6.75 th observation lies between the 6th and 7th value in the ordered group, between frequency 30 and 31

Third quartile (Q3) is calculated as

Q3 = 6th observation +0.25 * (7th observation – 6th observation)

Q3 = 30 + 0.25 * (31 – 30) = 30.25

Now using the quartiles values Q1 and Q3, we will calculate the quartile deviation.

QD = (Q3 - Q1) / 2

QD = (30.25 – 23.75) / 2 = 3.25

Quartile deviation for grouped data

Where,

l = lower boundary of quartile group

h = width of quartile group

f = frequency of quartile group

N = total number of observation

C= cumulative frequency preceding quartile group

Example 3

Age in years	40 -44	45 – 49	50 – 54	55 - 59	60 – 64	65 - 69
Employees	5	8	11	10	9	7

Solution

In the case of Frequency Distribution, Quartiles can be calculated by using the formula:

Class interval	F	Class boundaries	CF
40 -44	5	39.5 – 44.5	5
45 – 49	8	44.5 – 49.5	13
50 – 54	11	49.5 – 54.5	24
55 – 59	10	54.5 – 59.5	34
60 – 64	9	59.5 – 64.5	43
65 – 69	7	64.5 – 69.5	50
Total	50

First quartile (Q1)

Qi= [i * (n ) /4] th observation

Q1 = [1*(50)/4]th observation

Q1 = 12.50th observation

So, 12.50th value is in the interval 44.5 – 49.5

Group of Q1 = 44.5 – 49.5

Qi = (I + (h / f) * ( i * (N/4) – c) ; i = 1,2,3

Q1 = (44.5 + ( 5/ 8)* (1* (50/4) – 5)

Q1 = 49.19

Third quartile (Q3)

Qi= [i * (n) /4] th observation

Q3= [3 * (50) /4] th observation

Q3 = 37.5th observation

So, 37.5th value is in the interval 59.5 – 64.5

Group of Q3 = 59.5 – 64.5

Qi = (I + (h / f) * ( i * (N/4) – c) ; i = 1,2,3

Q3 = (59.5 + ( 5/ 9)* (3* (50/4) – 34)

Q3 = 61.44

QD = (Q3 - Q1) / 2

QD = (61.44 – 49.19) / 2 = 6.13

Coefficient of Quartile Deviation = (Q3 – Q1) / (Q3 + Q1)

= (61.44 – 49.19) /(61.44 + 49.19) = 0.11

Example 4 – computation of quartile deviation for grouped test scores

Class	Frequency
9.3-9.7	22
9.8-10.2	55
10.3-10.7	12
10.8-11.2	17
11.3-11.7	14
11.8-12.2	66
12.3-12.7	33
12.8-13.2	11

Solution

Class	Frequency	Class boundaries	CF
9.3-9.7	2	9.25-9.75	2
9.8-10.2	5	9.75-10.25	2 + 5 = 7
10.3-10.7	12	10.25-10.75	7 + 12 = 19
10.8-11.2	17	10.75-11.25	19 + 17 = 36
11.3-11.7	14	11.25-11.75	36 + 14 = 50
11.8-12.2	6	11.75-12.25	50 + 6 = 56
12.3-12.7	3	12.25-12.75	56 + 3 = 59
12.8-13.2	1	12.75-13.25	59 + 1 = 60

First quartile (Q1)

Qi= [i * (n ) /4] th observation

Q1 = [1*(60)/4]th observation

Q1 = 15th observation

So, 15th value is in the interval 10.25-10.75

Group of Q1 = 10.25-10.75

Qi = (I + (h / f) * ( i * (N/4) – c) ; i = 1,2,3

Q1 = (10.25 + ( 0.5/ 12)* (1* (60/4) – 7)

Q1 = 10.58

Third quartile (Q3)

Qi= [i * (n) /4] th observation

Q3= [3 * (60) /4] th observation

Q3 = 45th observation

So, 45th value is in the interval 11.25-11.75

Group of Q3 = 11.25-11.75

Qi = (I + (h / f) * ( i * (N/4) – c) ; i = 1,2,3

Q3 = (11.25 + ( 0.5/ 14)* (3* (60/4) – 36)

Q3 = 11.57

QD = (Q3 - Q1) / 2

QD = (11.57 – 10.58) / 2 = 0.495

Example 5 – calculate quartile deviation from the following data

CI	F
10 – 15	6
15 – 20	10
20 – 25	15
25 – 30	22
30 – 40	12
40 – 50	9
50 – 60	4
60 – 70	2

Solution

CI	F	Cf
10 – 15	6	6
15 – 20	10	16
20 – 25	15	31
25 – 30	22	53
30 – 35	12	65
35 – 40	9	74
45 – 50	4	78
55 – 60	2	80

First quartile (Q1)

Qi= [i * (n ) /4] th observation

Q1 = [1*(80)/4]th observation

Q1 = 20th observation

So, 20th value is in the interval 20 - 25

Group of Q1 = 20 - 25

Qi = (I + (h / f) * ( i * (N/4) – c) ; i = 1,2,3

Q1 = (20 + ( 5/ 15)* (1* (80/4) – 16)

Q1 = 21.33

Third quartile (Q3)

Qi= [i * (n) /4] th observation

Q3= [3 * (80) /4] th observation

Q3 = 60th observation

So, 60th value is in the interval 30 - 35

Group of Q3 = 30 - 35

Qi = (I + (h / f) * ( i * (N/4) – c) ; i = 1,2,3

Q3 = (30 + ( 5/ 12)* (3* (80/4) – 53)

Q3 = 32.91

QD = (Q3 - Q1) / 2

QD = (32.91 – 21.33) / 2 = 5.79

Merits

It provide better result than range mode
It is not effected by extreme values

Demerits

It is completely dependent on central item
All items are not taken onto consideration

4. Mean deviation – The average of the absolute values of deviation from the mean, median or mode is called mean deviation. This method removes shortcoming of range and QD.

Where, ∑ is total of;

X is the score, X is the mean, and N is the number of scores

D = Deviation of individual scores from mean

Example 1 –

Computation of mean deviation in ungrouped data

X = 55, 45, 39, 41, 40, 48, 42, 53, 41, 56

Solution

X	(X – X )	Absolute deviation (signed ignored)
55	55 - 46 = 9	9
45	45 – 46 = -1	1
39	-7	7
41	-5	5
40	-6	6
48	2	2
42	-4	4
53	7	7
41	-5	5
56	10	10
∑X = 460		∑ X – X = 56

Mean = 460/10 = 46

MD = 56/10 = 5.6

Example 2- Peter did a survey on the number of pets owned by his classmates, with the following result. What is the mean deviation of the number of pets?

No. of pets	Frequency
0	4
1	12
2	8
3	2
4	1
5	2
6	1

Solution

X	F	Fx	(X – X )	f(X – X )
0	4	0	1.8	7.2
1	12	12	0.8	9.6
2	8	16	0.2	1.6
3	2	6	1.2	2.4
4	1	4	2.2	2.2
5	2	10	3.2	6.4
6	1	6	3.2	4.2
	30	54	4.2	33.6

Mean = 54/30 = 1.8

MD = 33.6/30 = 1.12

Computation of Mean deviation in grouped data

Example 3 -

Class interval	15 – 19	20 – 24	25 – 29	30 – 34	35 – 39	40 – 44	45 - 49
Frequency	1	4	6	9	5	3	2

Class Interval	F	X	FX	D	FD
15 – 19	1	17	17	15	15
20 – 24	4	22	88	10	40
25 – 29	6	27	162	5	30
30 – 34	9	32	288	0	0
35 – 39	5	37	185	5	25
40 – 44	3	42	126	10	30
45 – 49	2	47	94	15	30
	N = 30		∑fx = 960		= 170

Mean =960/30 = 32

MD = 170 / 30 = 5.667

Coefficient of mean deviation

Coefficient of mean deviation = (5.67/32)*100 = 17.71

Example 4 – Calculate mean deviation from the median.

Class	5 -15	15 - 25	25 - 35	35 - 45	45 – 55
Frequency	5	9	7	3	8

Solution

x	f	cf	Midpoint x	x –median	F(x-m)
5 -15	5	5	10	17.42	87.1
15 -25	9	14	20	7.42	66.78
25 -35	7	21	30	2.58	18.06
35 -45	3	24	40	12.58	37.74
45- 55	8	32	50	22.58	180.64
	32				390.32

Since n/2 = 32/2 = 16, therefore the class is 25 – 35 is the median.

Median =

Median = 25+16-14 *10 = 27.42

MD from median is 390.32/32 = 12.91

Example 5 – calculate the mean deviation from continuous frequency distribution

Age group	15 - 25	25 - 35	35 - 45	45 – 55
No. of people	25	54	34	20

Solution

Age group (X)	Number of people (f)	Midpoint x	fx	X – x	f( x – x )
15 – 25	25	20	500	13.684	324.1
25 – 35	54	30	1620	3.684	198.936
35 – 45	34	40	1360	6.316	214.744
45 – 55	20	50	1000	16.316	352.32
	133				1090.1

Mean = 4480/133 = 33.684

MD = 1090.1/133 = 8.196

Merits

It is easy to calculate
It helps in making comparison
It is not affected by extreme items

Demerits

It ignores algebraic sign. And are not used for mathematical treatment
It is not reliable

5. Standard deviation – standard deviation is calculated as square root of average of squared deviations taken from actual mean. It is also called root mean square deviation. This measure suffers from less drawbacks and provides accurate results. It removes the drawbacks of ignoring algebraic sign. We square the deviation to make them positive.

Two ways of computing SD

Direct method

2. Shortcut method

d = Deviation of the score from an assumed mean, say AM; i.e. d = (X – AM). AM is assumed mean

d2 = the square of the deviation.

∑d = the sum of the deviations.

∑d2 = the sum of the squared deviations.

N = No. of the scores

Standard deviation in ungrouped data

Direct method

Example 1–

X = 12, 15, 10, 8, 11, 13, 18, 10, 14, 9

Mean = 120/10 = 12

Scores	d
12	12-12 = 0	0
15	15-12 = 3	9
10	10 -12 = -2	4
8	-4	16
11	-1	1
13	1	1
18	6	36
10	-2	4
14	2	4
9	-3	9
= 120	= 0	= 84

= 2.9

2. Shortcut method

Assumed mean (AM) = 11

Scores	D = (X- AM)
12	12-11 = 1	1
15	15-11 = 4	16
10	10 -11 = -1	1
8	-3	9
11	0	0
13	2	4
18	7	49
10	--1	1
14	3	9
9	-2	4
= 120	= 10	= 94

SD from short cut method = 2.9

Example 2 –Ram did a survey of the number of pets owned by his classmates, with the following results

No. of pets	Frequency
0	4
1	12
2	8
3	2
4	1
5	2
6	1

Solution

X	f	fx	X – x	(x – x ) 2	F (x – x ) 2
0	4	0	-1.8	3.24	12.96
1	12	12	-0.8	0.64	7.68
2	8	16	0.2	0.04	0.32
3	2	6	1.2	1.44	2.88
4	1	4	2.2	4.84	4.84
5	2	10	3.2	10.24	20.48
6	1	6	4.2	17.64	17.64
	30	54			66.80

Mean = 54/30 = 1.8

SD = √66.80/30 = 1.49

Standard deviation in grouped data

Direct method

Example 3 –

C.I.	0 - 2	3 - 5	6- 8	9-11	12-14	15 -17	18 - 20
F	1	3	5	7	6	5	3

Solution

C.I	f	Midpoint x	Fx	D		fd2
0-2	1	1	1	-10.1	102.01	102.01
3-5	3	4	12	-7.1	50.41	151.23
6-8	5	7	35	-4.1	16.81	84.05
9-11	7	10	70	-1.1	1.21	8.47
12-14	6	13	78	1.9	3.61	21.66
15-17	5	16	80	4.9	24.01	120.05
18-20	3	19	57	7.9	62.41	187.23
	30		333			674.70

Mean = 333/30 = 11.1

SD =

Shortcut method

C.I	F	Midpoint x	d(X-AM)	Fd	fd2
0-2	1	1	-9	-9	81
3-5	3	4	-6	-18	108
6-8	5	7	-3	-15	45
9-11	7	10	0	0	0
12-14	6	13	3	18	54
15-17	5	16	6	30	180
18-20	3	19	9	27	243
	30			33	711

Assumed mean = 10

Step deviation method

C.I	F	Midpoint x	d	Fd	fd2
0-2	1	1	-3	-3	9
3-5	3	4	-2	-6	12
6-8	5	7	-1	-5	5
9-11	7	10	0	0	0
12-14	6	13	1	6	6
15-17	5	16	2	10	20
18-20	3	19	3	9	27
	30			11	79

Here, d is calculate as (X –AM)/i, where i is length of class interval

d = (1 -10)/3 = -3 and so on

Coefficient of standard deviation

Coefficient of SD = (4.74/11.1)*100 = 42.70

Example 4 – calculate the standard deviation using the direct method

Class interval	Frequency
30 – 39	3
40 – 49	1
50 – 59	8
60 – 69	10
70 – 79	7
80 – 89	7
90 – 99	4

Solution

Class interval	Frequency	Midpoint x	fx	X – x	(x – x ) 2	F (x – x ) 2
30 – 39	3	34.5	103.5	-33.5	1122.25	3366.75
40 – 49	1	44.5	44.5	-23.5	552.25	552.25
50 – 59	8	54.5	436.0	-13.5	182.25	1458
60 – 69	10	64.5	645.0	-3.5	12.25	122.5
70 – 79	7	74.5	521.5	6.5	42.25	295.75
80 – 89	7	84.5	591.5	16.5	272.25	1905.75
90 – 99	4	94.5	378.0	26.5	702.25	2809
	40		2720			10510

Mean = 2720/40 = 68

SD = √10510/40 = 16.20

Example 5 - calculate the mean and standard deviation of hours spent watching television by the 220 students.

Hours	No. of students
10 – 14	2
15 – 19	12
20 – 24	23
25 – 29	60
30 – 34	77
35 – 39	38
40 – 44	8

Solution

Hours	No. of students	x	fx	X – x	(x – x ) 2	F (x – x ) 2
10 – 14	2	12	24	-17.82	317.49	634.98
15 – 19	12	17	204	-12.82	164.31	1971.67
20 – 24	23	22	506	-7.82	61.12	1405.85
25 – 29	60	27	1620	-2.82	7.94	476.53
30 – 34	77	32	2464	2.18	4.76	366.55
35 – 39	38	37	1406	7.18	51.58	1959.98
40 – 44	8	42	336	12.18	148.40	1187.17
	220		6560			8002.73

Mean = 6560/220 = 29.82

SD = √8002.73/220 = 6.03

Key takeaways –

Range is the difference between the highest and the lowest value of frequency for a given frequency distribution.
The Quartile Deviation (QD) is the product of half of the difference between the upper and. lower quartiles.
The Standard Deviation is a measure of how spread out numbers are.
The mean deviation is defined as a statistical measure which is used to calculate the average deviation from the mean value of the given data set.

4.13 Combined SD, Measures of relative dispersion

Combined SD

For the combined standard deviation of multiple distributions – X1, X2, X3 … Xk, with individual standard deviations – s1, s2, s3 … sk, arithmetic means – a1, a2, a3 …. ak, and each distribution in turn containing n1, n2, n3 …. nk number of data points, the formula is –

Where the summation extends over all the distributions.
In the final formula,

Where, – the arithmetic mean of the i’th distribution

– the combined arithmetic mean of all distributions, given by –

Measures of relative dispersion

Absolute measure of dispersion indicates the amount of variation in a set of values in terms of units of observations. For example, when rainfalls on different days are available in mm, any absolute measure of dispersion gives the variation in rainfall in mm. On the other hand relative measures of dispersion are free from the units of measurements of the observations They are pure numbers. They are used to compare the variation in two or more sets, which are having different units of measurements of observations.

The various absolute and relative measures of dispersion are listed below.

Absolute measure

1. Range

2. Quartile deviation

3. Mean deviation

4. Standard deviation

Relative measure

Co-efficient of Range
Co-efficient of Quartile deviation
Co-efficient of Mean deviation
Co-efficient of variation

4.14 Coefficient of range

The relative measure corresponding to range, called the coefficient of range, is obtained by applying the following formula:

• Coefficient of Range= (L- S)/ (L S)

Example 1

Find the value of range and its co-efficient for the following data.

7, 9, 6, 8, 11, 10

Solution

L = 11, S = 4.

Range = L – S = 11- 4 = 7

Co-efficient of Range = 𝐿−𝑆 / 𝐿+𝑆

Co-efficient of Range = 11−4/11+4

Co-efficient of Range = 7/15

Co-efficient of Range = 0.4667

Example 2

Calculate range and its co efficient from the following distribution.

Size: 60-63 63-66 66-69 69-72 72-75

Number: 5 18 42 27 8

Solution

L = Upper boundary of the highest class. = 75

S = Lower boundary of the lowest class. = 60

Range = L – S = 75 – 60 = 15

Co-efficient of Range = 𝐿−𝑆/𝐿+S

Co-efficient of Range =75−60/75+60

Co-efficient of Range =15/135

Co-efficient of Range = 0.1111

Example 3

The marks obtained by 9 students are given below:

X = 45 32 37 46 39 36 41 48 36

Calculate coefficient of range

Solution

Co-efficient of Range = 𝐿−𝑆/𝐿+S

Co-efficient of Range =48−32/48+32

Co-efficient of Range =16/80

Co-efficient of Range = 0.2

Key takeaways –

The coefficient of range on the other hand is the ratio of difference between the highest and lowest value of frequency to the sum of highest and lowest value of frequency.

4.15 Coefficient of variation

Coefficient of variation

Formula for calculate Arithmetic Mean (A.M)

Q1) Calculate coefficient variation for the following frequency distribution.

Wages in Rupees earned per day	0-10	10-20	20-30	30-40	40-50	50-60
No. of Labourers	5	9	15	12	10	3

A 1)

Wages earned C.I	Mid value	Frequency
52	5	5	-2	-10	20
153	15	9	-1	-9	9
25	25	15	0	0	0
35	35	12	1	12	12
45	45	10	2	20	40
55	55	3	3	9	27
Total	-

Using formula,

….. (refer last Ex.)

Now, A.M

A.M

Coefficient of Variation

Q2)

Fluctuations in the aggregate of marks obtained by two groups of students are given below.

Group A	518	519	530	530	530	544	518	550	527	527	531	550	550	529	528
Group B	825	830	830	819	814	814	844	842	826	826	832	835	835	840	840

A2) First we represent the data in frequency distribution from group A

518

519

527

528

529

530

531

542

544

550

-12

-11

-3

-2

-1

144

121

144

196

400

-24

-11

-6

-2

-1

288

121

144

196

1200

Total

For group B,

814

819

825

826

830

832

835

840

842

844

-16

-11

-5

-4

256

121

100

144

196

-24

-11

-6

-2

-1

288

121

144

196

1200

Total

As we calculate,

σ for Group A σA=11.105

Now A.M

A.M

Coefficient of Variation

Same for Group B,

Now,

Coefficient of Variation

Key takeaways –

The coefficient of variation (CV) is defined as the ratio of the standard deviation to the mean. , It shows the extent of variability in relation to the mean of the population

References

Practical business mathematics by S.A Bari
Mathematics of commerce by K. Selvakumar
Business mathematics with application by Dinesh Khattar and S.R Arora
Statistical methods by Gupta SP

Sign Up

Index

Notes

Highlighted

Underlined

Browse by Topics

Notes

Highlighted

Underlined