UNIT 3
Introduction
Introduction of Statistics:
The word statistics is derived from the Latin word “status”, meaning a political state, and has come to mean a group of numbers that represents some information of human interest. In ancient times, statistics were used to meet the administrative needs of the state. In modern times, statistics is used not only for the administration of the state but also to evaluate all those activities in our lives which can be expressed in quantitative terms.
The term “statistics” is defined in two senses: singular and plural.
In the plural sense, statistics means a systematic collection of numerical facts. In the singular sense, the term means the various methods used for the collection, analysis and interpretation of numerical facts; in this sense it is described as statistical method. In our study we are more concerned with the second meaning of statistics.
Definition of Statistics:
“Statistics is a body of methods for making wise decisions in the face of uncertainty.” - Wallis and Roberts
“Statistics is a body of methods for obtaining and analysing numerical data in order to make better decisions in an uncertain world.” - Edward N. Dubois
“Statistics are numerical statements of facts in any department of enquiry placed in relation to each other.” - Bowley
“The science of statistics is essentially a branch of applied mathematics and may be regarded as mathematics applied to observational data.” - R. A. Fisher
After analysing the various definitions of statistics, the most suitable definition is as follows –
“Statistics in the plural sense are numerical statements of facts capable of some meaningful analysis and interpretation, and in singular sense, it relates to the collection, classification, presentation and interpretation of numerical data.”
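Key takeaways – In the plural sense, statistics means a systematic collection of numerical facts; in the singular sense, it means the methods used to collect, analyse and interpret them.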
Scope of Statistics:
Functions of Statistics:
Expression of facts in numbers – One of the important functions of statistics is to express facts in a definite form, i.e. in the form of numbers. Results expressed in numbers are more convincing than results expressed in qualitative terms.
Presentation of facts – Statistics helps in presenting complex data in a simple form, so that it becomes easy to understand. Statistical methods present data in the form of graphs, diagrams, averages, coefficients, etc.
Comparison – After the data have been simplified, they can be correlated and compared. Comparing data relating to facts is one of the functions of statistics, as absolute figures on their own convey less meaning.
It helps other sciences – Many laws of economics, such as the law of demand and the law of supply, have been verified with the help of statistics.
Forecasting – Statistics also helps predict the future course of events. On the basis of statistical estimates we can make future policies.
Policy making – Statistics helps in formulating favourable policies. The government makes policies based on forecasts.
Key takeaways – The functions of statistics include expressing facts in numbers, presentation, comparison, forecasting and policy making.
Importance of Statistics:
Importance for administration – With the help of statistics, the finance minister uses revenue and expenditure data to prepare the budget. Statistics also helps in taking decisions regarding taxes.
Importance for businessmen – Statistics helps in providing relevant data. With the help of those data a businessman can estimate the demand for and supply of a commodity.
Importance in economics – Statistics helps in measuring economic aggregates such as gross national output, consumption, saving, investment, expenditure, etc.
Importance for politicians – Politicians use statistics in formulating the economic, social and educational policies of the country.
Importance in the field of education – Statistics has wide application in education, for example in determining the reliability and validity of a test, factor analysis, etc.
Key takeaways – Statistics helps the administrative, business, economic, political and education sectors.
Limitations and Distrust of Statistics:
Study of numerical facts only – Statistical methods do not study qualitative phenomena such as honesty, wisdom, etc. Such qualities can be studied only indirectly, for example through experiments that record people's reactions as numerical data.
Study of aggregates only – Statistics studies only aggregates of quantitative facts. It does not study any particular unit. Prof. Horace Secrist defined statistics: “By statistics we mean aggregates of facts…. And placed in relation to each other”
It does not depict the entire story of a phenomenon – Any phenomenon happens due to many causes, but not all of the causes can be expressed in numbers. So a fully correct conclusion cannot be drawn. Analysing quantitative data while ignoring qualitative data cannot give a complete conclusion.
Homogeneity of data – To compare data, it is essential that whatever statistics are collected are uniform in quality.
It is liable to be misused – As W.I. King points out, “One of the short-comings of statistics is that they do not bear on their face the label of their quality.” Thus, data collected by an inexperienced or dishonest person may be inaccurate or biased. So, to reach a correct conclusion, data must be used with caution.
Too many methods to study a problem – Many statistical methods can be used to find a single result, and the results may vary from method to method. “It must not be assumed that the statistical method is the only method to use in research, neither should this method be considered the best attack for the problem.” - Croxton and Cowden
Key takeaways – Statistics has some limitations, such as the need for homogeneous data and the risk of biased data.
Collection of data
Definition
Data collection is defined as the procedure of collecting, measuring and analyzing accurate insights for research using standard validated techniques.
Irrespective of the field of research, data collection is the primary and most important step for research. Depending on the required information, the approach of data collection is different for different fields of study.
The objective of data collection is to ensure that rich information and reliable data are collected for statistical analysis, so that data-driven decisions can be made for research.
Data collection method
Data collection methods can be divided into two categories: secondary methods of data collection and primary methods of data collection.
Secondary data – Secondary data is data that has already been published in books, newspapers, magazines, journals, online portals, etc. A lot of information is available in these sources. Therefore, using appropriate secondary data in a study plays an important role in increasing the validity and reliability of the research.
Primary data –
Primary data collection methods can be divided into two groups: quantitative and qualitative.
Quantitative data collection methods are based on mathematical calculations in various formats. Methods of quantitative data collection and analysis include questionnaires with closed-ended questions, methods of correlation and regression, mean, mode and median, and others. Quantitative methods are less expensive and can be applied within a shorter duration of time. They also make it easy to compare findings.
Qualitative research methods, on the other hand, do not involve numbers or mathematical calculations. Qualitative research is closely associated with words, sounds, feelings, emotions, colours and other elements that are non-quantifiable.
Primary and secondary data are discussed in more detail below.
Primary data is the information collected through original or first-hand research. Primary data is more reliable and authentic because it has not been changed or altered by anyone, and it has not yet been published. Primary data may be gathered by an authorized organization, investigator or enumerator.
“Data which are gathered originally for a certain purpose are known as primary data.” — Horace Secrist
Sources of primary data
The sources of primary data are as follows –
1. Experiments: In the natural sciences, experiments are the most reliable source of data. Experiments are conducted in medicine, psychological studies, nutrition and other scientific studies. They are conducted in the field as well as in laboratories. The results of experiments are analysed by statistical tests and thereafter conclusions are drawn.
2. Survey: Surveys are used in social science, management, marketing and, to some extent, psychology. Surveys can be conducted by different methods.
3. Questionnaire: A questionnaire consists of a list of questions, either open-ended or closed-ended, which the participants answer. A questionnaire can be administered via telephone, mail, institute, fax, etc.
4. Interview: Interviews are an expensive method of data collection. The interviewer collects information from each respondent independently. It involves in-depth questioning and follow-up questions. During the interview, the interviewer can observe the body language and other reactions to the questions.
5. Observation: Observation can be conducted with or without the knowledge of the participants. Observation can be made in either a natural or an artificially created environment.
Secondary Data:
Secondary data are public information that has been collected by others. Data collected through primary research and then used by others are referred to as secondary data. Secondary data may be obtained from various sources such as industry surveys, databases and information systems.
“The data which are used in an investigation, but which have been gathered originally by someone else for some other purpose are known as secondary data.” — Blair
Sources of secondary data
Government statistics – Government statistics are widely available and easily accessible online. They provide information regarding trade activity, pricing and economic trends, business information, patents, population statistics, health records, etc.
Books – Books are available on almost any topic you want to research. Books show how much information is available on a particular topic and help in preparing a literature review.
Journals – Journals provide up-to-date information on the very specific topic you want to research. Journals are one of the most important sources of information on collected data.
Magazines or newspapers – Newspapers and magazines provide daily information regarding politics, business, sports, fashion, etc., which can be used for conducting research.
Internet – The internet is becoming advanced, fast and accessible to the masses, and much information is available on it. Almost all journals and books are available on the internet. Some are free and others must be paid for.
Company website – A company's website provides a lot of information. It usually has a section called investor relations which contains annual reports, regulatory filings and investor presentations that can provide insights into both the individual company's performance and that of the industry at large.
Key takeaways – Data can be collected from primary and secondary sources.
Census and sampling method of survey
Both census and sampling provide information about a population. In a census, each and every unit of the population is studied, while in sampling a small number of units that represent the population are studied. The government uses both census and sample data for various purposes such as planning, development programmes, etc.
Census method
A well-organized procedure of gathering, recording and analysing information regarding all the members of the population is called a census. Under the census method each and every unit of the universe is included in the collection of data. A huge amount of finance, time and labour is required for gathering the information. This method is useful for finding the ratio of males to females, the ratio of literate to illiterate people, and the ratio of people living in urban areas to people living in rural areas.
Merits
- It helps government with future plans
- It gives complete information about population
- It gives more reliable and accurate information
- It covers a wide range of study
Demerits
- It is time consuming and expensive
- Sometimes information may be lost while investigating every individual
- It needs a large amount of manpower
Sampling method
A sample is a small segment of the population, selected for study, that represents the characteristics of the entire population. The selected sample should allow justifiable conclusions to be drawn about the whole population. The sampling method is used when the population is very large and it is difficult to consider all its members. Under this method the selection of an appropriate, representative sample is of utmost importance. Conclusions about the whole population are drawn on the basis of the data collected from the sample.
Types of sampling method
a. Probability sampling
- It is also called random sampling
- Random sampling is one of the simplest sampling techniques, in which each member of the population has an equal chance of being chosen
- It gives an unbiased representation of the population
Types of random sampling
1. Simple random sampling – It is one of the most basic and easiest forms of random sampling. Simple random sampling ensures that every member has an equal chance of being included in the sample.
2. Stratified random sampling – It is also known as proportional random sampling. In this technique the population is split into different groups (strata), and the overall sample is selected randomly from each of the groups. This technique guarantees that each group is represented in the sample.
3. Systematic random sampling – Systematic random sampling refers to selecting the sample at a fixed interval from a numbered population.
4. Cluster random sampling – Under cluster sampling, the researcher divides the population into separate groups known as clusters. Each cluster is taken to represent the population as a whole. The researcher randomly selects clusters for analysis.
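The four random sampling techniques listed above can be sketched in a few lines of Python. This is only an illustrative sketch: the population of 100 numbered units, the "urban"/"rural" strata, the village clusters and all function names are assumptions made for the example, not part of the syllabus text.

```python
import random

def simple_random_sample(population, n, rng):
    """Every member has the same chance of being selected."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Pick a random start, then take every k-th member of the numbered list."""
    k = len(population) // n      # sampling interval
    start = rng.randrange(k)      # random start inside the first interval
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, fraction, rng):
    """Draw the same fraction at random from every stratum (group)."""
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Choose whole clusters at random and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(42)               # fixed seed so the run is repeatable
population = list(range(1, 101))      # hypothetical units numbered 1..100
strata = {"urban": list(range(1, 61)), "rural": list(range(61, 101))}
clusters = {f"village_{i}": list(range(i * 10 + 1, i * 10 + 11)) for i in range(10)}

srs = simple_random_sample(population, 10, rng)
sys_s = systematic_sample(population, 10, rng)
strat = stratified_sample(strata, 0.1, rng)
clus = cluster_sample(clusters, 2, rng)
print(srs, sys_s, strat, clus, sep="\n")
```

Note that the stratified sample always contains 6 urban and 4 rural units (10% of each stratum), whereas the simple random sample may over- or under-represent either group by chance.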
b. Non-random sampling –
- It is also called non-probability sampling
- Non-random sampling is a sampling technique in which each member of the population does not have an equal chance of being chosen
- It may give a biased representation of the population
Types of non-random sampling
1. Convenience sampling – Under this technique, the subjects are selected because they are easily accessible to the researcher. This technique is the easiest, cheapest and least time-consuming.
2. Consecutive sampling – It is similar to convenience sampling. Under this technique all available subjects are included in the sample, which results in a better representation of the entire population.
3. Quota sampling – Under quota sampling, the sample is selected so that it contains the same proportions of individuals as the entire population with respect to the characteristics or traits chosen as the basis of the quota.
4. Judgmental sampling – Judgmental sampling is better known as purposive sampling. The researcher keeps a specific purpose in mind and selects the subjects for the sample accordingly. The researcher believes that some subjects are better suited to the research than others.
5. Snowball sampling – This technique is used when the population size is small. Under snowball sampling, the researcher asks an initial subject to identify another potential subject who meets the criteria of the research. Thus, the resulting sample hardly represents the population.
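As an illustration of one of the non-random techniques above, quota sampling can be sketched as follows. The respondent list, the group labels and the function name quota_sample are hypothetical; the sketch simply accepts respondents in the order they become available until each group's quota is filled, so selection depends on availability rather than on chance.

```python
def quota_sample(respondents, quotas):
    """Accept respondents in arrival order until each group's quota is full."""
    counts = {group: 0 for group in quotas}
    sample = []
    for name, group in respondents:
        if group in counts and counts[group] < quotas[group]:
            sample.append(name)
            counts[group] += 1
    return sample

# Hypothetical respondents in the order the researcher meets them
respondents = [("A", "male"), ("B", "male"), ("C", "female"),
               ("D", "male"), ("E", "female"), ("F", "female")]
quotas = {"male": 2, "female": 2}   # same 1:1 proportion as the assumed population

sample = quota_sample(respondents, quotas)
print(sample)   # ['A', 'B', 'C', 'E']
```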
Key takeaways – A census considers each and every unit in the population; a sample takes a small segment from the entire population.
Direct and Indirect, Personal Investigation on the basis of existing documents
Direct personal investigation
Under this method the investigator visits the persons who are the source of the data and collects the necessary information either by interviewing the persons concerned or by observing the data on the spot. This method is suitable where an intensive study of a phenomenon is required.
Merits
- The data are obtained directly, so they are more reliable
- Sensitive questions can be avoided
- There is a greater chance of getting a response
- The questions can be adjusted to the level of the sources
- There is uniformity in the data
Demerits
- It is not suitable where the field is very vast and wide.
- It is very much expensive.
- It needs a large number of enumerators.
- It takes long time to collect the data from all the persons.
Suitability
- Investigation area is limited
- High degree of accuracy required
- To keep the result of investigation secret
- Area of investigation is homogeneous i.e. having same qualities.
Indirect oral investigation
Under this method the investigator collects the data indirectly by interviewing persons who are known to be close to the original persons or the incident. This method is adopted when the original persons cannot be found or are reluctant to provide the required information. A list of questions is prepared, and the witnesses are invited and asked to answer the questions. The investigator records the answers.
Merits
- It can cover a wide area.
- It needs less time, energy and money.
- A third party does not conceal the facts.
- The intelligence, skill and tact of the investigator bring accuracy.
Demerits
- The information is less reliable, as the data are obtained from other persons
- The third parties may be biased
- The chosen witness may not be suitable for the purpose or may not be an expert in the field.
Suitability
- A direct approach to the sources is not possible.
- The persons concerned cannot be relied upon
- The area of investigation is large.
- The information needed is kept secret by the persons concerned.
Key takeaways – Direct investigation collects data by visiting the sources personally; indirect investigation collects data by questioning witnesses.
Preparation of Questionnaires and Schedules
Questionnaires
A questionnaire is a list of research or survey questions asked of respondents and designed to extract specific information. It is a technique of data collection which consists of a series of written questions along with alternative answers.
Purpose of questionnaire
- Collect the appropriate data
- Make data comparable and amenable to analysis
- Minimize bias in formulating and asking questions
- Make questions engaging and varied
Schedules
Schedule is a formalized set of questions, statements and spaces for answers, provided to the enumerators who ask questions to the respondents and note down the answers.
Steps in the development of questionnaires
1. Decide on the information required – The first step is to decide what one needs to know from the respondent in order to meet the survey's objectives.
2. Define the target respondents – The next step is to define the population about which the researcher wishes to generalize from the sample data to be collected. To determine the target audience, the researcher must take into account factors such as age, education, etc.
3. Choose the method of reaching target respondents – The researcher must choose the method of reaching the target audience. The main methods available in survey research are:
- Personal interviews
- Group or focus interviews
- Mailed questionnaires
- Telephone interviews
4. Decide on question content – Researchers must always be prepared to ask, "Is this question really needed?". No question should be included unless the data it gives rise to is directly of use in testing one or more of the hypotheses established during the research design.
5. Create questions with straightforward, unbiased language – The questions should be straightforward and should not create any confusion for your customers, because confusion may wrongly influence their answers.
6. Ensure every question is important – While designing the questionnaire, make sure each question has a specific purpose. Each one should be aimed at collecting certain pieces of information that reveal new insights into different aspects of your business.
7. Order your questions logically – A good questionnaire is like a good book. The beginning questions should lay the framework, the middle ones should cut to the core issues, and the final questions should tie up all of the loose ends.
8. Test your questionnaire – It is important to test the questionnaire once it is completed. Start by giving the questionnaire to your employees to test, then send it to small groups of customers and analyse the results.
Key takeaways –
- A questionnaire is a list of research or survey questions asked of respondents; a schedule is a set of questions provided to enumerators, who ask the questions and note down the answers
2.2 Sample Survey
Population
The population (or universe) is the complete set of units about which information is required. When each and every unit of the population is included in the collection of data, the procedure is called a census; it requires a huge amount of finance, time and labour.
Sampling units
A sampling unit can be any single person, animal, plant, product or ‘thing’ being researched. Sampling units are taken from an entire population, such as a country, customer database or region, and put into a smaller group to form a research sample. This group of units is then researched and analysed, and conclusions are drawn from it.
Example – When conducting research using a sample of university students, a single university student is a sampling unit.
Sampling variances
The variance is mathematically defined as the average of the squared differences from the mean. To understand what is being calculated, break it down into steps:
Step 1: Calculate the mean of the observations (for example, the average weight).
Step 2: From each observation, subtract the mean and square the result.
Step 3: Work out the average of those squared differences.
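The three steps can be followed directly in Python. The weights below are hypothetical values chosen for illustration. The sketch divides by the number of observations, as in the definition above; when the variance of a population is being estimated from a sample, the sum of squared differences is commonly divided by n - 1 instead.

```python
import statistics

weights = [50, 55, 60, 65, 70]   # hypothetical weights in kg

# Step 1: calculate the mean
mean = sum(weights) / len(weights)

# Step 2: subtract the mean from each observation and square the result
squared_diffs = [(w - mean) ** 2 for w in weights]

# Step 3: work out the average of those squared differences
variance = sum(squared_diffs) / len(squared_diffs)

print(mean, variance)                    # 60.0 50.0
print(statistics.pvariance(weights))     # same value from the standard library
```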
Sample survey
A sample is a small segment of the population, selected for study, that represents the characteristics of the entire population. The selected sample should allow justifiable conclusions to be drawn about the whole population. The sampling method is used when the population is very large and it is difficult to consider all its members. Under this method the selection of an appropriate, representative sample is of utmost importance. Conclusions about the whole population are drawn on the basis of the data collected from the sample.
Types of sampling method
a. Probability sampling
- It is also called random sampling
- Random sampling is one of the simplest sampling techniques, in which each member of the population has an equal chance of being chosen
- It gives an unbiased representation of the population
Types of random sampling
1. Simple random sampling – It is one of the most basic and easiest forms of random sampling. Simple random sampling ensures that every member has an equal chance of being included in the sample.
2. Stratified random sampling – It is also known as proportional random sampling. In this technique the population is split into different groups (strata), and the overall sample is selected randomly from each of the groups. This technique guarantees that each group is represented in the sample.
3. Systematic random sampling – Systematic random sampling refers to selecting the sample at a fixed interval from a numbered population.
4. Cluster random sampling – Under cluster sampling, the researcher divides the population into separate groups known as clusters. Each cluster is taken to represent the population as a whole. The researcher randomly selects clusters for analysis.
Two-stage sampling – In cluster sampling, all the elements in the selected clusters are surveyed. Moreover, the efficiency of cluster sampling depends on the size of the cluster: as the size increases, the efficiency decreases. This suggests that higher precision can be attained by distributing a given number of elements over a large number of clusters rather than by taking a small number of clusters and enumerating all elements within them. This is achieved in sub-sampling.
In sub-sampling:
- Divide the population into clusters.
- Select a sample of clusters [first stage]
- From each of the selected clusters, select a sample of the specified number of elements [second stage]
[Figure: pictorial scheme of two-stage sampling]
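The two stages of sub-sampling can be sketched in Python as below. The villages, the households and the function name two_stage_sample are hypothetical and serve only to illustrate the procedure: clusters are drawn at random in the first stage, and a fixed number of elements is drawn at random from each selected cluster in the second stage.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, rng):
    """First stage: sample clusters. Second stage: sample elements within each."""
    chosen = rng.sample(sorted(clusters), n_clusters)            # first stage
    return {name: rng.sample(clusters[name], n_per_cluster)      # second stage
            for name in chosen}

rng = random.Random(7)   # fixed seed so the run is repeatable
# Hypothetical population: 8 villages (clusters) of 20 households each
clusters = {f"village_{i}": [f"v{i}_house_{j}" for j in range(20)]
            for i in range(8)}

sample = two_stage_sample(clusters, n_clusters=3, n_per_cluster=5, rng=rng)
for village, households in sample.items():
    print(village, households)
```

Here 15 households are spread over 3 villages, instead of enumerating all 20 households of a single village as plain cluster sampling would.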
b. Non-random sampling –
- It is also called non-probability sampling
- Non-random sampling is a sampling technique in which each member of the population does not have an equal chance of being chosen
- It may give a biased representation of the population
Types of non-random sampling
1. Convenience sampling – Under this technique, the subjects are selected because they are easily accessible to the researcher. This technique is the easiest, cheapest and least time-consuming.
2. Consecutive sampling – It is similar to convenience sampling. Under this technique all available subjects are included in the sample, which results in a better representation of the entire population.
3. Quota sampling – Under quota sampling, the sample is selected so that it contains the same proportions of individuals as the entire population with respect to the characteristics or traits chosen as the basis of the quota.
4. Judgmental sampling – Judgmental sampling is better known as purposive sampling. The researcher keeps a specific purpose in mind and selects the subjects for the sample accordingly. The researcher believes that some subjects are better suited to the research than others.
Advantages of Purposive Sampling (Judgment Sampling)
- Purposive sampling is one of the most cost-effective and time-effective sampling methods available
- If only a limited number of primary data sources can contribute to the study, purposive sampling may be the only appropriate method available
Disadvantages of Purposive Sampling (Judgment Sampling)
- Vulnerability to errors in judgment by researcher
- Low level of reliability and high levels of bias.
- Inability to generalize research findings
5. Snowball sampling – This technique is used when the population size is small. Under snowball sampling, the researcher asks an initial subject to identify another potential subject who meets the criteria of the research. Thus, the resulting sample hardly represents the population.
Key takeaways –
- A sample survey can use probability sampling or non-probability sampling
2.3 Graphic Representation of Data
A graph is a visual form of presentation of statistical data. A graph is more attractive than a table of figures. It helps the common man understand the data more efficiently and effectively, and it makes comparison between two or more phenomena easy.
Histogram – A histogram is a bar graph representing the frequency of occurrence by classes of data. In a histogram the data are plotted as a series of adjacent rectangles. The X axis shows the class intervals and the Y axis shows the frequencies. It is also called a staircase or block diagram. A histogram is not suitable for open-ended classes.
Frequency polygon – A frequency polygon is a graph in which the midpoints of the class intervals, plotted at heights equal to the class frequencies, are joined by straight lines. It is usually drawn either from a histogram or by calculating the midpoint of each interval from the frequency distribution table.
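The numbers behind a histogram and a frequency polygon come from the same frequency distribution table. The following sketch builds such a table in Python; the marks and the class width of 10 are hypothetical values chosen for the example. Each row gives the class interval (the base of a histogram rectangle), the midpoint (the X coordinate of a frequency polygon point) and the frequency (the height of both).

```python
marks = [12, 25, 37, 8, 41, 33, 29, 18, 22, 45, 36, 27, 14, 31, 39]  # hypothetical
width = 10   # assumed class width

table = []
for low in range(0, 50, width):
    high = low + width
    # count observations falling in the class [low, high)
    freq = sum(1 for m in marks if low <= m < high)
    midpoint = (low + high) / 2
    table.append((low, high, midpoint, freq))

for low, high, midpoint, freq in table:
    print(f"{low:2d}-{high:2d}  midpoint {midpoint:4.1f}  frequency {freq}  " + "#" * freq)
```

The printed "#" bars give a rough text version of the histogram; joining the (midpoint, frequency) pairs with straight lines would give the frequency polygon.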
Frequency curve – A frequency curve is a smooth curve obtained by joining the midpoints of the tops of all the rectangles forming the histogram. It is drawn freehand. The curve should begin and end at the base line.
Ogive – An ogive is a graph of cumulative frequency. It is used to estimate the number of observations less than a given value or more than a given value. Cumulative frequencies are obtained by adding the class frequencies successively.
Less than ogive method – The frequencies of all preceding classes are added to the frequency of a class, and the totals are plotted against the upper class limits.
More than ogive method – The frequencies of all succeeding classes are added to the frequency of a class, and the totals are plotted against the lower class limits.
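The two kinds of cumulative frequency can be computed with a short Python sketch. The class limits and frequencies are hypothetical. Each "less than" value counts the observations below the upper limit of a class, and each "more than" value counts the observations at or above the lower limit of a class.

```python
from itertools import accumulate

lower_limits = [0, 10, 20, 30, 40]    # hypothetical class limits
upper_limits = [10, 20, 30, 40, 50]
frequencies = [1, 3, 4, 5, 2]         # hypothetical class frequencies

# Less than ogive: add the frequencies of all preceding classes
less_than = list(accumulate(frequencies))

# More than ogive: add the frequencies of all succeeding classes
more_than = list(accumulate(reversed(frequencies)))[::-1]

for up, cf in zip(upper_limits, less_than):
    print(f"less than {up}: {cf}")
for low, cf in zip(lower_limits, more_than):
    print(f"{low} or more: {cf}")
```

For these figures the less than values are 1, 4, 8, 13, 15 and the more than values are 15, 14, 11, 7, 2.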
Lorenz curve – It is a graphical representation of the distribution of income or wealth. It was developed by Max O. Lorenz in 1905. The Lorenz curve shows how unequally wealth, revenue, land, etc. are distributed among the people.
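The points of a Lorenz curve can be computed as in the sketch below, which assumes five hypothetical incomes. The incomes are sorted from lowest to highest, and each point pairs the cumulative share of people with the cumulative share of total income they receive. Under perfect equality every point would lie on the diagonal where the two shares are equal; the further the curve sags below that diagonal, the more unequal the distribution.

```python
incomes = [100, 20, 40, 10, 30]   # hypothetical incomes of five people

incomes_sorted = sorted(incomes)  # poorest first
total = sum(incomes_sorted)
n = len(incomes_sorted)

points = [(0.0, 0.0)]             # the curve starts at the origin
running = 0
for i, income in enumerate(incomes_sorted, start=1):
    running += income
    points.append((i / n, running / total))

for people_share, income_share in points:
    print(f"poorest {people_share:.0%} of people hold {income_share:.0%} of income")
```

In this example the poorest 80% of people hold only 50% of the income, so the curve lies well below the diagonal.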
Key takeaways – A graph is a visual form of presentation of statistical data. Common graphs include the histogram, frequency polygon, frequency curve, ogive and Lorenz curve.
Sources
I. B. N. Gupta : Business Math & Statistics
II. S. P. Singh : Statistics
III. Mukund Lal : Statistics
IV. K. N. Nayar : Statistics
V. C. B. Gupta : Statistics
VI. Shukla & Sahay : Statistical Analysis
VII. C. D. Gupta : Statistical Analysis
VIII. D. N. Elhana : Statistical Analysis