UNIT 1
Introduction to statistics
The word statistics is derived from the Latin word “status”, meaning a political state; it came to refer to a group of numbers that represent information of human interest. In ancient times, statistics was used to meet the administrative needs of the state. In modern times, statistics is used not only for the administration of the state but also to evaluate all those activities in our lives that can be expressed in quantitative terms.
The term “statistics” is defined in two senses: singular and plural.
In the plural sense, statistics means a systematic collection of numerical facts. In the singular sense, the term means the various methods used for the collection, analysis and interpretation of numerical facts, that is, statistical method. In our study we are more concerned with this second meaning of statistics.
Definition:
“Statistics is a body of methods for making wise decisions in the face of uncertainty.” – Wallis and Roberts
“Statistics is a body of methods for obtaining and analyzing numerical data in order to make better decisions in an uncertain world.” —Edward N. Dubois
“Statistics are numerical statements of facts in any department of enquiry placed in relation to each other.” – Bowley
“The science of statistics is essentially a branch of applied mathematics and may be regarded as mathematics applied to observational data.” – R.A. Fisher
After analyzing the various definitions of statistics, the most suitable definition is as follows:
“Statistics in the plural sense are numerical statements of facts capable of some meaningful analysis and interpretation, and in singular sense, it relates to the collection, classification, presentation and interpretation of numerical data.”
Scope of statistics
Functions of statistics
2. Presentation of facts – Statistics helps in presenting complex data in a simple form so that it becomes easy to understand. Statistical methods present data in the form of graphs, diagrams, averages, coefficients, etc.
3. Comparison – After the data have been simplified, they can be correlated and compared. Comparison is one of the functions of statistics, because absolute figures on their own convey less meaning.
4. It helps other sciences – Many economic laws, such as the law of demand and the law of supply, have been verified with the help of statistics.
5. Forecasting – Statistics also helps to predict the future course of events. On the basis of statistical estimates we can make future policies.
6. Policy making – Statistics helps in formulating favorable policies. The government makes policies based on forecasts.
Uses and importance of statistics
2. Importance for businessmen – Statistics helps in providing relevant data. With the help of those data a businessman can estimate the demand for and supply of a commodity.
3. Importance in economics – Statistics helps in measuring economic aggregates such as gross national output, consumption, saving, investment, expenditure, etc.
4. Importance for politicians – Politicians use statistics in formulating the economic, social and educational policies of the country.
5. Importance in the field of education – Statistics has wide application in education, for example in determining the reliability and validity of a test, factor analysis, etc.
Limitations of statistics
2. Study of aggregates only – Statistics studies only aggregates of quantitative facts. It does not study any particular unit. Prof. Horace Secrist defined statistics thus: “By statistics we mean aggregates of facts…. and placed in relation to each other”
3. It does not depict the entire story of a phenomenon – Any phenomenon happens due to many causes, but not all of those causes can be expressed in numbers. A conclusion drawn by analyzing quantitative data while ignoring qualitative data therefore cannot be fully correct.
4. Homogeneity of data – To compare data, it is essential that whatever statistics are collected are uniform in quality.
5. It is liable to be misused – As W.I. King points out, “One of the short-comings of statistics is that they do not bear on their face the label of their quality.” Data collected by an inexperienced or dishonest person may be inaccurate or biased. So, to reach correct conclusions, data must be used with caution.
6. Too many methods to study a problem – Many statistical methods can be used to find a single result, and the results may vary from method to method. “It must not be assumed that the statistical method is the only method to use in research, neither should this method be considered the best attack for the problem.” – Croxton and Cowden
Managerial application:
The field of statistics has numerous applications in business. Because of technological advancements, businesses now generate large amounts of data, and these data are being used to make decisions. Better decisions help us improve the running of a department, a company, or the entire economy.
Marketing
Marketing is all about creating and growing customers profitably. Statistics is used in almost every aspect of creating and growing customers profitably. Statistics is extensively used in making decisions regarding how to sell products to customers. Also, intelligent use of statistics helps managers to design marketing campaigns targeted at the potential customers. Marketing research is the systematic and objective gathering, recording and analysis of data about aspects related to marketing.
Finance:
Uncertainty is the hallmark of the financial world. All financial decisions are based on “Expectation” that is best analysed with the help of the theory of probability and statistical techniques. Probability and statistics are used extensively in designing of new insurance policies and in fixing of premiums for insurance policies.
Economics:
Statistical data and methods render valuable assistance in the proper understanding of economic problems and the formulation of economic policies. Most economic phenomena and indicators can be quantified and dealt with using statistically sound logic.
Operations:
The field of operations is about transforming various resources into products and services in the place, quantity, cost, quality and time required by customers. Statistics plays a very useful role at the input stage through sampling inspection and inventory management, at the process stage through statistical quality control and the six sigma method, and at the output stage through sampling inspection.
Human Resource Management or Development:
Human Resource departments are, inter alia, entrusted with the responsibility of evaluating performance, developing rating systems, and evolving compensation, reward and training systems. All these functions involve designing forms and collecting, storing, retrieving and analysing a mass of data, and they can be performed efficiently and effectively with the help of statistics.
Information Systems:
Information Technology (IT) and statistics both take a similar systematic approach to problem solving. IT uses statistics in various areas, such as optimising server time and assessing the performance of a program by measuring the time taken and the resources used. Statistics is also used in software testing.
Statistical investigation – planning and organization:
By the term investigation (or enquiry) we mean the search for information or knowledge. Statistical investigation, thus, implies search for knowledge with the help of statistical devices like collection, analysis and interpretation, etc. According to Griffin, “statistical enquiries have always required considerable skill on the part of the statistician, rooted in a broad knowledge of the subject matter area and combined with considerable ingenuity in overcoming practical difficulties”.
For example, if an investigation is made into accounts of a college hostel, then the investigation will mainly cover:
1. Income from residents as seat rent, meal charge, any grant from college.
2. Expenditure as hostel rent, overhead charges.
3. Expenditure on dry rations, broad meals, including special meals.
4. Expenditure for annual functions.
ORGANISATION OF STATISTICAL INVESTIGATION
Statistical investigation is a long and comprehensive process. It extends over various stages from planning to the final preparation of the report. The various stages are -
1. Planning of statistical investigation.
2. Collection of data.
3. Editing of data.
4. Presentation of data.
5. Analysis of data
6. Interpretation of data.
7. Preparation of the report.
PLANNING OF STATISTICAL INVESTIGATION
A proper system is essential for conducting a statistical investigation. Planning must precede the execution. Careful planning is essential to get the best results at the minimum cost and time. It is essential to consider the following points while planning a statistical investigation.
1. Objective of the enquiry should be fully known.
2. Scope of the enquiry should be determined.
3. Nature of information to be collected should be decided.
4. Unit of data collection should be defined.
5. Source of data collection or type of data to be used, that is, primary or secondary should be decided.
6. Method of data collection, that is, census or sampling method, should be decided beforehand.
7. Choice of frame should be made.
8. Reasonable standard of accuracy should be fixed.
Statistical units
A statistical unit is a unit of observation or measurement for which data are collected or derived. The statistical unit is therefore the basic element for compiling and tabulating statistical data.
Methods of investigation - Census and Sampling
Both census and sampling provide information about a population. In a census, each and every unit of the population is studied, while in sampling a small number of units that represent the population are studied. Governments use both census and sample data for purposes such as planning and development programs.
Census method
A well-organized procedure of gathering, recording and analyzing information regarding the members of the population is called a census. Under the census method, each and every unit of the universe is included in the collection of data. A huge amount of finance, time and labour is required for gathering the information. This method is useful for finding, for example, the ratio of males to females, of literate to illiterate people, or of people living in urban areas to those living in rural areas.
Merits
a) It helps government with future plans.
b) It gives complete information about population.
c) It gives more reliable and accurate information.
d) It covers a wide range of study.
Demerits
a) It is time consuming and expensive.
b) Sometimes we may lose information while investigating every individual.
c) It needs a large amount of manpower.
Sampling method
A sample is a small segment of the population, selected for study, that represents the characteristics of the entire population. The selected sample should support justifiable conclusions about the whole population. The sampling method is used when the population is very large and it is difficult to examine all its members. Under this method, selecting an appropriate representative sample is of the utmost importance. On the basis of the data collected from the sample, conclusions are drawn for the whole population.
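The idea of drawing a conclusion about the whole population from a sample can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of ages, its size, the sample size of 200 and the seed are all hypothetical, and the standard-library `random` and `statistics` modules stand in for a real survey.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: ages of 10,000 people (invented for illustration).
population = [rng.randint(18, 60) for _ in range(10_000)]

# Census: every unit of the population is measured.
census_mean = mean(population)

# Sampling: only 200 randomly chosen units are measured.
sample = rng.sample(population, 200)
sample_mean = mean(sample)

print(f"census mean: {census_mean:.2f}")
print(f"sample mean: {sample_mean:.2f}")
```

The two means will normally be close but not identical; the gap is the price paid for studying 200 units instead of 10,000.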
Types of sampling method
a. Probability sampling –
- It is also called random sampling.
- Random sampling is one of the simplest sampling techniques, in which each unit of the population has an equal chance of being chosen.
- It gives an unbiased representation of the population.
Types of random sampling
2. Stratified random sampling – It is also known as proportional random sampling. In this technique, the population is split into different groups (strata), and the overall sample is selected randomly from each group. This technique guarantees that each group is represented in the sample.
3. Systematic random sampling – Systematic random sampling refers to selecting sample units at a fixed interval from a numbered population.
4. Cluster random sampling – Under cluster sampling, the researcher divides the population into separate groups known as clusters. Here each cluster represents the population as a whole. The researcher randomly selects clusters for analysis.
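The three techniques described above can be sketched with Python's standard `random` module. This is a minimal illustration, not a survey-grade implementation: the numbered population of 100 units, the "urban"/"rural" strata, the `ward_*` clusters and the function names are all hypothetical.

```python
import random

def stratified_sample(strata, fraction, rng):
    """Draw the same fraction at random from every stratum."""
    sample = []
    for units in strata.values():
        k = max(1, round(len(units) * fraction))
        sample.extend(rng.sample(units, k))
    return sample

def systematic_sample(population, interval, rng):
    """Pick a random start in the first interval, then every interval-th unit."""
    start = rng.randrange(interval)
    return population[start::interval]

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # hypothetical numbered population
strata = {"urban": population[:60], "rural": population[60:]}
clusters = {f"ward_{i}": population[i * 10:(i + 1) * 10] for i in range(10)}

print(stratified_sample(strata, 0.1, rng))    # 6 urban units and 4 rural units
print(systematic_sample(population, 10, rng)) # units spaced 10 apart
print(cluster_sample(clusters, 2, rng))       # all 20 units of two wards
```

Note the difference in what is randomized: stratified sampling randomizes within every group, systematic sampling randomizes only the starting point, and cluster sampling randomizes which groups are taken whole.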
b. Non random sampling –
Types of non random sampling
2. Consecutive sampling – It is similar to convenience sampling. Under this technique, all available subjects are included in the sample, which results in a better representation of the entire population.
3. Quota sampling – Under quota sampling, subjects are selected in the same proportions as they occur in the entire population, with characteristics or traits serving as the basis of the quota.
4. Judgmental sampling – Judgmental sampling is better known as purposive sampling. The researcher keeps a specific purpose in mind when selecting subjects, believing that some subjects are better suited to the research than others.
5. Snowball sampling – This technique is used when the population size is small. Under snowball sampling, the researcher asks an initial subject to identify another potential subject who meets the criteria of the research. As a result, the sample is rarely representative of the population.
Key takeaways –
Definition
Data collection is defined as the procedure of collecting, measuring and analyzing accurate insights for research using standard validated techniques.
Irrespective of the field of research, data collection is the primary and most important step for research. Depending on the required information, the approach of data collection is different for different fields of study.
The objective of data collection is ensuring that rich information and reliable data is collected for statistical analysis so that data-driven decisions can be made for research.
Data collection method
Data collection methods can be divided into two categories: secondary methods of data collection and primary methods of data collection.
Secondary data - Secondary data are data that have already been published in books, newspapers, magazines, journals, online portals, etc. A lot of information is available in these sources. Using appropriate secondary data therefore plays an important role in increasing the validity and reliability of the research.
Primary data –
Primary data collection methods can be divided into two groups: quantitative and qualitative.
Quantitative data collection methods are based on mathematical calculations in various formats. Methods of quantitative data collection and analysis include questionnaires with closed-ended questions, methods of correlation and regression, mean, mode and median, and others. Quantitative methods are less expensive and can be applied within a shorter duration of time. They also make it easy to compare findings.
Qualitative research methods, on the other hand, do not involve numbers or mathematical calculations. Qualitative research is closely associated with words, sounds, feeling, emotions, colours and other elements that are non-quantifiable.
Primary and secondary data are discussed in more detail below.
Primary data is information collected through original or first-hand research. Primary data is more reliable and authentic because it has not been changed or altered by anyone, and it has not yet been published. Primary data may be gathered by an authorized organization, investigator or enumerator.
“Data which are gathered originally for a certain purpose are known as primary data.” — Horace Secrist
Sources of primary data
The sources of primary data are as follows –
1. Experiments: In the natural sciences, experiments are the most reliable source of data. Experiments are conducted in medicine, psychology, nutrition and other scientific studies, both in the field and in laboratories. The results of experiments are analysed by statistical tests, and conclusions are then drawn.
2. Survey: Surveys are used in social science, management, marketing and, to some extent, psychology. Surveys can be conducted by different methods.
3. Questionnaire: A questionnaire consists of a list of questions, either open-ended or closed-ended, which the participants answer. Questionnaires can be administered via telephone, mail, institute, fax, etc.
4. Interview: Interviews are an expensive method of data collection. The interviewer collects information from each respondent independently. An interview involves in-depth questioning and follow-up questions. During the interview, the interviewer can observe body language and other reactions to the questions.
5. Observation: Observation can be conducted with or without the knowledge of the participants, in either a natural or an artificially created environment.
Secondary Data:
Secondary data are public information that has been collected by others. Data collected through primary research and then used by others are referred to as secondary data. Secondary data may be obtained from various sources such as industry surveys, databases and information systems.
“The data which are used in an investigation, but which have been gathered originally by someone else for some other purpose are known as secondary data.” — Blair
Sources of secondary data
ii. Books – Books are available on almost any topic you want to research. They show how much information exists on a particular topic and help you prepare your literature review.
iii. Journals – Journals provide up-to-date information on the specific topic you want to research. They are one of the most important sources of information on collected data.
iv. Magazines or newspapers – Newspapers and magazines provide daily information on politics, business, sports, fashion, etc., which can be used in research.
v. Internet – The internet is becoming more advanced, faster and more widely accessible, and a great deal of information is available on it. Almost all journals and books are available online; some are free and others must be paid for.
vi. Company website – A company's website provides a lot of information. It usually has a section called investor relations containing annual reports, regulatory filings and investor presentations, which can provide insight into both the individual company's performance and that of the industry at large.
Key takeaways – Data can be collected from primary and secondary sources.
Editing of data – classification of data
The collected data, also known as raw data or ungrouped data, are always in an unorganised form and need to be organised and presented in a meaningful and readily comprehensible form to facilitate further statistical analysis. It is, therefore, essential for an investigator to condense a mass of data into a more comprehensible and assimilable form. The process of grouping data into different classes or sub-classes according to some characteristic is known as classification. Tabulation is concerned with the systematic arrangement and presentation of classified data. Thus classification is the first step in tabulation.
For example, letters in the post office are classified according to their destinations, viz. Delhi, Madurai, Bangalore, Mumbai, etc.
Types of classification:
Statistical data are classified in respect of their characteristics. Broadly there are four basic types of classification namely
a) Chronological classification:
In chronological classification the collected data are arranged according to the order of time, expressed in years, months, weeks, etc.
The data are generally classified in ascending order of time.
b) Geographical classification:
In this type of classification the data are classified according to geographical region or place, for instance the production of paddy in different states in Iraq, or the production of wheat in different countries.
c) Qualitative classification:
In this type of classification data are classified on the basis of some attribute or quality such as sex, literacy, religion or employment. Such attributes cannot be measured on a scale. For example, if the population is to be classified with respect to one attribute, say sex, then we can classify it into two classes, males and females. Similarly, it can be classified into ‘married’ or ‘single’ on the basis of another attribute, ‘marital status’. Thus, when the classification is done with respect to one attribute that is dichotomous in nature, two classes are formed, one possessing the attribute and the other not possessing it. This type of classification is called simple or dichotomous classification.
d) Quantitative classification:
Quantitative classification refers to the classification of data according to some characteristic that can be measured, such as height or weight. For example, a group of children may be classified according to weight.
In this type of classification there are two elements, namely (i) the variable, i.e. the weight in the above example, and (ii) the frequency, i.e. the number of children in each class. For instance, there may be 50 children with weights ranging from 5 to 10 kg, 200 children with weights ranging from 10 to 15 kg, and so on.
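As a rough illustration of quantitative classification, the sketch below groups a handful of weights into 5 kg classes and counts the frequency of each class. The weights, the function name `frequency_table` and the convention that a class includes its lower limit but excludes its upper limit are assumptions made for this example; the notes do not fix a convention.

```python
def frequency_table(values, lower, width, n_classes):
    """Count how many values fall in each class [low, low + width)."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - lower) // width)
        if 0 <= index < n_classes:  # values outside all classes are ignored
            counts[index] += 1
    return [
        ((lower + i * width, lower + (i + 1) * width), counts[i])
        for i in range(n_classes)
    ]

# Hypothetical weights of ten children, in kg.
weights_kg = [5.5, 7.0, 9.9, 10.0, 12.3, 14.8, 15.0, 18.2, 6.1, 11.4]

for (low, high), freq in frequency_table(weights_kg, lower=5, width=5, n_classes=3):
    print(f"{low}-{high} kg: {freq} children")
```

Here the variable is the weight and the frequency is the count printed for each class; a weight of exactly 10.0 kg is counted in the 10-15 class under the assumed convention.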
Tabulation is a systematic & logical presentation of numeric data in rows and columns, to facilitate comparison and statistical analysis. The method of placing organized data in tabular form is known as tabulation. Tabulation simplifies complex data and facilitates comparison.
Definition
“Table involves the orderly and systematic presentation of numerical data in a form designed to elucidate the problem under consideration” – Prof. L.R. Connor
“Table in its broadest sense is an orderly arrangement of data in columns and rows” – Prof. M.M. Blair
Objectives of tabulation
a) It simplifies raw data into a meaningful form so that the common man can easily understand it in less time.
b) It brings out essential facts in a clear and precise manner.
c) Data presented in rows and columns helps in detailed comparison.
d) Tables serve as the best source of organized data for further statistical analysis.
e) Table saves the space without sacrificing the quality and quantity of data.
Parts of table
1. Table number
2. Title of the table
3. Caption
4. Stub
5. Body
6. Head note
7. Source note
8. Footnote
Frequency distribution and statistical series
A frequency distribution is a tabular arrangement of data whereby the data is grouped into different intervals, and then the number of observations that belong to each interval is determined. Data that is presented in this manner are known as grouped data.
The smallest value that can belong to a given interval is called the lower class limit, while the largest value that can belong to the interval is called the upper class limit. The difference between the upper class limit and the lower class limit is defined to be the class width. When designing the intervals to be used in a frequency distribution, it is preferable that the class widths of all intervals be the same.
The relative frequency distribution and percentage frequency distribution are variants of the frequency distribution. The relative frequency distribution is similar to the frequency distribution, except that instead of the number of observations belonging to a particular interval, the ratio of the number of observations in the interval to the total number of observations, also known as the relative frequency, is determined. The percentage frequency distribution is arrived at by multiplying the relative frequencies of each interval by 100%.
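The calculation just described can be sketched as follows. The class intervals and frequencies are hypothetical, and `relative_and_percentage` is a name invented for this example.

```python
def relative_and_percentage(frequencies):
    """Return (interval, frequency, relative frequency, percentage) rows."""
    total = sum(frequencies.values())
    rows = []
    for interval, freq in frequencies.items():
        relative = freq / total          # share of all observations
        percentage = 100 * freq / total  # the same share, expressed in percent
        rows.append((interval, freq, relative, percentage))
    return rows

# Hypothetical frequency distribution: class interval -> number of observations.
frequencies = {"5-10": 4, "10-15": 10, "15-20": 6}

for interval, freq, rel, pct in relative_and_percentage(frequencies):
    print(f"{interval}: f={freq}, relative={rel:.2f}, percentage={pct:.0f}%")
```

The relative frequencies always sum to 1 and the percentage frequencies to 100, which is a quick check on any hand-built table.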
The cumulative frequency distribution is obtained by computing the cumulative frequency, defined as the total frequency of all values less than the upper class limit of a particular interval, for all intervals. From a frequency distribution, this can be done by simply adding together the frequencies of the interval and all other preceding intervals (i.e., intervals whose values are less than the values of a particular interval). We can also calculate the relative cumulative frequency distribution and the percentage cumulative frequency distribution from the cumulative frequency distribution.
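A minimal sketch of this running total, using `itertools.accumulate` from the Python standard library. The upper class limits and frequencies are hypothetical values chosen for illustration.

```python
from itertools import accumulate

# Hypothetical frequency distribution, listed in ascending order of class.
upper_limits = [10, 15, 20]
frequencies = [4, 10, 6]

# Each cumulative frequency adds the current class to all preceding classes.
cumulative = list(accumulate(frequencies))
total = cumulative[-1]

for upper, cf in zip(upper_limits, cumulative):
    relative_cf = cf / total
    print(f"less than {upper}: cumulative={cf}, "
          f"relative={relative_cf:.2f}, percentage={100 * cf / total:.0f}%")
```

The last cumulative frequency always equals the total number of observations, so the final relative cumulative frequency is 1 and the final percentage is 100.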
Statistical series
• Individual Series: These are series in which the items are listed singly. They may be presented in two ways:
a) According to serial numbers
b) In ascending or descending order of data
• Frequency Series: Frequency series may be of two types:
a) Discrete Series or Frequency Array: A series in which data are presented in such a way that the exact measurement of each item is clearly shown. There are no class intervals; each particular value in the series is shown with its frequency.
b) Frequency Distribution: A series in which items cannot be exactly measured. The items assume a range of values and are placed within limits; each such range is called a class interval.
Frequency distribution is also known as continuous series or series with class-intervals, or series of grouped data.
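The difference between an individual series and a discrete series can be shown with a small sketch. The marks below are hypothetical, and `collections.Counter` is used here simply as a convenient way to count repeated values.

```python
from collections import Counter

# Hypothetical individual series: marks of eight students, listed singly.
marks = [3, 5, 3, 4, 5, 5, 2, 3]

# Individual series arranged in ascending order of data.
individual_series = sorted(marks)

# Discrete series (frequency array): each distinct value with its frequency.
discrete_series = sorted(Counter(marks).items())

print(individual_series)
for value, freq in discrete_series:
    print(f"marks = {value}: frequency = {freq}")
```

Grouping the same values into class intervals instead of exact values would turn the discrete series into a continuous series.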