Collection of Data in Statistics
The collection and analysis of data constitute the main stages of execution of any statistical investigation. The procedure for collection of data depends upon various considerations such as objective, scope, nature of investigation, etc.
Availability of resources like money, time, manpower, etc., also affects the choice of a procedure. Data may be collected either from a primary or from a secondary source. They are described below.
Table of Contents
- 1 Collection of Data in Statistics
- 1.1 Types of Data – Primary and Secondary
- 1.2 Methods of Collecting Primary Data
- 1.3 Merits and Demerits of Collecting Primary Data
- 1.4 Methods of Collecting Secondary Data
- 1.5 Designing Questionnaire
- 1.5.1 Covering Letter
- 1.5.2 Number of Questions Should Be Kept to the Minimum
- 1.5.3 Questions Should Be Simple, Short, and Unambiguous
- 1.5.4 Type of Questions
- 1.5.5 Questions of Sensitive or Personal Nature Should Be Avoided
- 1.5.6 Answers to Questions Should Not Require Calculations
- 1.5.7 Logical Arrangement
- 1.5.8 Crosscheck and Footnotes
- 1.5.9 Pre-test the Questionnaire
- 2 Editing and Coding of Data
Types of Data – Primary and Secondary
Data used in a statistical study are termed either ‘primary’ or ‘secondary’, depending on whether they were collected specifically for the study undertaken or for some other purpose.
When the data used in a statistical study are collected under the control and supervision of the investigator, they are referred to as ‘primary data’. Primary data are collected afresh and for the first time, and are thus original in character.
When, on the other hand, the data are not collected for the study at hand but are derived from other sources, they are referred to as ‘secondary data’. Generally speaking, secondary data are collected by some other organization to meet its own needs and are then used by someone else for entirely different reasons.
The difference between primary and secondary data is only one of degree: data that are primary in the hands of one party become secondary in the hands of another. Suppose an investigator wants to study the working conditions of labourers in an industry.
If the investigator or his/her agent collects the data directly, they are ‘primary data’. If someone else subsequently uses these data for some other purpose, they become ‘secondary data’.
Methods of Collecting Primary Data
Generally, for managerial decision-making, it is necessary to analyze information regarding a large number of characteristics.
Collection of primary data may thus be time-consuming and expensive, and hence requires a great deal of deliberation. Depending on the nature of the information required, one of the following methods, or a combination of them, may be selected.
Direct Personal Observation
In this method, the investigator collects the data through his/her personal observations. It is particularly useful when the data are generated within the system itself as transactions are captured; computerized transaction processing can be modified to produce the necessary data or information.
An investigator well versed in the system, or in the relevant part of it, is ideally suited to collecting this kind of data. Since the investigator alone collects the data, his/her training, skill, and knowledge play an important role in the quality of the data.
Sometimes, audio/video aids could also be used to record the observations.
Indirect Oral Interviews
In this case, data are collected from a person who is likely to have information about the problem under study. Information collected in this way, by oral or written interrogation, constitutes primary data. Enquiry commissions, boards of investigation, investigation teams, and committees usually collect data in this manner.
The quality of the data depends largely on the person interviewed (his/her motives, memory, and cooperation) and on the interviewer’s repute and rapport with that person. Care is therefore needed when collecting data by this method.
Questionnaire With Personal Interview
This is by far the most common and popular method. Individuals are interviewed in person and their answers are recorded. The questionnaire is structured and is followed in a specific sequence.
Occasionally, a part of the questionnaire may be left unstructured to encourage the interviewee to give additional information or information on intimate matters. The accuracy of the data depends on the ability, sincerity, and tact of the interviewer in conducting the interview in a friendly and professional environment.
Mailed Questionnaire
In this case, a structured questionnaire is mailed to selected persons with a request to fill it in and return it. Supplementary information clarifying terms, explaining the process, etc., is attached to the questions. In a few cases, inducements for filling in and returning the questionnaire are also offered.
A covering letter with the questionnaire is necessary for developing rapport, explaining the reason for collecting the data, and alleviating any fears the respondent may have. It is assumed that the respondents are literate and can answer the questions without any ambiguity.
This is a less expensive and faster way to collect a large volume of data, over a wide geographic area, in standard form, and at the convenience of the respondent. The method is therefore widely used.
However, we must guard against two disadvantages of this method: the absence of an interviewer, which results in a large proportion of non-response, and a possible loss of reliability if the respondent is not sufficiently motivated.
These shortcomings can be reduced by increasing the sample size and by designing the questionnaire comprehensively.
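As a rough illustration of how a larger sample offsets non-response, the sketch below estimates how many questionnaires to mail in order to get a target number of completed returns. The function name and the 25% response rate are assumptions chosen for the example, not figures from this article. A larger mailing restores the number of returns, but it does not by itself correct any bias that arises if non-respondents differ from respondents.

```python
import math

def mailings_needed(target_responses: int, expected_response_rate: float) -> int:
    """Questionnaires to mail so that, on average, the target number come back.

    expected_response_rate is an assumed fraction in (0, 1], for example taken
    from a pilot survey.
    """
    if not 0 < expected_response_rate <= 1:
        raise ValueError("expected_response_rate must be in (0, 1]")
    # Round up: a fraction of a questionnaire cannot be mailed.
    return math.ceil(target_responses / expected_response_rate)

# Hypothetical figures: 400 completed returns wanted, 25% expected to respond.
print(mailings_needed(400, 0.25))  # 1600
```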
Telephonic Interviews
This method is less expensive but limited in scope, as the respondent must possess a telephone and have it listed. Further, the respondent must be available and in a frame of mind to provide correct answers. This method is comparatively less reliable for public surveys.
However, for industrial surveys, in developed regions, and with known customers, this method may be the best suited. There is, of course, a limit to the number of questions that the interviewee can answer in three to four minutes.
If there are just three to five yes/no questions and two to three short questions, this method is very efficient.
Internet Surveys
Of late, Internet surveys have become popular. They are less expensive, fast, and can be interactive. However, their scope is limited to those who have regular Internet access.
With the rapid growth in personal computers and Internet connectivity, this is likely to become one of the main methods of collecting primary data. With its interactivity and multimedia facilities, it combines the advantages of the other methods.
Merits and Demerits of Collecting Primary Data
The type of research, its purpose, and the conditions under which the data are obtained determine the method of collecting the data. If relatively few items of information are required quickly and funds are limited, telephonic interviews are recommended.
If the respondents are industrial clients, the Internet could also be used. If depth interviews and probing techniques are to be used, it is necessary to employ investigators to collect the data. Thus, each method has its utility and none is superior in all situations.
Two methods can be combined to improve the quality of the data collected. For example, when a wide geographical area is being covered, mail questionnaires supplemented by personal interviews will yield more reliable results.
Merits and Demerits of Observation Method
Merits
- Original data are collected.
- Collected data are more accurate and reliable.
- The investigator can modify or put indirect questions in order to extract satisfactory information.
- The collected data are often homogeneous and comparable.
- Some additional information may also get collected, along with the regular information, which may prove to be helpful in future investigations.
- Misinterpretations or misgivings, if any, on the part of the respondents can be avoided by the investigators.
- Since the information is collected from the persons who are well aware of the situation, it is likely to be unbiased and reliable.
- This method is particularly suitable for the collection of confidential information. For example, a person may not like to reveal his habit of drinking, smoking, gambling, etc., which may be revealed by others.
Demerits
- This method is expensive and time consuming, particularly when the field of investigation is large.
- It is not possible to properly train a large team of investigators.
- The bias or prejudice of investigators can affect the accuracy of data to a large extent.
- Data are collected as per the convenience and willingness of the respondents.
- The persons, providing the information, may be prejudiced or biased.
- Since the interest of the person, providing the information is not at stake, the collected information is often vague and unreliable.
- The information collected from different persons may not be homogeneous and comparable.
Merits and Demerits of the Questionnaire Method
Merits
- This method is useful for the collection of information from an extensive area of investigation.
- This method is economical as it requires less time, money, and labor.
- The collected information is original and reliable.
- It is free from the bias of the investigator.
Demerits
- Very often, there is the problem of ‘non-response’ as the respondents are not willing to provide answers to certain questions.
- The respondents may provide wrong information if the questions are not properly understood.
- It is not possible to collect information if the respondents are not educated.
- Since it is not possible to ask supplementary questions, the method is not flexible.
- The results of an investigation are likely to be misleading if the attitude of the respondents is biased.
- The process is time-consuming, particularly when the information is to be obtained by post.
Methods of Collecting Secondary Data
Secondary data are data that have been collected or analyzed by some other agency for another purpose.
Sources of secondary data could be:
- Various publications of central, state, and local governments. This is an important and reliable source to get unbiased data.
- Various publications of foreign governments or of international bodies. Although it is a good source, the context under which it is collected needs to be verified before using this data. For international situations, this data could be very useful and authentic.
- Journals of trade, commerce, economics, science, engineering, medicine, etc. This data could be very reliable for a specific purpose.
- Other published sources like books, magazines, newspapers, reports, etc.
- Unpublished data based on the internal records and documents of an organization can provide the most authentic and much cheaper information, provided the source can be identified. Diaries, letters, etc., can also provide secondary data. The problem with unpublished data is that they are difficult to locate and to get access to.
Designing Questionnaire
The success of collecting data through a questionnaire depends mainly on how skilfully and imaginatively the questionnaire has been designed. A badly designed questionnaire will never be able to gather the relevant data. In designing the questionnaire, some of the important points to be kept in mind are:
Covering Letter
Every questionnaire should contain a cover letter. The cover letter should highlight the purpose of the study and assure the respondent that all responses will be kept confidential. It is desirable that some inducement or motivation is provided to the respondent for a better response.
The objectives of the study and questionnaire design should be such that the respondent derives a sense of satisfaction through his involvement.
Number of Questions Should Be Kept to the Minimum
The fewer the questions, the greater the chances of getting a better response and of having all the questions answered. Otherwise, the respondent may feel disinterested and provide inaccurate answers, particularly toward the end of the questionnaire.
As a rough indication, the number of questions should be between 10 and 20. If the number of questions has to be more than 25, it is desirable that the questionnaire be divided into various parts to ensure clarity.
Questions Should Be Simple, Short, and Unambiguous
The questions should be simple, short, and easy to understand and such that their answers are unambiguous. For example, if the question is, “Are you literate?” the respondent may have doubts about the meaning of literacy.
To some, literacy may mean a university degree whereas to others even the capacity to read and write may mean literacy. Hence, it is desirable to specify “Have passed (a) high school (b) graduation (c) post-graduation”.
Type of Questions
Questions can be of Yes/No type, or of multiple choice depending on the requirement of the investigator. Open-ended questions should generally be avoided.
Questions of Sensitive or Personal Nature Should Be Avoided
The questions should not require the respondent to disclose any private, personal, or confidential information. For example, questions relating to sales, profits, marital happiness, tax liability, etc., should be avoided as far as possible.
If such questions are necessary for the survey, assurance should be given to the respondent that the information provided shall be kept strictly confidential and shall not be used at any cost to the respondent’s disadvantage.
Answers to Questions Should Not Require Calculations
The questions should be framed in such a way that their answers do not require any calculations.
Logical Arrangement
The questions should be logically arranged so that there is a continuity of responses and the respondent does not feel the need to refer back to the previous questions.
It is desirable that the questionnaire should begin with some introductory questions followed by vital questions crucial to the survey and end with some light questions so that the overall impression of the respondent is a happy one.
Crosscheck and Footnotes
The questionnaire should contain some questions that act as a crosscheck on the reliability of the information provided. For example, when a question relating to income is asked, it is desirable to include the question: “Are you an income taxpayer?”
Certain questions might create doubt in the mind of respondents. For the purpose of clarity, it is desirable to give footnotes. The purpose of footnotes is to clarify all possible doubts, which may emerge from the questions and cannot be removed while framing them.
For example, if a question relates to income limits like 1000-2000, 2000-3000, etc., a person getting exactly ₹2,000 should know in which income class he has to place himself.
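The boundary problem above can be made concrete with a short sketch. It assumes one possible footnote convention: each class includes its lower limit and excludes its upper limit. The boundaries and the function name are hypothetical, and a questionnaire could just as well adopt the opposite convention, provided it states it.

```python
from bisect import bisect_right

# Hypothetical class limits in rupees. Assumed convention (what the footnote
# would state): lower limit included, upper limit excluded.
BOUNDARIES = [1000, 2000, 3000, 4000]

def income_class(income: float) -> str:
    """Return the label of the single class that contains the given income."""
    if income < BOUNDARIES[0] or income >= BOUNDARIES[-1]:
        raise ValueError("income falls outside the listed classes")
    # bisect_right gives the index of the first boundary strictly above income.
    i = bisect_right(BOUNDARIES, income)
    return f"{BOUNDARIES[i - 1]}-{BOUNDARIES[i]}"

print(income_class(2000))  # 2000-3000
```

Under this convention a respondent earning exactly ₹2,000 belongs to the 2000-3000 class and to no other.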
Pre-test the Questionnaire
Once the questionnaire has been designed, it is important to pre-test it. Pre-testing is also known as a pilot survey because it precedes the main survey work. It allows problems, inconsistencies, repetition, etc., to be rectified. Proper testing, revising, and re-testing yield high dividends.
Editing and Coding of Data
Between the two stages of collection of data and analysis of data, there is always an intermediate stage, known as the editing of data.
Editing Primary Data
Once the questionnaires have been filled and the data collected, it is necessary to edit this data to ensure completeness, consistency, accuracy, and homogeneity.
Each questionnaire should be complete in all respects, i.e. the respondent should have answered each and every question. If some important questions have been left unanswered, attempts should be made to contact the respondent and get a response.
If despite all efforts, answers to vital questions are not given, such questionnaires should be dropped from the final analysis.
The questionnaire should be checked to see that there are no contradictory answers. Contradictory responses may arise from wrong answers filled in by the respondent or from carelessness on the part of the investigator in recording the data.
The questionnaire should be checked for the accuracy of the information provided by the respondent. This is the most difficult job of the investigator and at the same time the most important one, since inaccuracies, if permitted, would lead to misleading results. Answers may be randomly cross-checked by the supervisor.
It is important to check whether all the respondents have understood the questions in the same sense. For instance, if there is a question on income, it should be very clearly stated whether it refers to the weekly, monthly, or yearly income and checked that the respondents have answered in the same way.
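Two of these editing checks, completeness and consistency, lend themselves to a simple automated pass once the answers have been entered. The sketch below is only an illustration: the field names, the tax threshold, and the function name are assumptions, and a flagged record is a candidate for follow-up with the respondent rather than a proven error.

```python
# Hypothetical field names for one filled-in questionnaire.
REQUIRED_FIELDS = ("age", "monthly_income", "pays_income_tax")
# Assumed monthly income above which the respondent would be expected to pay tax.
TAX_THRESHOLD = 25000

def edit_record(record: dict) -> list:
    """Return a list of editing problems found in one questionnaire record."""
    problems = []
    # Completeness: every required question must have an answer.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing: {field}")
    # Consistency: crosscheck the income answer against the taxpayer answer.
    income = record.get("monthly_income")
    pays_tax = record.get("pays_income_tax")
    if isinstance(income, (int, float)) and pays_tax is False and income >= TAX_THRESHOLD:
        problems.append("inconsistent: high income but not a taxpayer")
    return problems

record = {"age": 41, "monthly_income": 40000, "pays_income_tax": False}
print(edit_record(record))  # ['inconsistent: high income but not a taxpayer']
```

Accuracy and homogeneity are harder to automate and still depend on the supervisor's cross-checks and on how clearly the questions were worded.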
Editing Secondary Data
The editing of the data is a process of examining the raw data to detect errors and omissions and to correct them, if possible, so as to ensure completeness, consistency, accuracy, and homogeneity. Editing can be done at two stages:
Field editing consists of reviewing the interviewer’s report for completeness and translating what the interviewer has written in abbreviated form at the time of interviewing the respondent.
This sort of editing should be done as soon as possible after the interview, as memory recall diminishes with time. Care should be taken that the interviewer does not complete the information by simply guessing.
When all forms have been completely filled in and returned to headquarters, central editing is carried out. The editor may correct obvious errors and, if necessary, contact the respondent for clarification. All replies that are obviously incorrect must be deleted.
Coding of Data
Coding assigns numerals or other symbols to answers so that the responses can be grouped into a limited number of classes. The classes should be appropriate to the research problem being studied. They must be exhaustive and mutually exclusive, so that each answer can be placed in one and only one cell in a given category. Further, every class must be defined in terms of only one concept.
Coding is necessary for the efficient analysis of data. The coding decisions should usually be taken at the designing stage of the questionnaire so that the likely responses to questions are pre-coded. This simplifies the computer tabulation of the data for further analysis.
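A minimal sketch of pre-coding, using the education categories from the literacy example earlier in this article. The numeric codes, the catch-all code, and the function name are assumptions made for illustration. Each answer maps to exactly one code, which keeps the classes mutually exclusive, and the catch-all code keeps the scheme exhaustive.

```python
# Hypothetical codebook fixed at the questionnaire-design stage.
EDUCATION_CODES = {
    "high school": 1,
    "graduation": 2,
    "post-graduation": 3,
}
# Assumed catch-all code for answers outside the listed categories.
OTHER_CODE = 9

def code_response(answer: str) -> int:
    """Map one written answer to its numeric code."""
    key = answer.strip().lower()  # normalise spacing and capitalisation
    return EDUCATION_CODES.get(key, OTHER_CODE)

responses = ["Graduation", "high school ", "Diploma"]
print([code_response(r) for r in responses])  # [2, 1, 9]
```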