The Veil of Social Identity in Data Collection

O. Venkat Sai

November 2021

The Premise of Social Identity in Data Collection

As someone who manages impact assessment studies, I come across various things in my time in the field. This blog reflects on my experiences of primary data collection processes undertaken across different geographies in India. It considers the risks that organizations conducting monitoring and evaluation and impact assessment studies have to deal with in order to procure unbiased primary data. Non-partisan data is necessary for creating effective policy measures for marginalized populations and for tackling developmental issues effectively (it can be argued that non-partisan data is necessary in all fields, but this blog deals with issues in the development space).

India is an ethnically diverse country, and its data collection processes, especially the Population Census, have been somewhat of a spectacle. Census enumerators often cover difficult terrain to gather data on meagre resources, and Census 2021 will also see technology enter the fray, with enumerators carrying out surveys on their mobile devices. This blog, however, will not go into that; rather, it reflects on the enumerators themselves, their capacities and their own motivations for conducting data collection.

I realized this needed to be discussed because of something that came up in one of my studies: an inherent bias surfaced when a field supervisor was talking about interviewing respondents from a particular religious group. He expressed some difficulty in collecting data in the location where that group was dominant, and he reinforced this by invoking some obvious stereotypes, which revealed his significant bias towards that group. He was replaced; however, the episode left a significant question in my mind: how unbiased is the data that is generally collected, especially with respect to social identity, and what can be done to avoid such bias?

Digging Deeper 

In India, caste, class, religion and, more recently, gender play a significant role in shaping social thought, and this is especially prevalent in spaces in need of developmental interventions. (Gender has always been part of the narrative on development issues, but its understanding is now moving fast towards an intersectional approach that is no longer limited to the rights of cisgender women.) Researchers often take these elements of social identity into consideration before drafting a project proposal. However, the enumerators who actually interact with respondents on the ground may not be trained to be sensitive to identity issues, or may not even consider the importance of a particular element. As a result, the data might be inaccurate: an enumerator who is not interested in engaging with a particular group may bring in biased data. Negative preconceived notions about a specific community or group of people can heavily influence the time spent in a locality, the time spent in an interview, the quality of probing and the tendency to lead a respondent towards particular answers, and can ultimately affect the end result of a study.

In the development space, the way fieldwork is conducted can determine the outcome of the research, and in any project area it is important to build rapport with the community. This is why, in India, it can be difficult for researchers from upper-class backgrounds to enter rural spaces that are heavily segregated on the basis of caste and class. Researchers from cities are often seen as outsiders and authority figures, which creates an inherent hierarchical dynamic. For example, if a researcher arrives in a very poor rural area in a car, villagers see that person as someone with power; the data coming from such a setting can then be biased towards a particular social intervention, compromising the authenticity of the data and jeopardizing the intervention.

Reasoning and Suggestions

Social identity biases are the reason why external enumerators who are familiar with the geographies of project interventions are brought in. However, these enumerators may belong to groups that have had access to education, though not necessarily in the social sciences. Therefore, training in research methods that builds sensitivity towards identity during data collection might not be a priority. This is notable because the training that precedes data collection plays a huge role in the larger study. Enumerators are generally trained on the questionnaire, on operationalizing the study, on the use of any digital methods (if these are being used) and on the research process itself. What is often missed is a small but vital component on sensitivity towards different groups who may or may not be part of the study. Gender is often addressed, and projects often try to maintain a balanced sex ratio within the cohort of enumerators; balance across caste and religion, however, is not always maintained. There therefore needs to be a dedicated session on these aspects, and enumerators' capacity to understand and approach respondents with their identities in mind needs to be built.

Ownership of the research process also needs to be fostered. The absolute importance of the work these enumerators undertake should be highlighted, and a sense of collective ownership needs to be built among cohorts of enumerators.


This is an opinion piece that merely tries to broach a subject rife with multilayered complexities. All this blog aims to do is point out the need to look at how the capacities and ownership of field enumerators and supervisors can be built in order to reduce bias in the data while interacting with different social groups. India has a thriving social sector; however, its data collection processes have been called out because data is unavailable or lacks authenticity. This piece perhaps touches only the tip of the iceberg, but it will help us build on this topic in future posts.

Development Solutions has looked into this and has always maintained the need to bring different social identities into consideration while undertaking training. However, the team also believes that there is massive scope for improvement in this area and that it needs to be addressed at the earliest. A holistic curriculum covering different social research methods should be prioritized when training field teams. Moreover, research experts should actively explore participatory research methods to co-create and further innovate on existing methodologies. This should be done alongside the field enumerators, who need a more extensive training regime than the one they currently receive. Organizations that seek to drive policy change in the development space depend on non-partisan data from the grassroots, so it is vital to nurture enumerators' capacities and ownership around the issues that may impede the collection of unbiased data and, in turn, hinder effective intervention delivery.

Venkat Sai is a Senior Project Coordinator at Development Solutions. He has worked in implementation and research spaces and has experience in qualitative research methods, capacity building and stakeholder management.