2015 Cyberelections: combining ethnography with big data analysis

Snippet of analysis code.

The 2015 Cyberelections (Digivaalit 2015) project, a collaboration between Helsinki University CRC and Aalto University HIIT, officially started in January 2015. With a multidisciplinary team of social scientists and computer scientists, our overall purpose was to study how the agenda is built in the online public sphere during the 2015 Finnish parliamentary elections. In other words, we studied how actors online can influence the agenda of both social media and traditional media.

For that purpose, we collected a large data set consisting of all candidate updates from several social media services (Twitter, Facebook, Instagram) as well as traditional media content from 19 news media outlets, in practice trying to capture everything that happened online during the elections. In total, our full data set amounts to approximately 1.5 million messages.

A data set of that size cannot be analyzed by hand, so traditional qualitative methods alone are not enough. Therefore, we turned to computers and purpose-written algorithms to analyze our data. This methodological approach has recently come to be called computational social science.

Computational social science is an approach that uses computational methods and algorithms at different stages of the research process, from data collection to data preprocessing and data analysis. In practice, this means that the tools used in the research are written specifically for each study, since no ready-made tools are available.

This approach has several advantages but also disadvantages. First, when the data is extracted through APIs in a purely textual format, we cannot fully understand the context in which the data was born. As Lisa Gitelman puts it, raw data is an oxymoron; it has no value as such. Second, we easily end up in a situation where we have to blindly trust the results our algorithms give us, and those results are often numbers. What do these numbers mean? Third, choices need to be made both during data collection and during the analysis phase.

Making these choices and interpreting the results require contextual and theoretical knowledge. In the field of social physics, social phenomena have been studied using computational methods by computer scientists and physicists who have the methodological knowledge but not the theoretical or contextual knowledge, and often no interest in it either.

Ethnography to the rescue!

In our project we aimed to tackle these limitations by combining computational social science with ethnography. Ethnography is a research approach that aims to understand and make sense of human life, social communities and the practices within those communities. It is commonly conducted in the natural environments of human action. Ethnography is often characterized by a period of field work, during which the researcher immerses herself intensively in the people and the culture she is studying, observing the practices, participating in the activities and writing field notes.

When ethnography moves online, it can generally be called online ethnography. There are several sub-approaches of online ethnography, such as webnography, which focuses on web sites; network ethnography, which focuses on actor networks; netnography, which focuses on communities; media ethnography, where the researcher participates as a media user; and trace ethnography, where log data from online platforms is used to trace user behavior patterns.

All these methods raise questions (see Wittel 2000). For example, what counts as participatory observation online? When is a researcher participating, and how do the research subjects know they are being studied? How can the researcher actually participate in the field when part of the physical context where the action takes place necessarily remains unseen and unreachable? How can we understand the human actions taking place behind the online world, in so-called real life?

Markham (2013) takes a reconciling approach to these questions and suggests that we simply need to conceptualize the field in a slightly different manner, not as a place but as a flow or a process, and accept that as the forms of participation differ online, the forms of participatory observation can differ too. Following her suggestions, three researchers conducted ethnographic field work online for one full month before the election date. One focused on the left-wing parties, one on the right-wing parties, and the third on overall election-related communication across platforms. The focus was rather wide, covering the forming of the online agenda around the election, candidate communication styles, and interaction with other actors. Field notes were written, and screenshots and links saved, on a daily basis.

Solving the burning questions of big data and social sciences

Based on our experiences, we propose a methodological approach called Data Augmented Ethnography, which overcomes many of the limitations of both methods. First, when it comes to context, we posit that using ethnography together with computational social science enhances contextual framing. In the analysis phase, it is much easier to interpret the results when we can compare them to the field notes made during the field work. Further, the field notes can help us craft the algorithms so that they ask the right questions in the first place. Ethnography also helps already during the data collection phase: we can check that we include all the data we are interested in and, for instance, modify our search queries on the go.
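As an illustration of modifying search queries on the go, the following is a minimal sketch, not the project's actual code. It assumes field notes are plain strings and that the collector tracks a list of hashtags. The function name `missing_hashtags` and all example hashtags are invented for this example.

```python
import re

# Simplified hashtag pattern: '#' followed by letters, digits or underscores.
HASHTAG = re.compile(r"#\w+")

def missing_hashtags(field_notes, tracked):
    """Return hashtags mentioned in field notes that the collector does not track yet."""
    tracked_lower = {tag.lower() for tag in tracked}
    seen = set()
    for note in field_notes:
        for tag in HASHTAG.findall(note):
            seen.add(tag.lower())
    return sorted(seen - tracked_lower)

# Hypothetical field notes and tracked hashtags, for illustration only.
notes = [
    "Candidates debating under #vaalit2015 and also #talousvaalit tonight.",
    "Lots of replies tagged #Vaalit2015; a new tag #vaalikone is spreading.",
]
tracked = ["#vaalit2015"]

print(missing_hashtags(notes, tracked))  # ['#talousvaalit', '#vaalikone']
```

The researcher would still decide by hand which of the suggested hashtags belong in the collection; the sketch only surfaces candidates from the notes.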

And why not use only ethnography? Because computational methods allow us to use larger data sets and to study the phenomena more broadly than through snapshots of the case alone. They also allow us to validate and generalize our findings and observations.

Hence, we suggest supplementing ethnographic field work with computationally collected data, and simultaneously using the observations to modify the data collection. In the analysis phase, we suggest using both data sets in parallel so that the observations made in each complement the other. Further, in the best case, we suggest conducting qualitative analysis on selected parts of the data to go deeper into the observations. In our research project, Mari Tuokko’s master’s thesis is an example of such an approach.
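One simple way to read the two data sets in parallel is to line them up by date. The sketch below is an assumption about how that could look, not a description of our pipeline. It assumes messages and field notes are available as (date, text) pairs. The function name `daily_overview` and the sample records are made up for illustration.

```python
from collections import Counter
from datetime import date

def daily_overview(messages, field_notes):
    """Pair the number of collected messages per day with that day's field notes."""
    counts = Counter(day for day, _text in messages)
    notes_by_day = {}
    for day, note in field_notes:
        notes_by_day.setdefault(day, []).append(note)
    # Include days that appear in either data set.
    days = sorted(set(counts) | set(notes_by_day))
    return [(day, counts.get(day, 0), notes_by_day.get(day, [])) for day in days]

# Hypothetical sample records, for illustration only.
messages = [
    (date(2015, 4, 1), "candidate update A"),
    (date(2015, 4, 1), "candidate update B"),
    (date(2015, 4, 2), "news article C"),
]
field_notes = [
    (date(2015, 4, 1), "TV debate dominated the discussion."),
    (date(2015, 4, 3), "Quiet day, few candidate posts."),
]

for day, n_messages, notes in daily_overview(messages, field_notes):
    print(day.isoformat(), n_messages, notes)
```

A day with many messages but no notes, or notes but few messages, is a natural place to look closer at either the collection or the observations.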

There is no full data

As a final reminder, data and observations always remain incomplete. The data that is visible to an observing researcher is always limited. Similarly, any collected data set, list of handles or set of hashtags always remains incomplete: none of Twitter's application programming interfaces, for instance, gives the “full” data. Some parts of the interaction take place in private arenas or outside the online world. Therefore, in essence, the idea of having full data is an oxymoron. With a mixed methods perspective such as data augmented ethnography, however, we can gain a more nuanced understanding of the social action that takes place online.

More about the project:
