‘Digitalization’ at Sociology Days: does ‘it’ exist and should we study ‘it’?


As part of the Finnish Sociology Days 2017, Rajapinta members Tuukka and Veikko organized a workshop on ‘Digitalization of Societies and Methods’. We wanted to discuss both ‘digitalization’ in terms of societal change and the ‘digitalization of methods’, that is, new digital and computational methodologies and (‘big’) datasets and their possibilities. We wanted to recognize that these are two different viewpoints largely driven by the same societal developments (‘digitalization’).

We believe that all social research must include the digital, but at the same time we must study the specificities of digital life: in what ways does the digital affect the social? How does the digitalization of everyday life, consumption and work affect our ways of life? On the other hand, Big Data and Computational Social Science are shaping social research, but are largely discussed by non-sociologists. With great data and method opportunities come some problems as well: how can we get data scientists and social scientists to talk to and understand each other? Or should we rather teach digital methods to sociologists and sociological thinking to data scientists?

Our participants provided some preliminary answers to these questions. Many papers touched on whether ‘digitalization’ or ‘the digital’ should be studied in its own right, i.e. whether ‘the digital’ is actually something new, or just another medium through which the same social structures, patterns and behaviour play out as they did before ‘digitalization’, so to speak. To some extent, both are true, depending on the case and situation.


Veikko Eranti’s presentation on citizen participation projects online and offline showed that largely the same things happen in both media. Getting citizens to participate more has recently been a primary objective for many Western polities, and efforts have included both offline forms of participation (such as participatory budgeting) and online initiatives (such as online citizens’ initiative portals). But both have their caveats: if an initiative is designed to bolster tokenistic representation on everyday matters without real potential to change any structures, what we get is citizens complaining about mundane issues rather than meaningful participation.

But Mikael Brunila’s presentation on the online spread of the Soldiers of Odin extreme-right brand (also the subject of a great blog post) shows that the digital and the physical do not always go hand in hand: in terms of political action, something digital might not reflect anything physical. Still, online activism is not just ‘clicktivism’: the spread of radical ideas has real consequences whether or not it is accompanied by ‘boots on the ground’.

And Zhen Im’s paper shows that digitalization has very concrete structural effects on society: it creates widespread economic and cultural precarity, which partly explains the surge of the Western populist radical right (a thesis of ‘digitalization losers’ complementing that of ‘globalization losers’).

Moreover, Salla-Maaria Laaksonen’s work provides insight into how digital tools also offer new means of mobilization for anti-racist social movements, which may use social media to spread a ‘carnivalization’ of a physical event (an anti-immigrant street patrol confronted by humorous ‘clown patrols’, the ‘Loldiers of Odin’). These tools for social movement mobilization are so concrete that state actors sometimes feel they have to intervene, as Markku Lonkila’s presentation showed in the case of the Russian political opposition and the direct state repression directed against it. And they are used by a multiplicity of political actors: the logic of hybrid media also allows anti-immigration activists to question ‘official’ truth narratives and produce ‘counterknowledge’, ‘alternative facts’ and ‘post-truth politics’, as analysed by Tuukka Ylä-Anttila by combining topic modeling with interpretive frame analysis.


Beyond politics, citizens are also reacting to corporate use of their data: as Tuukka Lehtiniemi’s paper assessed, they push back against perceived misuse of their data and claim ownership of it. And while studying these partly old, partly new phenomena, we also have to take new ethical challenges into account, as Aleksi Hupli argued.

All in all, we hope that taking into account both the digitalization of society and the use of digital methods will become increasingly self-evident in sociology rather than a curiosity. While they are distinct phenomena, they are driven by the same societal changes, which should be understood in all social research rather than left to a separate ‘sub-field’ of digital or computational sociology.

2015 Cyberelections: combining ethnography with big data analysis

Snippet of analysis code.

The 2015 Cyberelections (Digivaalit 2015) project, a collaboration between Helsinki University CRC and Aalto University HIIT, officially started in January 2015. With a multidisciplinary team of social scientists and computer scientists, our overall purpose was to study how the agenda is built in the online public sphere during the Finnish parliamentary elections of 2015. In other words, we studied how online actors can influence the agenda of both social media and traditional media, i.e., the ways of influencing online.

For that purpose, we collected a large dataset consisting of all candidate updates from different social media services (Twitter, Facebook, Instagram) as well as traditional media content from 19 different news media, in practice trying to capture everything that happened online during the elections. In total, our full dataset comprises approximately 1.5 million messages.

A dataset of that size means that traditional qualitative methods are not enough: no person can analyze that amount of data by hand. Therefore, we turned to computational approaches, i.e. using computers and purpose-written algorithms to analyze our data, an approach that has recently been termed computational social science.

Computational social science is an approach that uses computational methods and algorithms at different stages of the research process, from data collection to data preprocessing and analysis. In practice, this means that the tools used in the research are written for the purposes of the specific study, since no ready-made tools are available.

This approach has several advantages but also disadvantages. First, when the data is extracted through APIs in plain textual format, we cannot fully understand the context in which it was created. As Lisa Gitelman puts it, raw data is an oxymoron, and has no value as such. Second, we easily end up in a situation where we have to blindly trust the results our algorithms give us, and those results are often quantified. What do these numbers mean? Third, choices need to be made both during data collection and during the analysis phase.

These choices, and interpreting the results, require contextual and theoretical knowledge. In the field of social physics, social phenomena have been studied with computational methods by computer scientists and physicists who have the methodological knowledge but not the theoretical or contextual knowledge, and often no interest in it either.

Ethnography to the rescue!

In our project, we aimed to tackle these limitations by combining computational social science with ethnography. Ethnography is a research approach that aims to understand and make sense of human life, social communities and the practices within those communities. It is commonly conducted in the natural environments of human action. Ethnography is often characterized by a period of fieldwork, during which the researcher immerses herself intensively in the people and culture she is studying, observing their practices, participating in their activities and writing field notes.

When ethnography moves online, it can generally be called online ethnography. There are several sub-approaches of online ethnography, such as webnography, which focuses on websites; network ethnography, which focuses on actor networks; netnography, which focuses on communities; media ethnography, where the researcher participates as a media user; and trace ethnography, where log data from online platforms is used to trace patterns of user behavior.

All these methods raise questions (see Wittel 2000). For example, what counts as participant observation online? When is a researcher participating, and how do the research subjects know they are being studied? How can the researcher actually participate in the field when part of the physical context where the action takes place necessarily remains unseen and unreachable? How can we understand the human actions that take place offline, in so-called real life?

Markham (2013) takes a reconciling approach to these questions and suggests that we simply need to conceptualize the field in a slightly different manner, not as a place but as a flow or a process, and accept that as the forms of participation differ online, so can the forms of participant observation. Following her suggestions, three researchers conducted ethnographic fieldwork online for the full month before election day. One focused on the left-wing parties, one on the right-wing parties, and the third on overall election-related communication across platforms. The focus was rather wide, covering the formation of the online agenda around the election, candidates’ communication styles, and their interaction with other actors. Field notes were written, and screenshots and links saved, on a daily basis.

Solving the burning questions of big data and social sciences

Based on our experiences, we propose a methodological approach called Data Augmented Ethnography, which overcomes many of the limitations of both methods. First, regarding context, we posit that combining ethnography with computational social science enhances contextual framing. In the analysis phase, it is much easier to interpret the results when we can compare them to the field notes made during fieldwork. Further, the field notes can help us craft the algorithms to ask the right questions in the first place. Ethnography also helps already during data collection: it lets us make sure we include all the data we are interested in and, for instance, modify our search queries on the go.
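The idea of modifying search queries on the go can be illustrated with a keyword filter whose set of terms grows during collection. This is a minimal sketch, assuming messages arrive as plain strings; the `QueryFilter` class and the example hashtags are hypothetical and not the project's actual collection pipeline, which relied on the platforms' APIs.

```python
import string


class QueryFilter:
    """Hypothetical keyword/hashtag filter for a message stream whose terms
    can be extended mid-collection, e.g. when field notes reveal a new hashtag."""

    def __init__(self, terms):
        self.terms = {t.lower() for t in terms}

    def add_terms(self, new_terms):
        # Widen the query without restarting the collection.
        self.terms |= {t.lower() for t in new_terms}

    def matches(self, message):
        # Tokenize on whitespace and strip surrounding punctuation,
        # keeping '#' and '@' so hashtags and handles stay intact.
        punct = string.punctuation.replace("#", "").replace("@", "")
        tokens = {tok.strip(punct).lower() for tok in message.split()}
        return not self.terms.isdisjoint(tokens)
```

For instance, after a field note records a new campaign hashtag, calling `add_terms` widens the filter for all subsequent messages; anything posted before the change would have to be fetched separately.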

And why not use only ethnography? Because computational methods allow us to use larger datasets and to study phenomena at a larger scale than snapshots of the case permit. They also allow for validation and generalization of our findings and observations.

Hence, we suggest supplementing ethnographic fieldwork with computationally collected data, while simultaneously using the observations to modify the data collection. In the analysis phase, we suggest using both datasets in parallel so that the observations made in each complement the other. Further, ideally, qualitative analysis should be conducted on selected parts of the data to go deeper into the observations. In our research project, Mari Tuokko’s master’s thesis is an example of such an approach.

There is no full data

As a final reminder, it needs to be noted that data and observations always remain incomplete. The data visible to an observing researcher is always limited. Similarly, any collected datasets, lists of handles or hashtags always remain incomplete; none of Twitter’s application programming interfaces, for instance, give the “full” data. Some interaction takes place in private arenas or offline. Therefore, in essence, the idea of having full data is an oxymoron. With a mixed methods perspective such as data augmented ethnography, however, we can gain a more nuanced understanding of the social action that takes place online.


Studying multimodality in the Digital Humanities Hackathon

In May 2015 I participated in the first edition of the Digital Humanities Hackathon at the University of Helsinki. During the week, four multidisciplinary teams each conducted a small research project with a different dataset. It was a super-intensive week with a lot to learn: about the methods, about coding in Python, and, for me as a social scientist, about the humanities too!

This blog post is cross-posted from the Day of Digital Humanities site and summarizes the work of our Multimodality group in the hackathon. Group members were Dragana Cvetanovic, Arja Karhumaa, Pasi Kojola, Salla-Maaria Laaksonen, Taina Laaksonen and Aaro Salosensaari, and our team was guided by Tuomo Hiippala. A summary of the whole week, including the other teams’ work, is also available.

Background for the study: representations of Finland in Finnair’s in-flight magazines

A strong interest in nation branding has emerged among both practitioners and academics (e.g. Aronczyk 2013). The media are one prominent environment where nation brands are built and maintained. Notably, media content is not exclusively linguistic. Newspapers and magazines, for instance, combine photographs, infographics and typography in a layout to communicate with the reader. This phenomenon is often referred to as multimodality.

Yet most studies have focused exclusively on either the textual or the visual aspects of representing a nation brand. Consequently, the joint contribution of language and images has rarely been considered in the study of nation branding.

During the Digital Humanities Hackathon, our team adopted a multimodal approach to study how Finland and Finnishness are represented through multiple modes of communication in Blue Wings, Finnair’s in-flight magazine. We propose that these articles convey an image of Finland to both business travellers and tourists. Our final research question thus was: what modes of communication are used to represent Finland in Finnair’s in-flight magazines?

Data, methods and lessons learned

Studying multimodality consumes both time and resources, because the way language, images, layout and other modes of communication combine varies from page to page. We learned that when the focus shifts to images or other visual content, the methods also get more complicated: what is easy for humans to see can be very difficult for computers. In practice, we had to code all our tools ourselves during the hackathon.

To locate the pages that mention Finland or Finnishness, we first extracted and searched the text contained in the data set. Having identified the relevant pages using a Python script, we returned to examine their layout. We then used a computer vision algorithm to identify elements on the page and applied machine learning to classify them into two categories: texts and images.
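The text-search step can be sketched in a few lines of Python. This is an illustration, not the team's script: it assumes the extracted text is stored per page in a dictionary keyed by `(issue, page)`, and the search pattern is a hypothetical choice, since the post does not give the actual query.

```python
import re

# Hypothetical search terms; the query actually used in the hackathon is not
# given in the post. "Finnair" deliberately does not match, otherwise nearly
# every page of the magazine would be selected.
FINLAND_PATTERN = re.compile(r"\b(Finland|Finnish\w*|Finns?)\b", re.IGNORECASE)


def pages_mentioning_finland(pages):
    """Return the sorted (issue, page) keys whose extracted text mentions
    Finland, Finns or anything Finnish (including 'Finnishness')."""
    return sorted(key for key, text in pages.items() if FINLAND_PATTERN.search(text))
```

Excluding the airline's own name is a design choice worth revisiting: it keeps the selection focused on representations of the country, at the cost of missing pages that speak of Finland only through Finnair.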


Preliminary results and exploration

First of all, our preliminary analysis shows that most of the content representing Finland is in image format. In Figure 1, each issue in our dataset is shown as a row of bubbles arranged by page number. The size of each bubble represents the pixel size of a particular clip on paper. Blue is used for images and black for text. As the figure shows, textual parts are concentrated at the end of each issue: these pages contain the section dedicated to in-flight information in every number. Thus, while the visualization is hardly a research finding as such, it does suggest that our computer vision algorithm is identifying images and text properly.


Figure 1. The content of ten different issues portrayed as a bubble map arranged by page number. Visualized with RAW.
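The sanity check behind Figure 1 can also be expressed numerically. The sketch below assumes that the classifier outputs one record per clip with its issue, page, predicted kind and bounding-box size; the `Clip` record and both functions are hypothetical, not the team's data format. It computes the share of pixel area per mode, and the mean relative page position of a mode, where a value close to 1.0 for text would reflect text concentrating at the end of each issue.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Clip:
    """One page element detected by the computer vision step (hypothetical format)."""
    issue: str
    page: int
    kind: str     # "image" or "text", as predicted by the classifier
    width: int    # bounding-box width in pixels
    height: int   # bounding-box height in pixels

    @property
    def area(self) -> int:
        return self.width * self.height


def area_share_by_kind(clips):
    """Fraction of all clip pixel area that falls in each mode."""
    totals = defaultdict(int)
    for clip in clips:
        totals[clip.kind] += clip.area
    grand_total = sum(totals.values())
    return {kind: area / grand_total for kind, area in totals.items()}


def mean_relative_position(clips, kind):
    """Average page position of a mode, scaled so that 1.0 is the last page
    on which any clip was detected in that issue."""
    last_page = defaultdict(int)
    for clip in clips:
        last_page[clip.issue] = max(last_page[clip.issue], clip.page)
    positions = [clip.page / last_page[clip.issue] for clip in clips if clip.kind == kind]
    return sum(positions) / len(positions)
```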

Next, to dive deeper into the textual content, we fitted an LDA topic model to find different topics in the textual parts of the data. We extracted twenty topics, listed below in order of prominence.

  1. Life and family
  2. Finland/Finnair info
  3. Nature and ecotravel
  4. Business
  5. Cultural events
  6. Finnair services, aviation
  7. In-flight entertainment
  8. Politics & economy
  9. (broken parts of words only)
  10. Design
  11. Finnair services
  12. Food, eating in/out
  13. Sauna & other Finnish classics
  14. Environment
  15. Culture (music, artists, books)
  16. Customer loyalty program of Finnair
  17. Finnish sports
  18. Wellbeing
  19. Work, companies
  20. Editorial information
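For readers unfamiliar with LDA, the sketch below shows the core of the technique as a bare-bones collapsed Gibbs sampler in pure Python. It is not the implementation used in the hackathon (the post does not say which tool was used), and real work would rely on a tested library such as gensim or scikit-learn. The number of topics is an input the researcher chooses, and "prominence" is taken here, as an assumption, to mean the number of tokens assigned to each topic.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of token lists (one per page or article).
    Returns (vocab, doc_topic_counts, topic_word_counts).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]       # tokens of doc d assigned to topic k
    nkw = [[0] * V for _ in range(n_topics)]   # times word w is assigned to topic k
    nk = [0] * n_topics                        # tokens assigned to topic k
    z = []                                     # current topic of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
        z.append(z_d)

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = index[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1
    return vocab, ndk, nkw


def topics_by_prominence(vocab, nkw, n_words=5):
    """List topics from most to least assigned tokens, each as its top words."""
    order = sorted(range(len(nkw)), key=lambda k: -sum(nkw[k]))
    return [[vocab[i] for i in sorted(range(len(vocab)), key=lambda i: (-nkw[k][i], vocab[i]))[:n_words]]
            for k in order]
```

Because the sampler treats every token it is given as a word, fragments left over from text extraction end up in the model too, which is one plausible source of a topic made of broken word parts.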

Finally, we wanted to examine in which modes of communication these topics are represented, i.e. whether some topics are more visual than others. During the hackathon we only had time for a preliminary exploration of some of the topics, just to check whether the idea is applicable. Figures 2 and 3 show examples of the distribution between image and text in two different topics. The charts show that cultural events are presented with more and larger images than sports, and that the layout of the cultural event pages is more scattered, as the number of clips is larger.
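The comparison between topics can be summarized with two numbers per topic: clips per page (a rough proxy for how scattered the layout is) and the share of clip area classified as image. The sketch below is hypothetical: it assumes each page has already been linked to a single topic label (for example its dominant LDA topic; the post does not say how pages and topics were linked) and that clips come as `(issue, page, kind, pixel_area)` tuples.

```python
from collections import defaultdict


def mode_profile_by_topic(page_topics, clips):
    """Summarize how image-heavy the pages of each topic are.

    page_topics: dict mapping (issue, page) -> topic label (hypothetical:
        e.g. the dominant LDA topic of the page's text).
    clips: iterable of (issue, page, kind, pixel_area), kind in {"image", "text"}.
    Returns topic -> {"clips_per_page": float, "image_share": float}.
    """
    pages_per_topic = defaultdict(int)
    for topic in page_topics.values():
        pages_per_topic[topic] += 1

    n_clips = defaultdict(int)
    area = defaultdict(lambda: {"image": 0, "text": 0})
    for issue, page, kind, pixel_area in clips:
        topic = page_topics.get((issue, page))
        if topic is None:
            continue  # page not linked to any topic under study
        n_clips[topic] += 1
        area[topic][kind] += pixel_area

    profile = {}
    for topic, by_kind in area.items():
        total = by_kind["image"] + by_kind["text"]
        profile[topic] = {
            "clips_per_page": n_clips[topic] / pages_per_topic[topic],
            # Fraction of the topic's clip area classified as image.
            "image_share": by_kind["image"] / total if total else 0.0,
        }
    return profile
```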


Future directions

Building on this preliminary study, the methods developed could be used to study multimodality in the in-flight magazines of all European or other state-owned airlines, to investigate how nationality and nation branding interact in these magazines. In this way we could build a more comprehensive, comparative research setting.

To contribute to practice-based fields such as graphic design, the proposed method could be used to trace the development of design conventions, an understanding of which is an important aspect of multimodal literacy. The development of various genres of the magazine medium could be traced using the data available from Google Books and the National Library of Finland.

Moreover, the method could be trained to recognize specific elements in graphic design, in order to distinguish between different types of images (photographs, illustrations, information graphics), headers, captions and body text, etc.

But how could we examine the visual content automatically? We could extract prominent nouns from each page and use them to retrieve a set of training images from ImageNet (www.image-net.org). ImageNet contains thousands of images for each noun, which can be used to evaluate the content of the images found on the page.

Further reading: