Rajapinta January meetup: Organizing social movements, doing political science with computational methods

Our near-monthly Rajapinta meetup was organized on Monday 16.1. In addition to content presentations on themes of organizing social movements and doing political science with computational methods, we also worked on related things in practice: we set in motion the official process of incorporating Rajapinta as a registered association (more about this here), and worked out ways to combine the efforts of different research groups and projects in order to efficiently collect online data related to this year’s municipal elections.

In the first content presentation, Salla-Maaria Laaksonen presented a paper (co-authored with Merja Porttikivi) on the organizational elements of a social movement on Facebook. The case in point was the popular “Lisää kaupunkia Helsinkiin” group, an arena of lively discussion on urban planning in Helsinki with over 10 000 members at the moment. The group specifically focuses on the benefits of dense urban planning, and tries to find ways to make Helsinki more “city-like” in this sense. Arguably, it has had an effect on the discussion of city planning in Helsinki at large.

The data for the study were the private discussions between the administrators of the group, which allowed the examination of how and when the elements of organization appear. These tell-tale signs of organization include negotiations on membership, hierarchies, defining rules and monitoring them, and sanctioning. Summarizing the results, social movements online are highly constituted through communication and not so much through, for example, organizational structures. The “Lisää kaupunkia Helsinkiin” group initially seems to be an emergent, grass-roots level and network-like social order, but organizational elements are clearly present in the admin discussions. Salla-Maaria’s presentation slides can be found here (in Finnish).

In the second presentation of the day, professor Pertti Ahonen shared the experiences he has gathered over the years working on the edges of mainstream research in political science. He has never identified with any particular school of thought and has, during his career, built many connections to people working in different paradigms. A long career has allowed him to become familiar with multiple philosophies of science and methodologies. Professor Ahonen had his first contact with what has since become “data science” very early. He reminisced about the early days of R, which is now a widely adopted open source tool in academia and the business world. In recent years, he has used computational tools in his research. As an example, he mentioned work where he applied topic modeling to study Finnish party programs.

Professor Ahonen stated that fitting social theory together with modern computational methods in a meaningful way is one of the key questions to be solved so that social sciences will be able to get the best out of computational research (Ahonen, 2015). In addition, Ahonen stated that computational methods should not be something to be afraid of: in many cases, one can also consider them as computationally assisted qualitative methods.

Rajapinta meetups will continue to be organized about once per month in the spring; details will follow. If you would like to take advantage of the chance to discuss your work with like-minded colleagues, let us know!

DCCS October meetup: topic models, data economy and computer vision

Last Friday our Rajapinta/DCCS meetup was organized for the third time. We were kindly hosted by Aleksi Kallio and CSC IT Center for Science. CSC is a non-profit company owned by the state of Finland and administered by the Ministry of Education and Culture. CSC maintains and develops the centralised IT infrastructure for research, libraries, archives, museums and culture. Their services have mostly been used by researchers in the sciences or life sciences, but recently we have also been discussing and collaborating with them in the social sciences, especially computational social science. For instance, the data processing in Digivaalit 2015 was mostly done on CSC servers.

In the meetup we had three presentations, each followed by a lively discussion.

First, Matti Nelimarkka discussed topic models and ways to employ them in the social sciences, and in particular the different ways of selecting the “k”, i.e. the number of topics you want to extract from the data.

Computer science uses measures such as log-likelihood, perplexity or Gibbs sampling to find the best estimate for k. Social science people, however, often select a few values of k, check and compare the results (i.e., word lists) and, using some heuristics, pick the one that seems best.

Matti ran an experiment in which he asked participants to examine topic model results from a given data set for k values of 10-30 and select the k that seemed to fit best with the given research problem. After this, the participants were interviewed about the process they used to select the k.

There were some general heuristics all participants seemed to use: first, they tried to avoid overlapping topics (if such topics existed, they cut down the number of topics) and second, they tried to avoid topics that seemed to include multiple themes (and increased the number of topics in such cases). Most importantly, all five participants selected a different k, with a large variance.
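The first heuristic, avoiding overlapping topics, can also be approximated in code. The following is only a minimal sketch and not the procedure used in Matti's experiment: it assumes each topic is represented by its list of top words, and it measures overlap with Jaccard similarity. The function names, the 0.5 threshold and the example word lists are all made up for illustration.

```python
from itertools import combinations


def jaccard(words_a, words_b):
    """Jaccard similarity of two word lists: shared words / all distinct words."""
    a, b = set(words_a), set(words_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def overlapping_pairs(topics, threshold=0.5):
    """Return (i, j, score) for every pair of topics whose top words overlap
    at least as much as the (arbitrary, illustrative) threshold."""
    pairs = []
    for (i, t1), (j, t2) in combinations(enumerate(topics), 2):
        score = jaccard(t1, t2)
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


# Hypothetical top-word lists for three topics (not real study data).
topics = [
    ["housing", "rent", "apartment", "price"],
    ["rent", "housing", "price", "tenant"],
    ["tram", "bus", "metro", "ticket"],
]

flagged = overlapping_pairs(topics)
for i, j, score in flagged:
    print(f"topics {i} and {j} overlap (Jaccard = {score:.2f})")
```

Many flagged pairs for a given k would correspond to the situation where the participants chose to reduce the number of topics. The second heuristic, topics that mix several themes, is harder to capture with such a simple measure.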

Hence, the results show a sort of method opportunism in selecting the k of topics: depending on what people want to find in the data, they perceive it differently. Matti’s suggestion is that computational methods should be used to select the k.
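As a rough illustration of what a computational selection of k could look like, the sketch below picks the k with the lowest held-out perplexity. It assumes that a topic model has already been fitted for each candidate k and that the natural-log likelihood of a held-out set of documents is known; the log-likelihood values and the token count are invented for the example and do not come from the study. Perplexity is computed as exp(-log-likelihood / number of tokens), so lower is better.

```python
import math


def perplexity(log_likelihood, n_tokens):
    """Per-token perplexity from a natural-log likelihood of held-out data."""
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    return math.exp(-log_likelihood / n_tokens)


def select_k(held_out_log_likelihoods, n_tokens):
    """Return the candidate k with the lowest perplexity, plus all scores.

    held_out_log_likelihoods maps each candidate k to the log-likelihood
    that the model fitted with that k assigns to the same held-out documents.
    """
    scores = {
        k: perplexity(ll, n_tokens)
        for k, ll in held_out_log_likelihoods.items()
    }
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Hypothetical held-out log-likelihoods for three candidate values of k.
held_out = {10: -7200.0, 20: -6900.0, 30: -7050.0}
best_k, scores = select_k(held_out, n_tokens=1000)

for k in sorted(scores):
    print(f"k = {k}: perplexity = {scores[k]:.1f}")
print("selected k:", best_k)
```

With these made-up numbers the middle candidate is selected. In practice the log-likelihoods would come from whichever topic modeling library is used, and perplexity is only one of several possible criteria.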

*

Next, Tuukka Lehtiniemi discussed the troublemakers of the data economy, based on a manuscript he’s preparing. By troublemakers he refers to players who disrupt the market and gain ground by acting against the normal way of doing things. In normal business markets such actors would be Spotify, Uber, Igglo, or Onnibus – or national broadcasting companies such as YLE for commercial media.

But what is the conventional mode or the market in the data economy? The market is to a large extent defined by the large players known as “le GAFA”: Google, Amazon, Facebook and Apple. Their business is mostly based on datafication, which means turning social behaviour into quantifiable data (see e.g., Mayer-Schönberger & Cukier, 2013). Such data is born online within these services, based on the activities of the users. The markets that exist upon this data are largely based on selling audience data to advertisers and various third party data services. Tuukka, following Shoshana Zuboff’s thoughts, calls this surveillance capitalism.

In his paper, Tuukka examines three potential alternatives to the surveillance model: two commercial startup initiatives (Meeco and Cozy Cloud) and a research-originated one (OpenPDS developed at MIT). These cases are explored to identify overarching features they strive to achieve in relation to the above questions. The identified new roles for users are data collector, intermediary of data between services, controller of data analysis, and source of subjective data.

A version of the related paper is available on the Oxford Internet Institute IPP conference site.

*

In the third presentation Markus Koskela from CSC presented some recent advances in automated image analysis tools – or as he neatly put it, analyzing the dark matter of the internet.

Automated image analysis is nowadays commonly done using machine learning and deep neural networks. A big leap forward was taken around 2012, made possible by, first, the availability of open visual data; second, the availability of computational resources; and third, some methodological advances. From a machine learning perspective there is nothing completely new, only a few simple tricks to improve visual analysis.

Nowadays lots of open source tools are available for visual analysis: code is available on GitHub, pre-trained networks are openly available, and there are several annotated datasets to use in the analysis (e.g. ImageNet, Google Open Images). Markus recommends Keras (keras.io) as his favorite choice, and mentioned TensorFlow and Theano as other usable tools.

As a final note of caution, Markus reminded us that researchers still haven’t solved what vision actually is about. A computer vision solution always works on a particular data set or a particular task, but generalization is very difficult. As an example, he presented some funny results from image recognition algorithms applied to the sample images from Google Research’s automated caption generator: the algorithm can’t tell the difference between a traffic sign with stickers and an open refrigerator if the light falls on the sign in a particular way (the same pictures are available in this Techcrunch article).

*

The next DCCS meetup will be held in Tampere on November 25th in connection with the Social Psychology Days – stay tuned!