Hackathons in Finland: free labor or open innovation?

Those following the Finnish technology scene have most likely observed that hackathons are this year’s megatrend. Everyone seems to be organizing a hackathon to get developers to work on their problems. You know, digitalization is coming and everyone, in the public and private sector alike, needs solutions that revolutionize their operations … and a hackathon is one of the trendy ways to bring digitalization gurus into the organization.

As a long-time hackathon participant, I have certain doubts about the whole concept, but as hackathons are trendy, I think I should have something to say about them. I think we’re using hackathons in rather interesting ways, such as supplementing the procurement of software in public administration, and these have some merits. However, there are also cases that look more like exploitation of participants. I will first briefly review the academic literature on hackathons, after which I move on to my rant about them.

What do we know about hackathons?

Sadly, the scholarly literature on the whole phenomenon is still emerging (and will most likely be fully developed only after hackathons are passé). But let’s give it a try anyway and see what we know about hackathons based on the existing work.

Overall, hackathons can be technology-oriented (i.e., focused on particular platforms) or focused on problems, in particular on solving societal problems in issue-based hackathons (Lodato & DiSalvo, 2016). These issue-based hackathons can serve multiple purposes: Johnson & Robinson (2014) see this type of hackathon as a mix of procurement process, civic engagement and innovation. What these findings indicate is that people have motivations beyond just hacking things together, such as improving society or making their views more concrete.

In general, hackathons have three phases: pre-hackathon, hackathon, and post-hackathon. Hackathons are intense collaborations that require participants to set the goals of the hack and the means of collaboration, including work processes. The hackathon itself is a face-to-face activity where participants work together intensively, but can also seek help from others in the team and engage in iterative development and critique. The challenge with post-hackathon activities is to continue with the same team without the co-located setting, as the hacks often need more love to be ready (Lodato & DiSalvo, 2016; Trainer et al., 2016).

Matti’s rant about some recent hackathons and challenges in Finland

Hackathons are a great way to bring bright people together to create something cool. While the process itself is difficult, as the literature discussed above shows, it may be rewarding for participants. Furthermore, at least in my experience, hackathons are a great way to get uninterrupted time to think about a problem in a creative manner. Having space, catering and time reserved in the calendar makes it easy to focus on the problem.

In my view, the best hackathons have a somewhat open goal, allowing the developers to take different angles on the problem and demonstrate a variety of approaches, or to invent something new. Usually, the organizers in these cases ask questions like “what is possible?” and in the best cases contribute their own skills and knowledge to help the hackers. However, I’ve recently seen the term hackathon being used for events that I don’t think live up to my ideals.

Hackathons and challenges should not be cheap software development

Some hackathons are organized with a super-specific goal already defined. In the Open Finland Challenge this year, there was a challenge organized by Aller Media, with the goal of

We want to add location data to discussions by offering the user an opportunity to find relevant information about one’s surroundings. [shortly translated]

When reading this, I think the jury already had a rather clear vision of what they wanted to get as an outcome. Naturally, you can break the rules – and I did ask about this when the challenges were made public – it’s OK to hack whatever you want. But the jury will naturally read the challenge text as well. Just compare this with the challenge set by YLE in the same competition:

YLE has opened the Elävä arkisto data through an API. What interesting things can you build using this information – maybe a new service for a special user group or something totally new? [shortly translated]

I think this challenge is open-ended, allowing participants to work in rather creative ways with the data. This aligns better with ideas like open innovation, exposing the company to new ideas and approaches, and with the creativity of hackathons. The former, in contrast, sounded to me like a task for which they could just hire a software company to produce a prototype of their idea and test it.

Hackathons and challenges should not be cheap consulting

A more recent case of this is the LähiTapiola Hack, where the goal was to develop “new digital solutions for inspiring young people to save and invest money. During the 5-day business hackathon teams will develop a new business or product concept for LähiTapiola (LocalTapiola), and finally pitch it to a jury consisting of LähiTapiola executives and business angels.”

I think this is a nicely open-ended problem to hack on, giving participants a rather free hand. There is an opportunity for true creativity. However, a closer look at the hackathon policies showed that there was something fishy about the IPR.

In hackathons I’ve attended, the IPR usually belongs to the participants or is considered to be in the public domain. In this hackathon, instead, the conceptual innovations (whatever those are) are explicitly stated to belong to the hackathon organizers if they emerge from data and materials provided by LähiTapiola. This means that these guys get free business consulting and ideas in exchange for food, space and a 5,000 € reward for the winning team. If you want my ideas, you can just contact my consulting firm and we’ll discuss my pricing in detail. Or, as it seems, they are actually incubating business opportunities and startups for themselves – weirdly called a hackathon.

How to move forward?

I think the first step is to ensure we don’t call all things hackathons or challenges just to look trendy. If you aim to incubate startups or run public procurement, are you really doing a hackathon, or something else that merely shares some characteristics of one: co-located, fast-paced and solution-centric work that aims to produce a concrete outcome by the final day? I would even avoid the name hackathon for everything that’s not what I would call a traditional hackathon, a day or two of hard work in an open context – just to make sure you don’t market the event in the wrong manner and get weirdos like me attending.

Second, if you’re sure you’re organizing a traditional hackathon, check that your hackathon task is semi-open to participation. Naturally, seasoned hackathon participants know how to read the tasks in an open manner and produce something cool. But it might be more inviting even for them if they can see that the organizer is truly seeking something novel and cool. Remember that hackathons, as I see them, should be much about open collaboration, open innovation and facilitating great minds to come together.

Finally, have an answer to the question ‘what next?’ If there are ideas the hackathon participants want to take further, how can your organization support them in moving forward? I do have good experiences of this, including seed investment from organizations to develop the quick proof of concept into a true product and even to launch it. And if you have plans like this, remember to tell participants about them beforehand, and make sure you continue to support the teams throughout the later stages as well.


I was motivated to write this post by the poor case of the LähiTapiola Hack and by discussions with my nerd friends on the #fixme IRC channel. All views presented in this text are naturally my own and may not reflect those of the #fixme community, the Rajapinta community, my employer, my supporters, or my future self.

Cross-posted to my personal blog, Science & Industry.

Suomi24 Data Science Hackathon – results and afterthoughts

The availability of large data sets and digital material is changing the landscape of research within social sciences and humanities. At the same time, tools and the understanding necessary to utilize such data are often lacking. To tackle this problem, during the last weekend of May we organized a Data Science hackathon around a newly opened data set of Suomi24, the largest online discussion forum in Finland with 1.9 million monthly visitors.

The hackathon was organized by the Citizen Mindscapes research collective, University of Helsinki, Futurice Oy and Aller Media Oy. The event was also part of Nordic Open Data Week and organized in cooperation with Open Knowledge Foundation. The main goal of the event was to allow researchers and coders to work together and find new ways of collaborating in the field of data science. We formed four teams consisting of coders and researchers to define research problems and to create solutions and demos that answer them.

The dataset used in the event was almost the entire database of Suomi24 online forum discussions from 2001 to 2015, consisting of hundreds of thousands of posts and altogether over 123 million words – a set of data practically impossible to study comprehensively using traditional methods from social science or the humanities. Below is a summary of the work and results of the teams.

Rhythms of Human Life in Suomi24

This team was interested in the life cycle of topics in Suomi24. A typical way of studying topics is to create a list of words and query the data with them. As one exercise this team tracked the conversations related to jealousy using a list of fifteen related words. They noted that in general the talk about jealousy has increased during the time span of the data. Maybe people were not so used to talking about personal issues online, but year by year it is getting more common? Further, the analysis shows that jealousy words peak in January and in May; in December, by contrast, discussions on the topic are rare. The team hypothesized that this relates to the well-known phenomenon of finding a summer fling, or to the aftermath of all the Christmas parties.
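The word-list query described above can be sketched roughly as follows. This is a minimal illustration and not the team's code: the three words are hypothetical stand-ins for their fifteen-word list, the posts are invented, and exact token matching would miss most inflected Finnish forms, so a real analysis would need lemmatization.

```python
from collections import Counter
from datetime import date

# Hypothetical stand-ins; the team's actual fifteen-word list is not given in the post.
JEALOUSY_WORDS = {"mustasukkainen", "mustasukkaisuus", "kateus"}


def monthly_share(posts, words):
    """Return {month: share of that month's posts containing any listed word}."""
    totals, hits = Counter(), Counter()
    for posted_on, text in posts:
        tokens = set(text.lower().split())
        totals[posted_on.month] += 1
        if tokens & words:
            hits[posted_on.month] += 1
    return {month: hits[month] / totals[month] for month in sorted(totals)}


# Invented example posts as (date, text) pairs.
posts = [
    (date(2014, 1, 5), "Olen niin mustasukkainen"),
    (date(2014, 1, 9), "Mikä auto kannattaa ostaa"),
    (date(2014, 12, 20), "Joulukinkun paisto-ohje"),
]
shares = monthly_share(posts, JEALOUSY_WORDS)
```

Reporting a share per month rather than a raw count keeps busy months from looking like peaks simply because more posts were written.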


User Modeling and Micro Level Interactions

This team focused on tracking down different interaction types, recognizing positive and negative discussions, and finding out which words or linguistic features predict longer discussion threads. In essence these questions relate directly to a very practical problem: how to create interaction in the online sphere and write text that creates engagement. The team decided to measure this simply by using the length of the thread as the dependent variable, and used MDL (Minimum Description Length) to search for the linguistic features that are typical of long or short conversation threads. Limiting the analysis to the conversation sections related to babies and society, they identified some discrete words, topics and features of the text that are typical of short and long threads (see table below).

baby section: inconvenient topics (pregnancy, test, symptoms, miscarriage, periods)
society section: god, work, human
baby section: boy, kid, man, girl, mother, movie
society section: Jesus, forest, baptize
asking, short sentences, question mark, words indicating uncertainty (mikä mutta vai jos), colloquialism subordinate clauses, certain conjunctions (että, vaikka, ja), quotations, commas

Forecasting the Economy

Our forecasting team decided to study which words and topics get accentuated during a financial downturn, and to check whether the online discussions could be used as a tool to predict the economic situation. The theoretical idea behind this question comes from John Maynard Keynes’s notion of animal spirits: the instincts, fears and emotions that ostensibly influence and guide human behavior, and through that also affect the economic cycle. To answer their questions the team obtained additional data sets on Finnish GDP and private household consumption from Statistics Finland. Using previous studies as a source, they created a keyword-based index to measure economic uncertainty in the discussions. An OLS regression model was tested but did not have much explanatory power with this data set. Nevertheless, in the next part of the analysis the team identified the words whose frequencies rose during the months of the crisis years 2008 and 2009. So, when the economy is going down, what are the words people use more often? The identified words were: bar, mother-in-law, poem, weapon, bank, electricity, unemployed, lonely, Easter, girlfriend. We do hope these words are not related to a single story!
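As a rough sketch of the two steps described above, a keyword-based uncertainty index and a one-variable OLS fit could look like this. Everything specific here is an assumption made for illustration: the post does not say which keywords were used, how the index was normalized, or which variable was regressed on which, and the numbers are invented.

```python
def uncertainty_index(texts, keywords):
    """Share of all tokens in the given texts that are uncertainty keywords."""
    tokens = [tok for text in texts for tok in text.lower().split()]
    if not tokens:
        return 0.0
    return sum(tok in keywords for tok in tokens) / len(tokens)


def ols_fit(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot


# Invented illustration data: one index value and one GDP growth figure per period.
index_values = [0.01, 0.02, 0.03, 0.04]
gdp_growth = [2.0, 1.0, 1.5, -0.5]
intercept, slope, r_squared = ols_fit(index_values, gdp_growth)
```

A low `r_squared` in such a fit corresponds to the limited explanatory power the team reported; with real data one would also want to lag the index, since the interest is in prediction.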

Cats versus Dogs

Our last team decided to solve the old Internet dilemma of cats versus dogs once and for all. It is well known that the Internet belongs to cats. But how about Suomi24? Are cats also the most prominent animals there? Different statistics were extracted from the data, but the situation kept looking bad for cats: dogs are mentioned more often across the data. The number of users who talk about dogs is also larger than the number who talk about cats. A final analysis was conducted to see whether the other topics that cat and dog persons talk about actually differ. The results show that cat people talk more about mathematics, whereas dog persons talk about poop. This whole exercise was of course just a humorous example of what to do with the data, and of how to twist the data so that a desired answer can be found – it is just a matter of what to measure. A critical point to note is thus that one should be cautious of the black boxes of data analytics: there might have been other statistics behind the ones that you are shown.
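The point that the answer depends on what you measure can be made concrete with a small sketch. The posts and author names below are invented, and the two statistics (total mentions and distinct authors) are only assumed to resemble the ones the team computed.

```python
def compare(posts, term_a, term_b):
    """Return {term: (total mentions, number of distinct authors)} for two terms."""
    stats = {term: {"mentions": 0, "authors": set()} for term in (term_a, term_b)}
    for author, text in posts:
        tokens = text.lower().split()
        for term, row in stats.items():
            count = tokens.count(term)
            if count:
                row["mentions"] += count
                row["authors"].add(author)
    return {term: (row["mentions"], len(row["authors"])) for term, row in stats.items()}


# Invented example: one enthusiastic cat person, two dog persons.
posts = [
    ("anna", "kissa kissa kissa on paras"),
    ("ben", "koira on kiva"),
    ("cia", "minun koira haukkuu"),
]
result = compare(posts, "kissa", "koira")
```

In this toy data cats win on mentions while dogs win on distinct authors, so either animal can be declared the winner by choosing which statistic to show.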


Some afterthoughts

Apart from the fantastic results from the demos, the whole event was of course a learning experience. The most important observation is the need for multidisciplinary knowledge and skills within the teams. Without a more general, wider knowledge of the societal phenomena that affect the creation of such social big data in the first place, it is not possible to draw relevant conclusions. Our hypotheses about the jealousy discussion, for instance, are pure speculation for now, but a dive into social psychology research on the topic would probably take us a lot further.

There is also a clear need to better understand the context of the words studied, as their meaning can depend heavily on it. Based on the cats vs. dogs analysis, for instance, we can’t say whether the discussions about cats or dogs are actually pro-cat or pro-dog, or whether people are just complaining about the neighbor’s pet – this would need deeper analysis of the context and tone of the messages.

And of course during two days you will probably not learn that many new skills but rather use the old ones in a new context. So no two-day magic crash courses in Python coding actually happened, but hopefully some broadening of mindscapes for researchers both in the social and the computational sciences!

  • The Suomi24 data set can be explored through FinCLARIN’s Kielipankki Korp interface. The full data set is available for download for research purposes.
  • Follow the Citizen Mindscapes research collective on Twitter.
  • Team members: Rhythms team Pasi Karhu, Limae Phuah, Omar El-Bagawy, Jaakko Suominen, Krista Lagus, Minna Ruckenstein; User Modeling team Antti Rauhala, Krista Lagus; Forecasting team Kimmo Nevanlinna, Timo Nikkilä, Joonas Tuhkuri; Cat vs. Dogs group Matti Nelimarkka, Salla-Maaria Laaksonen.

This post is a cross-posting from Opennorcids.org

Studying multimodality in the Digital Humanities Hackathon

In May 2015 I participated in the first edition of the Digital Humanities Hackathon at the University of Helsinki. During the week four multidisciplinary teams each conducted a small research project with a different dataset. It was a super-intensive week with a lot to learn – about the methods, about coding in Python, and for me as a social scientist about the humanities too!

This blog post is a cross-posting from the Day of Digital Humanities site and summarizes the work of our Multimodality group in the hackathon. Group members were Dragana Cvetanovic, Arja Karhumaa, Pasi Kojola, Salla-Maaria Laaksonen, Taina Laaksonen & Aaro Salosensaari, and our team was guided by Tuomo Hiippala. A summary of the whole week, including the other teams’ work, can be found here.

Background for the study: representations of Finland in Finnair’s in-flight magazines

There has emerged a strong interest in nation branding among both practitioners and academics (e.g. Aronczyk 2013). Media is one prominent environment where nation brands are built and maintained. What is notable is that media content is not exclusively linguistic. Newspapers and magazines, for instance, combine photographs, infographics and typography in a layout to communicate with the reader. This phenomenon is often referred to as multimodality.

Yet most studies have focused exclusively on either the textual or the visual aspects of representing a nation brand. Consequently, the joint contribution of language and images has rarely been given consideration in the study of nation branding.

During the Digital Humanities Hackathon our team adopted a multimodal approach to study how Finland and Finnishness are represented using multiple modes of communication in Blue Wings, the in-flight magazine of Finnair. We propose these articles convey an image of Finland to both business travellers and tourists. Our final research question thus was: what modes of communication are used to represent Finland in Finnair’s in-flight magazines?

Data, methods and lessons learned

Studying multimodality consumes both time and resources, because the way language, images, layout and other modes of communication combine varies from page to page. We learned that when the focus shifts to images or other visual content, the methods also get more complicated: what is easy for humans to see can be very difficult for computers. In practice, we had to code all our tools ourselves during the hackathon.

To locate the pages that mention Finland or Finnishness, we first extracted and searched the text contained in the data set. Having identified the relevant pages using a Python script, we returned to examine their layout. We then used a computer vision algorithm to identify elements on the page and applied machine learning to classify them into two categories: texts and images.
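The first step, finding the pages that mention Finland or Finnishness in the extracted text, could be sketched as below. The search pattern and the page texts are assumptions for illustration; the terms used in the hackathon script are not given in this post. The computer vision and classification steps are not sketched here.

```python
import re

# Assumed search terms. "Finnair" is deliberately not matched, because the
# word boundary after "Finn" fails when more letters follow.
FINLAND_RE = re.compile(r"\b(?:Finland|Finnish(?:ness)?|Finns?)\b", re.IGNORECASE)


def relevant_pages(pages):
    """Return the sorted page numbers whose extracted text matches the pattern."""
    return sorted(number for number, text in pages.items() if FINLAND_RE.search(text))


# Invented page texts keyed by page number.
pages = {
    1: "Welcome aboard this Finnair flight.",
    2: "Finnish design is known for its simplicity.",
    3: "Summer in Finland means long days.",
}
hits = relevant_pages(pages)
```

Whether the airline's own name should count as a mention of Finland is a research decision; here it is excluded so that pages are not selected merely because they name the airline.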


Preliminary results and exploration

First of all, our preliminary analysis shows that most of the content representing Finland is in image format. In Figure 1 the different issues in our data set are presented as rows, arranged by page number. The size of each bubble represents the size in pixels of a particular clip on the page. Blue is for images and black for text. As you can see, the textual parts are concentrated at the end of each issue – on these pages every issue has a section dedicated to in-flight information. Thus, while the visualization is hardly a research finding as such, it does suggest that our computer vision algorithm is identifying images and text properly.


Figure 1. The content of ten different issues portrayed as a bubble map arranged by page number. Visualized with RAW.

Next, to dive deeper into the textual content, we fitted an LDA topic model to the textual parts of the data. Using the algorithm we found twenty different topics. They are listed below in order of prominence.

  1. Life and family
  2. Finland/Finnair info
  3. Nature and ecotravel
  4. Business
  5. Cultural events
  6. Finnair services, aviation
  7. In-flight entertainment
  8. Politics & economy
  9. (broken parts of words only)
  10. Design
  11. Finnair services
  12. Food, eating in/out
  13. Sauna & other Finnish classics
  14. Environment
  15. Culture (music, artists, books)
  16. Customer loyalty program of Finnair
  17. Finnish sports
  18. Wellbeing
  19. Work, companies
  20. Editorial information

Finally, we wanted to look at the modes of communication in which these topics are represented, i.e. to study whether some topics are more visual than others. During the hackathon we only had time to do a preliminary exploration with some of the topics, just to check whether the idea is applicable. Figures 2 and 3 below show examples of the distribution between image and text in two different topics. The charts show that cultural events are presented with more and larger images than sports, and also that the layout on the cultural event pages is more scattered, as the number of clips is larger.
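The comparison described above amounts to grouping the detected clips by topic and summarizing how many there are and how much of their area is images. A minimal sketch, assuming each clip has already been assigned a topic, a kind ("image" or "text") and a pixel area; the data below is invented and does not reproduce the actual figures:

```python
from collections import defaultdict


def mode_summary(clips):
    """Return {topic: (number of clips, image share of total clip area)}."""
    acc = defaultdict(lambda: {"clips": 0, "image": 0, "total": 0})
    for topic, kind, area in clips:
        row = acc[topic]
        row["clips"] += 1
        row["total"] += area
        if kind == "image":
            row["image"] += area
    return {topic: (row["clips"], row["image"] / row["total"]) for topic, row in acc.items()}


# Invented clips as (topic, kind, pixel area) triples.
clips = [
    ("Cultural events", "image", 600),
    ("Cultural events", "image", 300),
    ("Cultural events", "text", 100),
    ("Finnish sports", "image", 200),
    ("Finnish sports", "text", 200),
]
summary = mode_summary(clips)
```

A higher clip count for a topic would correspond to the more scattered layout noted above, and a higher image share to a more visual presentation.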


Future directions

After this preliminary study, the methods developed could be used to study multimodality in the in-flight magazines of all European or other state-owned airlines, to investigate how nationality and nation branding interact in these magazines. In this way we could build a more comprehensive and comparative research setting.

In order to contribute to practice-based fields such as graphic design, the proposed method can be used to trace the development of design conventions, whose understanding is an important aspect of multimodal literacy. The development of various genres of the magazine medium could be traced using the data available from Google Books and the National Library of Finland.

Moreover, the method could be trained to recognize specific elements in graphic design, in order to distinguish between different types of images (photographs, illustrations, information graphics), headers, captions and body text, etc.

But how could we examine the visual content automatically? We could extract prominent nouns on each page and use them to retrieve a set of training images from ImageNet (www.image-net.org). ImageNet contains thousands of images for each noun, which can be used to evaluate the content of the images found on the page.

Further reading: