Lately I’ve been considering why on earth we’re researching Finnish society. Well, naturally the answer is the social relevance of the research. But thinking about internet research and computational social science specifically, we’re handicapped in several ways:
- First, framing the research is always a bit challenging in academia: the questions have to be formulated so that they become interesting for a wider audience. That means figuring out which current hot discussion you want to take part in, and which theories and literature you engage with. This, together with the somewhat strange reward mechanisms in academia, is currently being debated, but I’m not an expert on those.
In some cases, Finnish society allows for some super-cool frames naturally: showing that a grand theory does not really work in some strange society called Finland. In others, taking part in the discussion just means you need to contextualize the work to ensure that everyone (that is, the reviewers) understands how the research was done, what the relevant phenomena are, and so on.
But these are more practical issues, and not specific to those studying strange new worlds. That’s just everyday academia. There are, however, two more hands-on details which make our life harder than that of researchers in larger, more important countries.
- Collecting data is often a bit painful and takes resources: time and money. Compared to our competitors in the United States, we have it harder; I’ll present just a few interesting examples.
In our Digivaalit 2015 project, we’ve been collecting news from Finnish media during the elections. We’ve put some effort into writing our own parsers for media sites and hunting down news. I know we don’t have them all, but we have a fair share of them. In the States, I would use LexisNexis or a similar service, which would just allow me to get the data. Well, that’s at least my idea of how it would work.
Similarly, in the States many interesting basic statistics describing Internet use (such as those from Pew Research) are easily available. We do have something similar from Statistics Finland, but those statistics aren’t that timely, nor do they cover newer, emerging phenomena that often. Well, someone should gather the money to develop something along the lines of the Oxford Internet Survey and run it in Finland.
However, it’s not only bad news. The official registers in Finland are of high quality (like in Europe generally) and may allow interesting new research (e.g. Martikainen et al. 2005).
- But let’s not whine more about data collection. Sadly, this is not the only area where we’re underdogs. In computational data analysis, and in oh-so-trendy text mining, data processing is not nice in the Finnish language. Or as Arto and I wrote in our 2015 work: “Finnish is an agglutinative language, in which words are formed by joining affix morphemes to the stem, the language is complex to analyze”.
For example, my favourite language, Python, has excellent tools for the English language, but for Finnish, not so much. The pre-existing useful things, like sentiment analysis and word categorisations à la LIWC, are there for many languages, but not for us. That means using these requires either nasty preparatory work or just switching the context. I’ve started to do some of my analysis on English speech just because it is easier to start with.
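To make the agglutination point concrete, here is a minimal sketch of what goes wrong with the simplest possible word counting. It is only an illustration, not the pipeline we used: the function name `naive_counts` and the two toy sentences are made up for this example. The Finnish forms are the inessive, elative and illative of “talo” (house).

```python
from collections import Counter


def naive_counts(text):
    """Count surface forms: split on whitespace, strip punctuation, lowercase.

    Hypothetical helper for illustration only; no stemming or lemmatization.
    """
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    return Counter(t for t in tokens if t)


# Toy sentences (assumed examples): the same three meanings in both languages.
english = "in the house, from the house, into the house"
finnish = "talossa, talosta, taloon"

en = naive_counts(english)
fi = naive_counts(finnish)

# English keeps the stem as its own token, so it is counted three times.
print(en["house"])

# Finnish glues the case ending to the stem, so every form is a separate token
# and the base form "talo" never appears at all.
print(sorted(fi.items()))
print(fi["talo"])
```

In English the word “house” simply adds up, while in Finnish the same concept is scattered over three different tokens. Anything built on word counts, from word lists à la LIWC to topic models, needs a morphological analysis step first to bring those forms back together.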
So, all in all, I’m not sure how much of a head start we give to colleagues who work in a more familiar context. They don’t need to put enormous effort into boring data collection, which matters especially when the data is somewhat peripheral to the research question at hand but makes the argumentation easier, and they can apply the newest and fanciest tools in computational data analysis. As boyd & Crawford (2012) suggest, one challenge of the big data era in academia is data access. I would continue that line of thought: it is also about the tools that are available and the infrastructures that are in place.