Situated Data Analysis
My latest paper, “Situated data analysis: a new method for analysing encoded power relationships in social media”, started out as an analysis of the data visualisations in Strava, but ended up as something more ambitious: a method that I think can be used to analyse all kinds of digital platforms that use personal data in different contexts. Here is a video teaser explaining the key points of situated data analysis:
This paper has been a long time in the works. It started off as part of the INDVIL project on data visualisations, where I was tasked with thinking about the epistemology of data visualisations. Working through revision after revision of my analyses of data visualisations in Strava, I found that what really interested me about Strava was the many different ways that the personal data it collects from runners and cyclists are presented, or, more precisely, how the data are situated. Once I’d analysed the different ways the Strava data were situated, I realised that the same method could be applied to any social media platform, app or digital infrastructure that uses personal data. So I decided to change the focus of the paper so it was about the method, not just about Strava.
Donna Haraway coined the term situated knowledges in 1988 to demonstrate that knowledge can never be objective, that it is impossible to see the world (or anything) from a neutral, external standpoint. Haraway calls this fiction of objectivity “the god trick,” a fantasy that omniscience is possible.
With Facebook and Google Earth and smart homes and smartphones, vastly more data is collected about us and our behaviour than when Haraway wrote about situated knowledges. The god trick as it occurs when big data are involved has been given many names by researchers of digital media: Anthony McCosker and Rowan Wilken write about the fantasy of “total knowledge” in data visualisations, José van Dijck warns against an uncritical, almost religious “dataism”, a belief that human behaviour can be quantified, and Lisa Gitelman points out that “Raw Data” is an Oxymoron in her anthology on the digital humanities. There is also an extensive body of work on algorithmic bias, analysing how machine learning using immense datasets is not objective but reinforces biases in the datasets and inherent in the code itself (there are heaps of references to this in the paper itself if you’re curious!).
Situated data analysis provides us with a method for analysing how data is always situated, always partial and biased. In my paper I use Strava as an example, but let’s look at a different kind of data: how about selfies?
When you take a selfie and upload it to Facebook or Instagram, you’re creating a representation of yourself. You’ve probably taken selfies before, and you’ve probably learnt which camera angles and facial expressions tend to look best and get you the most likes. Maybe you’re more likely to get likes if you post a selfie with your friends, or in front of a tourist attraction, or wearing makeup – and probably what works best depends on your particular community or group of friends. You’ve internalised certain disciplinary norms that are reinforced by the likes and comments you get on your selfies. So at this level, there’s a disciplinary power of sorts. This is the first layer, where your selfie is situated as a representation of yourself that you share with friends or a broader audience.
Facebook and Instagram and other platforms will show your selfie to other people for their own purposes, of course. Their goal is to earn money from advertisers by showing people content that will make them spend more time on the platform, and also by gathering personal data about users and their behaviour. Here your selfie is situated differently, as a piece of content shown to other users. There is an environmental power at work here – the environment (in this case the social media feed) is being altered in order to change people’s behaviour – for instance, I might pause to look at your selfie and then happen to notice the ad next to it.
A third level at which the data of your selfie is situated happens when Clearview illegally scrapes your selfie, along with three billion other images, and uses them as a dataset for training facial recognition algorithms. Next time you are at a protest or rally, the police or an extremist group might use Clearview to identify you. Perhaps in the future you’ll be banned from entering a shop or a concert or a country because your face was identified as being at that protest. Maybe a nightclub or a university will have a facial recognition gate and will only let in people without a history of attending protests. Obviously this was not something you had in mind when you uploaded that selfie – but it’s an example of how placing data in new contexts can make a huge difference to what the data means and what it can be used for. A facial recognition gate that refused entry to people would also be a kind of environmental power – the environment is simply changed so you can no longer act in certain ways.
Although I started this paper in the INDVIL project, I finished it while working on the Machine Vision project, and the selfie example is just one illustration of how situated data analysis is relevant for machine vision technologies.