How to model a social network of electronic literature
I’ve been following Lada Adamic’s MOOC* on Social Network Analysis for a few weeks now and am loving it. As I learn more about networks I’m realising how many ways there will be to visualise the connections in the ELMCIP Knowledge Base of Electronic Literature, which is the data we’re itching to analyse in our research group. We have records of authors, events, creative works, critical works, journals, syllabi and more, and they’re all cross referenced and have additional information such as which country an author lives in or an event was held in, which works were performed at which events or published in which journal, and which creative works were referenced by which critical works. So there’s lots of data, and we’re adding more every day.
The more I learn about social network analysis, however, the more I realise that there are many different ways of going about modelling a social network, and that different approaches may (or may not?) give quite different results.
At first I thought that of course the authors are the nodes, and the connections between them (the lines in the graph, which are called edges in the course I’m taking) represent their having presented at the same conference or collaborated on an article or a creative work.
But how do you get there from the data? I think it needs to be far simpler, at least to begin with:
- Conferences are nodes and are connected when people have attended both conference A and conference B.
- People are nodes and are connected if they have collaborated on something, attended the same event, or published in the same issue of a journal.
- Tags are nodes and are connected if they are used to describe the same work.
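All three of the options above boil down to the same operation: take records of "this thing belongs to that group" and connect any two things that share a group. Here is a minimal sketch of that idea in plain Python. The attendance records, the names and the helper functions `co_occurrence_edges` and `invert` are all made up for illustration; none of this is actual Knowledge Base data or code.

```python
from collections import Counter
from itertools import combinations


def co_occurrence_edges(groups):
    """Map each pair of members to the number of groups they share."""
    edges = Counter()
    for members in groups.values():
        # Sorting gives every pair one canonical (a, b) ordering.
        for a, b in combinations(sorted(set(members)), 2):
            edges[(a, b)] += 1
    return edges


def invert(groups):
    """Turn group -> members into member -> groups."""
    inverted = {}
    for group, members in groups.items():
        for member in members:
            inverted.setdefault(member, set()).add(group)
    return inverted


# Hypothetical attendance records: event -> people who presented there.
attendance = {
    "Conference A": {"Ana", "Ben", "Chi"},
    "Conference B": {"Ben", "Chi", "Dee"},
    "Conference C": {"Dee"},
}

# People as nodes, connected when they attended the same event.
people_edges = co_occurrence_edges(attendance)

# Conferences as nodes, connected when the same person attended both.
conference_edges = co_occurrence_edges(invert(attendance))

print(people_edges)
print(conference_edges)
```

The edge weight is simply the number of shared groups, so two conferences with many presenters in common get a heavier edge. The same function would work for tags if you fed it a mapping from each work to its tags.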
I’m probably not even thinking quite straight about this yet, because I’m certainly a beginner in network analysis, but at least I’m beginning to see the sort of problems that need to be solved.
I came across a blog post by Neal Caren about two different approaches to modelling a citation network between articles in economic sociology. His model is a kind of response to a different model that Dan Wang had described. Neal describes the two different strategies thus:
While Dan constructed edges based on two articles being mentioned in the same week of a syllabus, I went with a relationship based on two works being cited in the same journal article.
You can see Caren’s graph here, and if you hover over the nodes you’ll see the title of the article. Caren doesn’t really discuss how or whether his results were particularly different from Wang’s, and I’m not a sociologist so it’s not too easy for me to interpret the networks. Basically, both models show clusters of articles that were either taught together or cited together and thus can be seen as closely related. There are also articles that are bridges or “brokers” between two communities or clusters in the network. At least in theory, these brokers should be where information flows between different subfields.
Here’s Dan Wang’s network structure of articles based on whether they were mentioned in the same week of a syllabus (from “Is there a Canon in Economic Sociology?” in ASA Economic Sociology Newsletter 11(2), May 2012):
Wang describes the network like this:
The largest cluster, at the bottom right of the figure, centers on Granovetter (1985), which is closely linked to work that has both reacted to and operationalized its ideas (Krippner 2002 and Uzzi 1997, respectively). On the left in figure 1 is a group of readings that are notably older, clustered around Polanyi’s The Great Transformation (1944), representing the intellectual antecedents of economic sociology. At the top of the network, a cluster of readings anchored by Zelizer (2005) and Zelizer (1978) represent cultural approaches to economic sociology. Finally, an island of readings linked to MacKenzie and Millo (2003) along with Callon (1998) represent perspectives on the performativity of markets in the upper right of Figure 1.
Just as important as these “hub” references above, however, are the “broker” references that link these different subfield camps together. For example, Geertz (1978), at the center of the network in Figure 1, plays an important role bridging relational/ embeddedness views to more cultural perspectives. As such, in culling a set of canonical references from this network representation, we privilege not only those references that are most emblematic of a given tradition, but also the bridging references that give these different territories of economic sociology some measure of coherence and mutual relevance.
Of course, an edge (a connection between two nodes) in either of these models does not necessarily mean that the author(s) of one article was aware of the other article. So what does it really mean that an article is a “broker” between two communities? For students or readers, presumably these are the articles that are seen as relevant to more than one sub-field of sociology. Does that make them more valuable? More general? Are they also the most-cited articles, or not?
Anyway, we could certainly do something like this for electronic literature, for instance looking at authors who presented at the same conferences rather than articles taught in the same week of a course. But how useful would it be?
Here’s a completely fictional little network of what I imagine the clustering of conferences might be like. I’m sure this is quite wrong, really, I just plonked a few conferences from the same time period into Gephi and drew some edges between them, imagining what a network might look like if we try to look at conferences as nodes and people presenting at two conferences as an edge:
Here the size of each conference’s node increases with its betweenness centrality. Basically, betweenness measures how often a node lies on the shortest route between two other nodes. The more “shortest paths” a node is on, the higher its betweenness centrality. High betweenness suggests that the node is a broker between different communities.
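To make the betweenness idea concrete, here is a rough sketch that computes it by hand for a tiny made-up network. It uses Brandes’ algorithm, a standard way of counting shortest paths, and is not what Gephi necessarily does internally. The node names and the `betweenness` function are hypothetical, and the graph is assumed to be undirected and unweighted.

```python
from collections import deque


def betweenness(adjacency):
    """Unnormalised betweenness for an undirected, unweighted graph."""
    score = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from source.
        order = []
        predecessors = {v: [] for v in adjacency}
        paths = {v: 0 for v in adjacency}
        distance = {v: -1 for v in adjacency}
        paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:  # first time we reach w
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:  # v is on a shortest path to w
                    paths[w] += paths[v]
                    predecessors[w].append(v)
        # Walk back from the farthest nodes, sharing out credit.
        dependency = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Every undirected pair was counted once from each end.
    return {v: s / 2 for v, s in score.items()}


# Hypothetical network: two small clusters joined only through "broker".
network = {
    "a1": {"a2", "broker"},
    "a2": {"a1", "broker"},
    "b1": {"b2", "broker"},
    "b2": {"b1", "broker"},
    "broker": {"a1", "a2", "b1", "b2"},
}

scores = betweenness(network)
print(scores)
```

In this toy network every shortest path between the "a" cluster and the "b" cluster runs through the broker node, so it is the only node with a non-zero score.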
So this is obviously a completely fictional network model at this point. But if we can generate something like this with the real data – will it tell us something interesting?
I suppose we’ll have to try it in order to find out. I’d love input from you if you have ideas or know resources I should be looking at!
* I first signed up for a MOOC because I wanted to see how terrible they are, but it turns out they can be really well done – this one has you fiddling around with NetLogo models of toy networks every few minutes to answer quiz questions in the middle of the video lecture. Lots of fun and I’m learning a lot.