Visiting Stanford Literary Lab
Franco Moretti’s work on “distant reading” (or macroanalysis) has been very inspiring as we have explored what kinds of research we can do using the ELMCIP Electronic Literature Knowledge Base, especially his book Graphs, Maps, Trees: Abstract Models for a Literary History. Moretti generously supported a grant application for developing visualizations of the data in the Knowledge Base last year (we didn’t get the grant, but are still working on it) and we were eager to visit the Stanford Literary Lab, which is led by Moretti and Matthew Jockers, to learn more about their work and hear some of their thoughts about our Knowledge Base.
After a beautiful drive up the Californian coastline from Santa Barbara we arrived at Stanford on Thursday afternoon and easily found the Literary Lab.
Matthew Jockers’ work is primarily on text mining of literary texts, and he immediately started thinking about whether and how one could do text mining of electronic literature. Scott and Jeremy Douglass also discussed this earlier this week in San Diego. One problem is that works of electronic literature are very heterogeneous compared to most traditional literary genres, which makes it difficult to draw patterns out of large quantities of texts. (Jockers noted that the same is true of modernism, which is why there is very little digital humanities work in that field other than on individual authors.) On the other hand, as Scott and Jeremy Douglass had discussed, and as Scott and Andrew Salway have discussed in Bergen, it would be possible to automatically extract and compare particular code structures, functions, uses of visuals and so on. There are also genres of electronic literature that are (or have been) relatively stable, such as Flash poetry, interactive fiction and hypertext narratives.
But the ELMCIP Knowledge Base as it stands now is primarily focused on metadata rather than on the texts themselves. We show the connections between creative works and the events they were presented at, the critical writing that discusses them, the organizations their authors are affiliated with, the journals in which they were published and so on, and being able to visualize all this in a social network graph would be immensely useful. And although the works themselves might be characterized as having “extreme morphological variation”, as Jockers or Moretti said, the network of connections is homogeneous. For instance, as Moretti put it, all the authors in our database are humans, most are alive, and many live in Europe or the US.
Jockers suggested that simply showing a graph view where different categories of node were displayed in different colours (blue for creative works, green for critical works and so forth) would be useful. We’d also like to be able to see how nodes would cluster. How important are geographical locations in the community, for example?
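As a rough illustration of the colour-coded graph view Jockers described, here is a minimal Python sketch that writes a network of records out in Graphviz's DOT format, filling each node with a colour for its category. Only blue for creative works and green for critical works come from the conversation; the category names, the other colours, the function name `to_dot` and the placeholder records ("Work A" and so on) are hypothetical, and this is not how the Knowledge Base itself stores or exports its data.

```python
# Hypothetical category -> colour mapping. Blue and green follow Jockers'
# suggestion; the other categories and colours are illustrative assumptions.
CATEGORY_COLOURS = {
    "creative_work": "blue",
    "critical_writing": "green",
    "event": "orange",
    "person": "grey",
}


def quote(node_id):
    """Quote a node name for DOT, escaping any embedded double quotes."""
    return '"' + node_id.replace('"', '\\"') + '"'


def to_dot(nodes, edges):
    """Return an undirected DOT graph.

    nodes: dict mapping node name -> category
    edges: list of (name, name) pairs, e.g. a work and an event it was shown at
    """
    lines = ["graph knowledge_base {"]
    for node_id, category in nodes.items():
        # Unknown categories fall back to black so they stand out.
        colour = CATEGORY_COLOURS.get(category, "black")
        lines.append(f"  {quote(node_id)} [style=filled, fillcolor={colour}];")
    for a, b in edges:
        lines.append(f"  {quote(a)} -- {quote(b)};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder records, not real Knowledge Base entries.
    nodes = {
        "Work A": "creative_work",
        "Article B": "critical_writing",
        "Festival C": "event",
    }
    edges = [("Work A", "Article B"), ("Work A", "Festival C")]
    print(to_dot(nodes, edges))
```

Saving the output to a file such as `kb.dot` and running `dot -Tpng kb.dot -o kb.png` (or a force-directed layout like `neato`, which may show clustering more clearly) renders the graph. DOT was chosen here only because it is plain text and easy to generate without extra libraries.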
“There’s not really a long enough time span to identify trends,” he said, “but would you say that there was a steep increase and then a stabilization?” The problem with our data, of course, is that we don’t really know how representative it is, other than that we most certainly do not have an entry for each actual work of electronic literature in the world. The data is skewed by the interests of contributors and by our knowledge. For instance, we know there is a great deal of Brazilian electronic literature but we have almost none of it registered in the Knowledge Base. Moretti pointed out that with enough data over time, you can link literary trends to events like wars or social upheaval, but you can’t really do that with a fifteen-year time span.
Moretti did suggest comparing this graph to the publication of critical works, which we’ll certainly try next. Graphs like these are easy to generate by dumping a spreadsheet from the Knowledge Base and playing with it in Google Fusion Tables or even Excel.
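For anyone who prefers a script to a spreadsheet, the same per-year counts can be produced from a CSV dump in a few lines of Python. This is a minimal sketch that assumes the export has columns named `type` and `year`; the real Knowledge Base export may use different column names, and the sample rows below are invented placeholders, not actual data. It counts creative works and critical writing separately, so the two series can be compared year by year as Moretti suggested.

```python
import csv
import io
from collections import Counter

# Invented sample rows standing in for a Knowledge Base CSV export.
# The column names "type" and "year" are assumptions.
SAMPLE_CSV = (
    "title,type,year\n"
    "Work A,creative_work,1997\n"
    "Work B,creative_work,1997\n"
    "Article C,critical_writing,1999\n"
    "Work D,creative_work,2001\n"
    "Work E,creative_work,\n"
)


def count_per_year(rows, record_type):
    """Count rows of one record type per year, skipping rows with no year."""
    counts = Counter(
        int(row["year"])
        for row in rows
        if row["type"] == record_type and row["year"].strip()
    )
    # Sort by year so the result reads as a time series.
    return dict(sorted(counts.items()))


# Read the CSV once into a list so it can be counted several times.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
creative = count_per_year(rows, "creative_work")
critical = count_per_year(rows, "critical_writing")
print(creative)  # {1997: 2, 2001: 1}
print(critical)  # {1999: 1}
```

To use a real export, replace `io.StringIO(SAMPLE_CSV)` with `open("export.csv", newline="")` (the filename is hypothetical). Records without a year are skipped rather than guessed, which matters given how uneven the coverage of the data already is.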
But if trends are difficult to track over such a brief timeframe, Moretti suggested that typologies would be easier to spot. While trends link literary history to social events outside the texts themselves, typologies are internal to the data and allow us to identify genres and themes. Working with typologies requires a different and much more precise approach to the data than tracking trends does.
The Stanford Literary Lab mostly researches 19th-century literature, because that is what they have good corpora for. Until recently, we haven’t had access to corpora or metadata for electronic literature, because there have been no systems for documenting these works. Libraries haven’t known what to do with them, apart from the few works that are published on disk or CD-ROM with an ISBN. Hopefully the data in the ELMCIP Knowledge Base will allow future scholars to engage more with electronic literature as well as with the classics.