Learning R for visualising humanities data

Jill | February 17, 2022 | June 16, 2022 | 3 Comments

I think you should learn R! No really – I’ve spent the last 6-7 weeks learning R so I can visualise the data we’ve collected in the Database of Machine Vision in Art, Games and Narratives, and it’s not as hard as I’d imagined, and I’m thrilled at all I can do with it.

Previously I’ve used Gephi and Excel, and I guess I thought it would just be too hard to learn a programming language, but honestly, it’s been a blast learning R and I wish I’d started ages ago. This blog post is a collection of the resources I’ve found most useful in teaching myself. I hope it helps other humanities scholars who would like to learn R.

But why learn R?

R is a powerful tool for creating data visualisations and analysing data. The basics are pretty simple, and depending on what you want to do you’ll find specialised tools for almost anything, from statistics, to network analysis, to mapping or interactive graphs and more.
The community is huge, and R is one of the standard tools in data science in both academia and industry. There are lots of tutorials and examples. If you’re stuck, post your question on a site like Stack Overflow and you’ll often get answers within minutes.
R makes the analysis process explicit. I can publish the scripts I used to generate a visualisation or a table, and other people can check my steps or build upon my work to do something similar with a different dataset. It makes it easy to follow FAIR principles: making data Findable, Accessible, Interoperable and Reusable. You can even publish an R notebook with embedded code so people can see (and reuse) exactly what you did. Here’s an example I’m working on with our data about machine vision in art, games and narratives. I’ve been learning a lot about FAIR data by working with Jenny Ostrop, who has been helping me work out how to best format and present our data when we deposit it. This is still in progress since I’m still figuring stuff out.
It’s fun! Kind of like the puzzle appeal of Wordle, but much more satisfying. Because each step is reasonably simple and you can find so many tutorials and examples, you keep feeling that rush of figuring out something new! And then you see something else you’d like to learn and get excited because you actually know what you need to do to figure that out.

If you want to see where I’m up to after six weeks, here is an R notebook with the code and visualisations I’ve been working on just to sort out my data, and here are the network visualisations I’ve been working on this week. It’s very much work-in-progress, but it shows that working through various tutorials with my own data is a really productive way of exploring that data – and the network analysis is starting to give unexpected but generative findings that I want to explore more.

How to get started with R

Here’s my recommended progression for a humanities or social science scholar who wants to learn how to visualise categorical data or textual data rather than numeric data.

I started on Coursera, found some tutorials, and then was lucky enough to find Jeffrey Tharsen’s Data Analysis for Linguistic, Cultural, and Historical Research course at the University of Chicago, where I’m a visiting scholar this semester. Jeffrey has let me audit the course, which has been great. There are lots of online courses and tutorials for general data science and for people with programming backgrounds, but when I was starting out (all of six weeks ago) I found it hard to figure out how to analyse textual data and categorical data like we have in our database, because most of the standard tutorials use numerical datasets and do statistical analyses. This list is for people who are more interested in finding patterns and sorting through categories and words than in figuring out the mean or the standard deviation of census data.

1. Take the Coursera course The Data Scientist’s Toolbox to learn how to set up RStudio and GitHub. You don’t have to do this, but I was so glad that I did, because it made everything else much simpler. Weeks 2 (setting up RStudio) and 3 (version control and GitHub) are the most important, although the material on R notebooks and sharing your process is really good too. Coursera has little videos and quizzes, and this course is nicely set up. It says 18 hours over 4 weeks, but I think I spent about 6-7 hours, doing most of the tasks. If you don’t want to take this course, the bare minimum you need is to install R and RStudio on your computer, and you can Google other tutorials for that.
2. Start visualising! Start with Chapter 3: Data visualisation in R for Data Science. This walks you through creating your first data visualisations – and it’s such a fun way to get started. They use mpg, a dataset bundled with the ggplot2 package, which is used a lot in R tutorials. It’s very much numerical data, lots of stats about car models, so the visualisations it produces aren’t much like what many of us need in the humanities, where we more often use textual or categorical data. But it gets you doing so much so fast.
3. (Optional) Here is another tutorial specifically for digital humanities. I did this one before I found the Data visualisation chapter in step 2, and that worked – I just think going straight to the visualisation instead of starting with data organisation would be more fun. This is definitely a helpful tutorial though.
4. Learn the difference between base R and the Tidyverse. Base R is the stuff that’s been in R for decades. The Tidyverse is a collection of packages with really easy-to-use visualisation and analysis tools, and I highly recommend focusing on this. However, you want to learn a bit of the base syntax and what a function is and so on, even though you may not really need it much. This free Introduction to R course on Datacamp takes about 4 hours and goes through the basics in an easy-to-grasp way. You could start here instead of with my recommended steps 1 and 2, but I think you’ll have more fun if you see the potential before learning how to subset a data frame or write a function.
5. Work through chapter 4 Data Visualisation in R to learn how to organise your datasets using the Tidyverse system, and/or try Rob Kabacoff’s book Data Visualization with R, which has lots of good information. For my data, looking at how to visualise different kinds of categorical data was really helpful, and Kabacoff has lots of examples. Kieran Healy’s Data Visualization: A Practical Introduction is also excellent. The trick with any of these is to skim to identify the bits that look interesting to you, based on the kinds of data you want to work with. Then download the datasets they use and work through the examples, following the book exactly. If you have a dataset of your own, try adapting the same scripts to use on your data.
6. If you want to learn network visualisation in R, Katherine Ognyanova’s Network Visualization with R tutorial is brilliant. David Schoch’s Network Visualizations in R is good too, and if, like me, you want to convert bipartite networks to one-mode networks, Phil Murphy and Brendan Knapp’s instructions from Bipartite/Two-Mode Networks in igraph will help. And of course you can look at the code I used for my analysis, and even try running the code with our dataset – though be aware it’s in progress!
7. If taking the full text of a novel and analysing that is your goal, you could try Matthew Jockers and Rosamond Thalken’s Text Analysis with R: For Students of Literature (see if your library has the second edition online), but be aware they mostly use base R instead of the Tidyverse. I did a bit of this but it’s not my main interest so I haven’t delved deep.
8. This week the topic in Jeffrey Tharsen’s class is machine learning using R. I haven’t really tried this but plan to. Here is a tutorial Jeffrey recommended. I’ve read a few chapters of the textbook Jeffrey’s assigned (Brett Lantz: Machine Learning with R, Packt, 2015), but it’s not open access. If your university library has electronic access, you may want to take a look though.
9. When you’re stuck, post a question to Stack Overflow. Read a few questions first to see how to ask a question that’s easy to answer – you want to provide a “miniature” dataset and show the code that you’ve tried that’s not working.
10. Finally, I’m really enjoying using R Notebooks to write up my work-in-progress in a way where the code itself is embedded within regular text, and can easily be exported to HTML, PDF, Word or LaTeX. Here’s an in-progress version of the network analysis I’ve been working on this week – if you click on the CODE buttons on the right above each visualisation you can see exactly what I did, and even copy it and try it out yourself. Here is a detailed guide to how to do this, though the basics are very easy and don’t require much.
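
To make the difference between base R and the Tidyverse mentioned above a little more concrete, here is a minimal sketch that counts a categorical column both ways and then draws a bar chart of the counts. It uses the mpg dataset that ships with ggplot2 and assumes the ggplot2 and dplyr packages are installed. The object names (base_counts, tidy_counts, class_plot) are made up for illustration and do not come from any of the tutorials listed.

```r
# Assumes ggplot2 and dplyr are installed, e.g. via install.packages("tidyverse")
library(ggplot2)
library(dplyr)

# Base R: table() counts how often each category occurs
base_counts <- table(mpg$class)

# Tidyverse: count() does the same and returns a data frame (tibble)
tidy_counts <- mpg %>%
  count(class, name = "n_models") %>%
  arrange(desc(n_models))

# Bar chart with one bar per category; after coord_flip() the largest
# category ends up at the top
class_plot <- ggplot(tidy_counts,
                     aes(x = reorder(class, n_models), y = n_models)) +
  geom_col() +
  coord_flip() +
  labs(x = "Vehicle class", y = "Number of car models")

print(class_plot)
```

The same pattern (count a category, then plot the counts) should carry over to a humanities dataset by swapping mpg and class for your own data frame and column.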

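For the bipartite-to-one-mode conversion mentioned above, the following is a rough sketch using igraph’s bipartite_projection(). The data is an invented toy example (hypothetical works and technologies), not the Machine Vision dataset, and the variable names are illustrative only.

```r
# Assumes the igraph package is installed: install.packages("igraph")
library(igraph)

# Hypothetical toy data: which (made-up) works feature which technologies
edges <- data.frame(
  work = c("Work A", "Work A", "Work B", "Work B", "Work C"),
  technology = c("Drones", "Facial recognition",
                 "Facial recognition", "AR", "AR")
)

# Build an undirected two-mode network from the edge list
g <- graph_from_data_frame(edges, directed = FALSE)

# bipartite_projection() needs a logical `type` vertex attribute:
# FALSE for works, TRUE for technologies
V(g)$type <- V(g)$name %in% edges$technology

proj <- bipartite_projection(g)

works_net <- proj$proj1  # works are linked if they share a technology
tech_net  <- proj$proj2  # technologies are linked if they share a work
```

In this toy example Work A and Work B end up connected because both feature facial recognition. Either projection can then be plotted or analysed like any other one-mode igraph network.
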
Btw, the comments on my blog aren’t working right now, but if you have questions or suggestions, feel free to ask me on Twitter – I’m @jilltxt.

3 thoughts on “Learning R for visualising humanities data”

June 7, 2022 00:39

jill

test

December 6, 2022 22:47

KDS

Python and R are currently the programming platforms of choice for data science in practically all domains, and rightfully so. By the way, Python visualization packages can do just about all the stuff R packages can do. Both programming languages are taught in the Linguistics bachelor’s program at the University of Bergen, mainly contextualized in language processing of course. Unfortunately, training in these tools is still lacking in most other undergraduate and graduate programs in the Humanities. Introductions to Python and R (or at least one of these) should be a freshman requirement for all disciplines, to provide a basis in algorithmic thinking, quantitative analysis and visualization that every student needs.

1. December 7, 2022 06:57
  
  Jill
  
  Thanks for the comment! We’ve been discussing which programming languages should be taught in the Digital Culture program. Currently it’s JavaScript and HTML and two hours of BASIC (for the history, not because it’s a language they’ll need to code in). Quite possibly we should switch to or add Python. Btw, ChatGPT (and Codex) can also write code for you and help you troubleshoot code that doesn’t work. I don’t think that replaces learning how to program yourself, but it could be a wonderful support.
  
