Data analysis for a paper based on data from the Machine Vision in Art, Games and Narratives Database.

Jill Walker Rettberg

This is an “R Notebook”, which produces an HTML page that has code embedded in it. I am using this to document my analysis of data from the Machine Vision in Art, Games and Narratives database that our research team has developed.

The data describe the characters that interact with machine vision technologies in the 500 creative works we have analysed in the database. For more information, check out the database!

Load the Tidyverse library

The first step is to load the Tidyverse library into R. If you’re not familiar with R and R notebooks like this one, don’t worry: the lines starting with ## below are not code, just the messages the console prints when you load a library.

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.1.1     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Read the file Characters.csv into R

The next bit of code simply reads the file Characters.csv, while specifying “factors” for some of the columns. Factors are fixed vocabularies: they specify the fixed set of possible values that a variable (here, a column) can have.

# Import the characters file (Characters.csv)
# and define column types and factor levels
Characters <- read_csv(
  "Characters.csv",  # assumed path: the text names the file but not its location
  col_types = cols(
    CharacterID = col_integer(),
    Character = col_character(),
    Species = col_factor(levels = c(
      "Animal", "Cyborg", "Fictional",
      "Human", "Machine", "Unknown")),
    # The level lists for Gender, Age and Sexuality may be incomplete;
    # any value not listed is read as NA with a parsing warning.
    Gender = col_factor(levels = c(
      "Female", "Male", "Non-binary or Other", "Trans Woman")),
    RaceOrEthnicity = col_factor(levels = c(
      "Asian", "Black", "White", "Person of Colour", "Immigrant", "Indigenous",
      "Complex", "Unknown")),
    Age = col_factor(levels = c(
      "Child", "Young Adult", "Adult", "Elderly")),
    Sexuality = col_factor(levels = c(
      "Homosexual", "Heterosexual", "Bi-sexual", "Other")),
    IsGroup = col_logical(),
    IsCustomizable = col_logical()
  )
)
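
To see what a fixed vocabulary does in practice, here is a minimal sketch using base R's factor() on made-up values (the vector species_raw is hypothetical and not taken from the database). A value outside the declared levels becomes NA; readr's col_factor() behaves similarly when levels are given, and also raises a parsing warning.

```r
# Toy values, not taken from the database.
species_raw <- c("Human", "Machine", "Human", "Robot")

# Declare the allowed vocabulary, as the levels = c(...) arguments do.
species <- factor(species_raw, levels = c("Human", "Machine", "Cyborg"))

# The fixed vocabulary, including the unused level "Cyborg".
levels(species)

# "Robot" is not a declared level, so it is counted as NA.
table(species, useNA = "ifany")
```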

Looking at the data

Now we have a “tibble”, which is essentially a table, the same as you’d see if you opened the CSV file in a spreadsheet editor. The tibble is called “Characters”.

If you type Characters in the R console window you’ll see the first rows of the table, which give you an overview of all the characters who interact with machine vision technologies in the 500 creative works we analysed.


Different species

We can use the barplot function to generate a bar chart showing the distribution of species. Unsurprisingly, humans dominate, but there are also various other species. There are few cases where the character’s species is unclear.
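
A minimal sketch of such a chart, assuming the counts come from table() and using a small made-up data frame (toy_characters is a hypothetical stand-in for the Characters tibble):

```r
# Hypothetical stand-in for the Characters tibble; the values are made up.
toy_characters <- data.frame(
  Species = factor(
    c("Human", "Human", "Machine", "Cyborg", "Human", "Unknown"),
    levels = c("Animal", "Cyborg", "Fictional", "Human", "Machine", "Unknown")
  ),
  IsGroup = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE)
)

# table() counts the rows in each level; barplot() draws one bar per count.
species_counts <- table(toy_characters$Species)
barplot(species_counts, cex.names = 0.8)
```

With the real data, the same two lines would use Characters$Species in place of toy_characters$Species.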


Some of these are groups of characters rather than individuals. Ninety of the characters in the dataset are actually groups, and when a group includes members of different species we tagged its species as Unknown.

Let’s run the code again but without the group characters. IsGroup is a variable (i.e. a column) in Characters that has the value TRUE if the character is a group and FALSE if it’s an individual.

This code uses a subset of the Species column: adding [Characters$IsGroup==FALSE] means that only the rows where IsGroup is FALSE are used.
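
A minimal sketch of that subsetting, using a small made-up table rather than the real data (the names toy_characters and individual_species are hypothetical):

```r
# Hypothetical stand-in for the Characters tibble; the values are made up.
toy_characters <- data.frame(
  Species = factor(
    c("Human", "Human", "Machine", "Unknown"),
    levels = c("Human", "Machine", "Unknown")
  ),
  IsGroup = c(FALSE, TRUE, FALSE, TRUE)
)

# Keep only the Species values from rows where IsGroup is FALSE.
individual_species <- toy_characters$Species[toy_characters$IsGroup == FALSE]

# The level "Unknown" is still listed, but its count is now zero.
barplot(table(individual_species))
```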


Characters %>% 
  group_by(Species) %>% 
  summarise(n = n())  # assumed final step (the original is cut off): count rows per species

Filter out the Machine and Cyborg characters

Now we want to look at just the Machine and Cyborg characters. Let’s also group them by gender.

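# Parentheses are needed because & binds more tightly than | in R;
# without them, Cyborg groups and customizable Cyborgs would be kept.
Machine_Cyborg_Characters <- filter(
  Characters,
  (Species == "Cyborg" | Species == "Machine") & IsGroup == FALSE & IsCustomizable == FALSE)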
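
One thing to watch when mixing | and & in a condition like this: R evaluates & before |, so the alternatives need their own parentheses if the remaining conditions should apply to both of them. A small sketch on made-up vectors (the names species and is_group are illustrative only):

```r
# Made-up values for three characters.
species  <- c("Cyborg", "Machine", "Machine")
is_group <- c(TRUE, FALSE, TRUE)

# Without parentheses, & is evaluated first:
# Cyborg OR (Machine AND not a group).
without_parens <- species == "Cyborg" | species == "Machine" & is_group == FALSE

# With parentheses: (Cyborg OR Machine) AND not a group.
with_parens <- (species == "Cyborg" | species == "Machine") & is_group == FALSE

without_parens  # TRUE TRUE FALSE: the Cyborg group is kept
with_parens     # FALSE TRUE FALSE: only the individual Machine is kept
```

The same rule applies inside dplyr's filter(), since it evaluates an ordinary logical expression for each row.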

Machine_Cyborg_Characters %>% 
  group_by(Gender) %>% 
  summarise(n = n())  # assumed final step (the original is cut off): count rows per gender

Make a barchart showing the gender distribution of Machine and Cyborg characters

I’m also adding a few arguments to the barplot() function to name the x and y axes and to change the size of the names under the bars (that is, the category names from the original data). The last argument sets the size of the axis labels. It makes no difference here, since I set it to 1, but I wanted to keep it so I remember how to do it later!

The names under the bars need to be tiny in order to fit: if they don’t fit, R simply leaves them out, which is annoying.

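barplot(table(Machine_Cyborg_Characters$Gender),  # assumed opening: the start of the call is missing
        xlab = "Gender", ylab = "Frequency",
        cex.names = 0.6,
        cex.lab = 1)  # assumed last argument, based on the description of the axis label size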
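
If shrinking the names is not enough, one alternative (not used in this notebook) is to rotate them with the las graphical parameter, so that long names are drawn perpendicular to the axis. This is only a sketch on made-up counts; gender_counts is a hypothetical vector, and the margin values are a guess that may need adjusting.

```r
# Made-up counts with long category names.
gender_counts <- c(
  "Female" = 12, "Male" = 20,
  "Non-binary or Other" = 5, "Trans Woman" = 1
)

# Widen the bottom margin so the rotated names have room.
old_par <- par(mar = c(10, 4, 2, 1))

bar_positions <- barplot(
  gender_counts,
  ylab = "Frequency",
  cex.names = 0.8,  # size of the names under the bars
  las = 2           # 2 = draw labels perpendicular to the axis
)

# Restore the previous margins.
par(old_par)
```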