Working with ABC Radio’s API in R

This post is about using ABC Radio’s API to create a record of all the songs played on Triple J in a given time period. An API (application programming interface) is a connection between two computer programs. Many websites have API’s that allow users to access data from that website. ABC Radio has an API that lets users access a record of the songs played on the station Triple J since around May 2014. Below I’ll show how to access this information and how to transform it into a format that’s easier to work with. All code can be found on GitHub here.

Packages

To access the API in R I used the packages “httr” and “jsonlite”. To transform the data from the API I used the packages “tidyverse” and “lubridate”.

library(httr)
library(jsonlite)
library(tidyverse)
library(lubridate)

ABC Radio’s API

The URL for the ABC radio’s API is:

https://music.abcradio.net.au /api/v1/plays/search.json.

If you type this into a web browser, you can see a bunch of text which actually lists information about the ten most recently played songs on ABC’s different radio stations. To see only the songs played on Triple J, you can go to:

https://music.abcradio.net.au/api/v1/plays/search.json?station=triplej.

The extra text “station=triplej” is called a parameter. We can add more parameters such as “from” and “to”. The link:

https://music.abcradio.net.au/api/v1/plays/search.json?station=triplej&from=2021-01-16T00:00:00.000Z&to=2021-01-16T00:30:00.000Z

shows information about the songs played in the 30 minutes after midnight UTC on the 16th of January last year. The last parameter that we will use is limit which specifies the number of songs to include. The link:

https://music.abcradio.net.au/api/v1/plays/search.json?station=triplej&limit=1

includes information about the most recently played song. The default for limit is 10 and the largest possible value is 100. We’ll see later that this cut off at 100 makes downloading a lot of songs a little tricky. But first let’s see how we can access the information from ABC Radio’s API in R.

Accessing an API in R

We now know how to use ABC Radio’s API to get information about the songs played on Triple J but how do we use this information in R and how is the information stored. The function GET from the package “httr” enables us to access the API in R. The input to GET is simply the same sort of URL that we’ve been using to view the API online. The below code stores information about the 5 songs played on Triple J just before 5 am on the 5th of May 2015

res <- GET("https://music.abcradio.net.au/api/v1/plays/search.json?station=triplej&limit=5&to=2015-05-05T05:00:00.000Z")
res
# Response [https://music.abcradio.net.au/api/v1/plays/search.json?station=triplej&limit=5&to=2015-05-05T05:00:00.000Z]
# Date: 2022-03-02 23:25
# Status: 200
# Content-Type: application/json
# Size: 62.9 kB

The line “Status: 200” tells us that we have successfully grabbed some data from the API. There are many other status codes that mean different thing but for now we’ll be happy remembering that “Status: 200” means everything has worked.

The information from the API is stored within the object res. To change it from a JSON file to a list we can use the function “fromJSON” from the library jsonlite. This is done below

data <- fromJSON(rawToChar(res$content))
names(data)
# "total"  "offset" "items"

The information about the individual songs is stored under items. There is a lot of information about each song including song length, copyright information and a link to an image of the album cover. For now, we’ll just try to find the name of the song, the name of the artist and the time the song played. We can find each of these under “items”. Finding the song title and the played time are pretty straight forward:

data$items$recording$title
# "First Light" "Dog"         "Begin Again" "Rot"         "Back To You"
data$items$played_time
# "2015-05-05T04:55:56+00:00" "2015-05-05T04:52:21+00:00" "2015-05-05T04:47:52+00:00" "2015-05-05T04:41:00+00:00" "2015-05-05T04:38:36+00:00"

Finding the artist name is a bit trickier.

data$items$recording$artists
# [[1]]
# entity       arid          name artwork
# 1 Artist maQeOWJQe1 Django Django    NULL
# links
# 1 Link, mlD5AM2R5J, http://musicbrainz.org/artist/4bfce038-b1a0-4bc4-abe1-b679ab900f03, 4bfce038-b1a0-4bc4-abe1-b679ab900f03, MusicBrainz artist, NA, NA, NA, service, musicbrainz, TRUE
# is_australian    type role
# 1            NA primary   NA
# 
# [[2]]
# entity       arid      name artwork
# 1 Artist maXE59XXz7 Andy Bull    NULL
# links
# 1 Link, mlByoXMP5L, http://musicbrainz.org/artist/3a837db9-0602-4957-8340-05ae82bc39ef, 3a837db9-0602-4957-8340-05ae82bc39ef, MusicBrainz artist, NA, NA, NA, service, musicbrainz, TRUE
# is_australian    type role
# 1            NA primary   NA
# ....
# ....

We can see that each song actually has a lot of information about each artist but we’re only interested in the artist name. By using “map_chr()” from the library “tidyverse” we can grab each song’s artist name.

map_chr(data$items$recording$artists, ~.$name[1])
# [1] "Django Django" "Andy Bull"     "Purity Ring"   "Northlane"     "Twerps"

Using name[1] means that if there are multiple artists, then we only select the first one. With all of these in place we can create a tibble with the information about these five songs.

list <- data$items
tb <- tibble(
     song_title = list$recording$title,
     artist = map_chr(list$recording$artists, ~.$name[1]),
     played_time = ymd_hms(list$played_time)
) %>% 
  arrange(played_time)
tb
# # A tibble: 5 x 3
# song_title  artist        played_time        
# <chr>       <chr>         <dttm>             
# 1 Back To You Twerps        2015-05-05 04:38:36
# 2 Rot         Northlane     2015-05-05 04:41:00
# 3 Begin Again Purity Ring   2015-05-05 04:47:52
# 4 Dog         Andy Bull     2015-05-05 04:52:21
# 5 First Light Django Django 2015-05-05 04:55:56

Downloading lots of songs

The maximum number of songs we can access from the API at one time is 100. This means that if we want to download a lot of songs we’ll need to use some sort of loop. Below is some code which takes in two dates and times in UTC and fetches all the songs played between the two times. The idea is simply to grab all the songs played in a five hour interval and then move on to the next five hour interval. I found including an optional value “progress” useful for the debugging. This code is from the file get_songs.r on my GitHub.

download <- function(from, to){
  base <- "https://music.abcradio.net.au/api/v1/plays/search.json?limit=100&offset=0&page=0&station=triplej&from="
  from_char <- format(from, "%Y-%m-%dT%H:%M:%S.000Z")
  to_char <- format(to, "%Y-%m-%dT%H:%M:%S.000Z")
  url <- paste0(base,
                from_char,
                "&to=",
                to_char)
  res <- GET(url)
  if (res$status == 200) {
    data <- fromJSON(rawToChar(res$content))
    list <- data$items
    tb <- tibble(
      song_title = list$recording$title,
      artist = map_chr(list$recording$artists, ~.$name[1]),
      played_time = ymd_hms(list$played_time)
    ) %>% 
      arrange(played_time)
    return(tb)
  }
}

get_songs <- function(start, end, progress = FALSE){
  from <- ymd_hms(start)
  to <- from + dhours(5)
  end <- ymd_hms(end)
  
  songs <- NULL
  
  while (to < end) {
    if (progress) {
      print(from)
      print(to)
    }
    tb <- download(from, to)
    songs <- bind_rows(songs, tb)
    from <- to + dseconds(1)
    to <- to + dhours(5)
  }
  tb <- download(from, end)
  songs <- bind_rows(songs, tb)
  return(songs)
}

An example

Using the functions defined above, we can download all the sounds played in the year 2021 (measured in AEDT).

songs <- get_songs(ymd_hms("2020-12-31 13:00:01 UTC"),
                   ymd_hms("2021-12-31 12:59:59 UTC",
                   progress = TRUE)

This code takes a little while to run so I have saved the output as a .csv file. There are lots of things I hope to do with this data such as looking at the popularity of different songs over time. For example, here’s a plot showing the cumulative plays of Genesis Owusu’s top six songs.

I’ve also looked at using the data from last year to get a list of Triple J’s most played artists. Unfortunately, it doesn’t line up with the official list. The issue is that I’m only recording the primary artist on each song and not any of the secondary artists. Hopefully I’ll address this in a later blog post! I would also like to an analysis of how plays on Triple J correlate with song ranking in the Hottest 100.