Visualising Strava data with R

I’ve recently had some fun downloading and displaying my running data from Strava. I’ve been tracking my runs on Strava for the last five years and I thought it would be interesting to make a map showing where I run. Here is one of the plots I made. Each circle is a place in south-east Australia where I ran in the given year. The size of the circle corresponds to how many hours I ran there.

I think the plot is a nice visual diary. Looking at the plot, you can see that most of my running took place in my hometown Canberra and that the time I spend running has been increasing. The plot also shows that most years I’ve spent some time running on the south coast and on my grandfather’s farm near Kempsey. You can also see my trips in 2019 and 2020 to the AMSI Summer School. In 2019, the summer school was hosted by UNSW in Sydney and in 2020 it was hosted by La Trobe University in Melbourne. You can also see a circle from this year at Mt Kosciuszko where I ran the Australian Alpine Ascent with my good friend Sarah.

I also made a plot of all my runs on a world map which shows my recent move to California. In this plot all the circles are the same size and I grouped all the runs across the five different years.

I learnt a few things creating these plots and so I thought I would document how I made them.


Creating the plots

  • Strava lets you download all your data by doing a bulk export. The export includes a zipped folder with all your activities in their original file format.
  • My activities were saved as .gpx files and I used this handy Python library to convert them to .csv files which I could read into R. For the R code I used the packages “tidyverse”, “maps” and “ozmaps”.
  • Now I had a .csv file for each run. In these files each row corresponded to my location at a given second during the run. What I wanted was a single data frame where each row corresponded to a different run. I found the following way to read in and edit each .csv file:
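files <- list.files(path = ".", pattern = "\\.csv$")  # pattern is a regex, not a glob
listcsv <- lapply(files, function(x) read_csv(x) %>% 
                    select(lat, lon, time) %>%      # keep location and timestamp
                    mutate(hours = n()/3600) %>%    # one row per second, so rows/3600 = hours
                    filter(row_number() == 1)       # keep only the starting point
                  )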

The first line creates a character vector with the names of all the .csv files in the working directory. The second line then goes through the file names and reads each .csv file into a tibble. I then selected the columns with the time and my location and added a new column with the duration of the run in hours (the number of rows divided by 3600, since there is one row per second). Finally I removed all the rows except the first row, which contains the information about where my run started.
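Dividing the row count by 3600 relies on the file having exactly one row per second. As a minimal sketch of an alternative, assuming the time column has been parsed as a date-time, the duration can be taken from the first and last timestamps instead. The data in toy_run is made up for illustration, and note that this measures elapsed time, so it includes any pauses.

```r
library(dplyr)

# Made-up track: four points recorded at irregular intervals over 90 minutes
toy_run <- tibble(
  lat  = c(-35.28, -35.29, -35.30, -35.31),
  lon  = c(149.13, 149.13, 149.14, 149.14),
  time = as.POSIXct("2021-01-01 06:00:00", tz = "UTC") + c(0, 1800, 3600, 5400)
)

toy_summary <- toy_run %>%
  # Elapsed time between the first and last recorded points, in hours
  mutate(hours = as.numeric(difftime(max(time), min(time), units = "hours"))) %>%
  # Keep only the first row, i.e. where the run started
  filter(row_number() == 1)

toy_summary$hours
```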

  • Next I combined these separate tibbles into a single tibble using rbind(). I then added some new columns for grouping the runs. I added a column with the year and columns with the longitude and latitude rounded to the nearest whole number.
runs <- do.call(rbind, listcsv) %>% 
  mutate(year = format(time,"%Y"),
         approx_lat = round(lat),
         approx_lon = round(lon))
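As a hedged aside, dplyr's bind_rows() can stand in for do.call(rbind, ...) here. It takes the list directly and fills in NA if one of the tibbles is missing a column. The sketch below uses two made-up one-row tibbles in place of the elements of listcsv and shows what the new columns look like.

```r
library(dplyr)

# Two made-up one-row tibbles standing in for elements of listcsv
toy_list <- list(
  tibble(lat = -35.28, lon = 149.13,
         time = as.POSIXct("2019-03-02 06:00:00", tz = "UTC"), hours = 1.0),
  tibble(lat = 37.87, lon = -122.27,
         time = as.POSIXct("2021-09-12 07:30:00", tz = "UTC"), hours = 0.5)
)

toy_runs <- bind_rows(toy_list) %>%
  mutate(year = format(time, "%Y"),   # year as a character string
         approx_lat = round(lat),     # latitude to the nearest whole degree
         approx_lon = round(lon))     # longitude to the nearest whole degree

toy_runs
```

The year column is a character vector, which is fine for grouping and for facet labels.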
  • To create the plot where you can see where I ran each year, I grouped the runs by the approximate location and by year. I then calculated the total time spent running at each location each year and calculated the average longitude and latitude. I also removed the runs in the USA by only keeping the runs with a negative latitude.
run_counts_year <- runs %>% 
  group_by(approx_lat, approx_lon, year) %>% 
  summarise(hours = sum(hours),
            lat = mean(lat),
            lon = mean(lon),
            .groups = "drop") %>% 
  select(!approx_lat & !approx_lon)

oz_counts_year <- run_counts_year %>% 
  filter(lat < 0)
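To see what the grouping does, here is a small sketch on made-up data (the coordinates, years and hours in toy_runs are illustrative only). Rounding to whole degrees puts runs into cells that are one degree of latitude tall, which is roughly 111 km, so runs in the same town end up in the same group.

```r
library(dplyr)

# Made-up runs: two close together in 2020, one further away, one in 2021
toy_runs <- tibble(
  lat   = c(-35.28, -35.31, -36.45, -35.25),
  lon   = c(149.13, 149.10, 148.26, 149.12),
  year  = c("2020", "2020", "2020", "2021"),
  hours = c(1.0, 0.5, 2.0, 0.75)
)

toy_counts <- toy_runs %>%
  mutate(approx_lat = round(lat),
         approx_lon = round(lon)) %>%
  group_by(approx_lat, approx_lon, year) %>%
  summarise(hours = sum(hours),   # total hours in this cell and year
            lat = mean(lat),      # average position of the runs in the cell
            lon = mean(lon),
            .groups = "drop") %>%
  select(-approx_lat, -approx_lon)

toy_counts
```

The first two runs share a cell and a year, so they collapse into a single row with 1.5 hours, while the other two runs each keep their own row.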
  • I then used the package “ozmaps” to plot my running locations on a map of the ACT, New South Wales and Victoria.
oz_states <- ozmaps::ozmap_states %>% 
  filter(NAME %in% c("New South Wales",
                     "Victoria",
                     "Australian Capital Territory"))

ggplot() + 
  geom_sf(data = oz_states) +
  coord_sf() +
  geom_point(data = oz_counts_year,
             mapping = aes(x = lon, 
                           y = lat,
                           size = hours),
             color = "blue",
             shape = 21) +
  facet_wrap(~ year) +
  theme_bw() 
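One optional tweak, offered as a sketch rather than something used in the plots above: ggplot2's scale_size_area() makes the area of each circle proportional to the value, with zero hours mapping to a circle of size zero. The toy_counts data frame and the choice of max_size = 10 are made up for illustration, and the map layer is left out to keep the example short.

```r
library(ggplot2)

# Made-up locations and hours, standing in for oz_counts_year
toy_counts <- data.frame(
  lon   = c(149.1, 151.2, 145.0),
  lat   = c(-35.3, -33.9, -37.7),
  hours = c(40, 10, 2.5)
)

p <- ggplot(toy_counts, aes(x = lon, y = lat, size = hours)) +
  geom_point(color = "blue", shape = 21) +
  # Circle area is proportional to hours; the largest circle has size 10
  scale_size_area(max_size = 10) +
  theme_bw()

p
```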
  • Creating the world map was similar except I didn’t group by year and I kept all the runs, including those with positive latitude.
run_counts <- runs %>% 
  group_by(approx_lat, approx_lon) %>% 
  summarise(hours = sum(hours),
            lat = mean(lat),
            lon = mean(lon),
            .groups = "drop") %>% 
  select(!approx_lat & !approx_lon)

world <- map_data("world")
ggplot() +
  geom_polygon(data = world,
               mapping = aes(x = long, y = lat, group = group),
               fill = "lightgrey",
               color = "black",
               linewidth = 0.1) +  # use size = 0.1 on ggplot2 older than 3.4.0
  geom_point(data = run_counts,
             mapping = aes(x = lon, 
                           y = lat),
             size = 2,
             shape = 1,
             color = "blue") +
  theme_bw()