
Metro Systems Over Time: Part 3

Note: at the time of this writing, the package gganimate requires the package cowplot. Be sure to install and load it before continuing.

install.packages("cowplot")
library(cowplot)

Introduction

In Part 1 and Part 2 of this series we made maps of metro systems across four European cities, and then computed Delaunay triangulations and centroids for each city. In the third and final part, we’ll do these same steps but at multiple time points to make a .gif of how metro systems change over time.

Data

As a reminder, our data is the hand-corrected version of the data we pulled down from Google. To see how we got the data, go back to Part 1: Data.

Maps with Change Over Time

We now have a good sense of what each city’s current metro system looks like, but how did these systems come to be this way? Now we’ll look at how these systems have changed and grown over time. That’s why at the beginning we made a column for opened_year. At this point the code gets less elegant but we’ll go through it step by step. It’s all the same principles as when we made our figures earlier.

The main idea of the following code is that we’re going to create unique triangulations for each year within each city. As more metro stations get added each year, the triangulation will change. Just as we had data_deldir_delsgs and data_deldir_cent, we’re going to start by creating two empty data frames, time_deldir_delsgs and time_deldir_sum (remember that our centroid data frame was based on the summary data). With our empty data frames initialized, we can make a for loop. We want to go through each year, but for each city separately, so our first for loop goes through each city, filtering our data to only the city in question. Our second for loop then goes through each year, starting with the minimum year in the data for that city and going up to 2015, the maximum year for the full data set. For a given year we filter() to include only metro stops that were opened that year or earlier. We use less than or equal to because we don’t want to ignore metro stops from earlier years; we want the whole metro system as it exists in a given year. Note that we need at least three points to make a triangle. You may think that a city wouldn’t ever have only one or two metro stops, but you would be wrong (cough Barcelona cough), so we’re going to add a stopgap: if the number of data points is less than three, the loop skips that year and moves to the next one.

Okay, assuming there are at least three data points, we run the deldir() call and save the result in year_deldir. Then we create two new data frames. The first is year_deldir_delsgs, which contains the delsgs information from deldir. We add two columns too, city and opened_year, so we know which city and year this data comes from. We then add this information to our existing time_deldir_delsgs data frame with a bind_rows() call. We do the same thing to create year_deldir_sum, only we pull out the summary information from year_deldir instead of the delsgs information. We also add our city and opened_year columns and then bind_rows() it with time_deldir_sum. The loop does this for every city, from the minimum year in the data up to 2015. See below the head of the two data frames we created.

time_deldir_delsgs = data.frame()

time_deldir_sum = data.frame()

for(c in c("Paris", "Berlin", "Barcelona", "Prague")) {
  data_city = filter(data, city == c)
  for(year in min(data_city$opened_year):2015) {
    data_year = filter(data_city, opened_year <= year)
    
    # Skip this year if fewer than 3 stops exist (a triangle needs 3 points)
    if(nrow(data_year) < 3) next

    # Delaunay triangulation of all stops open by this year
    year_deldir = deldir(data_year$lon, data_year$lat)

    # Triangle edges, tagged with city and year
    year_deldir_delsgs = year_deldir$delsgs %>%
      mutate(city = c) %>%
      mutate(opened_year = year)

    time_deldir_delsgs = bind_rows(time_deldir_delsgs, year_deldir_delsgs)

    # Per-point summary (used later for the centroids), tagged the same way
    year_deldir_sum = year_deldir$summary %>%
      mutate(city = c) %>%
      mutate(opened_year = year)

    time_deldir_sum = bind_rows(time_deldir_sum, year_deldir_sum)
  }
}

head(time_deldir_delsgs)
head(time_deldir_sum)
        x1       y1       x2       y2 ind1 ind2  city opened_year
1 2.358279 48.85343 2.369510 48.85302   11    2 Paris        1900
2 2.374377 48.84430 2.369510 48.85302    9    2 Paris        1900
3 2.374377 48.84430 2.358279 48.85343    9   11 Paris        1900
4 2.414484 48.84640 2.369510 48.85302   16    2 Paris        1900
5 2.414484 48.84640 2.374377 48.84430   16    9 Paris        1900
6 2.386581 48.84732 2.369510 48.85302   18    2 Paris        1900

         x        y n.tri del.area  del.wts n.tside nbpt dir.area  dir.wts  city opened_year
1 2.289364 48.87577     4  1.9e-05 0.012510       4    2 0.000066 0.009329 Paris        1900
2 2.369510 48.85302     5  7.0e-05 0.047518       4    2 0.000520 0.073787 Paris        1900
3 2.290121 48.86685     5  3.7e-05 0.024907       5    0 0.000074 0.010541 Paris        1900
4 2.313310 48.86750     4  3.7e-05 0.024692       3    4 0.000388 0.054946 Paris        1900
5 2.294900 48.87393     5  1.6e-05 0.010551       3    2 0.000061 0.008594 Paris        1900
6 2.347301 48.85869     4  1.1e-05 0.007359       3    4 0.000392 0.055588 Paris        1900
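
To make the filtering logic easier to see on its own, here is a minimal sketch in base R with a made-up set of stations (the names and years in stations are hypothetical, not our real data). It keeps every station opened up to a given year and skips years with fewer than three stations. Note that, unlike the loop above, it collects the per-year pieces in a list and binds them once at the end; that is a swapped-in alternative to calling bind_rows() inside the loop, not what we actually ran.

```r
# Toy stand-in for one city's stations (hypothetical values, not the real data)
stations <- data.frame(
  name        = c("A", "B", "C", "D", "E"),
  opened_year = c(1912, 1924, 1924, 1926, 1930),
  stringsAsFactors = FALSE
)

results <- list()
for (year in min(stations$opened_year):1930) {
  # Keep every station opened in this year or earlier
  open_so_far <- stations[stations$opened_year <= year, ]

  # A triangulation needs at least three points, so skip sparse years
  if (nrow(open_so_far) < 3) next

  results[[as.character(year)]] <- data.frame(
    opened_year = year,
    n_stations  = nrow(open_so_far)
  )
}

# Bind all per-year pieces once, after the loop
yearly_counts <- do.call(rbind, results)
rownames(yearly_counts) <- NULL
print(yearly_counts)
```

In this toy example the years 1912 to 1923 are skipped because only one station exists, so the first year with a result is 1924.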

As you may recall though we’re not necessarily interested in all the summary information, we just want it to compute our centroid. So, we make a new data frame time_deldir_cent. The code is the same as our earlier code for computing centroids, the only difference is that we’ll also group by opened_year, not just city, since we want unique centroids for each year for each city. See part of the data frame of the centroids below.

time_deldir_cent = time_deldir_sum %>%
  group_by(city, opened_year) %>%
  summarise(cent_x = sum(x * del.wts),
            cent_y = sum(y * del.wts)) %>%
  ungroup()
head(time_deldir_cent)
# A tibble: 6 × 4
       city opened_year   cent_x   cent_y
      <chr>       <int>    <dbl>    <dbl>
1 Barcelona        1924 2.116839 41.38132
2 Barcelona        1925 2.120019 41.37782
3 Barcelona        1926 2.121921 41.37834
4 Barcelona        1927 2.121921 41.37834
5 Barcelona        1928 2.122325 41.37384
6 Barcelona        1929 2.113628 41.37543
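
If the weighted sum is hard to picture, the sketch below works through it by hand in base R. The data frame toy_sum is hypothetical and only mimics the columns we use from the summary data; it assumes, as I understand the deldir documentation, that del.wts sums to one within a single triangulation, so the sum of x * del.wts is a weighted mean.

```r
# Hypothetical summary rows for two city/year groups;
# del.wts is assumed to sum to 1 within each group
toy_sum <- data.frame(
  city        = c("X", "X", "X", "Y", "Y", "Y"),
  opened_year = c(1900, 1900, 1900, 1900, 1900, 1900),
  x           = c(0, 4, 2, 10, 12, 14),
  y           = c(0, 0, 6, 1, 3, 5),
  del.wts     = c(0.25, 0.25, 0.5, 0.2, 0.3, 0.5),
  stringsAsFactors = FALSE
)

# One group per city and year, mirroring group_by(city, opened_year)
groups <- split(toy_sum, list(toy_sum$city, toy_sum$opened_year), drop = TRUE)

# Weighted centroid for each group
toy_cent <- do.call(rbind, lapply(groups, function(g) {
  data.frame(city        = g$city[1],
             opened_year = g$opened_year[1],
             cent_x      = sum(g$x * g$del.wts),
             cent_y      = sum(g$y * g$del.wts))
}))
rownames(toy_cent) <- NULL
print(toy_cent)
```

For city X the centroid is pulled toward the third point, (2, 6), because it carries half of the total weight.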

There’s still one more thing I want to do before we make our figures. Right now the figures will have different start dates depending on when the first metro stop was built in a given city. Instead, I want all figures to start at the same year so we see them change over time with the same start date for each city. To do this we’ll make a new data frame called years that simply lists the years 1900 to 2015 four times, once for each city. We then do a left_join() with our data. As a result any time the opened_year in question is not found in the data frame for a given city an empty row will be added, empty except for the opened_year and city values. You’ll also notice that I filter()ed to only include decade years (1900, 1910, 1920, etc.), and the year 2015 so it includes the last year of our data. This is because if we include every year our gif will be very large and non-portable. Also it’s more dramatic to see changes every 10 years.

years = data.frame(opened_year = rep(seq(1900, 2015), 4),
                   city = c(rep("Paris", 116), rep("Berlin", 116),
                            rep("Barcelona", 116), rep("Prague", 116)))

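data_time = left_join(years, data) %>%
  mutate(opened_by_year = ifelse(opened_year %% 10 == 0, opened_year,
                                 opened_year + (10 - (opened_year %% 10))))

# Assumed reconstruction of a garbled line: the segment subset that
# time_plot() uses below, filtered the same way as the centroids
time_deldir_delsgs_sub = time_deldir_delsgs %>%
  filter(opened_year %% 10 == 0 | opened_year == 2015)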

time_deldir_cent_sub = time_deldir_cent %>%
  filter(opened_year %% 10 == 0 | opened_year == 2015)
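
Here is a small base R sketch of the two ideas in this step: padding the data with empty rows and rounding years up to the next decade. Everything in it is illustrative. toy_data is a hypothetical stand-in for our data, expand.grid() is just another way to build the same year-by-city grid, merge() with all.x = TRUE plays the role of left_join(), and ceiling() is an equivalent way to write the rounding rule.

```r
# All combinations of year and city (same rows as the rep() construction)
years_grid <- expand.grid(opened_year = 1900:2015,
                          city = c("Paris", "Berlin", "Barcelona", "Prague"),
                          stringsAsFactors = FALSE)

# Hypothetical station rows standing in for the real data
toy_data <- data.frame(city        = c("Paris", "Paris", "Barcelona"),
                       opened_year = c(1900, 1913, 1924),
                       name        = c("P1", "P2", "B1"),
                       stringsAsFactors = FALSE)

# all.x = TRUE keeps every grid row; years with no station get NA in name
toy_time <- merge(years_grid, toy_data,
                  by = c("city", "opened_year"), all.x = TRUE)

# Round each year up to the next decade; exact decade years are unchanged
toy_time$opened_by_year <- ceiling(toy_time$opened_year / 10) * 10

# Show the padded rows around the station that opened in 1913
print(toy_time[toy_time$city == "Paris" &
               toy_time$opened_year %in% 1912:1914, ])
```

A station opened in 1913 is assigned to 1920, the first decade frame in which it exists. One thing to keep in mind with this rule is that the years 2011 to 2015 round up to 2020.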

I kept saying we were going to make maps showing the change over time, but how are we going to do that? Instead of building a single static plot for each city, we’re going to build an animation in which the map changes as the year changes. To do this we’ll use the package gganimate, which works on top of ggplot2 (useful, since we’re already using ggmap, which also works on top of ggplot2). We build our plot just as we would any other ggplot2 figure, but in the aes() call for each layer we add the frame setting. The frame is the thing in the plot that changes, in our case opened_year (or opened_by_year for the points). Also, while we only want to plot the triangulations and centroids specific to a given year, we want the points for the metro stops to be additive. For example, when frame is 2000 we still want the points from 1990 to be plotted. To do this we add cumulative = TRUE to the call for those points. Finally, since we updated our data to include empty rows so that all plots start in 1900, all plots will have a frame starting at 1900, even if there are no data points to plot. I’ve again made a function to make our plots. See below for the code for the Paris map as well as all four animations. Also, notice that in 1920 (actually 1912) Barcelona gets their first metro stop…but doesn’t get any more until 1930 (actually 1924). Take a look to see if you can find any other interesting things about how the systems changed over time.

devtools::install_github("dgrtwo/gganimate")
library(gganimate)

time_plot = function(city_name, city_map){
  ggmap(city_map, extent = "device") +
    geom_segment(data = subset(time_deldir_delsgs_sub, city == city_name),
                 aes(x = x1, y = y1, xend = x2, yend = y2, frame = opened_year),
                 size = 1, color= "#92c5de") +
    geom_point(data = subset(data_time, city == city_name),
               aes(x = lon, y = lat, frame = opened_by_year, cumulative = TRUE),
               color = "#0571b0", size = 3) +
    geom_point(data = subset(time_deldir_cent_sub, city == city_name),
               aes(x = cent_x, y = cent_y, frame = opened_year),
               size = 6, color= "#ca0020") +
    theme(plot.title = element_text(hjust = 0.5))
}

paris_time.plot = time_plot("Paris", paris_map)
gganimate(paris_time.plot)

Plot for Paris:
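
As a rough illustration of the difference between a regular frame and a cumulative one, the base R sketch below lists which rows would be drawn in each frame. It does not call gganimate at all; it only mimics, with a hypothetical toy_points data frame, how I understand the frame and cumulative settings to behave in the version of the package used here.

```r
# Hypothetical stations with the decade frame they first appear in
toy_points <- data.frame(name  = c("A", "B", "C", "D"),
                         frame = c(1900, 1900, 1910, 1930),
                         stringsAsFactors = FALSE)
frames <- c(1900, 1910, 1920, 1930)

# Non-cumulative layer: only rows whose frame equals the current one
per_frame <- lapply(frames, function(f) {
  toy_points$name[toy_points$frame == f]
})

# Cumulative layer: rows from the current frame and all earlier ones
cumulative <- lapply(frames, function(f) {
  toy_points$name[toy_points$frame <= f]
})

names(per_frame) <- names(cumulative) <- frames
str(per_frame)
str(cumulative)
```

In the 1920 frame the non-cumulative version draws nothing, while the cumulative version still shows stations A, B, and C. That is why the metro stops use cumulative = TRUE and the triangulations and centroids do not.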

Conclusion

In these three posts we looked at how the metro systems of four European cities changed over time. To do this we used a lot of different packages. We used the packages dplyr, tidyr, purrr, and ggplot2, which are all now a part of the package tidyverse. We also used two other plotting packages that build upon ggplot2, ggmap and gganimate. Finally, we used the deldir package to make Delaunay triangulations and compute centroids of city metro systems over time. All of these skills can be applied to any other type of spatial data with unique shapes, and can be used to make your very own gifs. Try your city as a practice exercise!