
Metro Systems Over Time: Part 2

In Part 1 of this series we collected geodata for metro stops from Google and plotted the stops on maps. In Part 2 we’ll build Delaunay triangulations on top of those maps and compute the centroid of each network. This post makes fairly advanced use of tidyverse packages; for more information on any of these calls, look at the tidyverse documentation.

Data

As a reminder, our data are the hand-corrected values of the data we pulled down from Google. To see how we got the data, go back to Part 1: Data.

Maps with Delaunay Triangulations and Centroids

With our maps and data points in place, let’s compute the Delaunay triangulation for each city. This lets us find the area a given city’s metro covers and compute a center point, or centroid, for the metro system. We do this with the deldir package.

First, though, I am going to use a function from tidyr called nest(), which collapses a bunch of data into a single cell. By nesting by city I get one row for each city, and all of my other columns are collapsed into a single list column whose name is set with .key; in this case the new column is called location_info. Think of it as a data frame tucked within a cell of a data frame.

With my data nested I can make a new column called deldir that holds all of the information from my deldir() call. The deldir() call takes two numeric vectors, the x- and y-coordinates of the points (here lon and lat). It then computes several things, including the area of the triangulated shape and the edges of all the segments connecting the points.

How do we access this information, though? We can pull it out with a purrr call, map(). The map() call takes a list and a function and applies the function to each element of the list in turn; if it is given a name such as "del.area" instead of a function, it extracts the element with that name from each item. For our purposes we take the deldir column and pull out del.area, and thanks to the mutate() call we can save it to a new column. We can do the same thing with delsgs (the segments of the shape) and summary (per-point information, such as each point’s coordinates and area weight). See the fully nested data frame below.

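library(dplyr)   # %>%, mutate(), select(), group_by(), summarise()
library(tidyr)   # nest(), unnest()
library(purrr)   # map()
library(deldir)  # deldir()

data_deldir = data %>%
  # One row per city; all other columns go into the list column location_info
  # (with tidyr >= 1.0 the equivalent call is nest(location_info = -city))
  nest(-city, .key = location_info) %>%
  # Run the triangulation on each city's coordinates
  mutate(deldir = map(location_info, function(df) deldir(df$lon, df$lat))) %>%
  # Pull the pieces we need out of each deldir object by name
  mutate(del.area = map(deldir, "del.area")) %>%
  mutate(delsgs = map(deldir, "delsgs")) %>%
  mutate(summary = map(deldir, "summary"))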
data_deldir
# A tibble: 4 × 6
       city      location_info       deldir  del.area                 delsgs                summary
     <fctr>             <list>       <list>    <list>                 <list>                 <list>
1     Paris <tibble [298 × 9]> <S3: deldir> <dbl [1]> <data.frame [849 × 6]> <data.frame [287 × 9]>
2    Berlin <tibble [173 × 9]> <S3: deldir> <dbl [1]> <data.frame [499 × 6]> <data.frame [171 × 9]>
3 Barcelona <tibble [149 × 9]> <S3: deldir> <dbl [1]> <data.frame [433 × 6]> <data.frame [148 × 9]>
4    Prague  <tibble [58 × 9]> <S3: deldir> <dbl [1]> <data.frame [161 × 6]>  <data.frame [58 × 9]>
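
To make the structure of a deldir() result more concrete, here is a minimal sketch on five made-up points (the corners of a unit square plus its center) rather than the metro data. It assumes the deldir package is installed; the names x, y, and toy_del are hypothetical and only used for illustration.

```r
library(deldir)

# Toy points (an assumption for illustration): unit-square corners plus the center
x = c(0, 1, 1, 0, 0.5)
y = c(0, 0, 1, 1, 0.5)

toy_del = deldir(x, y)

# Total area of the triangulated region (the convex hull of the points)
toy_del$del.area

# One row per triangle edge, with the coordinates of both end points
toy_del$delsgs[, c("x1", "y1", "x2", "y2")]

# One row per point, including its area weight del.wts
toy_del$summary[, c("x", "y", "del.area", "del.wts")]
```

Since the square has area 1 and the center point joins all four corners, del.area should come out at about 1 and delsgs should hold eight edges (four sides and four spokes).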

Based on these areas, it looks like the Berlin metro covers the most area at 0.059279, while Barcelona covers the smallest area at 0.016332. Because deldir() was given longitude and latitude, these areas are in squared degrees rather than a unit such as square kilometers.

Now that we have our nested data frame with all the pertinent information, we’re going to unnest the data we need for our new plots. First we need the delsgs data, which we use to draw the lines connecting the metro stops. To do this we’ll make a new data frame, dropping all columns except city and delsgs. Then we unnest() the data frame. This expands the nested delsgs column, giving us many more rows and columns. The x1, y1, x2, and y2 values will be used later in our plot to draw the edges of our triangles. See part of the unnested data frame below.

data_deldir_delsgs = data_deldir %>%
  select(city, delsgs) %>%
  unnest(delsgs)  # name the list column explicitly; newer tidyr versions expect this
head(data_deldir_delsgs)
# A tibble: 6 × 7
    city       x1       y1       x2       y2  ind1  ind2
  <fctr>    <dbl>    <dbl>    <dbl>    <dbl> <int> <int>
1  Paris 2.366928 48.78793 2.359279 48.79272   283   282
2  Paris 2.433489 48.77262 2.366928 48.78793    72   283
3  Paris 2.450590 48.78984 2.433489 48.77262    74    72
4  Paris 2.450590 48.78984 2.459319 48.77978    74    73
5  Paris 2.455281 48.76805 2.433489 48.77262   198    72
6  Paris 2.455281 48.76805 2.459319 48.77978   198    73
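
The nest(), map(), and unnest() steps can be hard to picture on the full data, so here is a small round trip on made-up data. This is only a sketch: the two cities and their coordinates are invented, the stats column is a hypothetical stand-in for the deldir column, and the nest(location_info = -city) spelling assumes tidyr 1.0 or later.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Made-up data: two hypothetical cities with a few stops each
toy = tibble(
  city = c("A", "A", "A", "B", "B"),
  lon  = c(1, 2, 3, 10, 11),
  lat  = c(5, 6, 7, 20, 21)
)

toy_nested = toy %>%
  # One row per city; lon and lat are tucked into the list column location_info
  nest(location_info = -city) %>%
  # Function form of map(): run on each nested data frame in turn
  mutate(stats = map(location_info, function(df) list(n_stops = nrow(df)))) %>%
  # String form of map(): extract the element named "n_stops" from each list
  mutate(n_stops = map(stats, "n_stops"))

# unnest() expands the list column back into ordinary rows and columns
toy_flat = toy_nested %>%
  select(city, location_info) %>%
  unnest(location_info)

toy_nested
toy_flat
```

Here toy_nested has one row per city, while toy_flat is back to one row per stop with the original lon and lat columns.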

In addition to the edges of the shape, we also want the centroid. To get it we’ll first make a new data frame containing just the city and summary information. We then unnest() the data frame just as we did for the edges, but we don’t stop there, because the centroid is something we need to compute ourselves. To do this we’ll first group_by() city and then summarise() the data. To compute the x-value of the centroid, cent_x, we take the x column, which contains the x-coordinates of all of the points, and multiply each value by the matching entry of the del.wts column, which holds that point’s share of the total triangulated area (the weights sum to 1 within a city). Adding these products together gives the x-value of the centroid of the entire figure. We can do the same thing for the y-value. See the table below for the data summarised to give us the centroids for each city.
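
Why a weighted sum of the points gives the centroid can be sketched as follows. This assumes the definition of del.wts given in the deldir documentation: each point’s del.area is one third of the total area of the triangles meeting at that point, and del.wts is that value divided by the sum over all points. The symbols are introduced here for illustration: A_t is the area of triangle t, p_i is the i-th point, and w_i is its weight.

```latex
\begin{aligned}
\mathbf{c}
  &= \frac{1}{A}\sum_{t} A_t \,\frac{\mathbf{p}_{t_1}+\mathbf{p}_{t_2}+\mathbf{p}_{t_3}}{3}
   = \sum_{i} \underbrace{\frac{1}{A}\cdot\frac{1}{3}\sum_{t \ni i} A_t}_{w_i}\,\mathbf{p}_i ,
\qquad A=\sum_{t} A_t ,\quad \sum_{i} w_i = 1 .
\end{aligned}
```

The first expression is the area-weighted average of the triangle centroids; regrouping the same terms by point rather than by triangle gives the sum that the summarise() call computes.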

data_deldir_cent = data_deldir %>%
  select(city, summary) %>%
  unnest(summary) %>%
  group_by(city) %>%
  summarise(cent_x = sum(x * del.wts),
            cent_y = sum(y * del.wts)) %>%
  ungroup()
data_deldir_cent
# A tibble: 4 × 3
       city    cent_x   cent_y
     <fctr>     <dbl>    <dbl>
1 Barcelona  2.137923 41.38708
2    Berlin 13.402654 52.51054
3     Paris  2.353365 48.85813
4    Prague 14.447439 50.07588
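
As a quick sanity check of the weighted-sum approach, the sketch below applies it to four made-up points: a triangle with one point inside it. The triangulated region is then just the outer triangle, whose centroid is the mean of its three corners, so the two results should agree. The points and the names toy_summary and direct are assumptions for illustration, and the deldir package is assumed to be installed.

```r
library(deldir)

# Made-up points: a triangle with corners (0,0), (4,0), (0,3) and one interior point
x = c(0, 4, 0, 1)
y = c(0, 0, 3, 1)

toy_summary = deldir(x, y)$summary

# Weighted mean of the point coordinates, as in the summarise() call above
cent_x = sum(toy_summary$x * toy_summary$del.wts)
cent_y = sum(toy_summary$y * toy_summary$del.wts)

# Centroid of the outer triangle computed directly from its three corners
direct = c(mean(c(0, 4, 0)), mean(c(0, 0, 3)))

c(cent_x, cent_y)
direct
```

Both should be close to (4/3, 1), up to the rounding deldir applies to its output.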

Now we can update our figures with the triangulations and centroids. I’ve again made a function to build the four maps. As before we start with ggmap() and our city-specific map object. Next we’ll use geom_segment() to draw our edges, using x1, y1, x2, and y2 from the data_deldir_delsgs data frame we made earlier. We then plot the actual metro stop points with geom_point(), just as we did in our original map. Finally we end with one more geom_point() call, this time on our data_deldir_cent data frame, to plot the centroid specific to each city. See the four updated maps below. Again, I’ve left the code visible for the Paris map to show how the function works and hidden the rest.

del_plot = function(city_name, city_map){
  ggmap(city_map, extent = "device") +
    geom_segment(data = subset(data_deldir_delsgs, city == city_name), aes(x = x1, y = y1, xend = x2, yend = y2),
                 size = 1, color= "#92c5de") +
    geom_point(data = subset(data, city == city_name), aes(x = lon, y = lat),
               color = "#0571b0", size = 3) +
    geom_point(data = subset(data_deldir_cent, city == city_name),
               aes(x = cent_x, y = cent_y),
               size = 6, color= "#ca0020")
}

paris_del.plot = del_plot("Paris", paris_map)
paris_del.plot

Plot for Paris:
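
If you want to try the layering without a map tile from ggmap, the same three layers can be sketched on a blank ggplot2 canvas. Everything here is made up for illustration (the toy_edges, toy_stops, and toy_cent data frames are hypothetical), and the sketch swaps ggmap() for a plain ggplot() call, so it is not the function used for the maps in this post.

```r
library(ggplot2)

# Made-up edges, stops, and centroid (an assumption for illustration, not the metro data)
toy_edges = data.frame(x1 = c(0, 4, 0), y1 = c(0, 0, 3),
                       x2 = c(4, 0, 0), y2 = c(0, 3, 0))
toy_stops = data.frame(lon = c(0, 4, 0), lat = c(0, 0, 3))
toy_cent  = data.frame(cent_x = 4 / 3, cent_y = 1)

# Same layer order as del_plot(): edges first, then stops, then the centroid on top
toy_plot = ggplot() +
  geom_segment(data = toy_edges, aes(x = x1, y = y1, xend = x2, yend = y2),
               color = "#92c5de") +
  geom_point(data = toy_stops, aes(x = lon, y = lat),
             color = "#0571b0", size = 3) +
  geom_point(data = toy_cent, aes(x = cent_x, y = cent_y),
             color = "#ca0020", size = 6)
toy_plot
```

Because later layers are drawn over earlier ones, adding the centroid last keeps it visible above the stops and edges.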

Conclusion

In Part 2 of this series we computed Delaunay triangulations and centroids for each of our cities’ metro systems. This included some more complicated tidyverse calls, such as nesting and unnesting our data. In the third and final part of this series we’ll look at how the systems change over time and show it with a .gif.