Animate Global Airports and Flights Using ggplot2 and gganimate
In an earlier article, we visualized global flights and airports as a static graphic. Here we turn that static graphic into an animation to make the visualization more dynamic and engaging. The data wrangling in the first part is identical to that of the static graphic; if you're already familiar with the data cleanup, you can skip directly to the edits designed for animation. 🌻
The data is sourced from OpenFlights and can be imported directly from its URL. We'll process two datasets in three major steps.
(1) Import the airport dataset and update its column names based on the website's description. To make the dataset easier to navigate, we'll select only the columns used in this visualization.
```r
library(ggplot2)
library(dplyr)
library(tidyr)
library(stringr)

# read the dataset
url.airport <- "https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat"
airport <- read.table(url.airport, sep = ",", header = FALSE) %>%
  as_tibble()

# Update column names based on the website description;
# use "lat" for latitude and "long" for longitude,
# to be consistent with the output of ggplot2's map_data()
colnames(airport) <- c("airport_ID", "name", "city", "country", "IATA", "ICAO",
                       "lat", "long", "altitude", "timezone", "DST",
                       "Tz_database", "type", "source")

# select columns useful in this visualization
airport.selected <- airport %>%
  select(name, airport_ID, lat, long, country) %>%
  mutate(airport_ID = as.character(airport_ID))

head(airport.selected, n = 3)
```
Output:
```
# A tibble: 3 × 5
  name                         airport_ID   lat  long country
  <chr>                        <chr>      <dbl> <dbl> <chr>
1 Goroka Airport               1          -6.08  145. Papua New Guinea
2 Madang Airport               2          -5.21  146. Papua New Guinea
3 Mount Hagen Kagamuga Airport 3          -5.83  144. Papua New Guinea
```
The airport.selected dataset serves two purposes:
It will be used directly to create the airport scatterplot, with long (longitude) mapped to x and lat (latitude) to y.
It will be merged with the airline dataset via the key column airport_ID; each flight line is then drawn by connecting its source and destination airports.
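(2) Load the airline (route) dataset, routes.dat, update its column names based on the website's description, and keep only the source and destination airport IDs (see the full code at the end). This airline dataset then needs two more critical steps: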
Assign a unique ID to each flight, so that each row represents one flight line. This flight ID will be mapped to the group aesthetic in geom_line(), which specifies which two data points are connected to form one flight line.
```r
n <- nrow(airline)
airline <- airline %>% mutate(flight_ID = 1:n)
```
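To see what the group aesthetic does, here is a small sketch on made-up long-form data: two hypothetical flights, each with one source row and one destination row, the shape produced by the pivot in the next step. Instead of drawing, it inspects the layer data returned by ggplot_build(). With group = flight_ID the four points split into two groups of two, one per flight line; without it, they all fall into a single group and would be joined as one line.

```r
library(dplyr)
library(ggplot2)

# Hypothetical long-form data: two flights, each with a source and a destination row
toy <- tibble(
  flight_ID = c(1L, 1L, 2L, 2L),
  direction = c("source", "destination", "source", "destination"),
  long      = c(40.0, 49.3, 48.0, 49.3),
  lat       = c(43.4, 55.6, 46.3, 55.6)
)

# With the group aesthetic: one group (one line) per flight
p_grouped <- ggplot(toy, aes(x = long, y = lat)) +
  geom_line(aes(group = flight_ID))
built_grouped <- ggplot_build(p_grouped)$data[[1]]
points_per_line <- table(built_grouped$group)
print(points_per_line)

# Without it: all four points end up in a single group
p_nogroup <- ggplot(toy, aes(x = long, y = lat)) +
  geom_line()
built_nogroup <- ggplot_build(p_nogroup)$data[[1]]
print(length(unique(built_nogroup$group)))
```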
Pivot the dataset into a tidy (long) structure with tidyr's pivot_longer() (see the full code at the end), such that each flight occupies two rows: the source airport in one row and the destination airport in another. The resulting airport_ID variable connects the airline and airport datasets and serves as the key (shared) column for the merge in the next step.
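As a minimal sketch of what the pivot does, here it is applied to a hypothetical two-route table (the airport IDs are illustrative). Each route becomes two rows, and str_remove() trims the former column names down to "source" and "destination". The sketch uses row_number() for the flight ID, which yields the same 1..n sequence as flight_ID = 1:n.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical route table in the wide shape: one row per route
airline_toy <- tibble(
  source_airport_ID      = c("2965", "2966"),
  destination_airport_ID = c("2990", "2990")
)

airline_toy_tidy <- airline_toy %>%
  # one unique ID per route (same sequence as flight_ID = 1:n)
  mutate(flight_ID = row_number()) %>%
  # two rows per flight: one for the source, one for the destination
  pivot_longer(-flight_ID, names_to = "direction", values_to = "airport_ID") %>%
  # "source_airport_ID" -> "source", "destination_airport_ID" -> "destination"
  mutate(direction = str_remove(direction, "_airport_ID"))

print(airline_toy_tidy)
```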
(3) Merge the airport and airline datasets using the shared column airport_ID. For each flight, the merged dataset contains its flight ID and its departure (source) and destination airports, each identified by airport ID, name, latitude, and longitude.
```r
# flights
air.all <- airline.tidy %>%
  left_join(airport.selected, by = "airport_ID")

head(air.all, n = 4)
```
Output:
```
# A tibble: 4 × 7
  flight_ID direction   airport_ID name                        lat  long country
      <int> <chr>       <chr>      <chr>                     <dbl> <dbl> <chr>
1         1 source      2965       Sochi International Airp…  43.4  40.0 Russia
2         1 destination 2990       Kazan International Airp…  55.6  49.3 Russia
3         2 source      2966       Astrakhan Airport          46.3  48.0 Russia
4         2 destination 2990       Kazan International Airp…  55.6  49.3 Russia
```
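left_join() keeps every row of airline.tidy. When an airport ID in the routes has no match in airport.selected, name, lat and long come back as NA, which is where the missing coordinates handled in Edit 3 come from. A minimal sketch with hypothetical tables, where airport 2966 is deliberately left out of the airport table:

```r
library(dplyr)

# Hypothetical airport table: airport 2966 is intentionally missing
airports_toy <- tibble(
  airport_ID = c("2965", "2990"),
  name       = c("Sochi International Airport", "Kazan International Airport"),
  lat        = c(43.4, 55.6),
  long       = c(40.0, 49.3)
)

# Hypothetical long-form routes (same shape as airline.tidy)
routes_toy <- tibble(
  flight_ID  = c(1L, 1L, 2L, 2L),
  direction  = c("source", "destination", "source", "destination"),
  airport_ID = c("2965", "2990", "2966", "2990")
)

# Every route row is kept; unmatched IDs get NA for name, lat and long
merged <- routes_toy %>%
  left_join(airports_toy, by = "airport_ID")

print(merged)
```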
The data wrangling above is identical to that for the static graphic. Edits 1-6 below are tailored specifically to the animation.
Edit 1: Create another variable, whichFrame, to determine in which animation frame each flight line and its airports are displayed. Here we design each frame to display roughly 10% of the total dataset: each flight (together with its source and destination airports) is assigned a frame index drawn at random from 1 to 10, using air.all %>% group_by(flight_ID) %>% mutate(whichFrame = sample(10, 1) %>% factor()). We'll use this new dataset, air.all.framed, to visualize the flight lines.
Output of head(air.all.framed, n = 6):
```
# A tibble: 6 × 8
# Groups:   flight_ID [3]
  flight_ID direction   airport_ID name             lat  long country whichFrame
      <int> <chr>       <chr>      <chr>          <dbl> <dbl> <chr>   <fct>
1         1 source      2965       Sochi Interna…  43.4  40.0 Russia  2
2         1 destination 2990       Kazan Interna…  55.6  49.3 Russia  2
3         2 source      2966       Astrakhan Air…  46.3  48.0 Russia  1
4         2 destination 2990       Kazan Interna…  55.6  49.3 Russia  1
5         3 source      2966       Astrakhan Air…  46.3  48.0 Russia  8
6         3 destination 2962       Mineralnyye V…  44.2  43.1 Russia  8
```
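Edit 1 can be sketched on a small hypothetical table. Because the data are grouped by flight_ID, sample(10, 1) is drawn once per flight and shared by its source and destination rows, so a flight's line and both of its airports always appear in the same frame. Since the draws are random, each frame receives roughly (not exactly) 10% of the flights. This variant spells out levels = 1:10 so the frame levels are explicitly 1 to 10; set.seed() is only there to make the illustration reproducible.

```r
library(dplyr)

set.seed(1)  # reproducible illustration only

# Hypothetical long-form data: five flights, two rows each
toy <- tibble(
  flight_ID = rep(1:5, each = 2),
  direction = rep(c("source", "destination"), times = 5)
)

toy_framed <- toy %>%
  group_by(flight_ID) %>%
  # one random draw per flight, recycled to both of its rows;
  # explicit levels keep the frames labeled 1..10
  mutate(whichFrame = factor(sample(10, 1), levels = 1:10)) %>%
  ungroup()

print(toy_framed)

# number of distinct frames per flight (should be 1 for every flight)
frames_per_flight <- tapply(as.character(toy_framed$whichFrame),
                            toy_framed$flight_ID,
                            function(x) length(unique(x)))
print(frames_per_flight)
```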
Edit 2: At this point, each frame contains many duplicated airports, because the same airport is shared by different flights. Here we keep only the unique airports in each frame. This reduces the number of data points to plot and makes the graphic faster to render.
```r
# "airport.framed" will be mapped to the airport scatterplot
airport.framed <- air.all.framed %>%
  group_by(whichFrame) %>%
  distinct(name, lat, long)

head(airport.framed, n = 4)
```
Output:
```
# A tibble: 4 × 4
# Groups:   whichFrame [2]
  whichFrame name                          lat  long
  <fct>      <chr>                       <dbl> <dbl>
1 2          Sochi International Airport  43.4  40.0
2 2          Kazan International Airport  55.6  49.3
3 1          Astrakhan Airport            46.3  48.0
4 1          Kazan International Airport  55.6  49.3
```
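Here is the per-frame de-duplication from Edit 2 on a hypothetical table with made-up airport names A, B and C. Airport B is used by two flights in frame 1 and is kept only once there, but it appears again in frame 2, because distinct() works within each whichFrame group. Six rows shrink to five.

```r
library(dplyr)

# Hypothetical framed data: flights 1 and 2 in frame 1, flight 3 in frame 2
toy_framed <- tibble(
  flight_ID  = c(1L, 1L, 2L, 2L, 3L, 3L),
  whichFrame = factor(c(1, 1, 1, 1, 2, 2), levels = 1:2),
  name       = c("A", "B", "C", "B", "C", "B"),
  lat        = c(43.4, 55.6, 46.3, 55.6, 46.3, 55.6),
  long       = c(40.0, 49.3, 48.0, 49.3, 48.0, 49.3)
)

# Unique airports within each frame (the grouping column is kept in the result)
airports_per_frame <- toy_framed %>%
  group_by(whichFrame) %>%
  distinct(name, lat, long) %>%
  ungroup()

print(airports_per_frame)
print(count(airports_per_frame, whichFrame))
```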
Edit 3: Remove rows containing NA values. Running air.all.framed[!complete.cases(air.all.framed), ] (with the exclamation mark !) displays the rows containing missing values. All of these rows have NA in lat and long, because their airport IDs have no match in the airport dataset, and they cause an error when rendered into an animation (but not in the static graphic). We remove these rows with air.all.framed <- air.all.framed[complete.cases(air.all.framed), ] (without the !).
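The two forms of the complete.cases() test from Edit 3, applied to a hypothetical merged table in which one airport has no coordinates. Dropping that row leaves flight 2 with a single point; geom_line() needs at least two points in a group to draw a segment, so that flight simply does not appear in the plot. Note also that, in the order used in this article, airport.framed (Edit 2) is derived before this filter, so it may still contain rows with NA coordinates; applying the filter before Edit 2 would keep them out of the point layer as well.

```r
library(dplyr)

# Hypothetical merged data: airport 2966 had no match, so lat/long are NA
merged <- tibble(
  flight_ID  = c(1L, 1L, 2L, 2L),
  airport_ID = c("2965", "2990", "2966", "2990"),
  lat        = c(43.4, 55.6, NA, 55.6),
  long       = c(40.0, 49.3, NA, 49.3)
)

# With "!": show the rows that contain at least one NA
rows_with_na <- merged[!complete.cases(merged), ]
print(rows_with_na)

# Without "!": keep only the complete rows (the filter used in Edit 3)
merged_clean <- merged[complete.cases(merged), ]
print(merged_clean)
```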
We first create a static plot of all the data, which overlays all 10 frames, with the following two changes compared with the static graphic.
Edit 4: Instead of passing the map dataset my.world to the ggplot() line (as in the static graphic), here we pass it locally to geom_polygon(). Meanwhile, we pass air.all.framed to the ggplot() line as the "global" dataset, so that transition_states() has access to the whichFrame variable to "facet" the plot into frames of animation.
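A minimal sketch of the global/local data split in Edit 4, using hypothetical stand-ins for my.world (a single square) and air.all.framed (two flights). It only builds the plot, without animating it: the polygon layer uses its own local data, which has no whichFrame column, while geom_line() inherits the global data, whichFrame included.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for my.world: one square polygon
world_toy <- tibble(long = c(0, 10, 10, 0), lat = c(0, 0, 10, 10), group = 1)

# Hypothetical stand-in for air.all.framed: two flights in two frames
flights_toy <- tibble(
  flight_ID  = c(1L, 1L, 2L, 2L),
  long       = c(1, 9, 2, 8),
  lat        = c(1, 9, 8, 2),
  whichFrame = factor(c(1, 1, 2, 2))
)

p <- ggplot(flights_toy, aes(x = long, y = lat)) +
  # local data: the map layer ignores the global dataset
  geom_polygon(data = world_toy, aes(group = group), fill = "black") +
  # no data argument: this layer inherits the global dataset
  geom_line(aes(group = flight_ID))

built <- ggplot_build(p)
rows_per_layer <- sapply(built$data, nrow)
print(rows_per_layer)  # polygon layer first, then line layer
```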
Edit 5: Because the data are split into 10 separate frames, each frame shows only about a tenth of the lines and points, so we make them visually more prominent by raising alpha from 0.6 to 0.8 for the lines and to 1 for the points. Since the static plot below overlays all 10 frames, it looks over-plotted; once displayed one frame at a time in the animation, each frame looks just right.
```r
my.world <- map_data("world")

p.static <- air.all.framed %>%
  ggplot(aes(x = long, y = lat)) +
  # create a world map background
  geom_polygon(data = my.world, aes(group = group),
               color = "tomato", linewidth = .1,
               show.legend = FALSE, fill = "black") +
  # a black theme
  theme_void() +
  theme(plot.background = element_rect(fill = "black")) +
  # add flight lines
  geom_line(aes(group = flight_ID),
            color = "turquoise", linewidth = .1, alpha = .8) +
  # add airports, using the "airport.framed" dataset (duplicates removed)
  geom_point(data = airport.framed,
             color = "yellow", shape = ".", alpha = 1) +
  # add title at bottom left corner
  annotate(geom = "text", x = -180, y = -60, hjust = 0,
           label = "Airports and Flights", color = "snow3",
           size = 4.5, fontface = "bold") +
  # add caption at bottom left corner
  labs(caption = "data source: OpenFlights https://openflights.org/data.html") +
  theme(plot.caption = element_text(color = "snow3", hjust = .05,
                                    margin = margin(b = 5, unit = "pt"))) +
  # coordinate
  coord_fixed(ratio = 1.2,       # adjust aspect ratio
              ylim = c(-60, 90)) # remove Antarctica

p.static
```
Edit 6: Use the gganimate package to render the static ggplot2 object into an animation, using the whichFrame variable to "facet" the plot into subplots (the states), which are then displayed one after another over time. (Rendering takes a while, e.g., 2-3 minutes on an Apple M1 Pro computer.)
```r
library(gganimate)

p.static +
  transition_states(
    # instant transition between different states (frames)
    transition_length = 0,
    # the "faceting" variable
    states = whichFrame,
    # transition the last state to the first state
    # to continue the animation
    wrap = TRUE)
```
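A minimal sketch of the same transition_states() call on hypothetical toy data, stopping short of the slow rendering step. Adding the transition turns the ggplot object into a gganim object; with transition_length = 0 and the default state_length = 1, all of the animation time is spent showing states and none is spent tweening between them. Rendering happens when the object is printed or passed to animate(), which by default produces 100 frames at 10 fps, so 10 states would get roughly 10 frames (about one second) each.

```r
library(ggplot2)
library(gganimate)

# Hypothetical toy data: two flights, each assigned to its own frame
flights_toy <- data.frame(
  flight_ID  = c(1L, 1L, 2L, 2L),
  long       = c(1, 9, 2, 8),
  lat        = c(1, 9, 8, 2),
  whichFrame = factor(c(1, 1, 2, 2))
)

anim <- ggplot(flights_toy, aes(x = long, y = lat)) +
  geom_line(aes(group = flight_ID)) +
  transition_states(states = whichFrame,
                    transition_length = 0,  # jump straight to the next state
                    state_length = 1,       # relative time each state is shown
                    wrap = TRUE)            # loop from the last state back to the first

print(class(anim))

# Rendering is the slow part; for example (not run here):
# animate(anim, nframes = 20, fps = 10)
```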
Full code:
```r
library(ggplot2)
library(dplyr)
library(tidyr)
library(stringr)

# (1) Import and clean up the airport dataset
url.airport <- "https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat"
airport <- read.table(url.airport, sep = ",", header = FALSE) %>%
  as_tibble()

# Update column names based on the website description
colnames(airport) <- c("airport_ID", "name", "city", "country", "IATA", "ICAO",
                       "lat", "long", "altitude", "timezone", "DST",
                       "Tz_database", "type", "source")

# select columns useful in this visualization
airport.selected <- airport %>%
  select(name, airport_ID, lat, long, country) %>%
  mutate(airport_ID = as.character(airport_ID))
head(airport.selected, n = 3)

# (2) Load and clean up the airline dataset
url.airline <- "https://raw.githubusercontent.com/jpatokal/openflights/master/data/routes.dat"
airline <- read.table(url.airline, sep = ",", header = FALSE) %>%
  as_tibble()

# Update column names based on the website description
colnames(airline) <- c("airline", "airline_ID", "source_airport", "source_airport_ID",
                       "destination_airport", "destination_airport_ID",
                       "Codeshare", "Stops", "Equipment")

# select useful columns
airline <- airline %>%
  select(source_airport_ID, destination_airport_ID)

# For this airline dataset, we need two more critical steps:
## Assign a unique ID to each flight, to be mapped to the 'group' aesthetic in 'geom_line'
n <- nrow(airline)
airline <- airline %>% mutate(flight_ID = 1:n)

## Pivot the dataset into a tidy structure
airline.tidy <- airline %>%
  pivot_longer(-flight_ID, names_to = "direction", values_to = "airport_ID") %>%
  mutate(direction = str_remove(direction, "_airport_ID"))
head(airline.tidy, n = 4)

# (3) Merge the airport and airline datasets
air.all <- airline.tidy %>%
  left_join(airport.selected, by = "airport_ID")
head(air.all, n = 4)

# The code above is identical to that of the STATIC graphic.
# Edits 1-6 below are specific to the ANIMATION.

# Edit 1: Create another variable, `whichFrame`, to determine
# in which animation frame each flight (line and airports) is displayed
air.all.framed <- air.all %>%
  group_by(flight_ID) %>%
  mutate(whichFrame = sample(10, 1) %>% factor())
head(air.all.framed, n = 6)

# Edit 2: In each frame, remove duplicated airports (shared by different flights)
airport.framed <- air.all.framed %>%
  group_by(whichFrame) %>%
  distinct(name, lat, long)

# Edit 3: Remove rows containing `NA` values
air.all.framed <- air.all.framed[complete.cases(air.all.framed), ]

# Make plots now!
# First create a static plot of all data, an overlay of all 10 frames.
# Edit 4: Map the world map dataset `my.world` locally in `geom_polygon()`,
# and pass `air.all.framed` to `ggplot()` as the "global" dataset.
# Edit 5: Use alpha = 0.8 for lines and alpha = 1 for points.
my.world <- map_data("world")

p.static <- air.all.framed %>%
  ggplot(aes(x = long, y = lat)) +
  # create a world map background
  geom_polygon(data = my.world, aes(group = group),
               color = "tomato", linewidth = .1,
               show.legend = FALSE, fill = "black") +
  # a black theme
  theme_void() +
  theme(plot.background = element_rect(fill = "black")) +
  # add flight lines
  geom_line(aes(group = flight_ID),
            color = "turquoise", linewidth = .1, alpha = .8) +
  # add airports, using the "airport.framed" dataset (duplicates removed)
  geom_point(data = airport.framed,
             color = "yellow", shape = ".", alpha = 1) +
  # add title at bottom left corner
  annotate(geom = "text", x = -180, y = -60, hjust = 0,
           label = "Airports and Flights", color = "snow3",
           size = 4.5, fontface = "bold") +
  # add caption at bottom left corner
  labs(caption = "data source: OpenFlights https://openflights.org/data.html") +
  theme(plot.caption = element_text(color = "snow3", hjust = .05,
                                    margin = margin(b = 5, unit = "pt"))) +
  coord_fixed(ratio = 1.2,       # adjust aspect ratio
              ylim = c(-60, 90)) # remove Antarctica

p.static

# Edit 6: Render the static plot into an animation.
# It takes several minutes to render the animation.
library(gganimate)

p.static +
  transition_states(
    # instant transition between different states (frames)
    transition_length = 0,
    # the "faceting" variable
    states = whichFrame,
    # transition the last state to the first state
    # to continue the animation
    wrap = TRUE)
```