Visualize Global Airports and Flights with ggplot2
In this article, we’ll visualize global airports and flights on a world map. The visualization itself is quite straightforward with ggplot2; most of the effort goes into data cleanup.
The data is sourced from OpenFlights, and can be imported directly via URL. We’ll process 2 datasets in a total of 3 major steps.
(1) Import the airport dataset, and update its column names based on the website’s description. To make it easier to navigate through the dataset, we’ll only select columns used in this visualization.
library(ggplot2)
library(dplyr)
library(tidyr)
library(stringr)

# read the dataset
url.airport <- "https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat"

airport <- read.table(url.airport, sep = ",", header = FALSE) %>% as_tibble()

# Update column names based on website description;
# use "lat" for latitude, and "long" for longitude,
# to be consistent with the 'map_data' which is built in R
colnames(airport) <- c("airport_ID", "name", "city", "country",
                       "IATA", "ICAO", "lat", "long", "altitude",
                       "timezone", "DST", "Tz_database", "time_zone", "type")

# select columns useful in this visualization
airport.selected <- airport %>%
  select(name, airport_ID, lat, long, country) %>%
  mutate(airport_ID = as.character(airport_ID))

head(airport.selected, n = 3)
Output:
# A tibble: 3 × 5
  name                         airport_ID   lat  long country
  <chr>                        <chr>      <dbl> <dbl> <chr>
1 Goroka Airport               1          -6.08  145. Papua New Guinea
2 Madang Airport               2          -5.21  146. Papua New Guinea
3 Mount Hagen Kagamuga Airport 3          -5.83  144. Papua New Guinea
This airport.selected serves two purposes:
It will be used directly to create an airport scatterplot, with the long variable (longitude) mapped to x, and lat (latitude) mapped to y.
It will be merged with the airline dataset via the key column airport_ID, so that flight lines can be drawn by connecting the associated airports.
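The airline dataset used in the following steps has to be imported first. Here is a minimal sketch, assuming it comes from the OpenFlights routes.dat file stored next to airports.dat in the same repository. The column names are illustrative choices that follow the field order described on the OpenFlights website, and are not taken from this article, except that the two ID columns are named so that they match the later steps.

```r
library(dplyr)

# Assumed location: routes.dat in the same OpenFlights repository as airports.dat
url.airline <- "https://raw.githubusercontent.com/jpatokal/openflights/master/data/routes.dat"

airline <- read.table(url.airline, sep = ",", header = FALSE) %>% as_tibble()

# Illustrative column names (assumption), following the field order
# described on the OpenFlights website
colnames(airline) <- c("airline", "airline_ID",
                       "source_airport", "source_airport_ID",
                       "destination_airport", "destination_airport_ID",
                       "codeshare", "stops", "equipment")

# Keep only the two airport ID columns, stored as character so that
# they have the same type as airport_ID in airport.selected
airline <- airline %>%
  select(source_airport_ID, destination_airport_ID) %>%
  mutate(across(everything(), as.character))

head(airline, n = 3)
```

Keeping only the two ID columns means that each row describes one route as a pair of airports, which is all the later steps need.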
For this airline dataset, we need to do two more critical steps:
Assign a unique ID to each flight. At this stage, each row represents one flight line. The flight ID will be mapped to the group aesthetic in geom_line(), and specifies which two data points should be connected to make one flight line.
n <- nrow(airline)
airline <- airline %>% mutate(flight_ID = 1:n)
Pivot the dataset into a tidy structure, such that each unique flight occupies two rows: the source airport in one row, and the destination airport in another row. The airport_ID variable connects the airline and airport datasets, and serves as the key (shared) column to guide the data merge in the next step.
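The pivot step can be sketched as follows. This is a minimal example under stated assumptions: the airline dataset is taken to hold columns named source_airport_ID and destination_airport_ID (assumed names), and a small stand-in tibble, airline.demo (hypothetical), is used so that the block runs on its own. Its values are the first two flights that appear in the merged output of step (3).

```r
library(dplyr)
library(tidyr)
library(stringr)

# Stand-in for the airline dataset after the flight_ID step
# (hypothetical object; column names are assumptions)
airline.demo <- tibble(
  source_airport_ID      = c("2965", "2966"),
  destination_airport_ID = c("2990", "2990"),
  flight_ID              = 1:2
)

airline.tidy.demo <- airline.demo %>%
  # stack the two airport columns: one row per flight and direction
  pivot_longer(cols = -flight_ID,
               names_to = "direction",
               values_to = "airport_ID") %>%
  # shorten "source_airport_ID" / "destination_airport_ID"
  # to "source" / "destination"
  mutate(direction = str_remove(direction, "_airport_ID"))

airline.tidy.demo
```

Applied to the full airline dataset, the same two calls would produce the airline.tidy table used in the merge, with two rows per flight_ID.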
(3) Merge the airport and airline datasets using the shared column airport_ID. The merged dataset holds the information needed for each flight: the flight ID, and the associated source (departure) and destination airports, each identified by its airport ID, latitude, and longitude. It’s now ready for visualization!
# flights
air.all <- airline.tidy %>%
  left_join(airport.selected, by = "airport_ID")

head(air.all, n = 4) # now ready for visualization!
Output:
# A tibble: 4 × 7
  flight_ID direction   airport_ID name                        lat  long country
      <int> <chr>       <chr>      <chr>                     <dbl> <dbl> <chr>
1         1 source      2965       Sochi International Airp…  43.4  40.0 Russia
2         1 destination 2990       Kazan International Airp…  55.6  49.3 Russia
3         2 source      2966       Astrakhan Airport          46.3  48.0 Russia
4         2 destination 2990       Kazan International Airp…  55.6  49.3 Russia
Visualization
We start by creating a world map from the data returned by the map_data() function in ggplot2. Both the map data and air.all have the long (longitude) and lat (latitude) variables.
my.world <- map_data("world")

p1 <- my.world %>%
  ggplot(aes(x = long, y = lat)) +
  # create a world map background
  geom_polygon(aes(group = group), color = "tomato", linewidth = .2,
               show.legend = F, fill = "black") +
  coord_fixed(ratio = 1.2, # adjust aspect ratio
              ylim = c(-60, 90)) + # remove Antarctica
  theme_void() +
  theme(plot.background = element_rect(fill = "black"))

p1
Visualize the flights using the air.all dataset. Flight lines are created by connecting the associated source and destination airports, whose positions are specified by the long and lat variables. (Due to the large size of the dataset, it could take several seconds to render the plot.)
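The two layers involved, flight lines and airport points, can be sketched as below. This is an illustration under assumptions rather than the exact code behind the figure: the colors, line width and transparency are guesses, and the hypothetical stand-ins p.base, air.demo and airport.demo replace p1, air.all and airport.selected so that the block runs on its own. The coordinates are those of the three airports in the merged output above.

```r
library(ggplot2)

# Stand-in for air.all: two flights, two rows (source, destination) each
air.demo <- data.frame(
  flight_ID = c(1, 1, 2, 2),
  long      = c(40.0, 49.3, 48.0, 49.3),
  lat       = c(43.4, 55.6, 46.3, 55.6)
)

# Stand-in for airport.selected: one row per airport
airport.demo <- data.frame(
  long = c(40.0, 49.3, 48.0),
  lat  = c(43.4, 55.6, 46.3)
)

# Stand-in for p1: an empty plot with the same x / y mapping
p.base <- ggplot(mapping = aes(x = long, y = lat))

# Flight lines: group = flight_ID connects the two rows of each flight
p2.demo <- p.base +
  geom_line(data = air.demo, aes(group = flight_ID),
            color = "turquoise", linewidth = 0.1, alpha = 0.1) # assumed styling

# Airports: one small point per airport drawn on top of the lines
p3.demo <- p2.demo +
  geom_point(data = airport.demo,
             color = "yellow", shape = ".", alpha = 0.6) # assumed styling

p3.demo
```

In the article's pipeline, p1, air.all and airport.selected would take the place of the stand-ins, giving the p3 plot that the next step builds on. A low alpha keeps heavily travelled corridors visible while letting single routes fade into the background.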
p4 <- p3 +
  # add title at bottom left corner
  annotate(geom = "text", x = -180, y = -60, hjust = 0,
           label = "Airports and Flights",
           color = "snow3", size = 4.5, fontface = "bold") +
  # add caption at bottom left corner
  labs(caption = "data source: OpenFlights https://openflights.org/data ") +
  theme(plot.caption = element_text(color = "snow3", hjust = .05, margin = margin(b = 5)))

p4
Save the plot in PNG format. By default, the most recently displayed plot (p4) is saved. Here we save the graphic in a folder named “graphics”, located in the same folder as the source code.
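A minimal sketch of the saving step with ggsave(). The file name, dimensions and resolution are assumptions, and a small stand-in plot (p.demo, hypothetical) replaces p4 so that the block runs on its own.

```r
library(ggplot2)

# Stand-in plot; in the article this would be p4
p.demo <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Make sure the target folder exists before saving
dir.create("graphics", showWarnings = FALSE)

ggsave(filename = "airports_flights.png", # assumed file name
       plot = p.demo,                     # omit to save the last displayed plot
       path = "graphics",
       width = 8, height = 5, dpi = 300)  # assumed size (inches) and resolution
```

Passing plot explicitly makes the call independent of which plot was displayed last, which is convenient when the script is run non-interactively.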