Visualize Global Airports and Flights with ggplot2

In this article, we’ll visualize global airports and flights on a world map. The visualization itself is quite straightforward with ggplot2; most of the effort goes into data cleanup.


Packages and data cleanup

The data comes from OpenFlights and can be imported directly via URL. We’ll process two datasets in three major steps.

(1) Import the airport dataset, and update its column names based on the website’s description. To make it easier to navigate through the dataset, we’ll only select columns used in this visualization.

library(ggplot2)
library(dplyr)
library(tidyr)
library(stringr)

# read the dataset
url.airport <- "https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat"
airport <- read.table(url.airport, sep = ",", header = FALSE) %>% as_tibble()

# Update column names based on website description;
# use "lat" for latitude, and "long" for longitude,
# to be consistent with the 'map_data' which is built in R
colnames(airport) <- c(
  "airport_ID", "name", "city", "country", "IATA", "ICAO", "lat", "long",
  "altitude", "timezone", "DST", "Tz_database", "time_zone", "type")

# select columns useful in this visualization
airport.selected <- airport %>%
  select(name, airport_ID, lat, long, country) %>%
  mutate(airport_ID = as.character(airport_ID))

head(airport.selected, n = 3)

Output:

# A tibble: 3 × 5
  name                         airport_ID   lat  long country
  <chr>                        <chr>      <dbl> <dbl> <chr>
1 Goroka Airport               1          -6.08  145. Papua New Guinea
2 Madang Airport               2          -5.21  146. Papua New Guinea
3 Mount Hagen Kagamuga Airport 3          -5.83  144. Papua New Guinea

The airport.selected dataset serves two purposes:

  • It is used directly to create the airport scatterplot, with the long variable (longitude) mapped to x and lat (latitude) mapped to y.

  • It is merged with the airline dataset via the key column airport_ID; flight lines are then drawn by connecting the associated airports.
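The as.character() conversion in the code above matters for the later merge: dplyr joins expect the key columns on both sides to have compatible types. The following is a minimal sketch on hypothetical toy data (the names routes_toy and airports_toy and all values are made up for illustration), showing a join that fails on mismatched key types and succeeds after conversion.

```r
library(dplyr)

# Toy data (hypothetical values, not taken from OpenFlights)
routes_toy   <- tibble(airport_ID = c("1", "2"))       # character key
airports_toy <- tibble(airport_ID = c(1L, 2L),         # integer key
                       name = c("Airport A", "Airport B"))

# Joining a character key to an integer key raises an error in dplyr;
# tryCatch() turns that error into a short message for inspection
mismatch <- tryCatch(
  left_join(routes_toy, airports_toy, by = "airport_ID"),
  error = function(e) "incompatible key types"
)

# After converting the key to character, the join succeeds
airports_chr <- airports_toy %>%
  mutate(airport_ID = as.character(airport_ID))
joined_toy <- left_join(routes_toy, airports_chr, by = "airport_ID")
```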

(2) Load and clean up the airline dataset.

url.airline <- "https://raw.githubusercontent.com/jpatokal/openflights/master/data/routes.dat"
airline <- read.table(url.airline, sep = ",", header = F) %>% as_tibble()

# Update column names based on website description
colnames(airline) <- c(
  "airline", "airline_ID", "source_airport", "source_airport_ID",
  "destination_airport", "destination_airport_ID",
  "Codeshare", "Stops", "Equipment")

# select useful columns
airline <- airline %>% select(source_airport_ID, destination_airport_ID)

head(airline, 3)

Output:

# A tibble: 3 × 2
  source_airport_ID destination_airport_ID
  <chr>             <chr>
1 2965              2990
2 2966              2990
3 2966              2962

For this airline dataset, we need to do two more critical steps:

  • Assign a unique ID to each flight. Each row of the dataset represents one flight line, so the row number can serve as the ID. This flight ID will be mapped to the group aesthetic in geom_line(), and specifies which two data points should be connected to make one flight line.

n <- nrow(airline)
airline <- airline %>% mutate(flight_ID = 1:n)

  • Pivot the dataset into a tidy structure, such that each flight occupies two rows: the source airport in one row, and the destination airport in the other. The airport_ID variable connects the airline and airport datasets, and serves as the key (shared) column for the merge in the next step.

airline.tidy <- airline %>%
  pivot_longer(-flight_ID,
               names_to = "direction",
               values_to = "airport_ID") %>%
  mutate(direction = str_remove(direction, "_airport_ID"))

head(airline.tidy, n = 4)

Output:

# A tibble: 4 × 3
  flight_ID direction   airport_ID
      <int> <chr>       <chr>
1         1 source      2965
2         1 destination 2990
3         2 source      2966
4         2 destination 2990
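To see the reshaping in isolation, here is a small sketch that applies the same pivot to a hypothetical two-flight table (routes_toy and its values are made up for illustration). After the pivot, each flight_ID should appear exactly twice, once per endpoint.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy routes table (hypothetical IDs): one row per flight
routes_toy <- tibble(
  source_airport_ID      = c("10", "20"),
  destination_airport_ID = c("30", "30"),
  flight_ID              = 1:2
)

# Same pivot as in the article: two rows per flight, one per endpoint
routes_long <- routes_toy %>%
  pivot_longer(-flight_ID,
               names_to = "direction",
               values_to = "airport_ID") %>%
  mutate(direction = str_remove(direction, "_airport_ID"))

# Sanity check: count the rows per flight
rows_per_flight <- count(routes_long, flight_ID)
rows_per_flight
```

If any flight had a count other than 2, geom_line() would connect more or fewer than two points for that group, so this count is a cheap check before plotting.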

(3) Merge the airport and airline datasets using the shared column airport_ID. The merged dataset describes each flight: its flight ID, and its source and destination airports, each identified by airport ID, latitude, and longitude. It’s now ready for visualization!

# flights
air.all <- airline.tidy %>%
  left_join(airport.selected, by = "airport_ID")

head(air.all, n = 4) # now ready for visualization!

Output:

# A tibble: 4 × 7
  flight_ID direction   airport_ID name                        lat  long country
      <int> <chr>       <chr>      <chr>                     <dbl> <dbl> <chr>
1         1 source      2965       Sochi International Airp…  43.4  40.0 Russia
2         1 destination 2990       Kazan International Airp…  55.6  49.3 Russia
3         2 source      2966       Astrakhan Airport          46.3  48.0 Russia
4         2 destination 2990       Kazan International Airp…  55.6  49.3 Russia
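One thing that may be worth checking after the merge: left_join() keeps every row of the left table, so an airport_ID with no match in the airport table ends up with NA in lat and long, and a flight with a missing endpoint cannot be drawn as a complete line. Whether this occurs in the OpenFlights data is not shown here; the sketch below uses hypothetical toy data (legs_toy, airports_toy, and all values are assumptions) to illustrate one way of keeping only flights with two known endpoints.

```r
library(dplyr)

# Toy flight legs (hypothetical): airport "99" has no match below
legs_toy <- tibble(
  flight_ID  = c(1L, 1L, 2L, 2L),
  airport_ID = c("10", "30", "20", "99")
)
airports_toy <- tibble(
  airport_ID = c("10", "20", "30"),
  lat        = c(0, 10, 20),
  long       = c(0, 5, 15)
)

# Unmatched keys get NA coordinates after a left join
joined_toy <- left_join(legs_toy, airports_toy, by = "airport_ID")

# Keep only flights whose two endpoints both have coordinates
complete_toy <- joined_toy %>%
  group_by(flight_ID) %>%
  filter(!any(is.na(lat) | is.na(long))) %>%
  ungroup()

complete_toy
```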

Visualization

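We start by creating a world map with map_data(), a ggplot2 function that returns map outlines as a data frame (it relies on the maps package being installed). Both my.world and air.all have the long (longitude) and lat (latitude) variables, so the layers added later can share the same x and y mappings.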

my.world <- map_data("world")
p1 <- my.world %>%
  ggplot(aes(x = long, y = lat)) +
  # create a world map background
  geom_polygon(aes(group = group),
               color = "tomato", linewidth = .2,
               show.legend = F, fill = "black") +
  coord_fixed(ratio = 1.2,         # adjust aspect ratio
              ylim = c(-60, 90)) + # remove Antarctica
  theme_void() +
  theme(plot.background = element_rect(fill = "black"))
p1

Visualize the flights using the air.all dataset. Each flight line is drawn by connecting its source and destination airports, whose positions are given by the long and lat variables. (Because the dataset is large, the plot may take several seconds to render.)

p2 <- p1 +
  geom_line(data = air.all,
            aes(group = flight_ID),
            color = "turquoise", linewidth = .1, alpha = .1)
p2
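The role of the group aesthetic is easier to see on a tiny example. The sketch below uses a hypothetical four-row data frame (legs_toy, with made-up coordinates) holding two flights. With group = flight_ID, geom_line() draws two separate segments; without it, all four points would be joined into a single line ordered by x.

```r
library(ggplot2)

# Toy data (hypothetical coordinates): two flights, two endpoints each
legs_toy <- data.frame(
  flight_ID = c(1, 1, 2, 2),
  long      = c(-100, 0, 20, 120),
  lat       = c(40, 50, -10, 30)
)

# One line per flight_ID, i.e. two separate segments
p_toy <- ggplot(legs_toy, aes(x = long, y = lat)) +
  geom_line(aes(group = flight_ID))

p_toy
```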

Visualize the airports using the airport.selected dataset.

p3 <- p2 +
  geom_point(data = airport.selected,
             color = "yellow", shape = ".",
             alpha = .6)
p3

Add a plot title and acknowledge the data source.

p4 <- p3 +
  # add title at bottom left corner
  annotate(
    geom = "text", x = -180, y = -60, hjust = 0,
    label = "Airports and Flights",
    color = "snow3", size = 4.5, fontface = "bold") +
  # add caption at bottom left corner
  labs(caption = "data source: OpenFlights https://openflights.org/data ") +
  theme(plot.caption = element_text(
    color = "snow3", hjust = .05, margin = margin(b = 5)))
p4

Save the plot as a PNG file. By default, ggsave() saves the most recently displayed plot (here, p4). We save the graphic to a folder named “graphics”, located in the same folder as the source code.

ggsave(filename = "worldmap airports and airlines.png",
       path = "graphics",
       width = 4.5, height = 3, dpi = 400)
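Saving to a subfolder generally assumes that the folder already exists. As a hedged sketch, the example below creates the output folder first and passes the plot explicitly through the plot argument instead of relying on the last displayed plot. The temporary directory, the toy plot p_toy, and the file name are assumptions made only so the example is self-contained.

```r
library(ggplot2)

# Hypothetical output folder inside a temporary directory
out_dir <- file.path(tempdir(), "graphics")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# A stand-in plot; in the article this would be p4
p_toy <- ggplot(data.frame(x = 1:3, y = 1:3), aes(x, y)) +
  geom_point()

# Name the plot explicitly so the result does not depend on display order
ggsave(filename = "toy plot.png", plot = p_toy, path = out_dir,
       width = 4.5, height = 3, dpi = 100)

saved <- file.exists(file.path(out_dir, "toy plot.png"))
saved
```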

The complete script is collected below for reference.

library(ggplot2)
library(dplyr)
library(tidyr)
library(stringr)

# import the airport dataset
url.airport <- "https://raw.githubusercontent.com/jpatokal/openflights/master/data/airports.dat"
airport <- read.table(url.airport, sep = ",", header = FALSE) %>% as_tibble()

# Update column names based on website description
colnames(airport) <- c(
  "airport_ID", "name", "city", "country", "IATA", "ICAO", "lat", "long",
  "altitude", "timezone", "DST", "Tz_database", "time_zone", "type")

# select columns useful in this visualization
airport.selected <- airport %>%
  select(name, airport_ID, lat, long, country) %>%
  mutate(airport_ID = as.character(airport_ID))

head(airport.selected, n = 3)

# Load and clean up the airline dataset.
url.airline <- "https://raw.githubusercontent.com/jpatokal/openflights/master/data/routes.dat"
airline <- read.table(url.airline, sep = ",", header = F) %>% as_tibble()

# Update column names based on website description
colnames(airline) <- c(
  "airline", "airline_ID", "source_airport", "source_airport_ID",
  "destination_airport", "destination_airport_ID",
  "Codeshare", "Stops", "Equipment")

# select useful columns
airline <- airline %>% select(source_airport_ID, destination_airport_ID)

# For the airline dataset, we need to do two more critical steps:
## i) Assign a unique ID to each flight to be mapped to 'group' aesthetic in 'geom_line'.
n <- nrow(airline)
airline <- airline %>% mutate(flight_ID = 1:n)

## ii) Pivot the dataset into a tidy structure.
airline.tidy <- airline %>%
  pivot_longer(-flight_ID,
               names_to = "direction",
               values_to = "airport_ID") %>%
  mutate(direction = str_remove(direction, "_airport_ID"))

head(airline.tidy, n = 4)

# Merge the airport and airline datasets.
air.all <- airline.tidy %>%
  left_join(airport.selected, by = "airport_ID")

head(air.all, n = 4) # now ready for visualization!

# Create a world map.
my.world <- map_data("world")
p1 <- my.world %>%
  ggplot(aes(x = long, y = lat)) +
  # create a world map background
  geom_polygon(aes(group = group),
               color = "tomato", linewidth = .2,
               show.legend = F, fill = "black") +
  coord_fixed(ratio = 1.2,         # adjust aspect ratio
              ylim = c(-60, 90)) + # remove Antarctica
  theme_void() +
  theme(plot.background = element_rect(fill = "black"))
p1

# Visualize the airlines.
p2 <- p1 +
  geom_line(data = air.all,
            aes(group = flight_ID),
            color = "turquoise", linewidth = .1, alpha = .1)
p2

# Visualize the airports.
p3 <- p2 +
  geom_point(data = airport.selected,
             color = "yellow", shape = ".",
             alpha = .6)
p3

# Add plot title and the data source.
p4 <- p3 +
  # add title at bottom left corner
  annotate(
    geom = "text", x = -180, y = -60, hjust = 0,
    label = "Airports and Flights",
    color = "snow3", size = 4.5, fontface = "bold") +
  # add caption at bottom left corner
  labs(caption = "data source: OpenFlights https://openflights.org/data ") +
  theme(plot.caption = element_text(
    color = "snow3", hjust = .05, margin = margin(b = 5)))
p4

# save the plot
ggsave(filename = "worldmap airports and airlines.png",
       path = "graphics",
       width = 6, height = 4, dpi = 400)



