Geospatial Data
An easy-to-use library for plotting maps is ggmap. This chapter introduces how to create map visualizations with this library.
To use ggmap, we must first install it in our project:
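A minimal sketch of the installation, assuming the package is installed from CRAN and then attached for the current session:

```r
# Install ggmap from CRAN if it is not available yet (only needed once).
if (!requireNamespace("ggmap", quietly = TRUE)) {
  install.packages("ggmap")
}

# Attach the package so that get_map() and ggmap() are available.
library(ggmap)
```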
Maps created with ggmap show a background image of a map and assign the longitude and latitude of the map section to the x- and y-axis. To obtain map images, we can choose between the two supported providers, Google Maps or Stamen Maps. Stamen Maps do not require authentication with an API token (as Google Maps does), so in this introduction, we'll go with Stamen Maps.
To get a background image for our map, we need to specify the bounding box in terms of longitude and latitude coordinates. To obtain these coordinates with minimal overhead and directly from our R program, we install the osmdata package. This package accesses the OpenStreetMap Nominatim service to get the bounding boxes:
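As with ggmap, the installation could look like this (a sketch, assuming installation from CRAN):

```r
# Install osmdata from CRAN if it is not available yet (only needed once).
if (!requireNamespace("osmdata", quietly = TRUE)) {
  install.packages("osmdata")
}

# Attach the package so that getbb() is available.
library(osmdata)
```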
To create a map, we go through the following steps:
1. Get the bounding box for our desired background map image using osmdata::getbb. We can simply pass a query such as "Osnabrück" to the function. Any query that works in the OpenStreetMap Nominatim search should also work with the function getbb.
2. Get the background image for our map with the ggmap::get_map function, passing the bounding box from step 1.
3. Display the map with ggmap.
4. Add data layers, such as points or polygons, on top of the map.
Let's assume we want to visualize our Campusbier orders on a map of Osnabrück. The data set contains the longitude and latitude information about the customer's billing address. Following the steps above, we first get the bounding box (longitude and latitude coordinates for the map's boundaries) for Osnabrück:
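A sketch of this first step. The variable name `bb` is our own choice, and the call needs an internet connection:

```r
library(osmdata)

# Look up the bounding box for the query "Osnabrück".
bb <- getbb("Osnabrück")

# By default, the result is a 2 x 2 matrix: rows x (longitude) and
# y (latitude), columns min and max.
bb
```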
We can now get the image by passing the bounding box to the get_map function:
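A sketch of this call, assuming `bb` holds the bounding box from getbb and that the installed ggmap release still offers source = "stamen". Newer releases may serve these tiles through a different provider and require a key:

```r
library(ggmap)

# `bb` is assumed to be the bounding box matrix returned by
# osmdata::getbb("Osnabrück").
# Download the background tiles for that area from Stamen Maps.
map <- get_map(bb, source = "stamen", maptype = "terrain")
```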
Note that the get_map function has several arguments that we can use to specify the type of map we want. Stamen Maps support the map types "terrain", "toner", and "watercolor", which all produce a different look. Play around with them to see what works best for you. With color = "bw", we can produce a grayscale version of the map.
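For instance, a grayscale "toner" variant could be requested as follows. This is a sketch under the same assumptions as before (`bb` from getbb, a ggmap release with Stamen support), and `map_bw` is a name chosen here for illustration:

```r
library(ggmap)

# Same bounding box, different look: "toner" map type in grayscale.
map_bw <- get_map(bb, source = "stamen", maptype = "toner", color = "bw")
```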
We can display the map now using the ggmap function:
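Assuming `map` is the object returned by get_map, displaying it is a single call:

```r
library(ggmap)

# Draw the background map; the axes show longitude (x) and latitude (y).
ggmap(map)
```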
The final step is adding the data layer on top of the map:
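A sketch of such a layer. It assumes `map` comes from get_map and that the orders are in a data frame `orders` with the columns `lon`, `lat` and `total_price`. The names `orders`, `lon` and `lat` are assumptions, as are the color and transparency settings:

```r
library(ggmap)  # also attaches ggplot2, which provides geom_point()

ggmap(map) +
  geom_point(
    data = orders,                               # assumed name of the order data
    aes(x = lon, y = lat, size = total_price),   # one point per order
    color = "red",                               # styling chosen for illustration
    alpha = 0.5
  )
```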
In this case, we draw a point on the map for each order. The size of a point corresponds to the turnover (total_price) of the order. Orders with more turnover appear larger.
In the example above, we used points to map coordinates in our data to the map. Additionally, we display some relevant information, in this case the turnover, using a visual property like the size of each point. This type of map visualization is very common and useful when we want to show exact coordinates.
Occasionally, we wish to show more aggregated data on the level of areas, such as countries, states or zip code areas. We can draw these areas as shapes onto the map, and fill the area with a color to visualize certain properties (such as population density). This is called a choropleth map.
As an alternative to location-based maps, in which we draw points in certain positions (coordinates) of the map, we can draw shapes that correspond to an area of the map. Zip codes are an example, as every zip code area can be described by a polygon, which is a set of connected coordinates. To work with an example, we can download the zip code polygons for Germany along with other information about the zip code areas from the following link:
We can choose the format to download. A common format for GIS data (GIS = Geographic Information System) is the Shapefile format. This is a standardized format that most GIS tools support. There is also a package in R that supports this format, called sf.
sf-package
There are numerous resources to get started with the sf-package. The package offers features for reading spatial data from data sources, such as files or databases. But it can also create whole new spatial structures or manipulate existing ones. Have a look at the following to get an overview and to learn more:
st_read function
The st_read function can be configured to return the data as a tibble, which is what we want since we are working with the Tidyverse. When we look at the type of the zip object, we can see that it has the classes data.frame, tbl and tbl_df, as we would expect for a tibble. In addition, it also has the sf class:
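The classes can be inspected with base R, assuming `zip` was loaded with st_read and its tibble option:

```r
# `zip` is assumed to be the object returned by
# sf::st_read(..., as_tibble = TRUE).
class(zip)
```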
Since the zip object is a tibble, we can use the well-known functions to explore the data a bit. For example, we can get the list of column names:
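A sketch, again assuming `zip` is the object loaded with st_read:

```r
# List the column names of the zip code data,
# including the geometry column.
colnames(zip)
```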
Or filter the data based on one of the columns:
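A sketch of such a filter. The column name `plz` and the value "49074" are assumptions for illustration; the actual column names depend on the downloaded file:

```r
library(dplyr)
library(sf)  # provides the dplyr methods for sf objects

# Keep only the rows whose zip code column (assumed to be `plz`)
# matches a single example value.
zip %>%
  filter(plz == "49074")
```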
Because sf brings its own version of the R plot function, we can directly pass the zip object to it to see the geometries. To plot only the geometry information, we apply the st_geometry function:
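A sketch, assuming `zip` holds the zip code data loaded with st_read:

```r
library(sf)

# Extract only the geometry column and draw the polygon outlines.
plot(st_geometry(zip))
```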
For our example of the Campusbier orders, we only want to plot the zip codes of Osnabrück. Tibble-style, we filter the data on the zip codes that start with "490":
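A sketch of this filter, assuming the zip code is stored in a column named `plz` (an assumption; check the column names of your file):

```r
library(dplyr)
library(sf)  # provides the dplyr methods for sf objects

# Keep only the zip code areas whose code starts with "490".
zip_os <- zip %>%
  filter(startsWith(as.character(plz), "490"))
```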
As a result, we get 9 entries, which correspond to the zip codes in and around the city of Osnabrück.
Note that we pass the filtered data in zip_os to the geom_sf function and tell it not to use the aes-mapping from ggmap. This is important because in our zip_os object, there are no lon and lat fields. Instead, the geom_sf function creates its own aes-mapping to draw the polygons.
Setting alpha = 0.8 makes sure we can still see the streets under the zip code shapes.
We can now use the filling of the polygons to convey information in the visualization. We could choose a different color for each zip code:
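A sketch, assuming `map` and `zip_os` exist as described and the zip code column is named `plz` (an assumption):

```r
library(ggmap)
library(sf)

ggmap(map) +
  geom_sf(
    data = zip_os,
    aes(fill = plz),      # one fill color per zip code
    inherit.aes = FALSE,  # do not reuse the lon/lat mapping from ggmap
    alpha = 0.8
  )
```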
This does not convey any information about our customers and orders. Instead, we can leverage the data we have and fill the area for each zip code according to the turnover we made there. How can we do that? We need to join the geometry data with the sales data.
In the first step, we need a summary of our orders that contains the turnover per zip code:
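The following sketch shows the shape of such a summary on a small made-up table. `orders_demo`, its values, and the column name `zip` are hypothetical; the same pipeline would be applied to the real order data:

```r
library(dplyr)

# Made-up stand-in for the order data (hypothetical values).
orders_demo <- tibble(
  zip         = c("49074", "49074", "49076"),
  total_price = c(20, 15.5, 30)
)

# One row per zip code with the summed turnover.
turnover_demo <- orders_demo %>%
  group_by(zip) %>%
  summarise(turnover = sum(total_price, na.rm = TRUE))

turnover_demo
```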
To join the geometry data with the turnover summary from above, we make sure the column with the zip code is named zip in both data frames. We can then use the left_join function with the by argument to merge the two data frames:
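A sketch of the join. It assumes that the zip code column in `zip_os` is originally named `plz` and that the turnover summary is stored in `turnover_per_zip` with the columns `zip` and `turnover`. Both names are assumptions for illustration:

```r
library(dplyr)
library(sf)  # provides the dplyr methods for sf objects

zip_os <- zip_os %>%
  rename(zip = plz) %>%                     # align the key column name
  left_join(turnover_per_zip, by = "zip")   # keep all zip code areas
```

A left join keeps every zip code area, so areas without any orders end up with a missing turnover value.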
We replace the object zip_os with the new version that includes the turnover value and use the turnover column for the fill aesthetic:
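A sketch, assuming `map` and the joined `zip_os` from the previous steps. The palette name "Blues" is an assumption based on the described color range:

```r
library(ggmap)
library(sf)

ggmap(map) +
  geom_sf(
    data = zip_os,
    aes(fill = turnover),   # fill each area by its turnover
    inherit.aes = FALSE,
    alpha = 0.8
  ) +
  # direction = 1 keeps the palette order: light for low, dark blue for high.
  scale_fill_distiller(palette = "Blues", direction = 1)
```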
Note that we also changed the color scale with scale_fill_distiller to a sequential palette from blue to white, where blue means more turnover and white means less.
The abbreviation "sf" is short for "Simple Features", which is a "set of standards that specify a common storage and access model of geographic feature made of mostly two-dimensional geometries". The sf-library makes working with these standards simple in R.
In this example, I have downloaded and unpacked the Shapefile with the German zip code data in my project folder under data/plz_germany. Then, I can load the Shapefile as follows, using the sf package's function st_read:
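A sketch of the call, assuming the folder contains a single Shapefile layer:

```r
library(sf)

# Read the Shapefile from the project folder and return it as a tibble.
zip <- st_read("data/plz_germany", as_tibble = TRUE)
```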
We can now re-use the map of Osnabrück we created and add the layer with the zip code polygons. ggplot2 supports sf objects through its own geometry, geom_sf:
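A sketch, assuming `map` is the background map from get_map and `zip_os` holds the filtered zip code areas:

```r
library(ggmap)
library(sf)

ggmap(map) +
  geom_sf(
    data = zip_os,
    inherit.aes = FALSE,  # ignore the lon/lat mapping set up by ggmap
    alpha = 0.8           # keep the streets visible under the polygons
  )
```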