Step 1: Extract Road Network from OpenStreetMap (OSM)#

This notebook will show you how to go from an .osm.pbf file to a GOSTnets-compatible network format. In addition, this notebook will show you how to:

  • Clip the roads to an input polygon (here we demonstrate with an ESRI Shapefile, rek2.shp)

  • Filter the roads DataFrame by a list of accepted road types, as tagged in OSM (see the list accepted_road_types that we define in the code)

Import Libraries#

First, we import the necessary libraries. In this notebook, we use the following libraries:

  • geopandas to manipulate geospatial data (e.g. clipping the roads to a polygon)

  • os to define the path to the .osm.pbf file

  • GOSTnets to import the OSM data and to save the network in a GOSTnets-compatible format (.gpickle in this case)

import geopandas as gpd
import os
import GOSTnets as gn
GDAL is not installed - OGR functionality not available

The load_osm module is an optional GOSTNets submodule that is used to import OSM data. Make sure gdal, geopy, and boltons are installed first as it has additional dependencies (look in GOSTnets repo main README file if you need additional help with this). Then import the submodule.

from GOSTnets.load_osm import OSM_to_network
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[2], line 1
----> 1 from GOSTnets.load_osm import OSM_to_network

File /opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/GOSTnets/load_osm.py:17
     14 import networkx as nx
     15 # import matplotlib.pyplot as plt
---> 17 from osgeo import ogr
     18 from rtree import index
     19 from shapely.geometry import LineString, MultiPoint

ModuleNotFoundError: No module named 'osgeo'

Download data and set filepaths#

We will use the osm.pbf file for the entire country of Iceland. You can download the file from Geofabrik, if you didn’t download it from the GOSTnets repository “tutorial_data” folder. Once you have downloaded the file, set the filepath to the .osm.pbf file, this can be a relative or absolute path. We demonstrate construction of a relative file path using the os.path module to define a relative path to the “tutorial_data” folder.

pth = "./"  # change this path to your working folder
fil = r"iceland-latest.osm.pbf"  # download this file from geofabrik: http://download.geofabrik.de/europe/iceland.html.

# be sure to place the .osm.pbf file in the 'tutorial data' folder.

f = os.path.join(pth, "tutorial_data", fil)
# check if the file exists, raise an exception if it does not
if os.path.isfile(f):
    print("File found.")
else:
    raise Exception("File not found.")
# print the file path
print(f)
File found.
./tutorial_data/iceland-latest.osm.pbf

Load the OSM data as a GOSTnets object#

GOSTNets can load OSM data as a special ‘OSM_to_network’ object. This object gets initialized with both a copy of the OSM file itself and the roads extracted from the OSM file in a GeoPandas DataFrame. This DataFrame is a property of the object called ‘roads_raw’ and is the starting point for our network.

# convert the .osm.pbf file to a GOSTnets object
iceland = OSM_to_network(f)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[5], line 2
      1 # convert the .osm.pbf file to a GOSTnets object
----> 2 iceland = OSM_to_network(f)

NameError: name 'OSM_to_network' is not defined
# show the doc-string for the OSM_to_network object
?iceland
Object `iceland` not found.
# display the first 5 rows of the extracted dataframe of the roads network
iceland.roads_raw.head()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[7], line 2
      1 # display the first 5 rows of the extracted dataframe of the roads network
----> 2 iceland.roads_raw.head()

NameError: name 'iceland' is not defined
# show the different road types and counts
iceland.roads_raw.infra_type.value_counts()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[8], line 2
      1 # show the different road types and counts
----> 2 iceland.roads_raw.infra_type.value_counts()

NameError: name 'iceland' is not defined

Define the accepted road types list#

We define a list of the types of roads from the above that we consider acceptable for our road network. This list is used to filter the roads DataFrame to only include the types specified.

accepted_road_types = [
    "residential",
    "unclassified",
    "track",
    "service",
    "tertiary",
    "road",
    "secondary",
    "primary",
    "trunk",
    "primary_link",
    "trunk_link",
    "tertiary_link",
    "secondary_link",
]

Filter the roads DataFrame#

We call the filterRoads method and pass it a list of acceptable road types. Then we can look at the infrastructure types in the filtered roads DataFrame (similar to what was done above), except now we have filtered the roads to only include the types we want.

iceland.filterRoads(acceptedRoads=accepted_road_types)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[10], line 1
----> 1 iceland.filterRoads(acceptedRoads=accepted_road_types)

NameError: name 'iceland' is not defined
iceland.roads_raw.infra_type.value_counts()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[11], line 1
----> 1 iceland.roads_raw.infra_type.value_counts()

NameError: name 'iceland' is not defined

Clip the roads to a polygon#

Here, we import a shapefile for our area of interest. We want to make a map of Reykjavik, Iceland. We clip the extent of the country file to just the area around the capital (rek2.shp) below using standard GeoPandas functions:

# read the shapefile for the clip area
clip_shp = gpd.read_file(os.path.join(pth, "tutorial_data", "rek2.shp"))
clip_shp = clip_shp.to_crs("epsg:4326")
clip_shp_obj = clip_shp.geometry.iloc[0]
# visualize the clip area
clip_shp.plot()
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
File /opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/geopandas/plotting.py:673, in plot_dataframe(df, column, cmap, color, ax, cax, categorical, legend, scheme, k, vmin, vmax, markersize, figsize, legend_kwds, categories, classification_kwds, missing_kwds, aspect, **style_kwds)
    672 try:
--> 673     import matplotlib.pyplot as plt
    674 except ImportError:

ModuleNotFoundError: No module named 'matplotlib'

During handling of the above exception, another exception occurred:

ImportError                               Traceback (most recent call last)
Cell In[13], line 2
      1 # visualize the clip area
----> 2 clip_shp.plot()

File /opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/geopandas/plotting.py:979, in GeoplotAccessor.__call__(self, *args, **kwargs)
    977 kind = kwargs.pop("kind", "geo")
    978 if kind == "geo":
--> 979     return plot_dataframe(data, *args, **kwargs)
    980 if kind in self._pandas_kinds:
    981     # Access pandas plots
    982     return PlotAccessor(data)(kind=kind, **kwargs)

File /opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/geopandas/plotting.py:675, in plot_dataframe(df, column, cmap, color, ax, cax, categorical, legend, scheme, k, vmin, vmax, markersize, figsize, legend_kwds, categories, classification_kwds, missing_kwds, aspect, **style_kwds)
    673     import matplotlib.pyplot as plt
    674 except ImportError:
--> 675     raise ImportError(
    676         "The matplotlib package is required for plotting in geopandas. "
    677         "You can install it using 'conda install -c conda-forge matplotlib' or "
    678         "'pip install matplotlib'."
    679     )
    681 if ax is None:
    682     if cax is not None:

ImportError: The matplotlib package is required for plotting in geopandas. You can install it using 'conda install -c conda-forge matplotlib' or 'pip install matplotlib'.
# this is the geometry object (shapely) from clip_shp
clip_shp_obj
../_images/c215179840c767af32035240966a9b139cda2d9e3ec210575031b7034bdbb921.svg

Check to ensure some roads intersect the AOI#

We check to see everything lines up by running intersect and counting the True / False returns. The count of the True values are the number of roads that intersect the AOI

# intersects is a Shapely function that returns True if the boundary or interior of the object intersect in any way with those of the other
iceland.roads_raw.geometry.intersects(clip_shp_obj).value_counts()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[15], line 2
      1 # intersects is a Shapely function that returns True if the boundary or interior of the object intersect in any way with those of the other
----> 2 iceland.roads_raw.geometry.intersects(clip_shp_obj).value_counts()

NameError: name 'iceland' is not defined

Next we remove any roads that don’t intersect the capital area, by modifying the roads_raw DataFrame contained within the OSM_to_network object

iceland.roads_raw = iceland.roads_raw.loc[
    iceland.roads_raw.geometry.intersects(clip_shp_obj) == True  # noqa: E712
]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[16], line 1
----> 1 iceland.roads_raw = iceland.roads_raw.loc[
      2     iceland.roads_raw.geometry.intersects(clip_shp_obj) == True  # noqa: E712
      3 ]

NameError: name 'iceland' is not defined
# reprint the intersects value counts (should only be True now)
iceland.roads_raw.geometry.intersects(clip_shp_obj).value_counts()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[17], line 2
      1 # reprint the intersects value counts (should only be True now)
----> 2 iceland.roads_raw.geometry.intersects(clip_shp_obj).value_counts()

NameError: name 'iceland' is not defined
# view the first 5 rows of the newly created roads GeoDataFrame
iceland.roadsGPD.head()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[18], line 2
      1 # view the first 5 rows of the newly created roads GeoDataFrame
----> 2 iceland.roadsGPD.head()

NameError: name 'iceland' is not defined

Generate the RoadsGPD object#

Having narrowed the DataFrame to the roads we are interested in, we generate the RoadsGPD object, which is stored as a property of the OSM_to_network object. The RoadsGPD object is a GeoDataFrame that further processes the roads. This includes splitting the edges where intersections occur, adding unique edge IDs, and adding to/from columns to the GeoDataFrame.

# generate the roads GeoDataFrame, may take a few minutes
iceland.generateRoadsGDF(verbose=False)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[19], line 2
      1 # generate the roads GeoDataFrame, may take a few minutes
----> 2 iceland.generateRoadsGDF(verbose=False)

NameError: name 'iceland' is not defined
# view the first 5 rows of the newly created roads GeoDataFrame
iceland.roadsGPD.head()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[20], line 2
      1 # view the first 5 rows of the newly created roads GeoDataFrame
----> 2 iceland.roadsGPD.head()

NameError: name 'iceland' is not defined

Generate the GOSTnets graph object#

With the initialReadIn() method, we transform the roads GeoDataFrame into a graph object

iceland.initialReadIn()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[21], line 1
----> 1 iceland.initialReadIn()

NameError: name 'iceland' is not defined

Save the graph object to file#

We save this graph object down to file using gn.save(). The save function produces three outputs: a node GeoDataFrame as a CSV, an edge GeoDataFrame as a CSV, and a graph object saved as a pickle.

gn.save?
gn.save(iceland.network, "iceland_unclean", os.path.join(pth, "tutorial_outputs"))
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[23], line 1
----> 1 gn.save(iceland.network, "iceland_unclean", os.path.join(pth, "tutorial_outputs"))

NameError: name 'iceland' is not defined

Move on to Step 2 to see how we clean up this rough network.