Step 2: Cleaning an OSM Road Network

This notebook covers further post-processing with the clean_network function from the network_clean GOSTnets submodule. The function simplifies the network by removing excess nodes and ensures all edges are bi-directional (except in the case of one-way roads).

WARNING: The clean_network function is computationally expensive, so it may take a while to run on large networks. It outputs a pickled graph object, a dataframe of the edges, and a dataframe of the nodes. The expectation is that this function will only be run once.

Set Up the Notebook

First we need to import the necessary libraries and set the file paths.

Import Libraries

We will use the following libraries:

  • os and sys to set the file paths and the import path

  • time to time the function

  • networkx to work with the graph object

  • pickle to load the graph object

  • GOSTnets to use the clean_network function

import os, sys
import time
import networkx as nx
import pickle

# import the GOSTnets library
# (change this path to your local GOSTnets source folder; skip it if GOSTnets is installed)
sys.path.insert(0, r"C:\WBG\Work\Code\GOSTnets\src")
import GOSTnets as gn

Set File Paths

We will set the file paths to read the output from the “Step 1” tutorial.

pth = "./"  # change this path to your working folder
data_pth = os.path.join(pth, "tutorial_outputs")

# read back the graph from Step 1 from your saved pickle
with open(os.path.join(data_pth, "iceland_overture_sample.pickle"), "rb") as f:
    G = pickle.load(f)

Inspect and Clean the Network

We will inspect the network and then clean it using the network_clean function.

# inspect the graph
nodes = list(G.nodes(data=True))
edges = list(G.edges(data=True))
print(len(nodes))
print(nodes[0])
print(len(edges))
print(edges[0])
1410
(0, {'x': -21.840717, 'y': 64.0975477})
1818
(0, 487, {'infra_type': 'living_street', 'one_way': None, 'osm_id': 'a503f6f4-f86d-4027-9377-8e56572438cc', 'key': 'edge_581', 'length': 0.03979436803394712, 'Wkt': <LINESTRING (-21.841 64.098, -21.84 64.097)>})
# you can also print general graph information with networkx
print(G)
MultiDiGraph with 1410 nodes and 1818 edges
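
Beyond printing the first node and edge, it can help to summarize edge attributes across the whole graph. The sketch below uses a small toy graph (G_toy is a hypothetical stand-in, not the tutorial data) whose attribute names mirror the edge printed above; the same two expressions should work on G if its edges carry 'infra_type' and 'length'.

```python
from collections import Counter

import networkx as nx

# Toy stand-in for the tutorial graph (hypothetical data);
# attribute names mirror the edge printed above.
G_toy = nx.MultiDiGraph()
G_toy.add_edge(0, 1, infra_type="living_street", length=0.04)
G_toy.add_edge(1, 2, infra_type="residential", length=0.10)
G_toy.add_edge(2, 0, infra_type="residential", length=0.25)

# Count edges per road class
type_counts = Counter(data["infra_type"] for _, _, data in G_toy.edges(data=True))

# Sum the 'length' attribute over all edges (units are whatever Step 1 stored)
total_length = sum(data["length"] for _, _, data in G_toy.edges(data=True))

print(type_counts)
print(round(total_length, 2))
```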
# To become familiar with the function read the doc string
gn.clean_network?
Signature:
gn.clean_network(
    G,
    wpath='',
    output_file_name='',
    UTM='epsg:3857',
    WGS='epsg:4326',
    junctdist=50,
    verbose=False,
)
Docstring:
Topologically simplifies an input graph object by collapsing junctions and removing interstital nodes

Parameters
----------
G : networkx.graph object
    a graph object containing nodes and edges. Edges should have a property called 'Wkt' containing geometry objects describing the roads.
wpath : str
    the write path - a drive directory for inputs and output
output_file_name : str
    This will be the output file name with '_processed' appended
UTM : dict
    The epsg code of the projection, in metres, to apply the junctdist
WGS : dict
    the current crs of the graph's geometry properties. By default, assumes WGS 84 (epsg 4326)
junctdist : int, float
    distance within which to collapse neighboring nodes. simplifies junctions. Set to 0.1 if not simplification desired. 50m good for national (primary / secondary) networks
verbose : boolean
    if True, saves down intermediate stages for dissection

Returns
-------
nx.MultiDiGraph
    A simplified graph object
File:      c:\wbg\work\code\gostnets\src\gostnets\network_clean.py
Type:      function

Now we set up some parameters for the clean_network function.

Iceland_UTMZ = "epsg:32627"

WGS = "epsg:4326"  # Input CRS; both OSM and Overture use WGS84
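
The UTM code above can be cross-checked from a node's coordinates. This is a minimal sketch using the standard 6-degree UTM zone formula; utm_epsg_from_lonlat is a hypothetical helper (not part of GOSTnets), and it ignores the special zones around Norway and Svalbard.

```python
import math


def utm_epsg_from_lonlat(lon: float, lat: float) -> str:
    """Return the WGS 84 / UTM EPSG code for a lon/lat point (standard zones only)."""
    # Zones are 6 degrees wide and numbered eastward from 180 degrees west
    zone = int(math.floor((lon + 180) / 6)) + 1
    # EPSG 326xx covers the northern hemisphere, 327xx the southern
    base = 32600 if lat >= 0 else 32700
    return f"epsg:{base + zone}"


# Coordinates of the first node printed earlier in this notebook
print(utm_epsg_from_lonlat(-21.840717, 64.0975477))  # epsg:32627
```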

Run the clean_network Function

Setting the keyword verbose to True writes the intermediate outputs to the specified wpath.

print("start: %s\n" % time.ctime())
G_clean = gn.clean_network(
    G, UTM=Iceland_UTMZ, WGS=WGS, junctdist=10, verbose=False
)

# using verbose=True:
# G_clean = gn.clean_network(
#     G, wpath=data_pth, output_file_name="iceland_network",
#     UTM=Iceland_UTMZ, WGS=WGS, junctdist=10, verbose=True
# )
print("\nend: %s" % time.ctime())
print("\n--- processing complete")
start: Wed Jan  7 09:43:54 2026

C:\WBG\Work\Code\GOSTnets\src\GOSTnets\core.py:2147: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.

  juncs_gdf_unproj["centroid"] = juncs_gdf_unproj.centroid
1818
completed processing 3636 edges
1616
completed processing 3232 edges
Edge reduction: 1818 to 3232 (-77 percent)

end: Wed Jan  7 09:43:56 2026

--- processing complete
# let's print info on our clean version
print(G_clean)
MultiDiGraph with 1184 nodes and 3232 edges
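
The "Edge reduction" line in the log reports a negative percentage because the edge count goes up, not down. The arithmetic below is a reading of the printed numbers only: each "completed processing" count is exactly twice the count printed before it, which is consistent with reverse edges being added to make roads bi-directional. It is not a description of the library internals.

```python
# Counts copied from the output above
edges_before, edges_after = 1818, 3232
nodes_before, nodes_after = 1410, 1184

# Each "completed processing" count is double the count printed before it
assert 2 * 1818 == 3636
assert 2 * 1616 == 3232

# Relative change; a negative "reduction" means the count increased
edge_reduction = (edges_before - edges_after) / edges_before * 100
node_reduction = (nodes_before - nodes_after) / nodes_before * 100

print(f"{edge_reduction:.1f}")  # -77.8
print(f"{node_reduction:.1f}")  # 16.0
```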

The clean_network function snaps together nodes that are very close to one another. However, it does not check whether the network is fully connected.

Optional step: Only use the largest sub-graph

Network analysis is usually run on a connected graph, because a disconnected graph can contain origin-destination pairs with no path between them. Checking the size of the largest sub-graph also shows how well connected your network is, which gives you the option of going back and making more edits.
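
To illustrate what "connected" means for a directed road graph, here is a small sketch on a toy graph (G_toy is hypothetical, not the tutorial data). A one-way dead end stays weakly connected to the network but forms its own strongly connected component, because you can drive in but not back out.

```python
import networkx as nx

G_toy = nx.MultiDiGraph()
G_toy.add_edges_from([(0, 1), (1, 0), (1, 2), (2, 1)])  # chain of two-way roads
G_toy.add_edge(2, 3)                                     # one-way dead end
G_toy.add_edges_from([(10, 11), (11, 10)])               # isolated island

n_weak = nx.number_weakly_connected_components(G_toy)
n_strong = nx.number_strongly_connected_components(G_toy)

print(n_weak)    # 2
print(n_strong)  # 3

# Paths exist only within a component, and only along edge directions
print(nx.has_path(G_toy, 0, 3))   # True
print(nx.has_path(G_toy, 3, 0))   # False
print(nx.has_path(G_toy, 0, 10))  # False
```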

# Identify only the largest graph

# compatible with NetworkX 2.4
list_of_subgraphs = list(
    G_clean.subgraph(c).copy() for c in nx.strongly_connected_components(G_clean)
)
max_graph = None
max_edges = 0
for i in list_of_subgraphs:
    if i.number_of_edges() > max_edges:
        max_edges = i.number_of_edges()
        max_graph = i

# set your graph equal to the largest sub-graph
G_largest = max_graph
# print info about the largest sub-graph
print(G_largest)
MultiDiGraph with 1139 nodes and 3140 edges

The largest subgraph keeps 1139 of the 1184 nodes and 3140 of the 3232 edges (about 96% and 97%), so most of the network was captured. That’s pretty good. It suggests the road data for this area is well connected.
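
As an alternative to the explicit loop, the largest component can be picked with max() and a key function. This is a sketch on a toy graph (G_toy is hypothetical); like the loop above, it ranks components by edge count, and it also shows how the retained share can be computed.

```python
import networkx as nx

G_toy = nx.MultiDiGraph()
G_toy.add_edges_from([(0, 1), (1, 2), (2, 0)])  # 3-node cycle
G_toy.add_edges_from([(5, 6), (6, 5)])          # 2-node pair

subgraphs = [
    G_toy.subgraph(c).copy() for c in nx.strongly_connected_components(G_toy)
]

# Rank components by number of edges, as in the loop above
largest = max(subgraphs, key=lambda g: g.number_of_edges())

# Share of the full graph retained by the largest component
node_share = largest.number_of_nodes() / G_toy.number_of_nodes()
edge_share = largest.number_of_edges() / G_toy.number_of_edges()

print(largest)
print(f"{node_share:.0%} of nodes, {edge_share:.0%} of edges")
```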

Save this prepared graph in your output folder:

gn.save(G_largest, "iceland_network_clean", data_pth)
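
For readers who want to see what a pickled graph round trip looks like without GOSTnets, here is a sketch using only the standard library and networkx. The file name and toy graph are hypothetical, and the exact files that gn.save writes are not shown here.

```python
import os
import pickle
import tempfile

import networkx as nx

G_toy = nx.MultiDiGraph()
G_toy.add_edge(0, 1, length=0.04)

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "toy_network.pickle")  # hypothetical file name

    # Write the graph to disk
    with open(out_path, "wb") as f:
        pickle.dump(G_toy, f)

    # Read it back, as done at the start of this notebook
    with open(out_path, "rb") as f:
        G_back = pickle.load(f)

print(G_back)
```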

How many subgraphs would you guess there are?

len(list_of_subgraphs)
12

Move on to Step 3 to see how we can use this network for some travel time analysis!

Optional: Compare networks (original / clean-version / largest subgraph)

OSMnx is one of the key libraries that GOSTnets is built on. Here, we load it to access its graph-plotting functions.

# import the OSMnx library and matplotlib library for visualizations
import osmnx as ox
import matplotlib.pyplot as plt
# plotting functions only work if the graphs have a name and a crs attribute;
# the node coordinates are longitude/latitude, so the CRS is WGS 84
G.graph["crs"] = "epsg:4326"
G.graph["name"] = "Iceland"

# original graph
fig, ax = plt.subplots(figsize=(10, 14))
ax.set_facecolor("k")
ax.set_title("Iceland - Original Network")
fig, ax = ox.plot_graph(G, ax=ax, edge_linewidth=1, node_size=7)
plt.show()

# assumes clean_network returns node coordinates in the input WGS CRS
G_clean.graph["crs"] = "epsg:4326"
G_clean.graph["name"] = "Iceland"

# cleaned graph
fig, ax = plt.subplots(figsize=(10, 14))
ax.set_facecolor("k")
ax.set_title("Iceland - Cleaned Network")
fig, ax = ox.plot_graph(G_clean, ax=ax, edge_linewidth=1, node_size=7)
plt.show()

# the largest subgraph shares the coordinates (and CRS) of the cleaned graph
G_largest.graph["crs"] = "epsg:4326"
G_largest.graph["name"] = "Iceland"

# largest subgraph
fig, ax = plt.subplots(figsize=(10, 14))
ax.set_facecolor("k")
ax.set_title("Iceland - The Largest Subgraph")
fig, ax = ox.plot_graph(G_largest, ax=ax, edge_linewidth=1, node_size=7)
plt.show()