Step 2: Cleaning an OSM Road Network#

This notebook covers further post-processing with the clean_network function, found within the network_clean GOSTnets submodule. The function cleans the network by removing excess nodes and ensuring that all edges are bi-directional (except in the case of one-way roads).

WARNING: The clean_network function is computationally expensive, so it may take a while to run. It outputs a pickled graph object, a dataframe of the edges, and a dataframe of the nodes. The expectation is that this function will only be run once.
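To make the bi-directional rule concrete, here is a minimal sketch in plain networkx. It is not the GOSTnets implementation; the helper name add_reverse_edges and the oneway edge attribute are assumptions chosen for illustration.

```python
import networkx as nx


def add_reverse_edges(G, oneway_key="oneway"):
    """Return a copy of G in which every two-way edge has a reverse twin.

    Hypothetical helper: edges flagged as one-way are left untouched.
    """
    H = G.copy()
    for u, v, data in G.edges(data=True):
        if data.get(oneway_key):
            continue  # one-way roads keep a single direction
        if not H.has_edge(v, u):
            H.add_edge(v, u, **data)
    return H


toy = nx.MultiDiGraph()
toy.add_edge("a", "b", length=100, oneway=False)  # two-way road
toy.add_edge("b", "c", length=50, oneway=True)    # one-way road

fixed = add_reverse_edges(toy)
print(sorted(fixed.edges()))  # [('a', 'b'), ('b', 'a'), ('b', 'c')]
```

Only the two-way road gains a reverse edge; the one-way road stays as it was.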

Set Up the Notebook#

First we need to import the necessary libraries and set the file paths.

Import Libraries#

We will use the following libraries:

  • os to set the file paths

  • time to time the function

  • networkx to work with the graph object

  • pickle to load the graph object

  • GOSTnets to use the clean_network function

import os
import time
import networkx as nx
import pickle

# import the GOSTnets library
import GOSTnets as gn
GDAL is not installed - OGR functionality not available

Set File Paths#

We will set the file paths to read the output from the “Step 1” tutorial.

pth = "./"  # change this path to your working folder
data_pth = os.path.join(pth, "tutorial_outputs")

# read back the graph that Step 1 saved as a pickle
with open(os.path.join(data_pth, "iceland_unclean.pickle"), "rb") as f:
    G = pickle.load(f)
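The load step fails with a FileNotFoundError when the Step 1 output is not in the expected folder. A small sketch of a guarded load follows. The helper name load_graph is hypothetical, and the demonstration pickles a plain dictionary into a temporary folder as a stand-in for the real graph.

```python
import os
import pickle
import tempfile


def load_graph(path):
    """Load a pickled object, with a clearer message if the file is missing."""
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"{path} not found; run Step 1 first or adjust the working folder"
        )
    with open(path, "rb") as f:
        return pickle.load(f)


# demonstration with a stand-in object instead of the real graph
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.pickle")
    with open(demo_path, "wb") as f:
        pickle.dump({"nodes": 3}, f)
    restored = load_graph(demo_path)

    try:
        load_graph(os.path.join(tmp, "missing.pickle"))
        missing_detected = False
    except FileNotFoundError:
        missing_detected = True

print(restored, missing_detected)
```

With the tutorial data, the call would be load_graph(os.path.join(data_pth, "iceland_unclean.pickle")).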

Inspect and Clean the Network#

We will inspect the network and then clean it using the clean_network function.

# inspect the graph
nodes = list(G.nodes(data=True))
edges = list(G.edges(data=True))
print(len(nodes))
print(nodes[0])
print(len(edges))
print(edges[0])
# you can also print general graph information with networkx
print(G)
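Beyond printing the first node and edge, it can help to see which attribute keys the edges carry before cleaning. The sketch below tallies keys on a small stand-in graph. The attribute names used here (length, infra_type) are illustrative assumptions, not a statement about what Step 1 writes.

```python
from collections import Counter

import networkx as nx

# stand-in graph; replace `toy` with G to inspect the real network
toy = nx.MultiDiGraph()
toy.add_edge(1, 2, length=120.0, infra_type="residential")
toy.add_edge(2, 3, length=80.0)

# count how many edges carry each attribute key
key_counts = Counter(
    key for _, _, data in toy.edges(data=True) for key in data
)

print(toy.number_of_nodes(), toy.number_of_edges())
print(dict(key_counts))
```

Keys that appear on only some edges are worth noting, since later steps may assume they are present everywhere.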
# To become familiar with the function read the doc string
gn.clean_network?

Now we set up some parameters for the clean_network function.

Iceland_UTMZ = "epsg:32627"

WGS = "epsg:4326"  # do not adjust. OSM data natively comes in EPSG:4326 (WGS 84, longitude/latitude)
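A projected UTM code is supplied alongside the geographic one, presumably so that distances such as junctdist can be measured in metres rather than degrees. As a sanity check on the choice of zone, the standard longitude-to-zone formula can be sketched as follows. The function name utm_epsg and the sample coordinates for Reykjavik are assumptions for illustration, and the formula ignores the special zone exceptions around Norway and Svalbard.

```python
import math


def utm_epsg(lon, lat):
    """Return the EPSG code string of the WGS 84 / UTM zone containing a point."""
    zone = int(math.floor((lon + 180) / 6)) + 1
    base = 32600 if lat >= 0 else 32700  # northern vs southern hemisphere
    return f"epsg:{base + zone}"


# approximate longitude/latitude of Reykjavik
print(utm_epsg(-21.9, 64.1))  # epsg:32627
```

The result matches the value assigned to Iceland_UTMZ.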

Run the clean_network Function#

Setting the keyword verbose to True writes the outputs to the folder given by wpath.

print("start: %s\n" % time.ctime())
G_clean = gn.clean_network(
    G, UTM=Iceland_UTMZ, WGS="epsg:4326", junctdist=10, verbose=False
)

# using verbose = True:
# G_clean = gn.clean_network(G, wpath = data_pth, output_file_name = 'iceland_network', UTM = Iceland_UTMZ, WGS = 'epsg:4326', junctdist = 10, verbose = True)
print("\nend: %s" % time.ctime())
print("\n--- processing complete")
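time.ctime() records wall-clock timestamps; for a duration in seconds, time.perf_counter() is an alternative. The sketch below wraps an arbitrary call. Here timed is a hypothetical helper and slow_step stands in for the real clean_network call.

```python
import time


def timed(func, *args, **kwargs):
    """Run func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def slow_step(n):
    # stand-in for an expensive call such as the network cleaning
    time.sleep(0.05)
    return n * 2


result, elapsed = timed(slow_step, 21)
print(f"result={result}, elapsed={elapsed:.2f} s")
```

With the real data, the call would look like timed(gn.clean_network, G, UTM=Iceland_UTMZ, WGS=WGS, junctdist=10, verbose=False).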
# let's print info on our clean version
print(G_clean)

The clean_network function snaps together nodes that are very close to one another. However, it does not check whether the network is fully connected.
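The idea of snapping nearby nodes can be illustrated with a toy example in projected coordinates (metres). This is a simplified sketch, not the algorithm GOSTnets uses: it groups points that are linked by chains of gaps no larger than a tolerance and replaces each group with its centroid. The function name snap_points and the sample coordinates are assumptions.

```python
import math


def snap_points(points, tolerance):
    """Merge points linked by gaps <= tolerance; return one centroid per group."""
    parent = list(range(len(points)))

    def find(i):
        # follow parent links to the group representative (with path halving)
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # join every pair of points that lies within the tolerance
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= tolerance:
                parent[find(i)] = find(j)

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)

    return sorted(
        (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
        for g in groups.values()
    )


pts = [(0.0, 0.0), (6.0, 0.0), (500.0, 0.0)]
print(snap_points(pts, tolerance=10))  # [(3.0, 0.0), (500.0, 0.0)]
```

The two points 6 m apart collapse into one, while the distant point is untouched. Nothing in this step links the distant point to the rest, which is why connectivity has to be checked separately.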

Optional step: Only use the largest sub-graph#

Network analysis is usually run on a connected graph, because a disconnected graph can contain origin-destination pairs with no path between them. Inspecting the subgraphs also shows how well connected your network is, and gives you the option of going back and making further edits.

# Identify only the largest graph

# compatible with NetworkX 2.4
list_of_subgraphs = list(
    G_clean.subgraph(c).copy() for c in nx.strongly_connected_components(G_clean)
)
max_graph = None
max_edges = 0
for i in list_of_subgraphs:
    if i.number_of_edges() > max_edges:
        max_edges = i.number_of_edges()
        max_graph = i

# set your graph equal to the largest sub-graph
G_largest = max_graph
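The loop above keeps the strongly connected component with the most edges. An equivalent, more compact form uses max with a key function. The sketch below runs on a small stand-in graph so it can be tried without the Iceland data; only standard NetworkX calls are used.

```python
import networkx as nx

toy = nx.MultiDiGraph()

# a two-way triangle: one strongly connected component with 6 edges
for u, v in [(1, 2), (2, 3), (3, 1)]:
    toy.add_edge(u, v)
    toy.add_edge(v, u)

# an isolated two-way pair: a second component with 2 edges
toy.add_edge(10, 11)
toy.add_edge(11, 10)

subgraphs = [
    toy.subgraph(c).copy() for c in nx.strongly_connected_components(toy)
]

# pick the subgraph with the most edges, as the loop does
largest = max(subgraphs, key=lambda g: g.number_of_edges())
print(len(subgraphs), largest.number_of_nodes(), largest.number_of_edges())
```

Ranking by edge count rather than node count matches the original loop; on most road networks both criteria select the same component.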
# print info about the largest sub-graph
print(G_largest)

The majority of the network was captured by the largest subgraph, which suggests that the OSM data for this area is of good quality.
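How much of the network the largest subgraph captures can be expressed as a share of nodes and edges. A minimal sketch, using a hypothetical helper retained_share and a stand-in graph:

```python
import networkx as nx


def retained_share(full, part):
    """Return the fractions of nodes and edges of `full` that remain in `part`."""
    return (
        part.number_of_nodes() / full.number_of_nodes(),
        part.number_of_edges() / full.number_of_edges(),
    )


full = nx.MultiDiGraph()
full.add_edges_from([(1, 2), (2, 1), (2, 3), (3, 2), (8, 9)])

# largest strongly connected component by node count
biggest = max(nx.strongly_connected_components(full), key=len)
part = full.subgraph(biggest).copy()

node_share, edge_share = retained_share(full, part)
print(f"{node_share:.0%} of nodes, {edge_share:.0%} of edges")
```

With the tutorial graphs, the call would be retained_share(G_clean, G_largest).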

Save this prepared graph in your output folder:

gn.save(G_largest, "iceland_network_clean", data_pth)

How many subgraphs would you guess there are?

len(list_of_subgraphs)
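The count alone does not say how the fragments are distributed. A short sketch that lists component sizes from largest to smallest, again on a stand-in graph:

```python
import networkx as nx

toy = nx.MultiDiGraph()
toy.add_edges_from(
    [(1, 2), (2, 1), (2, 3), (3, 2), (8, 9), (9, 8), (20, 21)]
)

# number of nodes in each strongly connected component, largest first
sizes = sorted(
    (len(c) for c in nx.strongly_connected_components(toy)), reverse=True
)
print(len(sizes), sizes)
```

Note that the single one-way edge between nodes 20 and 21 produces two one-node components, because neither node can be reached from the other and back. Many tiny components of this kind are typical of a directed road graph and do not by themselves indicate poor data.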

Move on to Step 3 to see how we can use this network for some travel time analysis!

Optional: Compare networks (original / clean-version / largest subgraph)#

OSMnx is one of the key libraries that GOSTnets is based on. Here, we load it to access its graph-plotting functions.

# import the OSMnx library and matplotlib library for visualizations
import osmnx as ox
import matplotlib.pyplot as plt
# plotting functions only work if the graphs have a name and a crs attribute
G.graph["crs"] = "epsg:4326"  # OSM node coordinates are longitude/latitude (WGS 84)
G.graph["name"] = "Iceland"

# original graph
fig, ax = plt.subplots(figsize=(10, 14))
ax.set_facecolor("k")
ax.set_title("Iceland - Original Network")
fig, ax = ox.plot_graph(G, ax=ax, edge_linewidth=1, node_size=7)
plt.show()
G_clean.graph["crs"] = "epsg:4326"  # assumes clean_network returns coordinates in WGS 84
G_clean.graph["name"] = "Iceland"

# cleaned graph
fig, ax = plt.subplots(figsize=(10, 14))
ax.set_facecolor("k")
ax.set_title("Iceland - Cleaned Network")
fig, ax = ox.plot_graph(G_clean, ax=ax, edge_linewidth=1, node_size=7)
plt.show()
G_largest.graph["crs"] = "epsg:4326"  # assumes clean_network returns coordinates in WGS 84
G_largest.graph["name"] = "Iceland"

# largest subgraph
fig, ax = plt.subplots(figsize=(10, 14))
ax.set_facecolor("k")
ax.set_title("Iceland - The Largest Subgraph")
fig, ax = ox.plot_graph(G_largest, ax=ax, edge_linewidth=1, node_size=7)
plt.show()