Calculate connections

Most network analyses do not consider the actual geographic connections between points. This notebook shows how to calculate those direct connections along the network and collect their geospatial information.

Import Libraries and Define Input Data

First we’ll import the necessary libraries and define the input data.

Libraries

We’ll use the following libraries:

  • os to build the file paths

  • pickle to load the data

  • networkx to calculate shortest paths through the graph

  • geopandas to work with the geospatial data

  • pandas to work with the data

  • shapely.geometry to work with the geometry of the data

  • GOSTnets to apply custom functions to the network

import os
import pickle as pkl
import networkx as nx
import geopandas as gpd
import pandas as pd

from shapely.geometry import MultiLineString

# import the GOSTnet library
import GOSTnets as gn

Load Input Data

We define the path to the Iceland network tutorial data, load the network graph, and convert its edge lengths to travel times. Then we load the origin and destination data, so that we’re ready to calculate shortest paths between the points.

# Define input data
pth = "./"
# Read in cleaned pickle from earlier analysis and convert to time
with open(
    os.path.join(pth, "tutorial_outputs", "iceland_network_clean.pickle"), "rb"
) as f:
    G = pkl.load(f)
G_time = gn.convert_network_to_time(
    G, distance_tag="length", road_col="infra_type", factor=1000
)
# Define origins and destinations files
rek_grid_file = os.path.join(pth, "tutorial_data", "rek_grid.shp")
rek_pop_grid_file = rek_grid_file.replace(".shp", "_pop.shp")
churches_file = os.path.join(pth, "tutorial_data", "churches.shp")
# Read in origins and destinations files
rek_grid = gpd.read_file(rek_pop_grid_file)
in_churches = gpd.read_file(churches_file)
in_churches = in_churches.to_crs(rek_grid.crs)
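To make the length-to-time conversion concrete, here is a minimal sketch on a toy networkx graph. The speed table, the `add_travel_time` helper, and the unit choices (metres, km/h, seconds) are illustrative assumptions; this is not the internal logic of `gn.convert_network_to_time`.

```python
import networkx as nx

# Hypothetical speeds in km/h per road class (illustrative values only)
SPEEDS_KMH = {"primary": 80, "residential": 30}


def add_travel_time(graph, speeds, default_kmh=20):
    """Store travel time in seconds on every edge as the attribute 'time'."""
    for _, _, data in graph.edges(data=True):
        speed_kmh = speeds.get(data.get("infra_type"), default_kmh)
        metres_per_second = speed_kmh * 1000 / 3600
        data["time"] = data["length"] / metres_per_second
    return graph


toy = nx.MultiDiGraph()
toy.add_edge("a", "b", length=1000, infra_type="primary")
toy.add_edge("b", "c", length=500, infra_type="residential")
add_travel_time(toy, SPEEDS_KMH)

times = {(u, v): round(d["time"], 1) for u, v, d in toy.edges(data=True)}
print(times)
```

Once every edge carries a `time` attribute, it can be passed as the `weight` in any networkx shortest-path call.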

Calculate Shortest Paths Between Loaded Origins and Destinations

We’ll calculate the shortest paths between the loaded origins and destinations. To do this, we first need to snap the origins and destinations to their nearest network nodes. Only then can we calculate the shortest paths between them.

# calculate the origins and destinations by snapping to the road network
origins_df = gn.pandana_snap_c(
    G_time,
    rek_grid,
    source_crs="epsg:4326",
    target_crs="epsg:4326",
    add_dist_to_node_col=True,
)
origins = list(set(origins_df["NN"]))
destinations_df = gn.pandana_snap_c(
    G_time,
    in_churches,
    source_crs="epsg:4326",
    target_crs="epsg:4326",
    add_dist_to_node_col=True,
)
destinations = list(set(destinations_df["NN"]))
nodes_gdf = gn.node_gdf_from_graph(G_time)
edges_gdf = gn.edge_gdf_from_graph(G_time)
obj_nodes = nx.shortest_path(
    G_time, source=origins[0], target=destinations[0], weight="time"
)
print(origins[0])
print(destinations[0])
obj_nodes  # ordered list of the nodes that form the shortest path from the origin to the destination
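The two ideas in this step, snapping a point to its nearest node and then routing on a weight, can be sketched on a toy graph. The coordinates, the `nearest_node` helper, and the brute-force search are assumptions made for illustration; a real snapping routine would typically use a spatial index rather than scanning every node.

```python
import math

import networkx as nx

toy = nx.DiGraph()
# Node coordinates (x, y) are made up for the illustration
toy.add_node(1, x=0.0, y=0.0)
toy.add_node(2, x=1.0, y=0.0)
toy.add_node(3, x=1.0, y=1.0)
toy.add_edge(1, 3, time=50)  # direct but slow
toy.add_edge(1, 2, time=10)
toy.add_edge(2, 3, time=10)


def nearest_node(graph, x, y):
    """Return the node whose coordinates are closest to (x, y)."""
    return min(
        graph.nodes,
        key=lambda n: math.hypot(graph.nodes[n]["x"] - x, graph.nodes[n]["y"] - y),
    )


origin = nearest_node(toy, 0.1, -0.1)  # snaps to node 1
destination = nearest_node(toy, 0.9, 1.2)  # snaps to node 3

fewest_hops = nx.shortest_path(toy, source=origin, target=destination)
fastest = nx.shortest_path(toy, source=origin, target=destination, weight="time")
print(fewest_hops)  # [1, 3]
print(fastest)  # [1, 2, 3]
```

Without `weight`, networkx minimises the number of edges; with `weight="time"` it minimises the summed travel time, which is why the two results differ.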

Calculate line strings connecting all origins to all destinations

We’ll calculate the line strings connecting all origins to all destinations. For the sake of the example, we’ll only calculate the line strings between the first 10 origins and the first 10 destinations.

# for the sake of the example we truncate the number of origins and destinations
# to make the computation faster. We will only use the first 10 of each
origins = origins[:10]
destinations = destinations[:10]

In practice, we loop over every origin-destination pair, compute the shortest path between them, and assemble the geometries of the edges along that path into a MultiLineString.

all_res = []
all_connections = []
for oIdx, org in enumerate(origins, start=1):
    print(f"{oIdx} of {len(origins)}")
    for dest in destinations:
        obj_nodes = nx.shortest_path(G_time, source=org, target=dest, weight="time")
        all_edges = []
        for idx in range(0, len(obj_nodes) - 1):
            start_node = obj_nodes[idx]
            end_node = obj_nodes[idx + 1]
            cur_edge = edges_gdf.loc[
                (edges_gdf["stnode"] == start_node)
                & (edges_gdf["endnode"] == end_node),
                "geometry",
            ].iloc[0]
            all_edges.append(cur_edge)
            all_connections.append([start_node, end_node, cur_edge])
        all_res.append([org, dest, MultiLineString(all_edges)])
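The loop above filters the edge table once per path segment and assumes every pair is connected. As a possible variation, the sketch below builds a `(start, end) -> geometry` dictionary once and returns `None` when no path exists. The toy edge rows, the `geom_lookup` dictionary, and the `path_geometry` helper are hypothetical names introduced for this example.

```python
import networkx as nx
from shapely.geometry import LineString, MultiLineString

# Stand-in for edges_gdf rows: (stnode, endnode, time, geometry)
edge_rows = [
    (1, 2, 10, LineString([(0, 0), (1, 0)])),
    (2, 3, 10, LineString([(1, 0), (1, 1)])),
    (1, 3, 50, LineString([(0, 0), (1, 1)])),
]

toy = nx.DiGraph()
for start, end, time, _ in edge_rows:
    toy.add_edge(start, end, time=time)
toy.add_node(4)  # isolated node, unreachable from node 1

# Build the lookup once instead of filtering the edge table at every step
geom_lookup = {(start, end): geom for start, end, _, geom in edge_rows}


def path_geometry(graph, source, target):
    """Return the shortest path as a MultiLineString, or None if unreachable."""
    try:
        nodes = nx.shortest_path(graph, source=source, target=target, weight="time")
    except nx.NetworkXNoPath:
        return None
    # zip(nodes, nodes[1:]) yields consecutive (start, end) pairs along the path
    return MultiLineString([geom_lookup[pair] for pair in zip(nodes, nodes[1:])])


route = path_geometry(toy, 1, 3)
print(route.wkt)
print(path_geometry(toy, 1, 4))
```

With the real edge table, the same lookup could be built from `zip(edges_gdf["stnode"], edges_gdf["endnode"], edges_gdf["geometry"])`. Note that when parallel edges share a node pair, a dictionary keeps the last one, whereas `.iloc[0]` in the loop above keeps the first.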

Write the data to a file

Finally, we’ll write the data to a file. First we write the origin-destination results, one row per pair with its path geometry, to a CSV file.

# Write all connections to file
all_results = pd.DataFrame(all_res, columns=["O", "D", "geometry"])
all_results.to_csv(os.path.join(pth, "tutorial_outputs", "all_OD_links.csv"))
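Because the table is a plain pandas DataFrame, `to_csv` stores each geometry as its string form, which for shapely objects is WKT. The sketch below, which writes to an in-memory buffer instead of the tutorial's output file, shows one way to parse that text back into geometries; the toy route and the `restored` name are assumptions for illustration.

```python
import io

import pandas as pd
from shapely import wkt
from shapely.geometry import LineString, MultiLineString

route = MultiLineString(
    [LineString([(0, 0), (1, 0)]), LineString([(1, 0), (1, 1)])]
)
results = pd.DataFrame([[1, 3, route]], columns=["O", "D", "geometry"])

# to_csv converts each geometry with str(), which yields its WKT text
buffer = io.StringIO()
results.to_csv(buffer, index=False)

# Reading back gives plain strings, so parse them into geometries again
buffer.seek(0)
restored = pd.read_csv(buffer)
restored["geometry"] = restored["geometry"].apply(wkt.loads)
print(restored.loc[0, "geometry"].geom_type)
```

The parsed column can then be handed to `gpd.GeoDataFrame(restored, geometry="geometry")` if a spatial table is needed downstream.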

Then we count how many shortest paths use each individual link and write those usage counts, with one geometry per link, to a second CSV file.

# Tabulate usage of individual links and write to file
# Note: the "node" column holds the end node of each link
all_conn = pd.DataFrame(all_connections, columns=["start", "node", "geometry"])
# Count how many shortest paths traverse each (start, end) link
all_connections_count = pd.DataFrame(all_conn.groupby(["start", "node"]).count())
all_connections_count.reset_index(inplace=True)
# Keep one geometry per link; both groupbys sort by key, so the rows align
all_connections_first = pd.DataFrame(all_conn.groupby(["start", "node"]).first())
all_connections_first.reset_index(inplace=True)
all_connections_first["count"] = all_connections_count["geometry"]
all_connections_first.to_csv(
    os.path.join(pth, "tutorial_outputs", "OD_links_usage.csv")
)
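The usage tabulation can also be expressed with a single `groupby(...).size()` call. The sketch below uses made-up traversal rows with the same `start` and `node` column names as above; the `traversals`, `usage`, and `busiest` names are hypothetical.

```python
import pandas as pd

# Each row is one traversal of a link by some shortest path (toy data)
traversals = pd.DataFrame(
    [[1, 2], [2, 3], [1, 2], [1, 2], [2, 3], [3, 4]],
    columns=["start", "node"],
)

# size() counts the rows in each (start, node) group in a single pass
usage = traversals.groupby(["start", "node"]).size().reset_index(name="count")

# The most heavily used link is the row with the largest count
busiest = usage.sort_values("count", ascending=False).iloc[0]
print(usage)
print(int(busiest["start"]), int(busiest["node"]), int(busiest["count"]))
```

Links with high counts are the ones shared by many origin-destination paths, which makes this table a simple starting point for spotting critical road segments.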