20.4. Chokepoints Monitor - Methodology#

The Data Lab at the World Bank used IMF PortWatch to study the Red Sea shipping crisis. This event began in October 2023, when missile attacks on ships and tankers traversing the Red Sea caused hundreds of vessels to avoid the Suez Canal. The attacks are concentrated near the Bab al-Mandab Strait, a 20-mile-wide chokepoint for maritime traffic.

The following example is based on the commented study.

20.4.1. Chokepoints of interest#

The first step is to detect the chokepoints of interest. Since the example is based on the Red Sea crisis, it studies the following chokepoints:

  • Bab el-Mandeb Strait

  • Cape of Good Hope

  • Suez Canal

# Use already downloaded chokepoints
cpoi = ["Bab el-Mandeb Strait", "Cape of Good Hope", "Suez Canal"]
chokepoints_rs = chokepoints_gdf[chokepoints_gdf['portname'].isin(cpoi)]
chokepoints_rs
portid portname country ISO3 continent fullname lat lon vessel_count_total vessel_count_container vessel_count_dry_bulk vessel_count_general_cargo vessel_count_RoRo vessel_count_tanker industry_top1 industry_top2 industry_top3 share_country_maritime_import share_country_maritime_export LOCODE pageid countrynoaccents ObjectId geometry
0 chokepoint1 Suez Canal None None None Suez Canal 30.593346 32.436882 22217 6455 5870 2097 876 6919 Mineral Products Vegetable Products Chemical & Allied Industries None None None c57c79bf612b4372b08a9c6ea9c97ef0 None 1 POINT (32.43688 30.59335)
3 chokepoint4 Bab el-Mandeb Strait None None None Bab el-Mandeb Strait 12.788597 43.349545 22519 6280 5940 1839 1074 7386 Mineral Products Chemical & Allied Industries Vegetable Products None None None 6b1814d64903461b98144a6cc25eb79c None 4 POINT (43.34954 12.7886)
6 chokepoint7 Cape of Good Hope None None None Cape of Good Hope -34.927286 20.882737 17332 2018 10277 709 355 3973 Mineral Products Vegetable Products Prepared Foodstuffs & Beverages None None None edf18f455a2b4637a3632b6af201abe9 None 7 POINT (20.88274 -34.92729)
chokepoints_rs[
    [
        "geometry",
        "portname",
        "vessel_count_total",
        "vessel_count_container",
        "vessel_count_dry_bulk",
        "vessel_count_general_cargo",
        "vessel_count_RoRo",
        "vessel_count_tanker",
        "industry_top1",
        "industry_top2",
        "industry_top3",
    ]
].explore(
    column="portname",
    cmap="Dark2",
    marker_kwds={"radius": 15},
    tiles="Esri Ocean Basemap",
    legend_kwds={"loc": "upper right", "caption": "Choke Points"},
)
Make this Notebook Trusted to load map: File -> Trust Notebook

20.4.2. Retrieve the data#

The example processes the daily transit calls and estimated volume (capacity) since 2019 for the three chokepoints of interest: Bab el-Mandeb Strait, Cape of Good Hope, Suez Canal.

To build the URL for the API call, first, go to IMF PortWatch site and build a sample URL. The one below was created to retrieve information for chokepoint 1, 4, and 7. This will be useful for creating a function to retrieve the daily calls for the chokepoints of interest.

https://services9.arcgis.com/weJ1QsnbMYJlCHdG/arcgis/rest/services/Daily_Chokepoints_Data/FeatureServer/0/query?where=portid %3D ‘CHOKEPOINT1’ OR portid %3D ‘CHOKEPOINT4’ OR portid %3D ‘CHOKEPOINT7’&outFields=*&outSR=4326&f=json

The function also needs to take into account that IMF PortWatch only returns 1000 records per call.

def get_chokepoint_data(chokepoints, url_base):
    for chokepoint in chokepoints:
        if chokepoint == chokepoints[-1]:
            url_base += f"portid%3D%20'{chokepoint}'%20&outFields=*&outSR=4326&f=json&resultOffset=0"
        else:
            url_base += f"portid%3D%20'{chokepoint}'%20+OR+"
    res = requests.get(url_base)
    df = pd.DataFrame([d["attributes"] for d in res.json()["features"]])
    offset = 1000
    while len(df) % 1000 == 0: #retrieve 1000 records
        res = requests.get(url_base.replace("resultOffset=0", f"resultOffset={offset}"))
        df2 = pd.DataFrame([d["attributes"] for d in res.json()["features"]])
        df = pd.concat([df, df2])
        offset += 1000
    df.reset_index(inplace=True, drop=True)
    df["date"] = df.date.apply(lambda x: datetime.fromtimestamp(x / 1000))
    df.sort_values(["portid", "date"], inplace=True)
    return df
url_base = "https://services9.arcgis.com/weJ1QsnbMYJlCHdG/arcgis/rest/services/Daily_Chokepoints_Data/FeatureServer/0/query?where="
df_chokepoints = get_chokepoint_data(list(chokepoints[chokepoints['portname'].isin(cpoi)].portid.unique()), url_base)
df_chokepoints.loc[:, "ymd"] = df_chokepoints.date.dt.strftime("%Y-%m-%d")
df_chokepoints.head()
date year month day portid portname n_container n_dry_bulk n_general_cargo n_roro n_tanker n_cargo n_total capacity_container capacity_dry_bulk capacity_general_cargo capacity_roro capacity_tanker capacity_cargo capacity ObjectId ymd
0 2018-12-31 21:00:00 2019 1 1 chokepoint1 Suez Canal 23 22 14 4 22 63 85 1.937886e+06 9.439567e+05 174972.926920 46461.102534 1.326360e+06 3.103276e+06 4.429637e+06 1 2018-12-31
2 2019-01-01 21:00:00 2019 1 2 chokepoint1 Suez Canal 26 4 6 5 10 41 51 2.647842e+06 2.389135e+05 33158.163236 99397.705404 6.181574e+05 3.019311e+06 3.637469e+06 3 2019-01-01
4 2019-01-02 21:00:00 2019 1 3 chokepoint1 Suez Canal 12 16 10 3 24 41 65 7.566682e+05 1.050450e+06 81769.437679 50061.205601 1.069216e+06 1.938949e+06 3.008165e+06 5 2019-01-02
5 2019-01-03 21:00:00 2019 1 4 chokepoint1 Suez Canal 18 13 1 1 15 33 48 1.668678e+06 5.830442e+05 1028.904676 16193.424839 1.102152e+06 2.268945e+06 3.371097e+06 6 2019-01-03
7 2019-01-04 21:00:00 2019 1 5 chokepoint1 Suez Canal 20 9 1 1 13 31 44 1.982599e+06 7.714141e+05 1542.093497 16313.755096 6.854480e+05 2.771869e+06 3.457317e+06 8 2019-01-04

20.4.3. Data Smoothening#

A 7-day moving average was created to smoothen out the data and mitigate the effect of daily anomalies.

df_chokepoints = df_chokepoints.loc[df_chokepoints.date >= "2019-01-01"].copy()
df_chokepoints = (
    df_chokepoints.groupby("portname")[["n_tanker", "n_cargo", "n_total", "capacity", "date"]]
    .rolling(7, center=True, min_periods=1, on="date")
    .mean()
)
df_chokepoints.reset_index(inplace=True)
df_chokepoints.drop("level_1", axis=1, inplace=True)
df_chokepoints.loc[:, "ymd"] = df_chokepoints.date.dt.strftime("%Y-%m-%d")
df_chokepoints.loc[:, "md"] = df_chokepoints.date.dt.strftime("%m-%d")
df_chokepoints.head()
portname capacity n_cargo n_tanker n_total date ymd md
0 Bab el-Mandeb Strait 3.888476e+06 42.000000 18.250000 60.250000 2019-01-01 21:00:00 2019-01-01 01-01
1 Bab el-Mandeb Strait 3.866908e+06 40.600000 18.200000 58.800000 2019-01-02 21:00:00 2019-01-02 01-02
2 Bab el-Mandeb Strait 3.822667e+06 41.000000 18.166667 59.166667 2019-01-03 21:00:00 2019-01-03 01-03
3 Bab el-Mandeb Strait 3.693581e+06 40.428571 17.714286 58.142857 2019-01-04 21:00:00 2019-01-04 01-04
4 Bab el-Mandeb Strait 3.614515e+06 38.857143 17.714286 56.571429 2019-01-05 21:00:00 2019-01-05 01-05

20.4.4. AIS Transit Calls in Key Areas, Historical#

This notebook uses plotnine, a package that allows to use the grammar of graphics in Python. The library is based on the famous ggplot2 library in R.

conflict_date = "2023-10-07"
crisis_date = "2023-11-17"
p0 = (
    ggplot(df_chokepoints, aes(x="date", y="n_total", group="portname", color="portname"))  #
    + geom_line(alpha=1, size=0.4)
    + geom_vline(xintercept=conflict_date, linetype="dashed", color="red")
    + geom_vline(xintercept=crisis_date, linetype="dashed", color="red")
    + labs(
        x="",
        y="",
        subtitle="Number of Vessels",
        title="AIS Transit Calls in Key Areas",
        color="Area of Interest",
    )
    + theme_minimal()
    + scale_x_datetime(breaks=date_breaks("6 month"), labels=date_format("%Y-%m"))
    + scale_color_brewer(type="qual", palette=2)
    + theme(
        text=element_text(size=13),
        plot_title=element_text(size=16, weight="bold"),
        axis_text_x=element_text(rotation=45, hjust=1),
        legend_position="none",
    )
    + facet_wrap("~portname", scales="fixed", ncol=1)
)
display(p0)
p0.save(
    filename="transit-calls-chokepoints-historical.jpeg",
    dpi=300,
)
../../_images/9341c1ddfa58454e7d7e71ab24653c0f3c6f851fa566eec45dd63ca825f44b9b.png
/home/sol/venv/lib/python3.10/site-packages/plotnine/ggplot.py:606: PlotnineWarning: Saving 10 x 8 in image.
/home/sol/venv/lib/python3.10/site-packages/plotnine/ggplot.py:607: PlotnineWarning: Filename: transit-calls-chokepoints-historical.jpeg

20.4.5. Reference Period#

Given the volatility of 2021, the team defined the reference period as January 1st 2022 up to October 6th 2023. Daily averages were calculated based on this time period.

Periods:

  • Baseline: 2021, 2022, 2023 (January 1st – October 6th)

  • Middle East Conflict: 2023 (October 7th - November 16th)

  • Red Sea Crisis: November 17th - January 31st, 2024

start_reference_date = "2022-01-01"
conflict_date = "2023-10-07"
crisis_date = "2023-11-17"

Create the reference data by day and month.

df_ref = df_chokepoints.loc[(df_chokepoints.date >= start_reference_date) & (df_chokepoints.date < conflict_date)].copy()
df_ref = df_ref.groupby(["portname", "md"])[
    ["n_tanker", "n_cargo", "n_total", "capacity"]
].mean()
df_ref.reset_index(inplace=True)
df_ref.rename(
    columns={
        "n_tanker": "n_tanker_ref",
        "n_cargo": "n_cargo_ref",
        "n_total": "n_total_ref",
        "capacity": "capacity_ref",
    },
    inplace=True,
)

Filter recent data (2023 onwards) and merge it with the reference values by day and month.

df_filt = df_chokepoints.loc[(df_chokepoints.date >= "2023-01-01")].copy()
df_filt = df_filt.merge(df_ref, on=["portname", "md"], how="left", validate="m:1")

Calculate percentage change.

df_filt.loc[:, "n_total_pct_ch"] = df_filt.apply(
    lambda x: (x.n_total - x.n_total_ref) / (x.n_total_ref), axis=1
)
p1 = (
    ggplot(df_filt, aes(x="ymd", y="n_total", group="portname", color="portname"))  #
    + geom_line(alpha=1)
    + geom_vline(xintercept=conflict_date, linetype="dashed", color="red")
    + geom_vline(xintercept=crisis_date, linetype="dashed", color="red")
    + labs(
        x="",
        y="",
        subtitle="Number of Vessels",
        title="AIS Transit Calls in Key Areas",
        color="Area of Interest",
    )
    + theme_minimal()
    + theme(
        text=element_text(size=13),
        plot_title=element_text(size=16, weight="bold"),
        axis_text_x=element_text(rotation=45, hjust=1),
        legend_position="none",
    )
    + scale_x_datetime(breaks=date_breaks("2 month"), labels=date_format("%Y-%m"))
    + scale_color_brewer(type="qual", palette=2)
    + theme(legend_position="bottom")
)
display(p1)
../../_images/c42cc468599698a0e8d71a3b8abd89fc4fb458e73ab86701163c969a309cf3d2.png

20.4.5.1. AIS Transit Calls Relative to Historical Average#

The following chart separates transit calls for each area and includes a black line, which signals the historical average for each area.

p2 = (
    ggplot(df_filt, aes(x="date", y="n_total", group="portname", color="portname"))  
    + geom_line(size=1)
    + geom_vline(xintercept=conflict_date, linetype="dashed", color="red")
    + geom_vline(xintercept=crisis_date, linetype="dashed", color="red")
    + geom_line(
        aes(x="date", y="n_total_ref", group="portname"),
        color="black",
        size=0.75,
        alpha=3 / 4,
    )
    + labs(
        x="",
        y="",
        subtitle="Number of Vessels",
        title="AIS Transit Calls Relative to Historical Average",
        color="Area of Interest",
    )
    + theme_minimal()
    + theme(
        text=element_text(size=13),
        plot_title=element_text(size=16, weight="bold"),
        axis_text_x=element_text(rotation=45, hjust=1),
        legend_position="none",
    )
    + scale_x_datetime(breaks=date_breaks("2 month"), labels=date_format("%Y-%m"))
    + scale_color_brewer(type="qual", palette=2)
    + facet_wrap("~portname", scales="fixed", ncol=1)
)
display(p2)
../../_images/fe73b8a1ad7d46c04e9dcf3d77561ec66b428cb9dc7e23eb42f0fed4a23897b2.png

20.4.5.2. AIS Transit Calls % Change from Historical Average#

The percentage change between recent daily values and the baseline daily average was calculated.

p3 = (
    ggplot(
        df_filt.loc[df_filt.date >= "2023-08-01"],
        aes(x="ymd", y="n_total_pct_ch", group="portname", color="portname"),
    )  #
    + geom_line(size=0.8)
    + geom_vline(xintercept=conflict_date, linetype="dashed", color="red")
    + geom_vline(xintercept=crisis_date, linetype="dashed", color="red")
    + geom_hline(yintercept=0, color="black", alpha=3 / 4, size=0.3)
    + labs(
        x="",
        y="",
        subtitle="% Change",
        title="AIS Transit Calls in Key Areas",
        color="Area of Interest",
    )
    + theme_minimal()
    + theme(
        text=element_text(size=13),
        plot_title=element_text(size=16, weight="bold"),
        axis_text_x=element_text(rotation=45, hjust=1),
        legend_position="none",
    )
    + scale_x_datetime(breaks=date_breaks("1 month"), labels=date_format("%Y-%m"))
    + scale_color_brewer(type="qual", palette=2)
    + scale_y_continuous(labels=percent_format())
    + facet_wrap("~portname", scales="fixed", ncol=1)
)
display(p3)
../../_images/b8b1ff59f843379196fe75a537169391664924132a9e8d3ec0ad2aba34e1e631.png

20.4.6. Summary Statistics#

Finally, aggregate statistics (average values) per area for each period of interest were calculated.

  • Baseline: January 1st, 2022 – October 6th, 2023

  • Middle East Conflict: October 7th, 2023 - November 16th, 2023

  • Red Sea Crisis: November 17th, 2023 - February 19th, 2024

# Define the periods
df_chokepoints.loc[:, "period"] = ""
df_chokepoints.loc[
    (df_chokepoints.date >= start_reference_date) & (df_chokepoints.date < crisis_date), "period"
] = "Reference"
df_chokepoints.loc[
    (df_chokepoints.date >= conflict_date) & (df_chokepoints.date < crisis_date), "period"
] = "Middle East Conflict"
df_chokepoints.loc[(df_chokepoints.date >= crisis_date), "period"] = "Red Sea Crisis"
# Aggregate the data
df_agg = (
    df_chokepoints.loc[df_chokepoints.period != ""]
    .groupby(["portname", "period"])[["n_tanker", "n_cargo", "n_total", "capacity"]]
    .mean()
)

# Change order of rows
df_agg = df_agg.reindex(
    ["Reference", "Middle East Conflict", "Red Sea Crisis"], level=1
)

20.4.6.1. Table: Daily Average Values by Time Period#

table = df_agg.copy()
# Format column numbers to 2 decimal places only for first three columns
table.iloc[:, :3] = table.iloc[:, :3].apply(lambda x: round(x, 2))
# Format last column numbers to thousands
table.loc[:, "capacity"] = table.capacity.apply(lambda x: "{:,.0f}".format(x))

table.rename(
    columns={
        "n_tanker": "Tankers",
        "n_cargo": "Cargo",
        "n_total": "Total",
        "capacity": "Capacity",
    },
    inplace=True,
)
table.index.names = ["Area of Interest", "Period"]
display(table)
/tmp/ipykernel_36513/3454507845.py:5: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise in a future error of pandas. Value '['4,386,509' '4,832,362' '1,551,873' '3,848,942' '4,192,709' '6,372,572'
 '4,308,488' '4,773,776' '1,881,587']' has dtype incompatible with float64, please explicitly cast to a compatible dtype first.
Tankers Cargo Total Capacity
Area of Interest Period
Bab el-Mandeb Strait Reference 23.96 46.91 70.87 4,386,509
Middle East Conflict 25.95 51.54 77.49 4,832,362
Red Sea Crisis 10.67 20.94 31.62 1,551,873
Cape of Good Hope Reference 9.35 36.69 46.04 3,848,942
Middle East Conflict 10.03 39.13 49.15 4,192,709
Red Sea Crisis 16.85 60.11 76.95 6,372,572
Suez Canal Reference 23.13 46.98 70.12 4,308,488
Middle East Conflict 25.20 51.54 76.74 4,773,776
Red Sea Crisis 12.84 25.43 38.27 1,881,587

20.4.6.2. Table: Daily Average Values by Time Period, % Change from Baseline#

df_agg_copy = df_agg.copy()
res = []
for aoi in cpoi:
    df_sub = df_agg_copy.loc[(aoi), :].transpose().copy()
    df_sub.loc[:, "Middle East Conflict"] = (
        df_sub.loc[:, "Middle East Conflict"] - df_sub.loc[:, "Reference"]
    ) / df_sub.loc[:, "Reference"]
    df_sub.loc[:, "Red Sea Crisis"] = (
        df_sub.loc[:, "Red Sea Crisis"] - df_sub.loc[:, "Reference"]
    ) / df_sub.loc[:, "Reference"]
    df_sub2 = df_sub.transpose()
    df_sub2.drop("Reference", inplace=True)
    df_sub2.loc[:, "portname"] = aoi
    res.append(df_sub2)
df_agg_pct = pd.concat(res)
df_agg_pct.reset_index(inplace=True)
df_agg_pct.set_index(["portname", "period"], inplace=True)

# Format columns as pct
df_agg_pct = df_agg_pct.applymap(lambda x: "{:.2%}".format(x))
df_agg_pct.rename(
    columns={
        "n_tanker": "Tankers",
        "n_cargo": "Cargo",
        "n_total": "Total",
        "capacity": "Capacity",
    },
    inplace=True,
)
df_agg_pct.index.names = ["Area of Interest", "Period"]
display(df_agg_pct)
/tmp/ipykernel_36513/3184510119.py:5: FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.
Tankers Cargo Total Capacity
Area of Interest Period
Bab el-Mandeb Strait Middle East Conflict 8.29% 9.86% 9.33% 10.16%
Red Sea Crisis -55.46% -55.36% -55.39% -64.62%
Cape of Good Hope Middle East Conflict 7.22% 6.65% 6.77% 8.93%
Red Sea Crisis 80.14% 63.84% 67.15% 65.57%
Suez Canal Middle East Conflict 8.92% 9.70% 9.45% 10.80%
Red Sea Crisis -44.50% -45.87% -45.42% -56.33%