Estimating Activity based on Visits to Points of Interest through Mobility Data#

Estimating visits within or in the proximity to points of interest, such as residential, commercial and industrial zones can potentially shed light into the economic impacts brought by the earthquakes near Nurdağı, Türkiye. Similarly to Google Community Mobility Reports (now discontinued), the following working methodology seeks to capture the change in mobility (i.e., number of visits) within OpenStreetMap points of interest compared to a baseline (Jan-23).

Data#

In this section, we import from the data sources, available either publicly or via foundational-data. Please that data is a placeholder location where the data must be placed. When using non-public data, please carefully abide by your organization’s data privacy policy, data classification and the terms and conditions.

Area of Interest#

In this study, the area of interest is defined as the affected 11 provinces in Türkiye and Syria as shown below.

Make this Notebook Trusted to load map: File -> Trust Notebook

Points of Interest#

Using the Humanitarian OpenStreetMap’s Export Tool, the project team obtained OpenStreetMap landuse points of interest within a clipping boundary defined by the area of interest between Türkiye and Syria defined above.

POI = dask_geopandas.read_file(
    "../../data/external/export.hotosm.org/hotosm_worldbank_tur_points_of_interest_gpkg.zip!hotosm_worldbank_tur_points_of_interest.gpkg",
    npartitions=16,
)

To illustrate, we visualize below the points of interest tagged with landuse=industrial.

POI[POI[TAG].isin(["industrial"])].compute().explore(color="red")
Make this Notebook Trusted to load map: File -> Trust Notebook

Mobility Data#

Veraset Movement provides a panel of human mobility data, based on data collection of GPS-enabled devices location.

ddf = dd.read_parquet(
    f"../../data/final/panels/{PANEL}",
    # filters=[("country", "=", COUNTRY)],
)

Calculating date from the date and time the points were collected.

ddf["date"] = dd.to_datetime(ddf["datetime"].dt.date)

Methodology#

Similarly to Google Community Mobility Reports (now discontinued), the following working methodology seeks to capture the change in mobility (i.e., number of visits) within OpenStreetMap points of interest compared to a baseline (Jan-23). Note that the mobility data represents a subset of the total population in an area, specifically only users that turned on the Location Services device setting on their mobile device. This is not the total population density.

Using Dask-GeoPandas, we calculate the number of devices seen within the points of interest. Subsequently, we aggregate the device count spatiotemporally (by H3 and daily) and by landuse classication (e.g, residential)

gddf = dask_geopandas.from_dask_dataframe(
    ddf,
    geometry=dask_geopandas.points_from_xy(ddf, "longitude", "latitude"),
).set_crs("EPSG:4326")

We execute a spatial join (without H3) to aggregate the device count for each spatial and temporal bin. In this case, we use H3 resolution 6 and a daily aggregation.

result = (
    gddf.sjoin(POI, predicate="within")
    .groupby([TAG, "hex_id", "date"])["uid"]
    .nunique()
    .to_frame(name="count")
    .reset_index()
    .compute()
)

Finally, the results joined into administrative divisions.

result = result.merge(
    AOI[["hex_id", "ADM0_PCODE", "ADM1_PCODE", "ADM2_PCODE"]], on="hex_id"
).sort_values(["date"])

Results#

In this section, we visualize the daily number of devices detected within each of the following OSM landuse classification.

FEATURES = [
    "residential",
    "commercial",
    "industrial",
    "education",
    "farmland",
    "construction",
]
Hide code cell source
def plot_visits(data, title="Visits to Point of Interest"):
    """Plot number of visits to OSM points of interest"""

    p = figure(
        title=title,
        width=650,
        height=600,
        x_axis_label="Date",
        x_axis_type="datetime",
        y_axis_label="Number of devices",
        y_axis_type="log",
        y_range=(0.75, 10**6),
        tools="pan,wheel_zoom,box_zoom,reset,save,box_select",
    )
    p.add_layout(Legend(), "right")
    p.add_layout(
        Title(
            text=f"Number of devices detected within OSM '{TAG}' for each 3-day time window",
            text_font_size="12pt",
            text_font_style="italic",
        ),
        "above",
    )
    p.add_layout(
        Title(
            text=f"Source: Veraset Movement. Creation date: {datetime.today().strftime('%d %B %Y')}. Feedback: datalab@worldbank.org.",
            text_font_size="10pt",
            text_font_style="italic",
        ),
        "below",
    )

    # plot spans
    p.renderers.extend(
        [
            Span(
                location=datetime(2023, 2, 6),
                dimension="height",
                line_color="grey",
                line_width=2,
                line_dash=(4, 4),
            ),
            Span(
                location=datetime(2023, 2, 20),
                dimension="height",
                line_color="grey",
                line_width=2,
                line_dash=(4, 4),
            ),
        ]
    )

    # plot lines
    renderers = []
    for column, color in zip(FEATURES, COLORS):
        try:
            r = p.line(
                data.index[:-1],
                data[column][:-1],
                legend_label=column,
                line_color=color,
                line_width=2,
            )
            renderers.append(r)
        except:
            pass

    p.add_tools(
        HoverTool(
            tooltips="date: @x{%F}, devices: @y",
            formatters={"@x": "datetime"},
        )
    )

    p.legend.location = "bottom_left"
    p.legend.click_policy = "hide"
    p.title.text_font_size = "16pt"
    p.sizing_mode = "scale_both"
    return p

By aggregating the visits tally, we show a smoothed representation of how many users were detected within the total area for each 3-day time period.

data = (
    result.pivot_table("count", index=["date"], columns=[TAG], aggfunc="sum")
    .groupby(pd.Grouper(freq="3D"))
    .sum()
)
BokehJS 3.2.2 successfully loaded.

By first-level administrative divisions#

By aggregating the visits tally, we show a smoothed representation of how many users were detected within each first-level administrative division and for each 3-day time period.

Syria#

Hide code cell source
tabs = []

# iterate over first-level administrative divisions
for pcode in sorted(result[result["ADM0_PCODE"] == "SY"]["ADM1_PCODE"].unique()):
    data = (
        result[result["ADM1_PCODE"] == pcode]
        .pivot_table("count", index=["date"], columns=[TAG], aggfunc="sum")
        .groupby(pd.Grouper(freq="3D"))
        .sum()
    )
    tabs.append(
        TabPanel(
            child=plot_visits(
                data, title=f"Points of Interest Visits in {NAMES.get(pcode)} ({pcode})"
            ),
            title=pcode,
        )
    )

# plot panel
tabs = Tabs(tabs=tabs, sizing_mode="scale_both")
show(tabs)

Türkiye#

Hide code cell source
tabs = []

# iterate over first-level administrative divisions
for pcode in sorted(result[result["ADM0_PCODE"] == "TR"]["ADM1_PCODE"].unique()):
    data = (
        result[result["ADM1_PCODE"] == pcode]
        .pivot_table("count", index=["date"], columns=[TAG], aggfunc="sum")
        .groupby(pd.Grouper(freq="3D"))
        .sum()
    )
    tabs.append(
        TabPanel(
            child=plot_visits(
                data, title=f"Points of Interest Visits in {NAMES.get(pcode)} ({pcode})"
            ),
            title=pcode,
        )
    )

# plot panel
tabs = Tabs(tabs=tabs, sizing_mode="scale_both")
show(tabs)

Limitations#

Limitations of using mobility data to estimate economic activity#

Warning

  • Sample Bias: The sampled population is composed of GPS-enabled devices drawn out from a longituginal mobility data panel. It is important to emphasize the sampled population is obtained via convenience sampling and that the mobility data panel represents only a subset of the total population in an area at a time, specifically only users that turned on location tracking on their mobile device. Thus, derived metrics do not represent the total population density.

  • Incomplete Coverage: Mobility data is typically collected from sources such as mobile phone networks, GPS devices, or transportation systems. These sources may not be representative of the entire population or all economic activities, leading to sample bias and potentially inaccurate estimations.Not all individuals or businesses have access to devices or services that generate mobility data. This can result in incomplete coverage and potential underrepresentation of certain demographic groups or economic sectors.

  • Lack of Contextual Information: Mobility data primarily captures movement patterns and geolocation information. It may lack other crucial contextual information, such as transactional data, business types, or specific economic activities, which are essential for accurate estimation of economic activity.

Limitations of using points of interest database from OpenStreetMap#

Warning

  • Data Quality: OpenStreetMap (OSM) relies on user contributions, which can vary in quality and may not always be up-to-date. The accuracy and completeness of the points of interest (POI) database in OSM can vary significantly across different regions and categories.

  • Bias and Incompleteness: OSM data can be biased towards areas or categories that attract more active contributors. Certain regions or types of businesses may be underrepresented, leading to incomplete or skewed data, especially in less-populated or less-developed areas.

  • Lack of Standardization: OSM does not enforce strict data standards, resulting in variations in the format, categorization, and attribute information of POIs. This lack of standardization can make it challenging to compare and analyze data consistently across different regions or time periods.

  • Verification and Validation: While OSM relies on community-driven efforts for data verification, the absence of a centralized authority or rigorous validation process can introduce errors and inaccuracies. It may be difficult to ascertain the reliability of the information contained in the POI database.

  • Limited Contextual Information: The OSM database primarily focuses on geospatial information, such as coordinates and basic attributes of POIs. It may lack additional contextual information, such as detailed business descriptions, operational hours, or transactional data, which can limit its usefulness for comprehensive economic analysis.