Estimating Activity through Mobility Data#
Less movement typically means less economic activity. Understanding where and when population movement occurs can help inform public policy and disaster response, especially during crises.
Similar to the COVID-19 Community Mobility Reports, Facebook Population During Crisis and Mapbox Movement Data, we generate a series of crisis-relevant indicators, including baseline and ongoing densities (i.e., n_baseline and count), percent change and z-score. The indicators are calculated by tallying the device count in each tile for each time period. The devices are drawn from longitudinal mobility data.
It is important to emphasize the significant Limitations of such an approach. In particular, mobility data is primarily collected through convenience sampling and lacks the controlled methodology of randomized trials.
Data#
In this section, we import the data sources, which are available either publicly or via foundational-data.
Area of Interest#
In this step, we import the clipping boundary and the H3 tessellation defined by area(s) of interest below.
AOI = geopandas.read_file("../../data/final/tessellation/SYRTUR_tessellation.gpkg")
AOI[["geometry", "hex_id", "distance_bin", "distance"]].explore(
    column="distance_bin",
    cmap="seismic_r",
    style_kwds={"stroke": True, "fillOpacity": 0.05},
)

Fig. 20 Visualization of the area of interest centered at the earthquake’s epicenter. The distance (in km) to the epicenter is calculated for each H3 (resolution 6) tile.#
Mobility Data#
The WB Data Lab team obtained longitudinal human mobility data. The data consisted of anonymized timestamped geographical points generated by GPS-enabled devices, located in Türkiye and Syria and spanning the period shown below.
We use the mobility data to compute baseline and ongoing densities (i.e., n_baseline and count), percent change and z-score, by tallying the device count in each tile for each time period. For additional information, please see Data and Methodology.
ddf = dd.read_parquet(
    f"../../data/final/panels/{PANEL}",
    columns=["hex_id", "longitude", "latitude", "datetime", "uid", "month"],
)
Note
Due to the data volume and velocity (updated daily), the panel was computed from the raw mobility data on AWS. The resulting dataset named above is available in the project’s folder.
First, we calculate the cardinality,
len(ddf)
369077444
Now, we calculate the temporal extent,
print(
    "From",
    ddf["datetime"].min().compute().strftime("%b %d, %Y"),
    "to",
    ddf["datetime"].max().compute().strftime("%b %d, %Y"),
)
From Jul 01, 2022 to Nov 01, 2023
And visualize the mobility data panel’s spatial density.

Fig. 21 Visualization of the mobility data panel’s spatial distribution. The panel is composed of approximately 200 million points. Source: Veraset Movement.#
Methodology#
The methodology presented consists of generating a series of crisis-relevant metrics, including the baseline (sample) population density, percent change and z-score, based on the number of devices in an area at a time. The device count is determined for each tile and for each time period, as defined by the data standards and the spatial and temporal aggregations below. Similar approaches have been adopted, such as in []. The metrics may reveal movement trends in the sampled population that may indicate more or less activity.
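As a compact reference, the percent change and z-score can be written as below. This is a sketch inferred from the implementation later in this section rather than a definition given in the text, and the symbols are notation introduced here for illustration only.

```latex
% c_{h,t}          : number of distinct devices observed in tile h on day t
% \bar{b}_{h,w(t)} : mean baseline count of tile h on the weekday w(t) of day t
% \mu_h, \sigma_h  : mean and standard deviation of all baseline counts of tile h
\[
\text{percent change}_{h,t} = 100 \left( \frac{c_{h,t}}{\bar{b}_{h,w(t)}} - 1 \right),
\qquad
z_{h,t} = \frac{c_{h,t} - \mu_h}{\sigma_h}
\]
```

Note that, under this reading, the percent change compares against a weekday-specific baseline, whereas the z-score is standardized against the whole baseline period of the tile.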
Data Standards#
Population Sample#
The sampled population is composed of GPS-enabled devices drawn from longitudinal mobility data. It is important to emphasize that the sampled population is obtained via convenience sampling and that the mobility data panel represents only a subset of the total population in an area at a time, specifically only users who turned on location tracking on their mobile device. Thus, the derived metrics do not represent the total population density.
Spatial Aggregation#
The indicators are aggregated spatially on H3 resolution 6 tiles, each covering an area of approximately 36 km² on average.

Fig. 22 Illustration of H3 (resolution 6) tiles near Gaziantep, Türkiye. Gaziantep is among the areas most affected by the 2023 Türkiye–Syria Earthquake; the 2,200-year-old Gaziantep Castle was destroyed after the seismic episodes.#
Temporal Aggregation#
The indicators are aggregated daily on the localized date in the Europe/Istanbul (UTC+3) timezone.
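A minimal sketch of the localization step is shown below, using pandas on a toy sample. It assumes the raw timestamps are recorded in UTC, which the text does not state; the `pings` frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical sample: two pings, assumed to be recorded in UTC.
pings = pd.DataFrame(
    {
        "uid": ["a", "b"],
        "datetime": pd.to_datetime(
            ["2023-02-05 22:30:00", "2023-02-06 10:00:00"], utc=True
        ),
    }
)

# Convert to local wall-clock time, then keep only the calendar date
# (stored as a midnight timestamp without timezone information).
local = pings["datetime"].dt.tz_convert("Europe/Istanbul")
pings["date"] = local.dt.tz_localize(None).dt.normalize()

print(pings[["uid", "date"]])
```

The first ping falls on February 5 in UTC but on February 6 in local time, which is why the localization matters for daily tallies around midnight.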
Implementation#
Calculate ACTIVITY#
In this step, we calculate ACTIVITY as a density. In other words, we calculate the total number of devices detected within each area of interest, aggregated into a daily tally. Please note that a spatial join is used (i.e., whether a device was detected at least once within an area of interest), which is a simplistic approach compared to, for example, estimating stay locations and visits.
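ACTIVITY = (
    # Derive the date from the lambda argument rather than the outer ddf
    ddf.assign(date=lambda x: dd.to_datetime(x["datetime"].dt.date))
    .groupby(["hex_id", "date"])["uid"]
    .nunique()  # distinct devices per tile and day
    .to_frame("count")
    .reset_index()
    .compute()
)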
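To illustrate what the tally counts, the sketch below applies the same aggregation with pandas to a toy sample. The `pings` frame, tile names and device identifiers are hypothetical.

```python
import pandas as pd

# Hypothetical pings: device "a" is seen twice in tile "h1" on the same day.
pings = pd.DataFrame(
    {
        "hex_id": ["h1", "h1", "h1", "h2"],
        "date": pd.to_datetime(["2023-02-06"] * 4),
        "uid": ["a", "a", "b", "a"],
    }
)

# Count distinct devices per tile and day; repeated pings from the same
# device are not double counted.
activity = (
    pings.groupby(["hex_id", "date"])["uid"]
    .nunique()
    .to_frame("count")
    .reset_index()
)

print(activity)
```

A device that appears in several tiles on the same day (here "a") contributes to each of them, which is a consequence of the simple spatial join described above.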
Additionally, we create a weekday column that will come in handy later on when standardizing.
ACTIVITY["weekday"] = ACTIVITY["date"].dt.weekday
Calculate BASELINE#
In this step, we choose the period spanning July 1, 2022 to December 31, 2022 as the baseline. The baseline is calculated for each tile and for each time period, according to the spatial and temporal aggregations.
BASELINE = ACTIVITY[ACTIVITY["date"].between("2022-07-01", "2022-12-31")]
In fact, the result is seven different baselines for each tile: we calculate the mean and standard deviation of the device density for each tile and for each day of the week (Mon-Sun).
MEAN = BASELINE.groupby(["hex_id", "weekday"]).agg({"count": ["mean", "std"]})
MEAN.columns = MEAN.columns.map(".".join)  # flatten to "count.mean", "count.std"
Taking a sneak peek,
MEAN[MEAN.index.get_level_values("hex_id").isin(["862da898fffffff"])]
hex_id | weekday | count.mean | count.std |
---|---|---|---|
862da898fffffff | 0 | 8867.653846 | 9441.786543 |
862da898fffffff | 1 | 8641.500000 | 9341.744035 |
862da898fffffff | 2 | 8100.192308 | 8794.041446 |
862da898fffffff | 3 | 8858.307692 | 8130.096180 |
862da898fffffff | 4 | 10231.888889 | 10199.128712 |
862da898fffffff | 5 | 10072.407407 | 10230.396328 |
862da898fffffff | 6 | 9946.384615 | 9898.669483 |
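The weekday baseline can be reproduced on a toy sample as sketched below. The counts are hypothetical, and flattening the column names into `count.mean` and `count.std` is an assumption based on the column labels displayed in the table above.

```python
import pandas as pd

# Hypothetical daily counts for one tile: two Mondays and two Tuesdays.
activity = pd.DataFrame(
    {
        "hex_id": ["h1"] * 4,
        "date": pd.to_datetime(
            ["2022-07-04", "2022-07-05", "2022-07-11", "2022-07-12"]
        ),
        "count": [10, 20, 14, 24],
    }
)
activity["weekday"] = activity["date"].dt.weekday  # Monday=0 ... Sunday=6

# Restrict to the baseline period, then aggregate per tile and weekday.
baseline = activity[activity["date"].between("2022-07-01", "2022-12-31")]
mean = baseline.groupby(["hex_id", "weekday"]).agg({"count": ["mean", "std"]})

# Flatten the two-level column index into single labels.
mean.columns = mean.columns.map(".".join)

print(mean)
```

Each tile ends up with one row per weekday, so a later merge on `hex_id` and `weekday` compares Mondays with Mondays, Tuesdays with Tuesdays, and so on.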
Calculate Z-Score and Percent Change#
A z-score is a statistical measure that tells how far above or below the mean (average) of a group of data points a particular data point lies, in terms of standard deviations. A z-score is particularly useful to standardize and make meaningful comparisons between different sets of data. By examining the z-scores, one can assess how far a data point deviates from the mean, taking the variance into account. On the other hand, a percent change may be easier to interpret, but does not provide this information.
Creating a StandardScaler for each hex_id,
from sklearn.preprocessing import StandardScaler

scalers = {}
for hex_id in BASELINE["hex_id"].unique():
    # Fit one scaler per tile on its baseline counts
    scaler = StandardScaler()
    scaler.fit(BASELINE[BASELINE["hex_id"] == hex_id][["count"]])
    scalers[hex_id] = scaler
Joining with the area of interest (AOI),
ACTIVITY = ACTIVITY.merge(AOI, how="left", on="hex_id").drop(["geometry"], axis=1)
Finally, merging with the (mean) baseline,
ACTIVITY = pd.merge(ACTIVITY, MEAN, on=["hex_id", "weekday"], how="left")
Calculating the z_score for each tile,
for hex_id, scaler in scalers.items():
    try:
        predicate = ACTIVITY["hex_id"] == hex_id
        score = scaler.transform(ACTIVITY[predicate][["count"]])
        ACTIVITY.loc[predicate, "z_score"] = score.ravel()
    except ValueError:
        # Skip tiles that cannot be transformed (e.g., no matching rows)
        pass
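For comparison, the same per-tile standardization can be sketched without a loop. This is an alternative formulation, not the method used above; it assumes the population standard deviation (ddof=0), which is the convention StandardScaler follows, and all data and the names `stats` and `scored` are hypothetical.

```python
import pandas as pd

# Hypothetical baseline and activity counts; tile "h3" has no baseline.
baseline = pd.DataFrame(
    {"hex_id": ["h1", "h1", "h1", "h2", "h2"], "count": [8, 10, 12, 3, 5]}
)
activity = pd.DataFrame({"hex_id": ["h1", "h2", "h3"], "count": [14, 4, 7]})

# Per-tile mean and population standard deviation of the baseline counts.
stats = (
    baseline.groupby("hex_id")["count"]
    .agg(mu="mean", sigma=lambda s: s.std(ddof=0))
    .reset_index()
)

# Attach the baseline statistics to each observation and standardize.
scored = activity.merge(stats, how="left", on="hex_id")
scored["z_score"] = (scored["count"] - scored["mu"]) / scored["sigma"]

print(scored)
```

Tiles absent from the baseline keep a missing z-score. One difference to keep in mind is that StandardScaler replaces a zero standard deviation with 1, whereas the plain division in this sketch would not, so constant baselines would need separate handling.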
Additionally, we calculate the percent change. While the z-score offers more robustness to outliers and numerical stability, the percent change can be used when interpretability is most important. Thus, preparing columns,
ACTIVITY["n_baseline"] = ACTIVITY["count.mean"]
ACTIVITY["n_difference"] = ACTIVITY["count"] - ACTIVITY["n_baseline"]
ACTIVITY["percent_change"] = 100 * (ACTIVITY["count"] / (ACTIVITY["n_baseline"]) - 1)
Taking a sneak peek,
(index) | hex_id | date | count | n_baseline | n_difference | percent_change | z_score | ADM0_PCODE | ADM1_PCODE | ADM2_PCODE |
---|---|---|---|---|---|---|---|---|---|---|
729784 | 862db3bafffffff | 2023-10-31 | 1 | 13.923077 | -12.923077 | -92.817680 | -3.693753 | SY | SY12 | SY1200 |
729783 | 862db3bafffffff | 2023-10-30 | 1 | 13.846154 | -12.846154 | -92.777778 | -3.693753 | SY | SY12 | SY1200 |
729782 | 862db3bafffffff | 2023-10-29 | 1 | 11.884615 | -10.884615 | -91.585761 | -3.693753 | SY | SY12 | SY1200 |
729781 | 862db3bafffffff | 2023-10-26 | 1 | 14.440000 | -13.440000 | -93.074792 | -3.693753 | SY | SY12 | SY1200 |
729780 | 862db3bafffffff | 2023-10-25 | 1 | 11.961538 | -10.961538 | -91.639871 | -3.693753 | SY | SY12 | SY1200 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
33712 | 862c14807ffffff | 2022-07-14 | 11 | 6.166667 | 4.833333 | 78.378378 | 0.908132 | SY | SY08 | SY0803 |
25394 | 862c14807ffffff | 2022-07-05 | 11 | 6.500000 | 4.500000 | 69.230769 | 0.908132 | SY | SY08 | SY0803 |
25393 | 862c14807ffffff | 2022-07-04 | 19 | 7.750000 | 11.250000 | 145.161290 | 2.617558 | SY | SY08 | SY0803 |
25392 | 862c14807ffffff | 2022-07-03 | 11 | 8.272727 | 2.727273 | 32.967033 | 0.908132 | SY | SY08 | SY0803 |
23026 | 862c14807ffffff | 2022-07-01 | 11 | 9.500000 | 1.500000 | 15.789474 | 0.908132 | SY | SY08 | SY0803 |
236398 rows × 10 columns
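Two rows of the table above can be checked by hand with the short sketch below. The function mirrors the percent change expression used in the implementation; the guard for a zero baseline is an addition made here for illustration and is not part of the original code.

```python
def percent_change(count: float, n_baseline: float) -> float:
    """Relative difference between a count and its baseline, in percent."""
    if n_baseline == 0:
        # Hypothetical guard: the ratio is undefined without baseline devices.
        return float("nan")
    return 100 * (count / n_baseline - 1)


# Row for 2023-10-31: count=1, n_baseline=13.923077
first = percent_change(1, 13.923077)
# Row for 2022-07-04: count=19, n_baseline=7.75
second = percent_change(19, 7.75)

print(first, second)
```

Both values agree with the percent_change column to the displayed precision, up to the rounding of n_baseline in the table.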
Findings#
Less movement typically means less economic activity. A potential use of movement “activity” indicators is to examine their evolution over time and their correlation with other features. We present the results (i.e., percent_change and z_score) for both first-level administrative divisions (governorates and provinces) and selected areas.
Percent Change#
Percent Change (ADM 1)#
In this section, we visualize the mean percent_change aggregated for each first-level administrative division.
p = figure(
    title="Activity Trends (Percent Change)",
    width=800,
    height=700,
    x_axis_label="Date",
    x_axis_type="datetime",
    y_axis_label="Percent change (based on device density)",
    tools="pan,wheel_zoom,box_zoom,reset,save,box_select",
)
# p.y_range = Range1d(-250, 2000, bounds=(0, None))
p.add_layout(
    Title(
        text=f"",
        text_font_size="12pt",
        text_font_style="italic",
    ),
    "above",
)
p.add_layout(
    Title(
        text=f"Percent change in device density for each time window and each first-level administrative division",
        text_font_size="12pt",
        text_font_style="italic",
    ),
    "above",
)
p.add_layout(
    Title(
        text=f"Source: Veraset Movement. Creation date: {datetime.today().strftime('%d %B %Y')}. Feedback: datalab@worldbank.org.",
        text_font_size="10pt",
        text_font_style="italic",
    ),
    "below",
)
p.add_layout(Legend(), "right")
# Dashed vertical line marking February 6, 2023
p.renderers.extend(
    [
        Span(
            location=datetime(2023, 2, 6),
            dimension="height",
            line_color="grey",
            line_width=2,
            line_dash=(4, 4),
        ),
    ]
)
p.add_tools(
    HoverTool(
        tooltips="Date: @x{%F}, Percent Change: @y{00.0}%",
        formatters={"@x": "datetime"},
    )
)
# One line per column of the pivoted data
renderers = []
for column, color in zip(data.columns, COLORS):
    try:
        r = p.line(
            data.index,
            data[column],
            legend_label=NAMES.get(column),
            line_color=color,
            line_width=2,
        )
        renderers.append(r)
    except:
        pass
p.legend.location = "bottom_left"
p.legend.click_policy = "hide"
p.title.text_font_size = "16pt"
p.sizing_mode = "scale_both"
Percent Change (Selected Areas)#
In this section, we visualize the mean percent_change for each selected area, for example, Aleppo, Syria.
AREAS = ["Aleppo, SY", "Idlib, SY", "Sahinbey, TR", "Sehitkamil, TR"]
date | Aleppo, SY | Idlib, SY | Sahinbey, TR | Sehitkamil, TR |
---|---|---|---|---|
2022-07-01 | -44.266469 | 100.041040 | 93.918238 | 171.873760 |
2022-07-02 | -6.688375 | -13.880445 | 85.930355 | 128.654067 |
2022-07-03 | -60.996682 | 70.546710 | 91.653900 | 143.649685 |
2022-07-04 | -12.510691 | 138.147037 | 136.855498 | 138.633144 |
2022-07-05 | -19.370003 | 36.786124 | 100.424886 | 111.776318 |
... | ... | ... | ... | ... |
2023-10-28 | -91.808989 | -87.043350 | -73.099934 | 86.979500 |
2023-10-29 | -92.734049 | -94.128882 | -87.115446 | 102.765045 |
2023-10-30 | -93.881265 | -90.154318 | -82.134036 | 45.738152 |
2023-10-31 | -93.883027 | -95.050423 | -86.393535 | -28.208391 |
2023-11-01 | -97.823924 | -98.849558 | -98.516320 | -98.379058 |
477 rows × 4 columns
And we visualize the time series,
Z-Score#
Z-Score (ADM 1)#
In this section, we visualize the mean z_score aggregated for each first-level administrative division.

Fig. 23 The map above shows the z-score for each H3 tile and each time period. The z-score shows the number of standard deviations that the data point diverges from the mean; in other words, whether the change in population for that area is statistically different from the baseline period. Click to see it on Foursquare Studio#
Now, we visualize the z_score indicator, averaged for each first-level administrative division.
data = ACTIVITY.groupby(["date", "ADM1_PCODE"])["z_score"].mean().to_frame()
data = data.pivot_table(values=["z_score"], index=["date"], columns=["ADM1_PCODE"])
data.columns = [x[1] for x in data.columns]
data = data.groupby(pd.Grouper(freq=FREQ)).mean()
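The reshaping above can be illustrated on a toy sample. The sketch below uses a compact variant (`unstack` followed by `resample`) instead of the exact calls shown; the value of `FREQ` is an assumption, since the text does not show how it is set, and the z-scores are hypothetical.

```python
import pandas as pd

FREQ = "W"  # hypothetical value: weekly bins ending on Sunday

# Hypothetical tile-level z-scores for two governorates over two days.
activity = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2023-02-06", "2023-02-06", "2023-02-07", "2023-02-07"]
        ),
        "ADM1_PCODE": ["SY08", "SY12", "SY08", "SY12"],
        "z_score": [-1.0, 0.5, -3.0, 1.5],
    }
)

# Mean z-score per day and division, with one column per division.
data = (
    activity.groupby(["date", "ADM1_PCODE"])["z_score"]
    .mean()
    .unstack("ADM1_PCODE")
)

# Average the daily values within each time window.
data = data.resample(FREQ).mean()

print(data)
```

The result is a wide table indexed by time window, which is the shape the plotting loop expects when it iterates over `data.columns`.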
Limitations#
The methodology presented is an exploratory analysis pilot aiming to shed light on the economic situation in Syria and Türkiye leveraging alternative data, especially when we are confronted with the absence of traditional data and methods. Mobility data, like any other type of data, comes with limitations and underlying assumptions that should be considered when interpreting and using the data.
Caution
Here are some common limitations and assumptions associated with mobility data:
Limitations:
Sampling Bias: Mobility data is primarily collected through convenience sampling and lacks the controlled methodology of randomized trials.
Selection Bias: Users who opt to share their mobility data may not be representative of the entire population, potentially introducing selection bias.
Privacy Concerns: The collection of mobility data may raise privacy issues, as it can sometimes be linked to individuals, potentially violating their privacy.
Data Quality: Data quality can vary, and errors, inaccuracies, or missing data points may be present, which can affect the reliability of analyses.
Temporal and Spatial Resolution: Mobility data may not capture all movements or may lack fine-grained temporal or spatial resolution, limiting its utility for some applications.
Lack of Contextual Information: Mobility data primarily captures movement patterns and geolocation information. It may lack other crucial contextual information, such as transactional data, business types, or specific economic activities, which are essential for accurate estimation of economic activity.
Private Intent Data: The methodology relies on private intent data. In other words, the input data, i.e. the mobility data, was not produced or collected to analyze the population of interest or address the research question as its primary objective but it was repurposed for the public good. The benefits and caveats when using private intent data have been discussed extensively in the World Development Report 2021 [].
Assumptions:
Homogeneity: Mobility data often assumes that the mobility patterns of individuals or groups are relatively consistent over time and space, which may not always be the case.
Consistency in Data Sources: Mobility data may assume consistency in data sources and methodologies across different regions or datasets, which may not always hold true.
User Behavior: Assumptions about user behavior, such as the purpose of travel or preferred routes, are often made when interpreting mobility data.
Implicit Data Interpretation: Interpretation of mobility data often assumes that certain behaviors or patterns observed in the data have a specific meaning, which may not always be accurate without additional context.
App Usage as a Proxy: In some cases, the usage of specific apps or devices serves as a proxy for mobility, on the assumption that it accurately represents individual movements.
It’s important to be aware of these limitations and assumptions when working with mobility data and to consider their potential impact on the conclusions drawn from the data. Additionally, researchers and analysts should explore ways to address these limitations and validate assumptions when conducting mobility data analyses.
See also
For further discussion on limitations and assumptions, please check out the Development Data Partnership Documentation on Mobility Data.