15.5. Ookla
Ookla's Speedtest lets users measure the download speed, upload speed, and latency of their internet connection.
15.5.1. Ookla for Good
Ookla for Good™ is an initiative to provide data, analysis, and content to organizations that are seeking to improve people’s lives through internet accessibility. Ookla partners with organizations whose goals align with its own to provide unbiased information about the state of networks worldwide.
Ookla publishes an open dataset that can be downloaded from AWS S3, along with an interactive map for exploring it. The data is released quarterly, aggregated to quadkey tiles at zoom level 16, and covers fixed broadband and mobile networks worldwide.
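To get a feel for what "zoom level 16" means, the sketch below computes the approximate size of one tile. It assumes the standard Web Mercator tiling scheme (2^zoom tiles per side) and the WGS84 equatorial circumference; the function name `tile_size` is only illustrative.

```python
import math

# WGS84 equatorial circumference in metres (2 * pi * 6378137)
EARTH_CIRCUMFERENCE_M = 40_075_016.686


def tile_size(zoom, lat_deg=0.0):
    """Return (width in degrees of longitude, approximate width in metres).

    Assumes the standard Web Mercator tiling: 2**zoom tiles per side.
    The ground width shrinks with the cosine of the latitude.
    """
    n_tiles = 2 ** zoom
    width_deg = 360.0 / n_tiles
    width_m = EARTH_CIRCUMFERENCE_M * math.cos(math.radians(lat_deg)) / n_tiles
    return width_deg, width_m


# Tile size at the equator and at roughly the latitude of Rio Grande do Sul
print(tile_size(16))
print(tile_size(16, lat_deg=-30.0))
```

Under these assumptions a zoom-16 tile is a few hundred metres wide, which is fine for mapping but still coarse in time, since each tile holds a quarterly aggregate.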
Through the Development Data Partnership, we have access to data with finer spatial and temporal granularity, which helps us understand the impact of a crisis event and the recovery after it.
In this class we work with the May 2024 flood in Rio Grande do Sul, Brazil. The class is organized in two parts:
Review the open access dataset.
Show the methodology that the Data Lab has followed in past analyses of the Ookla dataset.
15.5.1.1. Ookla Open Dataset
Changes caused by a crisis may be harder to observe in this dataset because it is aggregated by quarter and by tile. Working with it nonetheless shows the potential of the more granular private dataset.
15.5.1.1.1. Download the data
To download the data, follow these steps:
Go to this link.
Install AWS CLI (if necessary) on your notebook.
!pip install awscli
Run this command to explore the datasets.
!aws s3 ls --no-sign-request s3://ookla-open-data/
Download the necessary file.
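The last step needs the object key of the file for a given quarter. The helper below is a minimal sketch that builds it. The key layout (`shapefiles/performance/type=.../year=.../quarter=.../`) is an assumption based on the file names used later in this class and on Ookla's public documentation, so confirm it against the `aws s3 ls` listing before relying on it. The function name `tile_key` is hypothetical.

```python
from datetime import date


def tile_key(service_type, year, quarter):
    """Build the assumed S3 key of a quarterly shapefile archive.

    The prefix layout is an assumption; verify it with `aws s3 ls`.
    """
    if service_type not in ("mobile", "fixed"):
        raise ValueError("service_type must be 'mobile' or 'fixed'")
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3 or 4")
    # Files are named after the first day of the quarter
    first_day = date(year, 3 * (quarter - 1) + 1, 1).isoformat()
    filename = f"{first_day}_performance_{service_type}_tiles.zip"
    return (
        f"shapefiles/performance/type={service_type}/"
        f"year={year}/quarter={quarter}/{filename}"
    )


# Key for the quarter that contains the May 2024 flood
print(tile_key("mobile", 2024, 2))
```

If the layout matches, the file can be copied with `!aws s3 cp --no-sign-request s3://ookla-open-data/<key> .`, replacing `<key>` with the string returned above.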
import geopandas as gpd
import pandas as pd
import mercantile
import folium
admin2 = gpd.read_file('../../data/mapping-monitoring-floods/gadm41_BRA_2.json')
# Filter boundaries in Rio Grande do Sul
rgds = admin2[admin2['NAME_1']=='RioGrandedoSul'].copy()
# Bounding box of the state: (minx, miny, maxx, maxy) in degrees
minx, miny, maxx, maxy = rgds.total_bounds
# Enumerate every zoom-16 tile that intersects the bounding box
zoom_level = 16
tiles = mercantile.tiles(minx, miny, maxx, maxy, zooms=zoom_level)
# A quadkey has one digit per zoom level, so these are 16 characters long
quadkeys = [mercantile.quadkey(tile) for tile in tiles]
# Local folder holding the downloaded tiles; adjust to your own environment
path = '/home/sol/gitrepo/alternative-data-for-crisis/data/internet-connectivity/'
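Enumerating every zoom-16 tile in a state-sized bounding box produces a very long list. One possible alternative, sketched here in plain Python, relies on the fact that the first z digits of a quadkey identify its ancestor tile at zoom z. It enumerates tiles at a coarser zoom and keeps the zoom-16 quadkeys that start with one of those prefixes. The bounding box is a rough, illustrative box around southern Brazil, not the actual state bounds, and all function names are hypothetical.

```python
import math

MAX_LAT = 85.05112878  # latitude limit of the Web Mercator projection


def lonlat_to_tile(lon, lat, zoom):
    """Return (x, y) of the Web Mercator tile containing a point."""
    n = 2 ** zoom
    lat = max(min(lat, MAX_LAT), -MAX_LAT)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_to_quadkey(x, y, zoom):
    """Interleave the bits of x and y into a base-4 string (Bing Maps scheme)."""
    digits = []
    for i in range(zoom, 0, -1):
        mask = 1 << (i - 1)
        digit = (1 if x & mask else 0) + (2 if y & mask else 0)
        digits.append(str(digit))
    return "".join(digits)


def lonlat_to_quadkey(lon, lat, zoom):
    return tile_to_quadkey(*lonlat_to_tile(lon, lat, zoom), zoom)


def bbox_prefixes(minx, miny, maxx, maxy, zoom):
    """Quadkeys of all tiles at `zoom` that intersect the bounding box."""
    x_min, y_max = lonlat_to_tile(minx, miny, zoom)  # tile y grows southward
    x_max, y_min = lonlat_to_tile(maxx, maxy, zoom)
    return {
        tile_to_quadkey(x, y, zoom)
        for x in range(x_min, x_max + 1)
        for y in range(y_min, y_max + 1)
    }


def filter_by_prefix(quadkeys, prefixes):
    """Keep the quadkeys whose ancestor tile is in `prefixes`."""
    prefix_len = len(next(iter(prefixes)))
    return [qk for qk in quadkeys if qk[:prefix_len] in prefixes]


# Illustrative bounding box (assumed values), covered with zoom-8 tiles
prefixes = bbox_prefixes(-58.0, -34.0, -49.0, -27.0, zoom=8)
sample = [lonlat_to_quadkey(-51.23, -30.03, 16), "0022133222312323"]
print(filter_by_prefix(sample, prefixes))
```

With pandas the same filter would look like `q1[q1['quadkey'].str[:8].isin(prefixes)]`. A coarser prefix zoom keeps more tiles outside the area of interest, so this is a trade-off between list size and precision. Like the bounding-box approach, it does not clip to the state border.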
15.5.1.1.2. Mobile Network
q1 = gpd.read_file(path + 'quarter=1/2024-01-01_performance_mobile_tiles.zip')
q2 = gpd.read_file(path + 'quarter=2/2024-04-01_performance_mobile_tiles.zip')
q3 = gpd.read_file(path + 'quarter=3/2024-07-01_performance_mobile_tiles.zip')
q1.head()
| | quadkey | avg_d_kbps | avg_u_kbps | avg_lat_ms | tests | devices | geometry |
|---|---|---|---|---|---|---|---|
| 0 | 0022133222312323 | 60189 | 18677 | 69 | 1 | 1 | POLYGON ((-160.02136 70.64359, -160.01587 70.6... |
| 1 | 0022133222330102 | 7878 | 15619 | 103 | 1 | 1 | POLYGON ((-160.02686 70.63995, -160.02136 70.6... |
| 2 | 0022332203013333 | 15985 | 4315 | 60 | 2 | 1 | POLYGON ((-162.60315 66.89775, -162.59766 66.8... |
| 3 | 0022332203102213 | 257763 | 36399 | 49 | 1 | 1 | POLYGON ((-162.58118 66.90206, -162.57568 66.9... |
| 4 | 0022332203102221 | 226464 | 52992 | 53 | 1 | 1 | POLYGON ((-162.59216 66.89991, -162.58667 66.8... |
q1_rgds = q1[q1['quadkey'].isin(quadkeys)].copy()
q2_rgds = q2[q2['quadkey'].isin(quadkeys)].copy()
q3_rgds = q3[q3['quadkey'].isin(quadkeys)].copy()
# All tiles observed in at least one of the three quarters
idx = list(set(q1_rgds.quadkey) | set(q2_rgds.quadkey) | set(q3_rgds.quadkey))
df = pd.DataFrame(index=idx)
# Device counts per quarter, aligned on quadkey
df['q1_count'] = q1_rgds.set_index('quadkey')['devices']
df['q2_count'] = q2_rgds.set_index('quadkey')['devices']
df['q3_count'] = q3_rgds.set_index('quadkey')['devices']
# Tiles absent from a quarter are treated as having zero devices
df.fillna(0, inplace=True)
df['change_3_2'] = df['q3_count'] - df['q2_count']
# One geometry per quadkey, taken from the first quarter that contains the tile
geoms = pd.concat([q1_rgds, q2_rgds, q3_rgds]).drop_duplicates('quadkey').set_index('quadkey')
df['geometry'] = geoms['geometry']
gdf = gpd.GeoDataFrame(df, geometry='geometry', crs='EPSG:4326')
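The alignment logic above can be checked on toy data. This sketch uses made-up tile names and counts (not Ookla data) to show how tiles missing from one quarter end up with a count of zero and how that affects the change column. It uses `pd.concat(axis=1)` as a compact alternative to building the index by hand.

```python
import pandas as pd

# Toy device counts per tile; names and values are invented for illustration
q2 = pd.Series({"tile_a": 5, "tile_b": 2}, name="q2_count")
q3 = pd.Series({"tile_b": 3, "tile_c": 1}, name="q3_count")

# Outer join on the tile index; a tile missing from a quarter becomes 0
counts = pd.concat([q2, q3], axis=1).fillna(0).astype(int)
counts["change_3_2"] = counts["q3_count"] - counts["q2_count"]

print(counts)
```

Here `tile_a` disappears in the later quarter and `tile_c` appears only there, so both show up as changes even though the underlying cause may simply be that no test was run in that tile. This is worth keeping in mind when reading the map.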
15.5.1.1.2.1. Change in number of observed devices between Quarter 2 (flood) and Quarter 3 - Mobile Network
gdf.explore(column = 'change_3_2')