Space2Stats#

The Space2Stats program is designed to provide academics, statisticians, and data scientists with easier access to regularly requested geospatial aggregate data. The primary deliverable is a database of geospatial aggregates at two official scales:

  1. Official World Bank boundaries at admin level 2

  2. A global database of H3 hexagons at level (resolution) 6 (~36 km² per hexagon)
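
The ~36 km² figure can be checked with a little arithmetic. The sketch below assumes the published H3 cell-count formula (2 + 120 × 7^res cells at resolution res) and an approximate Earth surface area of 510.1 million km². The function names are illustrative and are not part of Space2Stats.

```python
# Average H3 cell area per resolution, derived from the H3 cell-count formula.
EARTH_SURFACE_KM2 = 510_065_622  # approximate surface area of the Earth


def h3_cell_count(resolution: int) -> int:
    """Total number of H3 cells at a resolution (122 base cells at resolution 0)."""
    if not 0 <= resolution <= 15:
        raise ValueError("H3 resolutions range from 0 to 15")
    return 2 + 120 * 7 ** resolution


def h3_average_area_km2(resolution: int) -> float:
    """Average cell area in km², assuming cells tile the whole globe."""
    return EARTH_SURFACE_KM2 / h3_cell_count(resolution)


print(h3_cell_count(6))                  # 14117882
print(round(h3_average_area_km2(6), 1))  # 36.1
```

Individual hexagons deviate from this average depending on where they sit on the globe, so the value is best read as an order of magnitude.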

The Space2Stats program is funded by the World Bank’s Global Data Facility, a World Bank-hosted funding instrument for the world’s most critical data impact opportunities.


Core Datasets#

The World Bank’s GOST team is responsible for curating key geographic datasets to populate the Space2Stats database. Raw geographic data, typically available in raster format, is aggregated to a consistent spatial grid (H3) using zonal statistics.
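
As a rough illustration of what a zonal statistic is, the sketch below sums raster pixel values per zone, where each pixel has already been assigned a zone ID (for example, the H3 cell containing it). This is a simplified, pure-Python illustration and not the GOST processing pipeline; the function name and the zone IDs are hypothetical.

```python
from collections import defaultdict


def zonal_sum(pixel_values, pixel_zones, nodata=None):
    """Sum pixel values per zone.

    pixel_values: flat iterable of pixel values (None marks a missing pixel)
    pixel_zones:  zone ID for each pixel, in the same order
    nodata:       optional sentinel value to skip (e.g. -9999)
    """
    totals = defaultdict(float)
    for value, zone in zip(pixel_values, pixel_zones):
        if value is None or value == nodata:
            continue  # missing pixels do not contribute to the zone total
        totals[zone] += value
    return dict(totals)


values = [1.5, 2.0, None, 4.0, 0.5]
zones = ["hex_a", "hex_a", "hex_a", "hex_b", "hex_b"]  # hypothetical zone IDs
print(zonal_sum(values, zones))  # {'hex_a': 3.5, 'hex_b': 4.5}
```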

The database currently contains four datasets with global coverage:

Our STAC Metadata contains key information about each data source and the variables available in the database. Additional datasets under consideration are listed in our Annex.

API#

Space2Stats data is publicly available through an API built and hosted by Development Seed. The API supports querying the Space2Stats data by location. It is a FastAPI application that accesses the underlying PostgreSQL database. Documentation for the API endpoints and expected parameters can be found at https://space2stats.ds.io/docs.

Below are some examples of how to use the API endpoints from Python or R. The base URL is always https://space2stats.ds.io.

/fields#

Returns a list of all fields in the database.

import requests

BASE_URL = "https://space2stats.ds.io"
FIELDS_ENDPOINT = f"{BASE_URL}/fields"

response = requests.get(FIELDS_ENDPOINT)
if response.status_code != 200:
    raise Exception(f"Failed to get fields: {response.text}")

available_fields = response.json()
print("Available Fields:", available_fields)

The same request in R:

library(httr2)
base_url <- "https://space2stats.ds.io"

# Set up the request to fetch available fields
req <- request(base_url) |>
  req_url_path_append("fields")  # Append the correct endpoint

# Perform the request and get the response
resp <- req |> req_perform()

# Check the status code
if (resp_status(resp) != 200) {
  stop("Failed to get fields: ", resp_body_string(resp))
}

# Parse the response body as JSON
available_fields <- resp |> resp_body_json()

# Print the available fields in a simplified format
print("Available Fields:")
print(unlist(available_fields))

Sample response:

Available Fields: ['sum_pop_2020', 'ogc_fid', 'sum_pop_f_0_2020', 'sum_pop_f_10_2020', 'sum_pop_f_15_2020', 'sum_pop_f_1_2020', 'sum_pop_f_20_2020', 'sum_pop_f_25_2020', 'sum_pop_f_30_2020', 'sum_pop_f_35_2020', 'sum_pop_f_40_2020', 'sum_pop_f_45_2020', 'sum_pop_f_50_2020', 'sum_pop_f_55_2020', 'sum_pop_f_5_2020', 'sum_pop_f_60_2020', 'sum_pop_f_65_2020', 'sum_pop_f_70_2020', 'sum_pop_f_75_2020', 'sum_pop_f_80_2020', 'sum_pop_m_0_2020', 'sum_pop_m_10_2020', 'sum_pop_m_15_2020', 'sum_pop_m_1_2020', 'sum_pop_m_20_2020', 'sum_pop_m_25_2020', 'sum_pop_m_30_2020', 'sum_pop_m_35_2020', 'sum_pop_m_40_2020', 'sum_pop_m_45_2020', 'sum_pop_m_50_2020', 'sum_pop_m_55_2020', 'sum_pop_m_5_2020', 'sum_pop_m_60_2020', 'sum_pop_m_65_2020', 'sum_pop_m_70_2020', 'sum_pop_m_75_2020', 'sum_pop_m_80_2020', 'sum_pop_m_2020', 'sum_pop_f_2020']
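
Many of the field names in the sample response follow a sum_pop_<sex>_<age>_<year> pattern. The sketch below filters a field list on that pattern. It assumes that f and m denote female and male and that the number is the lower bound of an age band; neither is confirmed here, so check the STAC metadata before relying on it. The helper name is hypothetical.

```python
import re

# Pattern inferred from the sample response above (an assumption, not a documented schema).
FIELD_PATTERN = re.compile(r"^sum_pop_(?P<sex>[fm])_(?P<age>\d+)_(?P<year>\d{4})$")


def age_sex_fields(fields, sex=None):
    """Return (field, sex, age) tuples for age/sex fields, sorted by sex then age."""
    matches = []
    for name in fields:
        m = FIELD_PATTERN.match(name)
        if m and (sex is None or m["sex"] == sex):
            matches.append((name, m["sex"], int(m["age"])))
    return sorted(matches, key=lambda item: (item[1], item[2]))


# A subset of the sample response.
sample = [
    "sum_pop_2020", "ogc_fid", "sum_pop_f_0_2020", "sum_pop_f_10_2020",
    "sum_pop_f_1_2020", "sum_pop_m_5_2020", "sum_pop_f_2020",
]
print([name for name, _, _ in age_sex_fields(sample, sex="f")])
# ['sum_pop_f_0_2020', 'sum_pop_f_1_2020', 'sum_pop_f_10_2020']
```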

/summary#

The summary endpoint returns data at the H3 level for a specified area of interest (AOI). A summary JSON can be retrieved by including the following parameters in the request body:

  • aoi: The Area of Interest, either as a Feature or an instance of AoiModel.

  • spatial_join_method (Literal["touches", "centroid", "within"]): The method to use for performing the spatial join between the AOI and H3 cells.

    • "touches": Includes H3 cells that touch the AOI.

    • "centroid": Includes H3 cells where the centroid falls within the AOI.

    • "within": Includes H3 cells entirely within the AOI.

  • fields (List[str]): A list of field names to retrieve from the statistics table.

  • geometry (Optional[Literal["polygon", "point"]]): Specifies if the H3 geometries should be included in the response. It can be either "polygon" or "point". If None, geometries are not included.
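
The parameters above can be assembled and checked on the client before sending a request. The following is a minimal sketch; bbox_feature and build_summary_payload are hypothetical helpers, not part of the API or of any Space2Stats package, and the allowed values are copied from the parameter list above.

```python
from typing import Any, Dict, List, Optional

SPATIAL_JOIN_METHODS = ("touches", "centroid", "within")
GEOMETRY_TYPES = ("polygon", "point")


def bbox_feature(min_lon: float, min_lat: float, max_lon: float, max_lat: float) -> Dict[str, Any]:
    """Build a GeoJSON Feature for a bounding box (closed, counterclockwise ring)."""
    ring = [
        [min_lon, max_lat],
        [min_lon, min_lat],
        [max_lon, min_lat],
        [max_lon, max_lat],
        [min_lon, max_lat],  # repeat the first point to close the ring
    ]
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {},
    }


def build_summary_payload(
    aoi: Dict[str, Any],
    fields: List[str],
    spatial_join_method: str = "touches",
    geometry: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate the documented options and return a request body for /summary."""
    if spatial_join_method not in SPATIAL_JOIN_METHODS:
        raise ValueError(f"spatial_join_method must be one of {SPATIAL_JOIN_METHODS}")
    if geometry is not None and geometry not in GEOMETRY_TYPES:
        raise ValueError(f"geometry must be None or one of {GEOMETRY_TYPES}")
    payload: Dict[str, Any] = {
        "aoi": aoi,
        "spatial_join_method": spatial_join_method,
        "fields": list(fields),
    }
    if geometry is not None:
        payload["geometry"] = geometry  # omit the key to skip geometries
    return payload


payload = build_summary_payload(
    bbox_feature(33.78, -4.73, 41.95, 5.12), ["sum_pop_2020"], geometry="polygon"
)
print(sorted(payload))  # ['aoi', 'fields', 'geometry', 'spatial_join_method']
```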

import requests
import pandas as pd

BASE_URL = "https://space2stats.ds.io"
SUMMARY_ENDPOINT = f"{BASE_URL}/summary"

# Bounding box around Kenya
aoi = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        "coordinates": [
            [
                [33.78593974945852, 5.115816884114494],
                [33.78593974945852, -4.725410543134203],
                [41.94362577283266, -4.725410543134203],
                [41.94362577283266, 5.115816884114494],
                [33.78593974945852, 5.115816884114494],
            ]
        ],
    },
    "properties": {"name": "Updated AOI"},
}

# Define the Request Payload
request_payload = {
    "aoi": aoi,
    "spatial_join_method": "touches",
    "fields": ["sum_pop_2020"],
    "geometry": "polygon",
}

# Get Summary Data
response = requests.post(SUMMARY_ENDPOINT, json=request_payload)
if response.status_code != 200:
    raise Exception(f"Failed to get summary: {response.text}")

summary_data = response.json()
df = pd.DataFrame(summary_data)

df.head()

The same request in R:

library(httr2)
library(jsonlite)

base_url <- "https://space2stats.ds.io"

# Bounding box around Kenya
aoi <- list(
  type = "Feature",
  properties = NULL,  # Empty properties
  geometry = list(
    type = "Polygon",
    coordinates = list(
      list(
        c(33.78593974945852, 5.115816884114494),
        c(33.78593974945852, -4.725410543134203),
        c(41.94362577283266, -4.725410543134203),
        c(41.94362577283266, 5.115816884114494),
        c(33.78593974945852, 5.115816884114494)
      )
    )
  )
)

request_payload <- list(
  aoi = aoi,
  spatial_join_method = "centroid",
  fields = list("sum_pop_2020"),
  geometry = "polygon"
)

# Set up the base URL and create the request
req <- request(base_url) |>
  req_url_path_append("summary") |>
  req_body_json(request_payload)

# Perform the request and get the response
resp <- req |> req_perform()

# Turn response into a data frame
summary_data <- resp |> resp_body_string() |> fromJSON(flatten = TRUE)

head(summary_data)

The response is JSON containing the hexagon ID, the geometry (if requested), and the requested fields. Loaded into a data frame, it looks like this:

            hex_id                                           geometry  \
0  866a4a00fffffff  {"type":"Polygon","coordinates":[[[36.20299996...   
1  866a4a017ffffff  {"type":"Polygon","coordinates":[[[36.10071731...   
2  866a4a01fffffff  {"type":"Polygon","coordinates":[[[36.15684403...   
3  866a4a047ffffff  {"type":"Polygon","coordinates":[[[36.30522474...   
4  866a4a04fffffff  {"type":"Polygon","coordinates":[[[36.36131294...   

   sum_pop_2020  
0    476.538185  
1    676.912804  
2    347.182722  
3    380.988678  
4    285.943490  
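
In the output above, the geometry column appears to hold GeoJSON encoded as a string. Assuming that is the case, and that the response is a list of records keyed by hex_id, geometry, and the requested fields, the strings can be decoded as sketched below. The helper name is hypothetical, and the geometries in the example rows are placeholders rather than real hexagon outlines.

```python
import json


def parse_summary_rows(rows):
    """Return copies of the rows with any string 'geometry' decoded from GeoJSON."""
    parsed = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is left untouched
        geom = row.get("geometry")
        if isinstance(geom, str):
            row["geometry"] = json.loads(geom)
        parsed.append(row)
    return parsed


# Hex IDs and populations are taken from the sample output; geometries are placeholders.
rows = [
    {"hex_id": "866a4a00fffffff", "geometry": '{"type":"Point","coordinates":[0.0,0.0]}', "sum_pop_2020": 476.538185},
    {"hex_id": "866a4a017ffffff", "geometry": '{"type":"Point","coordinates":[0.0,0.0]}', "sum_pop_2020": 676.912804},
]
parsed = parse_summary_rows(rows)
print(parsed[0]["geometry"]["type"])                       # Point
print(round(sum(r["sum_pop_2020"] for r in parsed), 3))   # 1153.451
```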

/aggregate#

The aggregate endpoint is very similar to the summary endpoint, but it returns a single aggregate statistic for the entire area, computed with the function named in aggregation_type ("sum", "avg", "count", "max" or "min"). The request body is the same as for the summary endpoint, with the addition of the aggregation_type field.
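
To make the aggregation types concrete, the sketch below applies each of them to a list of per-hexagon values on the client side. It assumes the endpoint reduces the per-hexagon values with the named function; the server-side implementation is not described here, so treat this as an illustration of the semantics only. The function name is hypothetical.

```python
from statistics import mean

# Client-side equivalents of the documented aggregation types (an assumption).
AGGREGATORS = {"sum": sum, "avg": mean, "count": len, "max": max, "min": min}


def aggregate_values(values, aggregation_type):
    """Reduce per-hexagon values with the named aggregation function."""
    try:
        func = AGGREGATORS[aggregation_type]
    except KeyError:
        raise ValueError(f"aggregation_type must be one of {sorted(AGGREGATORS)}") from None
    return func(list(values))


# Per-hexagon sum_pop_2020 values from the summary sample output.
values = [476.538185, 676.912804, 347.182722, 380.988678, 285.943490]
print(aggregate_values(values, "count"))  # 5
print(aggregate_values(values, "max"))    # 676.912804
```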

This example uses an admin-1 province boundary from GeoBoundaries, retrieved as a geopandas GeoDataFrame (Python) or a simple feature (R).

import requests
import geopandas as gpd

BASE_URL = "https://space2stats.ds.io"
AGGREGATION_ENDPOINT = f"{BASE_URL}/aggregate"

def fetch_admin_boundaries(iso3: str, adm: str) -> gpd.GeoDataFrame:
    """Fetch administrative boundaries from GeoBoundaries API."""
    url = f"https://www.geoboundaries.org/api/current/gbOpen/{iso3}/{adm}/"
    res = requests.get(url)
    res.raise_for_status()  # fail early if the GeoBoundaries request did not succeed
    return gpd.read_file(res.json()["gjDownloadURL"])

ISO3 = "KEN"
ADM = "ADM1"
adm_boundaries = fetch_admin_boundaries(ISO3, ADM)
row = adm_boundaries.iloc[0]

request_payload = {
    "aoi": {
        "type": "Feature",
        "geometry": row.geometry.__geo_interface__,
        "properties": {},
    },
    "spatial_join_method": "touches",
    "fields": ["sum_pop_2020"],
    "aggregation_type": "sum",
}

response = requests.post(AGGREGATION_ENDPOINT, json=request_payload)

if response.status_code == 200:
    result = response.json()
    print(result)
else:
    print(response.content)

The same request in R:

library(httr2)
library(sf)
library(jsonlite)
library(geojsonsf)

base_url <- "https://space2stats.ds.io"

fetch_admin_boundaries <- function(iso3, adm) {
  # Fetch administrative boundaries from GeoBoundaries API
  url <- sprintf("https://www.geoboundaries.org/api/current/gbOpen/%s/%s/", iso3, adm)
  
  response <- request(url) |>
    req_perform() |>
    resp_body_json()
  
  sf::read_sf(response$gjDownloadURL)
}

ISO3 <- "KEN"
ADM <- "ADM1"
adm_boundaries <- fetch_admin_boundaries(ISO3, ADM)

# Select the first row from the adm_boundaries
row <- adm_boundaries[1, ]

sf_geoj <- sf_geojson(row, atomise = TRUE)
geojson_list <- fromJSON(sf_geoj[1])

# Create the request payload
request_payload <- list(
  aoi = geojson_list,
  spatial_join_method = "touches",
  fields = list("sum_pop_2020"),
  aggregation_type = "sum"
)

# Set up the base URL and create the request
req <- request(base_url) |>
  req_url_path_append("aggregate") |>
  req_body_json(request_payload)

# Perform the request and get the response
resp <- req |> req_perform()

# Turn response into a data frame
aggregate_data <- resp |> resp_body_string() |> fromJSON(flatten = TRUE)

print(aggregate_data)

The expected response is a JSON containing the requested aggregate statistic for the area:

{'sum_pop_2020': 1374175.833772784}
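
The example above queries a single province. To cover every boundary in a file, one option is to build one request body per feature. The sketch below does this for a GeoJSON FeatureCollection held as a plain dictionary; aggregate_payloads is a hypothetical helper, and the two features in the example are placeholders.

```python
def aggregate_payloads(feature_collection, fields, aggregation_type="sum",
                       spatial_join_method="touches"):
    """Yield one /aggregate request body per feature in a GeoJSON FeatureCollection."""
    for feature in feature_collection["features"]:
        yield {
            "aoi": {
                "type": "Feature",
                "geometry": feature["geometry"],
                "properties": {},
            },
            "spatial_join_method": spatial_join_method,
            "fields": list(fields),
            "aggregation_type": aggregation_type,
        }


def square(x, y):
    """Placeholder 1-degree square polygon with its lower-left corner at (x, y)."""
    ring = [[x, y], [x + 1, y], [x + 1, y + 1], [x, y + 1], [x, y]]
    return {"type": "Polygon", "coordinates": [ring]}


collection = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": square(36, 0), "properties": {"name": "A"}},
        {"type": "Feature", "geometry": square(37, 0), "properties": {"name": "B"}},
    ],
}
payloads = list(aggregate_payloads(collection, ["sum_pop_2020"]))
print(len(payloads))  # 2
```

Each payload can then be sent with requests.post(AGGREGATION_ENDPOINT, json=payload), as in the single-province example.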

Notebook Examples#

StatsTable Python Package#

In addition to the API, the StatsTable Python package provides the API’s underlying functionality as a set of functions (fields, summaries, and aggregate). The package lets researchers work with the Space2Stats database directly, run faster queries, and scale research applications.

Note

This package is still under development. Currently, users need to set credential parameters to connect to the database. Reach out to gost@worldbank.org to request credentials.

Setup and Installation#

Install the package via pip:

pip install "git+https://github.com/worldbank/DECAT_Space2Stats.git#subdirectory=space2stats_api/src"

Or, from a clone of the repository, using conda and poetry:

conda create -n s2s python=3.11
conda activate s2s
pip install poetry
cd space2stats_api/src
poetry install

Create a db.env file in the root directory with the following content:

PGHOST=
PGPORT=
PGDATABASE=
PGUSER=
PGPASSWORD=
PGTABLENAME=space2stats
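
The package may read db.env on its own. If the variables need to be loaded or inspected manually, a KEY=VALUE file like the one above can be parsed with the standard library. This is a minimal sketch; load_env_file is a hypothetical helper, not part of the space2stats package.

```python
import os


def load_env_file(path, override=False):
    """Parse KEY=VALUE lines from a file, export them to os.environ, and return them."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            # Skip blank lines, comments, and anything that is not KEY=VALUE.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    for key, value in values.items():
        # Existing environment variables win unless override is requested.
        if override or key not in os.environ:
            os.environ[key] = value
    return values
```

Calling load_env_file("db.env") before StatsTable.connect() would make the PG* variables available through the process environment, assuming the package reads its settings from there.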

Connect to the database and use package functions (e.g., fields, summaries, aggregate). Additional documentation for these is available here.

from space2stats import StatsTable

with StatsTable.connect() as stats_table:
    ...

Connection parameters may be explicitly set:

from space2stats import StatsTable

with StatsTable.connect(
    PGHOST="localhost",
    PGPORT="5432",
    PGUSER="postgres",
    PGPASSWORD="changeme",
    PGDATABASE="postgis",
    PGTABLENAME="space2stats",
) as stats_table:
    ...