Geospatial Enhancement of Surveys

This repository contains code and documentation for a collection of activities whose overarching goal is to add geospatial variables to locations from household surveys. For example, given a completed household survey in a country, we can generate anonymized household-level coordinates (or enumeration-area-level coordinates, which will be centroids) and link them with variables derived from geospatial data, such as precipitation and vegetation indices, which are otherwise not available in the survey itself. Geo-enhancement is thus a way to enrich survey data with geospatial variables so that analysts can conduct more extensive analysis. The repository provides the following:

Survey geo-enhancement process. In-depth information about how the geovariables are generated, the rationale for the selection of data sources, and other design decisions. We also document best practices for this type of data processing.
Data generation for specific surveys. Documentation for each survey that has gone through geo-enhancement, including which geovariables were generated and where to find the output geovariables.
Spatial anonymization. As described in the survey geo-enhancement process, the survey coordinates need to be anonymized before they are used in geo-enhancement and before the associated geovariables are publicly disseminated. The work covered in this repository therefore includes the development of tools for robust spatial anonymization. A Python package for this purpose, spatial-anonymization (worldbank/Spatial-Anonymization), is being developed. Information about this package, other tools for spatial anonymization, and best practices will also be provided.
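
To make the idea of linking survey locations to geospatial variables concrete, the following is a minimal sketch, not the processing code used in this repository. It assumes a toy regular grid of made-up precipitation values and two made-up anonymized enumeration-area centroids; the `Grid` class, the field names and all numbers are hypothetical. A real workflow would read the grid from a raster file with a geospatial library.

```python
from dataclasses import dataclass


@dataclass
class Grid:
    """A toy north-up raster: values[row][col], with row 0 at the northern edge."""

    west: float       # longitude of the left edge (degrees)
    north: float      # latitude of the top edge (degrees)
    cell_size: float  # cell width and height (degrees)
    values: list      # list of rows, each a list of cell values

    def value_at(self, lon, lat):
        """Return the value of the cell containing (lon, lat), or None if outside."""
        col = int((lon - self.west) // self.cell_size)
        row = int((self.north - lat) // self.cell_size)
        if 0 <= row < len(self.values) and 0 <= col < len(self.values[row]):
            return self.values[row][col]
        return None


# Hypothetical precipitation values (mm) on a 2 x 2 grid of 1-degree cells.
precip = Grid(west=34.0, north=-5.0, cell_size=1.0,
              values=[[610.0, 580.0], [720.0, 690.0]])

# Hypothetical anonymized enumeration-area centroids.
locations = [
    {"ea_id": "ea_001", "lon": 34.4, "lat": -5.3},
    {"ea_id": "ea_002", "lon": 35.7, "lat": -6.8},
]

# Attach the geovariable to each survey location.
enhanced = [
    {**loc, "precip_mm": precip.value_at(loc["lon"], loc["lat"])}
    for loc in locations
]

for record in enhanced:
    print(record)
```

Each output record keeps the survey identifier and coordinates and gains one new column, which is the essence of geo-enhancement: the survey table is extended with values looked up from an external geospatial source.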

Repository setup

The repository has documentation about the geo-enhancement process as a whole, as well as documentation about data generation for surveys in specific countries. Accordingly, all the folders mentioned below have subfolders for the survey projects of each country.

Survey projects naming convention

A survey project is any survey from a country that is being worked on; each has a folder in this repository. The naming style is {ISO}_{survey_name}_{survey_wave}, and each of these components is described below.

ISO - The country's three-letter ISO code.
Survey name - The name of the survey, abbreviated according to the country's convention. For example, the 2020 Integrated Household Survey is abbreviated as IHS.
Survey wave - For longitudinal surveys, waves refer to follow-ups of previous surveys. The wave is written as w{number}; for example, w3 means wave 3. The survey wave can be determined from the years in which the survey was conducted.

As an example, we can have the following names:

TZA_nps_w3 - Tanzania, National Panel Survey (2012-2013)
TZA_nps_w4 - Tanzania, National Panel Survey (2015-2016)
ETH_ess_w5 - Ethiopia Socioeconomic Survey (2020-2021)
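
The convention can be checked programmatically. The sketch below is an illustration rather than code from this repository: the regular expression is inferred from the three example names (uppercase ISO code, lowercase survey abbreviation, `w` followed by a number), and `SurveyProject` and `parse_project_name` are hypothetical names.

```python
import re
from typing import NamedTuple

# Assumed pattern, inferred from the example names: ISO code in uppercase,
# survey abbreviation in lowercase, and the wave written as w{number}.
PROJECT_NAME = re.compile(
    r"^(?P<iso>[A-Z]{3})_(?P<survey>[a-z0-9]+)_w(?P<wave>[0-9]+)$"
)


class SurveyProject(NamedTuple):
    iso: str     # three-letter ISO country code, e.g. "TZA"
    survey: str  # abbreviated survey name, e.g. "nps"
    wave: int    # survey wave number, e.g. 3


def parse_project_name(name: str) -> SurveyProject:
    """Split a survey project folder name into its three components."""
    match = PROJECT_NAME.match(name)
    if match is None:
        raise ValueError(f"not a valid survey project name: {name!r}")
    return SurveyProject(match["iso"], match["survey"], int(match["wave"]))


for folder in ("TZA_nps_w3", "TZA_nps_w4", "ETH_ess_w5"):
    print(parse_project_name(folder))
```

A check like this could be run over the subfolders of each main folder to catch misnamed projects early. If a survey uses an abbreviation with uppercase letters or has no wave component, the assumed pattern would need to be relaxed.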

Main folders

Within each of the main folders described below, any content specific to a country or survey project is located in a subfolder named as described above. A description of what each folder contains is provided below.

docs/

This project uses Jupyter notebooks and other markup documents to share documentation. This folder is used to organize the files for publishing to the project's documentation site. The folder also contains other miscellaneous documents. For other documentation about this project, please visit the Teams Channel for the project.

data/

This is a placeholder folder for data, as the actual data is not uploaded here. There are two main categories of data: the input data (in this case, survey data) and the output data (in this case, geospatial variables attached to the survey coordinates). To learn more (for example, where the data is kept and its metadata), please refer to the survey-specific documentation available through the project documentation site, or go through the processing scripts and notebooks for that particular survey.

src/

All the source code is available in this folder.

notebooks/

This folder contains both Jupyter notebooks and R scripts. Again, the scripts and notebooks live in survey-specific subfolders. In addition to the notebooks and R scripts for data processing and analysis, each subfolder also contains a data cover page, stored as a README.md file for convenience and styling purposes. This document provides all the information needed to understand how the geospatial enhancement process was done for that particular survey, so it is a must-read for anyone attempting to replicate the process or wanting an in-depth understanding of how the geovariables were generated.
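
The layout described above can be summarized in a short sketch that resolves where the content for one survey project would live. This is an illustration only: `project_paths` is a hypothetical helper, and it assumes that every main folder holds a subfolder named after the survey project and that the cover page sits at the top of the project's notebooks subfolder.

```python
from pathlib import Path

# The four main folders described in this section.
MAIN_FOLDERS = ("docs", "data", "src", "notebooks")


def project_paths(repo_root, project):
    """Map each main folder to the subfolder for one survey project."""
    root = Path(repo_root)
    paths = {folder: root / folder / project for folder in MAIN_FOLDERS}
    # Assumption: the data cover page is the README.md inside notebooks/<project>/.
    paths["cover_page"] = paths["notebooks"] / "README.md"
    return paths


for label, path in project_paths(".", "TZA_nps_w4").items():
    print(f"{label:<10} {path.as_posix()}")
```

Because the data folder is only a placeholder, the resolved data path may not exist in a fresh clone; the helper only builds paths and does not check the filesystem.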

Available surveys

For a quick review of available geovariables in surveys, see links below.

Usage

As this is not a typical Python package, we cannot prescribe how to use this repository, but here are some ways we believe one can use it:

Survey geovariables data users. Find links to the latest data on geovariables for surveys of interest. For example, if you are working on Tanzania and need to see geovariables for the National Panel Survey (2012-2013), refer to the documentation site and follow the links to the survey documentation.
Survey specialists, economists and others. If you are interested in a more in-depth understanding of the anonymization process for the survey coordinates, the exact data sources used to generate the geovariables, and other deeper technical information, you can use this repo to review our methodology. If you would like to replicate any of the processing shared in this repository, please see the instructions below.

In order to fully follow the documentation in this repository, we recommend you start with the documentation site.

Installation

This repository is not designed to be pip-installable. However, there are two ways you can use the tools and code associated with this repository.

Perform spatial anonymization on your own data. Visit the spatial-anonymization Python package developed as part of this project and follow the instructions to install and use the package.
Re-use notebooks in this repository. If you want to replicate some of the geovariable data generation and analysis shown in this repository, feel free to clone this repository and re-use the code.