3. Establishing the Analysis Zone#

3.1. Summary#

The first step in this type of analysis is to define the spatio-temporal extent of the study region. This extent serves as the baseline for the analysis: all data are layered onto it and results are extracted for it. It also allows others to reproduce the work. In other words, it gives stakeholders a common language and avoids misunderstandings.

This notebook teaches best practices for defining the analysis zone.

3.2. Learning Objectives#

3.2.1. Overall goals#

The main goal of this class is to teach students to define the zone of interest for a project.

3.2.2. Specific goals#

By the end of this notebook, you should:

  1. Know how to define the area of interest for a project.

  2. Understand commonly used boundaries:

    • Administrative levels.

    • H3 geometries.

  3. Be acquainted with commonly used sources for downloading base maps.

3.3. How to define the Area of Interest?#

Most of the time, the zone definition is based on pre-established administrative boundaries, such as country, state/province, or city. In other cases, the client might have its own definition and provide a verbal description or a georeference for it.

An important consideration when working with administrative boundaries is that they can change over time, and this must be acknowledged, especially in longitudinal analyses. For example, suppose the task is to calculate population density across the European Union. We have a dataset with the number of people per year and a georeferenced vector layer with the extent of the region. To calculate population density, we divide the population by the region’s area. However, will the area be the same across years? How should we account for that?
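The question above can be made concrete with a minimal sketch. All figures below are hypothetical and chosen only for illustration, and the names `region_by_year` and `density` are not part of any dataset or library. The sketch compares using the area valid in each year with reusing a single fixed area for every year.

```python
# Hypothetical figures for illustration only; they are not real statistics.
# Each year carries its own area because the boundary itself may change.
region_by_year = {
    2010: {"population": 1_000_000, "area_km2": 5_000.0},
    2021: {"population": 1_100_000, "area_km2": 4_000.0},  # boundary changed
}


def density(population: int, area_km2: float) -> float:
    """Return people per square kilometre."""
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    return population / area_km2


# Option A: divide by the area that was valid in each year.
density_own_area = {
    year: density(rec["population"], rec["area_km2"])
    for year, rec in region_by_year.items()
}

# Option B (naive): reuse the most recent area for every year.
latest_area = region_by_year[2021]["area_km2"]
density_fixed_area = {
    year: density(rec["population"], latest_area)
    for year, rec in region_by_year.items()
}

for year in sorted(region_by_year):
    print(year, density_own_area[year], density_fixed_area[year])
```

The two options agree for the latest year and diverge for the earlier one, which is why the boundary vintage should be stated alongside any density figure.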

The following map shows the differences in the European Union extent in 2021 vs 2010. The data was downloaded from here.

# !pip install geopandas folium matplotlib mapclassify
# Import the required libraries
import geopandas as gpd
import folium
# Read the data for the EU extent for 2010 and 2021
data_2021 = gpd.read_file('../../data/establishing-the-analysis-zone/NUTS_RG_20M_2021_3035.geojson')
data_2010 = gpd.read_file('../../data/establishing-the-analysis-zone/NUTS_RG_20M_2010_3035.geojson')
# Create a map where students can turn layers on/off and see the changes through time
# LEVL_CODE == 0 keeps only the country-level polygons
m = data_2021[data_2021['LEVL_CODE'] == 0].explore(color='deeppink', name='members 2021')
data_2010[data_2010['LEVL_CODE'] == 0].explore(m=m, color='royalblue', name='members 2010')
folium.LayerControl().add_to(m)
m

Another consideration when defining the analysis zone is whether the results will be compared with those of earlier analyses. If so, reuse the zone definition from the earlier analysis so that the results remain comparable.

3.4. Border error#

Border error arises because, once a boundary is defined, anything outside it is excluded from the analysis. For example, a study of trips to the central business district (CBD) might not count trips that originate outside the study region, so some trips to the CBD go uncounted. The way to deal with this error is to acknowledge it explicitly.
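As a rough illustration of the trip example, the sketch below uses hypothetical trip records and zone names. It assumes a wider dataset is available that also records trips originating outside the study region. In practice this is often not the case, which is why the error is usually acknowledged rather than measured.

```python
# Hypothetical trip records: (origin_zone, destination_zone).
trips = [
    ("A", "CBD"),
    ("B", "CBD"),
    ("C", "CBD"),
    ("X", "CBD"),  # zone X lies outside the study region
    ("A", "B"),
]

# Zones that fall inside the boundary chosen for the analysis.
study_region = {"A", "B", "C", "CBD"}

# All trips that end in the CBD, wherever they start.
cbd_trips = [trip for trip in trips if trip[1] == "CBD"]

# Trips the analysis would actually count: origin inside the boundary.
counted = [trip for trip in cbd_trips if trip[0] in study_region]

missed = len(cbd_trips) - len(counted)
share_missed = missed / len(cbd_trips)

print(f"CBD trips counted: {len(counted)} of {len(cbd_trips)}")
print(f"Share missed because of the boundary: {share_missed:.0%}")
```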

3.5. Geographic level#

The region of study will present geographic subdivisions for which data will be summarized. For example, the US Census Bureau has:

  • State

  • County

  • Tract

  • Block Group

  • Block

In this case, the levels are nested: by aggregating blocks, one can get the block groups; by aggregating block groups, one can get the tracts, and so on.
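The nesting can be sketched with Census GEOIDs, where the identifier of each level is a prefix of the identifier of the levels below it (2 digits for state, 5 for county, 11 for tract, 12 for block group, 15 for block). The block identifiers and population values below are hypothetical, and `aggregate` is an illustrative helper, not a Census Bureau or library function.

```python
from collections import defaultdict

# Number of leading GEOID characters that identify each nested level.
GEOID_LENGTH = {
    "state": 2,
    "county": 5,
    "tract": 11,
    "block_group": 12,
    "block": 15,
}

# Hypothetical block-level populations keyed by 15-character block GEOID.
block_population = {
    "060750101001000": 120,
    "060750101001001": 80,
    "060750101002000": 200,
    "060750102001000": 50,
}


def aggregate(block_values: dict, level: str) -> dict:
    """Sum block values up to the requested level using GEOID prefixes."""
    prefix_length = GEOID_LENGTH[level]
    totals = defaultdict(int)
    for geoid, value in block_values.items():
        totals[geoid[:prefix_length]] += value
    return dict(totals)


block_groups = aggregate(block_population, "block_group")
tracts = aggregate(block_population, "tract")
counties = aggregate(block_population, "county")

print(block_groups)
print(tracts)
print(counties)
```

Because every level is derived from the same block values, the totals at each level are consistent with one another by construction.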

3.5.1. Administrative Boundaries Across the Globe#

By common international convention, administrative levels are numbered from 0 to 4. Level 0 represents the country, level 1 the state/province, and level 2 the district/county; levels 3 and 4 are finer subdivisions whose availability depends on the country. These levels are supposed to be nested and consistent. The key challenge is that the names of the administrative boundaries, and the number of levels, vary by country. Also, higher-numbered (finer) levels such as municipalities depend on the country's political system, and information about them (georeferenced or not) is harder to find.

The following map shows the administrative levels for Uganda and it was extracted from GADM. According to the documentation:

Uganda administrative level 0 (country), 1 (region), 2 (district), 3 (county), and 4 (sub-county) boundaries.

Use the layers on/off button to explore the different administrative levels.

# Load the data
uga0 = gpd.read_file('../../data/establishing-the-analysis-zone/gadm41_UGA_shp/gadm41_UGA_0.shp')
uga1 = gpd.read_file('../../data/establishing-the-analysis-zone/gadm41_UGA_shp/gadm41_UGA_1.shp')
uga2 = gpd.read_file('../../data/establishing-the-analysis-zone/gadm41_UGA_shp/gadm41_UGA_2.shp')
uga3 = gpd.read_file('../../data/establishing-the-analysis-zone/gadm41_UGA_shp/gadm41_UGA_3.shp')
uga4 = gpd.read_file('../../data/establishing-the-analysis-zone/gadm41_UGA_shp/gadm41_UGA_4.shp')
# Create the map
m = uga0.explore(name = 'level 0', color = 'none', style_kwds=dict(color='royalblue'))
uga1.explore(m = m, name = 'level 1', color = 'none', style_kwds=dict(color='deeppink'))
uga2.explore(m = m, name = 'level 2', color = 'none', style_kwds=dict(color='seagreen'))
# uga3.explore(m = m, name = 'level 3', color = 'none', style_kwds=dict(color='purple'))
# uga4.explore(m = m, name = 'level 4', color = 'none', style_kwds=dict(color='grey'))
folium.LayerControl().add_to(m) # Add a control for layers
m

3.5.1.1. Sources to get Administrative Boundaries#

3.5.2. H3 - Hexagonal Hierarchical Geospatial Indexing System#

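The H3 grid system was developed by Uber and is open source. The grid partitions the world into hexagons of different sizes, organized in a hierarchy of resolutions. The hexagons are approximately nested: each cell is subdivided into seven smaller cells at the next finer resolution, although the smaller cells do not fit exactly inside their parent, as depicted in the image.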

../../_images/uber_h3.png

Fig. 3.1 Uber’s H3 hexagons example. Source.#
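To give a sense of how cell size changes with resolution, the sketch below uses the cell-count relation given in the H3 documentation (122 cells at resolution 0, with each cell subdivided by seven at every finer resolution up to 15). The helper `h3_cell_count` and the rounded Earth surface area are assumptions made for this illustration; they are not part of the `h3` library, and the resulting areas are rough averages only.

```python
# Approximate surface area of the Earth in square kilometres (rounded).
EARTH_SURFACE_KM2 = 510_065_622


def h3_cell_count(resolution: int) -> int:
    """Return the total number of H3 cells at a given resolution (0 to 15)."""
    if not 0 <= resolution <= 15:
        raise ValueError("H3 resolutions range from 0 to 15")
    return 2 + 120 * 7 ** resolution


# Print the cell count and a rough average cell area for a few resolutions.
for resolution in range(0, 16, 3):
    cells = h3_cell_count(resolution)
    average_area_km2 = EARTH_SURFACE_KM2 / cells
    print(f"resolution {resolution:2d}: {cells:>22,d} cells, "
          f"about {average_area_km2:.6g} km2 per cell")
```

A table like this can help when choosing a resolution whose cell size is comparable to the administrative units or data granularity used elsewhere in the project.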

3.5.3. What level should I use?#

The first question to ask is at which level the available data are aggregated. If the data are available only at the state level, a more granular analysis, for example at the municipal level, cannot be performed unless a disaggregation process is applied. The next question is the purpose of the analysis and who will use it.

3.6. Disputed Boundaries#

In sensitive cases where boundaries are disputed and the map owner does not take a side in the dispute, it is good practice to state this explicitly on the map. For example, see this map of Gaza and the West Bank produced by the Map Design Unit at the World Bank.

../../_images/disputed_territory.png

Fig. 3.2 Example of a map showing disputed boundaries. Source.#

3.7. Practice#

Create a map of Türkiye using administrative levels 0 and 2. Download the data from one of the sources listed above.