3. Establishing the Analysis Zone
3.1. Summary
The first step in this type of analysis is to define the spatio-temporal extent of the study region. This extent serves as the baseline for the analysis: all data are layered onto it and all results are extracted for it. A clear definition also lets others reproduce the work and gives stakeholders a common language, which avoids misunderstandings.
This notebook introduces best practices for defining the analysis zone.
3.2. Learning Objectives
3.2.1. Overall goals
The main goal of this class is to teach students to define the zone of interest for a project.
3.2.2. Specific goals
By the end of this notebook, you should:
Know how to define the area of interest for a project.
Understand commonly used boundaries:
Administrative levels.
H3 geometries.
Be acquainted with commonly used sources for downloading base maps.
3.3. How to define the Area of Interest?
Most of the time, the zone definition is based on pre-established administrative boundaries such as country, state/province, or city. In other cases, the client might have its own definition and provide a verbal description or a georeference for it.
An important consideration when working with administrative boundaries is that they can change over time, and this must be acknowledged, especially in longitudinal analyses. For example, suppose the task is to calculate population density across the European Union. We have a dataset with the number of people per year and a georeferenced vector layer with the extent of the region. To calculate population density, we divide the population by the region’s area. However, will the area be the same across years? How should we account for that?
The following map shows the differences in the European Union extent in 2021 vs 2010. The data was downloaded from here.
# !pip install geopandas folium matplotlib mapclassify
# Import the required libraries
import geopandas as gpd
import folium
# Read the data for the EU extent for 2010 and 2021
data_2021 = gpd.read_file('../../data/establishing-the-analysis-zone/NUTS_RG_20M_2021_3035.geojson')
data_2010 = gpd.read_file('../../data/establishing-the-analysis-zone/NUTS_RG_20M_2010_3035.geojson')
# Create a map where students can turn layers on/off and see the changes through time
m = data_2021[data_2021['LEVL_CODE']==0].explore(color = 'deeppink', name = 'members 2021')
data_2010[data_2010['LEVL_CODE']==0].explore(m=m, color = 'royalblue', name = 'members 2010')
folium.LayerControl().add_to(m)
m
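Returning to the population-density question above, the following is a minimal sketch of two ways to handle a boundary that changes between years. It does not use the NUTS files. The member codes, areas, and populations are all made up for illustration, and the helper `density` is a hypothetical name. The first approach pairs each year's population with that year's extent. The second keeps a fixed footprint (members present in both years) so the figures are comparable over time.

```python
# Illustrative data only: an imaginary union with members A-D.
# Areas (km^2) and populations are invented, not real statistics.
members_by_year = {
    2010: {"A", "B", "C"},
    2021: {"A", "B", "D"},
}
area_km2 = {"A": 100_000, "B": 50_000, "C": 250_000, "D": 50_000}
population = {
    2010: {"A": 10_000_000, "B": 2_500_000, "C": 50_000_000},
    2021: {"A": 11_000_000, "B": 3_000_000, "D": 4_000_000},
}


def density(year, members):
    """People per km^2 in `year`, restricted to the given set of members."""
    total_population = sum(population[year][code] for code in members)
    total_area = sum(area_km2[code] for code in members)
    return total_population / total_area


# Approach 1: use the extent that was valid in each year.
matched = {year: density(year, members) for year, members in members_by_year.items()}

# Approach 2: hold the footprint constant (members present in both years).
common = members_by_year[2010] & members_by_year[2021]
fixed = {year: density(year, common) for year in members_by_year}

for year in sorted(members_by_year):
    print(year, round(matched[year], 2), round(fixed[year], 2))
```

In this toy case the matched-extent density falls between the two years while the fixed-footprint density rises, which shows why the chosen approach should be stated alongside the result.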
Another consideration when defining the analysis zone is whether the results will be compared with past analyses. If so, use the same zone definition as the previous analysis.
3.4. Border error
Once a boundary is defined, anything outside it is left out of the analysis. For example, a study of trips to the central business district (CBD) might not count trips that originate outside the study region, so some trips to the CBD go uncounted. The way to deal with this error is simply to acknowledge it.
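One way to acknowledge the border error is to report how large it is whenever that can be estimated. The sketch below assumes a rectangular study region and a handful of made-up trip origins for trips ending in the CBD. The coordinates, the `REGION` box, and the helper `inside` are hypothetical and chosen only to illustrate the bookkeeping.

```python
# Study region as an axis-aligned box: (min_x, min_y, max_x, max_y).
# Hypothetical coordinates in arbitrary units.
REGION = (0.0, 0.0, 10.0, 10.0)

# Origins of trips that end in the CBD (invented sample).
cbd_trip_origins = [(2.0, 3.0), (5.0, 5.0), (9.0, 1.0), (12.0, 4.0), (-1.0, 7.0)]


def inside(point, box):
    """Return True if the point lies within the box (edges included)."""
    x, y = point
    min_x, min_y, max_x, max_y = box
    return min_x <= x <= max_x and min_y <= y <= max_y


counted = sum(inside(origin, REGION) for origin in cbd_trip_origins)
missed = len(cbd_trip_origins) - counted
missed_share = missed / len(cbd_trip_origins)

print(f"Counted: {counted}, missed: {missed}, share missed: {missed_share:.0%}")
```

In a real project the origins outside the region are often unobserved, so the share can only be estimated from an external source such as a wider survey.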
3.5. Geographic level
The study region will have geographic subdivisions for which data are summarized. For example, the US Census Bureau uses:
State
County
Tract
Block Group
Block
In this case, the levels are nested: by aggregating blocks, one can get the block groups; by aggregating block groups, one can get the tracts, and so on.
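The nesting can be sketched with Census GEOIDs, where a 15-character block identifier concatenates the state (2 digits), county (3), tract (6), and block (4) codes, and the block group is the first digit of the block code. Aggregating to a coarser level then amounts to grouping by a GEOID prefix. The identifiers and populations below are made up for illustration, and `aggregate` is a hypothetical helper.

```python
from collections import defaultdict

# Number of leading GEOID characters that identify each level.
GEOID_LENGTH = {"state": 2, "county": 5, "tract": 11, "block_group": 12}

# Block-level populations keyed by 15-character block GEOID (invented values).
block_population = {
    "060371234001000": 120,
    "060371234001001": 80,
    "060371234002000": 50,
    "060371235001000": 200,
    "060590001001000": 300,
}


def aggregate(level):
    """Sum block populations up to the requested level using GEOID prefixes."""
    prefix_length = GEOID_LENGTH[level]
    totals = defaultdict(int)
    for geoid, people in block_population.items():
        totals[geoid[:prefix_length]] += people
    return dict(totals)


for level in ("block_group", "tract", "county", "state"):
    print(level, aggregate(level))
```

Because every level is a prefix of the one below it, the state total equals the sum of all blocks regardless of which intermediate level is used.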
3.5.1. Administrative Boundaries Across the Globe
By common international convention, administrative levels are numbered from 0 to 4. Level 0 represents the country, level 1 the state/province, and level 2 the district/county; levels 3 and 4 are finer subdivisions whose availability depends on the country. These levels are supposed to be nested and consistent. The key challenge is that the names and the number of administrative levels vary by country. In addition, finer levels such as municipalities depend on the country's political system, and information about them (georeferenced or not) is harder to find.
The following map shows the administrative levels for Uganda, extracted from GADM. According to the documentation:
Uganda administrative level 0 (country), 1 (region), 2 (district), 3 (county), and 4 (sub-county) boundaries.
Use the layer control to turn the different administrative levels on and off.
# Load the data
uga0 = gpd.read_file('../../data/establishing-the-analysis-zone/gadm41_UGA_shp/gadm41_UGA_0.shp')
uga1 = gpd.read_file('../../data/establishing-the-analysis-zone/gadm41_UGA_shp/gadm41_UGA_1.shp')
uga2 = gpd.read_file('../../data/establishing-the-analysis-zone/gadm41_UGA_shp/gadm41_UGA_2.shp')
uga3 = gpd.read_file('../../data/establishing-the-analysis-zone/gadm41_UGA_shp/gadm41_UGA_3.shp')
uga4 = gpd.read_file('../../data/establishing-the-analysis-zone/gadm41_UGA_shp/gadm41_UGA_4.shp')
# Create the map
m = uga0.explore(name = 'level 0', color = 'none', style_kwds=dict(color='royalblue'))
uga1.explore(m = m, name = 'level 1', color = 'none', style_kwds=dict(color='deeppink'))
uga2.explore(m = m, name = 'level 2', color = 'none', style_kwds=dict(color='seagreen'))
# uga3.explore(m = m, name = 'level 3', color = 'none', style_kwds=dict(color='purple'))
# uga4.explore(m = m, name = 'level 4', color = 'none', style_kwds=dict(color='grey'))
folium.LayerControl().add_to(m) # Add a control for layers
m