Crop Season-Phenology#
Author: Dunstan Matekenya
Affiliation: DECAT, The World Bank Group
Date: April 16, 2023
Background#
In this notebook, we generate geovariables in the category of crop season parameters (phenology) for the National Panel Survey (NPS 2014-2015, wave 4) in Tanzania. The following phenology geovariables are computed:
- Average total change in greenness (integral of daily EVI values) within the primary growing season, averaged by district
- Average timing of onset of greenness increase, in day of year 1-365 where Jul 1 = 1, averaged by district
- Average timing of onset of greenness decrease, in day of year 1-365 where Jul 1 = 1, averaged by district
- Total change in greenness (integral of daily EVI values) within the primary growing season for the survey year (July-June), averaged by district
- Onset of greenness increase, in day of year 1-365 starting July 1 of the survey year, averaged by district
- Onset of greenness decrease, in day of year 1-365 starting July 1 of the survey year, averaged by district
- Average EVI value at peak of greenness within the growing season starting July 1, averaged by district
- EVI value at peak of greenness within the growing season starting July 1 of the survey year, averaged by district
The naming and definition of the variables are based on the MODIS MCD12Q2 (Land Cover Dynamics) product. For full variable metadata, including definitions of the metrics and other important information, see the Data-Cover-Page.
Input datasets#
The following datasets have been used for all the variables.
Data source: MODIS MCD12Q2 v061 (Land Cover Dynamics) phenology layers, together with the ESA CCI 2015 agricultural mask used to restrict statistics to cropland.
Input data files: the data files used in this notebook were downloaded from the source above and kept in this OneDrive folder[1]
Python setup#
The following packages are required for the processing in this notebook. The spatial_anonymization helpers are imported from a local repository checkout.
import warnings
warnings.simplefilter(action='ignore')
import sys
import itertools
from pathlib import Path
from datetime import datetime, timedelta
import geopandas as gpd
import rasterio
from rasterio.enums import Resampling
from rasterstats import gen_zonal_stats
import pandas as pd
import numpy as np
from numpy import ma
spanon = Path.cwd().parents[2].joinpath("Spatial-Anonymization/src/")
sys.path.append(str(spanon))
from spatial_anonymization import utils
from spatial_anonymization import point_displacement
Setup input files#
Define paths to input data files as global variables using the pathlib Path class.
# =============================
# BASE WORKING DIR
# ============================
DIR_DATA = Path.cwd().parents[1].joinpath('data', 'TZA')
DIR_SPATIAL_DATA = DIR_DATA.joinpath('spatial')
DIR_NPS_W4 = DIR_DATA.joinpath('surveys', 'NPS_w4')
DIR_OUTPUTS = DIR_NPS_W4.joinpath('geovars', 'puf')
# =================
# GEOVARS DATA
# =================
DIR_MCD = Path.cwd().parents[1].joinpath('data', 'MCD12Q2', 'MCD12Q2.061_2000336_to_2021365')
FILE_AG_MASK = DIR_SPATIAL_DATA.joinpath('ESACCI_2015_agmask.tif')
# =============================
# SURVEY COORDINATES
# ============================
# Public/displaced GPS coordinates files
FILE_HH_PUB_GEOVARS = DIR_NPS_W4.joinpath('geovars', 'npsy4.ea.offset.dta')
# =============================
# ADMIN BOUNDARIES
# =============================
# use geoboundaries API for country bounds
FILE_ADM0 = DIR_SPATIAL_DATA.joinpath('geoBoundaries-TZA-ADM0-all','geoBoundaries-TZA-ADM0.shp')
FILE_ADM3 = DIR_SPATIAL_DATA.joinpath('geoBoundaries-TZA-ADM3-all','geoBoundaries-TZA-ADM3.shp')
FILE_ADM2 = DIR_SPATIAL_DATA.joinpath('geoBoundaries-TZA-ADM2-all','geoBoundaries-TZA-ADM2.shp')
df_pub_coords = pd.read_stata(FILE_HH_PUB_GEOVARS, convert_categoricals=False)
df_pub_coords.rename(columns={'lat_modified':'lat_dd_mod', 'lon_modified': 'lon_dd_mod'}, inplace=True)
Setup processing global variables#
For variables we will use often such as column name containing latitude and longitude, set them as global variables.
# ====================
# GENERAL
# =====================
# in case we need to download OSM data
GEOFABRICK_REGION = 'africa'
COUNTRY_NAME = 'Tanzania'
COUNTRY_ISO = 'TZA'
FILENAME_SUFFIX = 'npsw4'
# When survey was conducted (NPS wave 4)
SURVEY_CALENDAR_YR = 2015
# Years to average over: 2001 through the survey year
AVG_YEARS = [i for i in range(2001, SURVEY_CALENDAR_YR+1)]
# MCD12Q2 vegetation cycle to use (0 = primary cycle)
CYCLE = 0
# ==============
# BUFFER
# ==============
# whether to buffer points or
# extract values at point
BUFFER = True
BUFF_ZONE_DIST = 1000
# ====================
# INPUT DATA SPECS
# =====================
# Cols for input coordinates file
LAT = 'lat_modified'
LON = 'lon_modified'
HH_ID = 'clusterid'
# In cases where HH_ID isn't unique to lat,lon pairs, create temporary
# locationID based on lat,lon pairs
UNIQ_LOC_ID = 'loc'
# =============================
# VARIABLE LABELS
# ============================
VAR_LABELS = {'h{}_eviarea'.format(SURVEY_CALENDAR_YR): 'Total change in greenness within primary growing season',
              'avg_eviarea': 'Average total change in greenness within primary growing season',
              'h{}_evimax'.format(SURVEY_CALENDAR_YR): 'EVI value at peak of greenness within growing season',
              'avg_evimax': 'Average EVI value at peak of greenness within growing season',
              'avg_grn': 'Average timing of onset of greenness increase',
              'h{}_grn'.format(SURVEY_CALENDAR_YR): 'Onset of greenness increase',
              'h{}_sen'.format(SURVEY_CALENDAR_YR): 'Onset of greenness decrease',
              'avg_sen': 'Average timing of onset of greenness decrease'}
# Stata caps variable labels at 80 characters
for name, label in VAR_LABELS.items():
    try:
        assert len(label) < 80, 'LABEL TOO LONG'
    except AssertionError:
        print(name)
Define helper functions for computation#
def update_ag_mask(ag_mask_file):
    """
    Resample the agricultural-mask TIF so that it matches the MCD12Q2
    layers' dimensions, and return it as a numpy mask.
    """
    sample_MCD12Q2_tif = DIR_MCD.joinpath('MCD12Q2.061_EVI_Area_0_doy2001001_aid0001.tif')
    with rasterio.open(sample_MCD12Q2_tif) as ds_MCD12Q2:
        MCD12Q2_shape = ds_MCD12Q2.meta['height'], ds_MCD12Q2.meta['width']
    with rasterio.open(ag_mask_file) as dataset:
        # Resample band 1 to the target shape
        data = dataset.read(
            1,
            out_shape=MCD12Q2_shape,
            resampling=Resampling.nearest
        )
    # Set NaN to zero for the sake of creating a numpy mask
    data[np.isnan(data)] = 0
    # make_mask turns zeros into False, but in the np.ma convention True
    # means "masked", so invert it: non-agricultural (zero) pixels become
    # True and get excluded downstream
    ag_mask = ~ma.make_mask(data)
    return ag_mask
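The inversion follows the numpy masked-array convention, in which True marks a cell as invalid. A minimal standalone sketch with made-up values:
demo = np.array([[0.0, 1.0], [1.0, 0.0]])   # 1 = cropland, 0 = other
demo_mask = ~ma.make_mask(demo)             # True where demo == 0 (non-cropland)
masked = ma.array(np.array([[5.0, 6.0], [7.0, 8.0]]), mask=demo_mask)
print(masked.filled(-9999))                 # non-cropland cells become -9999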
def select_files(cycle, category='EVI_Area', target_yrs=None):
    """
    Extract the relevant files from the MCD12Q2 directory and return
    them as a {year: filepath} dict.
    """
    file_dict = {}
    category_with_cycle = '{}_{}'.format(category, cycle)
    for file in DIR_MCD.iterdir():
        if file.suffix != ".tif":
            continue
        # Filenames look like MCD12Q2.061_EVI_Area_0_doy2001001_aid0001.tif;
        # the year sits in the 'doyYYYYDDD' part
        yr = file.name.split("_")[-2][3:7]
        if target_yrs and int(yr) not in target_yrs:
            continue
        # Strip the 'MCD12Q2.061_' prefix, then compare the layer name
        # (category + cycle) against the requested one
        fname = file.name[12:]
        fname_parts = fname.split("_")
        this_category = sorted(fname_parts[:-2])
        target_category = sorted(category_with_cycle.split("_"))
        if this_category == target_category:
            file_dict[yr] = file
    return file_dict
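For example (illustrative, assuming the sample 2001 EVI_Area file referenced above is present in DIR_MCD):
files = select_files(cycle=0, category='EVI_Area', target_yrs=[2001])
print(files)  # {'2001': PosixPath('.../MCD12Q2.061_EVI_Area_0_doy2001001_aid0001.tif')}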
def days_since_epoch_to_doy(days):
    """
    Convert days since the 1970-01-01 epoch to day of a July-June
    phenology year, where Jul 1 = 1. The MCD12Q2 fill value (32767)
    is passed through unchanged.
    """
    if days == 32767:
        return 32767
    # Subtract 182 days to shift the start of the year from January to July
    days = int(days - 182)
    new_date = datetime(1970, 1, 1) + timedelta(days - 1)
    doy = new_date.timetuple().tm_yday
    return doy
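A quick, illustrative sanity check of the July shift:
d = datetime(2015, 7, 3)
days_since_epoch = (d - datetime(1970, 1, 1)).days
print(days_since_epoch_to_doy(days_since_epoch))  # 1: early July maps to the start of the year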
def prepare_raster_for_zonal_stats(raster1, raster2=None, ag_mask=None):
    """
    Prepare an array for zonal stats: optionally sum two rasters
    (e.g., EVI amplitude + minimum) and apply the agricultural mask.
    """
    with rasterio.open(raster1) as ds:
        no_data = ds.meta['nodata']
        transform = ds.meta['transform']
        arr = ds.read(1)
    if raster2:
        with rasterio.open(raster2) as ds2:
            arr2 = ds2.read(1)
        # Replace nodata with NaN so missingness propagates through the sum
        arr = np.where(arr != no_data, arr, np.NaN)
        arr2 = np.where(arr2 != no_data, arr2, np.NaN)
        arr_out = arr + arr2
        if ag_mask is not None:
            # Fill non-agricultural pixels with NaN so they are excluded
            arr_out = ma.array(arr_out, mask=ag_mask).filled(np.NaN)
        # Restore nodata for gen_zonal_stats
        arr_out[np.isnan(arr_out)] = no_data
        return arr_out, transform, no_data
    else:
        if ag_mask is not None:
            # Fill non-agricultural pixels with nodata so they are excluded
            arr_main = ma.array(arr, mask=ag_mask).filled(no_data)
            return arr_main, transform, no_data
        return arr, transform, no_data
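An illustrative call on a single masked layer (assuming a 2015 EVI_Area file is present in DIR_MCD):
files = select_files(cycle=CYCLE, category='EVI_Area', target_yrs=[2015])
mask = update_ag_mask(ag_mask_file=FILE_AG_MASK)
arr, transform, no_data = prepare_raster_for_zonal_stats(raster1=files['2015'], ag_mask=mask)
print(arr.shape, no_data)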
def generate_evi_stats(adm_hhs, coords_notin_geo, evi_category,
                       colname, use_ag_mask=False, avg_yrs=None,
                       survey_yr=None):
    """
    Compute district-level zonal means for an MCD12Q2 phenology layer,
    either for a single survey year or averaged over avg_yrs.
    """
    # Extract the files for the requested layer. MCD12Q2 has no EVI
    # maximum layer, so reconstruct it from amplitude + minimum.
    if evi_category == 'EVI_max':
        files = select_files(cycle=CYCLE, category='EVI_Amplitude', target_yrs=avg_yrs)
        files_min = select_files(cycle=CYCLE, category='EVI_Minimum', target_yrs=avg_yrs)
    else:
        files = select_files(cycle=CYCLE, category=evi_category, target_yrs=avg_yrs)
    # Load ag_mask in case we need it
    mask = update_ag_mask(ag_mask_file=FILE_AG_MASK)
    # Vectorized converter from days since 1970-01-01 to day of year
    convert2doy = np.vectorize(days_since_epoch_to_doy)
    # =================================
    # LOOP THROUGH THE YEARS
    # =================================
    for yr, file in files.items():
        col = file.parts[-1].split("_")[-2]
        if evi_category == 'EVI_max':
            second_raster = files_min[yr]
            if use_ag_mask:
                input_arr, transform, no_data = prepare_raster_for_zonal_stats(raster1=file, raster2=second_raster,
                                                                               ag_mask=mask)
            else:
                input_arr, transform, no_data = prepare_raster_for_zonal_stats(raster1=file, raster2=second_raster)
            # Divide by 10,000 (the MCD12Q2 EVI scale factor) to get EVI units
            zstats = [i['mean']/10000 for i in list(gen_zonal_stats(adm_hhs, input_arr, affine=transform,
                                                                    stats=['mean'], nodata=no_data))]
        elif evi_category == 'Greenup' or evi_category == 'Senescence':
            if use_ag_mask:
                input_arr, transform, no_data = prepare_raster_for_zonal_stats(raster1=file, ag_mask=mask)
            else:
                input_arr, transform, no_data = prepare_raster_for_zonal_stats(raster1=file)
            # Convert days since epoch to day of the July-June year
            input_arr = convert2doy(input_arr)
            zstats = [i['mean'] for i in list(gen_zonal_stats(adm_hhs, input_arr, affine=transform,
                                                              stats=['mean'], nodata=no_data))]
        else:
            if use_ag_mask:
                input_arr, transform, no_data = prepare_raster_for_zonal_stats(raster1=file, ag_mask=mask)
            else:
                input_arr, transform, no_data = prepare_raster_for_zonal_stats(raster1=file)
            # Divide by 10 (the EVI_Area scale factor)
            zstats = [i['mean']/10 for i in list(gen_zonal_stats(adm_hhs, input_arr, affine=transform,
                                                                 stats=['mean'], nodata=no_data))]
        adm_hhs[col] = zstats
# ===========================================
# EXTRACT 1 YEAR DATA OR AVERAGE OVER YEARS
# ============================================
if survey_yr:
print('Extracting value for {} only'.format(survey_yr))
yr_colname = 'doy{}001'.format(survey_yr)
adm_hhs[colname] = adm_hhs[yr_colname]
df = adm_hhs[[UNIQ_LOC_ID,LAT, LON, colname]]
    else:
        avg_cols = [c for c in adm_hhs.columns if c[:3] == 'doy']
        avg_yrs = sorted(files.keys())
        print('-'*50)
        print('   AVERAGING OVER THE FOLLOWING YEARS')
        print('-'*50)
        print(avg_yrs)
        adm_hhs[colname] = adm_hhs[avg_cols].mean(axis=1)
        df = adm_hhs[[UNIQ_LOC_ID, LAT, LON, colname]]
# =================================
# DEAL WITH MISSING VALUES
# =================================
    df = pd.concat([df, coords_notin_geo])
null_cnt = df[colname].isnull().sum()
if null_cnt > 0:
print()
print('REPLACING {} MISSING VALUES ......'.format(null_cnt))
df2 = utils.replace_missing_with_nearest(df=df, target_col=colname,
idcol=UNIQ_LOC_ID, lon=LON, lat=LAT)
df = df2[[UNIQ_LOC_ID, colname]]
assert df[colname].isnull().sum() == 0
df.set_index(UNIQ_LOC_ID, inplace=True)
return df, colname
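Because MCD12Q2 ships no EVI maximum layer, evi_category='EVI_max' reconstructs it as EVI_Minimum + EVI_Amplitude and rescales the zonal mean by 10,000. A hedged usage sketch, mirroring the avg_evimax call in compute_vars below (it assumes gdf_adm2_hhs and df_coords_not_in_adm2 have been prepared as in that function):
df_max, col = generate_evi_stats(gdf_adm2_hhs, df_coords_not_in_adm2,
                                 evi_category='EVI_max', colname='avg_evimax',
                                 use_ag_mask=True, avg_yrs=AVG_YEARS)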
Compute variables#
def compute_vars():
"""
Putting everything together.
"""
# ===========================
# LOAD COORDINATES
# ===========================
df_pub_coords = pd.read_stata(FILE_HH_PUB_GEOVARS, convert_categoricals=False)
global UNIQ_LOC_ID
    if df_pub_coords[HH_ID].nunique() != len(df_pub_coords.drop_duplicates(subset=[LAT, LON])):
        # HH_ID isn't unique to lat/lon pairs: create a temporary
        # location ID from the coordinates themselves
        df_pub_coords[UNIQ_LOC_ID] = df_pub_coords.apply(lambda x: '{}-{}'.format(x[LON], x[LAT]), axis=1)
        df_coords = df_pub_coords[[UNIQ_LOC_ID, LAT, LON]].copy()
        df_coords.drop_duplicates(subset=UNIQ_LOC_ID, inplace=True)
    else:
        UNIQ_LOC_ID = HH_ID
        df_coords = df_pub_coords[[UNIQ_LOC_ID, LAT, LON]].copy()
    df_coords[UNIQ_LOC_ID] = df_coords[UNIQ_LOC_ID].astype(str)
gdf_coords = gpd.GeoDataFrame(df_coords,
geometry=gpd.points_from_xy(df_coords[LON],df_coords[LAT]), crs=4326)
gdf_adm2 = gpd.read_file(FILE_ADM2)
gdf_adm2_hhs = gpd.sjoin(gdf_adm2, gdf_coords, op='contains')
gdf_adm2_hhs = gdf_adm2_hhs[['geometry', 'shapeID', UNIQ_LOC_ID, LAT, LON]]
coords_not_in_adm2 = list(set(gdf_coords[UNIQ_LOC_ID].unique()) - set(gdf_adm2_hhs[UNIQ_LOC_ID].unique()))
df_coords_not_in_adm2 = df_coords.query('{} in {}'.format(UNIQ_LOC_ID, coords_not_in_adm2))[[UNIQ_LOC_ID, LAT, LON]]
# Keep variables here
df_list = {}
# =========================
# EVI Area
# =========================
# Survey-year EVI_Area
colname = 'h{}_eviarea'.format(SURVEY_CALENDAR_YR)
df_evia_sy, eviarea_sy = generate_evi_stats(gdf_adm2_hhs, df_coords_not_in_adm2, evi_category='EVI_Area',
colname=colname, avg_yrs=AVG_YEARS, use_ag_mask=True, survey_yr=SURVEY_CALENDAR_YR)
df_list[eviarea_sy] = df_evia_sy
# Average EVI_Area
df_avg_evia, avg_eviarea = generate_evi_stats(gdf_adm2_hhs, df_coords_not_in_adm2, evi_category='EVI_Area',
colname='avg_eviarea', avg_yrs=AVG_YEARS, use_ag_mask=True)
df_list[avg_eviarea] = df_avg_evia
# =========================
# EVI Maximum
# =========================
# Survey year EVI maximum
colname = 'h{}_evimax'.format(SURVEY_CALENDAR_YR)
df_evimax_sy, evimax_sy = generate_evi_stats(gdf_adm2_hhs, df_coords_not_in_adm2, evi_category='EVI_max',
colname=colname, avg_yrs=AVG_YEARS, use_ag_mask=True, survey_yr=SURVEY_CALENDAR_YR)
df_list[evimax_sy] = df_evimax_sy
# Average EVI maximum
df_avg_evimax, avg_evimax = generate_evi_stats(gdf_adm2_hhs, df_coords_not_in_adm2, evi_category='EVI_max',
colname='avg_evimax', avg_yrs=AVG_YEARS, use_ag_mask=True)
df_list[avg_evimax] = df_avg_evimax
# =========================
# TIMING OF GREENUP
# =========================
# Survey year Greenup
colname = 'h{}_grn'.format(SURVEY_CALENDAR_YR)
df_grn_sy, grn_sy = generate_evi_stats(gdf_adm2_hhs, df_coords_not_in_adm2, evi_category='Greenup',
colname=colname, avg_yrs=AVG_YEARS, use_ag_mask=True, survey_yr=SURVEY_CALENDAR_YR)
df_list[grn_sy] = df_grn_sy
# Average Greenup
df_avg_grn, avg_grn = generate_evi_stats(gdf_adm2_hhs, df_coords_not_in_adm2, evi_category='Greenup',
colname='avg_grn', avg_yrs=AVG_YEARS, use_ag_mask=False)
df_list[avg_grn] = df_avg_grn
# =========================
# TIMING OF SENESCENCE
# =========================
    # Survey year senescence
colname = 'h{}_sen'.format(SURVEY_CALENDAR_YR)
df_sen_sy, sen_sy = generate_evi_stats(gdf_adm2_hhs, df_coords_not_in_adm2, evi_category="Senescence",
colname=colname, avg_yrs=AVG_YEARS, use_ag_mask=True, survey_yr=SURVEY_CALENDAR_YR)
df_list[sen_sy] = df_sen_sy
    # Average senescence
df_avg_sen, avg_sen = generate_evi_stats(gdf_adm2_hhs, df_coords_not_in_adm2, evi_category="Senescence",
colname='avg_sen', avg_yrs=AVG_YEARS, use_ag_mask=False)
df_list[avg_sen] = df_avg_sen
# =====================
# COMBINE VARS
# =====================
df = pd.concat(list(df_list.values()), axis=1)
col_sen = 'h{}_sen'.format(SURVEY_CALENDAR_YR)
col_grn = 'h{}_grn'.format(SURVEY_CALENDAR_YR)
col_evimax = 'h{}_evimax'.format(SURVEY_CALENDAR_YR)
col_eviarea = 'h{}_eviarea'.format(SURVEY_CALENDAR_YR)
df[[col_evimax,'avg_evimax']] = df[[col_evimax,'avg_evimax']].applymap(lambda x: round(x, 3) if isinstance(x, float) else x)
cols_to_int = [col_eviarea, 'avg_eviarea', col_grn, 'avg_grn', col_sen, 'avg_sen']
df[cols_to_int] = df[cols_to_int].applymap(lambda x: int(round(x, 0)) if isinstance(x, float) else x)
# ===========================
# SAVE OUTPUTS
# ===========================
out_csv = DIR_OUTPUTS.joinpath('phenology{}.csv'.format(FILENAME_SUFFIX))
df.to_csv(out_csv)
# Save variable labels
out_csv_labels = DIR_OUTPUTS.joinpath('phenology-{}-labels.csv'.format(FILENAME_SUFFIX))
df_var_labels = pd.DataFrame(list(VAR_LABELS.values()), index=list(VAR_LABELS.keys()))
df_var_labels.index.name = 'var_name'
df_var_labels.rename(columns={0:'label'}, inplace=True)
df_var_labels.to_csv(out_csv_labels)
return out_csv, list(df_list.keys())
out_csv, out_vars = compute_vars()
Quality checks#
Run summary statistics to generate minimum and maximum values, etc., and inspect them manually.
df = pd.read_csv(out_csv)
# Only use describe for continuous variables
print('-'*50)
print('   SUMMARY STATISTICS FOR CONTINUOUS VARS')
print('-'*50)
stats = df[out_vars].describe().T
print(stats[['mean', 'min', '50%', '75%', 'max']])
print('-'*70)
--------------------------------------------------
SUMMARY STATISTICS FOR CONTINUOUS VARS
--------------------------------------------------
mean min 50% 75% max
h2015_eviarea 13.768496 6.000 11.000 18.000 31.000
avg_eviarea 17.305489 8.000 17.000 20.000 32.000
h2015_evimax 0.463456 0.315 0.455 0.508 0.632
avg_evimax 0.476568 0.341 0.473 0.517 0.646
h2015_grn 181.202864 78.000 174.000 230.000 269.000
avg_grn 157.840095 88.000 155.000 183.000 247.000
h2015_sen 285.171838 152.000 288.000 319.000 344.000
avg_sen 280.959427 212.000 282.000 294.000 321.000
----------------------------------------------------------------------
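As an additional check, the day-of-year and EVI variables can be bounded explicitly. A minimal sketch, assuming the column names produced above:
doy_cols = ['h2015_grn', 'avg_grn', 'h2015_sen', 'avg_sen']
assert df[doy_cols].min().min() >= 1 and df[doy_cols].max().max() <= 366
evi_cols = ['h2015_evimax', 'avg_evimax']
assert df[evi_cols].min().min() >= 0 and df[evi_cols].max().max() <= 1
print('Range checks passed')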
Summary#
The following variables were generated.
:name: geovariables_crop_season

| ID | Theme | Variable Description | Variable ID | Units | Resolution | Data sources | Required |
|----|-------|----------------------|-------------|-------|------------|--------------|----------|
| 1 | Crop Season Parameters | Average total change in greenness within primary growing season | crops07 | N/A | 0.004176 dd | | Yes |
| 2 | Crop Season Parameters | Average timing of onset of greenness increase[^10] | crops07 | Day of the year | 0.004176 dd | | Yes |
| 3 | Crop Season Parameters | Average timing of onset of greenness decrease[^11] | crops07 | Day of the year | 0.004176 dd | | Yes |
| 4 | Crop Season Parameters | Total change in greenness within primary growing season[^12] | crops07 | N/A | 0.004176 dd | | Yes |
| 5 | Crop Season Parameters | Onset of greenness increase[^13] | crops07 | Day of the year | 0.004176 dd | | Yes |
| 6 | Crop Season Parameters | Onset of greenness decrease[^14] | crops07 | Day of the year | 0.004176 dd | | Yes |
| 7 | Crop Season Parameters | Average EVI value at peak of greenness within growing season[^15] | crops07 | N/A | 0.004176 dd | | Yes |
| 8 | Crop Season Parameters | EVI value at peak of greenness within growing season[^16] | crops07 | N/A | 0.004176 dd | | Yes |