wazeasy package#
Submodules#
wazeasy.plots module#
wazeasy.reports module#
wazeasy.utils module#
- wazeasy.utils.assign_geography_to_jams(df, geog_info=None)#
Assign a geography to each traffic jam. The geography is assigned based on the starting point of the jam. Do not use this function for detailed geographies. In that case, refer to:
Parameters: - df (DataFrame): The Dask or Pandas DataFrame containing traffic jam data. - geog_info (dict): A dictionary containing geographical information for assignment. The key is the name of the geography, and the value is the georeferenced data with the geographic subdivisions.
Returns: - None: Modifies the DataFrame in place.
- wazeasy.utils.create_gdf(ddf, start_node=False)#
Create a Dask-Geopandas GeoDataFrame from a Dask DataFrame.
Parameters: - ddf (DataFrame): The Dask DataFrame containing geographical data.
Returns: - GeoDataFrame: A GeoDataFrame with the geometry column set.
- wazeasy.utils.define_dates_of_interest(df, start_date=None, end_date=None, dow=None)#
- wazeasy.utils.distribute_jam_over_aggregation_geom(gddf, ddf, projected_crs)#
Distribute the jam over the aggregation geometry.
Parameters: - gddf (GeoDataFrame): The GeoDataFrame with unique jams and geometry. - ddf (DataFrame): The Dask DataFrame containing the original traffic jam data. - projected_crs (str): The projected coordinate reference system to be used.
Returns: - DataFrame: A Dask DataFrame with jams distributed over the aggregation geometry.
Note that this DataFrame will have more rows than the original one due to the overlay process.
- wazeasy.utils.filter_date_range_by_dow(date_range, dow)#
Filter a date range by days of the week.
Parameters: - date_range (DatetimeIndex): The range of dates to filter. - dow (list): Days of the week to consider (0 = Monday, 6 = Sunday).
Returns: - list: A list of dates that match the specified days of the week.
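The filtering described above can be sketched in plain Python. This is a hypothetical re-implementation for illustration, not the library's code: the name `filter_by_dow` is made up, and standard-library `date` objects stand in for the entries of a `DatetimeIndex` (both expose `weekday()` with 0 = Monday).

```python
from datetime import date, timedelta


def filter_by_dow(dates, dow):
    """Keep only the dates whose weekday number is in `dow` (0 = Monday, 6 = Sunday)."""
    wanted = set(dow)
    return [d for d in dates if d.weekday() in wanted]


# Two full weeks starting on Monday 2024-01-01.
dates = [date(2024, 1, 1) + timedelta(days=i) for i in range(14)]

# Keep Saturdays (5) and Sundays (6) only.
weekend = filter_by_dow(dates, [5, 6])
print(weekend)
```

Using a set for `dow` keeps the membership test cheap when the date range is long.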
- wazeasy.utils.handle_time(df, utc_region)#
Handle the time column so that it is expressed in the correct UTC region, and calculate the following time-related attributes: - year: Year of the record (numeric). - month: Month of the record (numeric, 1–12). - date: Calendar date (YYYY-MM-DD). - hour: Hour of the day in 24-hour format. - local_time: Timestamp converted to the specified UTC region.
Parameters: - df (DataFrame): The DataFrame containing the data. - utc_region (str): The UTC region to convert the time to.
Returns: - None: Modifies the DataFrame in place.
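As a rough illustration of the attributes listed above, the sketch below derives them for a single timestamp using only the standard library. It rests on two assumptions that the documentation does not confirm: that the attributes are computed from the converted local time, and that the region can be represented by a fixed UTC offset. The function name `derive_time_attributes` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def derive_time_attributes(utc_timestamp, utc_offset_hours):
    """Convert a UTC timestamp to a fixed-offset local time and derive attributes.

    Assumption: attributes are taken from the local time, not from the UTC time.
    """
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    local_time = utc_timestamp.astimezone(local_tz)
    return {
        "local_time": local_time,
        "year": local_time.year,
        "month": local_time.month,
        "date": local_time.date().isoformat(),  # YYYY-MM-DD
        "hour": local_time.hour,  # 24-hour format
    }


# 02:30 UTC on 1 March is still the evening of 29 February at UTC-5.
record = derive_time_attributes(
    datetime(2024, 3, 1, 2, 30, tzinfo=timezone.utc), utc_offset_hours=-5
)
print(record["date"], record["hour"])
```

The example timestamp is chosen so that the conversion changes the calendar date and the month, which is the case where working in the wrong time zone would distort daily and monthly aggregates.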
- wazeasy.utils.hourly_tci_by_geography(df, agg_spatial, agg_column, start_date=None, end_date=None, dow=None)#
Calculate the hourly average Traffic Congestion Intensity (TCI) Index for a time period.
Parameters: - df (DataFrame): The Dask/Pandas DataFrame containing traffic jam data. - agg_spatial (str): Name of column used for spatial aggregation. - agg_column (str): The column to aggregate in the TCI, normally length. - start_date (str, optional): The start date (YYYY-MM-DD) of the period to consider. If None, it will use the minimum date in the data. - end_date (str, optional): The end date (YYYY-MM-DD) of the period to consider. If None, it will use the maximum date in the data. - dow (list, optional): Days of the week to consider (0 = Monday, 6 = Sunday).
Returns: - Series: A Series with the average TCI for each hour and geography.
- wazeasy.utils.is_dask_dataframe(df)#
Check if DataFrame is a Dask DataFrame.
- wazeasy.utils.load_data(path, storage_options=None, file_type='csv', filter_level_5=True, usecols=None)#
Load data from a specified path.
Parameters: - path (str or list of str): The main directory path where data files are stored. It can also be a list of files to read. - storage_options (dict, optional): Options for storage backends, e.g., for cloud storage. - file_type (str, optional): The type of file to load ('csv' or 'parquet'). Defaults to 'csv'.
Returns: - DataFrame: A Dask DataFrame containing the loaded data.
- wazeasy.utils.load_data_csv(path, storage_options=None, filter_level_5=True, usecols=None)#
Load CSV data from a specified path.
Parameters: - path (str): The main directory path where CSV files are stored. - storage_options (dict, optional): Options for storage backends, e.g., for cloud storage.
Returns: - DataFrame: A Dask DataFrame containing the loaded CSV data.
- wazeasy.utils.load_data_parquet(path, storage_options, filter_level_5=True, usecols=None)#
Load parquet data from a specified path.
Parameters: - path (str): The main directory path where parquet files are stored. - storage_options (dict): Options for storage backends, e.g., for cloud storage.
Returns: - DataFrame: A Dask DataFrame containing the loaded parquet data.
- wazeasy.utils.mean_daily_tci_geog(df, agg_spatial, agg_column, layer, start_date=None, end_date=None, dow=None)#
Calculate the mean daily Traffic Congestion Intensity (TCI) Index for each geography, optionally restricted to a period of time.
Parameters: - df (DataFrame): The Dask/Pandas DataFrame containing traffic jam data. - start_date (str, optional): The start date (YYYY-MM-DD) of the period to consider. If None, it will use the minimum date in the data. - end_date (str, optional): The end date (YYYY-MM-DD) of the period to consider. If None, it will use the maximum date in the data. - dow (list, optional): Days of the week to consider (0 = Monday, 6 = Sunday). - agg_column (str): The column to aggregate for the TCI, generally the length of the jam.
Returns: - DataFrame: A DataFrame with the mean TCI for each geography.
- wazeasy.utils.monthly_hourly_tci(df, agg_column, start_date=None, end_date=None, dow=None)#
Calculate the monthly Traffic Congestion Intensity (TCI) Index, broken down by hour, for a time period.
Parameters: - df (DataFrame): The Dask DataFrame containing traffic jam data. - agg_column (str): The column to aggregate in the TCI, normally length. - start_date (str, optional): The start date (YYYY-MM-DD) of the period to consider. If None, it will use the minimum date in the data. - end_date (str, optional): The end date (YYYY-MM-DD) of the period to consider. If None, it will use the maximum date in the data. - dow (list, optional): Days of the week to consider (0 = Monday, 6 = Sunday).
Returns: - Series: A Series with the monthly TCI for each hour, month and year.
- wazeasy.utils.obtain_hexagons_for_area(area, resolution)#
Create a georeferenced layer of H3 hexagons for a given Area of Operation.
Parameters: - area (Polygon): The area of operation as an h3 LatLngPolygon. - resolution (int): The resolution of the H3 hexagons.
Returns: - GeoDataFrame: A GeoDataFrame with H3 hexagons.
- wazeasy.utils.obtain_unique_jams_linestrings(ddf, start_node=False)#
Obtain the geometries of unique jams.
Parameters: - ddf (DataFrame): The Dask DataFrame containing traffic jam data. - start_node (bool, optional): Whether to use the starting point of the jam as the geometry.
Returns: - GeoDataFrame: A GeoDataFrame with unique jam linestrings.
- wazeasy.utils.overlay_group(group, gdf_area)#
Perform an overlay between layers for delayed processes.
Parameters: - group (GeoDataFrame): A GeoDataFrame group to overlay. - gdf_area (GeoDataFrame): A GeoDataFrame with polygons.
Returns: - GeoDataFrame: The result of the overlay operation.
- wazeasy.utils.parallelized_overlay(ddf, gdf_area)#
Parallelize overlay operation by partition groups over some geometry.
Parameters: - ddf (DataFrame): The Dask DataFrame containing traffic jam data. - gdf_area (GeoDataFrame): The geographical areas for aggregation.
Returns: - GeoDataFrame: The result of the parallelized overlay operation.
- wazeasy.utils.parallelized_sjoin(ddf, gdf_area)#
Parallelize sjoin operation by partition groups over some geometry.
Parameters: - ddf (DataFrame): The Dask DataFrame containing traffic jam data. - gdf_area (GeoDataFrame): The geographical areas for aggregation.
Returns: - GeoDataFrame: The result of the parallelized sjoin operation.
- wazeasy.utils.process_geowkt_partition(partition)#
Process a partition using vectorized pandas operations.
- wazeasy.utils.remove_level5(ddf)#
Remove level 5 traffic jams from the DataFrame, as these jams are associated with road closures.
Parameters: - ddf (DataFrame): The Dask DataFrame containing traffic jam data.
Returns: - DataFrame: A DataFrame excluding level 5 jams.
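The filtering idea can be shown on plain records. This is a minimal sketch rather than the library's implementation: the helper name `drop_level5` and the field names `uuid`, `level`, and `length` are assumptions made for the example, and a list of dictionaries stands in for the Dask DataFrame.

```python
def drop_level5(records, level_key="level"):
    """Return the records whose jam level is not 5 (level 5 marks road closures)."""
    return [r for r in records if r[level_key] != 5]


# Hypothetical jam records; real data would live in a Dask DataFrame.
jams = [
    {"uuid": "a", "level": 2, "length": 350},
    {"uuid": "b", "level": 5, "length": 1200},  # road closure, should be dropped
    {"uuid": "c", "level": 4, "length": 800},
]

kept = drop_level5(jams)
print([r["uuid"] for r in kept])
```

On a DataFrame the same selection would normally be a boolean mask on the level column, which returns a new object and leaves the input unchanged.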
- wazeasy.utils.sjoin_group(group, gdf_area)#
Perform a spatial join (sjoin) between layers for delayed processes.
Parameters: - group (GeoDataFrame): A GeoDataFrame group to join. - gdf_area (GeoDataFrame): A GeoDataFrame with polygons.
Returns: - GeoDataFrame: The result of the sjoin operation.
- wazeasy.utils.split_jams_into_geometries(ddf, gdf_area, projected_crs)#
Split jams into geometries (polygons). Note that this is a computationally heavy process. It is useful when dealing with small geographies.
Parameters: - ddf (DataFrame): The Dask DataFrame containing the original traffic jam data. - gdf_area (GeoDataFrame): A GeoDataFrame with polygons. - projected_crs (str): The projected coordinate reference system to be used.
Returns: - DataFrame: A Dask DataFrame with jams distributed over the aggregation geometry.
Note that this DataFrame will have more rows than the original one due to the overlay process.
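Why the row count grows can be seen in a one-dimensional analogy. The sketch below is not the library's overlay code: it replaces polygons with intervals along a road and a jam with a segment, and the names `split_segment`, `north`, and `south` are invented for the example. A real overlay would intersect linestrings with polygons in a projected CRS so that lengths are in metres.

```python
def split_segment(start, end, zones):
    """Split the segment [start, end] by zone, returning one piece per overlapped zone.

    `zones` maps a zone name to its (zone_start, zone_end) interval.
    """
    pieces = []
    for name, (zone_start, zone_end) in zones.items():
        lo, hi = max(start, zone_start), min(end, zone_end)
        if hi > lo:  # the segment actually overlaps this zone
            pieces.append({"zone": name, "length": hi - lo})
    return pieces


# Two adjacent zones along a road, positions in metres.
zones = {"north": (0, 400), "south": (400, 1000)}

# A single 450 m jam that crosses the zone boundary at 400 m.
pieces = split_segment(250, 700, zones)
print(pieces)
```

One input jam becomes two output rows, one per zone, and the piece lengths add up to the original jam length. This is the same effect that makes the overlaid DataFrame longer than the input.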
- wazeasy.utils.tci_temporal_spatial(df, agg_temporal, agg_spatial, agg_column, start_date=None, end_date=None, dow=None)#
Calculate the Traffic Congestion Intensity (TCI) Index by period and geography.
Parameters: - df (DataFrame): The DataFrame (Dask or Pandas) containing traffic jam data. - agg_temporal (list): Names of the columns used for temporal aggregation. - agg_spatial (str): Name of column used for spatial aggregation. - start_date (str, optional): The start date (YYYY-MM-DD) of the period to consider. If None, it will use the minimum date in the data. - end_date (str, optional): The end date (YYYY-MM-DD) of the period to consider. If None, it will use the maximum date in the data. - dow (list, optional): Days of the week to consider (0 = Monday, 6 = Sunday). - agg_column (str): The column to aggregate.
Returns: - DataFrame: A DataFrame with the TCI calculated.
- wazeasy.utils.time_attributes(df)#
Calculate year, month, date, and hour for each jam record.
Parameters: - df (DataFrame): The DataFrame containing the data.
Returns: - None: Modifies the DataFrame in place.