wazeasy package#

Submodules#

wazeasy.plots module#

wazeasy.reports module#

wazeasy.utils module#

wazeasy.utils.assign_geography_to_jams(df, geog_info=None)#

Assign a geography to each traffic jam. The geography is given based on the starting point of the jam. Do not use this function for detailed geographies. In that case, refer to:

Parameters: - df (DataFrame): The Dask or Pandas DataFrame containing traffic jam data. - geog_info (dict): A dictionary containing geographical information for assignment. Each key is the name of a geography, and each value is the georeferenced data with its geographic subdivisions.

Returns: - None: Modifies the DataFrame in place.

wazeasy.utils.create_gdf(ddf, start_node=False)#

Create a Dask-Geopandas GeoDataFrame from a Dask DataFrame.

Parameters: - ddf (DataFrame): The Dask DataFrame containing geographical data.

Returns: - GeoDataFrame: A GeoDataFrame with the geometry column set.

wazeasy.utils.define_dates_of_interest(df, start_date=None, end_date=None, dow=None)#

wazeasy.utils.distribute_jam_over_aggregation_geom(gddf, ddf, projected_crs)#

Distribute the jam over the aggregation geometry.

Parameters: - gddf (GeoDataFrame): The GeoDataFrame with unique jams and geometry. - ddf (DataFrame): The Dask DataFrame containing the original traffic jam data. - projected_crs (str): The projected coordinate reference system to be used.

Returns: - DataFrame: A Dask DataFrame with jams distributed over the aggregation geometry.

Notice that this DataFrame will have more rows than the original one due to the overlay process.

wazeasy.utils.filter_date_range_by_dow(date_range, dow)#

Filter a date range by days of the week.

Parameters: - date_range (DatetimeIndex): The range of dates to filter. - dow (list): Days of the week to consider (0 = Monday, 6 = Sunday).

Returns: - list: A list of dates that match the specified days of the week.
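The weekday numbering above (0 = Monday, 6 = Sunday) is the same one used by Python's `date.weekday()`. The following is a minimal standard-library sketch of the filtering idea only. It does not call wazeasy, and `filter_dates_by_dow` is a hypothetical helper that works on plain `date` objects rather than a `DatetimeIndex`.

```python
from datetime import date, timedelta


def filter_dates_by_dow(dates, dow):
    """Keep only the dates whose weekday number is in `dow`.

    Hypothetical stand-in for illustration; uses 0 = Monday ... 6 = Sunday,
    the same convention as `date.weekday()`.
    """
    wanted = set(dow)
    return [d for d in dates if d.weekday() in wanted]


# One full week starting on Monday 2024-01-01.
week = [date(2024, 1, 1) + timedelta(days=i) for i in range(7)]

# Keep Saturdays and Sundays only.
weekend = filter_dates_by_dow(week, dow=[5, 6])
print(weekend)
```

Passing `dow=[0, 1, 2, 3, 4]` instead would keep the five working days of the same week.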

wazeasy.utils.handle_time(df, utc_region)#

Convert the time column to the specified UTC region and calculate the following time-related attributes:

  • year: Year of the record (numeric).

  • month: Month of the record (numeric, 1–12).

  • date: Calendar date (YYYY-MM-DD).

  • hour: Hour of the day in 24-hour format.

  • local_time: Timestamp converted to the specified UTC region.

Parameters: - df (DataFrame): The DataFrame containing the data. - utc_region (str): The UTC region to convert the time to.

Returns: - None: Modifies the DataFrame in place.
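As a rough illustration of the attributes listed above, the sketch below derives them for a single timestamp using only the standard library. It is not the wazeasy implementation: `time_attributes_for` is a hypothetical helper, the fixed UTC-5 offset stands in for whatever value `utc_region` accepts, and computing the attributes from the local time (rather than from UTC) is an assumption.

```python
from datetime import datetime, timedelta, timezone


def time_attributes_for(ts_utc, local_tz):
    """Return the derived time attributes for one UTC timestamp.

    Hypothetical helper for illustration; the attributes are computed
    from the local time, which is an assumption about the real function.
    """
    local_time = ts_utc.astimezone(local_tz)
    return {
        "local_time": local_time,
        "year": local_time.year,
        "month": local_time.month,  # 1-12
        "date": local_time.date().isoformat(),  # YYYY-MM-DD
        "hour": local_time.hour,  # 0-23
    }


# Assumed fixed offset of UTC-5, used here instead of a named time zone.
utc_minus_5 = timezone(timedelta(hours=-5))

record = time_attributes_for(
    datetime(2024, 3, 1, 2, 30, tzinfo=timezone.utc), utc_minus_5
)
print(record["date"], record["hour"])
```

The example timestamp falls on 1 March in UTC but on 29 February locally, which shows why the conversion has to happen before the calendar attributes are derived.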

wazeasy.utils.hourly_tci_by_geography(df, agg_spatial, agg_column, start_date=None, end_date=None, dow=None)#

Calculate the hourly average Traffic Congestion Intensity (TCI) Index for a time period.

Parameters:

  • df (DataFrame): The Dask/Pandas DataFrame containing traffic jam data.

  • agg_spatial (str): Name of the column used for spatial aggregation.

  • agg_column (str): The column to aggregate in the TCI, normally length.

  • start_date (str, optional): The start date (YYYY-MM-DD) of the period to consider. If None, the minimum date in the data is used.

  • end_date (str, optional): The end date (YYYY-MM-DD) of the period to consider. If None, the maximum date in the data is used.

  • dow (list, optional): Days of the week to consider (0 = Monday, 6 = Sunday).

Returns: - Series: A Series with the average TCI for each hour and geography.
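The exact TCI formula is not stated in this reference. As a sketch of the aggregation shape only, the example below assumes that the value for one hour of one day is the sum of `agg_column` over the jams in a geography, and that the hourly figure is the mean of those sums across days. The record layout, the column names, and the helper `hourly_mean_by_geography` are hypothetical, and days without any jam record are ignored rather than counted as zero.

```python
from collections import defaultdict
from statistics import mean


def hourly_mean_by_geography(records, agg_spatial, agg_column):
    """Average per-day totals of `agg_column` for each (geography, hour)."""
    # Step 1: total the aggregated column per (geography, date, hour).
    totals = defaultdict(float)
    for row in records:
        totals[(row[agg_spatial], row["date"], row["hour"])] += row[agg_column]

    # Step 2: collect the daily totals for each (geography, hour).
    per_hour = defaultdict(list)
    for (geog, _date, hour), total in totals.items():
        per_hour[(geog, hour)].append(total)

    # Step 3: average the daily totals over the days that have records.
    return {key: mean(values) for key, values in per_hour.items()}


# Hypothetical jam records with an assumed "zone" geography column.
jam_records = [
    {"zone": "A", "date": "2024-01-01", "hour": 8, "length": 100.0},
    {"zone": "A", "date": "2024-01-01", "hour": 8, "length": 300.0},
    {"zone": "A", "date": "2024-01-02", "hour": 8, "length": 200.0},
    {"zone": "B", "date": "2024-01-01", "hour": 9, "length": 50.0},
]

result = hourly_mean_by_geography(
    jam_records, agg_spatial="zone", agg_column="length"
)
print(result)
```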

wazeasy.utils.is_dask_dataframe(df)#

Check if DataFrame is a Dask DataFrame.

wazeasy.utils.load_data(path, storage_options=None, file_type='csv', filter_level_5=True, usecols=None)#

Load data from a specified path for a given year and month.

Parameters: - path (str or list of str): The main directory path where data files are stored. It can also be a list of files to read. - storage_options (dict, optional): Options for storage backends, e.g., for cloud storage. - file_type (str, optional): The type of file to load ('csv' or 'parquet'). Defaults to 'csv'.

Returns: - DataFrame: A Dask DataFrame containing the loaded data.

wazeasy.utils.load_data_csv(path, storage_options=None, filter_level_5=True, usecols=None)#

Load CSV data from a specified path for a given year and month.

Parameters: - path (str): The main directory path where CSV files are stored. - storage_options (dict, optional): Options for storage backends, e.g., for cloud storage.

Returns: - DataFrame: A Dask DataFrame containing the loaded CSV data.

wazeasy.utils.load_data_parquet(path, storage_options, filter_level_5=True, usecols=None)#

Load parquet data from a specified path for a given year and month.

Parameters: - path (str): The main directory path where parquet files are stored. - storage_options (dict): Options for storage backends, e.g., for cloud storage.

Returns: - DataFrame: A Dask DataFrame containing the loaded parquet data.

wazeasy.utils.mean_daily_tci_geog(df, agg_spatial, agg_column, layer, start_date=None, end_date=None, dow=None)#

Average the daily Traffic Congestion Intensity (TCI) Index for each geography over a period of time, if one is defined.

Parameters:

  • df (DataFrame): The Dask/Pandas DataFrame containing traffic jam data.

  • agg_column (str): The column to aggregate for the TCI, generally the length of the jam.

  • start_date (str, optional): The start date (YYYY-MM-DD) of the period to consider. If None, the minimum date in the data is used.

  • end_date (str, optional): The end date (YYYY-MM-DD) of the period to consider. If None, the maximum date in the data is used.

  • dow (list, optional): Days of the week to consider (0 = Monday, 6 = Sunday).

Returns: - DataFrame: A DataFrame with the mean TCI for each geography.

wazeasy.utils.monthly_hourly_tci(df, agg_column, start_date=None, end_date=None, dow=None)#

Calculate the monthly Traffic Congestion Intensity (TCI) Index, hourly distributed, for a time period.

Parameters:

  • df (DataFrame): The Dask DataFrame containing traffic jam data.

  • agg_column (str): The column to aggregate in the TCI, normally length.

  • start_date (str, optional): The start date (YYYY-MM-DD) of the period to consider. If None, the minimum date in the data is used.

  • end_date (str, optional): The end date (YYYY-MM-DD) of the period to consider. If None, the maximum date in the data is used.

  • dow (list, optional): Days of the week to consider (0 = Monday, 6 = Sunday).

Returns: - Series: A Series with the monthly TCI for each hour, month and year.

wazeasy.utils.obtain_hexagons_for_area(area, resolution)#

Create a georeferenced layer of H3 hexagons for a given Area of Operation.

Parameters: - area (Polygon): The area of operation as an h3 LatLngPolygon. - resolution (int): The resolution of the H3 hexagons.

Returns: - GeoDataFrame: A GeoDataFrame with H3 hexagons.

wazeasy.utils.obtain_unique_jams_linestrings(ddf, start_node=False)#

Obtain the unique jam geometries.

Parameters: - ddf (DataFrame): The Dask DataFrame containing traffic jam data. - start_node (bool, optional): Whether to use the starting point of the jam as the geometry.

Returns: - GeoDataFrame: A GeoDataFrame with unique jam linestrings.

wazeasy.utils.overlay_group(group, gdf_area)#

Perform an overlay between layers for delayed processes.

Parameters: - group (GeoDataFrame): A GeoDataFrame group to overlay. - gdf_area (GeoDataFrame): A GeoDataFrame with polygons.

Returns: - GeoDataFrame: The result of the overlay operation.

wazeasy.utils.parallelized_overlay(ddf, gdf_area)#

Parallelize overlay operation by partition groups over some geometry.

Parameters: - ddf (DataFrame): The Dask DataFrame containing traffic jam data. - gdf_area (GeoDataFrame): The geographical areas for aggregation.

Returns: - GeoDataFrame: The result of the parallelized overlay operation.

wazeasy.utils.parallelized_sjoin(ddf, gdf_area)#

Parallelize sjoin operation by partition groups over some geometry.

Parameters: - ddf (DataFrame): The Dask DataFrame containing traffic jam data. - gdf_area (GeoDataFrame): The geographical areas for aggregation.

Returns: - GeoDataFrame: The result of the parallelized sjoin operation.

wazeasy.utils.process_geowkt_partition(partition)#

Process a partition using vectorized pandas operations.

wazeasy.utils.remove_level5(ddf)#

Remove traffic jams with level 5 from the DataFrame, as these jams are associated with road closures.

Parameters: - ddf (DataFrame): The Dask DataFrame containing traffic jam data.

Returns: - DataFrame: A DataFrame excluding level 5 jams.
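The filtering rule described above can be sketched on plain Python records as follows. This is an illustration rather than the library code: `drop_level5` is a hypothetical helper, and the column names `level` and `jam_id` are assumptions about the input data.

```python
def drop_level5(records, level_key="level"):
    """Return the records whose jam level is not 5.

    Hypothetical helper; the column name "level" is an assumption.
    """
    return [row for row in records if row[level_key] != 5]


sample_jams = [
    {"jam_id": "a", "level": 3},
    {"jam_id": "b", "level": 5},  # level 5: treated as a road closure
    {"jam_id": "c", "level": 1},
]

kept = drop_level5(sample_jams)
print([row["jam_id"] for row in kept])
```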

wazeasy.utils.sjoin_group(group, gdf_area)#

Perform a spatial join (sjoin) between layers for delayed processes.

Parameters: - group (GeoDataFrame): A GeoDataFrame group to join. - gdf_area (GeoDataFrame): A GeoDataFrame with polygons.

Returns: - GeoDataFrame: The result of the sjoin operation.

wazeasy.utils.split_jams_into_geometries(ddf, gdf_area, projected_crs)#

Split jams into geometries (polygons). Notice that this is a heavy process. It is useful when dealing with small geographies.

Parameters: - ddf (DataFrame): The Dask DataFrame containing traffic jam data - the original data. - gdf_area (GeoDataFrame): A GeoDataFrame with polygons. - projected_crs (str): The projected coordinate reference system to be used.

Returns: - DataFrame: A Dask DataFrame with jams distributed over the aggregation geometry.

Notice that this DataFrame will have more rows than the original one due to the overlay process.
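To see why the result has more rows than the input, consider a one-dimensional simplification in which a jam is an interval along a road and each zone is an interval as well. This is only an analogy for the polygon overlay, with hypothetical names and data; the actual function operates on geometries in a projected CRS.

```python
def split_jam_over_zones(jam, zones):
    """Split one jam interval into one row per zone it overlaps.

    `jam` is (jam_id, start, end) and `zones` maps a zone name to
    (start, end), all in metres along the same road.
    """
    jam_id, jam_start, jam_end = jam
    rows = []
    for zone, (zone_start, zone_end) in zones.items():
        # Length of the part of the jam that lies inside this zone.
        overlap = min(jam_end, zone_end) - max(jam_start, zone_start)
        if overlap > 0:
            rows.append({"jam_id": jam_id, "zone": zone, "length": overlap})
    return rows


zones = {"A": (0, 600), "B": (600, 1500)}

# A single 1000 m jam that crosses the boundary between zones A and B.
rows = split_jam_over_zones(("jam-1", 200, 1200), zones)
print(rows)
```

One input jam becomes two output rows, and the lengths of the pieces add up to the length of the original jam.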

wazeasy.utils.tci_temporal_spatial(df, agg_temporal, agg_spatial, agg_column, start_date=None, end_date=None, dow=None)#

Calculate the Traffic Congestion Index (TCI) by period and geography.

Parameters:

  • df (DataFrame): The DataFrame (Dask or Pandas) containing traffic jam data.

  • agg_temporal (list): Names of the columns used for temporal aggregation.

  • agg_spatial (str): Name of the column used for spatial aggregation.

  • agg_column (str): The column to aggregate.

  • start_date (str, optional): The start date (YYYY-MM-DD) of the period to consider. If None, the minimum date in the data is used.

  • end_date (str, optional): The end date (YYYY-MM-DD) of the period to consider. If None, the maximum date in the data is used.

  • dow (list, optional): Days of the week to consider (0 = Monday, 6 = Sunday).

Returns: - DataFrame: A DataFrame with the TCI calculated.

wazeasy.utils.time_attributes(df)#

Calculate year, month, date, and hour for each jam record.

Parameters: - df (DataFrame): The DataFrame containing the data.

Returns: - None: Modifies the DataFrame in place.

Module contents#