wazeasy package#

Submodules#

wazeasy.plots module#

wazeasy.plots.hourly_tci_by_geog(ddf, geogs, geog, year, month, agg_column, dow, group_name, projected_crs, save_fig=False)#

Plot the hourly Traffic Congestion Index (TCI), averaged over the month, by geography.

Parameters:
- ddf (DataFrame): A Dask DataFrame containing traffic data.
- geogs (GeoDataFrame): The geographies for which to calculate the TCI.
- geog (str): The geographic column to group by.
- year (int): The year of the data.
- month (int): The month of the data.
- agg_column (str): The column to aggregate.
- dow (list): Days of the week to include (e.g. [0, 1, 2, 3, 4] for weekdays).
- group_name (str): Label used in the plot title.
- projected_crs (str): The coordinate reference system for projection.
- save_fig (bool): Currently unused; reserved for a future implementation.

Returns: - None: Displays the interactive Plotly figure.

wazeasy.plots.hourly_tci_by_month(ddf, geog, combination_year_month, dow, group_name, save_fig=False)#

Plot the hourly Traffic Congestion Index (TCI) for selected months.

Parameters:
- ddf (DataFrame): A Dask DataFrame containing traffic data.
- geog (str): The geographic column to group by.
- combination_year_month (list of tuples): List of (year, month) pairs to plot.
- dow (list): Days of the week to include (e.g. [0, 1, 2, 3, 4] for weekdays).
- group_name (str): Label used in the plot title (e.g. region name).
- save_fig (bool): Unused currently. Reserved for future implementation.

Returns: - None: Displays the interactive Plotly figure.

wazeasy.plots.jams_monthly_aggregated(data, save_fig=False)#

Plot the number of unique traffic jams aggregated by month.

Parameters:
- data (DataFrame): A Dask DataFrame containing ‘year’, ‘month’, and ‘uuid’ columns.
- save_fig (bool): If True, saves the plot as a PNG file. Default is False.

Returns: - None: Displays the plot and optionally saves it to a file.

wazeasy.plots.jams_per_day(data, save_fig=False)#

Plot the number of unique traffic jams per day.

Parameters:
- data (DataFrame): A Dask DataFrame containing ‘date’ and ‘uuid’ columns.
- save_fig (bool): If True, saves the plot as a PNG file. Default is False.

Returns: - None: Displays the plot and optionally saves it to a file.
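
The counting step behind this plot can be sketched in plain Python. The helper `count_unique_jams_per_day` and the records below are hypothetical; the package works on a Dask DataFrame and may compute the counts differently.

```python
from collections import defaultdict


def count_unique_jams_per_day(records):
    """Count distinct jam uuids per date (illustrative helper, not part of wazeasy)."""
    uuids_by_date = defaultdict(set)
    for record in records:
        uuids_by_date[record["date"]].add(record["uuid"])
    return {date: len(uuids) for date, uuids in sorted(uuids_by_date.items())}


# Made-up records: the same jam can be reported several times in one day.
records = [
    {"date": "2024-01-01", "uuid": "a"},
    {"date": "2024-01-01", "uuid": "a"},
    {"date": "2024-01-01", "uuid": "b"},
    {"date": "2024-01-02", "uuid": "c"},
]
jams_per_day = count_unique_jams_per_day(records)
print(jams_per_day)
```

Collecting uuids in a set per date is what makes repeated reports of one jam count only once.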

wazeasy.plots.jams_per_day_per_level(data, save_fig=False)#

Plot the number of unique traffic jams per day, grouped by congestion level.

Parameters:
- data (DataFrame): A Dask DataFrame containing ‘date’, ‘level’, and ‘uuid’ columns.
- save_fig (bool): If True, saves the plot as a PNG file. Default is False.

Returns: - None: Displays the plot and optionally saves it to a file.

wazeasy.plots.regional_tci_per_day(data, save_fig=False)#

Plot the daily regional Traffic Congestion Index (TCI), aggregated at the area of operation level.

Parameters:
- data (DataFrame): A Dask or Pandas DataFrame containing ‘date’, ‘region’, and ‘length’ columns.
- save_fig (bool): If True, saves the plot as a PNG file. Default is False.

Returns: - None: Displays the plot and optionally saves it to a file.

wazeasy.utils module#

wazeasy.utils.assign_geography_to_jams(ddf)#

Assign a geography to each traffic jam.

Parameters: - ddf (DataFrame): The Dask DataFrame containing traffic jam data.

Returns: - None: Modifies the DataFrame in place.

wazeasy.utils.classify_jam_by_region(ddf, geogs, year, month, projected_crs, dow=None)#

Classify jams by region. Filter the dataset as much as possible before calling this function, so that the spatial operation runs on fewer records.

wazeasy.utils.create_gdf(ddf)#

Create a Dask-Geopandas GeoDataFrame from a Dask DataFrame.

Parameters: - ddf (DataFrame): The Dask DataFrame containing geographical data.

Returns: - GeoDataFrame: A GeoDataFrame with the geometry column set.

wazeasy.utils.distribute_jams_over_aggregation_geom(gddf, ddf, projected_crs)#

Distribute jams over aggregation geometry.

Parameters:
- gddf (GeoDataFrame): The GeoDataFrame with jams and geometry.
- ddf (DataFrame): The Dask DataFrame containing traffic jam data.
- projected_crs (str): The coordinate reference system for projection.

Returns: - DataFrame: A DataFrame with jams distributed over the aggregation geometry.

wazeasy.utils.filter_date_range_by_dow(date_range, dow)#

Filter a date range by days of the week.

Parameters:
- date_range (DatetimeIndex): The range of dates to filter.
- dow (list): Days of the week to consider (0 = Monday, 6 = Sunday).

Returns: - list: A list of dates that match the specified days of the week.
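
A minimal sketch of the same filtering logic, using only the standard library. The helper `filter_dates_by_dow` is hypothetical and takes a plain list of `datetime.date` objects rather than a DatetimeIndex; `date.weekday()` uses the same convention as the `dow` parameter (0 = Monday, 6 = Sunday).

```python
from datetime import date, timedelta


def filter_dates_by_dow(dates, dow):
    """Keep the dates whose weekday (0 = Monday, 6 = Sunday) is in dow."""
    allowed = set(dow)
    return [d for d in dates if d.weekday() in allowed]


# First week of January 2024; 2024-01-01 was a Monday.
start = date(2024, 1, 1)
week = [start + timedelta(days=i) for i in range(7)]

# Keep Monday through Friday only.
weekdays = filter_dates_by_dow(week, [0, 1, 2, 3, 4])
print(weekdays)
```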

wazeasy.utils.get_jam_count_per_segment(df)#

Count how many jams occurred in each segment.
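
As a rough illustration of the idea, identical segments can be tallied with `collections.Counter`. The segment representation below (a pair of coordinate tuples) is an assumption made for the example; the function itself takes a DataFrame whose layout is not documented here.

```python
from collections import Counter

# Each segment is a pair of (x, y) endpoints; the values are made up.
segments = [
    ((0, 0), (1, 0)),
    ((1, 0), (2, 0)),
    ((0, 0), (1, 0)),  # the same stretch of road, jammed a second time
]

# Count how often each distinct segment appears.
jam_count = Counter(segments)
print(jam_count.most_common(1))
```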

wazeasy.utils.get_summary_statistics_city(ddf, year, working_days)#

wazeasy.utils.get_summary_statistics_street(df, street_names, year, working_days)#

wazeasy.utils.handle_time(df, utc_region, parquet=False)#

Handle the time column so that it is expressed in the specified UTC region, and calculate the following time-related attributes:
- year: Year of the record (numeric).
- month: Month of the record (numeric, 1–12).
- date: Calendar date (YYYY-MM-DD).
- hour: Hour of the day in 24-hour format.
- local_time: Timestamp converted to the specified UTC region.

Parameters:
- df (DataFrame): The DataFrame containing the data.
- utc_region (str): The UTC region to convert the time to.
- parquet (bool, optional): Indicates if the data is in parquet format. Defaults to False.

Returns: - None: Modifies the DataFrame in place.
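
The attributes listed above can be illustrated for a single timestamp. This is a sketch under two assumptions: a fixed offset in hours stands in for the `utc_region` string, whose accepted values are not documented here, and the year, month, date, and hour are taken from the converted local time. The helper `time_attributes_for` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def time_attributes_for(timestamp_utc, utc_offset_hours):
    """Derive local_time, year, month, date and hour from a UTC timestamp."""
    # Assumption: a fixed hour offset replaces the package's utc_region string.
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local_time = timestamp_utc.astimezone(local_zone)
    return {
        "local_time": local_time,
        "year": local_time.year,
        "month": local_time.month,
        "date": local_time.date().isoformat(),
        "hour": local_time.hour,
    }


# 02:30 UTC on 1 March 2024, viewed from UTC-5, falls on the previous day.
attrs = time_attributes_for(datetime(2024, 3, 1, 2, 30, tzinfo=timezone.utc), -5)
print(attrs["date"], attrs["hour"])
```

The example shows why the conversion matters: a record near midnight UTC can land on a different date, and even a different month, once it is expressed in local time.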

wazeasy.utils.harmonize_data(table)#

wazeasy.utils.line_to_segments(x)#

Break linestrings into individual segments.
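
The splitting can be sketched on a plain list of coordinates. The helper `split_into_segments` is hypothetical, and the type of `x` that the package function accepts is not documented here; the example only shows the pairing of consecutive points.

```python
def split_into_segments(coords):
    """Return each pair of consecutive points of a linestring as one segment."""
    return list(zip(coords[:-1], coords[1:]))


# A made-up linestring with three points yields two segments.
line = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
segments = split_into_segments(line)
print(segments)
```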

wazeasy.utils.load_data(main_path, year, month, storage_options=None, file_type='csv')#

Load data from a specified path for a given year and month.

Parameters:
- main_path (str): The main directory path where data files are stored.
- year (int): The year of the data to load.
- month (int): The month of the data to load.
- storage_options (dict, optional): Options for storage backends, e.g., for cloud storage.
- file_type (str, optional): The type of file to load (‘csv’ or ‘parquet’). Defaults to ‘csv’.

Returns: - DataFrame: A Dask DataFrame containing the loaded data.

wazeasy.utils.load_data_csv(main_path, year, month, storage_options=None)#

Load CSV data from a specified path for a given year and month.

Parameters:
- main_path (str): The main directory path where CSV files are stored.
- year (int): The year of the data to load.
- month (int): The month of the data to load.
- storage_options (dict, optional): Options for storage backends, e.g., for cloud storage.

Returns: - DataFrame: A Dask DataFrame containing the loaded CSV data.

wazeasy.utils.load_data_parquet(main_path, year, month, storage_options)#

Load parquet data from a specified path for a given year and month.

Parameters:
- main_path (str): The main directory path where parquet files are stored.
- year (int): The year of the data to load.
- month (int): The month of the data to load.
- storage_options (dict): Options for storage backends, e.g., for cloud storage.

Returns: - DataFrame: A Dask DataFrame containing the loaded parquet data.

wazeasy.utils.mean_hourly_tci(ddf, period, geog, agg_column, dates_of_interest)#

Calculate the mean hourly distribution of the Traffic Congestion Index (TCI), considering only the dates of interest.

Parameters:
- ddf (DataFrame): The Dask DataFrame containing traffic jam data.
- period (list): The period over which to aggregate data.
- geog (list): The geographical areas to consider.
- agg_column (str): The column to aggregate.
- dates_of_interest (list): Dates to consider for the calculation.

Returns: - Series: A Series with the mean TCI for each hour.
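
The averaging step can be sketched as follows, assuming that a TCI value is already available for each date and hour. How the TCI itself is computed is not covered here. The helper `mean_tci_per_hour` and the sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_tci_per_hour(rows, dates_of_interest):
    """Average precomputed TCI values per hour, using only the selected dates."""
    wanted = set(dates_of_interest)
    values_by_hour = defaultdict(list)
    for row in rows:
        if row["date"] in wanted:
            values_by_hour[row["hour"]].append(row["tci"])
    return {hour: mean(values) for hour, values in sorted(values_by_hour.items())}


# Made-up TCI values; the third row falls outside the dates of interest.
rows = [
    {"date": "2024-01-01", "hour": 8, "tci": 2.0},
    {"date": "2024-01-02", "hour": 8, "tci": 4.0},
    {"date": "2024-01-03", "hour": 8, "tci": 9.0},
    {"date": "2024-01-01", "hour": 9, "tci": 1.0},
]
hourly = mean_tci_per_hour(rows, ["2024-01-01", "2024-01-02"])
print(hourly)
```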

wazeasy.utils.mean_tci_geog(ddf, period, geog_id, dates, geogs, agg_column, projected_crs)#

Average the Traffic Congestion Index (TCI) for each geography across a period of time.

Parameters:
- ddf (DataFrame): The Dask DataFrame containing traffic jam data.
- period (list): The period over which to aggregate data.
- geog_id (str): The geographical identifier.
- dates (list): Dates to consider for the calculation.
- geogs (GeoDataFrame): Geographical areas to consider.
- agg_column (str): The column to aggregate.
- projected_crs (str): The coordinate reference system for projection.

Returns: - DataFrame: A DataFrame with the mean TCI for each geography.

wazeasy.utils.monthly_hourly_tci(ddf, geog, period, year, month, agg_column, dow=None)#

Calculate the hourly distribution of the monthly Traffic Congestion Index (TCI).

Parameters:
- ddf (DataFrame): The Dask DataFrame containing traffic jam data.
- geog (list): The geographical areas to consider.
- period (list): The period over which to aggregate data.
- year (int): The year of the data.
- month (int): The month of the data.
- agg_column (str): The column to aggregate.
- dow (list, optional): Days of the week to consider (0 = Monday, 6 = Sunday).

Returns: - Series: A Series with the monthly TCI for each hour.

wazeasy.utils.obtain_hexagons_for_area(area, resolution)#

Create a georeferenced layer of H3 hexagons for a given Area of Operation.

Parameters:
- area (Polygon): The area of operation as a Shapely Polygon.
- resolution (int): The resolution of the H3 hexagons.

Returns: - GeoDataFrame: A GeoDataFrame with H3 hexagons.

wazeasy.utils.obtain_unique_jams_linestrings(ddf)#

Get unique jam linestrings to avoid overlaying the same linestring multiple times.

Parameters: - ddf (DataFrame): The Dask DataFrame containing traffic jam data.

Returns: - GeoDataFrame: A GeoDataFrame with unique jam linestrings.

wazeasy.utils.overlay_group(group, hexagons)#

Perform an overlay between layers for delayed processes.

Parameters:
- group (GeoDataFrame): A GeoDataFrame group to overlay.
- hexagons (GeoDataFrame): A GeoDataFrame of hexagons to overlay with.

Returns: - GeoDataFrame: The result of the overlay operation.

wazeasy.utils.parallelized_overlay(ddf, aggregation_geog)#

Parallelize overlay by groups over some geometry.

Parameters:
- ddf (DataFrame): The Dask DataFrame containing traffic jam data.
- aggregation_geog (GeoDataFrame): The geographical areas for aggregation.

Returns: - GeoDataFrame: The result of the parallelized overlay operation.

wazeasy.utils.remove_last_comma(name)#

wazeasy.utils.remove_level5(ddf)#

Remove traffic jams with level 5 from the DataFrame, as these jams are associated with road closures.

Parameters: - ddf (DataFrame): The Dask DataFrame containing traffic jam data.

Returns: - DataFrame: A DataFrame excluding level 5 jams.
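
The filter amounts to a boolean mask on the jam level. The sketch below uses a small pandas DataFrame with made-up values and assumes the column is named ‘level’, as in the plotting functions; it is not necessarily how the package implements the function.

```python
import pandas as pd

# Made-up jam records; level 5 marks a road closure.
jams = pd.DataFrame({"uuid": ["a", "b", "c"], "level": [2, 5, 4]})

# Keep every row whose level is not 5.
# The same boolean-mask expression also works on a Dask DataFrame.
without_closures = jams[jams["level"] != 5]
print(without_closures)
```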

wazeasy.utils.tci_by_period_geography(ddf, period, geography, agg_column, dow=None, custom_dates=None)#

Calculate the Traffic Congestion Index (TCI) by period and geography.

Parameters:
- ddf (DataFrame): The Dask DataFrame containing traffic jam data.
- period (list): The period over which to aggregate data.
- geography (list): The geographical areas to consider.
- agg_column (str): The column to aggregate.
- dow (list, optional): Days of the week to consider (0 = Monday, 6 = Sunday). If provided, filtering by this parameter is applied first.
- custom_dates (list, optional): Specific dates to consider. If provided, filtering by this parameter is applied after filtering by dow (if dow is provided).

Returns: - DataFrame: A DataFrame with the TCI calculated.
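
The filter order described for `dow` and `custom_dates` can be sketched on a plain list of dates. The helper `select_dates` is hypothetical and covers only the date selection, not the TCI calculation.

```python
from datetime import date


def select_dates(dates, dow=None, custom_dates=None):
    """Filter by day of week first, then by custom dates, when each is given."""
    selected = list(dates)
    if dow is not None:
        allowed_days = set(dow)
        selected = [d for d in selected if d.weekday() in allowed_days]
    if custom_dates is not None:
        allowed_dates = set(custom_dates)
        selected = [d for d in selected if d in allowed_dates]
    return selected


# Friday, Saturday and Monday in January 2024.
dates = [date(2024, 1, 5), date(2024, 1, 6), date(2024, 1, 8)]

# The Saturday is in custom_dates, but the weekday filter has already dropped it.
result = select_dates(
    dates,
    dow=[0, 1, 2, 3, 4],
    custom_dates=[date(2024, 1, 6), date(2024, 1, 8)],
)
print(result)
```

Because both filters only remove dates, a date must pass both to be kept; passing neither argument leaves the input unchanged.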

wazeasy.utils.time_attributes(df)#

Calculate year, month, date, and hour for each jam record.

Parameters: - df (DataFrame): The DataFrame containing the data.

Returns: - None: Modifies the DataFrame in place.

Module contents#