wazeasy package
Submodules
wazeasy.plots module
- wazeasy.plots.hourly_tci_by_geog(ddf, geogs, geog, year, month, agg_column, dow, group_name, projected_crs, save_fig=False)
Plot the hourly Traffic Congestion Index (TCI), averaged over the month, by geography.
Parameters: - ddf (DataFrame): A Dask DataFrame containing traffic data. - geogs (GeoDataFrame): The geographies for which to calculate the TCI. - geog (str): The geographic column to group by. - year (int): The year of the data to plot. - month (int): The month of the data to plot. - agg_column (str): The column to aggregate. - dow (list): Days of the week to include (e.g. [0, 1, 2, 3, 4] for weekdays). - group_name (str): Label used in the plot title. - projected_crs (str): The coordinate reference system for projection. - save_fig (bool): Currently unused; reserved for future implementation.
Returns: - None: Displays the interactive Plotly figure.
- wazeasy.plots.hourly_tci_by_month(ddf, geog, combination_year_month, dow, group_name, save_fig=False)
Plot the hourly Traffic Congestion Index (TCI) for selected months.
Parameters: - ddf (DataFrame): A Dask DataFrame containing traffic data. - geog (str): The geographic column to group by. - combination_year_month (list of tuples): List of (year, month) pairs to plot. - dow (list): Days of the week to include (e.g. [0, 1, 2, 3, 4] for weekdays). - group_name (str): Label used in the plot title (e.g. region name). - save_fig (bool): Unused currently. Reserved for future implementation.
Returns: - None: Displays the interactive Plotly figure.
- wazeasy.plots.jams_monthly_aggregated(data, save_fig=False)
Plot the number of unique traffic jams aggregated by month.
Parameters: - data (DataFrame): A Dask DataFrame containing ‘year’, ‘month’, and ‘uuid’ columns. - save_fig (bool): If True, saves the plot as a PNG file. Default is False.
Returns: - None: Displays the plot and optionally saves it to a file.
- wazeasy.plots.jams_per_day(data, save_fig=False)
Plot the number of unique traffic jams per day.
Parameters: - data (DataFrame): A Dask DataFrame containing ‘date’ and ‘uuid’ columns. - save_fig (bool): If True, saves the plot as a PNG file. Default is False.
Returns: - None: Displays the plot and optionally saves it to a file.
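The daily figure can be read as the number of distinct `uuid` values per `date`. A minimal standard-library sketch of that aggregation, assuming records are plain dictionaries with the `date` and `uuid` keys named above; the helper `count_unique_jams_per_day` is hypothetical and is not part of wazeasy, which works on Dask DataFrames:

```python
from collections import defaultdict


def count_unique_jams_per_day(records):
    """Return {date: number of distinct uuids} for an iterable of dict records."""
    uuids_by_date = defaultdict(set)
    for record in records:
        # A set drops repeated reports of the same jam on the same day.
        uuids_by_date[record["date"]].add(record["uuid"])
    return {day: len(uuids) for day, uuids in sorted(uuids_by_date.items())}


# Hypothetical sample: jam "a" is reported twice on the first day.
sample = [
    {"date": "2024-01-01", "uuid": "a"},
    {"date": "2024-01-01", "uuid": "a"},
    {"date": "2024-01-01", "uuid": "b"},
    {"date": "2024-01-02", "uuid": "c"},
]
daily_counts = count_unique_jams_per_day(sample)
```

Counting distinct identifiers rather than rows matters when the same jam appears in several snapshots of the feed.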
- wazeasy.plots.jams_per_day_per_level(data, save_fig=False)
Plot the number of unique traffic jams per day, grouped by congestion level.
Parameters: - data (DataFrame): A Dask DataFrame containing ‘date’, ‘level’, and ‘uuid’ columns. - save_fig (bool): If True, saves the plot as a PNG file. Default is False.
Returns: - None: Displays the plot and optionally saves it to a file.
- wazeasy.plots.regional_tci_per_day(data, save_fig=False)
Plot the daily regional Traffic Congestion Index (TCI), aggregated at the area of operation level.
Parameters: - data (DataFrame): A Dask or Pandas DataFrame containing ‘date’, ‘region’, and ‘length’ columns. - save_fig (bool): If True, saves the plot as a PNG file. Default is False.
Returns: - None: Displays the plot and optionally saves it to a file.
wazeasy.utils module
- wazeasy.utils.assign_geography_to_jams(ddf)
Assign a geography to each traffic jam.
Parameters: - ddf (DataFrame): The Dask DataFrame containing traffic jam data.
Returns: - None: Modifies the DataFrame in place.
- wazeasy.utils.classify_jam_by_region(ddf, geogs, year, month, projected_crs, dow=None)
Important: filter the dataset as much as possible before running the spatial operation.
- wazeasy.utils.create_gdf(ddf)
Create a Dask-Geopandas GeoDataFrame from a Dask DataFrame.
Parameters: - ddf (DataFrame): The Dask DataFrame containing geographical data.
Returns: - GeoDataFrame: A GeoDataFrame with the geometry column set.
- wazeasy.utils.distribute_jams_over_aggregation_geom(gddf, ddf, projected_crs)
Distribute jams over aggregation geometry.
Parameters: - gddf (GeoDataFrame): The GeoDataFrame with jams and geometry. - ddf (DataFrame): The Dask DataFrame containing traffic jam data. - projected_crs (str): The coordinate reference system for projection.
Returns: - DataFrame: A DataFrame with jams distributed over the aggregation geometry.
- wazeasy.utils.filter_date_range_by_dow(date_range, dow)
Filter a date range by days of the week.
Parameters: - date_range (DatetimeIndex): The range of dates to filter. - dow (list): Days of the week to consider (0 = Monday, 6 = Sunday).
Returns: - list: A list of dates that match the specified days of the week.
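The day-of-week convention (0 = Monday, 6 = Sunday) matches Python's `date.weekday()`. The sketch below illustrates the filtering idea with the standard library only; it takes a plain list of `date` objects instead of a pandas `DatetimeIndex`, and the name `filter_dates_by_dow` is hypothetical, not the wazeasy implementation:

```python
from datetime import date, timedelta


def filter_dates_by_dow(dates, dow):
    """Keep only the dates whose weekday number is in dow (0 = Monday, 6 = Sunday)."""
    wanted = set(dow)
    # date.weekday() uses the same 0 = Monday convention as the dow parameter.
    return [d for d in dates if d.weekday() in wanted]


# One week starting on Monday 2024-01-01 (hypothetical sample range).
week = [date(2024, 1, 1) + timedelta(days=i) for i in range(7)]
weekdays = filter_dates_by_dow(week, [0, 1, 2, 3, 4])
weekend = filter_dates_by_dow(week, [5, 6])
```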
- wazeasy.utils.get_jam_count_per_segment(df)
Count how many jams occurred in each segment.
- wazeasy.utils.get_summary_statistics_city(ddf, year, working_days)
- wazeasy.utils.get_summary_statistics_street(df, street_names, year, working_days)
- wazeasy.utils.handle_time(df, utc_region, parquet=False)
Handle the time column so that it is expressed in the correct UTC region, and calculate the following time-related attributes: - year: Year of the record (numeric). - month: Month of the record (numeric, 1–12). - date: Calendar date (YYYY-MM-DD). - hour: Hour of the day in 24-hour format. - local_time: Timestamp converted to the specified UTC region.
Parameters: - df (DataFrame): The DataFrame containing the data. - utc_region (str): The UTC region to convert the time to. - parquet (bool, optional): Indicates if the data is in parquet format. Defaults to False.
Returns: - None: Modifies the DataFrame in place.
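As a rough illustration of the attributes listed above, the sketch below converts one UTC timestamp to a local time and derives `year`, `month`, `date`, and `hour` from it. Two assumptions are made here that the documentation does not confirm: the attributes are taken from the converted local time, and the region is given as a numeric hour offset (the real `utc_region` is a string whose format is not described). The helper `time_attributes_for` is hypothetical:

```python
from datetime import datetime, timedelta, timezone


def time_attributes_for(utc_timestamp, utc_offset_hours):
    """Convert a UTC timestamp to a fixed offset and derive the listed attributes."""
    # Assumption: a fixed numeric offset stands in for the library's utc_region.
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local_time = utc_timestamp.astimezone(local_zone)
    return {
        "year": local_time.year,
        "month": local_time.month,
        "date": local_time.date().isoformat(),  # YYYY-MM-DD
        "hour": local_time.hour,                # 24-hour format
        "local_time": local_time,
    }


# 02:30 UTC on 1 March 2024 is still 29 February, 21:30, at UTC-5.
record = time_attributes_for(datetime(2024, 3, 1, 2, 30, tzinfo=timezone.utc), -5)
```

The sample shows why the conversion has to happen before the attributes are derived: near midnight, the local date, month, and hour can all differ from their UTC counterparts.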
- wazeasy.utils.harmonize_data(table)
- wazeasy.utils.line_to_segments(x)
Break linestrings into individual segments.
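The idea of splitting a line into segments can be sketched on plain coordinate tuples: each vertex is paired with the next one. This is only an illustration; the real function presumably operates on geometry objects, and `coords_to_segments` is a hypothetical name:

```python
def coords_to_segments(coords):
    """Split an ordered list of (x, y) points into consecutive two-point segments."""
    # Pair each vertex with the next one: n vertices yield n - 1 segments.
    return list(zip(coords[:-1], coords[1:]))


# Hypothetical three-vertex line.
line = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
segments = coords_to_segments(line)
```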
- wazeasy.utils.load_data(main_path, year, month, storage_options=None, file_type='csv')
Load data from a specified path for a given year and month.
Parameters: - main_path (str): The main directory path where data files are stored. - year (int): The year of the data to load. - month (int): The month of the data to load. - storage_options (dict, optional): Options for storage backends, e.g., for cloud storage. - file_type (str, optional): The type of file to load (‘csv’ or ‘parquet’). Defaults to ‘csv’.
Returns: - DataFrame: A Dask DataFrame containing the loaded data.
- wazeasy.utils.load_data_csv(main_path, year, month, storage_options=None)
Load CSV data from a specified path for a given year and month.
Parameters: - main_path (str): The main directory path where CSV files are stored. - year (int): The year of the data to load. - month (int): The month of the data to load. - storage_options (dict, optional): Options for storage backends, e.g., for cloud storage.
Returns: - DataFrame: A Dask DataFrame containing the loaded CSV data.
- wazeasy.utils.load_data_parquet(main_path, year, month, storage_options)
Load parquet data from a specified path for a given year and month.
Parameters: - main_path (str): The main directory path where parquet files are stored. - year (int): The year of the data to load. - month (int): The month of the data to load. - storage_options (dict): Options for storage backends, e.g., for cloud storage.
Returns: - DataFrame: A Dask DataFrame containing the loaded parquet data.
- wazeasy.utils.mean_hourly_tci(ddf, period, geog, agg_column, dates_of_interest)
Calculate the mean hourly distribution of the Traffic Congestion Index (TCI), considering only the dates of interest.
Parameters: - ddf (DataFrame): The Dask DataFrame containing traffic jam data. - period (list): The period over which to aggregate data. - geog (list): The geographical areas to consider. - agg_column (str): The column to aggregate. - dates_of_interest (list): Dates to consider for the calculation.
Returns: - Series: A Series with the mean TCI for each hour.
- wazeasy.utils.mean_tci_geog(ddf, period, geog_id, dates, geogs, agg_column, projected_crs)
Average the Traffic Congestion Index (TCI) for each geography across a period of time.
Parameters: - ddf (DataFrame): The Dask DataFrame containing traffic jam data. - period (list): The period over which to aggregate data. - geog_id (str): The geographical identifier. - dates (list): Dates to consider for the calculation. - geogs (GeoDataFrame): Geographical areas to consider. - agg_column (str): The column to aggregate. - projected_crs (str): The coordinate reference system for projection.
Returns: - DataFrame: A DataFrame with the mean TCI for each geography.
- wazeasy.utils.monthly_hourly_tci(ddf, geog, period, year, month, agg_column, dow=None)
Calculate the hourly distribution of the monthly Traffic Congestion Index (TCI).
Parameters: - ddf (DataFrame): The Dask DataFrame containing traffic jam data. - geog (list): The geographical areas to consider. - period (list): The period over which to aggregate data. - year (int): The year of the data. - month (int): The month of the data. - agg_column (str): The column to aggregate. - dow (list, optional): Days of the week to consider (0 = Monday, 6 = Sunday).
Returns: - Series: A Series with the monthly TCI for each hour.
- wazeasy.utils.obtain_hexagons_for_area(area, resolution)
Create a georeferenced layer of H3 hexagons for a given Area of Operation.
Parameters: - area (Polygon): The area of operation as a Shapely Polygon. - resolution (int): The resolution of the H3 hexagons.
Returns: - GeoDataFrame: A GeoDataFrame with H3 hexagons.
- wazeasy.utils.obtain_unique_jams_linestrings(ddf)
Get unique jam linestrings to avoid overlaying the same linestring multiple times.
Parameters: - ddf (DataFrame): The Dask DataFrame containing traffic jam data.
Returns: - GeoDataFrame: A GeoDataFrame with unique jam linestrings.
- wazeasy.utils.overlay_group(group, hexagons)
Perform an overlay between layers for delayed processes.
Parameters: - group (GeoDataFrame): A GeoDataFrame group to overlay. - hexagons (GeoDataFrame): A GeoDataFrame of hexagons to overlay with.
Returns: - GeoDataFrame: The result of the overlay operation.
- wazeasy.utils.parallelized_overlay(ddf, aggregation_geog)
Parallelize overlay by groups over some geometry.
Parameters: - ddf (DataFrame): The Dask DataFrame containing traffic jam data. - aggregation_geog (GeoDataFrame): The geographical areas for aggregation.
Returns: - GeoDataFrame: The result of the parallelized overlay operation.
- wazeasy.utils.remove_last_comma(name)
- wazeasy.utils.remove_level5(ddf)
Remove traffic jams with level 5 from the DataFrame, as these jams are associated with road closures.
Parameters: - ddf (DataFrame): The Dask DataFrame containing traffic jam data.
Returns: - DataFrame: A DataFrame excluding level 5 jams.
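The filter amounts to keeping every record whose level is not 5. A minimal sketch on plain dictionaries, assuming the level is stored under a `level` key as in the plotting functions above; `drop_level5` is a hypothetical stand-in, not the wazeasy function:

```python
def drop_level5(records):
    """Return a new list with the records whose congestion level is not 5."""
    # Level 5 marks road closures, so those records are excluded.
    return [record for record in records if record["level"] != 5]


# Hypothetical sample with one road closure (level 5).
sample = [
    {"uuid": "a", "level": 3},
    {"uuid": "b", "level": 5},
    {"uuid": "c", "level": 1},
]
kept = drop_level5(sample)
```

Like the documented function, the sketch returns a filtered copy and leaves its input unchanged.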
- wazeasy.utils.tci_by_period_geography(ddf, period, geography, agg_column, dow=None, custom_dates=None)
Calculate the Traffic Congestion Index (TCI) by period and geography.
Parameters: - ddf (DataFrame): The Dask DataFrame containing traffic jam data. - period (list): The period over which to aggregate data. - geography (list): The geographical areas to consider. - agg_column (str): The column to aggregate. - dow (list, optional): Days of the week to consider (0 = Monday, 6 = Sunday). If provided, filtering by this parameter is applied first. - custom_dates (list, optional): Specific dates to consider. If provided, filtering by this parameter is applied after filtering by dow (if dow is provided).
Returns: - DataFrame: A DataFrame with the TCI calculated.
- wazeasy.utils.time_attributes(df)
Calculate year, month, date, and hour for each jam record.
Parameters: - df (DataFrame): The DataFrame containing the data.
Returns: - None: Modifies the DataFrame in place.