scatcluster.analysis.external_correlation

External Correlation analysis module.

Classes

ExternalCorrelation

Module Contents

class scatcluster.analysis.external_correlation.ExternalCorrelation
plot_external_correlation(df_predictions: pandas.DataFrame, df_external: pandas.DataFrame, metric_list: List[str], title: str = None, **kwargs)

Plot external correlation between predictions and external data for each metric in the list.

Parameters:
  • df_predictions (pd.DataFrame) – DataFrame containing predictions.

  • df_external (pd.DataFrame) – DataFrame containing external data.

  • metric_list (List[str]) – List of metrics to plot.

  • title (str, optional) – Title of the plot. Defaults to None.

  • **kwargs – Additional keyword arguments for plt.subplots.

merge_same_time_duration(df_predictions: pandas.DataFrame, df_external: pandas.DataFrame)

Merge two dataframes based on the same time duration.

Parameters:
  • df_predictions (pd.DataFrame) – The dataframe containing the predictions.

  • df_external (pd.DataFrame) – The dataframe containing the external data.

Returns:

The merged dataframe with the same time duration.

Return type:

pd.DataFrame

This function truncates df_predictions and df_external to the period they share and then merges them. The shared period runs from the later of the two start dates to the earlier of the two end dates. Only rows within that period are kept in each dataframe. The truncated dataframes are then merged on the ‘dates’ column with a left join, so every remaining row of df_predictions is retained. The merged dataframe is returned.
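
The steps described above can be sketched with plain pandas. This is a minimal illustration under stated assumptions, not the module's actual code: the helper name `merge_common_window` and the toy columns `cluster` and `outTemp` are hypothetical, and only the ‘dates’ column name is taken from the description.

```python
import pandas as pd


def merge_common_window(df_predictions: pd.DataFrame, df_external: pd.DataFrame) -> pd.DataFrame:
    """Truncate both frames to their overlapping date range, then left-join on 'dates'."""
    # The common window starts at the later of the two start dates
    # and ends at the earlier of the two end dates.
    start = max(df_predictions["dates"].min(), df_external["dates"].min())
    end = min(df_predictions["dates"].max(), df_external["dates"].max())

    preds = df_predictions[df_predictions["dates"].between(start, end)]
    ext = df_external[df_external["dates"].between(start, end)]

    # Left join: every prediction row inside the window is kept; external
    # columns are NaN where no matching timestamp exists.
    return preds.merge(ext, on="dates", how="left")


# Hypothetical toy data: hourly predictions, two-hourly external readings.
df_predictions = pd.DataFrame({
    "dates": pd.date_range("2024-01-01 00:00", periods=6, freq=pd.Timedelta(hours=1)),
    "cluster": [0, 1, 1, 0, 2, 2],
})
df_external = pd.DataFrame({
    "dates": pd.date_range("2024-01-01 02:00", periods=6, freq=pd.Timedelta(hours=2)),
    "outTemp": [1.0, 1.5, 2.0, 2.5, 3.0, 3.5],
})
df_merge = merge_common_window(df_predictions, df_external)
```

Because of the left join, prediction timestamps without an external reading carry NaN in the external columns, which is the kind of gap interpolate_missing_values is meant to fill.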

interpolate_missing_values(df_merge: pandas.DataFrame, target_cols)

Interpolate missing values in the specified columns of a merged dataframe.

Parameters:
  • df_merge (pd.DataFrame) – The merged dataframe containing the target columns.

  • target_cols (List[str], optional) – The list of columns to interpolate missing values for. Defaults to [‘outTemp’, ‘outHumidity’, ‘barometer’, ‘windSpeed’, ‘windDir’, ‘windGust’, ‘windGustDir’, ‘rain’, ‘rainRate’, ‘dewpoint’].

Returns:

The merged dataframe with interpolated missing values in the specified columns.

Return type:

pd.DataFrame
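
As a rough illustration of what filling gaps in the target columns can look like, the sketch below uses pandas' linear interpolation. The interpolation method is an assumption (the documentation above does not state which one is used), and the helper name `interpolate_columns` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def interpolate_columns(df_merge: pd.DataFrame, target_cols: list) -> pd.DataFrame:
    """Fill gaps in the target columns by linear interpolation (an assumed method)."""
    df_out = df_merge.copy()
    # limit_direction="both" also fills leading and trailing gaps,
    # using the nearest valid value at each edge.
    df_out[target_cols] = df_out[target_cols].interpolate(
        method="linear", limit_direction="both"
    )
    return df_out


# Hypothetical merged frame with gaps in two weather columns.
df_merge = pd.DataFrame({
    "cluster": [0, 1, 1, 0, 2],
    "outTemp": [1.0, np.nan, 2.0, np.nan, 3.0],
    "barometer": [np.nan, 1010.0, np.nan, 1012.0, np.nan],
})
df_filled = interpolate_columns(df_merge, ["outTemp", "barometer"])
```

Columns not listed in `target_cols`, such as `cluster` here, are left untouched.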

detection_rate(df_merge: pandas.DataFrame, rolling_window_size: int = 48, plot_detection: bool = True, title: str = None)

Calculate the detection rate of clusters against weather data.

Parameters:
  • df_merge (pd.DataFrame) – The merged dataframe containing the clusters and weather data.

  • rolling_window_size (int, optional) – The size of the rolling window for smoothing the data. Defaults to 48.

  • plot_detection (bool, optional) – Whether to plot the detection rate. Defaults to True.

  • title (str, optional) – The title of the plot. Defaults to None.

Returns:

The dataframe containing the detection rate for each cluster.

Return type:

pd.DataFrame
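
The documentation does not define how the detection rate is computed. One plausible reading, sketched below purely as an assumption, is the share of samples inside each rolling window that are assigned to a given cluster. The helper name `rolling_detection_rate`, the `cluster_` column prefix and the toy labels are all hypothetical.

```python
import pandas as pd


def rolling_detection_rate(labels: pd.Series, rolling_window_size: int = 48) -> pd.DataFrame:
    """Per-cluster share of samples within a rolling window (assumed definition)."""
    # One indicator column per cluster: 1.0 where the sample belongs to that cluster.
    one_hot = pd.get_dummies(labels, prefix="cluster").astype(float)
    # The rolling mean of an indicator is the fraction of the window
    # assigned to that cluster; min_periods=1 avoids NaN at the start.
    return one_hot.rolling(rolling_window_size, min_periods=1).mean()


labels = pd.Series([0, 0, 1, 1, 1, 0])
rates = rolling_detection_rate(labels, rolling_window_size=2)
```

Under this definition the rates in each row sum to 1, since every sample belongs to exactly one cluster.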

external_data_smoothing_scaling(df_merge: pandas.DataFrame, target_cols: List[str], rolling_window_size: int = 48, plot_external_data_smoothing: bool = True, title: str = None)

Applies smoothing and scaling to external data.

Parameters:
  • df_merge (pd.DataFrame) – The merged dataframe containing the external data.

  • target_cols (List[str]) – The columns of the external data to be smoothed and scaled.

  • rolling_window_size (int, optional) – The size of the rolling window for smoothing. Defaults to 48.

  • plot_external_data_smoothing (bool, optional) – Whether to plot the smoothed and scaled data. Defaults to True.

  • title (str, optional) – The title for the plot. Defaults to None.

Returns:

The smoothed and scaled external data.

Return type:

pd.DataFrame
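
A minimal sketch of smoothing followed by scaling, assuming a rolling mean for the smoothing step and min-max scaling to [0, 1] for the scaling step. Neither choice is confirmed by the documentation above; the helper name `smooth_and_scale` and the toy data are hypothetical.

```python
import pandas as pd


def smooth_and_scale(df_merge: pd.DataFrame, target_cols: list,
                     rolling_window_size: int = 48) -> pd.DataFrame:
    """Rolling-mean smoothing followed by per-column min-max scaling (assumed steps)."""
    # min_periods=1 keeps the first rows instead of producing NaN.
    smoothed = df_merge[target_cols].rolling(rolling_window_size, min_periods=1).mean()
    # Scale each column to [0, 1]. A constant column would divide by zero
    # and yield NaN, so real code may need to guard against that case.
    col_min = smoothed.min()
    col_range = smoothed.max() - col_min
    return (smoothed - col_min) / col_range


df_merge = pd.DataFrame({
    "outTemp": [0.0, 2.0, 4.0, 6.0, 8.0, 10.0],
    "windSpeed": [5.0, 3.0, 5.0, 3.0, 5.0, 3.0],
})
df_scaled = smooth_and_scale(df_merge, ["outTemp", "windSpeed"], rolling_window_size=2)
```

Scaling after smoothing puts variables with different units, such as temperature and wind speed, on a common range before they are compared with cluster data.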

external_correlation(df_clusters_smoothed, df_weather_smoothed_scaled, target_cols)

Calculates the external correlation between the smoothed clusters data and the smoothed and scaled weather data.

Parameters:
  • df_clusters_smoothed (pd.DataFrame) – The smoothed-clusters data.

  • df_weather_smoothed_scaled (pd.DataFrame) – The smoothed and scaled weather data.

  • target_cols (list) – The list of target columns to calculate the correlation for.

Returns:

A tuple containing two dictionaries. The first dictionary contains the scores of each target column, and the second dictionary contains the regression models for each target column.

Return type:

tuple
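
To make the shape of the return value concrete, the sketch below fits one regression per target column and returns a scores dictionary and a models dictionary. The model type used by the module is not documented here, so this example swaps in ordinary least squares via `numpy.linalg.lstsq` and uses R² as the score. The helper name `fit_per_target`, the `cluster_*` column names and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd


def fit_per_target(df_clusters_smoothed, df_weather_smoothed_scaled, target_cols):
    """Fit one least-squares model per target column; return (scores, models)."""
    # Design matrix: one column per cluster plus a trailing intercept column.
    X = df_clusters_smoothed.to_numpy(dtype=float)
    X = np.column_stack([X, np.ones(len(X))])

    scores, models = {}, {}
    for col in target_cols:
        y = df_weather_smoothed_scaled[col].to_numpy(dtype=float)
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        residual = y - X @ coef
        ss_res = float(residual @ residual)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        scores[col] = 1.0 - ss_res / ss_tot  # coefficient of determination (R^2)
        models[col] = coef                   # cluster weights followed by the intercept
    return scores, models


# Synthetic data: outTemp is an exact linear function of the first two clusters.
rng = np.random.default_rng(0)
df_clusters = pd.DataFrame(rng.random((50, 3)),
                           columns=["cluster_0", "cluster_1", "cluster_2"])
df_weather = pd.DataFrame({
    "outTemp": 0.5 * df_clusters["cluster_0"] + 0.2 * df_clusters["cluster_1"] + 0.1,
})
SCORE, REGRESSION = fit_per_target(df_clusters, df_weather, ["outTemp"])
```

In this sketch the stored "model" is just a coefficient vector; the dictionaries are keyed by target column, mirroring the SCORE and REGRESSION arguments taken by the plotting methods.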

min_max_vector(vector)

Normalizes a vector by scaling its values between 0 and 1.

Parameters:

vector (numpy.ndarray) – The vector to be normalized.

Returns:

The normalized vector.

Return type:

numpy.ndarray
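
Min-max normalization maps the smallest value to 0 and the largest to 1 by subtracting the minimum and dividing by the range. The standalone sketch below shows the arithmetic; the function name `min_max` is hypothetical and is not the method above.

```python
import numpy as np


def min_max(vector: np.ndarray) -> np.ndarray:
    """Scale values linearly so that the minimum maps to 0 and the maximum to 1."""
    v_min, v_max = vector.min(), vector.max()
    # A constant vector has zero range and would divide by zero here.
    return (vector - v_min) / (v_max - v_min)


scaled = min_max(np.array([2.0, 4.0, 6.0, 10.0]))
# scaled is [0.0, 0.25, 0.5, 1.0]
```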

plot_external_data_predicted_actual(df_clusters_smoothed, df_weather_smoothed_scaled, REGRESSION, target_cols)

Plot the actual and predicted weather data for each target column.

Parameters:
  • df_clusters_smoothed (DataFrame) – The smoothed-clusters data.

  • df_weather_smoothed_scaled (DataFrame) – The smoothed and scaled weather data.

  • REGRESSION (dict) – A dictionary containing the regression models for each target column.

  • target_cols (list) – A list of target columns.

plot_detection_actual_prediction(target_cols, df_clusters_smoothed, df_weather_smoothed_scaled, REGRESSION)

Plot the actual and predicted detection rate data for each target column.

Parameters:
  • target_cols (list) – A list of target columns.

  • df_clusters_smoothed (DataFrame) – The smoothed-clusters data.

  • df_weather_smoothed_scaled (DataFrame) – The smoothed and scaled weather data.

  • REGRESSION (dict) – A dictionary containing the regression models for each target column.

plot_regression_coefficients_contributions(target_cols, REGRESSION)

Plots the regression coefficient contributions for the specified target columns.

Parameters:
  • target_cols (list) – A list of target columns.

  • REGRESSION (dict) – A dictionary containing the regression models for each target column.

plot_regression_coefficients(df_clusters_smoothed, df_weather_smoothed_scaled, REGRESSION, SCORE, target_cols)

Plots the regression coefficients for the specified target columns.

Parameters:
  • df_clusters_smoothed (DataFrame) – The smoothed-clusters data.

  • df_weather_smoothed_scaled (DataFrame) – The smoothed and scaled weather data.

  • REGRESSION (dict) – A dictionary containing the regression models for each target column.

  • SCORE (dict) – A dictionary containing the scores for each regression model.

  • target_cols (list) – A list of target columns.