scatcluster.analysis.crosstab

Cross-Tab analysis module.

Classes

SSNCrossTabAnalysis

ScatCluster Add-on to run cross-analysis of different clusterings

Module Contents

class scatcluster.analysis.crosstab.SSNCrossTabAnalysis(data_savepath: str = '/home/jovyan/shared/users/zerafa/data/sds.chris/scatcluster-sds/', data_network: str = 'ET', data_station: str = 'SOE0', data_location: str = '', network_sampling_rate_banks_pooling: str = '50_4_4_2_7_1_1_avg', ica_number: int = 10)

ScatCluster Add-on to run cross-analysis of different clusterings

data_savepath
data_network
data_station
data_location
network_sampling_rate_banks_pooling
ica_number
_get_predictions(win_size, num_clusters)

Retrieves predictions from a file based on provided window size and number of clusters.

Parameters:
  • self – SSNCrossTabAnalysis object

  • win_size (int) – Size of the window

  • num_clusters (int) – Number of clusters

Returns:

Predictions from the file

Return type:

numpy array

_build_crosstab_data(cluster_1=3600, num_clusters_1=10, cluster_2=60, num_clusters_2=10, normalization=None)

Builds a crosstab data table based on the given parameters.

Parameters:
  • cluster_1 (int) – The size of the first window cluster. Defaults to 3600.

  • num_clusters_1 (int) – The number of clusters for the first window cluster. Defaults to 10.

  • cluster_2 (int) – The size of the second window cluster. Defaults to 60.

  • num_clusters_2 (int) – The number of clusters for the second window cluster. Defaults to 10.

  • normalization (str) – The normalization to apply to the crosstab data (see plot_clustering_crosstab for the accepted values). Defaults to None.

Returns:

  • ct_data (pandas DataFrame) – The crosstab data table.

  • factor_difference (int) – The factor difference between the two window clusters.

  • fowlkes_mallows_score (float) – The Fowlkes-Mallows score.

Return type:

tuple
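The three return values can be illustrated with a minimal, standard-library sketch. It is not the module's implementation: the helper names (build_crosstab, fowlkes_mallows) are hypothetical, and it assumes that factor_difference is the ratio of the two window sizes and that each label of the coarser clustering is repeated that many times so both label sequences line up. In practice pandas.crosstab and sklearn.metrics.fowlkes_mallows_score provide equivalent functionality.

```python
from collections import Counter
from math import comb, sqrt


def build_crosstab(labels_coarse, labels_fine, factor_difference):
    """Count co-occurrences of coarse and fine cluster labels.

    Assumption: each coarse label is repeated `factor_difference` times
    so that both sequences refer to the same fine windows.
    """
    expanded = [lab for lab in labels_coarse for _ in range(factor_difference)]
    n = min(len(expanded), len(labels_fine))
    return Counter(zip(expanded[:n], labels_fine[:n]))


def fowlkes_mallows(table):
    """Fowlkes-Mallows score computed from contingency counts."""
    rows, cols = Counter(), Counter()
    for (i, j), count in table.items():
        rows[i] += count
        cols[j] += count
    # Pairs of windows grouped together in both clusterings.
    together_both = sum(comb(c, 2) for c in table.values())
    # Pairs grouped together in the first (rows) or second (columns) clustering.
    together_rows = sum(comb(c, 2) for c in rows.values())
    together_cols = sum(comb(c, 2) for c in cols.values())
    if together_rows == 0 or together_cols == 0:
        return 0.0
    return together_both / sqrt(together_rows * together_cols)


# Hypothetical window sizes chosen to keep the example small.
cluster_1, cluster_2 = 120, 60
factor_difference = cluster_1 // cluster_2  # 2

coarse = [0, 0, 1]          # labels for three coarse windows
fine = [0, 0, 1, 1, 2, 2]   # labels for six fine windows

table = build_crosstab(coarse, fine, factor_difference)
score = fowlkes_mallows(table)
print(dict(table), round(score, 4))
```

A score of 1.0 means the two clusterings group the windows identically; values near 0 mean they share almost no co-clustered pairs.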

plot_clustering_crosstab(cluster_1=3600, num_clusters_1=10, cluster_2=60, num_clusters_2=10, normalization=None, **kwargs)

Generates a clustering crosstab plot based on the provided parameters.

Parameters:
  • cluster_1 (int, optional) – The size of the first window cluster. Defaults to 3600.

  • num_clusters_1 (int, optional) – The number of clusters in the first window cluster. Defaults to 10.

  • cluster_2 (int, optional) – The size of the second window cluster. Defaults to 60.

  • num_clusters_2 (int, optional) – The number of clusters in the second window cluster. Defaults to 10.

  • normalization (str, optional) – The type of normalization to apply to the crosstab data. Can be ‘None’, ‘all’, ‘index’, or ‘columns’. Defaults to None.

  • **kwargs – Additional keyword arguments to be passed to the plt.subplots function.

Raises:

ValueError – If the provided normalization is not one of the valid options.

Note

This function generates a heatmap plot using the sns.heatmap function from the seaborn library. The heatmap shows the crosstab data between the two window clusters. The x-axis label indicates the size and number of clusters in the second window cluster, while the y-axis label indicates the size and number of clusters in the first window cluster. The title of the plot includes the Fowlkes-Mallows similarity score between the two clusterings. The plot is saved as a PNG file with a filename based on the provided parameters.
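The effect of the normalization parameter can be sketched as follows. This is an illustration under stated assumptions rather than the module's code: normalize_crosstab is a hypothetical helper, the table is a plain list of rows, None is treated as the Python None (no normalization), and empty rows or columns are mapped to 0.0 instead of raising.

```python
VALID_NORMALIZATIONS = (None, "all", "index", "columns")


def _safe_div(value, total):
    # Assumption: an empty row/column yields 0.0 rather than an error.
    return value / total if total else 0.0


def normalize_crosstab(table, normalization=None):
    """Normalize a contingency table given as a list of rows."""
    if normalization not in VALID_NORMALIZATIONS:
        raise ValueError(
            f"normalization must be one of {VALID_NORMALIZATIONS}, "
            f"got {normalization!r}"
        )
    if normalization is None:
        # Raw counts, returned as a copy.
        return [list(row) for row in table]
    if normalization == "all":
        # Every cell divided by the grand total.
        total = sum(sum(row) for row in table)
        return [[_safe_div(v, total) for v in row] for row in table]
    if normalization == "index":
        # Each row sums to 1.
        return [[_safe_div(v, sum(row)) for v in row] for row in table]
    # "columns": each column sums to 1.
    col_sums = [sum(col) for col in zip(*table)]
    return [[_safe_div(v, s) for v, s in zip(row, col_sums)] for row in table]


counts = [[2, 2, 0],
          [0, 0, 2]]
by_row = normalize_crosstab(counts, "index")
by_col = normalize_crosstab(counts, "columns")
print(by_row)
print(by_col)
```

Row normalization answers "how is each cluster of the first clustering distributed over the second", while column normalization answers the reverse question.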

_custom_dendrogram_crosstab(linkage: numpy.array, ax: matplotlib.axes.Axes, depth: int = 30, orientation='left', factor=1)

Generate a custom dendrogram for a crosstab plot.

Parameters:
  • linkage (np.array) – The linkage array generated by the hierarchical clustering algorithm.

  • ax (Axes) – The matplotlib Axes object to plot the dendrogram on.

  • depth (int, optional) – The depth of the dendrogram to display. Defaults to 30.

  • orientation (str, optional) – The orientation of the dendrogram. Can be ‘left’ or ‘bottom’. Defaults to ‘left’.

  • factor (int, optional) – A factor to multiply the population labels. Defaults to 1.

Description:

This function generates a custom dendrogram for a crosstab plot using the linkage array generated by the hierarchical clustering algorithm. The dendrogram is plotted on the provided matplotlib Axes object. The depth of the dendrogram, the orientation, and the population labels can be customized.

Notes

  • The dendrogram is generated using the hierarchy.dendrogram function from the scipy.cluster.hierarchy module.

  • The coordinates and population labels of the leaf nodes are extracted from the dendrogram information.

  • The leaf nodes are plotted on the Axes object using a scatter plot.

  • The population labels are formatted and displayed next to the leaf nodes.
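The notes above can be approximated with the sketch below, which extracts leaf positions and populations from a truncated dendrogram without plotting. It assumes that depth maps to the p argument of hierarchy.dendrogram with truncate_mode="lastp" and that factor simply scales the leaf counts; the actual method may differ. The helper name truncated_leaf_info and the random test data are hypothetical.

```python
import numpy as np
from scipy.cluster import hierarchy


def truncated_leaf_info(linkage, depth=30, factor=1):
    """Return leaf positions and scaled populations of a truncated dendrogram."""
    info = hierarchy.dendrogram(
        linkage,
        p=depth,
        truncate_mode="lastp",   # assumption: keep only the last `depth` merges
        show_leaf_counts=True,
        no_plot=True,            # compute the layout only, draw nothing
    )
    populations = []
    for label in info["ivl"]:
        # Contracted leaves are labelled "(count)"; singletons carry their index.
        count = int(label.strip("()")) if label.startswith("(") else 1
        populations.append(count * factor)
    # SciPy places the i-th leaf at coordinate 10 * i + 5 along the leaf axis.
    positions = [10 * i + 5 for i in range(len(populations))]
    return positions, populations


rng = np.random.default_rng(0)
points = rng.normal(size=(40, 2))
linkage = hierarchy.linkage(points, method="ward")

positions, populations = truncated_leaf_info(linkage, depth=5)
print(positions, populations)
```

The returned positions and populations could then be passed to ax.scatter and ax.text to draw and annotate the leaf nodes.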

_get_linkage(win_size)

Retrieves the linkage matrix from a file based on the provided window size.

Parameters:

win_size (int) – The size of the window.

Returns:

The linkage matrix.

Return type:

numpy.ndarray

plot_crosstab_dendrograms(cluster_1=3600, num_clusters_1=10, cluster_2=60, num_clusters_2=10, normalization=None, **kwargs)

Plot crosstab dendrograms to compare clustering predictions.

Parameters:
  • cluster_1 (int, optional) – The size of the first window. Defaults to 3600.

  • num_clusters_1 (int, optional) – The number of clusters for the first window. Defaults to 10.

  • cluster_2 (int, optional) – The size of the second window. Defaults to 60.

  • num_clusters_2 (int, optional) – The number of clusters for the second window. Defaults to 10.

  • normalization (str, optional) – The type of normalization to apply to the crosstab data (see plot_clustering_crosstab for the accepted values). Defaults to None.

  • **kwargs – Additional keyword arguments to be passed to plt.subplots().

This function plots crosstab dendrograms to compare clustering predictions. It first builds the crosstab data using the _build_crosstab_data() method, then creates a figure with two rows and two columns of subplots using plt.subplots().

The crosstab data is plotted using sns.heatmap() with the specified parameters. The x-axis label, y-axis label, and title of the subplot are set accordingly. The dendrogram for the first window is plotted using the _custom_dendrogram_crosstab() method. The dendrogram for the second window is plotted using the same method.