NNSOM package#

Submodules#

NNSOM.som module#

class NNSOM.som.SOM(dimensions)#

Bases: object

A class to represent a Self-Organizing Map (SOM), a type of artificial neural network trained using unsupervised learning to produce a two-dimensional, discretized representation of the input space of the training samples.

dimensions#

The dimensions of the SOM grid. Determines the layout and number of neurons in the map.

Type:

tuple, list, or array-like

numNeurons#

The total number of neurons in the SOM, calculated as the product of the dimensions.

Type:

int
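As a quick illustration of the relationship described above, the neuron count of a grid is the product of its dimensions. This is a plain-Python sketch, not NNSOM code; the helper name `num_neurons` is hypothetical.

```python
import math


def num_neurons(dimensions):
    """Return the number of neurons in a grid with the given dimensions."""
    return math.prod(dimensions)


print(num_neurons((5, 5)))   # 25
print(num_neurons([10, 8]))  # 80
```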

pos#

The positions of the neurons in the SOM grid.

Type:

array-like

neuron_dist#

The distances between neurons in the SOM.

Type:

array-like

w#

The weight matrix of the SOM, representing the feature vectors of the neurons.

Type:

array-like

sim_flag#

A flag indicating whether the SOM has been simulated or not.

Type:

bool

__init__(self, dimensions):

Initializes the SOM with the specified dimensions.

init_w(self, x):

Initializes the weights of the SOM using principal components analysis on the input data x.

sim_som(self, x):

Simulates the SOM with x as the input, determining which neurons are activated by the input vectors.

train(self, x, init_neighborhood=3, epochs=200, steps=100):

Trains the SOM using the batch SOM algorithm on the input data x.

quantization_error(self, dist)#

Calculate quantization error

topological_error(self, data)#

Calculate 1st and 1st-2nd topological error

distortion_error(self, data)#

Calculate distortion error

save_pickle(self, filename, path, data_format='pkl'):

Saves the SOM object to a file using the pickle format.

load_pickle(self, filename, path, data_format='pkl'):

Loads a SOM object from a file using the pickle format.

cluster_data(x)#

Cluster the input data based on the trained SOM reference vectors.

Parameters:

x (ndarray (normalized)) – The input data to be clustered.

Returns:

  • clusters (list of lists) – A list containing sub-lists, where each sublist represents a cluster. The indices of the input data points belonging to the same cluster are stored in the corresponding sublist, sorted by their proximity to the cluster center.

  • cluster_distances (list of lists) – A list containing sub-lists, where each sublist represents the distances of the input data points to the corresponding cluster center, sorted in the same order as the indices in the clusters list.

  • max_cluster_distances (ndarray) – An array containing the maximum distance between each cluster center and the data points belonging to that cluster.

  • cluster_sizes (ndarray) – An array containing the number of data points in each cluster.

Raises:
  • ValueError – If the SOM has not been trained.

  • ValueError – If the number of features in the input data and the SOM weights do not match.
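The four return values can be pictured with a small dependency-free sketch. It assumes that each data point is assigned to its nearest reference vector by Euclidean distance, which the description above implies but does not spell out. The function name `cluster_by_nearest` is hypothetical, and the sketch returns plain lists where the real method documents ndarrays.

```python
import math


def cluster_by_nearest(data, weights):
    """Group sample indices by nearest weight vector, closest samples first."""
    members = [[] for _ in weights]  # (distance, sample index) pairs per neuron
    for i, x in enumerate(data):
        dists = [math.dist(x, w) for w in weights]
        j = dists.index(min(dists))  # index of the nearest reference vector
        members[j].append((dists[j], i))

    clusters, cluster_distances = [], []
    for pairs in members:
        pairs.sort()  # ascending distance to the cluster center
        clusters.append([i for _, i in pairs])
        cluster_distances.append([d for d, _ in pairs])

    # Empty clusters get a maximum distance of 0.0 (an arbitrary choice here).
    max_cluster_distances = [max(d) if d else 0.0 for d in cluster_distances]
    cluster_sizes = [len(c) for c in clusters]
    return clusters, cluster_distances, max_cluster_distances, cluster_sizes


weights = [(0, 0), (10, 10)]
data = [(1, 0), (0, 0), (9, 10), (0, 2)]
clusters, dists, max_dists, sizes = cluster_by_nearest(data, weights)
print(clusters)   # [[1, 0, 3], [2]]
print(max_dists)  # [2.0, 1.0]
print(sizes)      # [3, 1]
```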

distortion_error(x)#

Calculate distortion

init_w(x, norm_func=None)#

Initializes the weights of the SOM using principal components analysis (PCA) on the input data x.

Parameters:

x (np.ndarray) – The input data used for weight initialization.

load_pickle(filename, path, data_format='pkl')#

Load the SOM object from a file using pickle.

Parameters:
  • filename (str) – The name of the file to load the SOM object from.

  • path (str) – The path to the file to load the SOM object from.

  • data_format (str) – The format to load the SOM object from. Must be one of: pkl

Return type:

None

normalize(x, norm_func=None)#

Normalize the input data using a custom function.

Parameters:
  • x (array-like) – The input data to be normalized.

  • norm_func (callable, optional) – A custom normalization or standardization function to be applied to the input data. If provided, it should take the input data as its argument and return the preprocessed data. Default is None, in which case the input data is returned as-is.

Returns:

x_preprocessed – The preprocessed input data.

Return type:

array-like

Raises:

Warning – If norm_func is None, a warning is raised to indicate the potential inefficiency in SOM training.
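The documented contract (apply `norm_func` when given, otherwise warn and return the data as-is) can be sketched as follows. This is an illustration in plain Python rather than the library's implementation. The `min_max` scaler is a hypothetical example of a `norm_func`, and the sketch leaves the array orientation untouched, which may differ from what the real method does.

```python
import warnings


def normalize(x, norm_func=None):
    """Apply norm_func if given; otherwise warn and return x unchanged."""
    if norm_func is None:
        warnings.warn("No norm_func given; data is used as-is.", UserWarning)
        return x
    return norm_func(x)


def min_max(rows):
    """Scale every value into [0, 1] using the global minimum and maximum."""
    flat = [v for row in rows for v in row]
    lo, hi = min(flat), max(flat)
    return [[(v - lo) / (hi - lo) for v in row] for row in rows]


scaled = normalize([[0, 5], [10, 20]], norm_func=min_max)
print(scaled)  # [[0.0, 0.25], [0.5, 1.0]]
```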

Examples

>>> import numpy as np
>>> from sklearn.datasets import load_iris
>>> from sklearn.feature_extraction.text import TfidfVectorizer
>>> from NNSOM.som import SOM
>>> # Case 1: Tabular data (without normalization)
>>> iris = load_iris()
>>> X = iris.data
>>> som = SOM(dimensions=(5, 5))
>>> X_norm = som.normalize(X)
>>> print(np.allclose(np.transpose(X_norm), X))
True
>>> # Case 2: Image data (using custom normalization)
>>> image_data = np.random.randint(0, 256, size=(28, 28))
>>> som = SOM(dimensions=(10, 10))
>>> custom_norm_func = lambda x: x / 255  # Custom normalization function
>>> image_data_norm = som.normalize(image_data, norm_func=custom_norm_func)
>>> print(image_data_norm.min(), image_data_norm.max())
0.0 1.0
>>> # Case 3: Text data (without normalization)
>>> text_data = ["This is a sample text.", "Another example sentence."]
>>> vectorizer = TfidfVectorizer()
>>> tfidf_matrix = vectorizer.fit_transform(text_data)
>>> som = SOM(dimensions=(8, 8))
>>> text_data_norm = som.normalize(tfidf_matrix.toarray())
>>> print(np.allclose(np.transpose(text_data_norm), tfidf_matrix.toarray()))
True

quantization_error(dist)#

Calculate quantization error
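The docstring does not define the measure. A common definition of quantization error is the mean distance between each input vector and its best-matching weight vector, and the sketch below assumes that definition. The documented method takes an argument named `dist`, whose layout is not specified here; the sketch starts from raw data instead and is not the library's code.

```python
import math


def quantization_error(data, weights):
    """Mean Euclidean distance from each sample to its nearest weight vector."""
    nearest = [min(math.dist(x, w) for w in weights) for x in data]
    return sum(nearest) / len(nearest)


weights = [(0.0, 0.0), (1.0, 1.0)]
data = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
qe = quantization_error(data, weights)
print(qe)  # (0 + 1 + 0) / 3
```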

save_pickle(filename, path, data_format='pkl')#

Save the SOM object to a file using pickle.

Parameters:
  • filename (str) – The name of the file to save the SOM object to.

  • path (str) – The path to the file to save the SOM object to.

  • data_format (str) – The format to save the SOM object in. Must be one of: pkl

Return type:

None
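A minimal round trip with the standard `pickle` module is sketched below. It assumes that `path` and `filename` are joined into a single file location, which the parameter descriptions suggest but do not state. A plain dict stands in for the SOM object, and the two helper functions are hypothetical.

```python
import os
import pickle
import tempfile


def save_object(obj, filename, path):
    """Write obj to path/filename using pickle."""
    with open(os.path.join(path, filename), "wb") as f:
        pickle.dump(obj, f)


def load_object(filename, path):
    """Read a pickled object back from path/filename."""
    with open(os.path.join(path, filename), "rb") as f:
        return pickle.load(f)


model = {"dimensions": (5, 5), "w": [[0.1, 0.2], [0.3, 0.4]]}  # stand-in for a SOM

with tempfile.TemporaryDirectory() as tmp:
    save_object(model, "som.pkl", tmp)
    restored = load_object("som.pkl", tmp)

print(restored == model)  # True
```

Unpickling can execute arbitrary code, so only load pickle files from sources you trust.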

sim_som(x)#

Simulates the SOM with x as the input, determining which neurons are activated by the input vectors.

Parameters:

x (np.ndarray) – The input data to simulate the SOM with.

Returns:

The simulated output of the SOM.

Return type:

np.ndarray

topological_error(x)#

Calculate topological error
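The exact variants computed by this method are not described here. One widely used form of topological (topographic) error is the fraction of samples whose best and second-best matching neurons are not adjacent on the grid. The sketch below assumes that form and a grid in which adjacent neurons are at distance 1; the function name is hypothetical.

```python
import math


def topographic_error_sketch(data, weights, positions, max_grid_dist=1.0):
    """Fraction of samples whose two closest neurons are not grid neighbors."""
    errors = 0
    for x in data:
        # Rank neuron indices by distance between x and each weight vector.
        order = sorted(range(len(weights)), key=lambda j: math.dist(x, weights[j]))
        first, second = order[0], order[1]
        if math.dist(positions[first], positions[second]) > max_grid_dist:
            errors += 1
    return errors / len(data)


positions = [(0, 0), (1, 0), (2, 0)]  # a 1 x 3 grid
weights = [(0.0,), (5.0,), (1.0,)]    # neuron 2 sits close to neuron 0 in data space
data = [(0.2,), (4.0,)]
te = topographic_error_sketch(data, weights, positions)
print(te)  # 0.5
```

The first sample maps to neurons 0 and 2, which are two grid steps apart, so it counts as an error; the second maps to neurons 1 and 2, which are adjacent.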

train(x, init_neighborhood=3, epochs=200, steps=100, norm_func=None)#

Trains the SOM using the batch SOM algorithm on the input data x.

Parameters:
  • x (np.ndarray) – The input data to train the SOM with.

  • init_neighborhood (int, optional) – The initial neighborhood size.

  • epochs (int, optional) – The number of epochs to train for.

  • steps (int, optional) – The number of steps for training.

Return type:

None
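One pass of a batch SOM update can be sketched as follows: find the best-matching unit (BMU) of every sample, then move each neuron to the mean of the samples whose BMU lies within the current neighborhood radius on the grid. This is a simplified illustration with a hard neighborhood cutoff. How NNSOM shrinks the neighborhood from `init_neighborhood` over `epochs` and `steps` is not specified here, and the function name `batch_update` is hypothetical.

```python
import math


def batch_update(data, weights, positions, radius):
    """Return new weights after one batch update with a hard neighborhood."""
    # Step 1: best-matching unit for every sample.
    bmus = [
        min(range(len(weights)), key=lambda j: math.dist(x, weights[j]))
        for x in data
    ]
    new_weights = []
    for j, w in enumerate(weights):
        # Step 2: samples whose BMU is within `radius` of neuron j on the grid.
        hood = [
            x for x, b in zip(data, bmus)
            if math.dist(positions[j], positions[b]) <= radius
        ]
        if hood:
            # Step 3: the new weight is the per-feature mean of those samples.
            new_weights.append(tuple(sum(col) / len(hood) for col in zip(*hood)))
        else:
            new_weights.append(tuple(w))  # no samples nearby: keep the old weight
    return new_weights


positions = [(0,), (1,), (2,)]
weights = [(0.0,), (5.0,), (10.0,)]
data = [(1.0,), (4.0,), (9.0,)]

tight = batch_update(data, weights, positions, radius=0)
wide = batch_update(data, weights, positions, radius=1)
print(tight)  # [(1.0,), (4.0,), (9.0,)]
print(wide)
```

With a radius of 0 each neuron only averages its own samples; a larger radius pulls neighboring neurons toward each other, which is what produces the ordered map.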

NNSOM.som_gpu module#

class NNSOM.som_gpu.SOMGpu(dimensions)#

Bases: object

Represents a Self-Organizing Map (SOM) using GPU acceleration with CuPy.

A Self-Organizing Map (SOM) is an artificial neural network used for unsupervised learning, which projects high-dimensional data into a lower-dimensional (typically two-dimensional) space. It is trained using a competitive learning approach to produce a discretized representation of the input space of training samples.

dimensions#

Dimensions of the SOM grid, defining the layout and number of neurons.

Type:

tuple, list, np.ndarray

numNeurons#

Total number of neurons, computed as the product of the grid dimensions.

Type:

int

pos#

Positions of neurons within the grid.

Type:

np.ndarray

neuron_dist#

Precomputed Euclidean distances between neurons in the grid.

Type:

np.ndarray

w#

Weight matrix representing the feature vectors of the neurons.

Type:

np.ndarray

sim_flag#

Indicates if the SOM has been simulated/trained.

Type:

bool

output#

Output from the latest simulation.

Type:

np.ndarray

norm_func#

Function used to normalize input data.

Type:

callable

sub_som#

Optional sub-clustering using additional SOMs at neuron positions.

Type:

dict

__init__(self, dimensions)#

Initializes the SOM with the specified dimensions.

init_w(self, x, norm_func=None)#

Initializes the weights using PCA on input data x.

sim_som(self, x)#

Simulates SOM processing for input x, identifying activated neurons.

train(self, x, init_neighborhood=3, epochs=200, steps=100, norm_func=None)#

Trains the SOM using batch SOM algorithm on input data x.

quantization_error(self, dist)#

Calculates the quantization error of the model.

topological_error(self, data)#

Calculates the topological error of the model.

distortion_error(self, data)#

Calculates the distortion error of the model.

save_pickle(self, filename, path, data_format='pkl')#

Saves the SOM object to a file in pickle format.

load_pickle(self, filename, path, data_format='pkl')#

Loads the SOM object from a file in pickle format.

_normalize_position(self, position)#

Helper method to normalize neuron positions.

_spread_positions(self, position, positionMean, positionBasis)#

Helper method to adjust neuron positions.

_euclidean_distance(self, XA, XB)#

Computes Euclidean distances between two sets of vectors.
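For reference, pairwise Euclidean distances between two sets of vectors can be computed as in the sketch below. It uses only the standard library and returns a nested list with one row per vector in `XA`; the return layout of the real helper is not documented here and may differ.

```python
import math


def euclidean_distance(XA, XB):
    """Return a len(XA) x len(XB) table of Euclidean distances."""
    return [[math.dist(a, b) for b in XB] for a in XA]


print(euclidean_distance([(0, 0), (3, 4)], [(0, 0)]))  # [[0.0], [5.0]]
```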

_to_categorical(self, x, num_classes=None)#

Converts class vector to binary class matrix.
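The conversion is the usual one-hot encoding. A minimal sketch, assuming integer class labels starting at 0 and a nested-list result rather than an array:

```python
def to_categorical(labels, num_classes=None):
    """One-hot encode integer labels; infer the class count when not given."""
    if num_classes is None:
        num_classes = max(labels) + 1
    return [[1 if c == y else 0 for c in range(num_classes)] for y in labels]


print(to_categorical([0, 2, 1]))  # [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```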

Raises:

ImportError – If CuPy is not available, suggests using the NNSOM package for a NumPy-based implementation.

Example

>>> import numpy as np
>>> from NNSOM.som_gpu import SOMGpu
>>> dimensions = (10, 10)
>>> som = SOMGpu(dimensions)
>>> data = np.random.rand(100, 10)
>>> som.init_w(data, norm_func=None)
>>> som.train(data, norm_func=None)
>>> output = som.sim_som(data)

cluster_data(x)#

Cluster the input data based on the trained SOM reference vectors.

Parameters:

x (ndarray (normalized)) – The input data to be clustered.

Returns:

  • clusters (list of lists) – A list containing sub-lists, where each sublist represents a cluster. The indices of the input data points belonging to the same cluster are stored in the corresponding sublist, sorted by their proximity to the cluster center.

  • cluster_distances (list of lists) – A list containing sub-lists, where each sublist represents the distances of the input data points to the corresponding cluster center, sorted in the same order as the indices in the clusters list.

  • max_cluster_distances (ndarray) – An array containing the maximum distance between each cluster center and the data points belonging to that cluster.

  • cluster_sizes (ndarray) – An array containing the number of data points in each cluster.

Raises:
  • ValueError – If the SOM has not been trained.

  • ValueError – If the number of features in the input data and the SOM weights do not match.

distortion_error(x)#

Calculate distortion

init_w(x, norm_func=None)#

Initializes the weights of the SOM using principal components analysis (PCA) on the input data x.

Parameters:

x (np.ndarray) – The input data used for weight initialization.

load_pickle(filename, path, data_format='pkl')#

Load the SOM object from a file using pickle.

Parameters:
  • filename (str) – The name of the file to load the SOM object from.

  • path (str) – The path to the file to load the SOM object from.

  • data_format (str) – The format to load the SOM object from. Must be one of: pkl

Return type:

None

normalize(x, norm_func=None)#

Normalize the input data using a custom function.

Parameters:
  • x (array-like) – The input data to be normalized.

  • norm_func (callable, optional) – A custom normalization or standardization function to be applied to the input data. If provided, it should take the input data as its argument and return the preprocessed data. Default is None, in which case the input data is returned as-is.

Returns:

x_preprocessed – The preprocessed input data.

Return type:

array-like

Raises:

Warning – If norm_func is None, a warning is raised to indicate the potential inefficiency in SOM training.

Examples

>>> import numpy as np
>>> from sklearn.datasets import load_iris
>>> from sklearn.feature_extraction.text import TfidfVectorizer
>>> from NNSOM.som import SOM
>>> # Case 1: Tabular data (without normalization)
>>> iris = load_iris()
>>> X = iris.data
>>> som = SOM(dimensions=(5, 5))
>>> X_norm = som.normalize(X)
>>> print(np.allclose(np.transpose(X_norm), X))
True
>>> # Case 2: Image data (using custom normalization)
>>> image_data = np.random.randint(0, 256, size=(28, 28))
>>> som = SOM(dimensions=(10, 10))
>>> custom_norm_func = lambda x: x / 255  # Custom normalization function
>>> image_data_norm = som.normalize(image_data, norm_func=custom_norm_func)
>>> print(image_data_norm.min(), image_data_norm.max())
0.0 1.0
>>> # Case 3: Text data (without normalization)
>>> text_data = ["This is a sample text.", "Another example sentence."]
>>> vectorizer = TfidfVectorizer()
>>> tfidf_matrix = vectorizer.fit_transform(text_data)
>>> som = SOM(dimensions=(8, 8))
>>> text_data_norm = som.normalize(tfidf_matrix.toarray())
>>> print(np.allclose(np.transpose(text_data_norm), tfidf_matrix.toarray()))
True

quantization_error(dist)#

Calculate quantization error

save_pickle(filename, path, data_format='pkl')#

Save the SOM object to a file using pickle.

Parameters:
  • filename (str) – The name of the file to save the SOM object to.

  • path (str) – The path to the file to save the SOM object to.

  • data_format (str) – The format to save the SOM object in. Must be one of: pkl

Return type:

None

sim_som(x)#

Simulates the SOM with x as the input, determining which neurons are activated by the input vectors.

Parameters:

x (np.ndarray) – The input data to simulate the SOM with.

Returns:

The simulated output of the SOM.

Return type:

np.ndarray

topological_error(x)#

Calculate topological error

train(x, init_neighborhood=3, epochs=200, steps=100, norm_func=None)#

Trains the SOM using the batch SOM algorithm on the input data x.

Parameters:
  • x (np.ndarray) – The input data to train the SOM with.

  • init_neighborhood (int, optional) – The initial neighborhood size.

  • epochs (int, optional) – The number of epochs to train for.

  • steps (int, optional) – The number of steps for training.

Return type:

None

NNSOM.plots module#

class NNSOM.plots.SOMPlots(dimensions)#

Bases: SOM

A subclass of either SOM or SOMGpu (based on the availability of CuPy), designed to provide visualization and interactive plotting capabilities for Self-Organizing Maps (SOMs). This class is intended to enrich the analysis of SOMs by offering a variety of advanced visualization techniques to explore the trained SOM topology, distribution of data points, and various statistics derived from the SOM’s learning process.
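One way to choose a base class according to whether CuPy is installed is to probe for the module before importing anything. The sketch below only returns the class name as a string; how NNSOM actually performs this selection is not shown in this reference, and `pick_backend` is a hypothetical helper.

```python
import importlib.util


def pick_backend():
    """Return 'SOMGpu' when the cupy package can be found, otherwise 'SOM'."""
    has_cupy = importlib.util.find_spec("cupy") is not None
    return "SOMGpu" if has_cupy else "SOM"


print(pick_backend())
```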

dimensions#

The grid dimensions of the SOM topology.

The class includes methods for plotting the topology of the SOM, generating hit histograms,
displaying cluster information, and more. These methods support interactive features through
mouse clicks, allowing users to engage with the visualizations dynamically. Each method
can also handle additional parameters for customization and handles various plotting styles
like hexagonal units, numbered neurons, color gradients, and complex cluster histograms.
The class also provides a generic plot method to handle different types of SOM visualizations
and an event handling method to manage user interactions during the plotting sessions.

This class supports interactivity and offers multiple visualization methods to deeply understand and analyze the behaviors and results of SOMs. It’s especially useful for gaining insights into the topology, data distribution, and classification performance of SOMs.

button_click_event(button_type, ax, neuron_ind, **kwargs)#

cmplx_hit_hist(x, clust, perc, ind_missClass, ind21, ind12, mouse_click=False, **kwargs)#

Generates a complex hit histogram for the SOM, incorporating information about cluster quality, misclassifications, and false positives/negatives.

Parameters:
  • x (array-like) – The input data to be visualized in the histogram.

  • clust (list or array-like) – A list or array containing the cluster assignments for each data point.

  • perc (array-like) – An array containing the percentage values for each cluster, representing the proportion of good binders.

  • ind_missClass (array-like) – An array containing the indices of misclassified data points.

  • ind21 (array-like) – An array containing the indices of false positive data points (classified as good binders but are actually bad binders).

  • ind12 (array-like) – An array containing the indices of false negative data points (classified as bad binders but are actually good binders).

  • mouse_click (bool, optional) – If True, enables the plot to respond to mouse clicks, allowing for interactive functionality such as querying or modifying neuron data, by default False.

  • **kwargs (dict) – Arbitrary keyword arguments that can be passed to the event handler onpick when an interactive element is clicked. Common parameters could include data specific to the plot or visualization settings.

Returns:

  • fig (matplotlib.figure.Figure) – The Figure object containing the plot.

  • ax (matplotlib.axes.Axes) – The Axes object containing the plot elements.

  • patches (list) – A list of matplotlib.patches.Patch objects representing the inner hexagons colored and styled based on cluster quality and misclassifications.

  • text (list) – A list of matplotlib.text.Text objects displaying the hit counts within each hexagon.
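The three index arguments can be derived from true and predicted labels. The sketch below assumes a binary problem in which label 1 means "good binder", following the wording of the parameter descriptions above; the helper name is hypothetical.

```python
def misclassification_indices(y_true, y_pred, positive=1):
    """Return (ind_missClass, ind21, ind12) for binary labels."""
    ind_miss = [i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p]
    # False positives: predicted positive, actually negative.
    ind21 = [i for i in ind_miss if y_pred[i] == positive]
    # False negatives: predicted negative, actually positive.
    ind12 = [i for i in ind_miss if y_true[i] == positive]
    return ind_miss, ind21, ind12


y_true = [1, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1]
ind_missClass, ind21, ind12 = misclassification_indices(y_true, y_pred)
print(ind_missClass, ind21, ind12)  # [1, 2] [1] [2]
```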

color_hist(x, avg, mouse_click=False, **kwargs)#

Generates a colored histogram for the SOM, where the color of each hexagon represents the corresponding value from the provided average array.

Parameters:
  • x (array-like) – The input data to be visualized in the histogram.

  • avg (array-like) – An array containing the average values to be represented by the color map, where higher values correspond to warmer colors (e.g., red) and lower values correspond to cooler colors (e.g., blue).

  • mouse_click (bool, optional) – If True, enables the plot to respond to mouse clicks, allowing for interactive functionality such as querying or modifying neuron data, by default False.

  • **kwargs (dict) – Arbitrary keyword arguments that can be passed to the event handler onpick when an interactive element is clicked. Common parameters could include data specific to the plot or visualization settings.

Returns:

  • fig (matplotlib.figure.Figure) – The Figure object containing the plot.

  • ax (matplotlib.axes.Axes) – The Axes object containing the plot elements.

  • patches (list) – A list of matplotlib.patches.Patch objects representing the inner hexagons colored based on the average values.

  • text (list, optional) – A list of matplotlib.text.Text objects displaying the hit counts, included if textFlag is True in the underlying hit_hist method.

  • cbar (matplotlib.colorbar.Colorbar) – The Colorbar object attached to the plot, representing the color mapping.

component_planes(X)#

Visualizes the weight distribution across different features in a Self-Organizing Map (SOM) using a series of 2D plots.

This method creates a grid of subplots where each subplot represents the weight distribution for a specific feature of the input data. The weight of each neuron for the given feature is represented in the plot by the color of a hexagonal cell, with darker colors indicating higher weights. This visualization helps in understanding the importance and distribution of each feature across the map.

Parameters:

X (array-like) – A 2D array of input data where each row represents a feature and each column represents a sample.

component_positions(x)#

Visualizes the positions of the components in a Self-Organizing Map (SOM) along with the input vectors.

This method plots the trained SOM weight vectors as gray dots and the input vectors as green dots on a 2D plot. It also connects neighboring SOM neurons with red lines to represent the grid structure, illustrating the organization and clustering within the map.

Parameters:

x (array-like) – A 2D array or sequence of vectors, typically representing input data or test data that has been projected onto the SOM.

create_click_handler(button_type, ax, neuron_ind, **kwargs)#

custom_cmplx_hit_hist(x, face_labels, edge_labels, edge_width, mouse_click=False, **kwargs)#

Generates a custom complex hit histogram for the SOM, allowing for flexible customization of the hexagon face colors, edge colors, and edge widths.

Parameters:
  • x (array-like) – The input data to be visualized in the histogram.

  • face_labels (array-like) – An array containing the labels or values to be represented by the face colors of the hexagons.

  • edge_labels (array-like) – An array containing the labels or values to be represented by the edge colors of the hexagons.

  • edge_width (array-like) – An array containing the values for the edge widths of the hexagons.

  • mouse_click (bool, optional) – If True, enables the plot to respond to mouse clicks, allowing for interactive functionality such as querying or modifying neuron data, by default False.

  • **kwargs (dict) – Arbitrary keyword arguments that can be passed to the event handler onpick when an interactive element is clicked. Common parameters could include data specific to the plot or visualization settings.

Returns:

  • fig (matplotlib.figure.Figure) – The Figure object containing the plot.

  • ax (matplotlib.axes.Axes) – The Axes object containing the plot elements.

  • patches (list) – A list of matplotlib.patches.Patch objects representing the inner hexagons colored and styled based on the provided face labels, edge labels, and edge widths.

  • text (list) – A list of matplotlib.text.Text objects displaying the hit counts within each hexagon.

Raises:

ValueError – If the input data or label arrays have incorrect dimensions or lengths.

determine_button_types(**kwargs)#

gray_hist(x, perc, mouse_click=False, **kwargs)#

Generates a grayscale histogram for the SOM, where the shade of each hexagon represents the corresponding value from the provided percentage array.

Parameters:
  • x (array-like) – The input data to be visualized in the histogram.

  • perc (array-like) – An array containing the percentage values to be represented by the grayscale shades, where higher values correspond to darker shades.

  • mouse_click (bool, optional) – If True, enables the plot to respond to mouse clicks, allowing for interactive functionality such as querying or modifying neuron data, by default False.

  • **kwargs (dict) – Arbitrary keyword arguments that can be passed to the event handler onpick when an interactive element is clicked. Common parameters could include data specific to the plot or visualization settings.

Returns:

  • fig (matplotlib.figure.Figure) – The Figure object containing the plot.

  • ax (matplotlib.axes.Axes) – The Axes object containing the plot elements.

  • patches (list) – A list of matplotlib.patches.Patch objects representing the inner hexagons colored based on the grayscale values.

  • text (list, optional) – A list of matplotlib.text.Text objects displaying the hit counts, included if textFlag is True in the underlying hit_hist method.
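A `perc` array could be built from per-neuron cluster membership, for example as the share of one class among the samples assigned to each neuron. The sketch assumes `clusters` is a list of index lists, one per neuron, and that percentages run from 0 to 100; neither the scale nor this helper is confirmed by the description above.

```python
def class_percentage(clusters, labels, target):
    """Percentage of samples with label `target` in each neuron's cluster."""
    perc = []
    for members in clusters:
        if not members:
            perc.append(0.0)  # empty neuron: report 0 rather than divide by zero
            continue
        hits = sum(1 for i in members if labels[i] == target)
        perc.append(100.0 * hits / len(members))
    return perc


clusters = [[0, 1, 2, 3], [4, 5], []]
labels = [1, 1, 0, 1, 0, 0]
perc = class_percentage(clusters, labels, target=1)
print(perc)  # [75.0, 0.0, 0.0]
```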

hit_hist(x, textFlag=True, mouse_click=False, connect_pick_event=True, **kwargs)#

Generates a hit histogram for the SOM, which displays the frequency of data points assigned to each neuron. Each neuron is represented as a hexagon, and the size of each hexagon is proportional to the number of hits. Optionally, the actual number of hits can be displayed within each hexagon.

Parameters:
  • x (array-like) – The input data to be visualized in the histogram.

  • textFlag (bool, optional) – If True, displays the count of hits within each hexagon, by default True.

  • mouse_click (bool, optional) – If True, enables the plot to respond to mouse clicks, allowing for interactive functionality such as querying or modifying neuron data, by default False.

  • connect_pick_event (bool, optional) – If True, connects a pick event that triggers when a neuron is clicked, by default True.

  • **kwargs (dict) – Arbitrary keyword arguments that can be passed to the event handler onpick when an interactive element is clicked. Common parameters could include data specific to the plot or visualization settings.

Returns:

  • fig (matplotlib.figure.Figure) – The Figure object containing the plot.

  • ax (matplotlib.axes.Axes) – The Axes object containing the plot elements.

  • patches (list) – A list of matplotlib.patches.Patch objects representing the inner hexagons colored based on hit counts.

  • text (list, optional) – A list of matplotlib.text.Text objects displaying the hit counts, included if textFlag is True.
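The quantity being drawn is the number of samples whose best-matching neuron is each unit. A dependency-free sketch of that count, assuming Euclidean best-matching units (the helper `hit_counts` is hypothetical and does no plotting):

```python
import math
from collections import Counter


def hit_counts(data, weights):
    """Count how many samples have each neuron as their nearest weight vector."""
    bmus = [
        min(range(len(weights)), key=lambda j: math.dist(x, weights[j]))
        for x in data
    ]
    counts = Counter(bmus)
    return [counts[j] for j in range(len(weights))]  # missing keys count as 0


weights = [(0, 0), (1, 1), (5, 5)]
data = [(0, 0.1), (0.9, 1), (1, 1.2), (0.1, 0)]
hits = hit_counts(data, weights)
print(hits)  # [2, 2, 0]
```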

neuron_dist_plot(mouse_click=False, connect_pick_event=True, **kwargs)#

Generates a neuron distance plot that visualizes the distances between neighboring neurons in the SOM grid.

Parameters:
  • mouse_click (bool, optional) – If True, enables the plot to respond to mouse clicks, allowing for interactive functionality such as querying or modifying neuron data, by default False.

  • connect_pick_event (bool, optional) – If True, connects a pick event that triggers when a neuron is clicked, by default True.

  • **kwargs (dict) – Arbitrary keyword arguments that can be passed to the event handler onpick when an interactive element is clicked. Common parameters could include data specific to the plot or visualization settings.

Returns:

  • fig (matplotlib.figure.Figure) – The Figure object containing the plot.

  • ax (matplotlib.axes.Axes) – The Axes object containing the plot elements.

  • patches (list) – A list of matplotlib.patches.Patch objects representing the edges between connected neurons, colored based on the distance between their respective neuron weights.
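The values behind such a plot are the distances between the weight vectors of neurons that are adjacent on the grid. The sketch below computes them for a small rectangular grid in which neighbors are one unit apart. That layout is an assumption made for brevity; the plots in this module use hexagonal cells.

```python
import math


def neighbor_weight_distances(weights, positions, max_grid_dist=1.0):
    """Map each pair of adjacent neurons to the distance between their weights."""
    edges = {}
    n = len(weights)
    for a in range(n):
        for b in range(a + 1, n):
            if math.dist(positions[a], positions[b]) <= max_grid_dist:
                edges[(a, b)] = math.dist(weights[a], weights[b])
    return edges


positions = [(0, 0), (1, 0), (0, 1), (1, 1)]  # 2 x 2 grid
weights = [(0, 0), (3, 4), (0, 1), (3, 5)]
edges = neighbor_weight_distances(weights, positions)
print(edges)  # {(0, 1): 5.0, (0, 2): 1.0, (1, 3): 1.0, (2, 3): 5.0}
```

Large values mark boundaries between regions of the map, while small values indicate neurons that represent similar inputs.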

onpick(event, hexagons, hexagon_to_neuron, **kwargs)#

Interactive Plot Function

Parameters:
  • event (event) – a mouse click event

  • hexagons (list) – a list of hexagons

  • hexagon_to_neuron (dict) – a dictionary mapping hexagons to neurons

  • **kwargs – a dictionary with input data

Returns:

None

plot(plot_type, data_dict=None, ind=None, target_class=None, use_add_array=False, **kwargs)#

Generic plot function. It generates a plot based on the plot type and the data provided.

Parameters:
  • plot_type (str) – The type of plot to be generated: [“top”, “top_num”, “hit_hist”, “gray_hist”, “color_hist”, “complex_hist”, “nc”, “neuron_dist”, “simple_grid”, “stem”, “pie”, “wgts”, “hist”, “box”, “violin”, “scatter”, “component_positions”, “component_planes”]

  • data_dict (dict (optional)) – A dictionary containing the data to be plotted. The key is prefixed with the data type and the value is the data itself. {“data”, “target”, “clust”, “add_1d_array”, “add_2d_array”}

  • ind (int, str or array-like (optional)) – The indices of the data to be plotted.

  • target_class (int (optional)) – The target class to be plotted.

  • use_add_array (bool (optional)) – If True, the additional array is used.

  • **kwargs (dict) – Additional arguments to be passed to the interactive plot function.

plot_box(ax, data, neuronNum)#

Generates a box plot for a specific neuron’s data on the provided axes.

Parameters:
  • ax (matplotlib.axes.Axes) – The axes object on which the box plot will be drawn.

  • data (array-like) – The data array for which the box plot is to be generated.

  • neuronNum (int) – The neuron index that the data is associated with.

plot_hist(ax, data, neuronNum)#

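Helper function to plot a histogram in the interactive plots.

Parameters:
  • ax – The axes on which the histogram is drawn.

  • data – The data for which the histogram is generated.

  • neuronNum – The neuron index that the data is associated with.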

plot_pie(ax, data, neuronNum)#

Plots a pie chart on the specified matplotlib axes object.

Parameters:
  • ax (matplotlib.axes.Axes) – The matplotlib axes object where the pie chart will be plotted.

  • data (array-like) – An array of numeric data which represents the portions of the pie chart.

  • neuronNum (int) – The neuron number associated with the data, which is used to title the pie chart.

plot_scatter(ax, num1, num2, neuronNum)#

Plots a scatter plot for a specific neuron’s data on the provided axes.

Parameters:
  • ax (matplotlib.axes.Axes) – The axes object on which the scatter plot will be drawn.

  • num1 (array-like) – The x coordinates of the data points.

  • num2 (array-like) – The y coordinates of the data points.

  • neuronNum (int) – The index of the neuron for which the plot is being generated.

plot_stem(ax, align, height, neuronNum)#

Plots a stem plot for a specific neuron’s data on the given axes.

Parameters:
  • ax (matplotlib.axes.Axes) – The matplotlib axes object where the stem plot will be drawn.

  • align (array-like) – The x positions of the stems.

  • height (array-like) – The y values for each stem, indexed by neuron number.

  • neuronNum (int) – The index of the neuron for which the plot is being generated.

plot_violin(ax, data, neuronNum)#

Displays a violin plot for a specific neuron’s data on the provided axes.

Parameters:
  • ax (matplotlib.axes.Axes) – The axes object where the violin plot will be drawn.

  • data (array-like) – The data to be used for the violin plot.

  • neuronNum (int) – The index of the neuron associated with the data.

plt_boxplot(x, mouse_click=False, connect_pick_event=True, **kwargs)#

Generates a boxplot visualization for the SOM, displaying the statistical summary of the data distribution within each neuron’s cluster.

Parameters:
  • x (array-like) – A 2D array or sequence of vectors, where each row represents the data points assigned to a single neuron.

  • mouse_click (bool, optional) – If True, enables the plot to respond to mouse clicks, allowing for interactive functionality such as querying or modifying neuron data, by default False.

  • connect_pick_event (bool, optional) – If True, connects a pick event that triggers when a neuron is clicked, by default True.

  • **kwargs (dict) – Arbitrary keyword arguments that can be passed to the event handler onpick when an interactive element is clicked. Common parameters could include data specific to the plot or visualization settings.

Returns:

  • fig (matplotlib.figure.Figure) – The Figure object containing the plot.

  • ax (matplotlib.axes.Axes) – The Axes object containing the plot elements.

  • h_axes (list) – A list of matplotlib.axes.Axes objects, each containing a boxplot for a single neuron’s data distribution.

plt_histogram(x, mouse_click=False, connect_pick_event=True, **kwargs)#

Generates a histogram visualization for the SOM, displaying the data distribution within each neuron’s cluster.

Parameters:
  • x (array-like) – A 2D array or sequence of vectors, where each row represents the data points assigned to a single neuron.

  • mouse_click (bool, optional) – If True, enables the plot to respond to mouse clicks, allowing for interactive functionality such as querying or modifying neuron data, by default False.

  • connect_pick_event (bool, optional) – If True, connects a pick event that triggers when a neuron is clicked, by default True.

  • **kwargs (dict) – Arbitrary keyword arguments that can be passed to the event handler onpick when an interactive element is clicked. Common parameters could include data specific to the plot or visualization settings.

Returns:

  • fig (matplotlib.figure.Figure) – The Figure object containing the plot.

  • ax (matplotlib.axes.Axes) – The Axes object containing the plot elements.

  • h_axes (list) – A list of matplotlib.axes.Axes objects, each containing a histogram for a single neuron’s data distribution.

plt_nc(mouse_click=False, connect_pick_event=True, **kwargs)#

Generates a Neighborhood Connection Map for the SOM, displaying the connections between neighboring neurons.

Parameters:
  • mouse_click (bool, optional) – If True, enables the plot to respond to mouse clicks, allowing for interactive functionality such as querying or modifying neuron data, by default False.

  • connect_pick_event (bool, optional) – If True, connects a pick event that triggers when a neuron is clicked, by default True.

  • **kwargs (dict) – Arbitrary keyword arguments that can be passed to the event handler onpick when an interactive element is clicked. Common parameters could include data specific to the plot or visualization settings.

Returns:

  • fig (matplotlib.figure.Figure) – The Figure object containing the plot.

  • ax (matplotlib.axes.Axes) – The Axes object containing the plot elements.

  • patches (list) – A list of matplotlib.patches.Patch objects representing the edges between connected neurons.

plt_pie(x, s=None, mouse_click=False, connect_pick_event=True, **kwargs)#

Generates a pie chart visualization for the SOM, displaying the composition of each neuron’s data or cluster.

Parameters:
  • x (array-like) – A 2D array or sequence of vectors, where each row represents the composition or category values for a single neuron.

  • s (array-like, optional) – An array containing the percentage values to be used for scaling the pie chart sizes, by default None.

  • mouse_click (bool, optional) – If True, enables the plot to respond to mouse clicks, allowing for interactive functionality such as querying or modifying neuron data, by default False.

  • connect_pick_event (bool, optional) – If True, connects a pick event that triggers when a neuron is clicked, by default True.

  • **kwargs (dict) – Arbitrary keyword arguments that can be passed to the event handler onpick when an interactive element is clicked. Common parameters could include data specific to the plot or visualization settings.

Returns:

  • fig (matplotlib.figure.Figure) – The Figure object containing the plot.

  • ax (matplotlib.axes.Axes) – The Axes object containing the plot elements.

  • h_axes (list) – A list of matplotlib.axes.Axes objects, each containing a pie chart for a single neuron’s composition.

Raises:

ValueError – If the length of x or s (if provided) does not match the number of neurons, or if the percentage values in s are not between 0 and 100.
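
The documented ValueError conditions can be checked before plotting. The snippet below is a minimal sketch of those checks only; check_pie_inputs and num_neurons are hypothetical names introduced for illustration, not part of NNSOM.

```python
import numpy as np

def check_pie_inputs(x, num_neurons, s=None):
    """Hypothetical helper mirroring the documented ValueError conditions."""
    if len(x) != num_neurons:
        raise ValueError("x must have one row per neuron")
    if s is not None:
        s = np.asarray(s, dtype=float)
        if len(s) != num_neurons:
            raise ValueError("s must have one entry per neuron")
        # Percentages outside [0, 100] are rejected.
        if np.any((s < 0) | (s > 100)):
            raise ValueError("values in s must lie between 0 and 100")
    return True

# A map with four neurons; each row of x holds two category counts.
x = [[3, 1], [0, 4], [2, 2], [5, 0]]
s = [100, 50, 75, 25]
ok = check_pie_inputs(x, num_neurons=4, s=s)
```

Running such a check up front gives a clear error message before any figure is created.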

plt_scatter(x, y, reg_line=True, mouse_click=False, connect_pick_event=True, **kwargs)#

Generates a scatter plot visualization for the SOM, displaying the data points assigned to each neuron and an optional regression line.

Parameters:
  • x (array-like) – A 2D array or sequence of vectors, where each row represents the x-coordinate data points assigned to a single neuron.

  • y (array-like) – A 2D array or sequence of vectors, where each row represents the y-coordinate data points assigned to a single neuron.

  • reg_line (bool, optional) – If True, a regression line is plotted for each neuron’s data, by default True.

  • mouse_click (bool, optional) – If True, enables the plot to respond to mouse clicks, allowing for interactive functionality such as querying or modifying neuron data, by default False.

  • connect_pick_event (bool, optional) – If True, connects a pick event that triggers when a neuron is clicked, by default True.

  • **kwargs (dict) – Arbitrary keyword arguments that can be passed to the event handler onpick when an interactive element is clicked. Common parameters could include data specific to the plot or visualization settings.

Returns:

  • fig (matplotlib.figure.Figure) – The Figure object containing the plot.

  • ax (matplotlib.axes.Axes) – The Axes object containing the plot elements.

  • h_axes (list) – A list of matplotlib.axes.Axes objects, each containing a scatter plot for a single neuron’s data.

plt_stem(x, y, mouse_click=False, connect_pick_event=True, **kwargs)#

Generates a stem plot visualization for the SOM, displaying the input data and neuron responses.

Parameters:
  • x (array-like) – The input data or independent variable for the stem plot.

  • y (array-like) – The neuron responses or dependent variable for the stem plot, where each row corresponds to a neuron.

  • mouse_click (bool, optional) – If True, enables the plot to respond to mouse clicks, allowing for interactive functionality such as querying or modifying neuron data, by default False.

  • connect_pick_event (bool, optional) – If True, connects a pick event that triggers when a neuron is clicked, by default True.

  • **kwargs (dict) – Arbitrary keyword arguments that can be passed to the event handler onpick when an interactive element is clicked. Common parameters could include data specific to the plot or visualization settings.

Returns:

  • fig (matplotlib.figure.Figure) – The Figure object containing the plot.

  • ax (matplotlib.axes.Axes) – The Axes object containing the plot elements.

  • h_axes (list) – A list of matplotlib.axes.Axes objects, each containing a stem plot for a single neuron.

plt_top(mouse_click=False, connect_pick_event=True, **kwargs)#

Plots the topology of the SOM using hexagonal units. This method visualizes the position and the boundaries of each neuron within the grid, allowing for interaction if enabled.

Parameters:
  • mouse_click (bool, optional) – If True, enables the plot to respond to mouse clicks, allowing interactive functionality such as querying or modifying neuron data, by default False.

  • connect_pick_event (bool, optional) – If True, connects a pick event that triggers when a neuron (hexagon) is clicked, by default True.

  • **kwargs (dict) – Arbitrary keyword arguments that can be passed to the event handler onpick when an interactive element is clicked. Common parameters could include data specific to the plot or visualization settings.

Returns:

  • fig (matplotlib.figure.Figure) – The Figure object containing the plot.

  • ax (matplotlib.axes.Axes) – The Axes object containing the plot elements.

  • patches (list) – A list of matplotlib.patches.Patch objects representing the hexagonal units of the SOM.

plt_top_num(mouse_click=False, connect_pick_event=True, **kwargs)#

Plots the topology of the SOM with each neuron numbered. This method visualizes each neuron as a hexagon with a number indicating its index, which is useful for identifying and referencing specific neurons during analysis.

Parameters:
  • mouse_click (bool, optional) – If True, enables the plot to respond to mouse clicks, allowing for interaction such as detailed queries or data manipulation associated with specific neurons, by default False.

  • connect_pick_event (bool, optional) – If True, connects a pick event that triggers when a neuron is clicked, by default True.

  • **kwargs (dict) – Arbitrary keyword arguments that can be passed to the event handler onpick when an interactive element is clicked. Common parameters could include data specific to the plot or visualization settings.

Returns:

  • fig (matplotlib.figure.Figure) – The Figure object containing the plot.

  • ax (matplotlib.axes.Axes) – The Axes object containing the plot elements.

  • patches (list) – A list of matplotlib.patches.Patch objects representing the hexagonal units of the SOM.

  • text (list) – A list of matplotlib.text.Text objects displaying the neuron indices.

plt_violin_plot(x, mouse_click=False, connect_pick_event=True, **kwargs)#

Generates a violin plot visualization for the SOM, displaying the distribution of data within each neuron’s cluster.

Parameters:
  • x (array-like) – A 2D array or sequence of vectors, where each row represents the data points assigned to a single neuron.

  • mouse_click (bool, optional) – If True, enables the plot to respond to mouse clicks, allowing for interactive functionality such as querying or modifying neuron data, by default False.

  • connect_pick_event (bool, optional) – If True, connects a pick event that triggers when a neuron is clicked, by default True.

  • **kwargs (dict) – Arbitrary keyword arguments that can be passed to the event handler onpick when an interactive element is clicked. Common parameters could include data specific to the plot or visualization settings.

Returns:

  • fig (matplotlib.figure.Figure) – The Figure object containing the plot.

  • ax (matplotlib.axes.Axes) – The Axes object containing the plot elements.

  • h_axes (list) – A list of matplotlib.axes.Axes objects, each containing a violin plot for a single neuron’s data distribution.

plt_wgts(mouse_click=False, connect_pick_event=True, **kwargs)#

Generates a line plot visualization for the SOM weights, displaying the weight vectors for each neuron.

Parameters:
  • mouse_click (bool, optional) – If True, enables the plot to respond to mouse clicks, allowing for interactive functionality such as querying or modifying neuron data, by default False.

  • connect_pick_event (bool, optional) – If True, connects a pick event that triggers when a neuron is clicked, by default True.

  • **kwargs (dict) – Arbitrary keyword arguments that can be passed to the event handler onpick when an interactive element is clicked. Common parameters could include data specific to the plot or visualization settings.

Returns:

  • fig (matplotlib.figure.Figure) – The Figure object containing the plot.

  • ax (matplotlib.axes.Axes) – The Axes object containing the plot elements.

  • h_axes (list) – A list of matplotlib.axes.Axes objects, each containing a line plot for a single neuron’s weight vector.

setup_axes()#

simple_grid(avg, sizes, mouse_click=False, connect_pick_event=True, **kwargs)#

Generates a simple grid plot that visualizes the SOM neurons as hexagons with varying sizes and colors.

Parameters:
  • avg (array-like) – An array containing the average values to be represented by the color map, where higher values correspond to warmer colors (e.g., red) and lower values correspond to cooler colors (e.g., blue).

  • sizes (array-like) – An array containing the sizes to be used for the inner hexagons within each neuron, where larger values result in larger hexagons.

  • mouse_click (bool, optional) – If True, enables the plot to respond to mouse clicks, allowing for interactive functionality such as querying or modifying neuron data, by default False.

  • connect_pick_event (bool, optional) – If True, connects a pick event that triggers when a neuron is clicked, by default True.

  • **kwargs (dict) – Arbitrary keyword arguments that can be passed to the event handler onpick when an interactive element is clicked. Common parameters could include data specific to the plot or visualization settings.

Returns:

  • fig (matplotlib.figure.Figure) – The Figure object containing the plot.

  • ax (matplotlib.axes.Axes) – The Axes object containing the plot elements.

  • patches (list) – A list of matplotlib.patches.Patch objects representing the inner hexagons colored based on the average values and sized based on the provided sizes.

  • cbar (matplotlib.colorbar.Colorbar) – The Colorbar object attached to the plot, representing the color mapping.

sub_clustering(data, neuron_ind)#

Performs sub-clustering on the data associated with a specific neuron within the Self-Organizing Map (SOM).

Parameters:
  • data (array-like) – The dataset from which sub-clusters are to be derived. Typically, this is the subset of the overall dataset that has been mapped to the neuron specified by neuron_ind.

  • neuron_ind (int) – The index of the neuron for which sub-clustering is to be performed. This index is used to refer to a specific neuron in the SOM’s grid.

Returns:

A list of clusters, where each cluster is an array of data points that form a sub-group within the neuron’s data.

Return type:

list of array-like

weight_as_image(rows=None, mouse_click=False, connect_pick_event=True, **kwargs)#

Visualizes the weights of a Self-Organizing Map (SOM) as images within a hexagonal grid layout.

This method maps the weight vectors of each neuron onto a hexagonal cell and optionally enables interaction with each hexagon. The hexagons represent the neurons, and the colors within each hexagon represent the neuron’s weight vector reshaped into either a specified or automatically determined matrix form. This visualization is useful for analyzing the learned patterns and feature representations within the SOM.

Parameters:
  • rows (int, optional) – The number of rows to reshape each neuron’s weight vector into. If None, the weight vector is reshaped into a square matrix by default. If specified, the weight vector is reshaped into a matrix with the given number of rows, and the number of columns is determined automatically.

  • mouse_click (bool, optional) – If True, enables the plot to respond to mouse clicks, allowing for interactive functionality such as querying or modifying neuron data, by default False.

  • connect_pick_event (bool, optional) – If True, connects a pick event that triggers when a neuron is clicked, by default True.

  • **kwargs (dict) – Arbitrary keyword arguments that can be passed to the event handler onpick when an interactive element is clicked. Common parameters could include data specific to the plot or visualization settings.

Returns:

  • fig (matplotlib.figure.Figure) – The Figure object containing the plot.

  • ax (matplotlib.axes.Axes) – The Axes object containing the plot elements.

  • patches (list) – A list of matplotlib.patches.Patch objects, each representing a hexagon in the plot.
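
The effect of rows on the reshaping step can be sketched with NumPy alone. This is an illustration of the behavior described above, not the library's code; reshape_weight is a hypothetical helper, and raising an error for lengths that do not fit is an assumption.

```python
import math
import numpy as np

def reshape_weight(w, rows=None):
    """Reshape one neuron's weight vector into a 2D matrix (illustrative only)."""
    w = np.asarray(w)
    if rows is None:
        # Default case: square matrix, so the length must be a perfect square.
        side = math.isqrt(w.size)
        if side * side != w.size:
            raise ValueError("weight length is not a perfect square; pass rows")
        return w.reshape(side, side)
    if w.size % rows != 0:
        raise ValueError("weight length is not divisible by rows")
    # -1 lets NumPy work out the number of columns.
    return w.reshape(rows, -1)

w = np.arange(16)
square = reshape_weight(w)        # 4 x 4
wide = reshape_weight(w, rows=2)  # 2 x 8
```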

NNSOM.utils module#

NNSOM.utils.cal_class_cluster_intersect(clust, *args)#

Calculate the intersection sizes of each class with each neuron cluster.

This function computes the size of the intersection between each given class (represented by arrays of indices) and each neuron cluster (represented by a list of lists of indices). The result is a 2D array where each row corresponds to a neuron cluster, and each column corresponds to one of the classes.

Parameters:
  • clust (list of lists) – A collection of neuron clusters, where each neuron cluster is a list of indices.

  • *args (sequence of array-like) – A variable number of arrays, each representing a class with indices.

Returns:

A 2D array where the entry at position (i, j) represents the number of indices in the j-th class that are also in the i-th neuron cluster.

Return type:

numpy.ndarray

Examples

>>> import numpy as np
>>> from NNSOM.utils import cal_class_cluster_intersect
>>> clust = [[4, 5, 9], [1, 7], [2, 10, 11], [3, 6, 8]]
>>> ind1 = np.array([1, 2, 3])
>>> ind2 = np.array([4, 5, 6])
>>> ind3 = np.array([7, 8, 9])
>>> ind4 = np.array([10, 11, 12])
>>> cal_class_cluster_intersect(clust, ind1, ind2, ind3, ind4)
array([[0, 2, 1, 0],
       [1, 0, 1, 0],
       [1, 0, 0, 2],
       [1, 1, 1, 0]])

NNSOM.utils.calculate_button_positions(num_buttons, sidebar_width)#
NNSOM.utils.calculate_positions(dim)#
NNSOM.utils.cart2pol(x, y)#
NNSOM.utils.closest_class_cluster(cat_feature, clust)#

Returns the cluster array with the closest class for each cluster.

Parameters:
  • cat_feature (array-like) – Categorical feature array.

  • clust (list) – A cluster array of indices sorted by distances.

Returns:

closest_class – A cluster array with the closest class for each cluster.

Return type:

numpy array
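
One plausible reading of "closest class" is the class of the first index in each cluster, since the indices are sorted by distance. The sketch below follows that reading; closest_class_sketch is a hypothetical name, and both the numeric class labels and the NaN for empty clusters are assumptions rather than documented behavior.

```python
import numpy as np

def closest_class_sketch(cat_feature, clust):
    """Class of the nearest sample per cluster (assumed interpretation)."""
    cat_feature = np.asarray(cat_feature)
    out = np.full(len(clust), np.nan)
    for i, idx in enumerate(clust):
        if len(idx) > 0:
            # idx[0] is the sample closest to the neuron, by assumption.
            out[i] = cat_feature[idx[0]]
    return out

cat_feature = [0, 1, 1, 0, 2, 2]
clust = [[4, 0], [], [2, 1, 3], [5]]
closest = closest_class_sketch(cat_feature, clust)
```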

NNSOM.utils.count_classes_in_cluster(cat_feature, clust)#

Count the occurrences of each class in each cluster using vectorized operations for efficiency.

Parameters:
  • cat_feature (array-like) – Categorical feature array.

  • clust (list) – A list of arrays, each containing the indices of elements in a cluster.

Returns:

cluster_counts – A 2D array with counts of each class in each cluster.

Return type:

numpy array
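
A minimal sketch of per-cluster class counting, assuming one row per cluster and one column per class in sorted order of the unique labels. count_classes_sketch is a hypothetical name and the column order is an assumption; the library's layout may differ.

```python
import numpy as np

def count_classes_sketch(cat_feature, clust):
    """Count how often each class appears in each cluster (illustrative)."""
    cat_feature = np.asarray(cat_feature)
    classes = np.unique(cat_feature)
    counts = np.zeros((len(clust), len(classes)), dtype=int)
    for i, idx in enumerate(clust):
        if len(idx) > 0:
            labels = cat_feature[np.asarray(idx, dtype=int)]
            # Broadcasting compares every label against every class at once.
            counts[i] = (labels[:, None] == classes[None, :]).sum(axis=0)
    return counts

cat_feature = np.array([0, 1, 1, 0, 2, 2])
clust = [[0, 1, 2], [], [3, 4, 5]]
counts = count_classes_sketch(cat_feature, clust)
```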

NNSOM.utils.create_buttons(fig, button_types)#
NNSOM.utils.distances(pos)#
NNSOM.utils.flatten(data)#

Recursively flattens a nested list structure of numbers into a single list.

Parameters:

data – A number (int or float) or a nested list of numbers. The data to be flattened.

Returns:

A list of numbers, where all nested structures in the input have been flattened into a single list.
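
Recursive flattening as described above can be sketched in a few lines. flatten_sketch is a hypothetical stand-in written for illustration and is not taken from the NNSOM source.

```python
def flatten_sketch(data):
    """Recursively flatten a number or nested list of numbers."""
    if isinstance(data, (int, float)):
        # Base case: a bare number becomes a one-element list.
        return [data]
    flat = []
    for item in data:
        flat.extend(flatten_sketch(item))
    return flat

flat = flatten_sketch([1, [2.5, [3, 4]], [[5]], 6])
```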

NNSOM.utils.get_cluster_array(feature, clust)#

Returns a NumPy array of objects, each containing the feature values for each cluster.

Parameters:
  • feature (array-like) – Feature array.

  • clust (list) – A list of cluster arrays, each containing indices sorted by distances.

Returns:

cluster_array – A NumPy array where each element is an array of feature values for that cluster.

Return type:

numpy.ndarray
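
Because clusters usually differ in size, an object-dtype array is a natural container for the result. The following is a sketch under that assumption; cluster_array_sketch is a hypothetical name.

```python
import numpy as np

def cluster_array_sketch(feature, clust):
    """Collect the feature values of each cluster into an object array."""
    feature = np.asarray(feature)
    # dtype=object allows each slot to hold an array of a different length.
    out = np.empty(len(clust), dtype=object)
    for i, idx in enumerate(clust):
        out[i] = feature[np.asarray(idx, dtype=int)]
    return out

feature = [10, 20, 30, 40, 50]
clust = [[3, 0], [], [1, 2, 4]]
arr = cluster_array_sketch(feature, clust)
```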

NNSOM.utils.get_cluster_avg(feature, clust)#

Returns the average value of a feature for each cluster.

Parameters:
  • feature (array-like) – Feature array.

  • clust (list) – A list of cluster arrays, each containing indices sorted by distances.

Returns:

cluster_avg – A cluster array with the average value of the feature for each cluster.

Return type:

numpy array
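
A per-cluster mean can be sketched as follows. cluster_avg_sketch is a hypothetical name, and returning NaN for empty clusters is an assumption; the library may use a different placeholder.

```python
import numpy as np

def cluster_avg_sketch(feature, clust):
    """Mean feature value per cluster; NaN for empty clusters (assumed)."""
    feature = np.asarray(feature, dtype=float)
    avg = np.full(len(clust), np.nan)
    for i, idx in enumerate(clust):
        if len(idx) > 0:
            avg[i] = feature[np.asarray(idx, dtype=int)].mean()
    return avg

feature = [1.0, 3.0, 5.0, 7.0]
clust = [[0, 1], [], [2, 3]]
avg = cluster_avg_sketch(feature, clust)
```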

NNSOM.utils.get_cluster_data(data, clust)#

For each cluster, extract the corresponding data points and return them in a list.

Parameters:
  • data (numpy array) – The dataset from which to extract the clusters’ data.

  • clust (list of arrays) – A list where each element is an array of indices for data points in the corresponding cluster.

Returns:

cluster_data_list – A list where each element is a numpy array containing the data points of a cluster.

Return type:

list of numpy arrays
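
The extraction step amounts to fancy indexing with each cluster's index array. The sketch below assumes samples are stored along the rows of data; cluster_data_sketch is a hypothetical name.

```python
import numpy as np

def cluster_data_sketch(data, clust):
    """Return one array of data rows per cluster (rows = samples, assumed)."""
    data = np.asarray(data)
    return [data[np.asarray(idx, dtype=int)] for idx in clust]

data = np.array([[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]])
clust = [[2, 0], [], [1, 3]]
parts = cluster_data_sketch(data, clust)
```

An empty cluster yields an array with zero rows but the same number of columns, which keeps downstream code shape-safe.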

NNSOM.utils.get_color_labels(clust, *listOfIndices)#

Generates a color label for each cluster based on the indices of the classes.

Parameters:
  • clust (sequence of vectors) – A sequence of vectors, each containing the indices of elements in a cluster.

  • *listOfIndices (1-d array) – One array of indices per class, each listing where that class is present.

NNSOM.utils.get_conf_indices(target, results, target_class)#

Get the indices of True Positive, True Negative, False Positive, and False Negative for a specific target class.

Parameters:
  • target (array-like) – The true target values.

  • results (array-like) – The predicted values.

  • target_class (int) – The target class for which to get the confusion indices.

Returns:

  • tp_index (numpy array) – Indices of True Positives.

  • tn_index (numpy array) – Indices of True Negatives.

  • fp_index (numpy array) – Indices of False Positives.

  • fn_index (numpy array) – Indices of False Negatives.
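
The four index sets follow from a one-vs-rest comparison against target_class. A minimal sketch, with conf_indices_sketch as a hypothetical name:

```python
import numpy as np

def conf_indices_sketch(target, results, target_class):
    """One-vs-rest confusion indices for a single class (illustrative)."""
    target = np.asarray(target)
    results = np.asarray(results)
    is_pos = target == target_class     # truly in the class
    pred_pos = results == target_class  # predicted as the class
    tp = np.where(is_pos & pred_pos)[0]
    tn = np.where(~is_pos & ~pred_pos)[0]
    fp = np.where(~is_pos & pred_pos)[0]
    fn = np.where(is_pos & ~pred_pos)[0]
    return tp, tn, fp, fn

target = [0, 1, 1, 2, 1, 0]
results = [0, 1, 2, 1, 1, 1]
tp, tn, fp, fn = conf_indices_sketch(target, results, target_class=1)
```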

NNSOM.utils.get_dominant_class_error_types(dominant_classes, error_types)#

Map dominant class to the corresponding majority error type for each cluster, dynamically applying the correct error type based on the dominant class.

Parameters:
  • dominant_classes (array-like, shape (som.numNeurons,)) – List of dominant class labels for each cluster. May contain NaN values.

  • error_types (list of array-like, shape (numClasses, som.numNeurons)) – A list of arrays, one per class, each holding the majority error type for every cluster.

Returns:

List of the majority error type for each cluster, taken from the array that corresponds to that cluster’s dominant class.

Return type:

array-like, shape (som.numNeurons,)
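
The mapping can be pictured as a per-neuron lookup: pick the row of error_types that belongs to the neuron's dominant class, then read that neuron's entry. The sketch below assumes class labels are 0-based integers that index directly into error_types and that NaN entries pass through unchanged; dominant_error_types_sketch and the integer error codes are hypothetical.

```python
import numpy as np

def dominant_error_types_sketch(dominant_classes, error_types):
    """Look up each neuron's error type in its dominant class's array."""
    dominant_classes = np.asarray(dominant_classes, dtype=float)
    out = np.full(len(dominant_classes), np.nan)
    for i, cls in enumerate(dominant_classes):
        if not np.isnan(cls):
            # Assumes class labels double as indices into error_types.
            out[i] = error_types[int(cls)][i]
    return out

dominant = [0, 1, np.nan, 1]
error_types = [
    np.array([0, 1, 0, 1]),  # majority error codes for class 0
    np.array([1, 1, 0, 0]),  # majority error codes for class 1
]
mapped = dominant_error_types_sketch(dominant, error_types)
```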

NNSOM.utils.get_edge_shape()#
NNSOM.utils.get_edge_widths(indices, clust)#

Calculate edge width for each cluster based on the number of indices in the cluster.

Parameters:
  • indices (1-d array) – Array of indices for the specific class.

  • clust (sequence of vectors) – A sequence of vectors, each containing the indices of elements in a cluster.

Returns:

lwidth – Array of edge widths for each cluster.

Return type:

1-d array

NNSOM.utils.get_global_min_max(data)#

Finds the global minimum and maximum values in a nested list structure.

This function flattens the input data into a single list and then determines the minimum and maximum values.

Parameters:

data – A nested list of integers. The structure can be of any depth.

Returns:

A tuple (min_value, max_value) where min_value is the minimum value in the data, and max_value is the maximum value.
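
A minimal sketch of the flatten-then-reduce approach described above, using a generator to walk the nesting. global_min_max_sketch is a hypothetical name and the code is not taken from the library.

```python
def global_min_max_sketch(data):
    """Return (min, max) over every number in an arbitrarily nested list."""
    def walk(node):
        if isinstance(node, (int, float)):
            yield node
        else:
            for item in node:
                yield from walk(item)

    values = list(walk(data))
    return min(values), max(values)

lo, hi = global_min_max_sketch([[3, [7, -2]], [[10], 4], 0])
```

Sharing one global range across all neurons is useful when several subplots must use the same axis limits.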

NNSOM.utils.get_hexagon_shape()#
NNSOM.utils.get_ind_misclassified(target, prediction)#

Get the indices of misclassified items.

Parameters:
  • target (array-like) – The true target values.

  • prediction (array-like) – The predicted values.

Returns:

misclassified_indices – List of indices of misclassified items.

Return type:

list
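
Finding misclassified items reduces to an element-wise inequality. A short sketch, with misclassified_sketch as a hypothetical name:

```python
import numpy as np

def misclassified_sketch(target, prediction):
    """Indices where the prediction differs from the target."""
    target = np.asarray(target)
    prediction = np.asarray(prediction)
    return np.where(target != prediction)[0].tolist()

wrong = misclassified_sketch([0, 1, 2, 1], [0, 2, 2, 0])
```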

NNSOM.utils.get_perc_cluster(cat_feature, target, clust)#

Return cluster array with the percentage of a specific target class in each cluster.

Parameters:
  • cat_feature (array-like) – Categorical feature array.

  • target (int or str) – Target class to calculate the percentage.

  • clust (list) – A cluster array of indices sorted by distances.

Returns:

cluster_array – A cluster array with the percentage of target class.

Return type:

numpy array
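
The percentage of one class per cluster can be sketched as below. perc_cluster_sketch is a hypothetical name, and reporting 0 for empty clusters is an assumption.

```python
import numpy as np

def perc_cluster_sketch(cat_feature, target, clust):
    """Percentage of samples equal to `target` in each cluster."""
    cat_feature = np.asarray(cat_feature)
    perc = np.zeros(len(clust))
    for i, idx in enumerate(clust):
        if len(idx) > 0:
            labels = cat_feature[np.asarray(idx, dtype=int)]
            # Mean of a boolean mask is the fraction of matches.
            perc[i] = 100.0 * np.mean(labels == target)
    return perc

cat_feature = ["a", "b", "a", "a", "b", "b"]
clust = [[0, 1, 2, 3], [], [4, 5]]
perc = perc_cluster_sketch(cat_feature, "a", clust)
```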

NNSOM.utils.get_perc_misclassified(target, prediction, clust)#

Calculate the percentage of misclassified items in each cluster and return as a numpy array.

Parameters:
  • target (array-like) – The true target values.

  • prediction (array-like) – The predicted values.

  • clust (array-like) – List of arrays, each containing the indices of elements in a cluster.

Returns:

proportion_misclassified – Percentage of misclassified items in each cluster.

Return type:

numpy array
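
A sketch of the per-cluster misclassification percentage, assuming empty clusters report 0. perc_misclassified_sketch is a hypothetical name.

```python
import numpy as np

def perc_misclassified_sketch(target, prediction, clust):
    """Percentage of wrong predictions among each cluster's samples."""
    wrong = np.asarray(target) != np.asarray(prediction)
    perc = np.zeros(len(clust))
    for i, idx in enumerate(clust):
        if len(idx) > 0:
            perc[i] = 100.0 * wrong[np.asarray(idx, dtype=int)].mean()
    return perc

target = [0, 1, 1, 0, 2, 2]
prediction = [0, 1, 0, 0, 1, 1]
clust = [[0, 1, 2, 3], [4, 5], []]
perc = perc_misclassified_sketch(target, prediction, clust)
```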

NNSOM.utils.majority_class_cluster(cat_feature, clust)#

Returns the cluster array with the majority class for each cluster.

Parameters:
  • cat_feature (array-like) – Categorical feature array.

  • clust (list) – A cluster array of indices sorted by distances.

Returns:

majority_class – A cluster array with the majority class for each cluster.

Return type:

numpy array
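
A majority vote per cluster can be sketched with np.unique. majority_class_sketch is a hypothetical name; numeric labels, NaN for empty clusters, and resolving ties in favor of the smallest label are all assumptions of this sketch.

```python
import numpy as np

def majority_class_sketch(cat_feature, clust):
    """Most frequent class label in each cluster (illustrative)."""
    cat_feature = np.asarray(cat_feature)
    out = np.full(len(clust), np.nan)
    for i, idx in enumerate(clust):
        if len(idx) > 0:
            labels = cat_feature[np.asarray(idx, dtype=int)]
            values, counts = np.unique(labels, return_counts=True)
            # argmax returns the first maximum, so ties go to the smallest label.
            out[i] = values[np.argmax(counts)]
    return out

cat_feature = [0, 1, 1, 2, 2, 2]
clust = [[0, 1, 2], [], [3, 4, 5, 0]]
major = majority_class_sketch(cat_feature, clust)
```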

NNSOM.utils.normalize_position(position)#
NNSOM.utils.pol2cart(theta, rho)#
NNSOM.utils.preminmax(p)#
NNSOM.utils.rotate_xy(x1, y1, angle)#
NNSOM.utils.spread_positions(position, positionMean, positionBasis)#

Module contents#