spectralbrain.statistics.clustering#

Spectral-descriptor clustering for brain surface meshes.

Spatial, temporal, and joint spatio-temporal clustering of HKS, WKS, and fused descriptor matrices on triangle meshes and point clouds. Every algorithm is input-agnostic (mesh or point cloud — only an adjacency matrix is needed for spatially-regularised methods) and exposes a backend parameter for CPU/GPU dispatch.

Sections#

§1 Result containers
§2 Distance / affinity construction
§3 Spatial clustering (cluster vertices by descriptor values)
§4 Temporal / scale clustering (cluster the t- or E-axis)
§5 Spatio-temporal joint clustering
§6 HKS + WKS descriptor fusion
§7 Bayesian cluster confirmation
§8 Cluster quality & comparison metrics
§9 Convenience / pipeline wrappers
§10 Persistence vineyards: tracking topology across HKS scales
§11 Mapper pipeline: topological data analysis on meshes
§12 Non-negative tensor decomposition (multi-subject)
§13 Joint time-vertex graph signal processing
§14 Scale-space blob tracking (Lindeberg on manifolds)
§15 Multi-view clustering (geometry + descriptor views)
§16 Spectral graph wavelet clustering

Design principles#

Backend-agnostic: backend="auto" chooses GPU when available.
k-free where possible: most methods auto-determine cluster count.
Progress bars: all O(n²) or iterative routines expose Rich bars.
Memory-safe: GPU paths use batched transfers; CPU paths use joblib parallelism where effective.
Reproducible: every stochastic method accepts random_state.

References

Sun J, Ovsjanikov M, Guibas L. A concise and provably informative multi-scale signature based on heat diffusion. Computer Graphics Forum 28(5):1383–1392, 2009.
Aubry M, Schlickewei U, Cremers D. The wave kernel signature: a quantum mechanical approach to shape analysis. ICCV Workshops, 2011.
Cai D, He X, Han J, Huang TS. Graph regularized nonnegative matrix factorization for data representation. IEEE TPAMI 33(8):1548–1560, 2011.
Campello RJGB, Moulavi D, Sander J. Density-based clustering based on hierarchical density estimates. PAKDD, 2013.
Chazal F, Guibas LJ, Oudot SY, Skraba P. Persistence-based clustering in Riemannian manifolds. J. ACM 60(6):41, 2013.
Dhillon IS. Co-clustering documents and words using bipartite spectral graph partitioning. KDD, 2001.

Functions

`auto_cluster`(H, *[, adjacency, methods, ...])	Run multiple clustering algorithms and return all results.
`build_descriptor_distance`(H, *[, metric, ...])	Build pairwise distance matrix from a descriptor matrix.
`build_hks_affinity_graph`(H, adjacency, *[, ...])	Build HKS-weighted mesh adjacency for graph-based clustering.
`build_hybrid_distance`(D_descriptor, adjacency, *)	Fuse descriptor distance with geodesic distance on the mesh.
`cluster_comparison`(labels_a, labels_b)	Compare two clusterings (e.g., algorithm output vs atlas labels).
`cluster_dpmm`(H, *[, adjacency, ...])	Dirichlet Process Mixture Model — automatic cluster count.
`cluster_gnmf`(H, adjacency, *[, ...])	Graph-Regularised Non-negative Matrix Factorisation (GNMF).
`cluster_hdbscan`(H, *[, adjacency, ...])	HDBSCAN clustering on spectral descriptor features.
`cluster_joint_spectral`(H, adjacency, *[, ...])	Cluster vertices by joint time-vertex spectral energy.
`cluster_leiden`(adjacency_or_H, *[, H, ...])	Leiden community detection on mesh graph.
`cluster_mapper`(H, *[, lens, custom_lens, ...])	TDA Mapper pipeline with HKS-derived lens function.
`cluster_multiview`(H, adjacency, *[, ...])	Multi-view clustering with geometry and descriptor views.
`cluster_persistence`(H_scalar, adjacency, *)	Persistence-based clustering (ToMATo) on a scalar field.
`cluster_quality`(H, labels, *[, adjacency, ...])	Compute internal clustering quality metrics.
`cluster_scalespace_blobs`(H, adjacency, *[, ...])	Lindeberg-style scale-space blob tracking on HKS.
`cluster_spatiotemporal_gnmf`(H, adjacency, *)	Graph-regularised NMF with both spatial and temporal smoothness.
`cluster_spatiotemporal_stdbscan`(H, adjacency, *)	ST-DBSCAN adapted for mesh + spectral descriptor profiles.
`cluster_spectral_coclustering`(H, *[, ...])	Spectral co-clustering of the vertex × time/energy matrix.
`cluster_temporal_dtw`(H, *[, n_clusters, ...])	Time-series k-means with DTW on HKS/WKS profiles.
`cluster_temporal_fpca`(H, *[, n_components, ...])	Functional PCA on HKS time-profiles, then cluster fPC scores.
`cluster_tensor_decomposition`(tensor, *[, ...])	Non-negative CP/PARAFAC or Tucker on (vertices × scales × subjects).
`cluster_vineyards`(H, adjacency, *[, ...])	Track persistence diagram points across HKS time-scales.
`cluster_wavelet_coefficients`(H, adjacency, *)	Cluster vertices by spectral graph wavelet energy profiles.
`confirm_clusters_bayesian`(H, labels, *[, ...])	Bayesian confirmation of cluster assignments.
`denoise_joint_timevertex`(H, adjacency, *[, ...])	Joint time-vertex low-pass filtering of a descriptor matrix.
`find2`(parent, x)	Path-compressed find for union-find.
`fuse_concatenate`(hks, wks, *[, ...])	Simple weighted concatenation of HKS and WKS.
`fuse_joint_nmf`(hks, wks, *[, n_components, ...])	Joint NMF on concatenated [HKS | WKS] for shared basis.
`fuse_multi_kernel`(hks, wks, *[, ...])	Multi-kernel fusion: build a combined kernel from HKS and WKS.

Classes

`BayesianClusterConfirmation`(...[, metadata])	Output of Bayesian cluster confirmation analysis.
`ClusterResult`(labels, n_clusters, method[, ...])	Output of any spatial or spatio-temporal clustering algorithm.
`FusionResult`(fused, method[, weights, metadata])	Output of HKS + WKS descriptor fusion.
`MapperResult`(nerve_graph, node_membership, ...)	Output of the Mapper pipeline on a descriptor-equipped mesh.
`ScaleSpaceBlobResult`(blob_trajectories, ...)	Output of Lindeberg-style scale-space blob tracking on HKS.
`TemporalClusterResult`(labels, n_clusters, method)	Output of temporal/scale-axis clustering.
`TensorDecompositionResult`(spatial_factors, ...)	Output of non-negative tensor CP/PARAFAC or Tucker decomposition.
`VineyardResult`(vines, diagrams, ...[, metadata])	Output of persistence vineyard tracking across HKS scales.

class spectralbrain.statistics.clustering.BayesianClusterConfirmation(posterior_labels, label_probabilities, waic, loo, cluster_credible_intervals, agreement_with_input, metadata=<factory>)[source]#

Bases: object

Output of Bayesian cluster confirmation analysis.

Parameters:

posterior_labels (NDArray[integer])
label_probabilities (ndarray)
waic (float)
loo (float)
cluster_credible_intervals (dict[int, dict[str, Any]])
agreement_with_input (float)
metadata (dict[str, Any])

posterior_labels#

MAP cluster assignments from the Bayesian model.

Type:: ndarray, shape (N,)

label_probabilities#

Full posterior membership probabilities.

Type:: ndarray, shape (N, K)

waic#

Widely Applicable Information Criterion.

Type:: float

loo#

Leave-One-Out cross-validation estimate (ELPD).

Type:: float

cluster_credible_intervals#

Per-cluster posterior summaries (mean, HDI of centroid).

Type:: dict

agreement_with_input#

ARI between input labels and Bayesian MAP labels.

Type:: float

metadata#

Type:: dict

agreement_with_input: float#

cluster_credible_intervals: dict[int, dict[str, Any]]#

label_probabilities: ndarray#

loo: float#

metadata: dict[str, Any]#

posterior_labels: NDArray[integer]#

waic: float#

class spectralbrain.statistics.clustering.ClusterResult(labels, n_clusters, method, probabilities=None, quality=<factory>, metadata=<factory>)[source]#

Bases: object

Output of any spatial or spatio-temporal clustering algorithm.

Parameters:

labels (NDArray[integer])
n_clusters (int)
method (str)
probabilities (ndarray | None)
quality (dict[str, float])
metadata (dict[str, Any])

labels#

Integer cluster label per vertex. -1 = noise / unassigned.

Type:: ndarray, shape (N,)

n_clusters#

Number of discovered clusters (excluding noise).

Type:: int

method#

Algorithm name (e.g. "hdbscan", "gnmf", "leiden").

Type:: str

probabilities#

Soft membership. Shape depends on method: (N,) for HDBSCAN membership probability, (N, K) for NMF / DPMM component weights.

Type:: ndarray or None, shape (N,) or (N, K)

quality#

Internal quality metrics (silhouette, modularity, etc.).

Type:: dict

metadata#

Algorithm-specific outputs (condensed tree, persistence diagram, temporal profiles, etc.).

Type:: dict

property cluster_sizes: dict[int, int]#: Return a dict mapping cluster label to member count.

labels: NDArray[integer]#

metadata: dict[str, Any]#

method: str#

n_clusters: int#

property noise_count: int#: Return the number of noise (unclustered) points.

probabilities: ndarray | None = None#

quality: dict[str, float]#
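
The two derived properties can be reproduced from labels alone. The following is a minimal NumPy sketch, not the library's implementation; the free functions `cluster_sizes` and `noise_count` are hypothetical stand-ins for the properties, and excluding the noise label -1 from the size dict is an assumption.

```python
import numpy as np

def cluster_sizes(labels: np.ndarray) -> dict[int, int]:
    """Map each non-noise cluster label to its member count."""
    # Assumption: the noise label -1 is not reported as a cluster.
    ids, counts = np.unique(labels[labels >= 0], return_counts=True)
    return {int(k): int(c) for k, c in zip(ids, counts)}

def noise_count(labels: np.ndarray) -> int:
    """Number of vertices carrying the noise label -1."""
    return int(np.sum(labels == -1))

labels = np.array([0, 0, 1, -1, 1, 1, -1])
sizes = cluster_sizes(labels)
n_noise = noise_count(labels)
```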

class spectralbrain.statistics.clustering.FusionResult(fused, method, weights=None, metadata=<factory>)[source]#

Bases: object

Output of HKS + WKS descriptor fusion.

Parameters:

fused (NDArray[floating])
method (str)
weights (ndarray | None)
metadata (dict[str, Any])

fused#

Fused per-vertex descriptor matrix.

Type:: ndarray, shape (N, D)

method#

Fusion strategy name.

Type:: str

weights#

Per-channel or per-kernel weights (for MKL / learned fusions).

Type:: ndarray or None

metadata#

Type:: dict

fused: NDArray[floating]#

metadata: dict[str, Any]#

method: str#

weights: ndarray | None = None#

class spectralbrain.statistics.clustering.MapperResult(nerve_graph, node_membership, n_nodes, n_edges, vertex_to_nodes, metadata=<factory>)[source]#

Bases: object

Output of the Mapper pipeline on a descriptor-equipped mesh.

Parameters:

nerve_graph (dict[int, list[int]])
node_membership (dict[int, list[int]])
n_nodes (int)
n_edges (int)
vertex_to_nodes (ndarray)
metadata (dict[str, Any])

nerve_graph#

Adjacency list of the nerve complex (node → list of neighbouring nodes).

Type:: dict

node_membership#

Mapping from nerve-node index to list of vertex indices.

Type:: dict

n_nodes#

Number of nodes in the nerve complex.

Type:: int

n_edges#

Number of edges.

Type:: int

vertex_to_nodes#

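For each vertex, the index of the first nerve node that contains it. A vertex may belong to several overlapping nodes; only the first hit is stored here, and node_membership holds the full assignment.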

Type:: ndarray, shape (N,)

metadata#

Contains the full kmapper.KeplerMapper graph object for downstream visualisation.

Type:: dict

metadata: dict[str, Any]#

n_edges: int#

n_nodes: int#

nerve_graph: dict[int, list[int]]#

node_membership: dict[int, list[int]]#

vertex_to_nodes: ndarray#

class spectralbrain.statistics.clustering.ScaleSpaceBlobResult(blob_trajectories, natural_scales, blob_labels, n_blobs, metadata=<factory>)[source]#

Bases: object

Output of Lindeberg-style scale-space blob tracking on HKS.

Parameters:

blob_trajectories (list[list[dict[str, Any]]])
natural_scales (ndarray)
blob_labels (NDArray[integer])
n_blobs (int)
metadata (dict[str, Any])

blob_trajectories#

Each trajectory is a list of {vertex, scale_index, t_value, response} dicts ordered by scale.

Type:: list of list of dict

natural_scales#

For each vertex, the t at which the normalised LoG response is maximal (its “natural scale”).

Type:: ndarray, shape (N,)

blob_labels#

Cluster label derived from trajectory membership.

Type:: ndarray, shape (N,)

n_blobs#

Type:: int

metadata#

Type:: dict

blob_labels: NDArray[integer]#

blob_trajectories: list[list[dict[str, Any]]]#

metadata: dict[str, Any]#

n_blobs: int#

natural_scales: ndarray#

class spectralbrain.statistics.clustering.TemporalClusterResult(labels, n_clusters, method, centroids=None, quality=<factory>, metadata=<factory>)[source]#

Bases: object

Output of temporal/scale-axis clustering.

Clusters the T time/energy samples rather than the N vertices.

Parameters:

labels (NDArray[integer])
n_clusters (int)
method (str)
centroids (ndarray | None)
quality (dict[str, float])
metadata (dict[str, Any])

labels#

Cluster label per time/energy sample.

Type:: ndarray, shape (T,)

n_clusters#

Type:: int

centroids#

Centroid profiles (one per cluster × all vertices).

Type:: ndarray or None, shape (K, N)

method#

Type:: str

quality#

Type:: dict

centroids: ndarray | None = None#

labels: NDArray[integer]#

metadata: dict[str, Any]#

method: str#

n_clusters: int#

quality: dict[str, float]#

class spectralbrain.statistics.clustering.TensorDecompositionResult(spatial_factors, temporal_factors, subject_factors, labels, n_components, reconstruction_error, metadata=<factory>)[source]#

Bases: object

Output of non-negative tensor CP/PARAFAC or Tucker decomposition.

For a tensor ℋ ∈ ℝ^{N × T × S} (vertices × scales × subjects), the attributes below hold one factor matrix per mode, each with R components.

Parameters:

spatial_factors (ndarray)
temporal_factors (ndarray)
subject_factors (ndarray)
labels (NDArray[integer])
n_components (int)
reconstruction_error (float)
metadata (dict[str, Any])

spatial_factors#

Shared spatial components (vertex loadings).

Type:: ndarray, shape (N, R)

temporal_factors#

Population-level temporal/scale profiles.

Type:: ndarray, shape (T, R)

subject_factors#

Per-subject component strengths.

Type:: ndarray, shape (S, R)

labels#

Hard cluster labels from argmax of spatial factors.

Type:: ndarray, shape (N,)

n_components#

Type:: int

reconstruction_error#

Type:: float

metadata#

Type:: dict

labels: NDArray[integer]#

metadata: dict[str, Any]#

n_components: int#

reconstruction_error: float#

spatial_factors: ndarray#

subject_factors: ndarray#

temporal_factors: ndarray#

class spectralbrain.statistics.clustering.VineyardResult(vines, diagrams, salient_features, scale_of_emergence, metadata=<factory>)[source]#

Bases: object

Output of persistence vineyard tracking across HKS scales.

Parameters:

vines (list[ndarray])
diagrams (dict[int, ndarray])
salient_features (list[dict[str, Any]])
scale_of_emergence (ndarray)
metadata (dict[str, Any])

vines#

Each vine is an array of (t, birth, death) triples tracing one topological feature across scales.

Type:: list of ndarray

diagrams#

Persistence diagram at each sampled t, keyed by t index.

Type:: dict

salient_features#

Features that persist across at least a min_life_frac fraction of the t-range, with their birth/death scale and spatial location.

Type:: list of dict

scale_of_emergence#

The t at which each salient feature first appears.

Type:: ndarray, shape (n_features,)

metadata#

Type:: dict

diagrams: dict[int, ndarray]#

metadata: dict[str, Any]#

salient_features: list[dict[str, Any]]#

scale_of_emergence: ndarray#

vines: list[ndarray]#

spectralbrain.statistics.clustering.auto_cluster(H, *, adjacency=None, methods=('hdbscan', 'leiden', 'gnmf'), random_state=42, backend='auto', **kwargs)[source]#

Run multiple clustering algorithms and return all results.

A convenience function for exploratory analysis that runs a battery of methods on the same data and returns a dict keyed by method name. The user can then compare via cluster_quality() and cluster_comparison().

Parameters:

H (ndarray, shape (N, T))
adjacency (sparse or None)
methods (sequence of str) – Subset of {"hdbscan", "leiden", "gnmf", "dpmm", "fpca", "coclustering", "persistence"}.
random_state (int)
backend (str)
**kwargs – Forwarded to individual clustering functions.

Returns:

dict[str, ClusterResult]

Return type:

dict[str, ClusterResult]

spectralbrain.statistics.clustering.build_descriptor_distance(H, *, metric='euclidean', log_transform=True, normalize='l1', backend='auto')[source]#

Build pairwise distance matrix from a descriptor matrix.

Parameters:

H (ndarray, shape (N, T)) – Per-vertex descriptor (HKS, WKS, or fused).
metric (str) – Distance metric in feature space.
log_transform (bool) – Apply log(H + ε) before distance computation. Recommended for HKS because values span many orders of magnitude.
normalize (str) – Per-row normalisation after optional log transform.
backend (str) – "auto" selects GPU if torch+CUDA available.

Returns:

ndarray, shape (N, N) – Symmetric pairwise distance matrix.

Return type:

NDArray[floating]
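
The preprocessing order described above (log transform, per-row L1 normalisation, then pairwise Euclidean distance) can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the function name `descriptor_distance`, the value of ε, and the dense broadcasting (only suitable for small N) are all choices made for the example.

```python
import numpy as np

def descriptor_distance(H: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Pairwise Euclidean distance between log-transformed, L1-normalised rows."""
    X = np.log(H + eps)                                   # compress dynamic range
    norm = np.abs(X).sum(axis=1, keepdims=True)           # per-row L1 norm
    X = X / np.maximum(norm, eps)                         # guard against zero rows
    diff = X[:, None, :] - X[None, :, :]                  # (N, N, T), small N only
    return np.sqrt((diff ** 2).sum(axis=-1))

rng = np.random.default_rng(0)
H = rng.uniform(1e-6, 1.0, size=(5, 8))                  # toy descriptor matrix
D = descriptor_distance(H)
```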

spectralbrain.statistics.clustering.build_hks_affinity_graph(H, adjacency, *, sigma=None, log_transform=True)[source]#

Build HKS-weighted mesh adjacency for graph-based clustering.

Each mesh edge (i, j) gets weight w_ij = c_ij · exp(-||h_i - h_j||² / (2σ²)), where c_ij is the original cotangent weight and h_i is the (optionally log-transformed, L1-normalised) HKS vector of vertex i.

Parameters:

H (ndarray, shape (N, T)) – Descriptor matrix.
adjacency (sparse, shape (N, N)) – Mesh Laplacian or adjacency with cotangent weights.
sigma (float or None) – Kernel bandwidth. If None, uses the median edge-HKS-distance.
log_transform (bool) – Apply log(H + ε) before computing feature distances.

Returns:

sparse, shape (N, N) – Weighted adjacency (CSR).

Return type:

spmatrix
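
A minimal sketch of the edge-weight formula above, evaluated only on existing mesh edges. The function name `hks_affinity` and the value of ε are assumptions for illustration; the median-distance fallback for σ follows the parameter description.

```python
import numpy as np
import scipy.sparse as sp

def hks_affinity(H, adjacency, sigma=None, eps=1e-12):
    """Reweight each stored edge by a Gaussian kernel on descriptor distance."""
    A = sp.coo_matrix(adjacency)
    X = np.log(H + eps)
    X = X / np.maximum(np.abs(X).sum(axis=1, keepdims=True), eps)  # L1 rows
    d = np.linalg.norm(X[A.row] - X[A.col], axis=1)               # per-edge distance
    if sigma is None:
        sigma = np.median(d)                                       # median edge distance
    sigma = max(float(sigma), eps)
    w = A.data * np.exp(-d ** 2 / (2.0 * sigma ** 2))
    return sp.csr_matrix((w, (A.row, A.col)), shape=A.shape)

# Toy example: a 4-vertex path graph with unit edge weights.
A = sp.csr_matrix(np.array([[0, 1, 0, 0],
                            [1, 0, 1, 0],
                            [0, 1, 0, 1],
                            [0, 0, 1, 0]], dtype=float))
H = np.random.default_rng(1).uniform(1e-4, 1.0, size=(4, 6))
W = hks_affinity(H, A)
```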

spectralbrain.statistics.clustering.build_hybrid_distance(D_descriptor, adjacency, *, alpha=0.7, geodesic_backend='dijkstra')[source]#

Fuse descriptor distance with geodesic distance on the mesh.

Parameters:

D_descriptor (ndarray, shape (N, N)) – Pairwise descriptor-space distance.
adjacency (sparse, shape (N, N)) – Mesh adjacency (cotangent weights or binary).
alpha (float) – Weight for descriptor distance. 0.0 = pure geodesic, 1.0 = pure descriptor.
geodesic_backend (str) – "dijkstra" via scipy or "heat" via potpourri3d.

Returns:

ndarray, shape (N, N) – Normalised fused distance.

Return type:

NDArray[floating]
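
The role of alpha can be illustrated with a convex combination of the two distance matrices. The sketch below assumes each matrix is scaled to [0, 1] by its largest finite entry before fusion; the source only states that the result is normalised, so that scaling and the name `hybrid_distance` are assumptions. Geodesic distances come from `scipy.sparse.csgraph.dijkstra`.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import dijkstra

def hybrid_distance(D_descriptor, adjacency, alpha=0.7):
    """alpha * descriptor distance + (1 - alpha) * graph-geodesic distance."""
    D_geo = dijkstra(sp.csr_matrix(adjacency), directed=False)
    D_desc_n = D_descriptor / max(D_descriptor.max(), 1e-12)
    finite = np.isfinite(D_geo)
    # Assumption: unreachable pairs are assigned the maximum distance 1.0.
    D_geo_n = np.where(finite, D_geo / max(D_geo[finite].max(), 1e-12), 1.0)
    return alpha * D_desc_n + (1.0 - alpha) * D_geo_n

A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)                 # 4-vertex path graph
idx = np.arange(4.0)
D = np.abs(np.subtract.outer(idx, idx)) ** 2              # toy descriptor distance
pure_geodesic = hybrid_distance(D, A, alpha=0.0)
pure_descriptor = hybrid_distance(D, A, alpha=1.0)
```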

spectralbrain.statistics.clustering.cluster_comparison(labels_a, labels_b)[source]#

Compare two clusterings (e.g., algorithm output vs atlas labels).

Parameters:

labels_a (NDArray[integer])
labels_b (NDArray[integer])

Returns:

dict – Keys: ari, nmi, ami, homogeneity, completeness, v_measure.

Return type:

dict[str, float]
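
To make the ari key concrete, the adjusted Rand index can be computed from the contingency table of the two labelings. This is a plain NumPy sketch with a hypothetical function name, not the library's code; scikit-learn's `sklearn.metrics.adjusted_rand_score` computes the same quantity.

```python
import numpy as np

def adjusted_rand_index(labels_a, labels_b) -> float:
    """ARI from the contingency table of two labelings."""
    _, a = np.unique(np.asarray(labels_a).ravel(), return_inverse=True)
    _, b = np.unique(np.asarray(labels_b).ravel(), return_inverse=True)
    C = np.zeros((a.max() + 1, b.max() + 1), dtype=np.int64)
    np.add.at(C, (a, b), 1)                      # contingency counts n_ij

    def comb2(x):
        return x * (x - 1) / 2.0                 # "x choose 2"

    sum_ij = comb2(C).sum()
    sum_a = comb2(C.sum(axis=1)).sum()
    sum_b = comb2(C.sum(axis=0)).sum()
    expected = sum_a * sum_b / comb2(len(a))
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:                    # degenerate case, e.g. one cluster each
        return 1.0
    return float((sum_ij - expected) / (max_index - expected))

same_up_to_renaming = adjusted_rand_index([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

The index is invariant to label renaming, which is why it suits comparisons against atlas labels whose integer codes are arbitrary.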

spectralbrain.statistics.clustering.cluster_dpmm(H, *, adjacency=None, max_components=25, dim_reduction='pca', n_components_reduce=8, log_transform=True, random_state=42, backend='variational', mrf_beta=0.0)[source]#

Dirichlet Process Mixture Model — automatic cluster count.

Parameters:

H (ndarray, shape (N, T)) – Descriptor matrix.
adjacency (sparse or None) – Mesh adjacency for MRF spatial prior (only with pymc backend).
max_components (int) – Truncation level for variational DP.
dim_reduction (str or None) – Reduce dimensionality before fitting.
n_components_reduce (int) – Target dimensionality.
log_transform (bool)
random_state (int)
backend (str) – "variational" uses sklearn BayesianGaussianMixture (fast). "pymc" uses full MCMC with optional MRF prior (slow, rich).
mrf_beta (float) – Potts MRF coupling strength (pymc backend only).

Returns:

ClusterResult

Return type:

ClusterResult

spectralbrain.statistics.clustering.cluster_gnmf(H, adjacency, *, n_components=8, lam=1.0, sparsity_alpha=0.0, n_iter=300, tol=1e-05, random_state=42, backend='auto')[source]#

Graph-Regularised Non-negative Matrix Factorisation (GNMF).

Decomposes the non-negative descriptor matrix H ≈ W·F subject to a Laplacian smoothness penalty on the spatial factor W, so that mesh-adjacent vertices receive similar component activations.

\[\min_{W,F \geq 0} \frac{1}{2}\|H - WF\|_F^2 + \lambda \operatorname{tr}(W^T L W) + \alpha \|W\|_1\]

Parameters:

H (ndarray, shape (N, T)) – Non-negative descriptor matrix (HKS is always ≥ 0).
adjacency (sparse, shape (N, N)) – Mesh adjacency (the cotangent Laplacian, or the weight matrix given by its negated off-diagonal entries).
n_components (int) – Number of spatial components (≈ cluster count).
lam (float) – Laplacian smoothness weight λ.
sparsity_alpha (float) – ℓ₁ sparsity on W.
n_iter (int) – Maximum multiplicative update iterations.
tol (float) – Relative objective change for convergence.
random_state (int)
backend (str)

Returns:

ClusterResult – With W (spatial), F (temporal profiles) in metadata.

Return type:

ClusterResult
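
The objective above can be minimised with multiplicative updates in the style of Cai et al. The sketch below is an assumption-laden illustration rather than the library's routine: it uses dense NumPy arrays, omits the ℓ₁ sparsity term (α = 0), writes L = D − A with D the degree matrix, and the name `gnmf_sketch` is hypothetical. The factor 2 in the updates comes from differentiating λ·tr(WᵀLW) against the ½-scaled fit term.

```python
import numpy as np

def gnmf_sketch(H, A, n_components=2, lam=1.0, n_iter=200, seed=0, eps=1e-10):
    """Multiplicative-update GNMF for H ~ W @ F (sparsity term omitted)."""
    rng = np.random.default_rng(seed)
    N, T = H.shape
    W = rng.uniform(0.1, 1.0, size=(N, n_components))
    F = rng.uniform(0.1, 1.0, size=(n_components, T))
    D = np.diag(A.sum(axis=1))                   # degree matrix
    L = D - A                                    # graph Laplacian

    def objective():
        fit = 0.5 * np.linalg.norm(H - W @ F) ** 2
        return fit + lam * np.trace(W.T @ L @ W)

    history = [objective()]
    for _ in range(n_iter):
        # Negative gradient parts in the numerator, positive parts in the denominator.
        W *= (H @ F.T + 2 * lam * (A @ W)) / (W @ (F @ F.T) + 2 * lam * (D @ W) + eps)
        F *= (W.T @ H) / (W.T @ W @ F + eps)
        history.append(objective())
    return W, F, np.array(history)

# Toy data: 6 vertices on a path graph, two blocks with different profiles.
A = np.diag(np.ones(5), 1) + np.diag(np.ones(5), -1)
H = np.vstack([np.tile([1.0, 1.0, 0.0, 0.0], (3, 1)),
               np.tile([0.0, 0.0, 1.0, 1.0], (3, 1))])
W, F, history = gnmf_sketch(H, A)
labels = np.argmax(W, axis=1)                    # hard labels from the spatial factor
```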

spectralbrain.statistics.clustering.cluster_hdbscan(H, *, adjacency=None, alpha_fusion=0.7, min_cluster_size=200, min_samples=10, cluster_selection_method='eom', metric='euclidean', dim_reduction='umap', n_components=8, log_transform=True, random_state=42, backend='auto')[source]#

HDBSCAN clustering on spectral descriptor features.

Density-based clustering that automatically determines the number of clusters. When an adjacency matrix is provided, uses a fused geodesic + descriptor distance for spatially coherent parcellation.

Parameters:

H (ndarray, shape (N, T)) – Per-vertex descriptor matrix (HKS, WKS, or fused).
adjacency (sparse or None) – Mesh adjacency for hybrid distance fusion. If None, clusters purely in descriptor feature space.
alpha_fusion (float) – Weight for descriptor distance in the hybrid metric.
min_cluster_size (int) – Minimum cluster size for HDBSCAN.
min_samples (int) – Core-distance neighbourhood size.
cluster_selection_method (str) – "eom" (excess of mass) or "leaf".
metric (str) – If "precomputed", H is treated as a distance matrix.
dim_reduction (str or None) – Reduce descriptor dimensionality before clustering.
n_components (int) – Target dimensionality for reduction.
log_transform (bool) – Apply log(H + ε) before processing.
random_state (int) – Seed for reproducibility.
backend (str) – "auto" / "cpu" / "gpu" for distance computation.

Returns:

ClusterResult – With outlier_scores in metadata.

Return type:

ClusterResult

spectralbrain.statistics.clustering.cluster_joint_spectral(H, adjacency, *, n_clusters=6, n_eigenvectors=30, n_freq_bands=5, clusterer='kmeans', random_state=42)[source]#

Cluster vertices by joint time-vertex spectral energy.

Computes the Joint Fourier Transform of H, partitions the (graph-frequency, time-frequency) plane into bands, measures energy concentration per vertex per band, and clusters vertices by their spectral energy profile.

Parameters:

H (ndarray, shape (N, T))
adjacency (sparse, shape (N, N))
n_clusters (int) – For k-means.
n_eigenvectors (int) – Graph Laplacian eigenvectors.
n_freq_bands (int) – Number of bands to partition the graph-frequency axis.
clusterer (str)
random_state (int)

Returns:

ClusterResult – With spectral_energy in metadata.

Return type:

ClusterResult
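
The first step, the Joint Fourier Transform, combines a graph Fourier transform along the vertex axis with a DFT along the scale axis. The sketch below shows only that transform, using the full eigenbasis of a dense combinatorial Laplacian; the library truncates to n_eigenvectors, and how band energies are attributed to vertices is not specified here, so neither is reproduced. The function name is hypothetical.

```python
import numpy as np

def joint_fourier_transform(H, A):
    """GFT along vertices followed by an orthonormal DFT along the scale axis."""
    L = np.diag(A.sum(axis=1)) - A               # combinatorial Laplacian
    evals, U = np.linalg.eigh(L)                 # graph frequencies and Fourier basis
    H_gft = U.T @ H                              # graph Fourier transform
    H_jft = np.fft.fft(H_gft, axis=1, norm="ortho")
    return evals, H_jft

A = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)     # 5-vertex path graph
H = np.random.default_rng(2).normal(size=(5, 4))
evals, H_jft = joint_fourier_transform(H, A)
energy = np.abs(H_jft) ** 2                      # energy per (graph freq, time freq) cell
```

Because both transforms are unitary, the total energy of H is preserved, so band energies can be read as fractions of the signal energy.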

spectralbrain.statistics.clustering.cluster_leiden(adjacency_or_H, *, H=None, resolution=1.0, quality_function='modularity', n_iterations=-1, random_state=42, sigma=None)[source]#

Leiden community detection on mesh graph.

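Accepts either (a) a pre-built sparse weighted adjacency matrix, optionally reweighted by the HKS affinity computed from the keyword argument H, or (b) a dense descriptor matrix passed as adjacency_or_H with H=None, in which case a k-NN graph is built from the descriptors. Passing H by keyword keeps the two roles unambiguous.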

Parameters:

adjacency_or_H (sparse or ndarray) – If sparse: the weighted adjacency/affinity graph. If ndarray (N, T): descriptor matrix (requires H=None and will build k-NN graph from descriptors).
H (ndarray or None) – If adjacency_or_H is sparse and H is provided, weights are multiplied by HKS affinity.
resolution (float) – Resolution parameter γ. Higher = more clusters.
quality_function (str) – "modularity" (RBConfiguration) or "cpm" (CPM).
n_iterations (int) – Leiden iterations. -1 = iterate until stable.
random_state (int)
sigma (float or None) – Bandwidth for HKS affinity kernel (if H provided).

Returns:

ClusterResult – With modularity in quality dict.

Return type:

ClusterResult

spectralbrain.statistics.clustering.cluster_mapper(H, *, lens='hks_sum', custom_lens=None, n_cubes=15, perc_overlap=0.3, clusterer_method='dbscan', clusterer_eps=0.5, dim_reduction=None, n_components=2, random_state=42)[source]#

TDA Mapper pipeline with HKS-derived lens function.

Projects each vertex through a lens (filter) function into ℝ^d, covers the lens range with overlapping hypercubes, clusters within each pullback, and forms the nerve complex. The resulting graph is a topological skeleton that reveals branching structure, loops, and flares in the descriptor space.

Parameters:

H (ndarray, shape (N, T)) – Per-vertex descriptor matrix.
lens (str) – "hks_sum" — sum of HKS across all scales. "hks_first_pc" — first PC of the descriptor matrix. "custom" — use custom_lens.
custom_lens (ndarray or None, shape (N,) or (N, d)) – Custom lens function values.
n_cubes (int) – Number of intervals per lens dimension.
perc_overlap (float) – Overlap fraction between adjacent intervals (0–1).
clusterer_method (str) – Clustering algorithm within each pullback.
clusterer_eps (float) – Epsilon for DBSCAN within pullbacks.
dim_reduction (str or None) – Reduce descriptor space before building Mapper graph.
n_components (int) – Target dimensionality for reduction.
random_state (int)

Returns:

MapperResult

Return type:

MapperResult

References

Singh G, Mémoli F, Carlsson G. Topological methods for the analysis of high dimensional data sets and 3D object recognition. SPBG, 2007.

spectralbrain.statistics.clustering.cluster_multiview(H, adjacency, *, n_clusters=6, n_eigenvectors_geo=20, n_eigenvectors_desc=10, alpha=0.5, fusion='spectral_average', random_state=42)[source]#

Multi-view clustering with geometry and descriptor views.

View 1: Low-frequency Laplacian eigenfunctions (encoding spatial position on the manifold — the same basis from which HKS is built).

View 2: HKS/WKS descriptor profiles (encoding multi-scale geometric features).

Three fusion strategies are available:

"spectral_average": average the normalised Laplacians of both views, then spectral clustering on the combined Laplacian.
"late_consensus": cluster each view independently, then reconcile via consensus (CSPA).
"concatenate": stack view features and cluster jointly.

Parameters:

H (ndarray, shape (N, T)) – Descriptor matrix (View 2).
adjacency (sparse, shape (N, N)) – Mesh Laplacian (View 1 is its eigenfunctions).
n_clusters (int)
n_eigenvectors_geo (int) – Number of Laplacian eigenfunctions for View 1.
n_eigenvectors_desc (int) – PCA components for View 2.
alpha (float) – Weight for View 1 (geometry). 1−α for View 2 (descriptors).
fusion (str)
random_state (int)

Returns:

ClusterResult

Return type:

ClusterResult

References

Kumar A, Rai P, Daumé H. Co-regularized multi-view spectral clustering. NeurIPS 24, 2011.

spectralbrain.statistics.clustering.cluster_persistence(H_scalar, adjacency, *, persistence_threshold=None, n_clusters=None)[source]#

Persistence-based clustering (ToMATo) on a scalar field.

Treats HKS(·, t₀) as a density function on the mesh and uses persistent homology of sub-level sets to find topologically stable basins — each basin becomes a cluster.

Parameters:

H_scalar (ndarray, shape (N,)) – Scalar field on the mesh (e.g. HKS at one time-scale, or summed HKS, or any vertex-wise feature).
adjacency (sparse, shape (N, N)) – Mesh adjacency.
persistence_threshold (float or None) – Minimum persistence to retain a cluster. If None and n_clusters is None, uses the largest gap in the diagram.
n_clusters (int or None) – If set, keeps the n_clusters most persistent components.

Returns:

ClusterResult – With persistence_pairs and diagram in metadata.

Return type:

ClusterResult
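
The 0-dimensional sub-level-set persistence behind this method can be computed with a union-find sweep: vertices are added in order of increasing value, and when two components meet, the one born later dies (the elder rule). The sketch below is a minimal illustration with hypothetical helper names and an adjacency list as input; it returns only the finite (birth, death) pairs and is not the library's implementation.

```python
import numpy as np

def find(parent, x):
    """Union-find root lookup with path compression (halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def sublevel_persistence_h0(values, neighbours):
    """Finite H0 persistence pairs of the sub-level-set filtration on a graph."""
    order = np.argsort(values, kind="stable")
    parent, birth, pairs = {}, {}, []
    for v in map(int, order):
        parent[v] = v
        birth[v] = values[v]
        for u in neighbours[v]:
            if u not in parent:                  # neighbour not yet in the filtration
                continue
            ru, rv = find(parent, u), find(parent, v)
            if ru == rv:
                continue
            # Elder rule: the component with the smaller birth value survives.
            old, young = (ru, rv) if birth[ru] <= birth[rv] else (rv, ru)
            if birth[young] < values[v]:         # skip zero-persistence pairs
                pairs.append((float(birth[young]), float(values[v])))
            parent[young] = old
    return pairs

# Path graph 0-1-2-3-4 with three local minima (vertices 0, 2 and 4).
values = np.array([0.0, 2.0, 1.0, 3.0, 0.5])
neighbours = [[1], [0, 2], [1, 3], [2, 4], [3]]
pairs = sublevel_persistence_h0(values, neighbours)
```

Thresholding the persistence (death minus birth) of these pairs decides which basins are kept as separate clusters and which are merged into their older neighbour.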

spectralbrain.statistics.clustering.cluster_quality(H, labels, *, adjacency=None, metric='euclidean')[source]#

Compute internal clustering quality metrics.

Parameters:

H (ndarray, shape (N, T) or (N, N)) – Descriptor matrix or precomputed distance matrix.
labels (ndarray, shape (N,))
adjacency (sparse or None) – For spatial coherence metrics.
metric (str)

Returns:

dict – Keys: silhouette, calinski_harabasz, davies_bouldin, spatial_coherence (if adjacency provided).

Return type:

dict[str, float]

spectralbrain.statistics.clustering.cluster_scalespace_blobs(H, adjacency, *, t_values=None, gamma_normalize=1.0, linking_radius=3.0, min_trajectory_length=3)[source]#

Lindeberg-style scale-space blob tracking on HKS.

HKS(x, t) is the heat-diffusion analogue of the Gaussian scale-space on the manifold. This function detects local maxima of the scale-normalised response t^γ · HKS(x, t) at each scale, links them across consecutive scales by geodesic proximity, and assigns each vertex to the blob trajectory whose maximum is closest.

Parameters:

H (ndarray, shape (N, T)) – HKS matrix at T log-spaced scales.
adjacency (sparse, shape (N, N)) – Mesh adjacency (for 1-ring local-max detection).
t_values (ndarray or None, shape (T,)) – Actual diffusion time values. If None, assumes 1..T.
gamma_normalize (float) – Scale-normalisation exponent γ (Lindeberg 1998). Default 1.0 corresponds to the standard normalised Laplacian of Gaussian.
linking_radius (float) – Maximum geodesic hops to link a maximum across scales.
min_trajectory_length (int) – Minimum number of scales a blob must span.

Returns:

ScaleSpaceBlobResult

Return type:

ScaleSpaceBlobResult

References

Lindeberg T. Feature detection with automatic scale selection. IJCV 30(2):79–116, 1998.
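
The scale-normalised response and the per-vertex natural scale can be sketched in a few lines. This covers only the response t^γ · HKS(x, t) and its argmax over scales; local-maximum detection and cross-scale linking are not shown. The function name is hypothetical, and the fallback t = 1..T follows the t_values description.

```python
import numpy as np

def natural_scales(H, t_values=None, gamma=1.0):
    """Scale at which the normalised response t**gamma * H is largest, per vertex."""
    N, T = H.shape
    if t_values is None:
        t = np.arange(1, T + 1, dtype=float)     # documented fallback: 1..T
    else:
        t = np.asarray(t_values, dtype=float)
    response = (t ** gamma)[None, :] * H         # shape (N, T)
    return t[np.argmax(response, axis=1)], response

H = np.array([[1.0, 0.6, 0.2],                  # response: 1.0, 1.2, 0.6
              [0.1, 0.2, 0.5]])                 # response: 0.1, 0.4, 1.5
scales, response = natural_scales(H)
```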

spectralbrain.statistics.clustering.cluster_spatiotemporal_gnmf(H, adjacency, *, n_components=8, lam_spatial=1.0, lam_temporal=0.1, n_iter=300, tol=1e-05, random_state=42, backend='auto')[source]#

Graph-regularised NMF with both spatial and temporal smoothness.

Extends GNMF by adding a temporal smoothness penalty on F, encouraging neighbouring time scales to have similar profiles.

\[\min_{W,F \geq 0} \tfrac{1}{2}\|H - WF\|_F^2 + \lambda_s \operatorname{tr}(W^T L_s W) + \lambda_t \operatorname{tr}(F L_t F^T)\]

where L_s is the mesh Laplacian and L_t is the 1D chain Laplacian along the time axis.

Parameters:

H (ndarray, shape (N, T))
adjacency (sparse, shape (N, N))
n_components (int)
lam_spatial (float)
lam_temporal (float)
n_iter (int)
tol (float)
random_state (int)
backend (str)

Returns:

ClusterResult – With spatial W and temporally-smooth F in metadata.

Return type:

ClusterResult
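
The temporal term uses the Laplacian of a path graph over the T scales. A minimal sketch of L_t, with hypothetical function names and unit edge weights assumed, shows that tr(F L_t Fᵀ) equals the sum of squared differences between neighbouring scales of each temporal profile.

```python
import numpy as np

def chain_laplacian(T: int) -> np.ndarray:
    """Laplacian of the unweighted path graph on T nodes."""
    A = np.zeros((T, T))
    i = np.arange(T - 1)
    A[i, i + 1] = 1.0                            # link each scale to the next
    A[i + 1, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def temporal_penalty(F: np.ndarray, L_t: np.ndarray) -> float:
    """tr(F L_t F^T): roughness of each row of F along the scale axis."""
    return float(np.trace(F @ L_t @ F.T))

F = np.random.default_rng(3).uniform(size=(3, 6))        # 3 components, 6 scales
L_t = chain_laplacian(6)
penalty = temporal_penalty(F, L_t)
```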

spectralbrain.statistics.clustering.cluster_spatiotemporal_stdbscan(H, adjacency, *, eps_spatial=None, eps_temporal=None, min_pts=10, temporal_metric='euclidean', gamma_dtw=0.1, backend='auto')[source]#

ST-DBSCAN adapted for mesh + spectral descriptor profiles.

Uses a conjunctive neighbourhood: vertex y is in the neighbourhood of x iff geodesic(x, y) ≤ ε₁ AND d_temporal(h_x, h_y) ≤ ε₂.

Parameters:

H (ndarray, shape (N, T))
adjacency (sparse, shape (N, N))
eps_spatial (float or None) – Geodesic distance threshold. Auto-set from median if None.
eps_temporal (float or None) – Temporal distance threshold. Auto-set from median if None.
min_pts (int)
temporal_metric (str)
gamma_dtw (float)
backend (str)

Returns:

ClusterResult

Return type:

ClusterResult
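
The conjunctive neighbourhood can be written as the element-wise AND of two thresholded distance matrices. The sketch below assumes graph shortest-path distance as the geodesic, Euclidean distance between profiles as the temporal metric, and counts a vertex as its own neighbour when testing the core condition; the function names are hypothetical.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import dijkstra

def conjunctive_neighbourhoods(H, adjacency, eps_spatial, eps_temporal):
    """Boolean (N, N) mask: geodesic <= eps_spatial AND profile distance <= eps_temporal."""
    G = dijkstra(sp.csr_matrix(adjacency), directed=False)
    diff = H[:, None, :] - H[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    return (G <= eps_spatial) & (D <= eps_temporal)

def core_mask(neigh, min_pts):
    """Vertices whose neighbourhood (including themselves) has at least min_pts members."""
    return neigh.sum(axis=1) >= min_pts

A = np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)     # 4-vertex path graph
H = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]])
neigh = conjunctive_neighbourhoods(H, A, eps_spatial=1.0, eps_temporal=1.0)
core = core_mask(neigh, min_pts=2)
```

Vertices 1 and 2 are mesh neighbours but have very different profiles, so they do not fall into each other's neighbourhood; this is what keeps clusters both spatially contiguous and descriptor-homogeneous.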

spectralbrain.statistics.clustering.cluster_spectral_coclustering(H, *, n_clusters=6, adjacency=None, laplacian_smoothing=5.0)[source]#

Spectral co-clustering of the vertex × time/energy matrix.

Simultaneously clusters rows (vertices) and columns (scales), revealing which spatial regions share which scale bands.

Parameters:

H (ndarray, shape (N, T)) – Non-negative descriptor matrix.
n_clusters (int) – Number of co-clusters.
adjacency (sparse or None) – If provided, applies Laplacian smoothing post-hoc.
laplacian_smoothing (float) – Tikhonov regularisation weight μ for spatial coherence.

Returns:

ClusterResult – With column_labels (scale clustering) in metadata.

Return type:

ClusterResult

spectralbrain.statistics.clustering.cluster_temporal_dtw(H, *, n_clusters=6, metric='softdtw', gamma=0.1, random_state=42)[source]#

Time-series k-means with DTW on HKS/WKS profiles.

Parameters:

H (ndarray, shape (N, T)) – Descriptor profiles.
n_clusters (int)
metric (str) – "dtw", "softdtw", or "euclidean".
gamma (float) – Smoothing for soft-DTW (smaller = sharper).
random_state (int)

Returns:

ClusterResult – With barycenters in metadata.

Return type:

ClusterResult

spectralbrain.statistics.clustering.cluster_temporal_fpca(H, *, n_components=6, n_clusters=6, clusterer='kmeans', random_state=42)[source]#

Functional PCA on HKS time-profiles, then cluster fPC scores.

Treats each vertex’s HKS(x, ·) as a function of log(t) and performs fPCA to extract the dominant modes of variation, then clusters in the score space.

Parameters:

H (ndarray, shape (N, T)) – Per-vertex HKS / WKS profiles across T scales.
n_components (int) – Number of functional principal components.
n_clusters (int) – For k-means; ignored if clusterer="hdbscan".
clusterer (str) – "kmeans" or "hdbscan".
random_state (int)

Returns:

ClusterResult – With fpc_scores, explained_variance_ratio in metadata.

Return type:

ClusterResult
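
Example (illustrative sketch)

When the profiles are sampled on a grid that is uniform in log(t), a discretised fPCA reduces to ordinary PCA of the centred profile matrix. The sketch below makes that assumption and uses a plain SVD. fpca_scores is a hypothetical helper, and no basis smoothing or quadrature weighting is applied, so it may differ from what cluster_temporal_fpca does internally.

```python
import numpy as np

def fpca_scores(H, n_components):
    """Discretised fPCA: PCA of centred profiles sampled on a uniform grid."""
    H = np.asarray(H, dtype=float)
    centred = H - H.mean(axis=0)  # remove the mean profile
    U, s, Vt = np.linalg.svd(centred, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # per-vertex fPC scores
    explained = s**2 / np.sum(s**2)                  # variance ratio per mode
    return scores, Vt[:n_components], explained[:n_components]

rng = np.random.default_rng(0)
t = np.logspace(-1, 1, 16)  # 16 scales, uniform in log(t)
slow = np.exp(-0.5 * t) + 0.01 * rng.standard_normal((20, t.size))
fast = np.exp(-2.0 * t) + 0.01 * rng.standard_normal((20, t.size))
H = np.vstack([slow, fast])  # 40 vertices from two decay regimes
scores, components, explained = fpca_scores(H, n_components=3)
```

In this toy example the first score alone separates the two decay regimes, so any clusterer run on scores (k-means, HDBSCAN) would operate in a much lower-dimensional space than the raw T-dimensional profiles.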

spectralbrain.statistics.clustering.cluster_tensor_decomposition(tensor, *, n_components=8, adjacency=None, lam_spatial=0.0, method='cp', n_iter_max=200, tol=1e-06, random_state=42, backend='auto')[source]#

Non-negative CP/PARAFAC or Tucker on (vertices × scales × subjects).

For a cohort of S subjects with vertex-corresponded meshes (e.g. via HippUnfold), the HKS data forms a 3-way tensor ℋ ∈ ℝ^{N × T × S}. CP decomposes it as

\[\mathcal{H}_{ijk} \approx \sum_{r=1}^R w_r(i) \cdot f_r(j) \cdot s_r(k)\]

yielding shared spatial atoms w_r (which define a parcellation), population-level temporal profiles f_r, and per-subject loadings s_r.

Parameters:

tensor (ndarray, shape (N, T, S)) – Non-negative tensor (HKS across subjects).
n_components (int) – CP rank R (≈ expected number of parcels).
adjacency (sparse or None) – Mesh Laplacian for graph-regularised variant.
lam_spatial (float) – Weight for the Laplacian penalty on spatial factors. 0.0 disables spatial regularisation.
method (str) – "cp" for CP/PARAFAC, "tucker" for Tucker.
n_iter_max (int)
tol (float)
random_state (int)
backend (str) – "gpu" uses tensorly with PyTorch backend.

Returns:

TensorDecompositionResult

Return type:

TensorDecompositionResult
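
Example (illustrative sketch)

The CP model above can be checked numerically by rebuilding a tensor from its factors. The sketch below assumes the factors are stored as matrices W (N × R), F (T × R) and S (S × R). The helper names are hypothetical, and the argmax rule for turning spatial atoms into a parcellation is one common choice that the source does not confirm.

```python
import numpy as np

def cp_reconstruct(W, F, S):
    """Rebuild the (N, T, S) tensor from rank-R CP factors."""
    # H[i, j, k] = sum_r W[i, r] * F[j, r] * S[k, r]
    return np.einsum("ir,jr,kr->ijk", W, F, S)

def parcellate(W):
    """Hard parcellation: each vertex joins its strongest spatial atom."""
    return np.argmax(W, axis=1)

rng = np.random.default_rng(0)
N, T, n_subjects, R = 6, 4, 3, 2
W = rng.random((N, R))           # spatial atoms w_r
F = rng.random((T, R))           # temporal profiles f_r
S = rng.random((n_subjects, R))  # per-subject loadings s_r
tensor = cp_reconstruct(W, F, S)
labels = parcellate(W)
```

Because all three factors are non-negative, the reconstructed tensor is non-negative too, which matches the input requirement stated for tensor.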

References

Kolda TG, Bader BW. Tensor decompositions and applications. SIAM Review 51(3):455–500, 2009.

spectralbrain.statistics.clustering.cluster_vineyards(H, adjacency, *, n_scales=None, min_persistence_frac=0.1, min_life_frac=0.3, backend='manual')[source]#

Track persistence diagram points across HKS time-scales.

For each column t_j of H (a fixed HKS scale), computes the H₀ sub-level-set persistence diagram of HKS(·, t_j) on the mesh. Then links diagram points across consecutive scales by nearest-neighbour matching in (birth, death) space, producing continuous “vines”: trajectories of topological features through scale.

Features that persist across a large fraction of the t-range correspond to anatomically stable sub-regions; features that appear or disappear at specific scales reveal scale-dependent structural boundaries.

Parameters:

H (ndarray, shape (N, T)) – Per-vertex descriptor matrix (HKS at T time-scales).
adjacency (sparse, shape (N, N)) – Mesh adjacency.
n_scales (int or None) – Sub-sample to this many scales (for speed). None = use all T.
min_persistence_frac (float) – Minimum persistence as fraction of the function range to retain a feature in the diagram.
min_life_frac (float) – A vine must span at least this fraction of total scales to be considered salient.
backend (str) – "dionysus" uses the dionysus2 library (faster, exact). "manual" uses built-in union-find (no extra dependency).

Returns:

VineyardResult – With vines, diagrams per scale, salient features, and scale-of-emergence per feature.

Return type:

VineyardResult
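
Example (illustrative sketch)

The per-scale step (an H₀ sub-level-set persistence diagram computed with union-find) can be sketched in pure Python as follows. This is only an illustration of the technique: sublevel_persistence_h0 is a hypothetical name, the graph is passed as an edge list instead of a sparse adjacency matrix, and the cross-scale vine matching is not shown.

```python
def sublevel_persistence_h0(values, edges):
    """H0 persistence pairs (birth, death) of a scalar function on a graph."""
    n = len(values)
    neighbours = [[] for _ in range(n)]
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    parent = list(range(n))
    birth = list(values)  # birth value of the component rooted at each vertex
    added = [False] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    pairs = []
    for v in sorted(range(n), key=lambda i: values[i]):  # sweep upwards
        added[v] = True
        for u in neighbours[v]:
            if not added[u]:
                continue
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            # Elder rule: the component born later dies at this merge.
            young, old = (ru, rv) if birth[ru] > birth[rv] else (rv, ru)
            if values[v] > birth[young]:  # skip zero-persistence pairs
                pairs.append((birth[young], values[v]))
            parent[young] = old
    return pairs

# Path graph 0-1-2-3-4 with three local minima (vertices 0, 2 and 4).
values = [0.0, 2.0, 1.0, 3.0, 0.5]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
diagram = sublevel_persistence_h0(values, edges)
```

The component born at the global minimum never dies and is therefore not reported. A threshold such as min_persistence_frac would then discard pairs whose death minus birth is small relative to the function range.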

References

Cohen-Steiner D, Edelsbrunner H, Morozov D. Vines and vineyards by updating persistence in linear time. Proc. 22nd ACM Symp. Computational Geometry, 119–126, 2006.

spectralbrain.statistics.clustering.cluster_wavelet_coefficients(H, adjacency, *, n_scales=5, n_clusters=6, wavelet_type='mexican_hat', n_eigenvectors=100, clusterer='kmeans', random_state=42, backend='auto')[source]#

Cluster vertices by spectral graph wavelet energy profiles.

Decomposes HKS into band-pass components using spectral graph wavelets (Hammond, Vandergheynst & Gribonval, 2011), computes the energy in each band at each vertex, and clusters vertices by their multi-band energy signature.

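Unlike raw HKS (which is a low-pass filter at each t), the wavelet decomposition applies band-pass filters that are localised in frequency, so features at different scales leak into each other far less. The wavelet kernels form a frame rather than an orthogonal basis, so some overlap between adjacent bands remains.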

Parameters:

H (ndarray, shape (N, T)) – Descriptor matrix.
adjacency (sparse, shape (N, N)) – Mesh Laplacian.
n_scales (int) – Number of wavelet scales (frequency bands).
n_clusters (int) – For k-means.
wavelet_type (str) – "mexican_hat" (g(x) = x·exp(-x)), "heat" (exp(-x)), "meyer" (smooth band-pass).
n_eigenvectors (int) – Laplacian eigenvectors for spectral acceleration.
clusterer (str)
random_state (int)
backend (str)

Returns:

ClusterResult – With wavelet_energy matrix in metadata.

Return type:

ClusterResult
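
Example (illustrative sketch)

The per-vertex band energies can be sketched with a dense eigendecomposition and the "mexican_hat" kernel g(x) = x·exp(-x) listed above. The sketch assumes a small dense Laplacian L and explicit scale values. wavelet_energy is a hypothetical helper, and the full eigendecomposition stands in for the truncated n_eigenvectors approximation.

```python
import numpy as np

def wavelet_energy(L, H, scales):
    """Energy of band-pass filtered descriptors per vertex and scale."""
    lam, U = np.linalg.eigh(L)  # graph frequencies and Fourier basis
    H_hat = U.T @ H             # graph Fourier transform of each column
    energy = np.empty((L.shape[0], len(scales)))
    for j, s in enumerate(scales):
        g = (s * lam) * np.exp(-s * lam)   # kernel g(s * lambda)
        coeffs = U @ (g[:, None] * H_hat)  # wavelet coefficients, shape (N, T)
        energy[:, j] = np.sum(coeffs**2, axis=1)
    return energy

# Combinatorial Laplacian of an 8-vertex path graph.
n = 8
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
L = np.diag(A.sum(axis=1)) - A
rng = np.random.default_rng(0)
H = rng.random((n, 4))
scales = [0.5, 1.0, 2.0]
energy = wavelet_energy(L, H, scales)
energy_const = wavelet_energy(L, np.ones((n, 4)), scales)
```

Because g(0) = 0, a spatially constant descriptor has essentially zero energy in every band. The resulting (N, n_scales) matrix is what a clusterer would then consume.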

References

Hammond DK, Vandergheynst P, Gribonval R. Wavelets on graphs via spectral graph theory. ACHA 30(2):129–150, 2011.

spectralbrain.statistics.clustering.confirm_clusters_bayesian(H, labels, *, adjacency=None, mrf_beta=1.0, n_samples=2000, n_tune=1000, dim_reduction=8, random_state=42)[source]#

Bayesian confirmation of cluster assignments.

Fits a Bayesian Gaussian mixture model with cluster-specific priors informed by the input labels, plus an optional Potts MRF spatial prior from the mesh adjacency. Compares MAP assignments to input labels and computes model quality (WAIC, LOO).

This answers the question: “Are these clusters statistically credible given the data and the spatial structure?”

Parameters:

H (ndarray, shape (N, T)) – Descriptor matrix.
labels (ndarray, shape (N,)) – Input cluster labels to confirm.
adjacency (sparse or None) – Mesh adjacency for MRF prior.
mrf_beta (float) – Potts coupling strength.
n_samples (int) – MCMC posterior samples.
n_tune (int) – MCMC tuning samples.
dim_reduction (int) – Reduce to this many dimensions before modelling.
random_state (int)

Returns:

BayesianClusterConfirmation

Return type:

BayesianClusterConfirmation
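
Example (illustrative sketch)

The role of mrf_beta can be illustrated with the unnormalised log-prior of a Potts model, which rewards neighbouring vertices that share a label. The helper below is hypothetical and takes an edge list in place of a sparse adjacency matrix. It shows only the spatial prior term, not the mixture likelihood or the MCMC sampling.

```python
import numpy as np

def potts_log_prior(labels, edges, beta):
    """Unnormalised Potts log-prior: beta times the number of agreeing edges."""
    labels = np.asarray(labels)
    edges = np.asarray(edges)
    agree = labels[edges[:, 0]] == labels[edges[:, 1]]
    return float(beta * np.count_nonzero(agree))

# Four vertices on a path: 0-1-2-3.
edges = [(0, 1), (1, 2), (2, 3)]
smooth = potts_log_prior([0, 0, 1, 1], edges, beta=1.0)  # two agreeing edges
noisy = potts_log_prior([0, 1, 0, 1], edges, beta=1.0)   # no agreeing edges
```

A larger coupling strength widens the gap between spatially coherent and fragmented labelings, so the posterior favours contiguous clusters more strongly.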

spectralbrain.statistics.clustering.denoise_joint_timevertex(H, adjacency, *, alpha_graph=1.0, beta_time=1.0, n_eigenvectors=50)[source]#

Joint time-vertex low-pass filtering of a descriptor matrix.

Applies a separable filter in the graph-spectral and temporal-spectral domains simultaneously:

\[g(\lambda, \omega) = \exp(-\alpha \lambda - \beta \omega^2)\]

This smooths H in both mesh-space (removing high-frequency geometric noise) and time-space (removing scale-to-scale oscillations), producing a cleaner descriptor for downstream clustering.

Parameters:

H (ndarray, shape (N, T)) – Descriptor matrix.
adjacency (sparse, shape (N, N)) – Mesh Laplacian (should be positive semi-definite).
alpha_graph (float) – Graph-spectral smoothing strength.
beta_time (float) – Temporal smoothing strength.
n_eigenvectors (int) – Number of graph Laplacian eigenvectors to use.

Returns:

ndarray, shape (N, T) – Filtered descriptor matrix.

Return type:

NDArray[floating]
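
Example (illustrative sketch)

The separable filter g(λ, ω) can be sketched by combining a graph Fourier transform over vertices with a discrete Fourier transform over scales. The sketch rests on several assumptions not confirmed by the source: ω is taken as the DFT angular frequency (which implies a periodic extension along the scale axis), all eigenvectors of a small dense Laplacian are used instead of the leading n_eigenvectors, and joint_lowpass is a hypothetical name.

```python
import numpy as np

def joint_lowpass(H, L, alpha=1.0, beta=1.0):
    """Apply g(lam, omega) = exp(-alpha*lam - beta*omega**2) to H."""
    lam, U = np.linalg.eigh(L)  # graph frequencies and Fourier basis
    omega = 2.0 * np.pi * np.fft.fftfreq(H.shape[1])  # angular frequencies
    H_hat = np.fft.fft(U.T @ H, axis=1)  # GFT over vertices, DFT over scales
    g = np.exp(-alpha * lam[:, None] - beta * omega[None, :] ** 2)
    # g is real and even in omega, so the inverse DFT is real up to rounding.
    return U @ np.real(np.fft.ifft(g * H_hat, axis=1))

# Combinatorial Laplacian of an 8-vertex path graph.
n, T = 8, 6
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
L = np.diag(A.sum(axis=1)) - A
rng = np.random.default_rng(0)
H = rng.random((n, T))
H_smooth = joint_lowpass(H, L, alpha=1.0, beta=1.0)
H_const = joint_lowpass(np.ones((n, T)), L)
```

A matrix that is constant in both vertex and scale sits at λ = 0, ω = 0 and passes through unchanged, while every other joint frequency is attenuated.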

References

Grassi F, Loukas A, Perraudin N, Ricaud B. A time-vertex signal processing framework. IEEE Trans. Signal Processing 66(3):817–829, 2018.

spectralbrain.statistics.clustering.find2(parent, x)[source]#

Path-compressed find for union-find.
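
Example (illustrative sketch)

A path-compressed find can be written in a few lines. The version below assumes parent is a mutable list in which roots point to themselves. That layout is an assumption and find2 may use a different one. find_root is a hypothetical stand-in, not the library function.

```python
def find_root(parent, x):
    """Return the root of x and point every visited node directly at it."""
    root = x
    while parent[root] != root:  # first pass: locate the root
        root = parent[root]
    while parent[x] != root:     # second pass: compress the path
        parent[x], x = root, parent[x]
    return root

# Chain 3 -> 2 -> 1 -> 0, where 0 is the root.
parent = [0, 0, 1, 2]
root = find_root(parent, 3)
```

After the call, every node on the traversed path points straight at the root, so later lookups on the same chain take a single step.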

spectralbrain.statistics.clustering.fuse_concatenate(hks, wks, *, log_transform=True, normalize='l1', weight_hks=1.0, weight_wks=1.0)[source]#

Simple weighted concatenation of HKS and WKS.

Parameters:

hks (ndarray, shape (N, T_h))
wks (ndarray, shape (N, T_w))
log_transform (bool)
normalize (str)
weight_hks (float)
weight_wks (float)

Returns:

FusionResult

Return type:

FusionResult
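
Example (illustrative sketch)

One plausible reading of the parameters is sketched below: log-transform each descriptor, L1-normalise each row, scale by the weight, and concatenate. The order of these steps, the use of a plain natural log, and the name concat_fuse are all assumptions. The actual fuse_concatenate may differ and returns a FusionResult, not a bare array.

```python
import numpy as np

def concat_fuse(hks, wks, weight_hks=1.0, weight_wks=1.0, log_transform=True):
    """Weighted concatenation of two row-normalised descriptor blocks."""
    def prepare(X, weight):
        X = np.asarray(X, dtype=float)
        if log_transform:
            X = np.log(X)  # assumes strictly positive descriptor values
        norms = np.abs(X).sum(axis=1, keepdims=True)  # per-vertex L1 norm
        return weight * X / norms
    return np.hstack([prepare(hks, weight_hks), prepare(wks, weight_wks)])

rng = np.random.default_rng(0)
hks = rng.random((5, 3)) + 0.5  # 5 vertices, 3 HKS scales
wks = rng.random((5, 4)) + 0.5  # 5 vertices, 4 WKS energies
fused = concat_fuse(hks, wks, weight_hks=2.0, weight_wks=1.0)
```

After normalisation each block contributes an L1 mass per vertex equal to its weight, so the ratio weight_hks / weight_wks directly sets how much each descriptor influences distances computed on the fused matrix.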

spectralbrain.statistics.clustering.fuse_joint_nmf(hks, wks, *, n_components=16, random_state=42)[source]#

Joint NMF on concatenated [HKS | WKS] for shared basis.

Learns a shared spatial factor W and separate temporal factors F_hks, F_wks such that [HKS | WKS] ≈ W · [F_hks | F_wks].

Parameters:

hks (ndarray, shape (N, T_h))
wks (ndarray, shape (N, T_w))
n_components (int)
random_state (int)

Returns:

FusionResult – fused is the W matrix (shared spatial loadings).

Return type:

FusionResult
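
Example (illustrative sketch)

The factorisation [HKS | WKS] ≈ W · [F_hks | F_wks] can be sketched with Lee-Seung multiplicative updates for the Frobenius objective. This solver is swapped in for illustration and is not necessarily the one used by fuse_joint_nmf. The name joint_nmf and the n_iter, seed and eps parameters are hypothetical.

```python
import numpy as np

def joint_nmf(hks, wks, n_components, n_iter=200, seed=0, eps=1e-12):
    """Shared spatial factor W and per-descriptor temporal factors."""
    X = np.hstack([hks, wks])  # (N, T_h + T_w), must be non-negative
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], n_components))
    F = rng.random((n_components, X.shape[1]))
    for _ in range(n_iter):
        # Multiplicative updates keep both factors non-negative.
        F *= (W.T @ X) / (W.T @ W @ F + eps)
        W *= (X @ F.T) / (W @ F @ F.T + eps)
    T_h = hks.shape[1]
    return W, F[:, :T_h], F[:, T_h:]

# Synthetic descriptors that share an exact rank-2 spatial structure.
rng = np.random.default_rng(1)
W_true = rng.random((10, 2))
hks = W_true @ rng.random((2, 4))
wks = W_true @ rng.random((2, 3))
W, F_hks, F_wks = joint_nmf(hks, wks, n_components=2)
X = np.hstack([hks, wks])
err = np.linalg.norm(X - W @ np.hstack([F_hks, F_wks]))
```

Because W is shared across both blocks, each column of W describes a spatial pattern that must explain HKS and WKS at once, which is what makes it usable as a fused representation.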

spectralbrain.statistics.clustering.fuse_multi_kernel(hks, wks, *, n_kernels_per_desc=5, sigma_range=(0.1, 10.0))[source]#

Multi-kernel fusion: build a combined kernel from HKS and WKS.

Computes K = Σ_j α_j K^hks_j + Σ_k β_k K^wks_k with uniform weights (SimpleMKL optimisation is available as an extension).

Parameters:

hks (ndarray, shape (N, T_h))
wks (ndarray, shape (N, T_w))
n_kernels_per_desc (int) – Number of bandwidth samples per descriptor.
sigma_range (tuple) – (min, max) bandwidth range.

Returns:

FusionResult – fused is the combined kernel matrix K, shape (N, N).

Return type:

FusionResult
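
Example (illustrative sketch)

The uniform-weight kernel combination can be sketched with Gaussian (RBF) kernels evaluated at several bandwidths. The kernel family, the log-spaced bandwidth grid, the averaging (equal weights that sum to one) and the helper names are assumptions made for illustration. The source specifies only that the weights are uniform.

```python
import numpy as np

def rbf_kernels(X, sigmas):
    """One Gaussian kernel matrix per bandwidth."""
    sq = np.sum(X**2, axis=1)
    # Squared Euclidean distances, clipped against rounding below zero.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return [np.exp(-d2 / (2.0 * s**2)) for s in sigmas]

def fuse_uniform(hks, wks, n_kernels_per_desc=5, sigma_range=(0.1, 10.0)):
    """Average of all HKS and WKS kernels (uniform weights)."""
    sigmas = np.geomspace(sigma_range[0], sigma_range[1], n_kernels_per_desc)
    kernels = rbf_kernels(hks, sigmas) + rbf_kernels(wks, sigmas)
    return sum(kernels) / len(kernels)

rng = np.random.default_rng(0)
hks = rng.random((6, 3))
wks = rng.random((6, 4))
K = fuse_uniform(hks, wks)
```

The result is a dense (N, N) matrix, so memory grows quadratically with the number of vertices. A convex combination of positive semi-definite kernels is itself positive semi-definite, so K can be passed to any kernel method.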