spectralbrain.statistics.eda

Exploratory data analysis and quality control for spectral morphometry.

Five diagnostic blocks plus the descriptor recommendation engine:

Spectral QC — validate eigendecomposition quality.
Optimal k — how many eigenpairs are enough?
Descriptor profiling — summary statistics, normality, outliers.
Reliability — ICC test-retest, batch-effect detection.
Report — integrated markdown/Rich output.
recommend_descriptor() — surrogate-based descriptor selection.

Functions

`batch_effect_scan`(descriptors, site_labels, *)	Scan for batch/site effects in spectral descriptors.
`compute_icc`(test, retest, *[, icc_type])	Intraclass Correlation Coefficient for test-retest.
`descriptor_correlation`(descriptors, *[, method])	Correlation matrix between descriptors (redundancy check).
`descriptor_profile`(descriptors, *[, ...])	Summary statistics for each descriptor.
`eigenvalue_stability`(decomps, *[, n_eigenvalues])	Cross-subject eigenvalue stability analysis.
`optimal_k`(eigenvalues, *[, energy_thresholds])	Determine optimal number of eigenpairs.
`recommend_descriptor`(points[, labels, ...])	Recommend the best spectral descriptor for an analysis objective.
`spectral_qc`(decomp, *[, lambda_0_tol, ...])	Run quality-control diagnostics on a spectral decomposition.

Classes

`DescriptorRecommendation`(recommended, ...)	Output of `recommend_descriptor()`.
`OptimalKResult`([k_elbow, k_energy_95, ...])	Recommended number of eigenpairs by multiple criteria.
`SpectralQCReport`([n_vertices, ...])	Quality-control diagnostics for a spectral decomposition.

class spectralbrain.statistics.eda.DescriptorRecommendation(recommended, objective, ranking, surrogate_details)

Bases: object

Output of recommend_descriptor().

Parameters:

recommended (str)
objective (str)
ranking (list[dict[str, Any]])
surrogate_details (dict[str, Any])

recommended

Name of the top-ranked descriptor.

Type: str

objective

The analysis objective used.

Type: str

ranking

Top descriptors with scores and metrics.

Type: list of dict

surrogate_details

Information about the surrogates generated.

Type: dict

class spectralbrain.statistics.eda.OptimalKResult(k_elbow=0, k_energy_95=0, k_energy_99=0, k_gap=0, k_recommended=0, eigenvalues=None, cumulative_energy=None)

Bases: object

Recommended number of eigenpairs by multiple criteria.

Parameters:

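k_elbow (int) – k from the elbow criterion (maximum curvature of log(λ) vs index).
k_energy_95 (int) – k at which the cumulative energy exceeds the 0.95 threshold.
k_energy_99 (int) – k at which the cumulative energy exceeds the 0.99 threshold.
k_gap (int) – k from the max-gap criterion (largest relative gap between consecutive λ).
k_recommended (int) – Overall recommended number of eigenpairs.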
eigenvalues (ndarray | None)
cumulative_energy (ndarray | None)

cumulative_energy: ndarray | None = None

eigenvalues: ndarray | None = None

k_elbow: int = 0

k_energy_95: int = 0

k_energy_99: int = 0

k_gap: int = 0

k_recommended: int = 0

class spectralbrain.statistics.eda.SpectralQCReport(n_vertices=0, n_eigenvalues=0, lambda_0=0.0, lambda_0_ok=True, fiedler_value=0.0, spectral_gap=0.0, eigenvalues_nonneg=True, n_negative_eigenvalues=0, max_negative_eigenvalue=0.0, orthonormality_error=0.0, orthonormality_ok=True, laplacian_row_sum_max=0.0, laplacian_row_sum_ok=True, near_degenerate_pairs=0, recommended_k=None, warnings=<factory>, passed=True)

Bases: object

Quality-control diagnostics for a spectral decomposition.

All fields are populated by spectral_qc().

Parameters:

n_vertices (int)
n_eigenvalues (int)
lambda_0 (float)
lambda_0_ok (bool)
fiedler_value (float)
spectral_gap (float)
eigenvalues_nonneg (bool)
n_negative_eigenvalues (int)
max_negative_eigenvalue (float)
orthonormality_error (float)
orthonormality_ok (bool)
laplacian_row_sum_max (float)
laplacian_row_sum_ok (bool)
near_degenerate_pairs (int)
recommended_k (int | None)
warnings (list[str])
passed (bool)

eigenvalues_nonneg: bool = True

fiedler_value: float = 0.0

lambda_0: float = 0.0

lambda_0_ok: bool = True

laplacian_row_sum_max: float = 0.0

laplacian_row_sum_ok: bool = True

max_negative_eigenvalue: float = 0.0

n_eigenvalues: int = 0

n_negative_eigenvalues: int = 0

n_vertices: int = 0

near_degenerate_pairs: int = 0

orthonormality_error: float = 0.0

orthonormality_ok: bool = True

passed: bool = True

recommended_k: int | None = None

spectral_gap: float = 0.0

warnings: list[str]

spectralbrain.statistics.eda.batch_effect_scan(descriptors, site_labels, *, alpha=0.05)

Scan for batch/site effects in spectral descriptors.

For each descriptor, tests whether distributions differ significantly across sites using Kruskal-Wallis.

Parameters:

descriptors (dict of {name: ndarray}) – Per-subject descriptor values.
site_labels (ndarray, shape (n_subjects,)) – Site/scanner labels.
alpha (float) – Significance threshold.

Returns:

dict of {name: {statistic, p_value, has_batch_effect, effect_size}}

Return type:

dict[str, dict[str, Any]]
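The per-descriptor test described above can be sketched with SciPy. This is an illustrative re-implementation, not the library's code: the function name `batch_scan_sketch`, the restriction to 1-D descriptor arrays, and the H-based eta-squared effect size are all assumptions.

```python
import numpy as np
from scipy.stats import kruskal


def batch_scan_sketch(descriptors, site_labels, alpha=0.05):
    """Hypothetical sketch: one Kruskal-Wallis test per 1-D descriptor."""
    site_labels = np.asarray(site_labels)
    sites = np.unique(site_labels)
    results = {}
    for name, values in descriptors.items():
        values = np.asarray(values, dtype=float)
        # One sample of descriptor values per site.
        groups = [values[site_labels == s] for s in sites]
        statistic, p_value = kruskal(*groups)
        # Eta-squared derived from H (assumed; the library may define it differently).
        n, g = values.size, len(sites)
        effect_size = max(0.0, (statistic - g + 1) / (n - g))
        results[name] = {
            "statistic": float(statistic),
            "p_value": float(p_value),
            "has_batch_effect": bool(p_value < alpha),
            "effect_size": float(effect_size),
        }
    return results


# Synthetic data: three sites with six subjects each.
site_labels = np.repeat(["A", "B", "C"], 6)
descriptors = {
    # Each site is offset from the previous one, so a site effect is present.
    "shifted": np.arange(18.0) + 10 * np.repeat([0, 1, 2], 6),
    # Values are interleaved across sites, so the distributions match.
    "balanced": np.concatenate([np.arange(s, 18.0, 3) for s in range(3)]),
}
scan = batch_scan_sketch(descriptors, site_labels)
```

In this toy data only `"shifted"` is flagged, because its values are fully separated by site.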

spectralbrain.statistics.eda.compute_icc(test, retest, *, icc_type='ICC3,1')

Intraclass Correlation Coefficient for test-retest.

Parameters:

test (ndarray, shape (N,) or (N, T)) – Descriptor values at time 1.
retest (ndarray, shape (N,) or (N, T)) – Descriptor values at time 2.
icc_type (str) – Either "ICC2,1" (two-way random, single measures) or "ICC3,1" (two-way mixed, single measures; recommended for neuroimaging).

Returns:

float – ICC value in [-1, 1]. >0.75 = excellent, 0.60–0.75 = good, 0.40–0.60 = fair, <0.40 = poor.

Return type:

float
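To make the two ICC variants concrete, the sketch below computes them from two-way ANOVA mean squares using the standard Shrout-Fleiss single-measure formulas. It is not the library's implementation: `icc_sketch` is a hypothetical name and only 1-D inputs of shape (N,) are handled.

```python
import numpy as np


def icc_sketch(session_1, session_2, icc_type="ICC3,1"):
    """Shrout-Fleiss single-measure ICC for two sessions (illustrative only)."""
    y = np.column_stack([session_1, session_2]).astype(float)  # subjects x sessions
    n, k = y.shape
    grand = y.mean()
    # Two-way ANOVA sums of squares: subjects (rows), sessions (columns), residual.
    ss_subjects = k * np.sum((y.mean(axis=1) - grand) ** 2)
    ss_sessions = n * np.sum((y.mean(axis=0) - grand) ** 2)
    ss_error = np.sum((y - grand) ** 2) - ss_subjects - ss_sessions
    ms_subjects = ss_subjects / (n - 1)
    ms_sessions = ss_sessions / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    if icc_type == "ICC3,1":
        # Consistency: a constant offset between sessions is not penalised.
        return (ms_subjects - ms_error) / (ms_subjects + (k - 1) * ms_error)
    if icc_type == "ICC2,1":
        # Absolute agreement: session-level shifts lower the coefficient.
        return (ms_subjects - ms_error) / (
            ms_subjects + (k - 1) * ms_error + k * (ms_sessions - ms_error) / n
        )
    raise ValueError(f"unsupported icc_type: {icc_type!r}")


session_1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
session_2 = session_1 + 1.0  # retest differs by a constant offset only
icc3 = icc_sketch(session_1, session_2, "ICC3,1")
icc2 = icc_sketch(session_1, session_2, "ICC2,1")
```

With a pure offset between sessions, the consistency form ("ICC3,1") stays at 1 while the agreement form ("ICC2,1") drops below it, which is the practical difference between the two options.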

spectralbrain.statistics.eda.descriptor_correlation(descriptors, *, method='pearson')

Correlation matrix between descriptors (redundancy check).

For multi-column descriptors, uses the mean across columns.

Parameters:

descriptors (dict of {name: ndarray})
method (str)

Returns:

corr_matrix (ndarray, shape (D, D))
names (list of str)

Return type:

tuple[ndarray, list[str]]
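A minimal NumPy sketch of the redundancy check, assuming Pearson correlation and that every descriptor has one row per subject. `correlation_sketch` is a hypothetical name; the library's handling of other `method` values is not shown.

```python
import numpy as np


def correlation_sketch(descriptors):
    """Pearson correlation between descriptors (illustrative only)."""
    names = list(descriptors)
    rows = []
    for name in names:
        arr = np.asarray(descriptors[name], dtype=float)
        # Multi-column descriptors are reduced to their mean across columns.
        rows.append(arr.mean(axis=1) if arr.ndim > 1 else arr)
    # np.corrcoef treats each row as one variable, giving a (D, D) matrix.
    corr_matrix = np.corrcoef(np.vstack(rows))
    return corr_matrix, names


a = np.array([1.0, 2.0, 3.0, 4.0])
corr_matrix, names = correlation_sketch({
    "a": a,
    "b": 2 * a + 1,  # linear in a, so fully redundant with it
    "c": np.array([[4, 4], [3, 3], [2, 2], [1, 1]]),  # two columns, decreasing
})
```

Off-diagonal entries close to ±1 indicate descriptors that carry nearly the same information.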

spectralbrain.statistics.eda.descriptor_profile(descriptors, *, normality_samples=500, seed=None)

Summary statistics for each descriptor.

Parameters:

descriptors (dict of {name: ndarray}) – Descriptor arrays (any shape — handles ScalarMap, DescriptorMatrix, GlobalDescriptor).
normality_samples (int) – Subsample size for Shapiro-Wilk test.
seed (int, optional)

Returns:

dict of {name: {stat: value}} – Keys per descriptor: mean, std, min, max, skew, kurtosis, q25, q50, q75, shapiro_p, n_outliers_3sigma, shape.

Return type:

dict[str, dict[str, Any]]
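Most of the listed keys can be reproduced with NumPy alone. The sketch below is an assumption about how such a profile might be computed, not the library's code: `profile_sketch` is a hypothetical name, skewness and excess kurtosis use population moments, and `shapiro_p` is omitted (it could be filled in with `scipy.stats.shapiro` on a subsample).

```python
import numpy as np


def profile_sketch(values):
    """Summary statistics for one descriptor array (illustrative only)."""
    arr = np.asarray(values, dtype=float)
    x = arr.ravel()
    mean, std = x.mean(), x.std()
    # z-scores; a constant array gets all-zero z-scores to avoid dividing by 0.
    z = (x - mean) / std if std > 0 else np.zeros_like(x)
    q25, q50, q75 = np.percentile(x, [25, 50, 75])
    return {
        "mean": float(mean),
        "std": float(std),
        "min": float(x.min()),
        "max": float(x.max()),
        "skew": float(np.mean(z**3)),
        "kurtosis": float(np.mean(z**4) - 3.0),  # excess kurtosis
        "q25": float(q25),
        "q50": float(q50),
        "q75": float(q75),
        "n_outliers_3sigma": int(np.sum(np.abs(z) > 3)),
        "shape": arr.shape,
    }


# Twenty regular values plus one extreme value.
profile = profile_sketch(list(range(20)) + [1000])
```

The single extreme value lies more than three standard deviations from the mean, so it is the only entry counted in `n_outliers_3sigma`.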

spectralbrain.statistics.eda.eigenvalue_stability(decomps, *, n_eigenvalues=None)

Cross-subject eigenvalue stability analysis.

Parameters:

decomps (list of SpectralDecomposition) – Decompositions from multiple subjects.
n_eigenvalues (int, optional) – Number of eigenvalues to compare.

Returns:

dict – Keys: "mean", "std", "cv" (coefficient of variation), "eigenvalue_matrix" (subjects × k).

Return type:

dict[str, ndarray]
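The returned quantities follow directly from the subjects × k eigenvalue matrix. The sketch below uses a small hand-written matrix in place of real decompositions; the sample standard deviation (`ddof=1`) and the NaN convention for a zero mean are assumptions, not documented behaviour.

```python
import numpy as np

# Hypothetical eigenvalues for 3 subjects x k=3 (first column is lambda_0 = 0).
eigenvalue_matrix = np.array([
    [0.0, 1.0, 2.0],
    [0.0, 1.2, 2.0],
    [0.0, 0.8, 2.0],
])

mean = eigenvalue_matrix.mean(axis=0)
std = eigenvalue_matrix.std(axis=0, ddof=1)
# Coefficient of variation; left as NaN where the mean is zero (e.g. lambda_0).
cv = np.divide(std, mean, out=np.full_like(std, np.nan), where=mean > 0)

stability = {"mean": mean, "std": std, "cv": cv, "eigenvalue_matrix": eigenvalue_matrix}
```

A low `cv` for a given index means that eigenvalue is stable across subjects; the λ₀ column has no meaningful CV because its mean is zero.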

spectralbrain.statistics.eda.optimal_k(eigenvalues, *, energy_thresholds=(0.95, 0.99))

Determine optimal number of eigenpairs.

Three criteria:

1. Elbow: maximum curvature of log(λ) vs index.
2. Energy: cumulative fraction Σᵢλᵢ / Σλ > threshold.
3. Max gap: largest relative gap between consecutive λ.

Parameters:

eigenvalues (ndarray) – Full eigenvalue sequence.
energy_thresholds (tuple of float) – Thresholds for cumulative energy (default 95% and 99%).

Returns:

OptimalKResult

Return type:

OptimalKResult
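The three criteria can be sketched in NumPy as follows. This is one plausible reading of the descriptions above, not the library's implementation: `sketch_optimal_k` is a hypothetical name, curvature is approximated by the second difference of log(λ), zero eigenvalues are skipped for the log and the relative gap, and each k is reported as a count of eigenpairs.

```python
import numpy as np


def sketch_optimal_k(eigenvalues, energy_thresholds=(0.95, 0.99)):
    """Three heuristics for choosing k from an ascending eigenvalue sequence."""
    lam = np.asarray(eigenvalues, dtype=float)

    # Energy: smallest k whose cumulative share of the total exceeds the threshold.
    cumulative = np.cumsum(lam) / lam.sum()
    k_energy = {t: int(np.argmax(cumulative > t)) + 1 for t in energy_thresholds}

    # Only positive eigenvalues can enter log(lambda) and the relative gap.
    pos = np.flatnonzero(lam > 0)

    # Elbow: largest absolute second difference of log(lambda).
    curvature = np.abs(np.diff(np.log(lam[pos]), n=2))
    k_elbow = int(pos[np.argmax(curvature) + 1]) + 1

    # Max gap: keep every eigenpair before the largest relative jump.
    rel_gap = np.diff(lam[pos]) / lam[pos][:-1]
    k_gap = int(pos[np.argmax(rel_gap)]) + 1

    return {
        "k_elbow": k_elbow,
        "k_energy": k_energy,
        "k_gap": k_gap,
        "cumulative_energy": cumulative,
    }


# Toy spectrum with a clear jump between the 4th and 5th eigenvalue.
result = sketch_optimal_k([0.0, 1.0, 1.1, 1.2, 5.0, 5.1, 5.2])
```

On this toy spectrum the max-gap criterion keeps the four eigenpairs before the jump, while the energy criterion needs all seven because the large eigenvalues dominate the sum. The criteria can disagree, which is why the result object reports each one separately.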

spectralbrain.statistics.eda.recommend_descriptor(points, labels=None, objective='group_discrimination', *, n_surrogates=30, k_eigenpairs=30, n_jobs=1, seed=42)

Recommend the best spectral descriptor for an analysis objective.

Generates synthetic surrogates with controlled deformations, computes all eligible descriptors, evaluates each descriptor’s discriminative power, and ranks by consensus.

Parameters:

points (ndarray, shape (N, 3)) – Representative geometry (e.g. mean mesh vertices, or one subject’s point cloud).
labels (ndarray, optional) – Not used by the surrogate engine (surrogates generate their own labels). Reserved for future data-driven evaluation.
objective (str or AnalysisObjective) – Analysis goal. Determines eligible descriptors and surrogate deformation type.
n_surrogates (int) – Number of synthetic shapes to generate.
k_eigenpairs (int) – Eigenpairs per surrogate decomposition.
n_jobs (int) – Number of parallel workers for surrogate decomposition. 1 = sequential (default), -1 = all cores. Requires joblib for any value other than 1.
seed (int, optional) – RNG seed for reproducibility.

Returns:

DescriptorRecommendation – Contains .recommended, .ranking (top descriptors with AUC, accuracy, effect size), and .surrogate_details.

Return type:

DescriptorRecommendation

Notes

This function is computationally heavy (30 surrogates × k eigenpairs × all descriptors by default). For large meshes, consider using n_jobs=-1 to parallelise the surrogate decomposition across CPU cores.

Examples

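>>> import spectralbrain as sb  # alias assumed from the sb. prefix used below
>>> rec = sb.statistics.recommend_descriptor(
...     mesh.vertices,  # mesh: a previously loaded surface mesh
...     objective="group_discrimination",
...     n_jobs=-1,
... )
>>> print(rec.recommended)
wks
>>> print(rec.ranking[:3])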

spectralbrain.statistics.eda.spectral_qc(decomp, *, lambda_0_tol=0.0001, ortho_tol=0.001, row_sum_tol=0.01, degeneracy_tol=1e-06)

Run quality-control diagnostics on a spectral decomposition.

Parameters:

decomp (SpectralDecomposition)
lambda_0_tol (float) – Tolerance for λ₀ ≈ 0.
ortho_tol (float) – Tolerance for M-orthonormality of eigenvectors.
row_sum_tol (float) – Tolerance for Laplacian row-sum ≈ 0.
degeneracy_tol (float) – Relative gap below which eigenvalue pairs are flagged as near-degenerate.

Returns:

SpectralQCReport

Return type:

SpectralQCReport
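The checks behind each tolerance can be illustrated on a toy problem. The sketch below does not use `SpectralDecomposition`; it builds a path-graph Laplacian with a diagonal mass matrix as a stand-in and applies the default tolerances from the signature. The exact definitions used by the library (for example, how the relative gap is normalised) are assumptions here.

```python
import numpy as np

# Stand-in operator: path-graph Laplacian on 6 nodes (assumption for the demo).
n = 6
adjacency = np.diag(np.ones(n - 1), 1)
adjacency = adjacency + adjacency.T
laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
mass = np.linspace(1.0, 2.0, n)  # entries of a diagonal (lumped) mass matrix

# Solve L phi = lambda M phi by symmetrising with M^(-1/2).
scale = 1.0 / np.sqrt(mass)
evals, sym_vecs = np.linalg.eigh(scale[:, None] * laplacian * scale[None, :])
evecs = scale[:, None] * sym_vecs

# lambda_0_tol: the first eigenvalue should be numerically zero.
lambda_0_ok = bool(abs(evals[0]) < 1e-4)
n_negative_eigenvalues = int(np.sum(evals < -1e-4))

# ortho_tol: eigenvectors should satisfy Phi^T M Phi = I.
gram = evecs.T @ (mass[:, None] * evecs)
orthonormality_error = float(np.max(np.abs(gram - np.eye(n))))
orthonormality_ok = orthonormality_error < 1e-3

# row_sum_tol: every Laplacian row should sum to zero.
laplacian_row_sum_max = float(np.max(np.abs(laplacian.sum(axis=1))))
laplacian_row_sum_ok = laplacian_row_sum_max < 1e-2

# degeneracy_tol: count consecutive non-zero eigenvalues with a tiny relative gap.
rel_gaps = np.diff(evals[1:]) / evals[1:-1]
near_degenerate_pairs = int(np.sum(rel_gaps < 1e-6))

passed = lambda_0_ok and orthonormality_ok and laplacian_row_sum_ok
```

A connected graph with distinct eigenvalues passes every check; a disconnected mesh would show up as additional near-zero eigenvalues, and a badly assembled operator as a non-zero row sum.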