spectralbrain.statistics.eda

Exploratory data analysis and quality control for spectral morphometry.

Five diagnostic blocks plus the descriptor recommendation engine:

  1. Spectral QC — validate eigendecomposition quality.

  2. Optimal k — how many eigenpairs are enough?

  3. Descriptor profiling — summary statistics, normality, outliers.

  4. Reliability — ICC test-retest, batch-effect detection.

  5. Report — integrated markdown/Rich output.

  6. recommend_descriptor() — surrogate-based descriptor selection.

Functions

batch_effect_scan(descriptors, site_labels, *)

Scan for batch/site effects in spectral descriptors.

compute_icc(test, retest, *[, icc_type])

Intraclass Correlation Coefficient for test-retest.

descriptor_correlation(descriptors, *[, method])

Correlation matrix between descriptors (redundancy check).

descriptor_profile(descriptors, *[, ...])

Summary statistics for each descriptor.

eigenvalue_stability(decomps, *[, n_eigenvalues])

Cross-subject eigenvalue stability analysis.

optimal_k(eigenvalues, *[, energy_thresholds])

Determine optimal number of eigenpairs.

recommend_descriptor(points[, labels, ...])

Recommend the best spectral descriptor for an analysis objective.

spectral_qc(decomp, *[, lambda_0_tol, ...])

Run quality-control diagnostics on a spectral decomposition.

Classes

DescriptorRecommendation(recommended, ...)

Output of recommend_descriptor().

OptimalKResult([k_elbow, k_energy_95, ...])

Recommended number of eigenpairs by multiple criteria.

SpectralQCReport([n_vertices, ...])

Quality-control diagnostics for a spectral decomposition.

class spectralbrain.statistics.eda.DescriptorRecommendation(recommended, objective, ranking, surrogate_details)

Bases: object

Output of recommend_descriptor().

Parameters:
recommended

Name of the top-ranked descriptor.

Type:

str

objective

The analysis objective used.

Type:

str

ranking

Top descriptors with scores and metrics.

Type:

list of dict

surrogate_details

Information about the surrogates generated.

Type:

dict

objective: str
ranking: list[dict[str, Any]]
recommended: str
surrogate_details: dict[str, Any]

class spectralbrain.statistics.eda.OptimalKResult(k_elbow=0, k_energy_95=0, k_energy_99=0, k_gap=0, k_recommended=0, eigenvalues=None, cumulative_energy=None)

Bases: object

Recommended number of eigenpairs by multiple criteria.

Parameters:
  • k_elbow (int)

  • k_energy_95 (int)

  • k_energy_99 (int)

  • k_gap (int)

  • k_recommended (int)

  • eigenvalues (ndarray | None)

  • cumulative_energy (ndarray | None)

cumulative_energy: ndarray | None = None
eigenvalues: ndarray | None = None
k_elbow: int = 0
k_energy_95: int = 0
k_energy_99: int = 0
k_gap: int = 0
k_recommended: int = 0

class spectralbrain.statistics.eda.SpectralQCReport(n_vertices=0, n_eigenvalues=0, lambda_0=0.0, lambda_0_ok=True, fiedler_value=0.0, spectral_gap=0.0, eigenvalues_nonneg=True, n_negative_eigenvalues=0, max_negative_eigenvalue=0.0, orthonormality_error=0.0, orthonormality_ok=True, laplacian_row_sum_max=0.0, laplacian_row_sum_ok=True, near_degenerate_pairs=0, recommended_k=None, warnings=<factory>, passed=True)

Bases: object

Quality-control diagnostics for a spectral decomposition.

All fields are populated by spectral_qc().

Parameters:
  • n_vertices (int)

  • n_eigenvalues (int)

  • lambda_0 (float)

  • lambda_0_ok (bool)

  • fiedler_value (float)

  • spectral_gap (float)

  • eigenvalues_nonneg (bool)

  • n_negative_eigenvalues (int)

  • max_negative_eigenvalue (float)

  • orthonormality_error (float)

  • orthonormality_ok (bool)

  • laplacian_row_sum_max (float)

  • laplacian_row_sum_ok (bool)

  • near_degenerate_pairs (int)

  • recommended_k (int | None)

  • warnings (list[str])

  • passed (bool)

eigenvalues_nonneg: bool = True
fiedler_value: float = 0.0
lambda_0: float = 0.0
lambda_0_ok: bool = True
laplacian_row_sum_max: float = 0.0
laplacian_row_sum_ok: bool = True
max_negative_eigenvalue: float = 0.0
n_eigenvalues: int = 0
n_negative_eigenvalues: int = 0
n_vertices: int = 0
near_degenerate_pairs: int = 0
orthonormality_error: float = 0.0
orthonormality_ok: bool = True
passed: bool = True
recommended_k: int | None = None
spectral_gap: float = 0.0
warnings: list[str]

spectralbrain.statistics.eda.batch_effect_scan(descriptors, site_labels, *, alpha=0.05)

Scan for batch/site effects in spectral descriptors.

For each descriptor, a Kruskal-Wallis H test checks whether its distribution differs significantly across sites.

Parameters:
  • descriptors (dict of {name: ndarray}) – Per-subject descriptor values.

  • site_labels (ndarray, shape (n_subjects,)) – Site/scanner labels.

  • alpha (float) – Significance threshold.

Returns:

dict of {name: {statistic, p_value, has_batch_effect, effect_size}}

Return type:

dict[str, dict[str, Any]]
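The per-descriptor test described above can be sketched with SciPy. This is an illustration under stated assumptions, not the library's implementation: the helper name `kruskal_site_scan` is hypothetical, each descriptor is assumed to be a 1-D per-subject array, and the effect size is taken to be epsilon-squared, H / (n - 1), which may differ from what batch_effect_scan() reports.

```python
import numpy as np
from scipy import stats


def kruskal_site_scan(descriptors, site_labels, alpha=0.05):
    """Kruskal-Wallis H test per descriptor across sites (illustrative sketch)."""
    site_labels = np.asarray(site_labels)
    sites = np.unique(site_labels)
    n_subjects = site_labels.size
    results = {}
    for name, values in descriptors.items():
        values = np.asarray(values, dtype=float)
        # One group of per-subject values for each site.
        groups = [values[site_labels == site] for site in sites]
        h_stat, p_value = stats.kruskal(*groups)
        results[name] = {
            "statistic": float(h_stat),
            "p_value": float(p_value),
            "has_batch_effect": bool(p_value < alpha),
            # Epsilon-squared, H / (n - 1): an assumed effect-size definition.
            "effect_size": float(h_stat / (n_subjects - 1)),
        }
    return results


# Toy data: 3 sites x 20 subjects each.
site_labels = np.repeat(["A", "B", "C"], 20)
descriptors = {
    # Same values at every site: no site effect.
    "clean": np.tile(np.arange(20.0), 3),
    # A large per-site offset: strong site effect.
    "shifted": np.tile(np.arange(20.0), 3) + np.repeat([0.0, 100.0, 200.0], 20),
}
scan = kruskal_site_scan(descriptors, site_labels)
```

In this toy setup only the "shifted" descriptor is flagged, because its three site groups do not overlap at all.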

spectralbrain.statistics.eda.compute_icc(test, retest, *, icc_type='ICC3,1')

Intraclass Correlation Coefficient for test-retest.

Parameters:
  • test (ndarray, shape (N,) or (N, T)) – Descriptor values at time 1.

  • retest (ndarray, shape (N,) or (N, T)) – Descriptor values at time 2.

  • icc_type (str) – Either "ICC2,1" (two-way random, single measures) or "ICC3,1" (two-way mixed, single measures; the default, recommended for neuroimaging).

Returns:

float – ICC value in [-1, 1]. >0.75 = excellent, 0.60–0.75 = good, 0.40–0.60 = fair, <0.40 = poor.

Return type:

float
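For reference, both ICC variants can be derived from two-way ANOVA mean squares (the Shrout and Fleiss formulation). The sketch below is a minimal NumPy version under assumptions: the helper name `icc_single` is hypothetical, only 1-D inputs of shape (N,) are handled, and it is not guaranteed to match compute_icc() in edge cases.

```python
import numpy as np


def icc_single(test, retest, icc_type="ICC3,1"):
    """Single-measure ICC from two-way ANOVA mean squares (illustrative sketch)."""
    y = np.column_stack([test, retest]).astype(float)  # (n subjects, k = 2 sessions)
    n, k = y.shape
    grand = y.mean()
    ss_rows = k * np.sum((y.mean(axis=1) - grand) ** 2)    # between subjects
    ss_cols = n * np.sum((y.mean(axis=0) - grand) ** 2)    # between sessions
    ss_err = np.sum((y - grand) ** 2) - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    if icc_type == "ICC3,1":
        # Consistency: a constant session offset is ignored.
        return float((ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err))
    if icc_type == "ICC2,1":
        # Absolute agreement: a session offset lowers the ICC.
        denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
        return float((ms_rows - ms_err) / denom)
    raise ValueError(f"unsupported icc_type: {icc_type!r}")


# Retest equals test plus a constant offset of 1.
test = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
retest = test + 1.0
icc3 = icc_single(test, retest, "ICC3,1")
icc2 = icc_single(test, retest, "ICC2,1")
```

With a pure offset between sessions, ICC3,1 is 1 while ICC2,1 drops to 5/6, which shows the practical difference between the two types: ICC3,1 measures consistency, ICC2,1 absolute agreement.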

spectralbrain.statistics.eda.descriptor_correlation(descriptors, *, method='pearson')

Correlation matrix between descriptors (redundancy check).

For multi-column descriptors, uses the mean across columns.

Parameters:
  • descriptors (dict of {name: ndarray})

  • method (str)

Returns:

  • corr_matrix (ndarray, shape (D, D))

  • names (list of str)

Return type:

tuple[ndarray, list[str]]
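The reduction of multi-column descriptors to one value per subject, followed by a Pearson correlation, might look like the sketch below. The helper name `correlation_sketch` is hypothetical, and only the default Pearson method is covered.

```python
import numpy as np


def correlation_sketch(descriptors):
    """Pearson correlation between per-subject descriptor summaries (sketch)."""
    names = list(descriptors)
    rows = []
    for name in names:
        values = np.asarray(descriptors[name], dtype=float)
        if values.ndim > 1:
            # Multi-column descriptor: collapse to the mean across columns.
            values = values.reshape(values.shape[0], -1).mean(axis=1)
        rows.append(values)
    # np.corrcoef treats each row as one variable.
    return np.corrcoef(np.vstack(rows)), names


a = np.arange(10.0)
corr_matrix, names = correlation_sketch({
    "a": a,
    "b": 2.0 * a + 1.0,               # perfectly correlated with "a"
    "c": np.column_stack([-a, -a]),   # two columns, anti-correlated with "a"
})
```

Off-diagonal entries close to ±1 indicate redundant descriptors.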

spectralbrain.statistics.eda.descriptor_profile(descriptors, *, normality_samples=500, seed=None)

Summary statistics for each descriptor.

Parameters:
  • descriptors (dict of {name: ndarray}) – Descriptor arrays (any shape — handles ScalarMap, DescriptorMatrix, GlobalDescriptor).

  • normality_samples (int) – Subsample size for Shapiro-Wilk test.

  • seed (int, optional)

Returns:

dict of {name: {stat: value}} – Keys per descriptor: mean, std, min, max, skew, kurtosis, q25, q50, q75, shapiro_p, n_outliers_3sigma, shape.

Return type:

dict[str, dict[str, Any]]
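As a rough illustration of how the listed keys could be computed for a single descriptor, the sketch below uses NumPy and SciPy. It rests on assumptions: the helper name `profile_sketch` is hypothetical, the array is flattened before computing statistics, and the Shapiro-Wilk test runs on a random subsample only when the array exceeds normality_samples.

```python
import numpy as np
from scipy import stats


def profile_sketch(values, normality_samples=500, seed=None):
    """Summary statistics for one descriptor array (illustrative sketch)."""
    values = np.asarray(values, dtype=float)
    flat = values.ravel()  # assumption: statistics are taken over all entries
    rng = np.random.default_rng(seed)
    if flat.size > normality_samples:
        sample = rng.choice(flat, size=normality_samples, replace=False)
    else:
        sample = flat
    _, shapiro_p = stats.shapiro(sample)
    mean, std = flat.mean(), flat.std()
    q25, q50, q75 = np.percentile(flat, [25, 50, 75])
    return {
        "mean": float(mean),
        "std": float(std),
        "min": float(flat.min()),
        "max": float(flat.max()),
        "skew": float(stats.skew(flat)),
        "kurtosis": float(stats.kurtosis(flat)),
        "q25": float(q25),
        "q50": float(q50),
        "q75": float(q75),
        "shapiro_p": float(shapiro_p),
        # Entries more than three standard deviations from the mean.
        "n_outliers_3sigma": int(np.sum(np.abs(flat - mean) > 3 * std)),
        "shape": values.shape,
    }


# 200 evenly spaced values plus one extreme entry.
profile = profile_sketch(np.append(np.linspace(-1.0, 1.0, 200), 50.0), seed=0)
```

Subsampling keeps the Shapiro-Wilk test tractable on large vertex-wise maps; fixing seed makes the reported p-value reproducible.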

spectralbrain.statistics.eda.eigenvalue_stability(decomps, *, n_eigenvalues=None)

Cross-subject eigenvalue stability analysis.

Parameters:
  • decomps (list of SpectralDecomposition) – Decompositions from multiple subjects.

  • n_eigenvalues (int, optional) – Number of eigenvalues to compare.

Returns:

dict – Keys: "mean", "std", "cv" (coefficient of variation), "eigenvalue_matrix" (subjects × k).

Return type:

dict[str, ndarray]
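The returned quantities can be illustrated with plain eigenvalue arrays in place of SpectralDecomposition objects. This is a sketch under assumptions: the helper name `stability_sketch` is hypothetical, the sample standard deviation (ddof=1) is used, and the coefficient of variation is left as NaN wherever the mean eigenvalue is not positive (as for λ₀ ≈ 0).

```python
import numpy as np


def stability_sketch(eigenvalue_lists, n_eigenvalues=None):
    """Cross-subject mean, std and CV per eigenvalue index (illustrative sketch)."""
    k = min(len(ev) for ev in eigenvalue_lists)
    if n_eigenvalues is not None:
        k = min(k, n_eigenvalues)
    # Rows are subjects, columns are eigenvalue indices.
    matrix = np.vstack([np.asarray(ev, dtype=float)[:k] for ev in eigenvalue_lists])
    mean = matrix.mean(axis=0)
    std = matrix.std(axis=0, ddof=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        cv = np.where(mean > 0, std / mean, np.nan)
    return {"mean": mean, "std": std, "cv": cv, "eigenvalue_matrix": matrix}


# Three subjects whose spectra differ by a global scale of +/- 10 %.
base = np.array([0.0, 1.0, 4.0, 9.0])
stability = stability_sketch([base, 1.1 * base, 0.9 * base])
```

A uniform scale difference between subjects shows up as the same CV (here 0.1) at every non-zero index, which is the pattern expected from overall size differences rather than shape differences.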

spectralbrain.statistics.eda.optimal_k(eigenvalues, *, energy_thresholds=(0.95, 0.99))

Determine optimal number of eigenpairs.

Three criteria are evaluated:

  1. Elbow: maximum curvature of log(λ) versus index.

  2. Energy: smallest k whose cumulative energy Σᵢ≤ₖ λᵢ / Σⱼ λⱼ exceeds the threshold.

  3. Max gap: largest relative gap between consecutive λ.

Parameters:
  • eigenvalues (ndarray) – Full eigenvalue sequence.

  • energy_thresholds (tuple of float) – Thresholds for cumulative energy (default 95% and 99%).

Returns:

OptimalKResult

Return type:

OptimalKResult
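The three criteria can be sketched in a few lines of NumPy. The details here are assumptions rather than the library's exact rules: the helper name `optimal_k_sketch` is hypothetical, eigenvalues are assumed to be in ascending order, the near-zero mode is dropped so that logarithms and ratios are defined, curvature is approximated by the second finite difference, and no combined k_recommended is computed.

```python
import numpy as np


def optimal_k_sketch(eigenvalues, energy_thresholds=(0.95, 0.99)):
    """Elbow, cumulative-energy and max-gap estimates of k (illustrative sketch)."""
    lam = np.asarray(eigenvalues, dtype=float)
    lam = lam[lam > 1e-12]  # drop the (near-)zero mode

    # Energy: smallest count whose cumulative share reaches each threshold.
    energy = np.cumsum(lam) / lam.sum()
    k_energy = {t: int(np.searchsorted(energy, t) + 1) for t in energy_thresholds}

    # Max gap: number of eigenvalues lying below the largest relative jump.
    rel_gap = np.diff(lam) / lam[:-1]
    k_gap = int(np.argmax(rel_gap) + 1)

    # Elbow: second difference j is centred on point j + 1 (0-based),
    # so the count up to and including that point is j + 2.
    curvature = np.abs(np.diff(np.log(lam), n=2))
    k_elbow = int(np.argmax(curvature) + 2)

    return {"k_elbow": k_elbow, "k_energy": k_energy, "k_gap": k_gap,
            "cumulative_energy": energy}


# Spectrum with a clear jump between 1.4 and 10.
k_info = optimal_k_sketch([0.0, 1.0, 1.2, 1.4, 10.0, 11.0, 12.0])
```

On this toy spectrum the gap criterion returns 3 and the elbow criterion 4, while both energy thresholds need all 6 non-zero eigenvalues. Disagreement of this kind is why several criteria are reported side by side.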

spectralbrain.statistics.eda.recommend_descriptor(points, labels=None, objective='group_discrimination', *, n_surrogates=30, k_eigenpairs=30, n_jobs=1, seed=42)

Recommend the best spectral descriptor for an analysis objective.

Generates synthetic surrogates with controlled deformations, computes all eligible descriptors, evaluates each descriptor’s discriminative power, and ranks by consensus.

Parameters:
  • points (ndarray, shape (N, 3)) – Representative geometry (e.g. mean mesh vertices, or one subject’s point cloud).

  • labels (ndarray, optional) – Not used by the surrogate engine (surrogates generate their own labels). Reserved for future data-driven evaluation.

  • objective (str or AnalysisObjective) – Analysis goal. Determines eligible descriptors and surrogate deformation type.

  • n_surrogates (int) – Number of synthetic shapes to generate.

  • k_eigenpairs (int) – Eigenpairs per surrogate decomposition.

  • n_jobs (int) – Number of parallel workers for surrogate decomposition. 1 = sequential (default), -1 = all cores. Parallel execution (any value other than 1) requires joblib.

  • seed (int, optional) – RNG seed for reproducibility.

Returns:

DescriptorRecommendation – Contains .recommended, .ranking (top descriptors with AUC, accuracy, effect size), and .surrogate_details.

Return type:

DescriptorRecommendation
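To make the ranking step concrete, the sketch below scores two made-up descriptors by AUC on labelled surrogate groups and sorts them. Everything in it is hypothetical: the names `auc_score`, `descriptor_a` and `descriptor_b`, the scalar per-surrogate values, and the use of AUC alone (the function itself combines AUC, accuracy and effect size into a consensus ranking).

```python
import numpy as np


def auc_score(scores, labels):
    """AUC via the Mann-Whitney rank-sum identity (assumes no tied scores)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    ranks = np.empty(scores.size)
    ranks[np.argsort(scores)] = np.arange(1, scores.size + 1)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    u_stat = ranks[labels].sum() - n_pos * (n_pos + 1) / 2
    return float(u_stat / (n_pos * n_neg))


# Five undeformed (0) and five deformed (1) surrogates; values are made up.
labels = np.array([0] * 5 + [1] * 5)
candidates = {
    "descriptor_a": [0.1, 0.2, 0.3, 0.4, 0.5, 1.1, 1.2, 1.3, 1.4, 1.5],
    "descriptor_b": [0.5, 1.5, 0.3, 1.3, 0.1, 1.1, 0.2, 1.2, 0.4, 1.4],
}

# Rank by separation strength; an AUC of 0.2 separates as well as 0.8.
ranking = sorted(
    ({"name": name, "auc": auc_score(vals, labels)} for name, vals in candidates.items()),
    key=lambda row: max(row["auc"], 1.0 - row["auc"]),
    reverse=True,
)
```

Here descriptor_a separates the groups perfectly (AUC 1.0) and ranks first, while descriptor_b is close to chance (AUC 0.52).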

Notes

This function is computationally heavy: with the defaults it decomposes 30 surrogates at 30 eigenpairs each and evaluates every eligible descriptor. For large meshes, consider n_jobs=-1 to parallelise the surrogate decomposition across CPU cores.

Examples

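>>> import spectralbrain as sb
>>> # mesh: a previously loaded surface exposing an (N, 3) vertices array
>>> rec = sb.statistics.recommend_descriptor(
...     mesh.vertices,
...     objective="group_discrimination",
...     n_jobs=-1,
... )
>>> rec.recommended
'wks'
>>> top3 = rec.ranking[:3]  # three highest-ranked descriptors with scores and metrics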

spectralbrain.statistics.eda.spectral_qc(decomp, *, lambda_0_tol=0.0001, ortho_tol=0.001, row_sum_tol=0.01, degeneracy_tol=1e-06)

Run quality-control diagnostics on a spectral decomposition.

Parameters:
  • decomp (SpectralDecomposition)

  • lambda_0_tol (float) – Tolerance for λ₀ ≈ 0.

  • ortho_tol (float) – Tolerance for M-orthonormality of eigenvectors.

  • row_sum_tol (float) – Tolerance for Laplacian row-sum ≈ 0.

  • degeneracy_tol (float) – Relative gap below which eigenvalue pairs are flagged as near-degenerate.

Returns:

SpectralQCReport

Return type:

SpectralQCReport
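The checks behind the four tolerances can be illustrated on a small graph Laplacian. This is a sketch under assumptions, not the library's implementation: `graph_laplacian` and `qc_sketch` are hypothetical helpers, the mass matrix M defaults to the identity (so M-orthonormality reduces to ordinary orthonormality), and the pass rule simply requires every individual check to succeed.

```python
import numpy as np


def graph_laplacian(n, closed=False):
    """Unweighted Laplacian of a path (or a cycle, if closed) on n nodes."""
    adj = np.zeros((n, n))
    idx = np.arange(n - 1)
    adj[idx, idx + 1] = 1.0
    adj[idx + 1, idx] = 1.0
    if closed:
        adj[0, n - 1] = adj[n - 1, 0] = 1.0
    return np.diag(adj.sum(axis=1)) - adj


def qc_sketch(laplacian, eigenvalues, eigenvectors, mass=None, *,
              lambda_0_tol=1e-4, ortho_tol=1e-3, row_sum_tol=1e-2,
              degeneracy_tol=1e-6):
    """Basic spectral QC checks (illustrative sketch, ascending eigenvalues)."""
    n = laplacian.shape[0]
    mass = np.eye(n) if mass is None else mass
    # M-orthonormality: Phi^T M Phi should be the identity.
    gram = eigenvectors.T @ mass @ eigenvectors
    ortho_err = float(np.max(np.abs(gram - np.eye(gram.shape[0]))))
    row_sum_max = float(np.max(np.abs(laplacian.sum(axis=1))))
    # Relative gap between consecutive eigenvalues.
    rel_gaps = np.diff(eigenvalues) / np.maximum(np.abs(eigenvalues[1:]), 1e-300)
    report = {
        "lambda_0": float(eigenvalues[0]),
        "lambda_0_ok": bool(abs(eigenvalues[0]) < lambda_0_tol),
        "fiedler_value": float(eigenvalues[1]),
        "n_negative_eigenvalues": int(np.sum(eigenvalues < -lambda_0_tol)),
        "orthonormality_error": ortho_err,
        "orthonormality_ok": bool(ortho_err < ortho_tol),
        "laplacian_row_sum_max": row_sum_max,
        "laplacian_row_sum_ok": bool(row_sum_max < row_sum_tol),
        "near_degenerate_pairs": int(np.sum(rel_gaps < degeneracy_tol)),
    }
    report["passed"] = bool(
        report["lambda_0_ok"] and report["n_negative_eigenvalues"] == 0
        and report["orthonormality_ok"] and report["laplacian_row_sum_ok"]
    )
    return report


path = graph_laplacian(5)
report_path = qc_sketch(path, *np.linalg.eigh(path))
cycle = graph_laplacian(5, closed=True)
report_cycle = qc_sketch(cycle, *np.linalg.eigh(cycle))
```

The path graph has a simple spectrum and reports no near-degenerate pairs, whereas the cycle's rotational symmetry produces two repeated eigenvalues, which are counted as two near-degenerate pairs. Symmetric shapes produce the same effect, so a non-zero count is a caution for eigenvector-based descriptors rather than a failure.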