spectralbrain.statistics.normative#

Normative modeling, harmonization, and non-inferiority testing.

Build age- and sex-stratified normative distributions of spectral descriptors from healthy reference cohorts, score individual patients against the normative model, and formally test whether spectral descriptors are non-inferior to conventional morphometrics.

Sections#

  • §1 ComBat / ComBat-GAM harmonization

  • §2 NormativeModel: build, evaluate, persist

  • §3 Centile curves: age-trajectory percentile charts

  • §4 Individual deviation scoring

  • §5 Non-inferiority & equivalence testing (TOST, AUC comparison)

  • §6 Method comparison: spectral vs volumetric discrimination

Functions

auc_comparison_delong(y_true, scores_new, ...)

DeLong test for two correlated (paired) ROC AUCs.

centile_curves(descriptors, ages, *[, ...])

Compute age-binned centile curves for a descriptor.

compare_methods(y_true, features_new, ...[, ...])

Full head-to-head comparison between two methods.

equivalence_test_tost(metric_new, ...[, ...])

Two One-Sided Tests (TOST) for equivalence.

extreme_value_map(z_scores, *[, threshold])

Three-valued map of extreme deviations: +1 above, -1 below, 0 within.

harmonize(data, sites, *[, method])

Unified harmonization interface dispatching to ComBat or ComBat-GAM.

harmonize_combat(data, sites, *[, ...])

Remove multi-site batch effects using ComBat (Johnson et al., 2007).

harmonize_combat_gam(data, sites, *[, ...])

Remove multi-site batch effects using ComBat-GAM (Pomponio et al., 2020).

non_inferiority_test(metric_new, ...[, ...])

Non-inferiority test for method comparison.

z_score_map(subject, normative_mean, ...)

Simple z-score map (no covariates).

Classes

HarmonizationResult(data_harmonized, method, ...)

Result container for ComBat / ComBat-GAM harmonization.

MethodComparisonResult(method_new, ...)

Comprehensive comparison between two methods.

NonInferiorityResult(test_type, metric_new, ...)

Result of a non-inferiority or equivalence test.

NormativeModel([method])

Age- and sex-stratified normative distribution of descriptors.

class spectralbrain.statistics.normative.HarmonizationResult(data_harmonized, method, sites, n_sites, site_counts, estimates=<factory>)[source]#

Bases: object

Result container for ComBat / ComBat-GAM harmonization.

Parameters:
data_harmonized#

Harmonized data matrix, shape (n_samples, n_features).

Type:

np.ndarray

method#

Method used: "combat" or "combat_gam".

Type:

str

sites#

Original site labels.

Type:

np.ndarray

n_sites#

Number of unique sites.

Type:

int

site_counts#

Per-site sample counts.

Type:

dict

estimates#

Estimated batch parameters (gamma, delta, etc.) for reproducibility and inspection.

Type:

dict

data_harmonized: ndarray#
estimates: dict[str, Any]#
method: str#
n_sites: int#
site_counts: dict[str, int]#
sites: ndarray#
class spectralbrain.statistics.normative.MethodComparisonResult(method_new, method_reference, auc_new, auc_reference, auc_p_value, non_inferiority, equivalence, effect_size_new, effect_size_reference)[source]#

Bases: object

Comprehensive comparison between two methods.

Parameters:
auc_new: float#
auc_p_value: float#
auc_reference: float#
effect_size_new: float#
effect_size_reference: float#
equivalence: NonInferiorityResult#
method_new: str#
method_reference: str#
non_inferiority: NonInferiorityResult#
class spectralbrain.statistics.normative.NonInferiorityResult(test_type, metric_new, metric_reference, margin, difference, ci_lower, ci_upper, p_value, is_non_inferior, is_equivalent=False)[source]#

Bases: object

Result of a non-inferiority or equivalence test.

Parameters:
test_type, metric_new, metric_reference, margin, difference, ci_lower, ci_upper, p_value, is_non_inferior, is_equivalent.
ci_lower: float#
ci_upper: float#
difference: float#
is_equivalent: bool = False#
is_non_inferior: bool#
margin: float#
metric_new: float#
metric_reference: float#
p_value: float#
test_type: str#
class spectralbrain.statistics.normative.NormativeModel(method='gaussian')[source]#

Bases: object

Age- and sex-stratified normative distribution of descriptors.

Fits a normative model on a healthy reference cohort and scores individuals against it. Supports parametric (Gaussian) and non-parametric (percentile) scoring.

Parameters:

method (str) – One of "gaussian", "centile", or "gp". Default: "gaussian".

Examples

>>> norm = NormativeModel(method="gaussian")
>>> norm.fit(descriptors_controls, ages=ages, sex=sex)
>>> z = norm.score(descriptor_patient, age=45, sex=1)

extreme_count(z_scores, threshold=2.0)[source]#

Count extreme deviations in a z-score map.

Parameters:
  • z_scores (ndarray, shape (N,))

  • threshold (float)

Returns:

dict

Return type:

dict[str, Any]

fit(descriptors, *, ages=None, sex=None, sites=None, harmonize_method=None, harmonize_kwargs=None)[source]#

Fit the normative model on a healthy reference cohort.

Parameters:
  • descriptors (ndarray, shape (S, N) or (S, d)) – Per-subject descriptor values.

  • ages (ndarray, shape (S,), optional) – Ages in years (enables age conditioning).

  • sex (ndarray, shape (S,), optional) – Biological sex (0/1).

  • sites (ndarray, shape (S,), optional) – Site labels. Used with harmonize_method.

  • harmonize_method (str, optional) – "combat" or "combat_gam". If given, descriptors are harmonized across sites before fitting.

  • harmonize_kwargs (dict, optional) – Extra kwargs forwarded to the harmonization function.

Returns:

self

Return type:

NormativeModel

save(path)[source]#

Save normative model to HDF5.

Parameters:

path (str or Path)

Returns:

Path

Return type:

Path

score(descriptor, *, age=None, sex=None)[source]#

Score an individual against the normative.

Parameters:
  • descriptor (ndarray, shape (N,) or (d,)) – Individual’s descriptor values.

  • age (float, optional)

  • sex (int, optional)

Returns:

ndarray – Z-scores (Gaussian) or percentiles (centile).

Return type:

ndarray

score_batch(descriptors, *, ages=None, sex=None)[source]#

Score multiple individuals.

Parameters:
  • descriptors (ndarray, shape (S, N))

  • ages (ndarray, shape (S,), optional)

  • sex (ndarray, shape (S,), optional)

Returns:

ndarray, shape (S, N)

Return type:

ndarray

spectralbrain.statistics.normative.auc_comparison_delong(y_true, scores_new, scores_reference)[source]#

DeLong test for two correlated (paired) ROC AUCs.

Implements the analytic DeLong test using the fast midrank algorithm of Sun & Xu (2014). It is deterministic (no resampling) and is the standard method for comparing two AUCs computed on the same samples.

Parameters:
  • y_true (ndarray, shape (n,)) – Binary labels (the positive class is the larger value, typically 1).

  • scores_new (ndarray, shape (n,)) – Predicted scores from the new model.

  • scores_reference (ndarray, shape (n,)) – Predicted scores from the reference model, on the same samples as scores_new.

Returns:

auc_new, auc_ref, p_value (float) – The two AUCs and the two-sided p-value for the null hypothesis auc_new == auc_ref.

Return type:

tuple[float, float, float]

References

DeLong ER, DeLong DM, Clarke-Pearson DL. Biometrics 44(3):837–845, 1988.

Sun X, Xu W. IEEE Signal Process Lett 21(11):1389–1393, 2014.
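
The test statistic described above can be sketched with the classical O(m·n) formulation of DeLong et al. (1988). This is not the fast midrank algorithm of Sun & Xu and not the library's implementation; the function name `delong_paired_auc_test` is hypothetical, labels are assumed to be exactly 0/1, and each class needs at least two samples.

```python
import math
from statistics import NormalDist


def delong_paired_auc_test(y_true, scores_new, scores_reference):
    """Classical DeLong test for two AUCs computed on the same samples."""
    pos = [i for i, y in enumerate(y_true) if y == 1]
    neg = [i for i, y in enumerate(y_true) if y == 0]
    m, n = len(pos), len(neg)

    def psi(a, b):
        # Mann-Whitney kernel: 1 if the positive outranks the negative, 0.5 on ties.
        return 1.0 if a > b else 0.5 if a == b else 0.0

    def components(scores):
        # v10[i]: fraction of negatives that positive i outranks.
        v10 = [sum(psi(scores[i], scores[j]) for j in neg) / n for i in pos]
        # v01[j]: fraction of positives that outrank negative j.
        v01 = [sum(psi(scores[i], scores[j]) for i in pos) / m for j in neg]
        return sum(v10) / m, v10, v01

    def cov(u, v):
        # Unbiased sample covariance.
        mu, mv = sum(u) / len(u), sum(v) / len(v)
        return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

    auc_a, v10_a, v01_a = components(scores_new)
    auc_b, v10_b, v01_b = components(scores_reference)

    # Variance of (auc_a - auc_b) from the DeLong covariance matrix.
    var = (
        (cov(v10_a, v10_a) + cov(v10_b, v10_b) - 2 * cov(v10_a, v10_b)) / m
        + (cov(v01_a, v01_a) + cov(v01_b, v01_b) - 2 * cov(v01_a, v01_b)) / n
    )
    if var <= 0:
        # Degenerate case (e.g. identical scores): no evidence of a difference.
        return auc_a, auc_b, 1.0
    z = (auc_a - auc_b) / math.sqrt(var)
    return auc_a, auc_b, 2 * (1 - NormalDist().cdf(abs(z)))


# Toy data (made up for illustration).
y = [0, 0, 0, 0, 1, 1, 1, 1]
s_new = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9]
s_ref = [0.5, 0.4, 0.6, 0.2, 0.3, 0.7, 0.1, 0.8]
auc_new, auc_ref, p_value = delong_paired_auc_test(y, s_new, s_ref)
```

The sketch uses only the standard library; the midrank variant computes the same components from ranks and scales better to large samples.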

spectralbrain.statistics.normative.centile_curves(descriptors, ages, *, percentiles=(2.5, 5, 25, 50, 75, 95, 97.5), n_age_bins=20, smooth=True, smooth_window=3)[source]#

Compute age-binned centile curves for a descriptor.

Parameters:
  • descriptors (ndarray, shape (S,) or (S, N))

  • ages (ndarray, shape (S,))

  • percentiles (sequence of float)

  • n_age_bins (int)

  • smooth (bool)

  • smooth_window (int)

Returns:

dict with keys "age_centers" and "centiles".

Return type:

dict[str, ndarray]
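
As a rough illustration of age-binned centiles, the sketch below splits the age range into equal-width bins and takes linearly interpolated percentiles within each bin. The helper names, the equal-width binning, and the layout of the returned "centiles" entry (a dict keyed by percentile) are assumptions; smoothing is omitted, and the ages are assumed to span a non-zero range.

```python
def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def binned_centiles(values, ages, percentiles=(5, 50, 95), n_age_bins=4):
    """Percentiles of `values` within equal-width age bins."""
    a_min, a_max = min(ages), max(ages)
    width = (a_max - a_min) / n_age_bins
    bins = [[] for _ in range(n_age_bins)]
    for v, a in zip(values, ages):
        # Clamp so the maximum age falls in the last bin.
        idx = min(int((a - a_min) / width), n_age_bins - 1)
        bins[idx].append(v)
    age_centers = [a_min + (k + 0.5) * width for k in range(n_age_bins)]
    centiles = {
        q: [percentile(sorted(b), q) if b else float("nan") for b in bins]
        for q in percentiles
    }
    return {"age_centers": age_centers, "centiles": centiles}


# Synthetic descriptor that declines linearly with age (made up for illustration).
ages = list(range(20, 80))
values = [100 - 0.5 * a for a in ages]
curves = binned_centiles(values, ages)
```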

spectralbrain.statistics.normative.compare_methods(y_true, features_new, features_reference, *, method_new_name='spectral', method_ref_name='volumetric', n_folds=10, margin=0.05, seed=42)[source]#

Full head-to-head comparison between two methods.

Parameters:
  • y_true (ndarray, shape (n,))

  • features_new (ndarray, shape (n, d_new))

  • features_reference (ndarray, shape (n, d_ref))

  • method_new_name (str)

  • method_ref_name (str)

  • n_folds (int)

  • margin (float)

  • seed (int)

Returns:

MethodComparisonResult

Return type:

MethodComparisonResult

spectralbrain.statistics.normative.equivalence_test_tost(metric_new, metric_reference, *, margin=0.05, alpha=0.05)[source]#

Two One-Sided Tests (TOST) for equivalence.

Parameters:
  • metric_new (ndarray)

  • metric_reference (ndarray)

  • margin (float)

  • alpha (float)

Returns:

NonInferiorityResult

Return type:

NonInferiorityResult
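
A minimal sketch of the TOST logic, assuming metric_new and metric_reference are paired per-fold metrics and using a large-sample normal approximation in place of a t distribution. The function name `tost_paired` and the returned dict are hypothetical; the library returns a NonInferiorityResult and may compute the statistic differently.

```python
import math
from statistics import NormalDist, mean, stdev


def tost_paired(metric_new, metric_reference, margin=0.05, alpha=0.05):
    """Paired TOST with a normal approximation to the test statistic."""
    diffs = [a - b for a, b in zip(metric_new, metric_reference)]
    d_bar = mean(diffs)
    se = stdev(diffs) / math.sqrt(len(diffs))  # requires non-constant diffs
    norm = NormalDist()
    # First one-sided test, H0: difference <= -margin.
    p_lower = 1 - norm.cdf((d_bar + margin) / se)
    # Second one-sided test, H0: difference >= +margin.
    p_upper = norm.cdf((d_bar - margin) / se)
    # Equivalence requires rejecting both, so report the larger p-value.
    p_value = max(p_lower, p_upper)
    # Matching (1 - 2*alpha) confidence interval for the mean difference.
    z_crit = norm.inv_cdf(1 - alpha)
    return {
        "difference": d_bar,
        "ci_lower": d_bar - z_crit * se,
        "ci_upper": d_bar + z_crit * se,
        "p_value": p_value,
        "is_equivalent": p_value < alpha,
    }


# Per-fold AUCs for two methods (made up for illustration).
new = [0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.81, 0.80, 0.82, 0.79]
ref = [0.80, 0.80, 0.82, 0.81, 0.81, 0.79, 0.80, 0.81, 0.81, 0.80]
result = tost_paired(new, ref, margin=0.05)
strict = tost_paired(new, ref, margin=0.001)
```

Declaring equivalence when both one-sided p-values fall below alpha is the same decision as checking that the (1 - 2·alpha) confidence interval lies entirely inside (-margin, +margin).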

spectralbrain.statistics.normative.extreme_value_map(z_scores, *, threshold=2.0)[source]#

Three-valued map of extreme deviations: +1 above, -1 below, 0 within.

Parameters:
  • z_scores (ndarray, shape (N,))

  • threshold (float)

Returns:

ndarray, shape (N,), int32

Return type:

ndarray
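
The mapping can be sketched in a few lines. This sketch assumes a symmetric band (-threshold, +threshold) with strict inequalities, so a value exactly at the threshold counts as "within"; the library's boundary handling is not stated here, and the name `extreme_map` is hypothetical.

```python
def extreme_map(z_scores, threshold=2.0):
    """+1 where z > threshold, -1 where z < -threshold, 0 otherwise."""
    return [1 if z > threshold else -1 if z < -threshold else 0 for z in z_scores]


flags = extreme_map([-3.1, -0.4, 0.0, 2.0, 2.5])
# flags == [-1, 0, 0, 0, 1]
```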

spectralbrain.statistics.normative.harmonize(data, sites, *, method='combat', **kwargs)[source]#

Unified harmonization interface dispatching to ComBat or ComBat-GAM.

Parameters:
  • data (np.ndarray, shape (n_samples, n_features)) – Data matrix.

  • sites (np.ndarray, shape (n_samples,)) – Site labels.

  • method (str) – "combat" or "combat_gam".

  • **kwargs – Forwarded to the selected harmonization function.

Returns:

HarmonizationResult

Return type:

HarmonizationResult

spectralbrain.statistics.normative.harmonize_combat(data, sites, *, covariates=None, covariate_names=None, empirical_bayes=True, parametric=True, mean_only=False, reference_site=None)[source]#

Remove multi-site batch effects using ComBat (Johnson et al., 2007).

ComBat uses an empirical Bayesian framework to estimate and remove additive and multiplicative batch (site) effects while preserving biological variability associated with covariates of interest.

Parameters:
  • data (np.ndarray, shape (n_samples, n_features)) – Data matrix with samples as rows and features as columns.

  • sites (np.ndarray, shape (n_samples,)) – Site/batch labels for each sample.

  • covariates (np.ndarray, shape (n_samples, n_covariates), optional) – Biological covariates to preserve (e.g., age, sex, diagnosis).

  • covariate_names (list of str, optional) – Names for each covariate column (for logging).

  • empirical_bayes (bool) – If True (default), use empirical Bayes to shrink each feature's batch-effect estimates toward a prior pooled across features.

  • parametric (bool) – If True (default), assume parametric priors (inverse-gamma / normal). If False, use non-parametric EB.

  • mean_only (bool) – If True, adjust only the mean (no variance adjustment).

  • reference_site (str, optional) – Harmonize all other sites to match this reference site.

Returns:

HarmonizationResult

Return type:

HarmonizationResult

References

Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8(1):118-127, 2007.

Fortin J-P et al. Harmonization of multi-site diffusion tensor imaging data. NeuroImage 161:149-170, 2017.

Examples

>>> result = harmonize_combat(
...     descriptors, sites=site_labels,
...     covariates=np.column_stack([ages, sex]),
... )
>>> harmonized = result.data_harmonized

spectralbrain.statistics.normative.harmonize_combat_gam(data, sites, *, continuous_covariates=None, continuous_names=None, categorical_covariates=None, categorical_names=None, smooth_terms=None, n_splines=10, empirical_bayes=True)[source]#

Remove multi-site batch effects using ComBat-GAM (Pomponio et al., 2020).

Extends ComBat by modeling nonlinear covariate effects using Generalized Additive Models (GAMs) with penalized B-splines.

Parameters:
  • data (np.ndarray, shape (n_samples, n_features)) – Data matrix (samples x features).

  • sites (np.ndarray, shape (n_samples,)) – Site/batch labels.

  • continuous_covariates (np.ndarray, shape (n_samples, n_cont), optional) – Continuous covariates (e.g., age).

  • continuous_names (list of str, optional) – Names for continuous covariates.

  • categorical_covariates (np.ndarray, shape (n_samples, n_cat), optional) – Categorical covariates (e.g., sex, diagnosis).

  • categorical_names (list of str, optional) – Names for categorical covariates.

  • smooth_terms (list of str, optional) – Which continuous covariates to model with splines.

  • n_splines (int) – Number of B-spline basis functions.

  • empirical_bayes (bool) – Use empirical Bayes shrinkage.

Returns:

HarmonizationResult

Return type:

HarmonizationResult

References

Pomponio R et al. Harmonization of large MRI datasets for the analysis of brain imaging patterns throughout the lifespan. NeuroImage 208:116450, 2020.

spectralbrain.statistics.normative.non_inferiority_test(metric_new, metric_reference, *, margin=0.05, alpha=0.025, paired=True)[source]#

Non-inferiority test for method comparison.

Parameters:
  • metric_new (ndarray)

  • metric_reference (ndarray)

  • margin (float)

  • alpha (float)

  • paired (bool)

Returns:

NonInferiorityResult

Return type:

NonInferiorityResult
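
A one-sided non-inferiority check can be sketched as follows, assuming higher metric values are better and using a normal approximation. The function name `non_inferiority` and the tuple it returns are hypothetical; the library returns a NonInferiorityResult and may use a different reference distribution.

```python
import math
from statistics import NormalDist, mean, variance


def non_inferiority(metric_new, metric_reference, margin=0.05, alpha=0.025, paired=True):
    """One-sided test of H0: mean(new) - mean(reference) <= -margin."""
    if paired:
        diffs = [a - b for a, b in zip(metric_new, metric_reference)]
        diff = mean(diffs)
        se = math.sqrt(variance(diffs) / len(diffs))
    else:
        diff = mean(metric_new) - mean(metric_reference)
        se = math.sqrt(
            variance(metric_new) / len(metric_new)
            + variance(metric_reference) / len(metric_reference)
        )
    norm = NormalDist()
    p_value = 1 - norm.cdf((diff + margin) / se)
    # Lower bound of the one-sided (1 - alpha) confidence interval.
    ci_lower = diff - norm.inv_cdf(1 - alpha) * se
    # Non-inferior when the lower bound clears the margin.
    return diff, ci_lower, p_value, ci_lower > -margin


# Per-fold metrics for two methods (made up for illustration).
new = [0.80, 0.78, 0.82, 0.79, 0.81]
ref = [0.81, 0.80, 0.82, 0.80, 0.83]
diff, ci_lower, p_value, is_non_inferior = non_inferiority(new, ref, margin=0.05)
```

Under the normal approximation, `ci_lower > -margin` and `p_value < alpha` always agree, so either can serve as the decision rule.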

spectralbrain.statistics.normative.z_score_map(subject, normative_mean, normative_std)[source]#

Simple z-score map (no covariates).

Parameters:
  • subject (ndarray, shape (N,))

  • normative_mean (ndarray, shape (N,))

  • normative_std (ndarray, shape (N,))

Returns:

ndarray, shape (N,)

Return type:

ndarray[tuple[Any, …], dtype[floating]]
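
Element-wise, the computation is z = (subject - normative_mean) / normative_std. A dependency-free sketch follows; the name `z_scores` is hypothetical, and the handling of zero standard deviations is left out because it is not described here.

```python
def z_scores(subject, normative_mean, normative_std):
    """Element-wise z-score of a subject against normative mean and SD."""
    return [
        (x - mu) / sd
        for x, mu, sd in zip(subject, normative_mean, normative_std)
    ]


z = z_scores([12.0, 8.0, 5.0], [10.0, 10.0, 5.0], [1.0, 2.0, 0.5])
# z == [2.0, -1.0, 0.0]
```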