spectralbrain.io.group

Group-level loading for cohort statistics.

This module closes the gap between per-file I/O and the group-statistics functions in spectralbrain.statistics.analysis. The workflow is:

  1. Discover one file per subject from a BIDS/derivatives tree, a FreeSurfer SUBJECTS_DIR, or an explicit list/dict of paths.

  2. Load every subject in parallel (joblib, via spectralbrain.backends.cpu.parallel_map()). Loading is fail-soft: a subject that raises an error is logged and dropped rather than aborting the cohort.

  3. Stack the per-subject arrays into a single (S, N) (or (S, N, T)) array, packaged in a GroupData object that carries subject IDs and parsed BIDS entities (for covariates).
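
The fail-soft behaviour of step 2 can be illustrated with a minimal sketch. It uses the standard library's ThreadPoolExecutor as a stand-in for joblib, and load_one and load_fail_soft are hypothetical names for illustration, not part of spectralbrain.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

log = logging.getLogger("group_sketch")


def load_one(path):
    """Hypothetical per-subject loader; raises on unreadable input."""
    if "corrupt" in path:
        raise ValueError(f"cannot read {path}")
    return [0.0, 1.0, 2.0]  # stand-in for a per-vertex array


def load_fail_soft(files, n_jobs=2):
    """Load {subject_id: path} in parallel, dropping subjects that error."""

    def _safe(item):
        sub, path = item
        try:
            return sub, load_one(path)
        except Exception as exc:
            # Log and drop the subject instead of aborting the cohort.
            log.warning("dropping %s: %s", sub, exc)
            return sub, None

    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        results = list(pool.map(_safe, files.items()))  # input order is kept

    loaded = {sub: arr for sub, arr in results if arr is not None}
    failed = [sub for sub, arr in results if arr is None]
    return loaded, failed


files = {"01": "sub-01.gii", "02": "sub-02_corrupt.gii", "03": "sub-03.gii"}
loaded, failed = load_fail_soft(files)
```

Catching the exception inside the worker, rather than around the whole map, is what lets the remaining subjects finish when one fails.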

Two loading modes:

  • mode="maps" — load a per-vertex overlay/metric or a precomputed descriptor field that is already vertex-corresponded on a common template (the light path).

  • mode="pipeline" — load each surface, build the Laplace–Beltrami decomposition, and compute a spectral descriptor per subject (the heavy path, where joblib and the GPU backends pay off).

The resulting GroupData plugs straight into group_comparison(), which dispatches to the vertex-wise tests.

Examples

>>> from spectralbrain.io.group import (
...     discover_bids, discover_freesurfer, group_comparison, load_group,
... )
>>> # HippUnfold-style derivatives, descriptor fields already on template:
>>> files = discover_bids(
...     "/data/derivatives/hippunfold",
...     "sub-{sub}/surf/sub-{sub}_hemi-L_*_thickness.shape.gii",
... )
>>> group = load_group(files, mode="maps", n_jobs=8)
>>> res = group_comparison(group, group.covariate("group"), test="ttest")

>>> # Full pipeline from FreeSurfer surfaces, HKS per subject, on GPU:
>>> files = discover_freesurfer("/data/fs", surface="white", hemi="lh")
>>> from spectralbrain.backends import TorchBackend
>>> group = load_group(
...     files, mode="pipeline", descriptor="hks", k=100,
...     backend=TorchBackend(), n_jobs=4,
... )

Functions

discover_bids(root, pattern, *[, subjects])

Discover one file per subject in a BIDS / derivatives tree.

discover_freesurfer(subjects_dir, *[, hemi, ...])

Discover FreeSurfer surface or morphometry files per subject.

group_comparison(group[, labels, test])

Run a vertex-wise group comparison on a loaded cohort.

load_group(files, *[, mode, loader, n_jobs, ...])

Load and stack a cohort for group statistics.

load_group_freesurfer(subjects_dir, *, measure)

Load a FreeSurfer morphometry measure across a cohort onto a template.

resample_to_template(values, subjects_dir, ...)

Resample a native-space per-vertex overlay onto a template surface.

Classes

GroupData(data, subject_ids, entities, paths)

A loaded cohort ready for group statistics.

class spectralbrain.io.group.GroupData(data, subject_ids, entities, paths, faces=None, metadata=<factory>)

Bases: object

A loaded cohort ready for group statistics.

Parameters:
data

Stacked per-subject arrays, shape (S, N) or (S, N, T). Falls back to a list if subjects have heterogeneous shapes.

Type:

ndarray or list of ndarray

subject_ids

Subject identifiers, aligned with data’s first axis.

Type:

list of str

entities

BIDS entities parsed from each source filename (for covariates).

Type:

list of dict

paths

Source file per subject.

Type:

list of Path

faces

Template faces (F, 3) — useful for TFCE adjacency.

Type:

ndarray, optional

metadata

Bookkeeping (mode, number of failed subjects, …).

Type:

dict

covariate(key, default=None)

Return a BIDS entity (e.g. "ses", "group") per subject.

Parameters:
  • key (str) – Entity key to pull from each subject’s parsed filename.

  • default (Any) – Value for subjects missing the entity.

Returns:

ndarray, shape (S,)

Return type:

ndarray
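
As an illustration of how a covariate can be derived from filenames, here is a minimal sketch that splits a BIDS-style name into key-value entities and pulls one key per subject. parse_entities and covariate_from are hypothetical helpers and return plain Python containers; the library's own parser may differ in detail.

```python
from pathlib import Path


def parse_entities(path):
    """Collect key-value pairs such as sub-01 or ses-1 from a BIDS-style filename."""
    stem = Path(path).name.split(".", 1)[0]  # drop every extension
    entities = {}
    for part in stem.split("_"):
        key, sep, value = part.partition("-")
        if sep and value:  # skip suffixes such as "thickness" that have no value
            entities[key] = value
    return entities


def covariate_from(entities, key, default=None):
    """One value per subject, with a default for subjects missing the entity."""
    return [e.get(key, default) for e in entities]


paths = [
    "sub-01_ses-1_hemi-L_thickness.shape.gii",
    "sub-02_hemi-L_thickness.shape.gii",
]
entities = [parse_entities(p) for p in paths]
sessions = covariate_from(entities, "ses", default="na")
```

The second file has no ses entity, so its entry takes the default value.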

split(labels)

Split the stacked data into two groups by a 2-level label array.

Parameters:

labels (array-like, shape (S,)) – Exactly two distinct non-null values define the two groups.

Returns:

group_a, group_b (ndarray) – Subsets of data for the two label levels (in sorted order).

Return type:

tuple[ndarray, ndarray]
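
The two-level split with sorted level order can be sketched in NumPy as follows. split_two_levels is a hypothetical stand-in, and it assumes the labels contain no null values, which the real method may handle separately.

```python
import numpy as np


def split_two_levels(data, labels):
    """Split rows of data by a two-level label array, levels in sorted order."""
    labels = np.asarray(labels)
    levels = np.unique(labels)  # np.unique returns sorted distinct values
    if levels.size != 2:
        raise ValueError(f"expected exactly two label levels, got {levels.size}")
    return data[labels == levels[0]], data[labels == levels[1]]


data = np.arange(12, dtype=float).reshape(4, 3)  # 4 subjects, 3 vertices
labels = ["patient", "control", "patient", "control"]
group_a, group_b = split_two_levels(data, labels)
```

Because the levels are sorted, group_a holds the "control" rows and group_b the "patient" rows, regardless of which label appears first.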

data: Any
entities: list[dict[str, str]]
faces: ndarray | None = None
property is_stacked: bool

True if data is a single stacked array.

metadata: dict[str, Any]
property n_subjects: int

Number of successfully loaded subjects.

paths: list[Path]
subject_ids: list[str]

spectralbrain.io.group.discover_bids(root, pattern, *, subjects=None)

Discover one file per subject in a BIDS / derivatives tree.

Parameters:
  • root (PathLike) – Dataset (or derivatives) root.

  • pattern (str) – Glob relative to root with a {sub} placeholder (the bare label, no sub- prefix), e.g. "sub-{sub}/anat/sub-{sub}_hemi-L_thickness.shape.gii".

  • subjects (list of str, optional) – Restrict to these subjects ("sub-01" or "01"). Defaults to every sub-* directory under root.

Returns:

dict of {subject_id: Path} – Subjects with no match, or with an ambiguous match, are logged; when several files match, the first is taken.

Return type:

dict[str, Path]
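
A minimal sketch of the discovery logic described above, using only pathlib. The function name discover is hypothetical, and keying the result by the bare label (rather than the sub- prefixed form) is an assumption made for illustration.

```python
import tempfile
from pathlib import Path


def discover(root, pattern, subjects=None):
    """Return {label: first matching path} for a glob with a {sub} placeholder."""
    root = Path(root)
    if subjects is None:
        # Default: every sub-* directory under root.
        subjects = sorted(p.name for p in root.glob("sub-*") if p.is_dir())
    found = {}
    for sub in subjects:
        label = sub[4:] if sub.startswith("sub-") else sub  # accept "sub-01" or "01"
        matches = sorted(root.glob(pattern.format(sub=label)))
        if matches:
            found[label] = matches[0]  # the first match is taken when several exist
    return found


with tempfile.TemporaryDirectory() as tmp:
    for label in ("01", "02"):
        (Path(tmp) / f"sub-{label}" / "surf").mkdir(parents=True)
    # Only sub-01 has a file; sub-02 has an empty surf directory.
    (Path(tmp) / "sub-01" / "surf" / "sub-01_hemi-L_thickness.shape.gii").touch()
    found = discover(tmp, "sub-{sub}/surf/sub-{sub}_hemi-L_*thickness.shape.gii")
    names = {sub: path.name for sub, path in found.items()}
```

The placeholder is filled before globbing, so wildcards such as * in the pattern are still interpreted by the glob itself.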

spectralbrain.io.group.discover_freesurfer(subjects_dir, *, hemi='lh', surface=None, measure=None, subjects=None)

Discover FreeSurfer surface or morphometry files per subject.

Provide exactly one of surface (geometry, e.g. "white") or measure (overlay, e.g. "thickness"). The path resolved is {subjects_dir}/{sub}/surf/{hemi}.{name}.

Parameters:
  • subjects_dir (PathLike) – FreeSurfer SUBJECTS_DIR.

  • hemi (str) – "lh" or "rh".

  • surface (str, optional) – Surface geometry name ("white", "pial", …).

  • measure (str, optional) – Morphometry overlay ("thickness", "curv", "sulc", …).

  • subjects (list of str, optional) – Restrict to these subject directory names.

Returns:

dict of {subject_id: Path}

Return type:

dict[str, Path]
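
The path rule {subjects_dir}/{sub}/surf/{hemi}.{name} and the exactly-one-of constraint can be sketched as below. resolve_fs_path is a hypothetical helper, and "bert" is only a placeholder subject name.

```python
from pathlib import Path


def resolve_fs_path(subjects_dir, sub, hemi="lh", surface=None, measure=None):
    """Build {subjects_dir}/{sub}/surf/{hemi}.{name} for one subject."""
    # Exactly one of surface / measure must be given.
    if (surface is None) == (measure is None):
        raise ValueError("provide exactly one of surface or measure")
    name = surface if surface is not None else measure
    return Path(subjects_dir) / sub / "surf" / f"{hemi}.{name}"


white = resolve_fs_path("/data/fs", "bert", surface="white")
thick = resolve_fs_path("/data/fs", "bert", hemi="rh", measure="thickness")
```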

spectralbrain.io.group.group_comparison(group, labels=None, *, test='ttest', **kwargs)

Run a vertex-wise group comparison on a loaded cohort.

Parameters:
  • group (GroupData or (group_a, group_b)) – A loaded cohort (split via labels) or a pre-split pair of arrays.

  • labels (array-like, shape (S,), optional) – Two-level grouping variable (required when group is a GroupData). Often group.covariate("group").

  • test ("ttest", "mannwhitney", or "permutation") – Which vertex-wise test from spectralbrain.statistics.analysis to run.

  • **kwargs – Forwarded to the chosen test (e.g. correction, alpha, n_permutations).

Returns:

VertexWiseResult

Return type:

Any
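
To make "vertex-wise" concrete, the sketch below computes one two-sample t statistic per vertex for a pre-split pair of arrays. It is not the implementation in spectralbrain.statistics.analysis: the Welch (unequal-variance) form is an assumption, and no p-values or multiple-comparison correction are included.

```python
import numpy as np


def vertexwise_welch_t(group_a, group_b):
    """Welch t statistic per vertex for arrays of shape (S_a, N) and (S_b, N)."""
    mean_a, mean_b = group_a.mean(axis=0), group_b.mean(axis=0)
    var_a = group_a.var(axis=0, ddof=1)  # unbiased sample variance per vertex
    var_b = group_b.var(axis=0, ddof=1)
    se = np.sqrt(var_a / group_a.shape[0] + var_b / group_b.shape[0])
    return (mean_a - mean_b) / se


rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, size=(20, 5))  # 20 subjects, 5 vertices
group_b = rng.normal(0.0, 1.0, size=(25, 5))  # 25 subjects, 5 vertices
group_b[:, 2] += 3.0  # shift one vertex so it stands out
t = vertexwise_welch_t(group_a, group_b)
```

Every operation reduces over the subject axis only, so the result has one value per vertex and the same code applies to any number of vertices.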

spectralbrain.io.group.load_group(files, *, mode='maps', loader=None, n_jobs=1, stack=True, descriptor='hks', k=100, backend=None, descriptor_kwargs=None, template_faces=None)

Load and stack a cohort for group statistics.

Parameters:
  • files (dict or list) – {subject_id: path} (e.g. from discover_bids() / discover_freesurfer()) or a plain list of paths (subject IDs are then parsed from the filenames).

  • mode ("maps" or "pipeline") – "maps" loads vertex-corresponded overlays/descriptor fields; "pipeline" loads each surface and computes a descriptor.

  • loader (callable, optional) – Custom path -> ndarray loader. Overrides mode.

  • n_jobs (int) – Parallel workers for loading (joblib). 1 = sequential.

  • stack (bool) – Stack into one array when subject shapes match (else keep a list).

  • descriptor (str) – Pipeline mode only: name of the spectral descriptor to compute (e.g. "hks").

  • k (int) – Pipeline mode only: number of eigenpairs to compute.

  • backend (Any | None) – Pipeline mode only: compute backend (e.g. TorchBackend()).

  • descriptor_kwargs (dict[str, Any] | None) – Pipeline mode only: extra keyword arguments forwarded to the descriptor.

  • template_faces (ndarray, optional) – Stored on the result for downstream TFCE adjacency.

Returns:

GroupData

Return type:

GroupData
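
The effect of stack can be sketched as follows: subjects are stacked along a new first axis only when every array has the same shape, otherwise they stay in a list. stack_or_list is a hypothetical helper written for illustration.

```python
import numpy as np


def stack_or_list(arrays, stack=True):
    """Stack into (S, ...) when all shapes agree, else keep a list."""
    shapes = {a.shape for a in arrays}
    if stack and len(shapes) == 1:
        return np.stack(arrays, axis=0)
    return list(arrays)


same = stack_or_list([np.zeros(4), np.ones(4), np.ones(4)])  # shapes agree
mixed = stack_or_list([np.zeros(4), np.ones(5)])  # heterogeneous shapes
```

A list result signals that the subjects are not vertex-corresponded, which is what the is_stacked property on GroupData reports.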

spectralbrain.io.group.load_group_freesurfer(subjects_dir, *, measure, hemi='lh', template='fsaverage', resample=True, method='nearest', subjects=None, n_jobs=1)

Load a FreeSurfer morphometry measure across a cohort onto a template.

For each subject the native overlay ({hemi}.{measure}) is loaded and — unless resample=False — resampled to template via resample_to_template(), so the cohort stacks into a single vertex-corresponded (S, N_template) array ready for group_comparison().

Parameters:
  • subjects_dir (PathLike) – FreeSurfer SUBJECTS_DIR.

  • measure (str) – Morphometry overlay ("thickness", "curv", "sulc", …).

  • hemi (str) – "lh" or "rh".

  • template (str) – Target template subject (default "fsaverage").

  • resample (bool) – Resample to template (True) or keep native space (False; only stackable if every subject shares the vertex count).

  • method ("nearest" or "linear") – Resampling interpolation.

  • subjects (list of str, optional) – Restrict to these subject directory names.

  • n_jobs (int) – Parallel workers (joblib).

Returns:

GroupData

Return type:

GroupData

spectralbrain.io.group.resample_to_template(values, subjects_dir, subject_id, hemi, *, template='fsaverage', method='nearest', k=3)

Resample a native-space per-vertex overlay onto a template surface.

Mirrors FreeSurfer’s mri_surf2surf: both surfaces are brought into spherical-registration space ({hemi}.sphere.reg) and the template vertices sample the subject overlay there. "nearest" takes the closest subject vertex (matching SpectralBrain’s existing label projection); "linear" blends the k nearest by inverse distance, which is smoother for continuous metrics.

Parameters:
  • values (ndarray, shape (N_subject,)) – Per-vertex overlay on the subject’s native surface.

  • subjects_dir (PathLike) – FreeSurfer SUBJECTS_DIR (must also contain template).

  • subject_id (str) – Subject directory name.

  • hemi (str) – "lh" or "rh".

  • template (str) – Template subject (default "fsaverage").

  • method ("nearest" or "linear") – Interpolation on the registration sphere.

  • k (int) – Neighbours for "linear" inverse-distance weighting.

Returns:

ndarray, shape (N_template,) – Overlay resampled onto the template surface.

Return type:

ndarray
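
The two interpolation methods can be sketched on toy coordinates as below. This is an illustration under stated assumptions, not the library's code: resample_sketch is a hypothetical function, it takes sphere coordinates directly instead of reading {hemi}.sphere.reg, it uses Euclidean distance between vertices, and it computes all pairwise distances by brute force, where real surfaces would call for a spatial index.

```python
import numpy as np


def resample_sketch(values, subject_xyz, template_xyz, method="nearest", k=3):
    """Sample a subject overlay at template vertices on a shared sphere."""
    # Pairwise distances, shape (N_template, N_subject).
    dist = np.linalg.norm(
        template_xyz[:, None, :] - subject_xyz[None, :, :], axis=-1
    )
    if method == "nearest":
        return values[dist.argmin(axis=1)]
    if method != "linear":
        raise ValueError(f"unknown method: {method!r}")
    idx = np.argsort(dist, axis=1)[:, :k]  # k nearest subject vertices
    d = np.take_along_axis(dist, idx, axis=1)
    w = 1.0 / np.maximum(d, 1e-12)  # inverse distance; guard exact hits
    w /= w.sum(axis=1, keepdims=True)
    return (w * values[idx]).sum(axis=1)


subject_xyz = np.array(
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [-1.0, 0.0, 0.0]]
)
values = np.array([10.0, 20.0, 30.0, 40.0])
template_xyz = np.array([[1.0, 0.0, 0.0], [0.6, 0.8, 0.0]])

nearest = resample_sketch(values, subject_xyz, template_xyz, method="nearest")
linear = resample_sketch(values, subject_xyz, template_xyz, method="linear", k=2)
```

The first template vertex coincides with a subject vertex, so both methods return that vertex's value there; the second lies between two subject vertices, where "linear" yields a blend weighted toward the closer one.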