spectralbrain.io.group

Group-level loading for cohort statistics.

This module closes the gap between per-file I/O and the group-statistics functions in spectralbrain.statistics.analysis. The workflow is:

Discover one file per subject from a BIDS/derivatives tree, a FreeSurfer SUBJECTS_DIR, or an explicit list/dict of paths.
Load every subject in parallel (joblib, via spectralbrain.backends.cpu.parallel_map()). Loading is fail-soft: a subject that raises an error is logged and dropped rather than aborting the cohort.
Stack the per-subject arrays into a single (S, N) (or (S, N, T)) array, packaged in a GroupData object that carries subject IDs and parsed BIDS entities (for covariates).
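
The load-and-stack steps above can be sketched with the standard library and NumPy. This is an illustration only, not SpectralBrain's implementation: fail_soft_load and toy_loader are hypothetical names, and a thread pool stands in for the joblib-based parallel_map().

```python
import logging
from concurrent.futures import ThreadPoolExecutor

import numpy as np

log = logging.getLogger("group_loading_sketch")


def toy_loader(path):
    """Hypothetical loader: returns 4 per-vertex values, or raises for a bad file."""
    if "corrupt" in path:
        raise OSError(f"cannot read {path}")
    return np.full(4, float(len(path)))


def fail_soft_load(files, loader, n_jobs=2):
    """Load {subject_id: path} in parallel; log and drop subjects that raise."""

    def _safe(item):
        sub, path = item
        try:
            return sub, loader(path), None
        except Exception as exc:  # fail-soft: one bad subject must not abort the cohort
            return sub, None, exc

    # Assumption: a thread pool replaces joblib here to keep the sketch stdlib-only.
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        results = list(pool.map(_safe, files.items()))  # input order is preserved

    kept, failed = [], []
    for sub, arr, exc in results:
        if exc is None:
            kept.append((sub, arr))
        else:
            log.warning("dropping %s: %s", sub, exc)
            failed.append(sub)
    subject_ids = [sub for sub, _ in kept]
    data = np.stack([arr for _, arr in kept])  # shape (S, N)
    return subject_ids, data, failed


files = {"01": "a.gii", "02": "corrupt.gii", "03": "c.gii"}
subject_ids, data, failed = fail_soft_load(files, toy_loader)
```

Here subject "02" is dropped, so the first axis of data stays aligned with subject_ids.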

Two loading modes:

mode="maps" — load a per-vertex overlay/metric or a precomputed descriptor field that is already vertex-corresponded on a common template (the light path).
mode="pipeline" — load each surface, build the Laplace–Beltrami decomposition, and compute a spectral descriptor per subject (the heavy path, where joblib and the GPU backends pay off).
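
To make the heavy path concrete, the heat kernel signature has the standard form HKS(x, t) = sum_i exp(-lambda_i * t) * phi_i(x)^2 over the eigenpairs (lambda_i, phi_i). The sketch below is a minimal NumPy illustration under stated assumptions: heat_kernel_signature is a hypothetical helper, not a SpectralBrain function, and the graph Laplacian of a small ring stands in for a mesh Laplace–Beltrami operator.

```python
import numpy as np


def heat_kernel_signature(evals, evecs, times):
    """HKS(x, t) = sum_i exp(-lambda_i * t) * phi_i(x)**2, returned as (N, T)."""
    decay = np.exp(-np.outer(evals, times))  # (k, T)
    return (evecs ** 2) @ decay  # (N, k) @ (k, T) -> (N, T)


# Stand-in operator (assumption): combinatorial graph Laplacian of a 6-vertex ring.
n = 6
adjacency = np.zeros((n, n))
for i in range(n):
    adjacency[i, (i + 1) % n] = adjacency[(i + 1) % n, i] = 1.0
laplacian = np.diag(adjacency.sum(axis=1)) - adjacency

evals, evecs = np.linalg.eigh(laplacian)  # ascending eigenvalues, orthonormal columns
k = 3  # keep the k lowest-frequency eigenpairs
hks = heat_kernel_signature(evals[:k], evecs[:, :k], times=np.array([0.1, 1.0, 10.0]))
```

A per-subject array of shape (N, T) like hks is the kind of output that would stack into the (S, N, T) layout mentioned above.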

The resulting GroupData plugs straight into group_comparison(), which dispatches to the vertex-wise tests.

Examples

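>>> from spectralbrain.io.group import (
...     discover_bids, discover_freesurfer, group_comparison, load_group,
... )
>>> # HippUnfold-style derivatives, descriptor fields already on template: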
>>> files = discover_bids(
...     "/data/derivatives/hippunfold",
...     "sub-{sub}/surf/sub-{sub}_hemi-L_*_thickness.shape.gii",
... )
>>> group = load_group(files, mode="maps", n_jobs=8)
>>> res = group_comparison(group, group.covariate("group"), test="ttest")

>>> # Full pipeline from FreeSurfer surfaces, HKS per subject, on GPU:
>>> files = discover_freesurfer("/data/fs", surface="white", hemi="lh")
>>> from spectralbrain.backends import TorchBackend
>>> group = load_group(
...     files, mode="pipeline", descriptor="hks", k=100,
...     backend=TorchBackend(), n_jobs=4,
... )

Functions

`discover_bids`(root, pattern, *[, subjects])	Discover one file per subject in a BIDS / derivatives tree.
`discover_freesurfer`(subjects_dir, *[, hemi, ...])	Discover FreeSurfer surface or morphometry files per subject.
`group_comparison`(group[, labels, test])	Run a vertex-wise group comparison on a loaded cohort.
`load_group`(files, *[, mode, loader, n_jobs, ...])	Load and stack a cohort for group statistics.
`load_group_freesurfer`(subjects_dir, *, measure)	Load a FreeSurfer morphometry measure across a cohort onto a template.
`resample_to_template`(values, subjects_dir, ...)	Resample a native-space per-vertex overlay onto a template surface.

Classes

GroupData(data, subject_ids, entities, paths)

A loaded cohort ready for group statistics.

class spectralbrain.io.group.GroupData(data, subject_ids, entities, paths, faces=None, metadata=<factory>)

Bases: object

A loaded cohort ready for group statistics.

Parameters:

data (Any)
subject_ids (list[str])
entities (list[dict[str, str]])
paths (list[Path])
faces (ndarray | None)
metadata (dict[str, Any])

data

Stacked per-subject arrays, shape (S, N) or (S, N, T). Falls back to a list if subjects have heterogeneous shapes.

Type: ndarray or list of ndarray

subject_ids

Subject identifiers, aligned with data’s first axis.

Type: list of str

entities

BIDS entities parsed from each source filename (for covariates).

Type: list of dict

paths

Source file per subject.

Type: list of Path

faces

Template faces, shape (F, 3); useful for TFCE adjacency.

Type: ndarray, optional

metadata

Bookkeeping (mode, number of failed subjects, …).

Type: dict

covariate(key, default=None)

Return a BIDS entity (e.g. "ses", "group") per subject.

Parameters:

key (str) – Entity key to pull from each subject’s parsed filename.
default (Any) – Value for subjects missing the entity.

Returns:

ndarray, shape (S,)

Return type:

ndarray
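
As an illustration of how entities can become covariates, the sketch below parses key-value pairs from BIDS-style filenames and pulls one entity per subject. Both functions are hypothetical stand-ins written for this example (SpectralBrain's own parser is not shown here), and the sketch returns a list where the method above returns an ndarray.

```python
import re


def parse_bids_entities(filename):
    """Collect key-value pairs such as sub-01 or hemi-L from a BIDS-style name."""
    stem = filename.split(".", 1)[0]  # drop the (possibly multi-part) extension
    entities = {}
    for part in stem.split("_"):
        match = re.fullmatch(r"([A-Za-z0-9]+)-([A-Za-z0-9]+)", part)
        if match:  # suffixes like "thickness" have no dash and are skipped
            entities[match.group(1)] = match.group(2)
    return entities


def covariate(entities, key, default=None):
    """One value per subject, using `default` where the entity is absent."""
    return [e.get(key, default) for e in entities]


names = [
    "sub-01_ses-1_hemi-L_thickness.shape.gii",
    "sub-02_hemi-L_thickness.shape.gii",
]
entities = [parse_bids_entities(n) for n in names]
sessions = covariate(entities, "ses", default="missing")
```

The second file has no ses entity, so its entry in sessions takes the default.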

split(labels)

Split the stacked data into two groups by a 2-level label array.

Parameters:

labels (array-like, shape (S,)) – Exactly two distinct non-null values define the two groups.

Returns:

group_a, group_b (ndarray) – Subsets of data for the two label levels (in sorted order).

Return type:

tuple[ndarray, ndarray]
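
A minimal NumPy sketch of a two-level split follows. split_two_groups is a hypothetical helper written for illustration; unlike the method described above, it does not handle null labels.

```python
import numpy as np


def split_two_groups(data, labels):
    """Split rows of `data` by a two-level label array (levels in sorted order)."""
    labels = np.asarray(labels)
    levels = sorted(set(labels.tolist()))
    if len(levels) != 2:
        raise ValueError(f"expected exactly two levels, got {levels}")
    return data[labels == levels[0]], data[labels == levels[1]]


data = np.arange(12.0).reshape(4, 3)  # 4 subjects, 3 vertices
labels = ["patient", "control", "patient", "control"]
group_a, group_b = split_two_groups(data, labels)  # "control" sorts before "patient"
```

Because the levels are sorted, group_a holds the "control" rows and group_b the "patient" rows regardless of the order in which the labels appear.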

data: Any

entities: list[dict[str, str]]

faces: ndarray | None = None

property is_stacked: bool

True if data is a single stacked array.

metadata: dict[str, Any]

property n_subjects: int

Number of successfully loaded subjects.

paths: list[Path]

subject_ids: list[str]

spectralbrain.io.group.discover_bids(root, pattern, *, subjects=None)

Discover one file per subject in a BIDS / derivatives tree.

Parameters:

root (PathLike) – Dataset (or derivatives) root.
pattern (str) – Glob relative to root with a {sub} placeholder (the bare label, no sub- prefix), e.g. "sub-{sub}/anat/sub-{sub}_hemi-L_thickness.shape.gii".
subjects (list of str, optional) – Restrict to these subjects ("sub-01" or "01"). Defaults to every sub-* directory under root.

Returns:

dict of {subject_id: Path} – Subjects with no match (or whose match is ambiguous) are logged; the first match is taken when several exist.

Return type:

dict[str, Path]
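
The {sub} placeholder mechanics can be sketched with pathlib alone. discover_sketch is a hypothetical function, not the library's code, and keying the result by the bare label ("01") is an assumption of this example; it skips the logging described above.

```python
import tempfile
from pathlib import Path


def discover_sketch(root, pattern, subjects=None):
    """Return {subject_label: first matching Path} for a pattern containing {sub}."""
    root = Path(root)
    if subjects is None:
        subjects = sorted(p.name for p in root.glob("sub-*") if p.is_dir())
    found = {}
    for sub in subjects:
        label = sub[4:] if sub.startswith("sub-") else sub  # accept "sub-01" or "01"
        matches = sorted(root.glob(pattern.format(sub=label)))
        if matches:
            found[label] = matches[0]  # first match wins when several exist
    return found


# Build a throwaway tree: sub-01 has a file, sub-02 has none.
with tempfile.TemporaryDirectory() as tmp:
    for label in ("01", "02"):
        (Path(tmp) / f"sub-{label}" / "anat").mkdir(parents=True)
    (Path(tmp) / "sub-01" / "anat" / "sub-01_hemi-L_thickness.shape.gii").touch()
    found = discover_sketch(tmp, "sub-{sub}/anat/sub-{sub}_hemi-L_*.shape.gii")
    found_labels = sorted(found)
    found_name = found["01"].name
```

Only sub-01 appears in the result because sub-02 has no file matching the pattern.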

spectralbrain.io.group.discover_freesurfer(subjects_dir, *, hemi='lh', surface=None, measure=None, subjects=None)

Discover FreeSurfer surface or morphometry files per subject.

Provide exactly one of surface (geometry, e.g. "white") or measure (overlay, e.g. "thickness"). The path resolved is {subjects_dir}/{sub}/surf/{hemi}.{name}.

Parameters:

subjects_dir (PathLike) – FreeSurfer SUBJECTS_DIR.
hemi (str) – "lh" or "rh".
surface (str, optional) – Surface geometry name ("white", "pial", …).
measure (str, optional) – Morphometry overlay ("thickness", "curv", "sulc", …).
subjects (list of str, optional) – Restrict to these subject directory names.

Returns:

dict of {subject_id: Path}

Return type:

dict[str, Path]
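
The path rule and the "exactly one of surface or measure" constraint can be sketched as below. freesurfer_path is a hypothetical helper for a single subject, and the subject name "bert" and the /data/fs root are only illustrative.

```python
from pathlib import Path


def freesurfer_path(subjects_dir, subject, *, hemi="lh", surface=None, measure=None):
    """Resolve {subjects_dir}/{subject}/surf/{hemi}.{name} for one subject."""
    if (surface is None) == (measure is None):
        raise ValueError("provide exactly one of surface or measure")
    name = surface if surface is not None else measure
    return Path(subjects_dir) / subject / "surf" / f"{hemi}.{name}"


white = freesurfer_path("/data/fs", "bert", surface="white")
thickness = freesurfer_path("/data/fs", "bert", hemi="rh", measure="thickness")
```

Passing both surface and measure, or neither, raises ValueError in this sketch.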

spectralbrain.io.group.group_comparison(group, labels=None, *, test='ttest', **kwargs)

Run a vertex-wise group comparison on a loaded cohort.

Parameters:

group (GroupData or (group_a, group_b)) – A loaded cohort (split via labels) or a pre-split pair of arrays.
labels (array-like, shape (S,), optional) – Two-level grouping variable (required when group is a GroupData). Often group.covariate("group").
test ("ttest", "mannwhitney", or "permutation") – Which vertex-wise test from spectralbrain.statistics.analysis to run.
**kwargs – Forwarded to the chosen test (e.g. correction, alpha, n_permutations).

Returns:

VertexWiseResult

Return type:

Any
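
To show what "vertex-wise" means in practice, the sketch below computes one two-sample statistic per vertex with plain NumPy. It uses a Welch t statistic as an assumption; which variant the library's "ttest" runs is not stated here, and vertexwise_t is a hypothetical name. No p-values or multiple-comparison correction are computed.

```python
import numpy as np


def vertexwise_t(group_a, group_b):
    """Welch t statistic per vertex for arrays of shape (S_a, N) and (S_b, N)."""
    mean_a, mean_b = group_a.mean(axis=0), group_b.mean(axis=0)
    var_a = group_a.var(axis=0, ddof=1) / group_a.shape[0]
    var_b = group_b.var(axis=0, ddof=1) / group_b.shape[0]
    return (mean_a - mean_b) / np.sqrt(var_a + var_b)


rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, size=(20, 5))  # 20 subjects, 5 vertices
group_b = rng.normal(0.0, 1.0, size=(20, 5))
group_b[:, 2] += 3.0  # inject an effect at vertex 2 only
t_stat = vertexwise_t(group_a, group_b)  # shape (5,), one statistic per vertex
```

The statistic is computed independently at each vertex, which is why a correction option such as the correction keyword mentioned above matters for real cohorts.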

spectralbrain.io.group.load_group(files, *, mode='maps', loader=None, n_jobs=1, stack=True, descriptor='hks', k=100, backend=None, descriptor_kwargs=None, template_faces=None)

Load and stack a cohort for group statistics.

Parameters:

files (dict or list) – {subject_id: path} (e.g. from discover_bids() / discover_freesurfer()) or a plain list of paths (subject IDs are then parsed from the filenames).
mode ("maps" or "pipeline") – "maps" loads vertex-corresponded overlays/descriptor fields; "pipeline" loads each surface and computes a descriptor.
loader (callable, optional) – Custom path -> ndarray loader. Overrides mode.
n_jobs (int) – Parallel workers for loading (joblib). 1 = sequential.
stack (bool) – Stack into one array when subject shapes match (else keep a list).
descriptor (str) – Pipeline mode only. Name of the descriptor to compute (e.g. "hks").
k (int) – Pipeline mode only. Number of eigenpairs to compute.
backend (Any | None) – Pipeline mode only. Compute backend (e.g. TorchBackend()).
descriptor_kwargs (dict[str, Any] | None) – Pipeline mode only. Extra keyword arguments forwarded to the descriptor.
template_faces (ndarray, optional) – Stored on the result for downstream TFCE adjacency.

Returns:

GroupData

Return type:

GroupData
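
The behaviour of the stack option can be sketched as follows: stack when every subject has the same shape, otherwise keep a list. stack_if_possible is a hypothetical helper and only approximates what load_group() does internally.

```python
import numpy as np


def stack_if_possible(arrays, stack=True):
    """Return one (S, ...) array when all shapes agree, else the original list."""
    shapes = {a.shape for a in arrays}
    if stack and len(shapes) == 1:
        return np.stack(arrays)
    return list(arrays)


same = stack_if_possible([np.zeros(5), np.ones(5)])   # stacked, shape (2, 5)
mixed = stack_if_possible([np.zeros(5), np.ones(7)])  # stays a list of 2 arrays
```

The list fallback corresponds to the case where GroupData.is_stacked would be False.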

spectralbrain.io.group.load_group_freesurfer(subjects_dir, *, measure, hemi='lh', template='fsaverage', resample=True, method='nearest', subjects=None, n_jobs=1)

Load a FreeSurfer morphometry measure across a cohort onto a template.

For each subject the native overlay ({hemi}.{measure}) is loaded and — unless resample=False — resampled to template via resample_to_template(), so the cohort stacks into a single vertex-corresponded (S, N_template) array ready for group_comparison().

Parameters:

subjects_dir (PathLike) – FreeSurfer SUBJECTS_DIR.
measure (str) – Morphometry overlay ("thickness", "curv", "sulc", …).
hemi (str) – "lh" or "rh".
template (str) – Target template subject (default "fsaverage").
resample (bool) – Resample to template (True) or keep native space (False; only stackable if every subject shares the vertex count).
method ("nearest" or "linear") – Resampling interpolation.
subjects (list of str, optional) – Restrict to these subject directory names.
n_jobs (int) – Parallel workers (joblib).

Returns:

GroupData

Return type:

GroupData

spectralbrain.io.group.resample_to_template(values, subjects_dir, subject_id, hemi, *, template='fsaverage', method='nearest', k=3)

Resample a native-space per-vertex overlay onto a template surface.

Mirrors FreeSurfer’s mri_surf2surf: both surfaces are brought into spherical-registration space ({hemi}.sphere.reg) and the template vertices sample the subject overlay there. "nearest" takes the closest subject vertex (matching SpectralBrain’s existing label projection); "linear" blends the k nearest by inverse distance, which is smoother for continuous metrics.

Parameters:

values (ndarray, shape (N_subject,)) – Per-vertex overlay on the subject’s native surface.
subjects_dir (PathLike) – FreeSurfer SUBJECTS_DIR (must also contain template).
subject_id (str) – Subject directory name.
hemi (str) – "lh" or "rh".
template (str) – Template subject (default "fsaverage").
method ("nearest" or "linear") – Interpolation on the registration sphere.
k (int) – Neighbours for "linear" inverse-distance weighting.

Returns:

ndarray, shape (N_template,) – Overlay resampled onto the template surface.

Return type:

ndarray
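
The two interpolation methods can be sketched on toy sphere coordinates. This is a brute-force NumPy illustration under stated assumptions: resample_on_sphere is a hypothetical function, the coordinates are made up rather than read from {hemi}.sphere.reg files, and straight-line (Euclidean) distance between points on the sphere is used as the distance measure.

```python
import numpy as np


def resample_on_sphere(values, subject_xyz, template_xyz, method="nearest", k=3):
    """Sample `values` (N_subject,) at template vertices using sphere coordinates."""
    # Pairwise Euclidean distances, shape (N_template, N_subject).
    dist = np.linalg.norm(template_xyz[:, None, :] - subject_xyz[None, :, :], axis=2)
    if method == "nearest":
        return values[dist.argmin(axis=1)]
    idx = np.argsort(dist, axis=1)[:, :k]  # k closest subject vertices
    d = np.take_along_axis(dist, idx, axis=1)
    w = 1.0 / np.maximum(d, 1e-12)  # inverse-distance weights, guarded at d = 0
    w /= w.sum(axis=1, keepdims=True)
    return (w * values[idx]).sum(axis=1)


# Toy "subject" sphere: the six axis points of the unit sphere.
subject_xyz = np.array(
    [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float
)
values = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])
template_xyz = np.array([[0.0, 0.0, 1.0], [0.6, 0.8, 0.0]])

nearest = resample_on_sphere(values, subject_xyz, template_xyz, method="nearest")
linear = resample_on_sphere(values, subject_xyz, template_xyz, method="linear", k=2)
```

The first template vertex coincides with a subject vertex, so both methods return that vertex's value there; the second lies between two subject vertices, so "linear" returns a blend of their values while "nearest" picks the closer one.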