torch_brain.dataset

Overview

Base classes to ease creation of PyTorch datasets for your data.

  • The Dataset class is the base class that all datasets inherit from. It handles opening and accessing a single dataset.

  • The NestedDataset class is for opening and accessing multiple datasets through a unified interface.

  • Mixin classes are provided to add modality-specific functionalities to the Dataset classes.

Dataset

torch_brain’s Dataset class (and its subclasses) allows you to sample time slices of your data. This is a major deviation from the standard torch.utils.data.Dataset, which is indexed by integers. To support arbitrary time-slice access, our Dataset class is indexed by three things:

  1. The recording id from which you want the slice,

  2. Start time of the slice, and

  3. End time of the slice.

These are put into a DatasetIndex object, which is then used to index the Dataset. Since different machine learning applications require different ways of sampling, we provide a collection of samplers which are responsible for creating these DatasetIndex objects.
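The indexing scheme described above can be illustrated with a small, self-contained sketch. This is not torch_brain's implementation: ToyIndex, ToyTimeSliceDataset, and fixed_window_indices are hypothetical names invented for this example, and the half-open interval [start, end) is an assumption. It only shows how a (recording id, start, end) triple can select a time slice, and how a sampler can generate such triples.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyIndex:
    """Hypothetical stand-in for DatasetIndex: (recording id, start, end)."""

    recording_id: str
    start: float  # seconds
    end: float  # seconds


class ToyTimeSliceDataset:
    """Toy dataset that stores sorted event timestamps per recording."""

    def __init__(self, recordings):
        # recordings: dict mapping recording id -> sorted list of timestamps
        self.recordings = recordings

    def __getitem__(self, index):
        timestamps = self.recordings[index.recording_id]
        # Assumption: slices are half-open, i.e. start <= t < end.
        lo = bisect_left(timestamps, index.start)
        hi = bisect_left(timestamps, index.end)
        return timestamps[lo:hi]


def fixed_window_indices(recording_id, duration, window):
    """Toy sampler: tile [0, duration) with non-overlapping windows."""
    n_windows = int(duration // window)
    return [
        ToyIndex(recording_id, i * window, (i + 1) * window)
        for i in range(n_windows)
    ]


dataset = ToyTimeSliceDataset({"rec_a": [0.1, 0.4, 1.2, 1.9, 2.5]})
indices = fixed_window_indices("rec_a", duration=3.0, window=1.0)
slices = [dataset[i] for i in indices]
```

Keeping the index object separate from the dataset means the sampling strategy (fixed windows here; random or trial-aligned windows elsewhere) can change without touching the data-access code.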

NestedDataset

The Dataset class is designed to operate on a single dataset. However, many modern ML methods train over multiple datasets. For this, we provide NestedDataset, which allows users to open and index multiple datasets through a single interface.
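As a rough illustration of the idea of a unified interface over several datasets, the sketch below routes each request to the right child dataset. It does not reflect NestedDataset's actual API: ToySingleDataset, ToyNestedDataset, the get method, and the "dataset_name/recording_id" naming scheme are all assumptions made for this example.

```python
class ToySingleDataset:
    """Toy dataset: maps recording id -> list of event timestamps."""

    def __init__(self, recordings):
        self.recordings = recordings

    def get(self, recording_id, start, end):
        # Assumption: slices are half-open, i.e. start <= t < end.
        return [t for t in self.recordings[recording_id] if start <= t < end]


class ToyNestedDataset:
    """Toy wrapper exposing several datasets through one interface."""

    def __init__(self, datasets):
        # datasets: dict mapping a dataset name -> ToySingleDataset
        self.datasets = datasets

    def recording_ids(self):
        # Hypothetical convention: prefix each id with its dataset name.
        return [
            f"{name}/{rid}"
            for name, ds in self.datasets.items()
            for rid in ds.recordings
        ]

    def get(self, recording_id, start, end):
        # Split "dataset_name/recording_id" and delegate to the child.
        name, _, inner_id = recording_id.partition("/")
        return self.datasets[name].get(inner_id, start, end)


nested = ToyNestedDataset(
    {
        "lab_a": ToySingleDataset({"s1": [0.2, 0.8, 1.5]}),
        "lab_b": ToySingleDataset({"s1": [0.3, 2.1]}),
    }
)
all_ids = nested.recording_ids()
slice_a = nested.get("lab_a/s1", 0.0, 1.0)
slice_b = nested.get("lab_b/s1", 0.0, 1.0)
```

Prefixing recording ids with the dataset name keeps ids unique even when two datasets reuse the same recording id, as "s1" does here.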