A library for working with Flywheel datasets


fw-dataset

This repository contains classes and functions for creating, managing, and serving Flywheel Datasets. Flywheel Datasets are a way to organize, share, and query data from the Flywheel Data Model.


IMPORTANT: This Python package is under active development and should be considered unstable. It is provided as-is, without any guarantee of support or maintenance at this stage. Features may be incomplete, change without notice, or be removed in future versions. Use at your own risk for experimental or development purposes only.

Getting started

Installation

The fw-dataset package has been built for use with Python 3.10 and above. It can be installed with pip:

```
pip install fw-dataset
```

or with Poetry:

```
poetry add fw-dataset
```

Usage

Rendering Datasets

See notebooks/quickstart_dataset_creation.ipynb for a walkthrough of using the DatasetBuilder to render a Flywheel dataset.

Accessing and Managing Datasets

See notebooks/quickstart_dataset_management.ipynb for a walkthrough of using the FWDatasetClient to access and query a Flywheel dataset.

Unassociated Datasets

If you have a valid dataset that is not associated with a Flywheel project, you can still use the FWDatasetClient to access it. To instantiate and query the dataset, provide the filesystem type, bucket, prefix, and credentials for the cloud or local filesystem:

```python
from fw_dataset import FWDatasetClient

# No API key is needed, and FWDatasetClient does not have to be instantiated:
# get_dataset_from_filesystem is called directly on the class.

fs_type = "s3"  # or "gcs", "azure", "fs", "local"
bucket = "your-bucket"
prefix = "your-prefix"
credentials = {"url": "{bucket-specific-credential-string}"}  # placeholder

dataset = FWDatasetClient.get_dataset_from_filesystem(fs_type, bucket, prefix, credentials)
```

Merging Related Datasets

If you have multiple datasets that have related tables you want to query together, you can merge the datasets into a single dataset.

NOTE: Federated Querying is not yet enabled across datasets. This is a work in progress.

Requirements

- The source dataset must have a valid tables directory structure.
- The source dataset must have a valid schemas directory structure:
  - Every table in the tables directory must have a valid corresponding schema file in the schemas directory.
  - The schema file must be named {table_name}.schema.json, where {table_name} is the name of the table that the schema describes.
  - The schema file must be a valid JSON file with the minimum structure:
    ```
    {
        "schema": "http://json-schema.org/draft-07/schema#",
        "id": "{table_name}",
        "description": "",
        "properties": {},
        "required": [],
        "type": "object"
    }
    ```
- The destination dataset must meet the same requirements as the source dataset.
- Tables and schemas selected from the source MUST NOT have the same names as existing ones in the destination.

Once the above requirements have been met, you may merge the datasets by copying or moving the selected tables and schemas from the source dataset to the destination dataset.
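The merge requirements above can be checked before anything is copied. The following is a minimal sketch for datasets on a local filesystem, assuming each dataset root directly contains tables/ and schemas/ (for example, a single version directory from the Dataset Structure section). The helpers check_dataset and merge_tables are hypothetical and not part of the fw-dataset API; a cloud-hosted dataset would need an object-store client in place of pathlib and shutil.

```python
import json
import shutil
from pathlib import Path

# Top-level keys of the minimal schema structure listed above.
REQUIRED_SCHEMA_KEYS = {"schema", "id", "description", "properties", "required", "type"}


def check_dataset(root: Path) -> set[str]:
    """Check one dataset root against the merge requirements and return its table names.

    Assumption (hypothetical helper): `root` directly contains tables/ and schemas/.
    """
    tables_dir, schemas_dir = root / "tables", root / "schemas"
    if not tables_dir.is_dir() or not schemas_dir.is_dir():
        raise ValueError(f"{root} must contain tables/ and schemas/ directories")
    tables = {p.name for p in tables_dir.iterdir() if p.is_dir()}
    for name in tables:
        schema_path = schemas_dir / f"{name}.schema.json"
        if not schema_path.is_file():
            raise ValueError(f"missing schema file for table {name!r}")
        schema = json.loads(schema_path.read_text())
        if not isinstance(schema, dict):
            raise ValueError(f"{schema_path.name} must contain a JSON object")
        missing = REQUIRED_SCHEMA_KEYS - set(schema)
        if missing:
            raise ValueError(f"{schema_path.name} is missing keys: {sorted(missing)}")
    return tables


def merge_tables(source: Path, dest: Path, names: list[str], move: bool = False) -> None:
    """Copy (or move) the selected tables and their schemas from source into dest.

    Hypothetical helper: every check runs before any file is copied or moved.
    """
    source_tables = check_dataset(source)
    dest_tables = check_dataset(dest)
    unknown = set(names) - source_tables
    if unknown:
        raise ValueError(f"tables not found in source: {sorted(unknown)}")
    # Neither the table directory nor the schema file may already exist in dest.
    clashes = {
        n for n in names
        if n in dest_tables or (dest / "schemas" / f"{n}.schema.json").exists()
    }
    if clashes:
        raise ValueError(f"names already used in destination: {sorted(clashes)}")
    for name in names:
        for sub, entry in (("tables", name), ("schemas", f"{name}.schema.json")):
            src, dst = source / sub / entry, dest / sub / entry
            if move:
                shutil.move(str(src), str(dst))
            elif src.is_dir():
                shutil.copytree(src, dst)
            else:
                shutil.copy2(src, dst)
```

For example, merge_tables(Path("source_ds"), Path("dest_ds"), ["conditions"]) (hypothetical paths and table name) copies the conditions table directory and its schema file; pass move=True to move them instead. Because the name-clash check runs before the copy loop, a clash leaves both datasets unchanged.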

Flywheel Project Requirements

For the Flywheel Dataset Client and the Dataset objects to function, the following requirements must be met:

Flywheel Project Structure

The Flywheel Project must have the following valid custom information metadata:

```
{
    "dataset": {
        "type": "s3",
        "bucket": "{bucket-name}",
        "prefix": "{path/to/dataset}",
        "storage_id": "storage-id-of-fw-storage-object"
    }
}
```

type

The type field must be one of the following:

- s3: The dataset is stored in an S3 bucket.
- gcs: The dataset is stored in a Google Cloud Storage bucket.
- azure: The dataset is stored in an Azure Blob Storage container.
- fs, local: The dataset is stored on a local filesystem.

bucket

The bucket field is the name of the bucket or container where the dataset is stored.

prefix

The prefix field is the path to the dataset within the bucket or container.

The directory structure beneath the prefix should be as described in the Dataset Structure section.

storage_id

The storage_id field is the Flywheel ID of the cloud storage record that describes the filesystem or cloud storage bucket that the dataset is stored in. This should be a valid storage object in the Flywheel database.
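A minimal sketch of a pre-flight check for the custom information described above, assuming the project's custom information is available as a plain Python dict. validate_dataset_info is a hypothetical helper, not part of fw-dataset, and it only checks the shape of the metadata; it does not confirm that storage_id refers to an existing Flywheel storage object.

```python
from typing import Any

# Values accepted for the `type` field, as listed above.
VALID_TYPES = {"s3", "gcs", "azure", "fs", "local"}
REQUIRED_FIELDS = ("type", "bucket", "prefix", "storage_id")


def validate_dataset_info(info: dict[str, Any]) -> list[str]:
    """Return the problems found in a project's custom information (empty if none).

    Hypothetical helper: checks structure only, not whether the storage exists.
    """
    dataset = info.get("dataset")
    if not isinstance(dataset, dict):
        return ["custom information must contain a 'dataset' object"]
    problems = []
    for field in REQUIRED_FIELDS:
        value = dataset.get(field)
        if not isinstance(value, str) or not value:
            problems.append(f"dataset.{field} must be a non-empty string")
    dtype = dataset.get("type")
    # Guard with isinstance so an unhashable value cannot raise during the lookup.
    if not (isinstance(dtype, str) and dtype in VALID_TYPES):
        problems.append(f"dataset.type must be one of {sorted(VALID_TYPES)}")
    return problems
```

Returning a list rather than raising on the first error lets every problem with the metadata be reported at once.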

Dataset Structure

The dataset should be stored in the bucket or container with the following structure:

```
{bucket}/{prefix}/
├── latest/
│   └── latest/
│       ├── provenance/
│       │   └── dataset_description.json
│       ├── tables/
│       │   └── {table_name}/  (a directory structure of partitioned parquet files)
│       │       └── {partitions}/{hash}.parquet
│       └── schemas/
│           └── {table_name}.schema.json
└── versions/
    ├── latest_version.json  (provenance/dataset_description.json of versions/latest)
    └── {version}/
        ├── provenance/
        │   └── dataset_description.json
        ├── tables/
        │   └── {table_name}/  (a directory structure of partitioned parquet files)
        │       └── {partitions}/{hash}.parquet
        └── schemas/
            └── {table_name}.schema.json
```

The latest_version.json file is a copy of provenance/dataset_description.json; both are minimal descriptions of a dataset version. The latest directory holds the latest version of the dataset. Archived versions of the dataset are also stored in the versions directory and can be deleted once they are no longer needed.
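To make the layout concrete, here is a sketch that builds the key paths of one dataset version from a bucket, a prefix, and an optional version. dataset_paths and table_paths are hypothetical helpers, and the version identifier (shown as "v1" in the tests) is an assumption, since its format is not specified here. The results are plain POSIX-style keys that could be handed to whatever storage client reads the dataset.

```python
from pathlib import PurePosixPath
from typing import Optional


def dataset_paths(bucket: str, prefix: str, version: Optional[str] = None) -> dict[str, PurePosixPath]:
    """Return the key locations of one dataset version under {bucket}/{prefix}/.

    Hypothetical helper. With version=None the paths point at latest/latest/;
    otherwise they point at versions/{version}/.
    """
    # Strip slashes so a leading "/" in the prefix does not reset the path root.
    root = PurePosixPath(bucket) / prefix.strip("/")
    base = root / "latest" / "latest" if version is None else root / "versions" / version
    return {
        "description": base / "provenance" / "dataset_description.json",
        "tables": base / "tables",
        "schemas": base / "schemas",
        "latest_version": root / "versions" / "latest_version.json",
    }


def table_paths(
    bucket: str, prefix: str, table_name: str, version: Optional[str] = None
) -> tuple[PurePosixPath, PurePosixPath]:
    """Return (table directory, schema file) for one table (hypothetical helper)."""
    paths = dataset_paths(bucket, prefix, version)
    return paths["tables"] / table_name, paths["schemas"] / f"{table_name}.schema.json"
```

For example, table_paths("my-bucket", "path/to/dataset", "conditions") yields my-bucket/path/to/dataset/latest/latest/tables/conditions and the matching conditions.schema.json under latest/latest/schemas.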

The above structure is more completely described in the Dataset Definition Document in the docs directory.

Schema Files

The schema files are JSON files that describe the tables in the dataset. They are stored in the schemas directory and named {table_name}.schema.json, where {table_name} is the name of the table that the schema describes.

Ideally, the schema files should be fully descriptive. However, if a minimal schema is desired merely to allow the dataset to be queried, the schema file can be as simple as:

```
{
    "schema": "http://json-schema.org/draft-07/schema#",
    "id": "{table_name}",
    "description": "Table derived from Tabular Data File: conditions.csv",
    "properties": {},
    "required": [],
    "type": "object"
}
```
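The minimal schema can also be generated instead of written by hand. This sketch, assuming a local schemas directory, builds the structure shown above and can optionally fill in properties by inferring a JSON Schema type from one sample row. minimal_schema and write_schema are hypothetical helpers, and the per-property format ({"type": ...}) is an assumption based on standard JSON Schema; this document does not specify what fw-dataset expects inside properties.

```python
import json
from pathlib import Path
from typing import Optional

# Assumption: map Python value types to standard JSON Schema type names.
JSON_TYPES = {bool: "boolean", int: "integer", float: "number", str: "string"}


def minimal_schema(table_name: str, description: str = "", sample_row: Optional[dict] = None) -> dict:
    """Build a schema dict with the minimal structure shown above (hypothetical helper).

    If sample_row is given, each column gets a "properties" entry whose type is
    inferred from its sample value; values of other types are left untyped ({}).
    """
    properties = {}
    for column, value in (sample_row or {}).items():
        # Exact type lookup, so True maps to "boolean" rather than "integer".
        json_type = JSON_TYPES.get(type(value))
        properties[column] = {"type": json_type} if json_type else {}
    return {
        "schema": "http://json-schema.org/draft-07/schema#",
        "id": table_name,
        "description": description,
        "properties": properties,
        "required": [],
        "type": "object",
    }


def write_schema(schemas_dir: Path, schema: dict) -> Path:
    """Write {table_name}.schema.json into schemas_dir and return its path."""
    path = schemas_dir / f"{schema['id']}.schema.json"
    path.write_text(json.dumps(schema, indent=4))
    return path
```

Calling minimal_schema("conditions", "Table derived from Tabular Data File: conditions.csv") reproduces the minimal example above; adding a sample_row moves it a step toward a fully descriptive schema.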

Appendix

Flywheel Data Model

The Flywheel Data Model is a hierarchical structure that organizes data in a Flywheel Project. Its hierarchy is as follows:

- Project (has files and analyses)
  - Subject (has files and analyses)
    - Session (has files and analyses)
      - Acquisition (has files and analyses)
- File
- Analysis

The SQLite snapshot of the Flywheel Data Model has each of the above entities as tables. The tables consist of an id column and a data column. The data column is a binary string containing the JSON representation of each entity.
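A sketch of reading one entity table from the SQLite snapshot with Python's built-in sqlite3 module, assuming the table name passed in matches a table in the snapshot (names such as "subjects" are hypothetical) and that the data column holds JSON encoded as UTF-8 bytes or text. load_entities is a hypothetical helper, not part of fw-dataset.

```python
import json
import sqlite3


def load_entities(conn: sqlite3.Connection, table: str) -> dict[str, dict]:
    """Return {id: decoded JSON} for every row of one entity table (hypothetical helper)."""
    # Table names cannot be bound as SQL parameters, so quote the identifier instead.
    quoted = '"' + table.replace('"', '""') + '"'
    rows = conn.execute(f"SELECT id, data FROM {quoted}")
    # json.loads accepts bytes as well as str, so the binary data column
    # can be decoded without an explicit .decode() call.
    return {row_id: json.loads(data) for row_id, data in rows}
```

Usage might look like conn = sqlite3.connect("snapshot.db") followed by load_entities(conn, "subjects"), where both the file name and the table name are placeholders.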

