Multi-domain scientific dataset fetcher — neuroscience, biology, pharmacology, medical (OpenNeuro, DANDI, PhysioNet, GEO, ChEMBL, ClinicalTrials.gov)

Project description

SciTeX Dataset (scitex-dataset)

Unified access to neuroscience and scientific datasets

PyPI version · Documentation · Tests · License: AGPL-3.0

Full Documentation · pip install scitex-dataset


Interfaces: Python ⭐⭐⭐ (primary) · CLI ⭐ · MCP ⭐⭐ · Skills ⭐⭐ · Hook — · HTTP —

Problem and Solution

  1. Problem: Public dataset repositories are balkanized. OpenNeuro (BIDS), DANDI (NWB), PhysioNet (WFDB), Zenodo (generic), and GEO / ChEMBL / ClinicalTrials each have different APIs, auth, and download tools.
     Solution: Unified fetcher. stx.dataset.neuroscience.openneuro.fetch_all_datasets() has the same call shape across all sources, with local FTS5 search across metadata.
  2. Problem: "Download this BIDS dataset" means reading DataLad docs first; the barrier is tooling, not knowledge.
     Solution: One-line fetch. No DataLad setup; the module handles auth, resumption, and checksums transparently.

Problem

Neuroscience datasets are scattered across multiple repositories -- OpenNeuro, DANDI Archive, PhysioNet, Zenodo -- each with its own API, data format, and query interface. Researchers waste time navigating incompatible APIs to discover relevant data. AI agents lack a unified way to search and evaluate datasets programmatically.

Solution

SciTeX Dataset provides a single Python API, CLI, and MCP (Model Context Protocol) server to discover and query metadata from major scientific data repositories. It focuses on fast metadata retrieval without downloading full datasets.

Repository  Description                                   Data Types
OpenNeuro   Open platform for sharing neuroimaging data   MRI, EEG, MEG, iEEG, PET
DANDI       BRAIN Initiative data archive                 Electrophysiology, Ophys
PhysioNet   Physiological signal databases                ECG, EEG, clinical data
Zenodo      General scientific data repository (CERN)     Any research data

Table 1. Supported data repositories. Each source is queried via its public API; no authentication required for metadata access.
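As a rough illustration of metadata-only retrieval with a max_datasets cap, the sketch below walks a paginated listing and stops as soon as enough records have been collected. It is not the package's implementation: iter_metadata, fake_fetch_page, and the (records, next_cursor) page shape are hypothetical names and assumptions, and the "API" is an in-memory stand-in so the example runs offline.

```python
from typing import Callable, Iterator, Optional

# Assumed page shape: (records, next_cursor); next_cursor is None on the last page.
Page = tuple[list[dict], Optional[str]]


def iter_metadata(
    fetch_page: Callable[[Optional[str]], Page],
    max_datasets: Optional[int] = None,
) -> Iterator[dict]:
    """Yield metadata records page by page, stopping at max_datasets."""
    cursor: Optional[str] = None
    yielded = 0
    while max_datasets is None or yielded < max_datasets:
        records, cursor = fetch_page(cursor)
        for record in records:
            yield record
            yielded += 1
            if max_datasets is not None and yielded >= max_datasets:
                return  # cap reached: do not request further pages
        if cursor is None:
            return  # last page consumed


# Hypothetical stand-in for a repository API: three pages of two records each.
_FAKE_PAGES = {
    None: ([{"id": "ds-1"}, {"id": "ds-2"}], "p2"),
    "p2": ([{"id": "ds-3"}, {"id": "ds-4"}], "p3"),
    "p3": ([{"id": "ds-5"}, {"id": "ds-6"}], None),
}


def fake_fetch_page(cursor: Optional[str]) -> Page:
    return _FAKE_PAGES[cursor]


first_three = list(iter_metadata(fake_fetch_page, max_datasets=3))
print([r["id"] for r in first_three])  # ['ds-1', 'ds-2', 'ds-3']
```

Passing the page fetcher in as a callable keeps the loop independent of any one repository's pagination scheme, which is one plausible way to get the same call shape across sources.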

Installation

Requires Python >= 3.10.

pip install scitex-dataset

MCP support: pip install scitex-dataset[mcp]

Quick Start

from scitex_dataset import fetch_all_datasets, format_dataset

# Fetch datasets from OpenNeuro
datasets = fetch_all_datasets(max_datasets=10)

# Format for analysis
for ds in datasets:
    formatted = format_dataset(ds)
    print(f"{formatted['id']}: {formatted['name']} ({formatted['n_subjects']} subjects)")
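To make the formatting step concrete, here is a minimal sketch of flattening a raw metadata record into the id / name / n_subjects keys used above. The raw record layout (a nested "summary" with a "subjects" list) and the helper name format_record are assumptions for illustration; the actual format_dataset may read different fields.

```python
def format_record(raw: dict) -> dict:
    """Flatten one raw metadata record into id / name / n_subjects."""
    summary = raw.get("summary") or {}       # assumed nesting, may be absent
    subjects = summary.get("subjects") or []  # assumed list of subject labels
    return {
        "id": raw.get("id", ""),
        "name": raw.get("name") or "(untitled)",
        "n_subjects": len(subjects),
    }


# Hypothetical raw record, not real repository output.
raw = {
    "id": "ds-example",
    "name": "Example EEG study",
    "summary": {"subjects": ["01", "02", "03"]},
}
formatted = format_record(raw)
print(f"{formatted['id']}: {formatted['name']} ({formatted['n_subjects']} subjects)")
# ds-example: Example EEG study (3 subjects)
```

Defaulting missing fields instead of raising keeps a batch of heterogeneous records printable even when some entries are incomplete.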

Four Interfaces

Python API

from scitex_dataset import fetch_all_datasets, format_dataset, search_datasets, sort_datasets
from scitex_dataset import neuroscience, database

# Fetch from specific sources
datasets = fetch_all_datasets(max_datasets=100)                    # OpenNeuro
dandi_ds = neuroscience.dandi.fetch_all_datasets(max_datasets=50)  # DANDI
phys_ds = neuroscience.physionet.fetch_all_datasets()              # PhysioNet

# Search and filter
eeg_datasets = search_datasets(datasets, modality="eeg", min_subjects=20)
popular = sort_datasets(datasets, by="downloads", descending=True)

# Local database for fast full-text search
database.build()                                        # index all sources
results = database.search("alzheimer EEG", min_subjects=20)
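The search-and-sort calls above amount to filtering and ordering a list of metadata records in memory. The sketch below shows that idea on plain dictionaries; filter_records, sort_records, and the "modalities" / "downloads" field names are hypothetical and may not match what search_datasets and sort_datasets do internally.

```python
from typing import Optional


def filter_records(
    records: list[dict],
    modality: Optional[str] = None,
    min_subjects: int = 0,
) -> list[dict]:
    """Keep records that list the modality and have enough subjects."""
    wanted = modality.lower() if modality else None
    kept = []
    for rec in records:
        modalities = [m.lower() for m in rec.get("modalities", [])]
        if wanted is not None and wanted not in modalities:
            continue
        if rec.get("n_subjects", 0) < min_subjects:
            continue
        kept.append(rec)
    return kept


def sort_records(records: list[dict], by: str, descending: bool = False) -> list[dict]:
    """Return a new list ordered by one numeric field (missing values count as 0)."""
    return sorted(records, key=lambda rec: rec.get(by, 0), reverse=descending)


# Hypothetical records, already flattened.
records = [
    {"id": "ds-a", "modalities": ["EEG"], "n_subjects": 36, "downloads": 120},
    {"id": "ds-b", "modalities": ["MRI"], "n_subjects": 80, "downloads": 900},
    {"id": "ds-c", "modalities": ["EEG", "MEG"], "n_subjects": 12, "downloads": 40},
]

eeg = filter_records(records, modality="eeg", min_subjects=20)
popular = sort_records(records, by="downloads", descending=True)
print([r["id"] for r in eeg])      # ['ds-a']
print([r["id"] for r in popular])  # ['ds-b', 'ds-a', 'ds-c']
```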

Full API reference

CLI Commands

scitex-dataset --help-recursive             # Show all commands

# Fetch from repositories
scitex-dataset openneuro -n 100 -o datasets.json -v
scitex-dataset dandi -n 50 -o dandi.json -v
scitex-dataset physionet -n 50 -v
scitex-dataset zenodo -q "neuroscience" -n 20

# Local database
scitex-dataset db build                     # index all sources
scitex-dataset db search "epilepsy EEG"     # full-text search
scitex-dataset db stats                     # show statistics

# Introspection
scitex-dataset list-python-apis -v          # list Python API tree
scitex-dataset mcp list-tools -v            # list MCP tools

Full CLI reference
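The db build and db search commands describe a local full-text index over metadata. As a sketch of how such an index could work, the example below uses SQLite's FTS5 extension through the standard sqlite3 module. The table layout, column names, and helper functions are assumptions rather than the package's actual schema, and the example needs a Python build whose SQLite includes FTS5 (most do).

```python
import sqlite3


def build_index(records: list[dict]) -> sqlite3.Connection:
    """Create an in-memory FTS5 table and load metadata records into it."""
    conn = sqlite3.connect(":memory:")
    # UNINDEXED columns are stored but excluded from full-text matching.
    conn.execute(
        "CREATE VIRTUAL TABLE datasets USING fts5("
        "id UNINDEXED, source UNINDEXED, n_subjects UNINDEXED, name, description)"
    )
    conn.executemany(
        "INSERT INTO datasets (id, source, n_subjects, name, description) "
        "VALUES (:id, :source, :n_subjects, :name, :description)",
        records,
    )
    conn.commit()
    return conn


def search(conn: sqlite3.Connection, query: str, min_subjects: int = 0) -> list[str]:
    """Return ids matching every query term, best match first."""
    # FTS5 treats space-separated bare words as an implicit AND.
    # Queries containing punctuation would need quoting before use.
    rows = conn.execute(
        "SELECT id FROM datasets "
        "WHERE datasets MATCH ? AND CAST(n_subjects AS INTEGER) >= ? "
        "ORDER BY rank",
        (query, min_subjects),
    )
    return [row[0] for row in rows]


# Hypothetical records from two sources.
records = [
    {"id": "ds-a", "source": "openneuro", "n_subjects": 36,
     "name": "Alzheimer EEG cohort", "description": "Resting-state recordings"},
    {"id": "ds-b", "source": "physionet", "n_subjects": 8,
     "name": "Alzheimer EEG pilot", "description": "Small pilot recording"},
    {"id": "ds-c", "source": "openneuro", "n_subjects": 50,
     "name": "Motor imagery MEG", "description": "Healthy volunteers"},
]

conn = build_index(records)
print(search(conn, "alzheimer EEG", min_subjects=20))  # ['ds-a']
```

Keeping the numeric field as a stored, unindexed column lets one SQL statement combine the text match with the subject-count filter.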

MCP Server -- for AI Agents

AI agents can discover and query neuroscience datasets autonomously.

Tool                     Description
dataset_openneuro_fetch  Fetch datasets from OpenNeuro
dataset_dandi_fetch      Fetch datasets from DANDI Archive
dataset_physionet_fetch  Fetch datasets from PhysioNet
dataset_zenodo_fetch     Fetch datasets from Zenodo
dataset_search           Filter datasets by modality, subjects, etc.
dataset_list_sources     List available data repositories
dataset_db_build         Build local search database
dataset_db_search        Full-text search across all sources
dataset_db_stats         Database statistics

Table 2. Nine MCP tools available for AI-assisted dataset discovery. All tools accept JSON parameters and return JSON results.

scitex-dataset mcp start

Full MCP specification
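MCP clients invoke tools with JSON-RPC 2.0 messages using the tools/call method. The sketch below builds such a request for one of the tools listed in Table 2. The tool name comes from the table; the "query" argument name is an assumption, since the parameter schema is not shown here, and build_tool_call is a hypothetical helper. In practice an MCP client library would construct and send this message for you.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id per call


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# "query" is an assumed parameter name for dataset_db_search.
message = build_tool_call("dataset_db_search", {"query": "epilepsy EEG"})
print(message)
```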

Skills -- for AI Agent Discovery

Skills provide workflow-oriented guides that AI agents query to discover capabilities and usage patterns.

scitex-dataset skills list              # List available skill pages
scitex-dataset skills get SKILL         # Show main skill page
scitex-dev skills export --package scitex-dataset  # Export to Claude Code

Skill          Content
quick-start    Basic usage
data-sources   OpenNeuro, DANDI, PhysioNet
cli-reference  CLI commands
mcp-tools      MCP tools for AI agents

Part of SciTeX

SciTeX Dataset is part of SciTeX. When used inside the SciTeX framework, dataset discovery integrates with reproducible research sessions:

import scitex
from scitex_dataset import fetch_all_datasets, format_dataset

@scitex.session
def main(logger=scitex.INJECTED):
    datasets = fetch_all_datasets(max_datasets=100, logger=logger)
    formatted = [format_dataset(ds) for ds in datasets]
    scitex.io.save(formatted, "openneuro_datasets.json")
    return 0

The SciTeX ecosystem follows the Four Freedoms for Research, inspired by the Free Software Definition:

Four Freedoms for Research

  1. The freedom to run your research anywhere -- your machine, your terms.
  2. The freedom to study how every step works -- from raw data to final manuscript.
  3. The freedom to redistribute your workflows, not just your papers.
  4. The freedom to modify any module and share improvements with the community.

AGPL-3.0 -- because we believe research infrastructure deserves the same freedoms as the software it runs on.



Download files

Source Distribution

scitex_dataset-0.3.3.tar.gz (432.6 kB)

  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.0rc1

Algorithm    Hash digest
SHA256       284f060e1d6f1591ab8d60464ed43a519077a3f092a48ccdd4c7a300f8fa96d5
MD5          fe0970f64634640ce21c8dce1338a4e6
BLAKE2b-256  79375e75532f79566aa7f908ebf4fb3e3d5b63c3b9131e814f95f80b4a729e2a

Built Distribution

scitex_dataset-0.3.3-py3-none-any.whl (66.3 kB)

  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.0rc1

Algorithm    Hash digest
SHA256       03cb707df3abdd6fbb5b89814d3365c168cc48752313190a7fb4b446c5223c76
MD5          e7c5ef31ff10a1f0cb2aa54f58dda455
BLAKE2b-256  04999dc1e006b46a5f3660f5218e0503be0a7dfcb1357f4b26fe1fc31d345d68
