Multi-domain scientific dataset fetcher — neuroscience, biology, pharmacology, medical (OpenNeuro, DANDI, PhysioNet, GEO, ChEMBL, ClinicalTrials.gov)
SciTeX Dataset (scitex-dataset)
Unified access to neuroscience and scientific datasets
Full Documentation · pip install scitex-dataset
Problem
Neuroscience datasets are scattered across multiple repositories -- OpenNeuro, DANDI Archive, PhysioNet, Zenodo -- each with its own API, data format, and query interface. Researchers waste time navigating incompatible APIs to discover relevant data. AI agents lack a unified way to search and evaluate datasets programmatically.
Solution
SciTeX Dataset provides a single Python API, CLI, and MCP (Model Context Protocol) server to discover and query metadata from major scientific data repositories. It focuses on fast metadata retrieval without downloading full datasets.
| Repository | Description | Data Types |
|---|---|---|
| OpenNeuro | Open platform for sharing neuroimaging data | MRI, EEG, MEG, iEEG, PET |
| DANDI | BRAIN Initiative data archive | Electrophysiology, Ophys |
| PhysioNet | Physiological signal databases | ECG, EEG, clinical data |
| Zenodo | General scientific data repository (CERN) | Any research data |
Table 1. Supported data repositories. Each source is queried via its public API; no authentication is required for metadata access.
Installation
Requires Python >= 3.10.
pip install scitex-dataset
MCP support:
pip install scitex-dataset[mcp]
Quick Start
from scitex_dataset import fetch_all_datasets, format_dataset
# Fetch datasets from OpenNeuro
datasets = fetch_all_datasets(max_datasets=10)
# Format for analysis
for ds in datasets:
    formatted = format_dataset(ds)
    print(f"{formatted['id']}: {formatted['name']} ({formatted['n_subjects']} subjects)")
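Once records are formatted, they are plain dictionaries and can be handed to any standard tool. The sketch below writes them to CSV text and totals the subject counts. Only the keys `id`, `name`, and `n_subjects` come from the Quick Start above; the sample records, the `to_csv` helper, and the possibility of a missing `n_subjects` value are assumptions made for illustration.

```python
import csv
import io

# Hypothetical records shaped like format_dataset() output. Only the keys
# 'id', 'name', and 'n_subjects' appear in the Quick Start; values are made up.
records = [
    {"id": "ds000001", "name": "Example A", "n_subjects": 16},
    {"id": "ds000002", "name": "Example B", "n_subjects": None},
    {"id": "ds000003", "name": "Example C", "n_subjects": 42},
]


def to_csv(rows, fields=("id", "name", "n_subjects")):
    """Serialize records to CSV text, writing empty cells for missing values."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow({k: ("" if row.get(k) is None else row[k]) for k in fields})
    return buf.getvalue()


csv_text = to_csv(records)

# Treat a missing subject count as zero when totalling.
total_subjects = sum(r["n_subjects"] or 0 for r in records)
print(csv_text)
print(f"Total subjects: {total_subjects}")
```

In real use, `records` would be replaced by `[format_dataset(ds) for ds in datasets]`.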
Four Interfaces
Python API
from scitex_dataset import fetch_all_datasets, format_dataset, search_datasets, sort_datasets
from scitex_dataset import neuroscience, database
# Fetch from specific sources
datasets = fetch_all_datasets(max_datasets=100) # OpenNeuro
dandi_ds = neuroscience.dandi.fetch_all_datasets(max_datasets=50) # DANDI
phys_ds = neuroscience.physionet.fetch_all_datasets() # PhysioNet
# Search and filter
eeg_datasets = search_datasets(datasets, modality="eeg", min_subjects=20)
popular = sort_datasets(datasets, by="downloads", descending=True)
# Local database for fast full-text search
database.build() # index all sources
results = database.search("alzheimer EEG", min_subjects=20)
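To make the filtering and sorting semantics concrete, here is a pure-Python sketch of what a modality and subject-count filter followed by a sort could look like. It is not the implementation of `search_datasets` or `sort_datasets`. The keyword names (`modality`, `min_subjects`, `by`, `descending`) mirror the calls above, while the record fields `modalities` and `downloads`, the helper names, and the choice to sort missing values last are assumptions.

```python
# Hypothetical metadata records; the 'modalities' and 'downloads' field
# names are assumptions, not the package's documented schema.
records = [
    {"id": "a", "modalities": ["eeg"], "n_subjects": 25, "downloads": 120},
    {"id": "b", "modalities": ["mri"], "n_subjects": 40, "downloads": 900},
    {"id": "c", "modalities": ["eeg", "meg"], "n_subjects": 12, "downloads": 300},
    {"id": "d", "modalities": ["EEG"], "n_subjects": 60, "downloads": None},
]


def filter_records(rows, modality=None, min_subjects=0):
    """Keep rows that list the modality (case-insensitive) and have enough subjects."""
    wanted = modality.lower() if modality else None
    kept = []
    for row in rows:
        mods = {m.lower() for m in row.get("modalities") or []}
        if wanted and wanted not in mods:
            continue
        if (row.get("n_subjects") or 0) < min_subjects:
            continue
        kept.append(row)
    return kept


def sort_records(rows, by, descending=False):
    """Sort rows by a key; rows where the key is missing go last in either direction."""
    present = [r for r in rows if r.get(by) is not None]
    missing = [r for r in rows if r.get(by) is None]
    return sorted(present, key=lambda r: r[by], reverse=descending) + missing


eeg = filter_records(records, modality="eeg", min_subjects=20)
popular = sort_records(records, by="downloads", descending=True)
print([r["id"] for r in eeg])
print([r["id"] for r in popular])
```

Placing records with missing values last avoids a `TypeError` when comparing `None` with integers; whether the library does the same is not stated here.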
CLI Commands
scitex-dataset --help-recursive # Show all commands
# Fetch from repositories
scitex-dataset openneuro -n 100 -o datasets.json -v
scitex-dataset dandi -n 50 -o dandi.json -v
scitex-dataset physionet -n 50 -v
scitex-dataset zenodo -q "neuroscience" -n 20
# Local database
scitex-dataset db build # index all sources
scitex-dataset db search "epilepsy EEG" # full-text search
scitex-dataset db stats # show statistics
# Introspection
scitex-dataset list-python-apis -v # list Python API tree
scitex-dataset mcp list-tools -v # list MCP tools
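The `db build` and `db search` commands describe an index-then-query workflow. As an illustration of that general idea only, the sketch below builds a toy in-memory inverted index and answers a multi-word query with AND semantics. How scitex-dataset actually stores and searches its local database is not described here, so the function names, the tokenizer, and the sample descriptions are all assumptions.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query token (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical dataset descriptions, made up for this example.
docs = {
    "ds1": "Resting-state EEG in epilepsy patients",
    "ds2": "Task fMRI of working memory",
    "ds3": "Intracranial EEG during sleep",
}
index = build_index(docs)
hits = search(index, "epilepsy EEG")
print(sorted(hits))
```

Building the index once and querying it many times is what makes a local search faster than calling each repository's API per query.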
MCP Server -- for AI Agents
AI agents can discover and query neuroscience datasets autonomously.
| Tool | Description |
|---|---|
| dataset_openneuro_fetch | Fetch datasets from OpenNeuro |
| dataset_dandi_fetch | Fetch datasets from DANDI Archive |
| dataset_physionet_fetch | Fetch datasets from PhysioNet |
| dataset_zenodo_fetch | Fetch datasets from Zenodo |
| dataset_search | Filter datasets by modality, subjects, etc. |
| dataset_list_sources | List available data repositories |
| dataset_db_build | Build local search database |
| dataset_db_search | Full-text search across all sources |
| dataset_db_stats | Database statistics |
Table 2. Nine MCP tools available for AI-assisted dataset discovery. All tools accept JSON parameters and return JSON results.
scitex-dataset mcp start
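Since the tools take JSON parameters, it can help to see the shape of a request. MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` message, and the sketch below builds one for `dataset_db_search`. The tool name comes from Table 2; the argument names `query` and `min_subjects` and the `make_tool_call` helper are assumptions, so check the real schema with `scitex-dataset mcp list-tools -v`.

```python
import json


def make_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Argument names here are assumed for illustration, not taken from the tool schema.
request = make_tool_call(
    1, "dataset_db_search", {"query": "epilepsy EEG", "min_subjects": 20}
)
payload = json.dumps(request)
print(payload)
```

In practice an MCP client library or an agent host sends this message for you; the sketch only shows what travels over the wire.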
Skills -- for AI Agent Discovery
Skills provide workflow-oriented guides that AI agents query to discover capabilities and usage patterns.
scitex-dataset skills list # List available skill pages
scitex-dataset skills get SKILL # Show main skill page
scitex-dev skills export --package scitex-dataset # Export to Claude Code
| Skill | Content |
|---|---|
| quick-start | Basic usage |
| data-sources | OpenNeuro, DANDI, PhysioNet |
| cli-reference | CLI commands |
| mcp-tools | MCP tools for AI agents |
Part of SciTeX
SciTeX Dataset is part of SciTeX. When used inside the SciTeX framework, dataset discovery integrates with reproducible research sessions:
import scitex
from scitex_dataset import fetch_all_datasets, format_dataset
@scitex.session
def main(logger=scitex.INJECTED):
    datasets = fetch_all_datasets(max_datasets=100, logger=logger)
    formatted = [format_dataset(ds) for ds in datasets]
    scitex.io.save(formatted, "openneuro_datasets.json")
    return 0
The SciTeX ecosystem follows the Four Freedoms for Research, inspired by the Free Software Definition:
Four Freedoms for Research
- The freedom to run your research anywhere -- your machine, your terms.
- The freedom to study how every step works -- from raw data to final manuscript.
- The freedom to redistribute your workflows, not just your papers.
- The freedom to modify any module and share improvements with the community.
AGPL-3.0 -- because we believe research infrastructure deserves the same freedoms as the software it runs on.