Universal scientific data I/O with plugin registry
SciTeX IO (scitex-io)
Full Documentation · uv pip install scitex-io[all]
Problem and Solution
| # | Problem | Solution |
|---|---|---|
| 1 | Format zoo — every format has its own API (pd.read_csv, np.load, pickle, h5py, torch.save, …). | One call — sio.save(obj, "x.ext") / sio.load("x.ext") dispatches across 30+ formats. |
| 2 | Outputs scatter — saves land relative to cwd, drift away from the script that produced them. | Caller-anchored paths — sio.save(df, "results.csv") writes to {caller}_out/results.csv. |
| 3 | Magic numbers everywhere — hyperparameters, paths, thresholds duplicated across scripts. | Centralized config — load_configs() merges every config/*.yaml into one DotDict. |
Quick Start
```python
import scitex_io as sio
import pandas as pd
import numpy as np

# Demo Data
df_orig = pd.DataFrame({"x": [1, 2, 3]})
arr_orig = np.array([1, 2, 3])
params_orig = {"lr": 1e-3, "epochs": 10}

# Unified Saving API
sio.save(df_orig, "data.csv")
sio.save(arr_orig, "data.npy")
sio.save(params_orig, "config.yaml")

# Unified Loading API
df_loaded = sio.load("data.csv")
arr_loaded = sio.load("data.npy")
params_loaded = sio.load("config.yaml")

# Round-trip check
assert df_loaded.equals(df_orig)
assert np.array_equal(arr_loaded, arr_orig)
assert params_loaded == params_orig
```
Supported Formats (30+) and Customization
| Category | Extensions |
|---|---|
| Spreadsheet | .csv, .tsv, .xlsx, .xls, .xlsm, .xlsb |
| Columnar | .parquet, .feather |
| Scientific | .npy, .npz, .mat, .hdf5, .h5, .zarr |
| Serialization | .pkl, .pickle, .pkl.gz, .joblib |
| ML/DL | .pth, .pt, .cbm |
| Config | .json, .yaml, .yml, .xml |
| Database | .db (SQLite3) |
| Documents | .txt, .md, .pdf, .docx, .tex, .log |
| Code | .py, .sh, .css, .js |
| Images | .png, .jpg, .jpeg, .gif, .tiff, .tif, .svg |
| Media | .mp4 |
| Web | .html |
| Bibliography | .bib |
| EEG | .vhdr, .vmrk, .edf, .bdf, .gdf, .cnt, .egi, .eeg, .set, .con |
Need a format not listed above? Register a custom handler with
register_saver / register_loader and sio.save() / sio.load()
will dispatch to it by extension just like a built-in.
```python
import scitex_io as sio
from scitex_io import register_saver, register_loader

@register_saver(".custom")
def save_custom(obj, path, **kw):
    # Write the object's string form; `with` closes the file handle.
    with open(path, "w") as f:
        f.write(str(obj))

@register_loader(".custom")
def load_custom(path, **kw):
    with open(path) as f:
        return f.read()

sio.save("hello", "data.custom")
assert sio.load("data.custom") == "hello"
```
Installation
```bash
uv pip install "scitex-io[all]"
```
Per-module extras
| Extra | Pulls in |
|---|---|
| scientific | scipy, h5py, zarr, numcodecs, matplotlib (HDF5 / zarr / scientific I/O) |
| mcp | fastmcp (MCP server for agents) |
| all | scientific + mcp (recommended) |
| dev | pytest, pytest-cov, plotly, Pillow, plus every optional dependency so the test suite runs |
| docs | Sphinx + RTD theme + myst-parser (docs build only) |
```bash
uv pip install "scitex-io[scientific]"  # HDF5 / zarr / parquet stack
uv pip install "scitex-io[mcp]"         # MCP server only
uv pip install -e ".[dev]"              # editable install for contributors
```
How it works
1. Format detection by extension
save() / load() pick the right reader/writer from the file
extension via a plugin registry, much as an operating system picks an
application for a file. Custom handlers can be added with
register_saver / register_loader.
```mermaid
%%{init: {'flowchart': {'nodeSpacing': 20, 'rankSpacing': 40, 'curve': 'linear'}, 'themeVariables': {'fontSize': '12px'}}}%%
flowchart LR
    A["save(obj, 'x.ext')"] --> B{Registry}
    L["load('x.ext')"] --> B
    B -->|.csv .parquet .feather| C[pandas]
    B -->|.npy .npz| D[numpy]
    B -->|.h5 .zarr| E[h5py / zarr]
    B -->|.pkl .joblib| F[pickle / joblib]
    B -->|.pt .pth| G[torch]
    B -->|.png .jpg .svg| H[Pillow]
    B -->|.yaml .json| I[PyYAML / json]
    B -->|.bib .pdf .docx ...| J[30+ handlers]
    B -.->|register_*| K[Custom format]
```
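The dispatch step can be pictured as a lookup table keyed by extension. The sketch below is a toy Registry class written for illustration only; it is not scitex-io's internal code, and the longest-suffix-first rule (so that a compound extension such as .pkl.gz beats a plain .gz) is an assumption about how compound extensions might be resolved.

```python
from typing import Any, Callable, Dict


class Registry:
    """Toy extension-keyed handler registry (illustration, not scitex_io internals)."""

    def __init__(self) -> None:
        self.savers: Dict[str, Callable[..., None]] = {}
        self.loaders: Dict[str, Callable[..., Any]] = {}

    def saver(self, ext: str) -> Callable:
        def deco(fn: Callable) -> Callable:
            self.savers[ext.lower()] = fn
            return fn
        return deco

    def loader(self, ext: str) -> Callable:
        def deco(fn: Callable) -> Callable:
            self.loaders[ext.lower()] = fn
            return fn
        return deco

    @staticmethod
    def _match(path: str, table: Dict[str, Callable]) -> Callable:
        name = path.lower()
        # Longest extension first, so ".pkl.gz" wins over a plain ".gz" handler.
        for ext in sorted(table, key=len, reverse=True):
            if name.endswith(ext):
                return table[ext]
        raise ValueError(f"no handler registered for {path!r}")

    def save(self, obj: Any, path: str, **kw: Any) -> None:
        self._match(path, self.savers)(obj, path, **kw)

    def load(self, path: str, **kw: Any) -> Any:
        return self._match(path, self.loaders)(path, **kw)
```

With a ".gz" loader and a ".pkl.gz" loader both registered, reg.load("model.pkl.gz") routes to the ".pkl.gz" handler, while an unregistered extension raises ValueError instead of guessing.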
2. save(obj, out.ext) in /path/to/script.py → /path/to/script_out/out.ext
Relative paths passed to save() resolve against the calling script or
notebook, not the current working directory, so every output stays next to the script that produced it.
```
/path/to/project/
├── config/                   # see §3
│   └── ...
└── scripts/
    └── xxx/
        ├── filename.py       # sio.save(df, "results.csv")
        └── filename_out/     # auto-created sibling of the script
            └── results.csv   # output lands here
```
| Caller | sio.save(df, "sub/dir/results.csv") writes to |
|---|---|
| /path/to/analysis.py (script) | /path/to/analysis_out/sub/dir/results.csv |
| /path/to/exp.ipynb (notebook) | /path/to/exp_out/sub/dir/results.csv |
| python -i / IPython / REPL | ~/.scitex/io/runtime/cache/sub/dir/results.csv |
- Bare filename or any relative path: "results.csv", "sub/dir/results.csv", and "./sub/dir/results.csv" all work; the whole relative path is appended under the caller's output anchor.
- Intermediate directories are created automatically, so no os.makedirs() / Path.mkdir() calls are needed on the caller side.
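A minimal sketch of the routing rule, assuming the caller's file name is read from the Python call stack. resolve_output_path is a hypothetical helper, not part of scitex-io; it covers the script, REPL, and absolute-path cases from the table and text above, but not notebooks, whose cells do not reliably expose the .ipynb file name through the stack.

```python
import inspect
import os
from pathlib import Path
from typing import Optional

# Fallback used when there is no script file on disk (REPL), per the table above.
REPL_CACHE = Path(os.path.expanduser("~/.scitex/io/runtime/cache"))


def resolve_output_path(path: str, caller_file: Optional[str] = None) -> Path:
    """Map a relative save path to {caller}_out/<path>; absolute paths pass through."""
    p = Path(path)  # pathlib drops a leading "./", so "./a/b" and "a/b" behave the same
    if p.is_absolute():
        return p  # absolute paths bypass auto-routing
    if caller_file is None:
        caller_file = inspect.stack()[1].filename  # file of the function that called us
    if caller_file.startswith("<"):  # e.g. "<stdin>" or "<string>": no script on disk
        return REPL_CACHE / p
    script = Path(caller_file)
    return script.with_name(script.stem + "_out") / p
```

Called from /path/to/analysis.py, resolve_output_path("sub/dir/results.csv") would return /path/to/analysis_out/sub/dir/results.csv; a saver would then call target.parent.mkdir(parents=True, exist_ok=True) before writing.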
Advanced save() — absolute paths, symlinks, dry-run
Absolute paths bypass auto-routing.
sio.save(df, "/data/x.csv") writes to /data/x.csv as-is; caller-anchored routing (§2) applies only when the path is relative.
```python
sio.save(df, "/data/x.csv")  # absolute → used as-is
```
Symlinks and dry-run.
symlink_from_cwd=True drops a symlink at ./results.csv pointing into the auto-routed location; symlink_to=… plants a symlink at a custom path; dry_run=True prints the resolved path without writing.
```python
sio.save(df, "results.csv", symlink_from_cwd=True)
sio.save(fig, "fig1.png", symlink_to="/data/latest/fig1.png")
sio.save(df, "results.csv", use_caller_path=True)  # resolve from caller script
sio.save(df, "results.csv", dry_run=True)          # print path, don't write
```
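To make the option semantics concrete, here is a toy writer for text files that mimics dry_run, symlink_from_cwd, and symlink_to. save_text and its explicit cwd parameter are hypothetical; the real sio.save() handles every registered format and may differ in details such as how it replaces an existing link.

```python
from pathlib import Path
from typing import List, Optional


def save_text(
    text: str,
    target: Path,
    *,
    cwd: Path,
    symlink_from_cwd: bool = False,
    symlink_to: Optional[Path] = None,
    dry_run: bool = False,
) -> Path:
    """Toy writer mimicking the dry_run / symlink options (hypothetical helper)."""
    if dry_run:
        print(f"[dry-run] would write {target}")
        return target  # nothing is created on disk
    target.parent.mkdir(parents=True, exist_ok=True)  # create intermediate dirs
    target.write_text(text)
    links: List[Path] = []
    if symlink_from_cwd:
        links.append(cwd / target.name)  # e.g. ./results.csv
    if symlink_to is not None:
        links.append(symlink_to)
    for link in links:
        if link.resolve() == target.resolve():
            continue  # the link would be the file itself, or already points at it
        link.parent.mkdir(parents=True, exist_ok=True)
        if link.is_symlink() or link.exists():
            link.unlink()  # replace a stale link or file
        link.symlink_to(target.resolve())
    return target
```

Keeping the real file in the routed location and only linking into the working directory means deleting the convenience link never loses data.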
3. Centralized project configuration
Scientific projects benefit from keeping parameters — hyperparameters, paths, thresholds — out of the scripts that consume them, as a single source of truth.
CONFIG = load_configs() collects every YAML under
<project-root>/config/ into one nested DotDict. Parameters are
then accessible as CONFIG.YAML_FILE_NAME.FIELD_NAME.
UPPER_CASE normalisation. YAML filenames and field names are recognised in UPPER_CASE, following Python's convention for constants.
model.yaml with hidden_dim: 256 lands at CONFIG.MODEL.HIDDEN_DIM regardless of source casing.

Conflict handling. When an UPPER/lower pair collides (e.g. MODEL.yaml next to model.yaml, or HIDDEN_DIM next to hidden_dim), the UPPER variant takes priority and a UserWarning is emitted pointing at the conflict.
```
/path/to/project/
├── config/
│   ├── PATHS.yaml        # DATA_DIR: /data/experiment_01
│   ├── PREPROCESS.yaml   # SAMPLE_RATE: 1000, BANDPASS: [0.5, 40]
│   ├── MODEL.yaml        # HIDDEN_DIM: 256, DROPOUT: 0.3
│   └── IS_DEBUG.yaml     # IS_DEBUG: true
└── scripts/
    └── xxx/
        └── filename.py   # CONFIG = sio.load_configs()
                          # CONFIG.MODEL.DROPOUT → 0.3
                          # CONFIG.PREPROCESS.SAMPLE_RATE → 1000
```
```python
CONFIG = sio.load_configs()  # loads ./config/*.yaml
CONFIG.PREPROCESS.SAMPLE_RATE  # 1000

# Debug mode: DEBUG_ prefixed keys override their counterparts
# In MODEL.yaml: { HIDDEN_DIM: 256, DEBUG_HIDDEN_DIM: 32 }
CONFIG = sio.load_configs(IS_DEBUG=True)
CONFIG.MODEL.HIDDEN_DIM  # 32 (debug value promoted)
```
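A sketch of the merge-and-normalise step described above, assuming the YAML files have already been parsed into plain dicts (so it needs no YAML library) and that all keys are strings. build_config, upper_keys, and the small DotDict class are illustrative stand-ins, not scitex-io's implementation; in particular, a colliding UPPER file here replaces its lowercase twin wholesale rather than merging field by field.

```python
import warnings
from pathlib import PurePath
from typing import Any, Dict


class DotDict(dict):
    """Minimal attribute-access dict; a simplified stand-in, not the real class."""

    def __getattr__(self, name: str) -> Any:
        try:
            return self[name]
        except KeyError as exc:
            raise AttributeError(name) from exc


def upper_keys(d: Dict[str, Any], where: str = "") -> DotDict:
    """Upper-case keys recursively; on a case collision the UPPER key wins."""
    out = DotDict()
    # Visit non-UPPER keys first so that an already-UPPER key overwrites them.
    for key in sorted(d, key=lambda k: k == k.upper()):
        up = key.upper()
        value = d[key]
        if isinstance(value, dict):
            value = upper_keys(value, f"{where}{up}.")
        if up in out:
            warnings.warn(f"case conflict at {where}{up}: keeping the UPPER variant", UserWarning)
        out[up] = value
    return out


def build_config(files: Dict[str, Dict[str, Any]]) -> DotDict:
    """files maps a YAML file name to its parsed content, e.g. {"model.yaml": {...}}."""
    return upper_keys({PurePath(name).stem: body for name, body in files.items()})
```

For example, build_config({"model.yaml": {"hidden_dim": 256}}).MODEL.HIDDEN_DIM evaluates to 256, matching the CONFIG.YAML_FILE_NAME.FIELD_NAME access pattern.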
Debug mode for parameters
When debugging or developing, switching to smaller debug values speeds up iteration.
With debug mode on, any DEBUG_* sibling overrides its non-debug counterpart at load time
(e.g. CONFIG.MY.DEBUG_PARAM replaces CONFIG.MY.PARAM), so a single
IS_DEBUG.yaml flips the whole project between production and debug
values.
Equivalent triggers. Any of the following enables debug mode:
- IS_DEBUG.yaml with IS_DEBUG: true
- load_configs(IS_DEBUG=True)
- running with CI=True set in the environment
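The override rule can be sketched as a recursive copy that promotes DEBUG_* values. promote_debug and debug_enabled are hypothetical helpers, and the precedence between the three triggers (explicit argument, then IS_DEBUG.yaml, then the CI environment variable, compared case-insensitively) is an assumption rather than documented behaviour.

```python
import os
from typing import Any, Dict, Optional


def promote_debug(cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy in which every DEBUG_<KEY> value replaces <KEY>, at every depth."""
    out: Dict[str, Any] = {}
    for key, value in cfg.items():
        out[key] = promote_debug(value) if isinstance(value, dict) else value
    for key in cfg:
        if isinstance(key, str) and key.startswith("DEBUG_"):
            out[key[len("DEBUG_"):]] = out[key]  # debug value wins
    return out


def debug_enabled(cfg: Dict[str, Any], explicit: Optional[bool] = None) -> bool:
    """Assumed precedence: explicit argument, then IS_DEBUG.yaml, then the CI variable."""
    if explicit is not None:
        return explicit
    flag = cfg.get("IS_DEBUG")
    if isinstance(flag, dict):  # IS_DEBUG.yaml loaded as {"IS_DEBUG": True}
        flag = flag.get("IS_DEBUG")
    if flag is True:
        return True
    return os.environ.get("CI", "").lower() == "true"
```

In a loader the two would combine as cfg = promote_debug(cfg) if debug_enabled(cfg) else cfg; the input dict is never mutated, so the production values remain available.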
4. Linter for Migration and Hooks
scitex-io ships 14 IO-specific (STX-IO001..014) and 5 path-handling
(STX-PA001..005) lint rules. They are picked up automatically by
scitex-dev's linter,
which is already a hard dependency of scitex-io, so no extra install
is needed.
```bash
scitex-dev linter check-files src/                 # lint a tree
scitex-dev linter list-rules --category io         # show live rule definitions
```
Rule reference (STX-IO001..014 + STX-PA001..005)
| Rule | Severity | Trigger |
|---|---|---|
| STX-IO001 | warning | np.save / savez / savez_compressed / savetxt → use sio.save() |
| STX-IO002 | warning | np.load / loadtxt / genfromtxt → use sio.load() |
| STX-IO003 | warning | pd.read_csv / parquet / excel / hdf / pickle / json / feather / orc / table → use sio.load() |
| STX-IO004 | warning | df.to_csv / parquet / excel / hdf / pickle / json / feather / html / orc → use sio.save() |
| STX-IO005 | warning | pickle.dump / dumps / load / loads (incl. cPickle) → use sio.save()/load() |
| STX-IO006 | warning | json.dump / dumps / load / loads → use sio.save()/load() |
| STX-IO007 | warning | .savefig(...) → use sio.save(fig, path) for metadata embedding |
| STX-IO008 | warning | torch.save / load → use sio.save()/load() |
| STX-IO009 | warning | joblib.dump / load → use sio.save()/load() |
| STX-IO010 | warning | yaml.dump / safe_dump / dump_all / load / safe_load / full_load → use sio.save()/load() |
| STX-IO011 | warning | scipy.io.savemat / loadmat → use sio.save()/load() |
| STX-IO012 | warning | cv2.imread / imwrite, PIL.Image.open, plt.imsave / imread, imageio.* → use sio.save()/load() |
| STX-IO013 | warning | h5py.File(...) → use sio.save()/load() for HDF5 |
| STX-IO014 | warning | sio.save / load called with an extension that has no registered handler; register one with register_saver/register_loader |
| STX-PA001 | warning | Absolute path passed to sio.* → prefer relative for reproducibility |
| STX-PA002 | warning | open(...) → use sio.save()/load() for auto-logging |
| STX-PA003 | info | os.makedirs / mkdir → sio.save() auto-creates directories |
| STX-PA004 | warning | os.chdir(...) → scripts should run from project root |
| STX-PA005 | info | Relative path missing ./ prefix → use ./file.ext for explicit intent |
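For a feel of how a rule such as STX-IO001 can be detected, here is a toy AST check that flags np.save-style calls. It is not scitex-dev's implementation: it only recognises the literal names np and numpy and does not follow import aliases, which a real linter would need to do.

```python
import ast
from typing import List, Tuple

# Functions that STX-IO001 asks to replace with sio.save().
NP_SAVERS = {"save", "savez", "savez_compressed", "savetxt"}


def check_stx_io001(source: str) -> List[Tuple[int, str, str]]:
    """Return sorted (line, rule, message) tuples for np.save-style calls in source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id in {"np", "numpy"}  # literal names only; aliases not tracked
            and func.attr in NP_SAVERS
        ):
            message = f"{func.value.id}.{func.attr}() -> use sio.save()"
            findings.append((node.lineno, "STX-IO001", message))
    return sorted(findings)  # ast.walk is breadth-first, so sort by line
```

Working on the syntax tree rather than on raw text keeps comments and strings that merely mention np.save from being flagged.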
Claude Code Integration as a Hook
Wire scitex-io's lint rules into Claude Code so every Edit / Write
to a Python file is checked automatically — errors block the turn,
warnings surface as feedback.
Reference implementation:
examples/scitex_io_lint.sh, a self-contained PostToolUse hook (~15 lines) that you can copy into ~/.claude/hooks/post-tool-use/ (or into your project's .claude/hooks/).
1. Install the hook script:
```bash
cp examples/scitex_io_lint.sh ~/.claude/hooks/post-tool-use/
chmod +x ~/.claude/hooks/post-tool-use/scitex_io_lint.sh
```
2. Wire it up: add the following to ~/.claude/settings.json (or
<project>/.claude/settings.json for project-scoped hooks):
```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write|MultiEdit",
        "hooks": [
          { "type": "command",
            "command": "~/.claude/hooks/post-tool-use/scitex_io_lint.sh" }
        ]
      }
    ]
  }
}
```
After that, every time Claude Code edits a .py file, any
STX-IO001..014 / STX-PA001..005 finding is reported back inline (errors
block the turn, warnings surface as feedback), so agents converge on the
canonical sio.save() / sio.load() patterns instead of np.save / pd.read_csv / pickle.dump / ….
5. Other utilities
Glob, parse, cache
```python
paths = sio.glob("data/**/*.csv")              # natural sort: 1, 2, 10
paths = sio.glob("results/{exp1,exp2}/*.npy")  # brace expansion
paths, parsed = sio.parse_glob("sub_{id}/ses_{session}/*.vhdr")
# parsed = [{'id': '001', 'session': 'pre'}, ...]
dfs = sio.load("results/*.csv")                # list of DataFrames
data = sio.load("large.hdf5")                  # 1st call: reads from disk
data = sio.load("large.hdf5")                  # 2nd call: cache hit
```
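The three glob features above (natural sort, brace expansion, {placeholder} parsing) can be sketched with the standard library alone. natural_key, expand_braces, natsorted_glob, and parse_paths are hypothetical helpers, not scitex-io's code; parse_paths matches an in-memory list of paths instead of walking the filesystem, and treats * as matching within a single path segment.

```python
import glob
import re
from typing import Dict, List, Tuple

PLACEHOLDER = r"\{[A-Za-z_]\w*\}"  # {name} with a valid regex group name


def natural_key(text: str) -> list:
    """Split digit runs out as ints so "f2" sorts before "f10"."""
    parts = re.split(r"(\d+)", text)  # odd indices are always the digit runs
    return [int(tok) if i % 2 else tok.lower() for i, tok in enumerate(parts)]


def expand_braces(pattern: str) -> List[str]:
    """Expand {a,b} groups (those containing a comma); {name} placeholders are kept."""
    m = re.search(r"\{([^{}]*,[^{}]*)\}", pattern)
    if m is None:
        return [pattern]
    head, tail = pattern[: m.start()], pattern[m.end():]
    expanded: List[str] = []
    for alt in m.group(1).split(","):
        expanded.extend(expand_braces(head + alt + tail))
    return expanded


def natsorted_glob(pattern: str) -> List[str]:
    """Glob every brace alternative and return the union in natural order."""
    hits = set()
    for pat in expand_braces(pattern):
        hits.update(glob.glob(pat, recursive=True))
    return sorted(hits, key=natural_key)


def parse_paths(pattern: str, paths: List[str]) -> Tuple[List[str], List[Dict[str, str]]]:
    """Match paths against a pattern with {name} placeholders and * wildcards."""
    regex = ""
    for part in re.split("(" + PLACEHOLDER + r"|\*)", pattern):
        if part == "*":
            regex += r"[^/]*"
        elif re.fullmatch(PLACEHOLDER, part):
            regex += f"(?P<{part[1:-1]}>[^/]+)"
        else:
            regex += re.escape(part)
    compiled = re.compile(regex + r"\Z")
    matched, parsed = [], []
    for path in sorted(paths, key=natural_key):
        m = compiled.match(path)
        if m:
            matched.append(path)
            parsed.append(m.groupdict())
    return matched, parsed
```

Braces with a comma are treated as alternatives and braces around a single identifier as named captures, which is how one pattern syntax can serve both sio.glob() and sio.parse_glob() in the calls above.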
Embed provenance into figures (embed_metadata)
```python
sio.embed_metadata("figure.png", {
    "experiment": "exp_042", "model": "resnet50",
    "accuracy": 0.94, "timestamp": "2026-03-11",
})
meta = sio.read_metadata("figure.png")
meta["experiment"]  # "exp_042"
```
Supports PNG (tEXt), JPEG (EXIF), SVG (XML metadata), PDF (XMP).
Four Interfaces
Python API ⭐⭐⭐
```python
from scitex_io import save, load, list_formats, register_saver, register_loader
from scitex_io import load_configs, DotDict
from scitex_io import embed_metadata, read_metadata, has_metadata

save(obj, "path.ext")          # Save any object
data = load("path.ext")        # Load any file
fmts = list_formats()          # Show all registered formats
cfg = load_configs()           # Load ./config/*.yaml as DotDict
embed_metadata("fig.png", d)   # Embed a provenance dict d into a figure
```
CLI Commands ⭐
```bash
scitex-io --help-recursive           # Show all commands
scitex-io info                       # Show registered formats
scitex-io configs                    # Load and display project configs
scitex-io configs -d ./my_configs    # Custom config directory
scitex-io configs --json             # Output as JSON
scitex-io list-python-apis -vv       # List Python APIs with signatures
scitex-io --version                  # Show version
scitex-io mcp start                  # Start MCP server
scitex-io mcp doctor                 # Check MCP health
scitex-io mcp list-tools -vv         # List MCP tools with parameters
```
MCP Server ⭐
AI agents can save, load, and discover formats autonomously.
| Tool | Description |
|---|---|
| io_list_formats | List all registered save/load formats |
| io_load / io_save | Load / save data in any supported format |
| io_load_configs | Load YAML project configurations |
| io_register_info | Show how to register custom formats |
| io_glob / io_parse_glob | Natural-sort globbing with {placeholder} parsing |
| io_get_loader / io_get_saver | Look up the registered handler for an extension |
| io_read_metadata / io_has_metadata / io_embed_metadata | Image provenance metadata |
| io_get_cache_info / io_clear_load_cache / io_configure_cache | Load-cache management |
| io_explore_h5 / io_explore_zarr | Print group/dataset trees |
| io_has_h5_key / io_has_zarr_key | Cheap existence checks |
| io_json2md | Render JSON as Markdown |
| io_skills_list / io_skills_get | Discover and fetch skill pages |
```bash
scitex-io mcp start
```
Skills ⭐⭐
Skills provide structured documentation that AI agents can query to discover package capabilities, API signatures, and usage patterns.
```bash
scitex-io skills list                    # List available skill pages
scitex-io skills get save-and-load       # Get detailed save/load documentation
scitex-io skills get glob                # Get glob/parse_glob patterns
scitex-io skills get supported-formats   # Get all format tables
```
| Skill | Content |
|---|---|
| save-and-load | Core API, path routing, symlinks, use_caller_path |
| centralized-config | load_configs(), DotDict, DEBUG_ override |
| metadata-embedding | Provenance in PNG/JPEG/SVG/PDF |
| cache | Load caching, reload, flush |
| glob | Pattern matching with natural sort and parsing |
| linting-rules | STX-IO001–007 lint rules |
| supported-formats | All 30+ format tables |
| path-resolution | Auto save-path routing, scitex.path utilities |
Also available via MCP: io_skills_list() / io_skills_get(name).
Part of SciTeX
scitex-io is part of SciTeX. Install it through
the umbrella package with pip install "scitex[io]" to use it as
scitex.io (Python) or scitex io ... (CLI).
```python
import scitex

@scitex.session
def main(CONFIG=scitex.INJECTED):
    data = scitex.io.load("input.csv")    # auto-tracked by clew
    result = process(data)                # process() stands in for your own analysis code
    scitex.io.save(result, "output.csv")  # auto-tracked by clew
    return 0
```
scitex.io delegates to scitex_io — they share the same API and registry.
The SciTeX system follows the Four Freedoms for Research below, inspired by the Free Software Definition:
Four Freedoms for Research
- The freedom to run your research anywhere — your machine, your terms.
- The freedom to study how every step works — from raw data to final manuscript.
- The freedom to redistribute your workflows, not just your papers.
- The freedom to modify any module and share improvements with the community.
AGPL-3.0 — because we believe research infrastructure deserves the same freedoms as the software it runs on.
File details
Details for the file scitex_io-0.2.20.tar.gz.
File metadata
- Download URL: scitex_io-0.2.20.tar.gz
- Upload date:
- Size: 11.4 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f5cc2e22dcc0ea9ec51e67d46273247c98ab546c0b2a045d8b621dd6a58fce3b |
| MD5 | 32cb25dcbf195a036a13291f9d2ef32e |
| BLAKE2b-256 | cb85e26039bde73385ce400a9b23af0249f7d61059d944e6a9ca0a88fd02f82f |
Provenance
The following attestation bundles were made for scitex_io-0.2.20.tar.gz:
- Publisher: pypi-publish-and-github-release-on-tag.yml on ywatanabe1989/scitex-io
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: scitex_io-0.2.20.tar.gz
- Subject digest: f5cc2e22dcc0ea9ec51e67d46273247c98ab546c0b2a045d8b621dd6a58fce3b
- Sigstore transparency entry: 1691965040
- Sigstore integration time:
- Permalink: ywatanabe1989/scitex-io@da402c22cfb7d46cc98aa71d091c54352540c657
- Branch / Tag: refs/tags/v0.2.20
- Owner: https://github.com/ywatanabe1989
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish-and-github-release-on-tag.yml@da402c22cfb7d46cc98aa71d091c54352540c657
- Trigger Event: push
File details
Details for the file scitex_io-0.2.20-py3-none-any.whl.
File metadata
- Download URL: scitex_io-0.2.20-py3-none-any.whl
- Upload date:
- Size: 6.0 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 82966243fbe58b3a07c5cac968b99c1943dc5419f6379bc275cacc9e884a2743 |
| MD5 | cb30770893af222c8850de69cadfb6a8 |
| BLAKE2b-256 | b71b8800b05579954b5c5de6b40a957831c49a796056c40b617e33800fa2c51c |
Provenance
The following attestation bundles were made for scitex_io-0.2.20-py3-none-any.whl:
- Publisher: pypi-publish-and-github-release-on-tag.yml on ywatanabe1989/scitex-io
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: scitex_io-0.2.20-py3-none-any.whl
- Subject digest: 82966243fbe58b3a07c5cac968b99c1943dc5419f6379bc275cacc9e884a2743
- Sigstore transparency entry: 1691965354
- Sigstore integration time:
- Permalink: ywatanabe1989/scitex-io@da402c22cfb7d46cc98aa71d091c54352540c657
- Branch / Tag: refs/tags/v0.2.20
- Owner: https://github.com/ywatanabe1989
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish-and-github-release-on-tag.yml@da402c22cfb7d46cc98aa71d091c54352540c657
- Trigger Event: push