Universal scientific data I/O with plugin registry
scitex-io
Full Documentation · pip install scitex-io
Problem
Three problems recur across scientific Python projects:
- Format fragmentation. Loading a CSV requires pandas.read_csv(), an HDF5 file requires h5py.File(), and a NumPy array requires numpy.load(). Each format demands its own library, its own API, and its own boilerplate. Operating systems solved this decades ago: double-click any file and the OS dispatches to the right application. Python has no built-in equivalent.
- Hard-coded parameters scattered across scripts. Sample rates, thresholds, model hyperparameters, and plot dimensions become magic numbers buried in code, duplicated across files, and hard to track or share. Changing one parameter means grepping through the entire project.
- Figures without provenance. A saved PNG has no record of the code, parameters, or session that produced it. Months later, reproducing a figure means reverse-engineering which script, with which settings, generated it.
Solution
scitex-io addresses all three:
- save()/load(): One interface for 30+ formats with automatic extension-based dispatch. A plugin registry lets you add custom formats without modifying the library.
- load_configs(): Loads all YAML files from a config/ directory into a single DotDict with dot-notation access. Parameters are version-controlled, centralized, and separate from code.
- embed_metadata()/read_metadata(): Embeds provenance (timestamps, session IDs, parameters) directly into image and PDF files. The figure carries its own history.
Supported Formats (30+)
| Category | Extensions |
|---|---|
| Spreadsheet | .csv, .tsv, .xlsx, .xls |
| Scientific | .npy, .npz, .mat, .hdf5, .h5, .zarr |
| Serialization | .pkl, .pickle, .pkl.gz, .joblib |
| ML/DL | .pth, .pt, .cbm |
| Config | .json, .yaml, .yml |
| Documents | .txt, .md, .pdf, .docx, .tex |
| Images | .png, .jpg, .jpeg, .gif, .tiff, .tif, .svg |
| Media | .mp4 |
| Web | .html |
| Bibliography | .bib |
Installation
Requires Python >= 3.9.
pip install scitex-io
For MCP server support:
pip install scitex-io[mcp]
SciTeX users: pip install scitex already includes scitex-io.
Quickstart
Save and Load
from scitex_io import save, load
# Universal save/load — format auto-detected from extension
import pandas as pd
df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
save(df, "data.csv")
loaded = load("data.csv")
# 30+ formats work the same way
import numpy as np
save(np.array([1, 2, 3]), "data.npy")
save({"key": "value"}, "config.yaml")
save({"nested": [1, 2]}, "data.json")
Project Configuration
Hard-coded parameters belong in config files, not in code. Use UPPER_CASE keys — Python's convention for constants — to signal that these are user-defined values:
project/
  config/
    PATHS.yaml        # DATA_DIR: /data/experiment_01
    PREPROCESS.yaml   # SAMPLE_RATE: 1000, BANDPASS: [0.5, 40]
    MODEL.yaml        # HIDDEN_DIM: 256, DROPOUT: 0.3
    PLOT.yaml         # FIGSIZE: [180, 60], DPI: 300
    IS_DEBUG.yaml     # IS_DEBUG: true
from scitex_io import load_configs
CONFIG = load_configs() # loads ./config/*.yaml
CONFIG.PATHS.DATA_DIR # "/data/experiment_01"
CONFIG.PREPROCESS.SAMPLE_RATE # 1000
CONFIG.MODEL.HIDDEN_DIM # 256
# Debug mode: DEBUG_ prefixed keys override their counterparts
# In MODEL.yaml: { HIDDEN_DIM: 256, DEBUG_HIDDEN_DIM: 32 }
CONFIG = load_configs(IS_DEBUG=True)
CONFIG.MODEL.HIDDEN_DIM # 32 (debug value promoted)
load_configs() returns a DotDict, a nested dictionary with dot-notation access. Parameters become version-controlled, shareable, and separate from code.
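The dot-notation access and the DEBUG_ override described above can be pictured with a small stand-alone sketch. It is not scitex-io's implementation: MiniDotDict and promote_debug are hypothetical names, and the rule that DEBUG_ keys are dropped when debug mode is off is an assumption made for illustration.

```python
class MiniDotDict(dict):
    """Hypothetical stand-in for a dot-access dictionary (not scitex-io's DotDict)."""

    def __getattr__(self, name):
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name) from None
        # Wrap nested plain dicts on access so chained dots work.
        return MiniDotDict(value) if isinstance(value, dict) else value


def promote_debug(section, is_debug):
    """Return a copy of one config section with DEBUG_ keys applied or dropped.

    Assumed rule: in debug mode DEBUG_X replaces X; otherwise DEBUG_X is ignored.
    """
    prefix = "DEBUG_"
    result = {k: v for k, v in section.items() if not k.startswith(prefix)}
    if is_debug:
        for key, value in section.items():
            if key.startswith(prefix):
                result[key[len(prefix):]] = value
    return result


# In-memory stand-in for what MODEL.yaml might contain.
raw = {"MODEL": {"HIDDEN_DIM": 256, "DEBUG_HIDDEN_DIM": 32, "DROPOUT": 0.3}}

normal = MiniDotDict({name: promote_debug(sec, False) for name, sec in raw.items()})
debug = MiniDotDict({name: promote_debug(sec, True) for name, sec in raw.items()})

print(normal.MODEL.HIDDEN_DIM)  # 256
print(debug.MODEL.HIDDEN_DIM)   # 32
```

Keeping the debug value next to its production counterpart in the same YAML file means a single flag switches the whole project to a lightweight run without editing any script.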
Metadata Embedding
Embed provenance into figures so they carry their own history:
from scitex_io import embed_metadata, read_metadata, has_metadata
# Embed metadata into an image
embed_metadata("figure.png", {
    "experiment": "exp_042",
    "model": "resnet50",
    "accuracy": 0.94,
    "timestamp": "2026-03-11",
})
# Read it back — months later, from the file alone
meta = read_metadata("figure.png")
print(meta["experiment"]) # "exp_042"
# Check if a file has embedded metadata
has_metadata("figure.png") # True
Supports PNG (tEXt chunks), JPEG (EXIF), SVG (XML metadata), and PDF (Info Dictionary).
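To make the PNG case concrete, here is a standard-library sketch of how a tEXt chunk can carry a JSON payload. It only illustrates the PNG chunk layout (length, type, data, CRC); the keyword "sketch-metadata", the JSON encoding, and the helper names are assumptions, and scitex-io's actual on-disk layout may differ.

```python
import json
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
KEYWORD = "sketch-metadata"  # hypothetical keyword, chosen for this example


def make_chunk(chunk_type, data):
    """Build one PNG chunk: 4-byte length, type, data, CRC over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png_bytes):
    """Yield (type, data) for every chunk in a PNG byte string."""
    if png_bytes[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(png_bytes):
        (length,) = struct.unpack(">I", png_bytes[pos:pos + 4])
        yield png_bytes[pos + 4:pos + 8], png_bytes[pos + 8:pos + 8 + length]
        pos += 12 + length  # length + type + data + CRC


def embed_text(png_bytes, metadata):
    """Return new PNG bytes with a tEXt chunk inserted just before IEND."""
    # json.dumps emits ASCII by default, so Latin-1 encoding is always safe here.
    payload = KEYWORD.encode("latin-1") + b"\x00" + json.dumps(metadata).encode("latin-1")
    out = PNG_SIGNATURE
    for chunk_type, data in iter_chunks(png_bytes):
        if chunk_type == b"IEND":
            out += make_chunk(b"tEXt", payload)
        out += make_chunk(chunk_type, data)
    return out


def read_text(png_bytes):
    """Return the embedded dict, or None if no matching tEXt chunk exists."""
    for chunk_type, data in iter_chunks(png_bytes):
        if chunk_type == b"tEXt":
            key, _, text = data.partition(b"\x00")
            if key == KEYWORD.encode("latin-1"):
                return json.loads(text.decode("latin-1"))
    return None


# A minimal 1x1 8-bit grayscale PNG built in memory for the demonstration.
tiny_png = (
    PNG_SIGNATURE
    + make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
    + make_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
    + make_chunk(b"IEND", b"")
)

tagged = embed_text(tiny_png, {"experiment": "exp_042", "accuracy": 0.94})
print(read_text(tagged))
```

Because the chunk sits inside the file itself, the metadata travels with the image when it is copied or e-mailed, which is the property the section relies on.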
Custom Format Registration
from scitex_io import register_saver, register_loader, save, load
@register_saver(".custom")
def save_custom(obj, path, **kwargs):
    with open(path, "w") as f:
        f.write(str(obj))

@register_loader(".custom")
def load_custom(path, **kwargs):
    with open(path) as f:
        return f.read()

save("hello", "data.custom")
assert load("data.custom") == "hello"
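For readers curious how decorator-based registration and extension dispatch can fit together, the following is a minimal sketch. MiniRegistry is a hypothetical class written for this example, not scitex-io's registry; in particular, the longest-suffix matching used to handle compound extensions such as .pkl.gz is an assumption about one reasonable design.

```python
import gzip
import os
import tempfile


class MiniRegistry:
    """Hypothetical extension-keyed registry (illustration only)."""

    def __init__(self):
        self._savers = {}
        self._loaders = {}

    def saver(self, ext):
        def decorator(func):
            self._savers[ext.lower()] = func
            return func
        return decorator

    def loader(self, ext):
        def decorator(func):
            self._loaders[ext.lower()] = func
            return func
        return decorator

    @staticmethod
    def _lookup(table, path):
        name = os.path.basename(str(path)).lower()
        # Try longer extensions first so ".txt.gz" wins over a shorter suffix.
        for ext in sorted(table, key=len, reverse=True):
            if name.endswith(ext):
                return table[ext]
        raise ValueError(f"no handler registered for {path!r}")

    def save(self, obj, path, **kwargs):
        self._lookup(self._savers, path)(obj, path, **kwargs)

    def load(self, path, **kwargs):
        return self._lookup(self._loaders, path)(path, **kwargs)


registry = MiniRegistry()


@registry.saver(".txt")
def save_text(obj, path, **kwargs):
    with open(path, "w", encoding="utf-8") as f:
        f.write(str(obj))


@registry.loader(".txt")
def load_text(path, **kwargs):
    with open(path, encoding="utf-8") as f:
        return f.read()


@registry.saver(".txt.gz")
def save_text_gz(obj, path, **kwargs):
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write(str(obj))


@registry.loader(".txt.gz")
def load_text_gz(path, **kwargs):
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "note.txt")
    packed = os.path.join(tmp, "note.txt.gz")
    registry.save("hello", plain)
    registry.save("hello", packed)
    plain_text = registry.load(plain)
    packed_text = registry.load(packed)
    with open(packed, "rb") as f:
        packed_magic = f.read(2)  # gzip files start with 0x1f 0x8b

print(plain_text, packed_text)
```

The caller never names a format: the file name alone selects the handler, and adding a format is a matter of registering two functions.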
Three Interfaces
Python API
from scitex_io import save, load, list_formats, register_saver, register_loader
from scitex_io import load_configs, DotDict
from scitex_io import embed_metadata, read_metadata, has_metadata
save(obj, "path.ext") # Save any object
data = load("path.ext") # Load any file
fmts = list_formats() # Show all registered formats
cfg = load_configs() # Load ./config/*.yaml as DotDict
embed_metadata("fig.png", d) # Embed provenance into figure
CLI Commands
scitex-io --help-recursive # Show all commands
scitex-io info # Show registered formats
scitex-io configs # Load and display project configs
scitex-io configs -d ./my_configs # Custom config directory
scitex-io configs --json # Output as JSON
scitex-io list-python-apis -vv # List Python APIs with signatures
scitex-io version # Show version
scitex-io mcp start # Start MCP server
scitex-io mcp doctor # Check MCP health
scitex-io mcp list-tools -vv # List MCP tools with parameters
MCP Server — for AI Agents
AI agents can save, load, and discover formats autonomously.
| Tool | Description |
|---|---|
| io_list_formats | List all registered save/load formats |
| io_load | Load data from any supported format |
| io_save | Save data to any supported format |
| io_load_configs | Load YAML project configurations |
| io_register_info | Show how to register custom formats |
scitex-io mcp start
Lint Rules
Detected by scitex-linter when this package is installed.
| Rule | Severity | Message |
|---|---|---|
| STX-IO001 | warning | np.save() detected — use stx.io.save() for provenance tracking |
| STX-IO002 | warning | np.load() detected — use stx.io.load() for provenance tracking |
| STX-IO003 | warning | pd.read_csv() detected — use stx.io.load() for provenance tracking |
| STX-IO004 | warning | .to_csv() detected — use stx.io.save() for provenance tracking |
| STX-IO005 | warning | pickle.dump() detected — use stx.io.save() for provenance tracking |
| STX-IO006 | warning | json.dump() detected — use stx.io.save() for provenance tracking |
| STX-IO007 | warning | .savefig() detected — use stx.io.save(fig, path) for metadata embedding |
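As a rough idea of how rules like these could be detected, the sketch below walks a Python syntax tree with the standard ast module and flags the calls listed in the table. The rule IDs are taken from the table, but the detection logic and the find_io_calls helper are assumptions; scitex-linter's real implementation is not shown in this document and may work differently (for example, by resolving import aliases).

```python
import ast

# (receiver name, method) pairs flagged only on that exact receiver.
QUALIFIED_RULES = {
    ("np", "save"): "STX-IO001",
    ("np", "load"): "STX-IO002",
    ("pd", "read_csv"): "STX-IO003",
    ("pickle", "dump"): "STX-IO005",
    ("json", "dump"): "STX-IO006",
}

# Method names flagged on any receiver (e.g. df.to_csv, fig.savefig).
METHOD_RULES = {
    "to_csv": "STX-IO004",
    "savefig": "STX-IO007",
}


def find_io_calls(source):
    """Return sorted (line number, rule id) pairs for flagged calls in source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)):
            continue
        method = node.func.attr
        owner = node.func.value
        owner_name = owner.id if isinstance(owner, ast.Name) else None
        rule = QUALIFIED_RULES.get((owner_name, method)) or METHOD_RULES.get(method)
        if rule:
            findings.append((node.lineno, rule))
    return sorted(findings)


sample = '''import numpy as np
np.save("a.npy", arr)
df.to_csv("out.csv")
fig.savefig("fig.png")
data = stx.io.load("a.npy")
'''

findings = find_io_calls(sample)
for lineno, rule in findings:
    print(f"line {lineno}: {rule}")
```

The stx.io.load() call on the last line is not flagged, because its receiver is the attribute chain stx.io rather than the bare name np.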
Part of SciTeX
scitex-io is part of SciTeX. When used inside the SciTeX framework, I/O is seamless:
import scitex
@scitex.session
def main(CONFIG=scitex.INJECTED):
    data = scitex.io.load("input.csv")      # auto-tracked by clew
    result = process(data)
    scitex.io.save(result, "output.csv")    # auto-tracked by clew
    return 0
scitex.io delegates to scitex_io — they share the same API and registry.
The SciTeX ecosystem follows the Four Freedoms for researchers:
Four Freedoms for Research
- The freedom to run your research anywhere — your machine, your terms.
- The freedom to study how every step works — from raw data to final manuscript.
- The freedom to redistribute your workflows, not just your papers.
- The freedom to modify any module and share improvements with the community.
AGPL-3.0 — because research infrastructure deserves the same freedoms as the software it runs on.