The unified scientific research toolkit — reproducibility, lineage, units, literature graphs, hypothesis tracking, and bio-aware validation in one package.
sciforge
The unified scientific research toolkit — six modules, one coherent package that addresses the deepest pain points in reproducible, rigorous science.
pip install sciforge
Why sciforge?
| Problem | Module |
|---|---|
| "My experiment isn't reproducible" | repro — scores your codebase, pinpoints the exact issues |
| "I can't track what I believed and why" | hypotest — version-controlled hypothesis ledger |
| "A unit mismatch crashed my analysis" | unitflow — physical units propagate through NumPy/Pandas |
| "My CV scores are inflated (same patient in train+test)" | crossvalbio — patient/batch/phylo/temporal-aware splits |
| "My figure is stale but I don't know which one" | papertrail — data→figure→claim lineage with staleness detection |
| "I can't see which papers my work builds on" | litmap — semantic citation dependency graphs |
All six modules share a common provenance store and are designed to be used together.
Quick start
import sciforge as sf
# 1 — Score reproducibility of a project
report = sf.repro.score("my_experiment/")
print(report.summary())
# Score: 78/100 (B — Good)
# Findings: 2 high, 1 medium
# [HIGH] SEEDS train.py:12 — stochastic ops but no seed set
# [HIGH] HARDPATHS preprocess.py:8 — /home/alice/data/train.csv
# 2 — Declare and test a hypothesis
h = sf.hypotest.declare("Drug X reduces IL-6 at 10mg/kg (in vivo)")
h.link_experiment("runs/exp_042")
h.test(p_value=0.028, effect_size=0.52, n=48,
       test_name="one-tailed t-test",
       notes="vs vehicle control")
print(h.status) # SUPPORTED
# 3 — Physical units in pipelines
force = sf.unitflow.Quantity([9.8, 12.1], "N")
mass = sf.unitflow.Quantity([1.0, 1.5], "kg")
accel = force / mass # Quantity([9.8, 8.07], 'm / s²')
# Add unit contracts to functions
@sf.unitflow.requires_units(distance="m", time="s")
def speed(distance, time):
    return distance / time
# 4 — Biologically-aware cross-validation
from sciforge.crossvalbio import PatientSplit
for train, test in PatientSplit(n_splits=5, random_state=42).split(X, y, groups=patient_ids):
    model.fit(X[train], y[train])
    # Guaranteed: no patient in both train and test
# 5 — Data-to-claim lineage
with sf.papertrail.session("results/paper_v3") as trail:
    df = trail.load("data/cleaned.csv")
    df2 = trail.transform(df, remove_outliers, "remove_outliers")
    fig = trail.figure("fig2_roc", df2)
    trail.claim("AUC > 0.90 on held-out test set", fig)
# Later, after data changes:
report = sf.papertrail.check_stale("results/paper_v3")
print(report.summary())
# STALE: fig2_roc — data changed since last hash
# STALE: AUC > 0.90 on held-out test set — upstream data changed
# 6 — Literature dependency graph
graph = sf.litmap.build("papers.bib")
graph.show_dependencies("Attention Is All You Need", depth=2)
central = graph.most_central(n=5)
clusters = graph.cluster_by_keywords()
Module reference
sciforge.repro
report = repro.score("path/to/project/") # or .py / .ipynb file
report.score # int 0-100
report.findings # list[Finding]
report.critical_findings()
report.summary() # human-readable string
report.to_dict() # JSON-serialisable
Checks: SEEDS, ENVLOCK, HARDPATHS, DATETIME, NOTEBOOK_ORDER, FLOATPREC
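To make the SEEDS check concrete, here is a minimal sketch of how a check of this kind could work, using only the standard library `ast` module. The helper name `find_unseeded_random` and the detection rule (calls to `random.<fn>` with no `random.seed(...)` anywhere in the same file) are assumptions for illustration, not sciforge's actual implementation.

```python
import ast

# Hypothetical helper, not part of sciforge: flags source that calls
# random.<something>() but never calls random.seed().
STOCHASTIC_CALLS = {"random", "randint", "choice", "shuffle", "sample", "uniform"}


def find_unseeded_random(source: str) -> list[int]:
    """Return line numbers of stochastic `random.*` calls if no seed is set."""
    tree = ast.parse(source)
    stochastic_lines = []
    seeded = False
    for node in ast.walk(tree):
        # Look only at calls of the form random.<attr>(...)
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)):
            continue
        owner = node.func.value
        if not (isinstance(owner, ast.Name) and owner.id == "random"):
            continue
        if node.func.attr == "seed":
            seeded = True
        elif node.func.attr in STOCHASTIC_CALLS:
            stochastic_lines.append(node.lineno)
    return [] if seeded else sorted(stochastic_lines)


unseeded = "import random\nx = random.random()\ny = random.randint(0, 9)\n"
seeded = "import random\nrandom.seed(0)\nx = random.random()\n"
print(find_unseeded_random(unseeded))  # [2, 3]
print(find_unseeded_random(seeded))    # []
```

A real check would also need to cover NumPy and framework-specific generators; the sketch only shows the shape of a static scan that reports a file and line number.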
sciforge.hypotest
h = hypotest.declare("statement", alpha=0.05, tags=["bio"])
h.revise("new statement", reason="new data")
h.link_experiment("runs/exp_01")
h.test(p_value=0.02, effect_size=0.6, n=120, test_name="t-test")
h.retract("error found")
h.summary()
ledger = hypotest.ledger()
ledger.supported() # list[Hypothesis]
ledger.pending()
ledger.by_tag("bio")
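The idea of a version-controlled hypothesis can be sketched as an append-only event log attached to each statement. Everything below is hypothetical: the class name `HypothesisSketch`, the `NOT_SUPPORTED` and `RETRACTED` status strings, and the rule that a p-value below `alpha` yields `SUPPORTED` are assumptions chosen to match the quick-start example, not sciforge's documented behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class HypothesisSketch:
    """Hypothetical stand-in for a ledger entry; not sciforge's real class."""
    statement: str
    alpha: float = 0.05
    status: str = "PENDING"
    history: list = field(default_factory=list)  # append-only event log

    def revise(self, new_statement: str, reason: str) -> None:
        # Keep the old wording in the log so the belief trail is never lost.
        self.history.append(("revise", self.statement, reason))
        self.statement = new_statement
        self.status = "PENDING"  # a revised claim needs a fresh test

    def test(self, p_value: float) -> None:
        # Assumed rule: compare the p-value with the declared alpha.
        self.status = "SUPPORTED" if p_value < self.alpha else "NOT_SUPPORTED"
        self.history.append(("test", p_value, self.status))

    def retract(self, reason: str) -> None:
        self.status = "RETRACTED"
        self.history.append(("retract", reason))


h = HypothesisSketch("Drug X reduces IL-6 at 10mg/kg (in vivo)")
h.test(p_value=0.028)
print(h.status)  # SUPPORTED
```

Because nothing is ever overwritten in `history`, the log answers both "what did I believe" and "why did it change".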
sciforge.unitflow
q = unitflow.Quantity(array, "N")
q + q2 # unit-checked addition
q / q2 # auto unit derivation (e.g. N/kg → m/s²)
q.to("kN") # unit conversion (requires pint)
uf = unitflow.UnitFrame(df, units={"col": "m"})
uf.derived("speed", lambda df: df["dist"] / df["time"], "m/s")
uf.to_quantity("speed")
@unitflow.requires_units(force="N", mass="kg")
def acceleration(force, mass): ...
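A unit contract like `requires_units` can be built from an ordinary decorator that inspects each named argument before the call. The sketch below is one way to do it with the standard library; the tiny `Q` class and the exact error message are assumptions for illustration, and sciforge's own decorator may convert units rather than reject them.

```python
import functools
import inspect


class Q:
    """Hypothetical minimal quantity: a float value plus a unit string."""

    def __init__(self, value: float, unit: str):
        self.value = value
        self.unit = unit


def requires_units(**expected):
    """Reject calls whose named arguments carry the wrong unit."""
    def decorator(fn):
        sig = inspect.signature(fn)

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)  # map positionals to names
            for name, unit in expected.items():
                actual = getattr(bound.arguments[name], "unit", None)
                if actual != unit:
                    raise TypeError(f"{name}: expected unit {unit!r}, got {actual!r}")
            return fn(*args, **kwargs)
        return wrapper
    return decorator


@requires_units(distance="m", time="s")
def speed(distance, time):
    return Q(distance.value / time.value, "m/s")


print(speed(Q(100.0, "m"), Q(9.58, "s")).unit)  # m/s
try:
    speed(Q(100.0, "km"), Q(9.58, "s"))
except TypeError as err:
    print(err)  # distance: expected unit 'm', got 'km'
```

Failing at the call boundary means a mismatch surfaces where the wrong value enters, not several steps later in the analysis.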
sciforge.crossvalbio
| Splitter | Use case |
|---|---|
| PatientSplit | Clinical / longitudinal data |
| BatchAwareSplit | Multi-batch experiments |
| PhyloSplit | Species / microbiome data |
| TemporalBioSplit | Time-series studies |
| ComboSplit | Multiple constraints combined |
All are sklearn-compatible (implement split(X, y, groups)).
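The guarantee behind a patient-aware split is that fold membership is decided per patient, not per sample. The sketch below shows that idea with the standard library only; the function `patient_kfold` and its round-robin fold assignment are assumptions for illustration, and `PatientSplit` itself may balance folds differently.

```python
import random
from collections import defaultdict


def patient_kfold(groups, n_splits=5, seed=42):
    """Yield (train_idx, test_idx) with every patient confined to one fold.

    Hypothetical sketch; sciforge's PatientSplit may balance folds differently.
    """
    by_patient = defaultdict(list)
    for idx, patient in enumerate(groups):
        by_patient[patient].append(idx)

    patients = sorted(by_patient)          # deterministic starting order
    random.Random(seed).shuffle(patients)  # seeded, so splits are repeatable

    for fold in range(n_splits):
        test_patients = set(patients[fold::n_splits])  # round-robin assignment
        test = [i for p in test_patients for i in by_patient[p]]
        train = [i for p in patients if p not in test_patients for i in by_patient[p]]
        yield sorted(train), sorted(test)


patient_ids = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5", "p5", "p6"]
for train, test in patient_kfold(patient_ids, n_splits=3):
    train_patients = {patient_ids[i] for i in train}
    test_patients = {patient_ids[i] for i in test}
    assert train_patients.isdisjoint(test_patients)  # no leakage
```

With a plain shuffled K-fold, samples from patient `p3` could land on both sides of a split, which is the leakage that inflates CV scores.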
sciforge.papertrail
with papertrail.session("project/paper_v2") as trail:
    df = trail.load("data.csv")
    df2 = trail.transform(df, fn, "step_name")
    fig = trail.figure("fig1", df2)
    claim = trail.claim("Model X > baseline", fig)
report = papertrail.check_stale("project/paper_v2")
report.is_clean # bool
report.stale # list[TrailNode]
report.summary()
trail = papertrail.open_trail("project/paper_v2")
trail.lineage_of("fig1") # all ancestor nodes
trail.claims()
trail.figures()
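Staleness detection of this kind can be pictured as a small graph: data nodes store a content hash, every other node stores its parents, and a node is stale when any ancestor's hash no longer matches. The sketch below is a hypothetical reduction of that idea; `TrailSketch`, `derive`, and the in-memory byte strings are assumptions, not the papertrail API or its on-disk format.

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class TrailSketch:
    """Hypothetical lineage graph: nodes record parents and, for data, a hash."""

    def __init__(self):
        self.parents = {}   # node name -> list of parent node names
        self.recorded = {}  # data node name -> hash at recording time

    def load(self, name: str, content: bytes) -> str:
        self.parents[name] = []
        self.recorded[name] = digest(content)
        return name

    def derive(self, name: str, *parents: str) -> str:
        # Transforms, figures, and claims all just point at their inputs.
        self.parents[name] = list(parents)
        return name

    def stale(self, current: dict) -> list:
        """Return nodes whose own data or any ancestor's data has changed."""
        changed = {n for n, h in self.recorded.items() if digest(current[n]) != h}

        def is_stale(node):
            return node in changed or any(is_stale(p) for p in self.parents[node])

        return [n for n in self.parents if is_stale(n)]


trail = TrailSketch()
data = trail.load("data/cleaned.csv", b"a,b\n1,2\n")
clean = trail.derive("remove_outliers", data)
fig = trail.derive("fig2_roc", clean)
trail.derive("claim: AUC > 0.90", fig)

print(trail.stale({"data/cleaned.csv": b"a,b\n1,2\n"}))  # []
print(trail.stale({"data/cleaned.csv": b"a,b\n1,3\n"}))
# ['data/cleaned.csv', 'remove_outliers', 'fig2_roc', 'claim: AUC > 0.90']
```

A one-byte change in the input marks the figure and the claim that depends on it, which is the behaviour the `check_stale` report describes.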
sciforge.litmap
graph = litmap.build("papers.bib") # BibTeX file or dir
graph = litmap.build(["10.xxxx/doi"]) # DOI list (uses Crossref API)
graph.find("Attention") # title search
graph.dependencies("paper_id", depth=2) # papers it cites
graph.dependents("paper_id") # papers that cite it
graph.most_central(n=10) # by betweenness centrality
graph.roots() # uncited papers
graph.leaves() # papers not cited by others
graph.shortest_path("a", "b") # citation path
graph.cluster_by_keywords()
graph.show_dependencies("Title fragment")
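A depth-limited dependency query is essentially a breadth-first walk over "cites" edges. The sketch below shows that walk on a plain dictionary; the function signature, the return shape, and the toy paper ids are assumptions for illustration and say nothing about how litmap stores its graph.

```python
from collections import deque


def dependencies(cites: dict, paper_id: str, depth: int = 2) -> dict:
    """Breadth-first walk over 'cites' edges, up to `depth` hops.

    Returns {paper_id: hops}. Hypothetical sketch; `cites` maps each paper
    to the list of papers it cites.
    """
    seen = {paper_id: 0}
    queue = deque([paper_id])
    while queue:
        current = queue.popleft()
        if seen[current] == depth:
            continue  # do not expand beyond the requested depth
        for cited in cites.get(current, []):
            if cited not in seen:
                seen[cited] = seen[current] + 1
                queue.append(cited)
    del seen[paper_id]  # a paper is not its own dependency
    return seen


# Toy citation graph with made-up paper ids.
cites = {
    "transformer": ["seq2seq", "attention"],
    "seq2seq": ["lstm"],
    "attention": ["seq2seq"],
    "lstm": ["rnn"],
}
print(dependencies(cites, "transformer", depth=2))
# {'seq2seq': 1, 'attention': 1, 'lstm': 2}
```

Reversing the edge direction in `cites` gives the matching "dependents" query with the same function.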
Requirements
- Python ≥ 3.10
- numpy, pandas, networkx, scipy, scikit-learn, rich, jinja2
- Optional: pint (for full unit conversion in unitflow)
Running tests
pip install -e ".[dev]"
pytest tests/ -v
Contributing
Pull requests welcome! See CONTRIBUTING.md.
License
MIT