The unified scientific research toolkit — reproducibility, lineage, units, literature graphs, hypothesis tracking, and bio-aware validation in one package.

Project description

sciforge

The unified scientific research toolkit: a set of modules in one coherent package that addresses the deepest pain points in reproducible, rigorous science.

pip install sciforge

Why sciforge?

Problem	Module
"My experiment isn't reproducible"	`repro` — scores your codebase, pinpoints the exact issues
"I can't track what I believed and why"	`hypotest` — version-controlled hypothesis ledger
"A unit mismatch crashed my analysis"	`unitflow` — physical units propagate through NumPy/Pandas
"My CV scores are inflated (same patient in train+test)"	`crossvalbio` — patient/batch/phylo/temporal-aware splits
"My figure is stale but I don't know which one"	`papertrail` — data→figure→claim lineage with staleness detection
"I can't see which papers my work builds on"	`litmap` — semantic citation dependency graphs
"I need a professional audit for submission"	`audit` — generates submission-ready Research Audit Trails (RAT)
"I need to share my work with reviewers"	`bundle` — packages everything into a reproducible ZIP archive

All modules share a common provenance store and work together seamlessly.

Quick start

import sciforge as sf

# 1 — Score reproducibility of a project
report = sf.repro.score("my_experiment/")
print(report.summary())
# Score: 78/100 (B — Good)
# Findings: 2 high, 1 medium
#   [HIGH] SEEDS  train.py:12 — stochastic ops but no seed set
#   [HIGH] HARDPATHS  preprocess.py:8 — /home/alice/data/train.csv

# 2 — Declare and test a hypothesis
h = sf.hypotest.declare("Drug X reduces IL-6 at 10mg/kg (in vivo)")
h.link_experiment("runs/exp_042")
h.test(p_value=0.028, effect_size=0.52, n=48,
       test_name="one-tailed t-test",
       notes="vs vehicle control")
print(h.status)   # SUPPORTED

# 3 — Physical units in pipelines
force = sf.unitflow.Quantity([9.8, 12.1], "N")
mass  = sf.unitflow.Quantity([1.0,  1.5], "kg")
accel = force / mass      # Quantity([9.8, 8.07], 'm / s²')

# Add unit contracts to functions
@sf.unitflow.requires_units(distance="m", time="s")
def speed(distance, time):
    return distance / time

# 4 — Biologically-aware cross-validation
from sciforge.crossvalbio import PatientSplit

# X, y, patient_ids, and model are assumed to be defined earlier in your pipeline
for train, test in PatientSplit(n_splits=5, random_state=42).split(X, y, groups=patient_ids):
    model.fit(X[train], y[train])
    # Guaranteed: no patient in both train and test

# 5 — Data-to-claim lineage
with sf.papertrail.session("results/paper_v3") as trail:
    df    = trail.load("data/cleaned.csv")
    df2   = trail.transform(df, remove_outliers, "remove_outliers")
    fig   = trail.figure("fig2_roc", df2)
    trail.claim("AUC > 0.90 on held-out test set", fig)

# Later, after data changes:
report = sf.papertrail.check_stale("results/paper_v3")
print(report.summary())
# STALE: fig2_roc — data changed since last hash
# STALE: AUC > 0.90 on held-out test set — upstream data changed

# 6 — Literature dependency graph
graph = sf.litmap.build("papers.bib")
graph.show_dependencies("Attention Is All You Need", depth=2)
central = graph.most_central(n=5)
clusters = graph.cluster_by_keywords()

Module reference

`sciforge.repro`

from sciforge import repro

report = repro.score("path/to/project/")    # or .py / .ipynb file
report.score           # int 0-100
report.findings        # list[Finding]
report.critical_findings()
report.summary()       # human-readable string
report.to_dict()       # JSON-serialisable

Checks: SEEDS, ENVLOCK, HARDPATHS, DATETIME, NOTEBOOK_ORDER, FLOATPREC, SECRETS, LEAKAGE
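
The check names above are listed without further detail. As a rough illustration of what a HARDPATHS-style check could look for, the sketch below scans source text for string literals holding absolute paths under a user's home directory. The function name `find_hard_paths` and the regular expression are assumptions made for this example, not sciforge's implementation.

```python
import re

# Hypothetical pattern: quoted absolute paths under /home/<user>/ or /Users/<user>/.
HARD_PATH = re.compile(r"""["'](/(?:home|Users)/[^"']+)["']""")


def find_hard_paths(source: str) -> list[tuple[int, str]]:
    """Return (line_number, path) pairs for hard-coded user paths in source text."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in HARD_PATH.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


example = 'import pandas as pd\ndf = pd.read_csv("/home/alice/data/train.csv")\n'
print(find_hard_paths(example))
```

A real checker would likely parse the code rather than match lines, so that paths in comments or docstrings can be treated differently.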

`sciforge.audit`

Generate a full Research Audit Trail (RAT) report for peer review:

from sciforge import audit

report = audit.generate_report()
print(report)

Or via the CLI:

sciforge audit

Includes reproducibility scores, hypothesis outcomes, data lineage, hardware snapshots, and carbon footprint estimation.

`sciforge.hypotest`

from sciforge import hypotest

h = hypotest.declare("statement", alpha=0.05, tags=["bio"])
h.revise("new statement", reason="new data")
h.link_experiment("runs/exp_01")
h.test(p_value=0.02, effect_size=0.6, n=120, test_name="t-test")
h.retract("error found")
h.summary()

ledger = hypotest.ledger()
ledger.supported()        # list[Hypothesis]
ledger.pending()
ledger.by_tag("bio")
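
To make the ledger idea concrete, here is a minimal sketch of an entry that records revisions and derives its status from a test result. It assumes a simple "p below alpha" decision rule and invents the status names `PENDING`, `NOT_SUPPORTED`, and `RETRACTED`; only `SUPPORTED` appears in the quick start, and sciforge's actual rule may differ. `HypothesisSketch` is a hypothetical stand-in, not the library's class.

```python
from dataclasses import dataclass, field


@dataclass
class HypothesisSketch:
    """Hypothetical stand-in for a ledger entry; not sciforge's class."""
    statement: str
    alpha: float = 0.05
    status: str = "PENDING"
    history: list = field(default_factory=list)

    def revise(self, new_statement: str, reason: str) -> None:
        # Keep the old wording so the ledger shows what was believed and why.
        self.history.append(("revise", self.statement, reason))
        self.statement = new_statement

    def test(self, p_value: float) -> None:
        # Assumed decision rule: supported when p is below alpha.
        self.status = "SUPPORTED" if p_value < self.alpha else "NOT_SUPPORTED"
        self.history.append(("test", p_value, self.status))

    def retract(self, reason: str) -> None:
        # Record the status being withdrawn before overwriting it.
        self.history.append(("retract", self.status, reason))
        self.status = "RETRACTED"


h = HypothesisSketch("Drug X reduces IL-6 at 10mg/kg (in vivo)")
h.test(p_value=0.028)
print(h.status)
```

Appending to `history` instead of overwriting fields is what lets a ledger answer "what did I believe, and when did that change".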

`sciforge.unitflow`

from sciforge import unitflow

q = unitflow.Quantity(array, "N")
q + q2     # unit-checked addition
q / q2     # auto unit derivation (e.g. N/kg → m/s²)
q.to("kN") # unit conversion (requires pint)

uf = unitflow.UnitFrame(df, units={"col": "m"})
uf.derived("speed", lambda df: df["dist"] / df["time"], "m/s")
uf.to_quantity("speed")

@unitflow.requires_units(force="N", mass="kg")
def acceleration(force, mass): ...
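
The unit-contract decorator can be pictured with a small self-contained sketch. It assumes a toy quantity type `Q` that tags a value with a unit string and compares units by exact string match. Both `Q` and `requires_units_sketch` are hypothetical names for illustration; how `unitflow.requires_units` works internally is not described here.

```python
import functools
import inspect
from dataclasses import dataclass


@dataclass(frozen=True)
class Q:
    """Hypothetical minimal quantity: a value tagged with a unit string."""
    value: float
    unit: str


def requires_units_sketch(**expected):
    """Reject calls whose named arguments carry the wrong unit."""
    def decorator(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Map positional and keyword arguments onto parameter names.
            bound = sig.bind(*args, **kwargs)
            for name, unit in expected.items():
                actual = getattr(bound.arguments[name], "unit", None)
                if actual != unit:
                    raise TypeError(f"{name} must be in {unit!r}, got {actual!r}")
            return func(*args, **kwargs)
        return wrapper
    return decorator


@requires_units_sketch(distance="m", time="s")
def speed(distance, time):
    return Q(distance.value / time.value, "m/s")


print(speed(Q(100.0, "m"), Q(10.0, "s")))
```

Passing `Q(1.0, "km")` as `distance` raises `TypeError` in this sketch, since no conversion is attempted. A fuller version would convert compatible units before comparing.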

`sciforge.crossvalbio`

Splitter	Use case
`PatientSplit`	Clinical / longitudinal data
`BatchAwareSplit`	Multi-batch experiments
`PhyloSplit`	Species / microbiome data
`TemporalBioSplit`	Time-series studies
`ComboSplit`	Multiple constraints combined

All are sklearn-compatible (implement split(X, y, groups)).
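
The guarantee behind a group-aware splitter is that every sample from one group lands on the same side of each fold. The sketch below shows that idea in plain Python, without shuffling or a `random_state`. `group_kfold_sketch` is a hypothetical helper written for this example, not how `PatientSplit` is implemented; scikit-learn's `GroupKFold` offers an established version of the same idea.

```python
from collections import defaultdict


def group_kfold_sketch(groups, n_splits):
    """Yield (train_idx, test_idx) lists such that no group spans both sides."""
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    unique = sorted(by_group)
    if n_splits > len(unique):
        raise ValueError("n_splits cannot exceed the number of distinct groups")
    for fold in range(n_splits):
        # Every n_splits-th group goes to the test side of this fold.
        test_groups = set(unique[fold::n_splits])
        test = [i for g in unique if g in test_groups for i in by_group[g]]
        train = [i for g in unique if g not in test_groups for i in by_group[g]]
        yield train, test


patient_ids = ["p1", "p1", "p2", "p3", "p3", "p4"]
for train, test in group_kfold_sketch(patient_ids, n_splits=2):
    overlap = {patient_ids[i] for i in train} & {patient_ids[i] for i in test}
    print(sorted(test), "shared patients:", overlap)
```

Splitting on group identifiers instead of row indices is what prevents the inflated scores described in the problem table above.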

`sciforge.papertrail`

from sciforge import papertrail

with papertrail.session("project/paper_v2") as trail:
    df    = trail.load("data.csv")
    df2   = trail.transform(df, fn, "step_name")
    fig   = trail.figure("fig1", df2)
    claim = trail.claim("Model X > baseline", fig)

report = papertrail.check_stale("project/paper_v2")
report.is_clean           # bool
report.stale              # list[TrailNode]
report.summary()

trail = papertrail.open_trail("project/paper_v2")
trail.lineage_of("fig1")  # all ancestor nodes
trail.claims()
trail.figures()
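
Staleness detection of this kind is commonly built on content hashes: record a digest for each input file, then flag any file whose digest has changed along with everything derived from it. The sketch below assumes that approach. The helper names and the `recorded` / `children` dictionaries are hypothetical and do not reflect papertrail's storage format.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's bytes, used as its content fingerprint."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def stale_nodes(recorded: dict[str, str], children: dict[str, list[str]], root: Path) -> list[str]:
    """Return changed files plus every figure or claim downstream of them."""
    changed = [name for name, digest in recorded.items() if file_digest(root / name) != digest]
    stale, queue = [], list(changed)
    while queue:
        node = queue.pop(0)
        if node in stale:
            continue
        stale.append(node)
        queue.extend(children.get(node, []))
    return stale


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "cleaned.csv").write_text("x,y\n1,2\n")
    recorded = {"cleaned.csv": file_digest(root / "cleaned.csv")}
    children = {"cleaned.csv": ["fig2_roc"], "fig2_roc": ["AUC > 0.90 on held-out test set"]}
    print(stale_nodes(recorded, children, root))   # nothing changed yet
    (root / "cleaned.csv").write_text("x,y\n1,3\n")
    print(stale_nodes(recorded, children, root))   # data edited after recording
```

Propagating staleness through `children` is what lets a changed CSV mark both the figure and the claim that rests on it.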

`sciforge.litmap`

from sciforge import litmap

graph = litmap.build("papers.bib")           # BibTeX file or dir
graph = litmap.build(["10.xxxx/doi"])        # DOI list (uses Crossref API)

graph.find("Attention")                      # title search
graph.dependencies("paper_id", depth=2)     # papers it cites
graph.dependents("paper_id")                # papers that cite it
graph.most_central(n=10)                    # by betweenness centrality
graph.roots()                               # uncited papers
graph.leaves()                              # papers not cited by others
graph.shortest_path("a", "b")              # citation path
graph.cluster_by_keywords()
graph.show_dependencies("Title fragment")
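
A depth-limited dependency query amounts to a breadth-first walk over "cites" edges. The sketch below assumes edges point from the citing paper to the cited paper and uses made-up paper identifiers. `dependencies_sketch` is a hypothetical helper for illustration, not the `graph.dependencies` implementation.

```python
from collections import deque


def dependencies_sketch(cites: dict[str, list[str]], paper_id: str, depth: int) -> dict[str, int]:
    """Map each paper reachable within `depth` citation hops to its hop count."""
    seen = {paper_id: 0}
    queue = deque([paper_id])
    while queue:
        current = queue.popleft()
        if seen[current] == depth:
            continue  # do not expand past the requested depth
        for cited in cites.get(current, []):
            if cited not in seen:
                seen[cited] = seen[current] + 1
                queue.append(cited)
    del seen[paper_id]  # the starting paper is not its own dependency
    return seen


# Hypothetical citation data: key cites each paper in its list.
cites = {
    "transformer": ["seq2seq", "attention"],
    "seq2seq": ["lstm"],
    "lstm": ["rnn"],
}
print(dependencies_sketch(cites, "transformer", depth=2))
```

Reversing the edge direction in `cites` would turn the same walk into a "dependents" query.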

Requirements

Python ≥ 3.10
numpy, pandas, networkx, scipy, scikit-learn, rich, jinja2
Optional: pint (for full unit conversion in unitflow)

Running tests

pip install -e ".[dev]"
pytest tests/ -v

Contributing

Pull requests welcome! See CONTRIBUTING.md.

License

MIT

Project details

Release history

0.1.1 (this version), Mar 29, 2026

0.1.0, Mar 25, 2026

Download files

Source Distribution

sciforge-0.1.1.tar.gz (61.2 kB)

Uploaded Mar 29, 2026 Source

Built Distribution

sciforge-0.1.1-py3-none-any.whl (36.2 kB)

Uploaded Mar 29, 2026 Python 3

File details

Details for the file sciforge-0.1.1.tar.gz.

File metadata

Download URL: sciforge-0.1.1.tar.gz
Upload date: Mar 29, 2026
Size: 61.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for sciforge-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`8dd927a48bc9b1694c5b2db1174dc7756dcb278bdd6fc17e83b2d68ab74ac93f`
MD5	`bbdeadc6c389a644ba166f764c428349`
BLAKE2b-256	`0b3a039d1ae93738693800fc6c3d00d78525441e7230df761416c1816222434a`

File details

Details for the file sciforge-0.1.1-py3-none-any.whl.

File metadata

Download URL: sciforge-0.1.1-py3-none-any.whl
Upload date: Mar 29, 2026
Size: 36.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for sciforge-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b53b85370fab457d6c500016948463ef11bee4725a703ee45f86474ab8a23332`
MD5	`4b423e572bfa16abb75456fe7de52caa`
BLAKE2b-256	`e809dfc81adeb62a3132739a33075c0f7100cc96c10b900e3fec418e505ae00f`
