The unified scientific research toolkit — reproducibility, lineage, units, literature graphs, hypothesis tracking, and bio-aware validation in one package.

sciforge

The unified scientific research toolkit: six modules in one package, each aimed at a recurring obstacle to reproducible, rigorous science.

pip install sciforge

Why sciforge?

Problem | Module
"My experiment isn't reproducible" | repro: scores your codebase and pinpoints the exact issues
"I can't track what I believed and why" | hypotest: version-controlled hypothesis ledger
"A unit mismatch crashed my analysis" | unitflow: physical units propagate through NumPy/Pandas
"My CV scores are inflated (same patient in train+test)" | crossvalbio: patient/batch/phylo/temporal-aware splits
"My figure is stale but I don't know which one" | papertrail: data→figure→claim lineage with staleness detection
"I can't see which papers my work builds on" | litmap: semantic citation dependency graphs

All six modules share a common provenance store and work together seamlessly.


Quick start

import sciforge as sf

# 1 — Score reproducibility of a project
report = sf.repro.score("my_experiment/")
print(report.summary())
# Score: 78/100 (B — Good)
# Findings: 2 high, 1 medium
#   [HIGH] SEEDS  train.py:12 — stochastic ops but no seed set
#   [HIGH] HARDPATHS  preprocess.py:8 — /home/alice/data/train.csv

# 2 — Declare and test a hypothesis
h = sf.hypotest.declare("Drug X reduces IL-6 at 10mg/kg (in vivo)")
h.link_experiment("runs/exp_042")
h.test(p_value=0.028, effect_size=0.52, n=48,
       test_name="one-tailed t-test",
       notes="vs vehicle control")
print(h.status)   # SUPPORTED

# 3 — Physical units in pipelines
force = sf.unitflow.Quantity([9.8, 12.1], "N")
mass  = sf.unitflow.Quantity([1.0,  1.5], "kg")
accel = force / mass      # Quantity([9.8, 8.07], 'm / s²')

# Add unit contracts to functions
@sf.unitflow.requires_units(distance="m", time="s")
def speed(distance, time):
    return distance / time

# 4 — Biologically-aware cross-validation
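from sciforge.crossvalbio import PatientSplit

# X, y and patient_ids are assumed to be NumPy arrays of equal length, and model
# an sklearn-style estimator; define them before running this loop.
for train, test in PatientSplit(n_splits=5, random_state=42).split(X, y, groups=patient_ids):
    model.fit(X[train], y[train])
    # Guaranteed: no patient in both train and test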

# 5 — Data-to-claim lineage
with sf.papertrail.session("results/paper_v3") as trail:
    df    = trail.load("data/cleaned.csv")
    # remove_outliers is a function you define yourself; this snippet does not provide it
    df2   = trail.transform(df, remove_outliers, "remove_outliers")
    fig   = trail.figure("fig2_roc", df2)
    trail.claim("AUC > 0.90 on held-out test set", fig)

# Later, after data changes:
report = sf.papertrail.check_stale("results/paper_v3")
print(report.summary())
# STALE: fig2_roc — data changed since last hash
# STALE: AUC > 0.90 on held-out test set — upstream data changed

# 6 — Literature dependency graph
graph = sf.litmap.build("papers.bib")
graph.show_dependencies("Attention Is All You Need", depth=2)
central = graph.most_central(n=5)
clusters = graph.cluster_by_keywords()

Module reference

sciforge.repro

report = repro.score("path/to/project/")    # or .py / .ipynb file
report.score           # int 0-100
report.findings        # list[Finding]
report.critical_findings()
report.summary()       # human-readable string
report.to_dict()       # JSON-serialisable

Checks: SEEDS, ENVLOCK, HARDPATHS, DATETIME, NOTEBOOK_ORDER, FLOATPREC
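To give a feel for what a check such as SEEDS might look for, here is a minimal sketch built on the standard-library ast module. It is not sciforge's implementation: the names STOCHASTIC_CALLS, SEED_CALLS and find_unseeded_randomness are hypothetical, and the rule (flag stochastic calls when no seeding call appears anywhere in the file) is an assumed simplification.

```python
import ast

# Hypothetical name lists for illustration; a real checker would need far more.
STOCHASTIC_CALLS = {"random", "randint", "choice", "shuffle", "rand", "randn", "normal"}
SEED_CALLS = {"seed", "manual_seed"}


def _call_name(node: ast.Call) -> str:
    """Return the final attribute or function name of a call, e.g. 'seed'."""
    func = node.func
    if isinstance(func, ast.Attribute):
        return func.attr
    if isinstance(func, ast.Name):
        return func.id
    return ""


def find_unseeded_randomness(source: str) -> list[int]:
    """Return line numbers of stochastic calls if the source never sets a seed."""
    tree = ast.parse(source)
    names = [
        (_call_name(node), node.lineno)
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
    ]
    if any(name in SEED_CALLS for name, _ in names):
        return []  # a seeding call exists somewhere; treat the file as seeded
    return sorted(line for name, line in names if name in STOCHASTIC_CALLS)


unseeded = "import random\nx = random.random()\ny = random.choice([1, 2])\n"
seeded = "import random\nrandom.seed(0)\nx = random.random()\n"

print(find_unseeded_randomness(unseeded))  # [2, 3]
print(find_unseeded_randomness(seeded))    # []
```

Matching on call names alone is deliberately crude: it cannot tell whether the seed is set before the first stochastic call, which a production check would have to consider.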


sciforge.hypotest

h = hypotest.declare("statement", alpha=0.05, tags=["bio"])
h.revise("new statement", reason="new data")
h.link_experiment("runs/exp_01")
h.test(p_value=0.02, effect_size=0.6, n=120, test_name="t-test")
h.retract("error found")
h.summary()

ledger = hypotest.ledger()
ledger.supported()        # list[Hypothesis]
ledger.pending()
ledger.by_tag("bio")
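The calls above can be pictured as operations on a small record that keeps its own revision history. The sketch below is an assumption-laden illustration, not sciforge's code: HypothesisSketch and the Status values other than SUPPORTED are hypothetical, and the decision rule (supported when p_value < alpha) is assumed, since the source does not say how effect size or sample size enter the decision.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Only SUPPORTED appears in the README; the other names are hypothetical.
    PENDING = "PENDING"
    SUPPORTED = "SUPPORTED"
    NOT_SUPPORTED = "NOT_SUPPORTED"
    RETRACTED = "RETRACTED"


@dataclass
class HypothesisSketch:
    statement: str
    alpha: float = 0.05
    status: Status = Status.PENDING
    # Each entry is (previous statement, reason for the change).
    history: list[tuple[str, str]] = field(default_factory=list)

    def revise(self, new_statement: str, reason: str) -> None:
        """Replace the statement but keep the old wording and the reason."""
        self.history.append((self.statement, reason))
        self.statement = new_statement

    def test(self, p_value: float) -> None:
        """Assumed rule: supported when the p-value is below alpha."""
        self.status = Status.SUPPORTED if p_value < self.alpha else Status.NOT_SUPPORTED

    def retract(self, reason: str) -> None:
        self.history.append((self.statement, reason))
        self.status = Status.RETRACTED


h = HypothesisSketch("Drug X reduces IL-6 at 10mg/kg (in vivo)")
h.test(p_value=0.028)
print(h.status.value)   # SUPPORTED
h.revise("Drug X reduces IL-6 at 10mg/kg in mice", reason="new data")
print(len(h.history))   # 1
```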

sciforge.unitflow

q = unitflow.Quantity(array, "N")
q + q2     # unit-checked addition
q / q2     # auto unit derivation (e.g. N/kg → m/s²)
q.to("kN") # unit conversion (requires pint)

uf = unitflow.UnitFrame(df, units={"col": "m"})
uf.derived("speed", lambda df: df["dist"] / df["time"], "m/s")
uf.to_quantity("speed")

@unitflow.requires_units(force="N", mass="kg")
def acceleration(force, mass): ...
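One way to picture unit derivation and unit contracts is to store each unit as a mapping of base-unit exponents and compare those mappings at call time. The sketch below uses only the standard library and handles scalars; Q, UNITS and requires_units_sketch are hypothetical names, and nothing here is claimed to match how unitflow works internally.

```python
import inspect
from functools import wraps

# Base-unit exponents for a handful of unit symbols (illustrative subset only).
UNITS = {
    "kg": {"kg": 1},
    "m": {"m": 1},
    "s": {"s": 1},
    "N": {"kg": 1, "m": 1, "s": -2},
    "m/s²": {"m": 1, "s": -2},
}


class Q:
    """Tiny scalar quantity: a value plus a dict of base-unit exponents."""

    def __init__(self, value, unit):
        self.value = value
        self.dims = dict(UNITS[unit]) if isinstance(unit, str) else dict(unit)

    def __truediv__(self, other):
        dims = dict(self.dims)
        for base, exp in other.dims.items():
            dims[base] = dims.get(base, 0) - exp
        # Drop base units whose exponents cancelled out.
        dims = {base: exp for base, exp in dims.items() if exp != 0}
        return Q(self.value / other.value, dims)


def requires_units_sketch(**expected):
    """Reject calls whose named arguments do not carry the expected units."""
    def decorator(fn):
        sig = inspect.signature(fn)

        @wraps(fn)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            for name, unit in expected.items():
                if bound.arguments[name].dims != UNITS[unit]:
                    raise TypeError(f"{name} must be in {unit}")
            return fn(*args, **kwargs)
        return wrapper
    return decorator


@requires_units_sketch(force="N", mass="kg")
def acceleration(force, mass):
    return force / mass


a = acceleration(Q(9.8, "N"), Q(1.0, "kg"))
print(a.value, a.dims == UNITS["m/s²"])  # 9.8 True
```

Passing a mass where a force is expected, for example acceleration(Q(1.0, "kg"), Q(1.0, "kg")), raises TypeError in this sketch before the function body runs.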

sciforge.crossvalbio

Splitter | Use case
PatientSplit | Clinical / longitudinal data
BatchAwareSplit | Multi-batch experiments
PhyloSplit | Species / microbiome data
TemporalBioSplit | Time-series studies
ComboSplit | Multiple constraints combined

All are sklearn-compatible (implement split(X, y, groups)).
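The guarantee that matters for a patient-level split is that every row of a given patient lands on the same side of each fold. A minimal standard-library sketch of that idea follows; group_kfold_sketch is a hypothetical helper, it takes only the group labels rather than the full split(X, y, groups) signature, and the round-robin assignment is an assumption rather than the strategy PatientSplit uses.

```python
import random


def group_kfold_sketch(groups, n_splits=5, seed=42):
    """Yield (train_idx, test_idx) lists so that no group spans both sides."""
    unique = sorted(set(groups))
    if n_splits > len(unique):
        raise ValueError("n_splits cannot exceed the number of distinct groups")
    random.Random(seed).shuffle(unique)
    # Deal whole groups round-robin into folds.
    fold_of = {group: i % n_splits for i, group in enumerate(unique)}
    for fold in range(n_splits):
        test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train, test


patient_ids = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5", "p5", "p6"]
folds = list(group_kfold_sketch(patient_ids, n_splits=3))

for train, test in folds:
    train_patients = {patient_ids[i] for i in train}
    test_patients = {patient_ids[i] for i in test}
    # No patient contributes rows to both train and test.
    assert train_patients.isdisjoint(test_patients)
```

Because folds are built from whole groups, fold sizes can be uneven when patients contribute different numbers of rows.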


sciforge.papertrail

with papertrail.session("project/paper_v2") as trail:
    df    = trail.load("data.csv")
    df2   = trail.transform(df, fn, "step_name")
    fig   = trail.figure("fig1", df2)
    claim = trail.claim("Model X > baseline", fig)

report = papertrail.check_stale("project/paper_v2")
report.is_clean           # bool
report.stale              # list[TrailNode]
report.summary()

trail = papertrail.open_trail("project/paper_v2")
trail.lineage_of("fig1")  # all ancestor nodes
trail.claims()
trail.figures()
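Staleness detection of this kind is commonly built on content hashes: record a digest of every input when a figure is produced, then compare against the current digests later. The sketch below shows that idea with hashlib and a JSON manifest. The manifest layout and the helpers file_digest, record and stale_inputs are hypothetical and say nothing about how papertrail stores its lineage.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_digest(path):
    """SHA-256 of the file's bytes, as a hex string."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def record(manifest_path, figure, inputs):
    """Store the digest of every input at the time the figure was made."""
    manifest = {"figure": figure, "inputs": {str(p): file_digest(p) for p in inputs}}
    Path(manifest_path).write_text(json.dumps(manifest))


def stale_inputs(manifest_path):
    """Return the inputs whose current digest differs from the recorded one."""
    manifest = json.loads(Path(manifest_path).read_text())
    return [p for p, digest in manifest["inputs"].items() if file_digest(p) != digest]


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "cleaned.csv"
    manifest = Path(tmp) / "fig2_roc.json"
    data.write_text("id,score\n1,0.91\n")
    record(manifest, "fig2_roc", [data])
    before = stale_inputs(manifest)   # nothing has changed yet
    data.write_text("id,score\n1,0.87\n")
    after = stale_inputs(manifest)    # the input file was edited

print(before)      # []
print(len(after))  # 1
```

A claim attached to the figure would then be stale whenever the figure is, which is how staleness can propagate down the lineage.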

sciforge.litmap

graph = litmap.build("papers.bib")           # BibTeX file or dir
graph = litmap.build(["10.xxxx/doi"])        # DOI list (uses Crossref API)

graph.find("Attention")                   # title search
graph.dependencies("paper_id", depth=2)   # papers it cites
graph.dependents("paper_id")              # papers that cite it
graph.most_central(n=10)                  # by betweenness centrality
graph.roots()                             # uncited papers
graph.leaves()                            # papers not cited by others
graph.shortest_path("a", "b")             # citation path
graph.cluster_by_keywords()
graph.show_dependencies("Title fragment")
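Depth-limited dependencies and citation paths are both breadth-first traversals of a directed graph whose edges point from a paper to the papers it cites. The sketch below illustrates them on a toy adjacency dict using only the standard library; the CITES data and both functions are hypothetical stand-ins, not litmap's API or implementation.

```python
from collections import deque

# Hypothetical toy graph: each paper id maps to the ids it cites.
CITES = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": [],
}


def dependencies(graph, paper_id, depth=2):
    """Papers reachable from paper_id by following at most `depth` citations."""
    seen, frontier = set(), {paper_id}
    for _ in range(depth):
        frontier = {c for p in frontier for c in graph.get(p, [])} - seen - {paper_id}
        seen |= frontier
    return seen


def shortest_path(graph, source, target):
    """Shortest citation chain from source to target, or None if unreachable."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for cited in graph.get(path[-1], []):
            if cited not in visited:
                visited.add(cited)
                queue.append(path + [cited])
    return None


print(sorted(dependencies(CITES, "a", depth=1)))  # ['b', 'c']
print(sorted(dependencies(CITES, "a", depth=2)))  # ['b', 'c', 'd']
print(shortest_path(CITES, "a", "d"))             # ['a', 'b', 'd']
```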

Requirements

  • Python ≥ 3.10
  • numpy, pandas, networkx, scipy, scikit-learn, rich, jinja2
  • Optional: pint (for full unit conversion in unitflow)

Running tests

pip install -e ".[dev]"
pytest tests/ -v

Contributing

Pull requests welcome! See CONTRIBUTING.md.


License

MIT

