Command-line research software for format-aware transcriptomic neurodegeneration risk scoring.
NeuroFate
NeuroFate: format-aware command-line software for endpoint-locked transcriptomic neurodegeneration risk scoring.
NeuroFate is a Python command-line research software package for reproducible donor/sample-level transcriptomic neurodegeneration-axis analysis. It inspects user-supplied expression and metadata tables, detects common table layouts, harmonizes gene/probe identifiers, locks endpoints before scoring, builds curated neurodegeneration-axis scores, writes research-use risk scores, and creates auditable reports.
Repository: https://github.com/sinhakrishnendu/NeuroFate.git
Current release-candidate version: 0.3.0
What NeuroFate Does
NeuroFate standardizes compact transcriptomic expression and metadata tables, locks user-specified endpoints, scores curated NeuroFate axes, writes research-use risk summaries, and produces reviewer-friendly audit reports.
Research-Use-Only Notice
NeuroFate is intended for research use only. It is not a clinical biomarker and is not validated for clinical diagnosis, patient-level decision-making, treatment selection, or care-delivery use. NeuroFate outputs are intended for cohort-level transcriptomic research, diagnosis-oriented research, endpoint-locked disease-state modelling, and reproducible software demonstrations.
Key Features
- CLI/PyPI-ready package with the console command neurofate.
- Format-aware ingestion through neurofate ingest.
- Complete public workflow through neurofate run.
- GEO series matrix support through direct parsing of !series_matrix_table_begin expression sections.
- CSV, TSV, TXT, and .gz input support.
- Genes-by-samples, samples-by-genes, and long-format expression support.
- Ensembl ID, gene-symbol, and microarray probe mapping support.
- Endpoint locking with explicit positive and negative classes.
- Curated NeuroFate axis scoring.
- Research-use risk scoring and Markdown reports.
- Leakage-audit and no-overclaiming audit scripts for repository-level validation.
- Endpoint adapter for compatibility between public CLI outputs and validation scripts.
- Real-world public GEO smoke test using GSE20141 and GPL570.
- Buildable wheel/sdist artifacts and reviewer-facing manuscript assets.
Installation
Install From PyPI
After public release:
python -m pip install neurofate
Install From GitHub
python -m pip install git+https://github.com/sinhakrishnendu/NeuroFate.git
Developer Install
git clone https://github.com/sinhakrishnendu/NeuroFate.git
cd NeuroFate/NeuroFate
python -m pip install -e ".[dev]"
Optional Extras
python -m pip install -e ".[plotting]"
python -m pip install -e ".[docs]"
python -m pip install -e ".[mps]"
python -m pip install -e ".[dev]"
The default package does not require Scanpy, AnnData, PyTorch, or matplotlib. PyTorch/MPS and plotting dependencies are optional.
Quick Start
Check the installation:
neurofate check-system
neurofate doctor
Run the bundled no-download demo:
neurofate run-demo
Run the full public workflow on a compact user dataset:
neurofate run \
--expression examples/format_examples/genes_by_samples/expression.tsv \
--metadata examples/format_examples/genes_by_samples/metadata.tsv \
--outdir results/neurofate_run
Expected top-level outputs include:
- ingest/standardized_expression.tsv.gz
- ingest/standardized_metadata.tsv
- axis/axis_scores.tsv
- axis/axis_feature_coverage.tsv
- axis/label_summary.tsv
- risk/neurofate_risk_scores.tsv
- risk/risk_score_report.md
- neurofate_run_report.md
- run_config.yaml
Public CLI Overview
Stable user-facing commands:
neurofate check-system
neurofate doctor
neurofate run-demo
neurofate ingest
neurofate build-axis-scores
neurofate score-risk
neurofate run
neurofate adapt-endpoint
neurofate check-system
Reports Python version, platform, and optional dependency availability.
neurofate doctor
Checks packaged resources and, in a repository checkout, core project files.
neurofate run-demo
Runs a small synthetic dataset without downloads and writes demo outputs under results/demo/.
neurofate ingest
Inspects expression and metadata tables, infers format, validates sample overlap and endpoint labels, maps genes/probes, writes standardized inputs, and reports warnings.
neurofate ingest \
--expression expression.tsv.gz \
--metadata metadata.tsv \
--outdir results/neurofate_ingest
neurofate build-axis-scores
Builds sample-level NeuroFate axis scores from compact or standardized inputs.
neurofate build-axis-scores \
--expression results/neurofate_ingest/standardized_expression.tsv.gz \
--metadata results/neurofate_ingest/standardized_metadata.tsv \
--axis-registry metadata/neurofate_axis_registry.tsv \
--sample-id-column sample_id \
--endpoint-column label__endpoint \
--positive-class 1 \
--negative-class 0 \
--outdir results/neurofate_axis
neurofate score-risk
Computes an exploratory research-use score from axis scores.
neurofate score-risk \
--axis-scores results/neurofate_axis/axis_scores.tsv \
--outdir results/neurofate_axis
neurofate run
Runs the complete public workflow:
ingest -> build-axis-scores -> score-risk -> report
neurofate run \
--expression expression.tsv.gz \
--metadata metadata.tsv \
--outdir results/neurofate_run
neurofate adapt-endpoint
Creates explicit endpoint aliases for validation scripts that expect task-specific label columns.
neurofate adapt-endpoint \
--metadata results/neurofate_run/ingest/standardized_metadata.tsv \
--endpoint-column label__endpoint \
--task pd_vs_control \
--outdir results/neurofate_run/adapted
Outputs:
- adapted_metadata.tsv
- endpoint_aliases.tsv
- endpoint_adapter_report.md
The adapter copies binary 0/1 labels only. It does not reinterpret biological class direction.
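The copy-only behavior described above can be illustrated with a minimal sketch. This is not NeuroFate's implementation: the function name adapt_endpoint_rows is hypothetical, and the alias naming pattern label__<task> is an assumption suggested by the label__pd_vs_control column used elsewhere in this README.

```python
def adapt_endpoint_rows(rows, endpoint_column, task):
    """Copy a binary 0/1 endpoint column into a task-specific alias column.

    Hypothetical helper for illustration only. Labels are copied as-is;
    class direction is never flipped or reinterpreted.
    """
    alias = f"label__{task}"  # assumed alias naming pattern
    adapted = []
    for row in rows:
        value = str(row[endpoint_column]).strip()
        if value not in {"0", "1"}:
            raise ValueError(f"non-binary endpoint label: {value!r}")
        new_row = dict(row)          # leave the input row untouched
        new_row[alias] = int(value)  # plain copy of the 0/1 label
        adapted.append(new_row)
    return adapted


metadata = [
    {"sample_id": "S01", "label__endpoint": "0"},
    {"sample_id": "S02", "label__endpoint": "1"},
]
adapted = adapt_endpoint_rows(metadata, "label__endpoint", "pd_vs_control")
```

Rejecting anything other than 0 or 1 keeps the alias a strict copy, so a validation script reading the alias column sees the same class direction that was locked at ingestion.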
neurofate make-report
make-report is a guarded repository workflow for generating reports from existing project outputs. It is useful in the full repository checkout but is not required for the public ingest/run workflow.
Advanced or experimental commands such as train-baseline, train-mps, validate-external, benchmark, and historical phase wrappers are retained for reproducibility. They are not the recommended first commands for new users.
Input Formats
NeuroFate public ingestion accepts compact text tables. It does not process raw FASTQ/FQ, SRA, CEL/CHP, H5AD/AnnData, or HDF5 single-cell containers.
Genes-by-Samples Matrix
gene_symbol S01 S02 S03
SNCA 0.2 0.4 0.8
GFAP 0.1 0.3 1.1
NEFL 1.2 1.0 0.7
Samples-by-Genes Matrix
sample_id SNCA GFAP NEFL
S01 0.2 0.1 1.2
S02 0.4 0.3 1.0
S03 0.8 1.1 0.7
Long Format
sample_id gene_symbol expression_value
S01 SNCA 0.2
S01 GFAP 0.1
S02 SNCA 0.4
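The three layouts above carry the same information in different shapes. As a rough illustration (not NeuroFate's own code), the sketch below pivots a long-format table into a genes-by-samples mapping and then transposes it. It assumes tab-delimited input with the column names shown above; the function names are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Tab-delimited copy of the long-format example above (delimiter is an assumption).
LONG_TABLE = (
    "sample_id\tgene_symbol\texpression_value\n"
    "S01\tSNCA\t0.2\n"
    "S01\tGFAP\t0.1\n"
    "S02\tSNCA\t0.4\n"
)


def long_to_genes_by_samples(text):
    """Pivot long-format rows into {gene: {sample: value}}."""
    matrix = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        matrix[row["gene_symbol"]][row["sample_id"]] = float(row["expression_value"])
    return dict(matrix)


def transpose(matrix):
    """Turn {gene: {sample: value}} into {sample: {gene: value}}."""
    flipped = defaultdict(dict)
    for gene, by_sample in matrix.items():
        for sample, value in by_sample.items():
            flipped[sample][gene] = value
    return dict(flipped)


genes_by_samples = long_to_genes_by_samples(LONG_TABLE)
samples_by_genes = transpose(genes_by_samples)
```

Note that GFAP has no value for S02 in the example, so the pivoted matrix is sparse; a real workflow has to decide how to treat such gaps.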
GEO Series Matrix
!Series_title "Example GEO dataset"
!Sample_geo_accession "GSM1" "GSM2"
!series_matrix_table_begin
"ID_REF" "GSM1" "GSM2"
"1007_s_at" 1.2 1.5
!series_matrix_table_end
NeuroFate reads the expression table between !series_matrix_table_begin and !series_matrix_table_end. Supply a separate metadata table with sample identifiers matching the expression columns.
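For readers who want to inspect a series matrix by hand, the marker-based extraction can be sketched as follows. This is an independent illustration using only the standard library, not NeuroFate's parser; the function name read_series_matrix_table is hypothetical and the input is assumed to be tab-delimited.

```python
import csv
import io

# Tab-delimited copy of the example above.
SERIES_MATRIX = "\n".join([
    '!Series_title\t"Example GEO dataset"',
    '!Sample_geo_accession\t"GSM1"\t"GSM2"',
    '!series_matrix_table_begin',
    '"ID_REF"\t"GSM1"\t"GSM2"',
    '"1007_s_at"\t1.2\t1.5',
    '!series_matrix_table_end',
])


def read_series_matrix_table(text):
    """Return (header, rows) for the block between the table markers."""
    table_lines = []
    inside = False
    for line in text.splitlines():
        marker = line.strip().lower()
        if marker == "!series_matrix_table_begin":
            inside = True
            continue
        if marker == "!series_matrix_table_end":
            break
        if inside:
            table_lines.append(line)
    if not table_lines:
        raise ValueError("no expression rows found after !series_matrix_table_begin")
    # csv.reader strips the double quotes around identifiers.
    rows = list(csv.reader(io.StringIO("\n".join(table_lines)), delimiter="\t"))
    return rows[0], rows[1:]


header, rows = read_series_matrix_table(SERIES_MATRIX)
```

The header columns after ID_REF are the sample accessions that the separate metadata table needs to match.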
Ensembl-ID Matrix
ensembl_gene_id S01 S02
ENSG00000145335 0.2 0.4
ENSG00000131095 0.1 0.3
NeuroFate maps curated axis genes using metadata/neurofate_axis_gene_aliases.tsv.
Microarray Probe Matrix With Gene Map
Expression:
ID_REF GSM1 GSM2
probe_SNCA 0.2 0.4
probe_GFAP 0.1 0.3
Probe map:
probe_id gene_symbol
probe_SNCA SNCA
probe_GFAP GFAP
Command:
neurofate run \
--expression expression.tsv.gz \
--metadata metadata.tsv \
--gene-map probe_map.tsv \
--outdir results/neurofate_run
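To show what a probe map makes possible, here is a minimal sketch of collapsing probe-level values to gene-level values. It averages probes that map to the same gene, which is an assumption made for illustration; this README does not state which collapsing rule NeuroFate applies. The function name collapse_probes is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes(expression, probe_map):
    """Collapse {probe: {sample: value}} to {gene: {sample: mean value}}.

    Probes absent from probe_map are returned separately so they can be
    reported rather than silently dropped.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    unmapped = []
    for probe, by_sample in expression.items():
        gene = probe_map.get(probe)
        if gene is None:
            unmapped.append(probe)
            continue
        for sample, value in by_sample.items():
            grouped[gene][sample].append(value)
    collapsed = {
        gene: {sample: mean(values) for sample, values in by_sample.items()}
        for gene, by_sample in grouped.items()
    }
    return collapsed, unmapped


# Hypothetical probe identifiers; two probes map to SNCA.
expression = {
    "probe_SNCA_a": {"GSM1": 1.0, "GSM2": 2.0},
    "probe_SNCA_b": {"GSM1": 3.0, "GSM2": 4.0},
    "probe_GFAP": {"GSM1": 0.5, "GSM2": 1.5},
    "probe_other": {"GSM1": 9.0, "GSM2": 9.0},
}
probe_map = {"probe_SNCA_a": "SNCA", "probe_SNCA_b": "SNCA", "probe_GFAP": "GFAP"}
collapsed, unmapped = collapse_probes(expression, probe_map)
```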
Compressed .gz files are supported for CSV, TSV, TXT, and GEO series matrix inputs.
Metadata Requirements
Metadata must contain:
- A sample identifier column such as sample_id, geo_accession, donor_id, subject_id, or participant_id.
- An endpoint column such as diagnosis, disease_state, condition, group, status, phenotype, or label.
- Positive and negative classes, either inferred or passed explicitly.
Example:
sample_id diagnosis
S01 Control
S02 AD
S03 AD
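Before running NeuroFate, it can help to check which candidate columns a metadata header actually contains. The sketch below does this with the column names listed above. It is a pre-flight check written for this README, not NeuroFate's inference logic, and the function names are hypothetical.

```python
# Candidate names taken from the requirements list above.
SAMPLE_ID_CANDIDATES = ("sample_id", "geo_accession", "donor_id", "subject_id", "participant_id")
ENDPOINT_CANDIDATES = ("diagnosis", "disease_state", "condition", "group", "status", "phenotype", "label")


def find_candidates(header, candidates):
    """Return header columns whose lowercased name is a known candidate."""
    lowered = {name.strip().lower(): name for name in header}
    return [lowered[name] for name in candidates if name in lowered]


def pick_single(header, candidates, role):
    """Return the only matching column, or raise if none or several match."""
    hits = find_candidates(header, candidates)
    if len(hits) != 1:
        raise ValueError(f"{role}: expected exactly one candidate column, found {hits}")
    return hits[0]


header = ["sample_id", "diagnosis"]
sample_column = pick_single(header, SAMPLE_ID_CANDIDATES, "sample identifier")
endpoint_column = pick_single(header, ENDPOINT_CANDIDATES, "endpoint")
```

When more than one candidate is present, passing the column explicitly on the command line avoids relying on any inference.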
Explicit endpoint locking:
neurofate run \
--expression expression.tsv.gz \
--metadata metadata.tsv \
--endpoint-column diagnosis \
--positive-class AD \
--negative-class Control \
--outdir results/neurofate_run
Endpoint locking ensures the disease-state contrast is defined before score interpretation. NeuroFate does not scan all metadata labels to choose the strongest result.
Optional covariates such as age, sex, postmortem interval, brain region, and batch can be retained in source metadata, but the public axis-scoring workflow uses only the locked endpoint label and expression values.
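The idea behind endpoint locking can be sketched in a few lines: the positive and negative classes are fixed first, and every sample is mapped to 1 or 0 against that fixed contrast. The function below is a hypothetical illustration, not NeuroFate's code. In particular, raising on labels outside the contrast is an assumption; this README does not say how NeuroFate handles them.

```python
def lock_endpoint(labels, positive_class, negative_class):
    """Map {sample_id: raw label} to {sample_id: 1 or 0} for a fixed contrast."""
    if positive_class == negative_class:
        raise ValueError("positive and negative classes must differ")
    locked = {}
    unexpected = {}
    for sample_id, raw in labels.items():
        value = raw.strip()
        if value == positive_class:
            locked[sample_id] = 1
        elif value == negative_class:
            locked[sample_id] = 0
        else:
            unexpected[sample_id] = raw
    if unexpected:
        # Assumed behavior: surface labels outside the locked contrast.
        raise ValueError(f"labels outside the locked contrast: {unexpected}")
    return locked


# Labels from the example metadata table above.
labels = {"S01": "Control", "S02": "AD", "S03": "AD"}
locked = lock_endpoint(labels, positive_class="AD", negative_class="Control")
```

Because the contrast is an explicit argument rather than something chosen after looking at scores, the same inputs always produce the same labels.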
Output File Dictionary
neurofate ingest writes:
- standardized_expression.tsv.gz: NeuroFate axis-gene expression matrix with genes as rows and samples as columns.
- standardized_metadata.tsv: standardized sample metadata with sample_id, endpoint, label__endpoint, and research_use_only.
- input_schema_detected.tsv: detected delimiter, orientation, endpoint settings, feature counts, and retained genes.
- expression_metadata_join.tsv: expression/metadata sample-overlap audit.
- gene_mapping_report.tsv: input feature mapping and retention status.
- ingest_warnings.tsv: non-fatal warnings.
- ingest_report.md: human-readable ingest report.
- run_config.yaml: reproducibility settings for ingestion.
neurofate run additionally writes:
- axis/axis_scores.tsv: sample-level axis scores.
- axis/axis_feature_coverage.tsv: mapped and missing genes per axis.
- axis/label_summary.tsv: locked endpoint label counts.
- axis/warnings.tsv: scoring warnings.
- risk/neurofate_risk_scores.tsv: exploratory research-use sample scores.
- risk/risk_score_report.md: risk-score report with research-use-only notice.
- neurofate_run_report.md: complete workflow report.
- run_config.yaml: top-level workflow configuration.
neurofate adapt-endpoint writes:
- adapted_metadata.tsv: standardized metadata plus endpoint aliases.
- endpoint_aliases.tsv: alias mapping audit.
- endpoint_adapter_report.md: human-readable adapter report.
Real-World Example: GSE20141
GSE20141 is a public GEO laser-dissected substantia nigra pars compacta microarray cohort for Parkinson's disease versus control research. The final public CLI smoke test used:
- GSE20141_series_matrix.txt.gz
- GPL570.annot.gz
- parsed sample metadata
- GPL570 NeuroFate axis probe map
Command:
neurofate run \
--expression data/raw/end_user_smoke/gse20141/GSE20141_series_matrix.txt.gz \
--metadata results/end_user_smoke/gse20141/sample_metadata.tsv \
--gene-map results/end_user_smoke/gse20141/gpl570_axis_probe_mapping.tsv \
--outdir results/end_user_smoke/gse20141/neurofate_public_run_final \
--sample-id-column geo_accession \
--endpoint-column label__pd_vs_control \
--positive-class 1 \
--negative-class 0 \
--orientation auto \
--min-axis-genes 10
Result:
- Run status: passed.
- Samples matched: 18/18.
- Label counts: 10 PD and 8 controls.
- Retained NeuroFate genes: 29/30.
- Retained GPL570 probes: 79.
- Axes scored: 10/10.
- Research-use risk scores generated for 18 samples.
- No fatal ingest errors.
- Informative warnings: incomplete axis-gene coverage (29/30), unmapped non-axis probes, and multiple probes per retained gene.
Outputs are written under:
results/end_user_smoke/gse20141/neurofate_public_run_final/
Detailed smoke-test documentation:
docs/real_world_geo_smoke_test_gse20141.md
results/reports/final_gse20141_public_cli_smoke_test.md
NeuroFate Axes
The default axis registry is stored in metadata/neurofate_axis_registry.tsv and bundled as package data.
- neuronal_vulnerability_axis: inhibitory/excitatory neuronal vulnerability markers and neurofilament genes.
- synuclein_mitochondrial_axis: synuclein, mitochondrial stress, and PD-relevant genes.
- astrocyte_stress_axis: astrocyte activation and stress-associated genes.
- inflammatory_microglial_axis: microglial and inflammatory response genes.
- myelin_oligodendrocyte_axis: myelin and oligodendrocyte genes.
- proteostasis_autophagy_axis: proteostasis, autophagy, and lysosomal/mitochondrial stress genes.
- amyloid_tau_axis: amyloid, presenilin, tau, and APOE-related genes.
- immune_antigen_presentation_axis: immune and antigen-presentation genes.
- vascular_barrier_axis: vascular, barrier, and inflammatory interaction genes.
- global_neurodegeneration_axis: broad neurodegeneration-associated axis.
Axes are research summaries of available expression features. They are not by themselves causal mechanisms or care-delivery tools.
Reproducibility
Install from source:
python -m pip install -e .
Run the demo:
neurofate run-demo
Run the real GEO smoke test after acquiring the public files:
neurofate run \
--expression data/raw/end_user_smoke/gse20141/GSE20141_series_matrix.txt.gz \
--metadata results/end_user_smoke/gse20141/sample_metadata.tsv \
--gene-map results/end_user_smoke/gse20141/gpl570_axis_probe_mapping.tsv \
--outdir results/end_user_smoke/gse20141/neurofate_public_run_final \
--sample-id-column geo_accession \
--endpoint-column label__pd_vs_control \
--positive-class 1 \
--negative-class 0 \
--orientation auto \
--min-axis-genes 10
GSE20141 checksums used in the local smoke test:
- GSE20141_series_matrix.txt.gz: 8975344b5a4715032bd07e08a7a94a68b811fddc59b1fbc53dcf204d1005cf4b
- GPL570.annot.gz: d7cd44352127b1e34f3a720ebea86093ef255a38f1612a85a2962b71bde8f394
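Downloaded files can be compared against the checksums above before rerunning the smoke test. The sketch below assumes the digests are SHA-256, which is inferred from their 64-character hexadecimal length and is not stated in this README. The function names are hypothetical.

```python
import hashlib

# Digests copied from the list above; SHA-256 is an assumption based on length.
EXPECTED_SHA256 = {
    "GSE20141_series_matrix.txt.gz": "8975344b5a4715032bd07e08a7a94a68b811fddc59b1fbc53dcf204d1005cf4b",
    "GPL570.annot.gz": "d7cd44352127b1e34f3a720ebea86093ef255a38f1612a85a2962b71bde8f394",
}


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected_hex):
    """Return True when the file's SHA-256 digest matches expected_hex."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, verify_checksum("data/raw/end_user_smoke/gse20141/GSE20141_series_matrix.txt.gz", EXPECTED_SHA256["GSE20141_series_matrix.txt.gz"]) should return True if the local copy matches the file used in the smoke test.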
Build the package:
python -m build --outdir dist_final
python -m twine check dist_final/*
Compile the manuscript:
latexmk -pdf manuscript/bioinformatics/neurofate_bioinformatics_full_methods_paper.tex
Testing
Core checks:
python -m py_compile scripts/*.py neurofate/*.py
python -m pytest \
tests/test_ingest_geo_series_matrix.py \
tests/test_ingest_format_detection.py \
tests/test_ingest_orientation_detection.py \
tests/test_ingest_gene_identifier_mapping.py \
tests/test_ingest_expression_metadata_join.py \
tests/test_neurofate_run_end_to_end.py \
tests/test_endpoint_adapter.py \
tests/test_public_cli_reports.py \
tests/test_research_use_only_outputs.py \
tests/test_pypi_packaging.py \
tests/test_cli_public_commands.py \
tests/test_bioinformatics_full_methods_manuscript.py
Test coverage includes:
- Public CLI availability.
- GEO series matrix parsing.
- CSV/TSV/GZ detection.
- Expression orientation detection.
- Ensembl and probe mapping.
- Expression/metadata sample joins.
- End-to-end neurofate run.
- Endpoint adapter safety.
- Research-use-only report language.
- Bioinformatics manuscript claim-safety checks.
Packaging and Release
Version: 0.3.0
dist/ is reserved for PyPI artifacts. Review ZIPs and manuscript/reviewer
packages should use release_artifacts/ or another explicit review directory.
Build artifacts:
python -m build --outdir dist_final
python -m twine check dist_final/*
Historical reviewer archive builders remain separate from PyPI artifacts. When used, they write review ZIPs such as:
- release_artifacts/neurofate_source_release_<timestamp>.zip
- release_artifacts/neurofate_results_review_<timestamp>.zip
Before release:
- Confirm version consistency in pyproject.toml, neurofate/__init__.py, CITATION.cff, codemeta.json, CHANGELOG.md, README, docs, and manuscript.
- Confirm tests pass.
- Confirm wheel and source distribution pass twine check.
- Confirm GitHub repository visibility.
- Create release tag v0.3.0.
- Optionally dry-run TestPyPI.
- Publish to PyPI.
- Archive a GitHub release on Zenodo and update citation metadata with DOI.
Do not bundle large public datasets, controlled data, raw matrices, trained real-data models, or generated heavy outputs in the PyPI package.
Safety And Memory Design
NeuroFate public commands operate on compact donor/sample-level or axis-gene/probe tables. The public ingestion workflow does not process raw FASTQ/SRA, CEL/CHP, or H5AD/AnnData inputs or dense genome-wide converted matrices, and it does not run UMAP or clustering.
Current Validation Status
The current release is validated as research software through public CLI tests, format-aware ingestion tests, a bundled tiny demo, a real-world GSE20141 GEO smoke test, package build checks, and no-overclaiming audits. Biological cohort results are demonstration evidence and should not be interpreted as care-delivery validation.
Reviewer report generators remain lightweight and can be run from existing outputs, for example:
python scripts/51_generate_end_user_report.py --tables-dir results/tables --reports-dir results/reports
Troubleshooting
Sample IDs Do Not Match
Inspect:
ingest/expression_metadata_join.tsv
Common causes include whitespace, punctuation differences, using sample titles instead of accessions, or choosing the wrong sample ID column. Rerun with --sample-id-column.
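A quick way to see which identifiers fail to line up is to compare the two ID sets directly. The sketch below is a hypothetical diagnostic, not part of NeuroFate. It strips whitespace and ignores case before comparing; that normalization is an assumption chosen for the example.

```python
def normalize_id(sample_id):
    """Strip surrounding whitespace and ignore case (assumed normalization)."""
    return sample_id.strip().upper()


def compare_sample_ids(expression_ids, metadata_ids):
    """Report matched and unmatched sample identifiers after normalization."""
    expr = {normalize_id(s) for s in expression_ids}
    meta = {normalize_id(s) for s in metadata_ids}
    return {
        "matched": sorted(expr & meta),
        "expression_only": sorted(expr - meta),
        "metadata_only": sorted(meta - expr),
    }


# Hypothetical identifiers: one has trailing whitespace, one is a sample title.
report = compare_sample_ids(
    ["GSM1", "GSM2 ", "GSM3"],
    ["gsm1", "GSM2", "PD case 3"],
)
```

Entries under metadata_only that look like titles rather than accessions usually point to the wrong sample ID column.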
Ambiguous Endpoint Column
Rerun with explicit endpoint settings:
--endpoint-column diagnosis --positive-class AD --negative-class Control
Too Few Axis Genes
Check:
ingest/gene_mapping_report.tsv
axis/axis_feature_coverage.tsv
Use --gene-map for microarray probes or an alias table for Ensembl IDs.
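The kind of per-axis coverage summary found in axis/axis_feature_coverage.tsv can be approximated with a short sketch. The axis names and gene memberships below are hypothetical placeholders, not entries from the NeuroFate registry, and the threshold semantics (an axis is scorable when it has at least min_axis_genes mapped genes) are an assumption about what --min-axis-genes means.

```python
def axis_coverage(axis_genes, available_genes, min_axis_genes=1):
    """Summarize mapped and missing genes for each axis."""
    available = set(available_genes)
    report = {}
    for axis, genes in axis_genes.items():
        mapped = sorted(g for g in genes if g in available)
        missing = sorted(g for g in genes if g not in available)
        report[axis] = {
            "mapped": mapped,
            "missing": missing,
            # Assumed rule: enough mapped genes makes the axis scorable.
            "scorable": len(mapped) >= min_axis_genes,
        }
    return report


# Hypothetical axes and memberships, for illustration only.
demo_axes = {
    "demo_axis_a": ["SNCA", "NEFL"],
    "demo_axis_b": ["GFAP", "AQP4"],
}
coverage = axis_coverage(demo_axes, ["SNCA", "GFAP", "NEFL"], min_axis_genes=2)
```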
Unsupported Raw Formats
The public CLI rejects FASTQ/FQ, SRA, CEL/CHP, H5AD/AnnData, and HDF5 containers. Convert outside NeuroFate to compact sample-level or target-gene tables first.
Missing Gene Map for Microarray
Prepare a table with at least:
probe_id gene_symbol
Then pass:
--gene-map probe_map.tsv
GEO File Not Parsed
Confirm the file contains:
!series_matrix_table_begin
If the file is a SOFT/MINiML/platform annotation rather than a series matrix expression table, prepare the expression table separately.
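The marker check can be scripted for plain or gzip-compressed files. This is a standalone sketch using the standard library; the function names are hypothetical and it makes no claim about how NeuroFate itself detects the format.

```python
import gzip


def open_text(path):
    """Open a plain or .gz text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def has_series_matrix_table(path):
    """Return True if the file contains a !series_matrix_table_begin line."""
    with open_text(path) as handle:
        return any(
            line.strip().lower() == "!series_matrix_table_begin"
            for line in handle
        )
```

A False result for a file downloaded from GEO suggests it is a SOFT, MINiML, or platform annotation file rather than a series matrix.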
Low Coverage Warnings
Low axis-gene coverage does not necessarily mean the run failed. It means interpretation should be cautious and platform coverage should be reported.
Citation
Use CITATION.cff for the software citation. Cite the Bioinformatics manuscript after publication and cite each external dataset according to its source-specific instructions.
Manuscript citation placeholder:
Ghosh N, Sinha K. NeuroFate: format-aware command-line software for endpoint-locked transcriptomic neurodegeneration risk scoring. Bioinformatics. In preparation.
Zenodo DOI placeholder: add after archiving the release.
License
NeuroFate is released under the MIT License. See LICENSE.
Contributing
See:
- CONTRIBUTING.md
- CODE_OF_CONDUCT.md
Contributions should preserve the research-use-only safety boundary, avoid care-delivery claims, and keep public commands reproducible on compact donor/sample-level inputs.
File details
Details for the file neurofate-0.3.0.tar.gz.
File metadata
- Download URL: neurofate-0.3.0.tar.gz
- Upload date:
- Size: 155.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d583f575e067b3620af9806f79908a050d93f7519d7b39477f238f22b9da3e77 |
| MD5 | 2f7448b3247e2deef4ab79c5799adfe2 |
| BLAKE2b-256 | 318a1db8c6d77169a5c23909c66407db8add68f56db9ae6d132091c0457ecc0b |
File details
Details for the file neurofate-0.3.0-py3-none-any.whl.
File metadata
- Download URL: neurofate-0.3.0-py3-none-any.whl
- Upload date:
- Size: 43.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8ba16a9d984288f1604ba5eb87aae2953da1df3e9e84b2a39b6d3a68e590dbde |
| MD5 | 4360949bbb83f8724f80a07f0baa25a8 |
| BLAKE2b-256 | f3dbdbc89cb99e69e8869315cbe5842c97b3d010662ef365860dbf0de4fe750a |