Local map reliability from cryo-EM density and half-maps
cryoem-halfmap-qc
Python tools for local map reliability in cryo-EM reconstructions: density statistics, half-map reproducibility, windowed local FSC (Å), a reproducibility score (H_repro), and build/caution/omit zones.
The goal is to test whether inexpensive map features track half-map cross-correlation and local FSC well enough to guide modeling. This is not a claim that density alone defines molecular flexibility.
All volumes use NumPy 3D arrays in (Z, Y, X) order (section, row, column), consistent with typical mrcfile layouts.
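To make the axis convention concrete, here is a minimal sketch on a synthetic array (no map file is read; the sizes and the marked voxel are arbitrary). With (Z, Y, X) order, a voxel at Cartesian position (x, y, z) is addressed as volume[z, y, x].

```python
import numpy as np

# Synthetic volume: 4 sections (Z), 3 rows (Y), 2 columns (X).
nz, ny, nx = 4, 3, 2
volume = np.zeros((nz, ny, nx), dtype=np.float32)

# Mark one voxel at x=1, y=2, z=3; note the reversed index order.
x, y, z = 1, 2, 3
volume[z, y, x] = 1.0

# Recover the position of the brightest voxel; unravel_index returns (z, y, x).
z_hit, y_hit, x_hit = np.unravel_index(np.argmax(volume), volume.shape)
print(volume.shape)                               # (4, 3, 2)
print((int(x_hit), int(y_hit), int(z_hit)))       # (1, 2, 3)
```

Passing coordinates in (x, y, z) order to an array laid out this way is a common source of transposed maps, so it is worth checking the shape of a volume right after loading.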
Install
PyPI: not published yet (pip install cryoem-halfmap-qc will fail until the first release is uploaded). See Publishing to PyPI below.
Until then, install from GitHub or a local checkout:
git clone https://github.com/sarthaktexas/cryoem-halfmap-qc.git
cd cryoem-halfmap-qc
pip install -e .
# or without cloning:
pip install "git+https://github.com/sarthaktexas/cryoem-halfmap-qc.git@v0.3.2"
This installs the halfmap-qc command on your PATH (PyPI package name will be cryoem-halfmap-qc once published).
Help & interactive mode:
halfmap-qc # interactive menu (when run in a terminal)
halfmap-qc help # full command reference
halfmap-qc --help # argparse summary + examples
halfmap-qc cohort --help # flags for one subcommand
halfmap-qc interactive # menu explicitly
Dependencies: NumPy, SciPy, mrcfile, Matplotlib, gemmi, pandas.
Data layout
Cryo-EM maps are not stored in this repository (too large for git). After cloning, create local directories:
data/emd_<ID>-<label>/ # deposited map + half-maps (.map or .mrc)
outputs/emd_<ID>/ # pipeline products (created by scripts)
cohort/manifest.csv # EMDB IDs, relative paths, contours, validation labels
Download deposited and half maps from EMDB. Use the depositor-recommended contour for each entry (listed in cohort/manifest.csv). See docs/COHORT.md for download status and pipeline progress.
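The directory naming above can be derived from a manifest row. The sketch below is illustrative only: the column names emd_id, label, and contour and the helper entry_paths are assumptions, and the real cohort/manifest.csv may use different headers.

```python
import csv
import io
from pathlib import Path

# Hypothetical manifest excerpt; real column names in cohort/manifest.csv may differ.
MANIFEST_TEXT = """emd_id,label,contour
49450,mgtA_e2p+e1,0.116
"""

def entry_paths(row, root=Path(".")):
    """Build the data directory, output directory, and contour for one manifest row."""
    emd_id = row["emd_id"]
    data_dir = root / "data" / f"emd_{emd_id}-{row['label']}"
    out_dir = root / "outputs" / f"emd_{emd_id}"
    return data_dir, out_dir, float(row["contour"])

rows = list(csv.DictReader(io.StringIO(MANIFEST_TEXT)))
data_dir, out_dir, contour = entry_paths(rows[0])
print(data_dir.as_posix())  # data/emd_49450-mgtA_e2p+e1
print(out_dir.as_posix())   # outputs/emd_49450
print(contour)              # 0.116
```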
Quick start
Run from the project root (where data/ and cohort/manifest.csv live).
Single-map features:
halfmap-qc features path/to/map.mrc --out map_features.npz --float32
# shorthand (legacy): halfmap-qc path/to/map.mrc --out map_features.npz --float32
Typical workflow (features on avg-of-halves; reliability MRCs on deposited primary grid):
EMD=49450
CONTOUR=0.116
DATA=data/emd_${EMD}-mgtA_e2p+e1
halfmap-qc analyze \
--features "${DATA}/emd_${EMD}_avg_features_t0116.npz" \
--half1 "${DATA}/emd_${EMD}_half_map_1.map" \
--half2 "${DATA}/emd_${EMD}_half_map_2.map" \
--reference "${DATA}/emd_${EMD}.map" \
--contour "${CONTOUR}" \
--out-dir "outputs/emd_${EMD}/analysis"
halfmap-qc reliability --emd-id "${EMD}" --contour "${CONTOUR}" \
--features "${DATA}/emd_${EMD}_avg_features_t0116.npz" \
--halfmap-npz "outputs/emd_${EMD}/analysis/halfmap_metrics.npz"
Cohort batch (all active manifest entries with local data):
halfmap-qc cohort --pending
ARC / SLURM (one map per array task; save a local *.sbatch — not in git):
# After pip install -e . and rsync data/ + cohort/manifest.csv to $SCRATCH/thesis
N=$(($(halfmap-qc cohort-ids | wc -l) - 1))
sbatch --account=wrz135 --array=0-${N} --cpus-per-task=4 --mem=32G --time=00:45:00 \
--wrap='halfmap-qc cohort --emd-id $(halfmap-qc cohort-ids | sed -n "$((SLURM_ARRAY_TASK_ID+1))p")'
Or save a multi-line script as e.g. ~/halfmap-qc_array.sbatch (gitignored) and sbatch --array=0-${N} ~/halfmap-qc_array.sbatch.
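The array indexing in the one-liner above is easy to get wrong by one: SLURM task IDs start at 0, sed line numbers start at 1, and N is the last index rather than the count. The Python sketch below shows the same mapping; emd_id_for_task is a hypothetical helper, and the ID list is a placeholder for the output of halfmap-qc cohort-ids.

```python
import os

def emd_id_for_task(ids, environ=os.environ):
    """Return the ID assigned to the current SLURM array task (0-based)."""
    task_id = int(environ["SLURM_ARRAY_TASK_ID"])
    if not 0 <= task_id < len(ids):
        raise IndexError(f"task {task_id} is outside 0-{len(ids) - 1}")
    return ids[task_id]

# Placeholder IDs standing in for the lines printed by `halfmap-qc cohort-ids`.
ids = ["49450", "ID_B", "ID_C"]

# Value to pass as --array=0-${N}: the last valid index, not the number of IDs.
last_index = len(ids) - 1

# Simulate two array tasks by supplying the environment explicitly.
first = emd_id_for_task(ids, {"SLURM_ARRAY_TASK_ID": "0"})
last = emd_id_for_task(ids, {"SLURM_ARRAY_TASK_ID": str(last_index)})
print(last_index, first, last)  # 2 49450 ID_C
```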
CLI (halfmap-qc)
| Command | Purpose |
|---|---|
| (no args, TTY) | Interactive menu |
| halfmap-qc help | Full reference + install notes |
| halfmap-qc features | Local density / multiscale features → .npz |
| halfmap-qc analyze | Windowed half-map CC + feature correlations |
| halfmap-qc reliability | Reliability score, build zones, MRC export |
| halfmap-qc cohort | Batch pipeline from cohort/manifest.csv |
| halfmap-qc cohort-ids | Print EMDB IDs (for SLURM array jobs) |
| halfmap-qc interactive | Interactive menu (same as bare halfmap-qc) |
Legacy: python -m cryoem_mrc still works (same as halfmap-qc features).
Publishing to PyPI
One-time setup:
- Create an account at pypi.org (and optionally test.pypi.org for a dry run).
- On PyPI → Your projects → Add new project → name it cryoem-halfmap-qc (or claim it when uploading).
- On PyPI → Account settings → Publishing → Add a new pending publisher:
  - PyPI project: cryoem-halfmap-qc
  - Owner: sarthaktexas (your GitHub user/org)
  - Repository: cryoem-halfmap-qc
  - Workflow: publish.yml
  - Environment: (leave blank unless you use one)
Release:
# bump version in pyproject.toml first, then:
git add pyproject.toml cryoem_mrc/__init__.py
git commit -m "Release v0.3.2"
git tag v0.3.2
git push origin main --tags
On GitHub → Releases → Draft a new release → choose tag v0.3.2 → Publish release. The .github/workflows/publish.yml workflow builds the wheel and uploads to PyPI.
Test install after publish:
pip install cryoem-halfmap-qc
halfmap-qc --version
Manual upload (without GitHub Actions):
pip install build twine
python -m build
twine upload dist/*
Scripts (thesis / optional)
Thesis figure runners (scripts/rerun_all_figures.py, scripts/run_cohort_summary_figures.py, Figma export scripts, etc.) and cryoem_mrc/thesis_figures.py are local-only (gitignored), like figma-plugins/, so a fresh clone does not include them. To use them, work on a machine that already has those files, or keep a local copy from before they were untracked.
Python API (high level)
import numpy as np
from cryoem_mrc import load_full_and_half_maps, run_pipeline, half_map_local_metrics
from cryoem_mrc.reliability import compute_reliability_maps, classify_build_zones
bundle = load_full_and_half_maps(
"full.mrc", "half1.mrc", "half2.mrc", dtype=np.float32, resample_if_needed=True
)
metrics = half_map_local_metrics(bundle.half1, bundle.half2, window=5)
# metrics["windowed_halfmap_correlation"], etc.
features = run_pipeline("map.mrc", use_float32=True)
reliability = compute_reliability_maps(
bundle.half1, bundle.half2,
density_normalized=features["density_normalized"],
window=5,
)
zones = classify_build_zones(reliability["reliability_score"])
Package modules: io, map_grid, local_stats, multiscale, half_map_repro, local_fsc, mechanics, reliability, analysis, structure_validation. Path helpers: cryoem_mrc/repo_paths.py.
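The Methods summary describes reliability_score as an in-mask percentile that is split into build/caution/omit terciles. The NumPy sketch below illustrates that idea only. The helper names in_mask_percentile and tercile_zones, the integer labels, and the choice to map the top tercile to "build" are assumptions, not the behavior of classify_build_zones.

```python
import numpy as np

def in_mask_percentile(values, mask):
    """Rank voxels inside `mask` as percentiles in (0, 100]; NaN outside the mask."""
    score = np.full(values.shape, np.nan, dtype=np.float64)
    inside = values[mask]
    # Assign ranks 1..n by ascending value (ties broken by position).
    ranks = np.empty(inside.size, dtype=np.float64)
    ranks[np.argsort(inside, kind="stable")] = np.arange(1, inside.size + 1)
    score[mask] = 100.0 * ranks / inside.size
    return score

def tercile_zones(score):
    """Label voxels 0 = omit, 1 = caution, 2 = build; -1 outside the mask."""
    zones = np.full(score.shape, -1, dtype=np.int8)
    valid = ~np.isnan(score)
    zones[valid] = np.digitize(score[valid], [100.0 / 3.0, 200.0 / 3.0])
    return zones

# Toy volume with 8 voxels; the voxel with value 0 lies outside the mask.
values = np.arange(8, dtype=np.float64).reshape(2, 2, 2)
mask = values > 0
zones = tercile_zones(in_mask_percentile(values, mask))
print(zones.ravel().tolist())  # [-1, 0, 0, 1, 1, 2, 2, 2]
```

Ranking inside the mask keeps solvent voxels from dominating the percentile scale, which is presumably why the score is defined in-mask.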
Methods summary
- Windowed half-map correlation is the fast internal reproducibility target for feature validation; local FSC resolution (Å) is the field-standard reference.
- Local FSC is computed in-repo (cryoem_mrc.local_fsc); external BlocRes / ResMap / MonoRes maps are not loaded.
- H_repro is the windowed gradient-constraint map V (legacy export name; ranked as reliability_score); reliability_score is an in-mask percentile used for build/caution/omit terciles. Resolvability gating uses windowed half-map CC or local FSC, not a separate disagreement map.
- Local variance is often the strongest single feature predictor of windowed half-map correlation.
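For readers unfamiliar with windowed half-map correlation, the sketch below computes a Pearson correlation between two volumes inside every cubic window. It is a simplified illustration, not the code in half_map_repro: the function name windowed_correlation, the valid-only output (no padding), and the eps guard are assumptions, and the random volumes stand in for real half-maps.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def windowed_correlation(half1, half2, window=5, eps=1e-12):
    """Pearson correlation of two volumes in each cubic window (valid positions only)."""
    shape = (window, window, window)
    # Views of shape (Z-w+1, Y-w+1, X-w+1, w, w, w); no data is copied here.
    w1 = sliding_window_view(half1, shape)
    w2 = sliding_window_view(half2, shape)
    axes = (-3, -2, -1)
    # Mean-centering materializes the windows, so this is for small demos only.
    d1 = w1 - w1.mean(axis=axes, keepdims=True)
    d2 = w2 - w2.mean(axis=axes, keepdims=True)
    cov = (d1 * d2).sum(axis=axes)
    norm = np.sqrt((d1 ** 2).sum(axis=axes) * (d2 ** 2).sum(axis=axes))
    # Flat windows have zero norm; eps maps them to a correlation of 0.
    return cov / np.maximum(norm, eps)

# Synthetic "half-maps": a shared signal plus independent noise in each half.
rng = np.random.default_rng(0)
signal = rng.normal(size=(12, 12, 12))
half1 = signal + 0.1 * rng.normal(size=signal.shape)
half2 = signal + 0.1 * rng.normal(size=signal.shape)

cc = windowed_correlation(half1, half2, window=5)
print(cc.shape)            # (8, 8, 8)
print(float(cc.mean()))    # close to 1 because the noise is small
```

On full-size maps a filter-based formulation (for example box filters over the products and squares) would use far less memory than materialized windows.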
Thesis prose: full narrative draft in docs/THESIS_NARRATIVE.md. Writing guide and defense notes in docs/THESIS_AND_PUBLICATION.md.
Tests
python -m unittest discover -s tests -v
Citation
Before the manuscript is published, cite the software with the Zenodo concept DOI (resolves to the latest release; pin v0.3.2 or a commit hash for exact reproducibility):
@software{mohanty2026cryoem_halfmap_qc,
author = {Mohanty, Sarthak},
title = {cryoem-halfmap-qc: local map reliability from cryo-EM density and half-maps},
year = {2026},
doi = {10.5281/zenodo.20618526},
url = {https://doi.org/10.5281/zenodo.20618526},
version = {0.3.2}
}
GitHub also reads CITATION.cff for the Cite this repository button.
After publication, cite the paper as the primary reference. Also cite this Zenodo archive when you need the exact pipeline version used in the work.
When the manuscript exists, add a preferred-citation block to CITATION.cff (template included there) and drop the BibTeX for the article into this section.
License
MIT License. See LICENSE.
File details
Details for the file cryoem_halfmap_qc-0.3.2.tar.gz.
File metadata
- Download URL: cryoem_halfmap_qc-0.3.2.tar.gz
- Upload date:
- Size: 157.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 46ffc3990866544d0dbdb17a1105fc7f159ca0d950a133633d56602a6d2660b8 |
| MD5 | addb462ebf1ca9275ad446b6f323665b |
| BLAKE2b-256 | 9be1ec6f6fe18979b6113443df97f0890d7088be41a1adc2c3c7ed51951dfd82 |
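A downloaded archive can be checked against the SHA256 digest listed above with the standard library. This is a generic sketch: sha256_of is a hypothetical helper, and the file is assumed to sit in the current directory under its published name.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large archives are never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Digest copied from the hash table for the source distribution.
EXPECTED = "46ffc3990866544d0dbdb17a1105fc7f159ca0d950a133633d56602a6d2660b8"

archive = Path("cryoem_halfmap_qc-0.3.2.tar.gz")  # assumed local download location
if archive.exists():
    print("match" if sha256_of(archive) == EXPECTED else "MISMATCH")
else:
    print(f"{archive} not found; download it first")
```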
Provenance
The following attestation bundles were made for cryoem_halfmap_qc-0.3.2.tar.gz:
Publisher: publish.yml on sarthaktexas/cryoem-halfmap-qc
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: cryoem_halfmap_qc-0.3.2.tar.gz
  - Subject digest: 46ffc3990866544d0dbdb17a1105fc7f159ca0d950a133633d56602a6d2660b8
- Sigstore transparency entry: 1969285864
- Sigstore integration time:
- Permalink: sarthaktexas/cryoem-halfmap-qc@cb5df4ad53d6c3ae4e6dcc4d16a50295da123260
- Branch / Tag: refs/tags/v0.3.2
- Owner: https://github.com/sarthaktexas
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@cb5df4ad53d6c3ae4e6dcc4d16a50295da123260
- Trigger Event: release
File details
Details for the file cryoem_halfmap_qc-0.3.2-py3-none-any.whl.
File metadata
- Download URL: cryoem_halfmap_qc-0.3.2-py3-none-any.whl
- Upload date:
- Size: 164.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 49a332f156854f2dcb17457c25b2d0426a478ead67ef455d0d02ab9a58519b0a |
| MD5 | 58b1abe0ebf7fef1b1e7d7eebeef2849 |
| BLAKE2b-256 | 4a6c6596365debd12e799c80a3a226acde938e172623bf4ab677063728301a6f |
Provenance
The following attestation bundles were made for cryoem_halfmap_qc-0.3.2-py3-none-any.whl:
Publisher: publish.yml on sarthaktexas/cryoem-halfmap-qc
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: cryoem_halfmap_qc-0.3.2-py3-none-any.whl
  - Subject digest: 49a332f156854f2dcb17457c25b2d0426a478ead67ef455d0d02ab9a58519b0a
- Sigstore transparency entry: 1969285943
- Sigstore integration time:
- Permalink: sarthaktexas/cryoem-halfmap-qc@cb5df4ad53d6c3ae4e6dcc4d16a50295da123260
- Branch / Tag: refs/tags/v0.3.2
- Owner: https://github.com/sarthaktexas
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@cb5df4ad53d6c3ae4e6dcc4d16a50295da123260
- Trigger Event: release