
Local map reliability from cryo-EM density and half-maps


cryoem-halfmap-qc


Python tools for local map reliability in cryo-EM reconstructions: density statistics, half-map reproducibility, windowed local FSC (Å), a reproducibility score (H_repro), and build/caution/omit zones.

The goal is to test whether inexpensive map features track half-map cross-correlation and local FSC well enough to guide modeling. This is not a claim that density alone defines molecular flexibility.

All volumes use NumPy 3D arrays in (Z, Y, X) order (section, row, column), consistent with typical mrcfile layouts.
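A minimal sketch of what (Z, Y, X) ordering means when indexing a volume. It assumes an isotropic voxel size in Å and a map origin at zero; real MRC headers carry an origin and per-axis voxel sizes, which this ignores. The helper voxel_to_xyz_angstrom is hypothetical and not part of cryoem_mrc.

```python
import numpy as np

def voxel_to_xyz_angstrom(index_zyx, voxel_size_a):
    """Convert a (z, y, x) array index into Cartesian (x, y, z) coordinates in Å.

    Assumes an isotropic voxel size and a map origin at (0, 0, 0).
    """
    z, y, x = index_zyx
    return np.array([x, y, z], dtype=float) * voxel_size_a

# Toy volume: 4 sections (Z), 5 rows (Y), 6 columns (X).
volume = np.zeros((4, 5, 6), dtype=np.float32)
volume[2, 3, 1] = 1.0  # section z=2, row y=3, column x=1

# argmax works on the flattened array; unravel_index recovers (z, y, x).
peak_zyx = tuple(int(i) for i in np.unravel_index(np.argmax(volume), volume.shape))
print(peak_zyx)
print(voxel_to_xyz_angstrom(peak_zyx, voxel_size_a=1.1))
```

Note the reversal: the array index is (z, y, x) but model coordinates are usually written (x, y, z).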


Install

PyPI: pip install cryoem-halfmap-qc (release 0.3.2 is on PyPI; see Publishing to PyPI below for how releases are made).

To work from source, install from GitHub or a local checkout:

git clone https://github.com/sarthaktexas/cryoem-halfmap-qc.git
cd cryoem-halfmap-qc
pip install -e .

# or without cloning:
pip install "git+https://github.com/sarthaktexas/cryoem-halfmap-qc.git@v0.3.2"

Either route installs the halfmap-qc command on your PATH (the PyPI package name is cryoem-halfmap-qc).

Help & interactive mode:

halfmap-qc                  # interactive menu (when run in a terminal)
halfmap-qc help             # full command reference
halfmap-qc --help           # argparse summary + examples
halfmap-qc cohort --help    # flags for one subcommand
halfmap-qc interactive      # menu explicitly

Dependencies: NumPy, SciPy, mrcfile, Matplotlib, gemmi, pandas.


Data layout

Cryo-EM maps are not stored in this repository (too large for git). After cloning, create local directories:

data/emd_<ID>-<label>/     # deposited map + half-maps (.map or .mrc)
outputs/emd_<ID>/           # pipeline products (created by scripts)
cohort/manifest.csv         # EMDB IDs, relative paths, contours, validation labels

Download deposited and half maps from EMDB. Use the depositor-recommended contour for each entry (listed in cohort/manifest.csv). See docs/COHORT.md for download status and pipeline progress.
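A hedged sketch for building download URLs, assuming the public EMDB FTP layout (primary map under map/, half-maps under other/, all gzip-compressed). Not every entry deposits half-maps, so check the entry page. The helper emdb_map_urls is hypothetical and not part of this package.

```python
def emdb_map_urls(emd_id):
    """Return download URLs for an EMDB entry's primary map and half-maps.

    Assumes the public EMDB FTP directory layout; files are .map.gz.
    """
    emd = str(emd_id)
    base = f"https://ftp.ebi.ac.uk/pub/databases/emdb/structures/EMD-{emd}"
    return {
        "primary": f"{base}/map/emd_{emd}.map.gz",
        "half1": f"{base}/other/emd_{emd}_half_map_1.map.gz",
        "half2": f"{base}/other/emd_{emd}_half_map_2.map.gz",
    }

for role, url in emdb_map_urls(49450).items():
    print(role, url)
```

After downloading (e.g. with curl or wget), gunzip the files into data/emd_<ID>-<label>/ so their names match the workflow below (emd_<ID>.map, emd_<ID>_half_map_1.map, emd_<ID>_half_map_2.map).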


Quick start

Run from the project root (where data/ and cohort/manifest.csv live).

Single-map features:

halfmap-qc features path/to/map.mrc --out map_features.npz --float32
# shorthand (legacy): halfmap-qc path/to/map.mrc --out map_features.npz --float32

Typical workflow (features computed on the average of the two half-maps; reliability MRCs written on the deposited primary map's grid):

EMD=49450
CONTOUR=0.116
DATA=data/emd_${EMD}-mgtA_e2p+e1

halfmap-qc analyze \
  --features "${DATA}/emd_${EMD}_avg_features_t0116.npz" \
  --half1 "${DATA}/emd_${EMD}_half_map_1.map" \
  --half2 "${DATA}/emd_${EMD}_half_map_2.map" \
  --reference "${DATA}/emd_${EMD}.map" \
  --contour "${CONTOUR}" \
  --out-dir "outputs/emd_${EMD}/analysis"

halfmap-qc reliability --emd-id "${EMD}" --contour "${CONTOUR}" \
  --features "${DATA}/emd_${EMD}_avg_features_t0116.npz" \
  --halfmap-npz "outputs/emd_${EMD}/analysis/halfmap_metrics.npz"
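The features file in this workflow comes from the average of the two half-maps, which the commands above do not create. A minimal NumPy sketch of that averaging step, assuming both half-maps are already loaded as arrays on the same grid (e.g. from mrcfile's data attribute). The helper average_half_maps is hypothetical.

```python
import numpy as np

def average_half_maps(half1, half2):
    """Voxel-wise mean of two half-maps on the same (Z, Y, X) grid, as float32."""
    h1 = np.asarray(half1, dtype=np.float32)
    h2 = np.asarray(half2, dtype=np.float32)
    if h1.shape != h2.shape:
        raise ValueError(f"half-map grids differ: {h1.shape} vs {h2.shape}")
    return 0.5 * (h1 + h2)

# Synthetic stand-ins for real half-maps: shared signal plus independent noise.
rng = np.random.default_rng(0)
signal = rng.normal(size=(8, 8, 8)).astype(np.float32)
half1 = signal + 0.1 * rng.normal(size=signal.shape)
half2 = signal + 0.1 * rng.normal(size=signal.shape)

avg = average_half_maps(half1, half2)
print(avg.shape, avg.dtype)
```

The averaged array would then be written to an MRC file (keeping the half-maps' voxel size) and passed to halfmap-qc features.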

Cohort batch (all active manifest entries with local data):

halfmap-qc cohort --pending

ARC / SLURM (one map per array task; keep your *.sbatch file local, since it is not tracked in git):

# After pip install -e . and rsync data/ + cohort/manifest.csv to $SCRATCH/thesis
N=$(($(halfmap-qc cohort-ids | wc -l) - 1))
sbatch --account=wrz135 --array=0-${N} --cpus-per-task=4 --mem=32G --time=00:45:00 \
  --wrap='halfmap-qc cohort --emd-id $(halfmap-qc cohort-ids | sed -n "$((SLURM_ARRAY_TASK_ID+1))p")'

Alternatively, save a multi-line script (e.g. ~/halfmap-qc_array.sbatch, kept out of git) and submit it with sbatch --array=0-${N} ~/halfmap-qc_array.sbatch.
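If the array script is written in Python instead of shell, the same index arithmetic applies: array task k (0-based) takes line k + 1 of the cohort-ids output, i.e. element k of the ID list. A minimal sketch, assuming cohort-ids prints one ID per line; emd_id_for_task is a hypothetical helper and the IDs other than 49450 are placeholders.

```python
import os

def emd_id_for_task(ids, task_id=None):
    """Return the EMDB ID handled by one SLURM array task.

    Mirrors the shell recipe: task k (0-based) takes line k + 1 of
    `halfmap-qc cohort-ids`, which is ids[k] here.
    """
    if task_id is None:
        task_id = int(os.environ["SLURM_ARRAY_TASK_ID"])
    if not 0 <= task_id < len(ids):
        raise IndexError(f"task {task_id} is outside 0..{len(ids) - 1}")
    return ids[task_id]

# In a real job, ids could come from the CLI, e.g.:
# ids = subprocess.run(["halfmap-qc", "cohort-ids"], capture_output=True, text=True).stdout.split()
ids = ["49450", "11111", "22222"]  # hypothetical IDs for illustration
print(emd_id_for_task(ids, task_id=2))  # → 22222
```

The bounds check turns an --array range that is too wide into an explicit error instead of a silent empty selection.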


CLI (halfmap-qc)

Command                     Purpose
halfmap-qc (no args, TTY)   Interactive menu
halfmap-qc help             Full reference + install notes
halfmap-qc features         Local density / multiscale features → .npz
halfmap-qc analyze          Windowed half-map CC + feature correlations
halfmap-qc reliability      Reliability score, build zones, MRC export
halfmap-qc cohort           Batch pipeline from cohort/manifest.csv
halfmap-qc cohort-ids       Print EMDB IDs (for SLURM array jobs)
halfmap-qc interactive      Interactive menu (same as bare halfmap-qc)

Legacy: python -m cryoem_mrc still works (same as halfmap-qc features).

Publishing to PyPI

One-time setup:

  1. Create an account at pypi.org (and optionally test.pypi.org for a dry run).
  2. On PyPI → Your projects → Add new project → name it cryoem-halfmap-qc (or claim it when uploading).
  3. On PyPI → Account settings → Publishing → Add a new pending publisher:
    • PyPI project: cryoem-halfmap-qc
    • Owner: sarthaktexas (your GitHub user/org)
    • Repository: cryoem-halfmap-qc
    • Workflow: publish.yml
    • Environment: (leave blank unless you use one)

Release:

# bump the version in pyproject.toml and cryoem_mrc/__init__.py first, then:
git add pyproject.toml cryoem_mrc/__init__.py
git commit -m "Release v0.3.2"
git tag v0.3.2
git push origin main --tags

On GitHub → Releases → Draft a new release → choose tag v0.3.2 → Publish release. The .github/workflows/publish.yml workflow builds the source distribution and wheel and uploads them to PyPI.

Test install after publish:

pip install cryoem-halfmap-qc
halfmap-qc --version
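From Python, the installed distribution version can also be checked with the standard library's importlib.metadata (Python 3.8+). A small sketch; installed_version is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(dist_name="cryoem-halfmap-qc"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

print(installed_version())
```

This reads the installed package metadata, so it reports what pip installed even if the CLI entry point is not on PATH.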

Manual upload (without GitHub Actions):

pip install build twine
python -m build
twine upload dist/*

Scripts (thesis / optional)

Thesis figure runners (scripts/rerun_all_figures.py, scripts/run_cohort_summary_figures.py, Figma export scripts, etc.) and cryoem_mrc/thesis_figures.py are local-only (gitignored), as is figma-plugins/. A fresh clone does not include them; use a machine that already has those files, or keep a local copy from before they were untracked.


Python API (high level)

import numpy as np
from cryoem_mrc import load_full_and_half_maps, run_pipeline, half_map_local_metrics
from cryoem_mrc.reliability import compute_reliability_maps, classify_build_zones

# Load the deposited map and both half-maps as float32
bundle = load_full_and_half_maps(
    "full.mrc", "half1.mrc", "half2.mrc", dtype=np.float32, resample_if_needed=True
)

# Windowed half-map reproducibility metrics
metrics = half_map_local_metrics(bundle.half1, bundle.half2, window=5)
# metrics["windowed_halfmap_correlation"], etc.

# Density features from a map on the half-maps' grid (the CLI workflow uses the half-map average)
features = run_pipeline("map.mrc", use_float32=True)

# Reliability score from the half-maps plus normalized density
reliability = compute_reliability_maps(
    bundle.half1, bundle.half2,
    density_normalized=features["density_normalized"],
    window=5,
)

# Build / caution / omit zones from the reliability score
zones = classify_build_zones(reliability["reliability_score"])
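classify_build_zones is not reproduced here. As a hedged illustration of the idea described under Methods summary (an in-mask percentile split into terciles), the sketch below ranks a score within a mask and assigns zones. The zone codes, the helper names, and the assumption that higher percentiles mean "build" are all hypothetical, not the package's conventions.

```python
import numpy as np

# Hypothetical zone codes for this sketch only.
OUTSIDE, OMIT, CAUTION, BUILD = 0, 1, 2, 3

def percentile_in_mask(score, mask):
    """Percentile rank (0-100) of each in-mask voxel; NaN outside the mask.

    Ties are broken by position (stable sort), a simplification.
    """
    values = score[mask]
    pct = np.full(score.shape, np.nan, dtype=float)
    if values.size == 0:
        return pct
    order = np.argsort(values, kind="stable")
    ranks = np.empty(values.size, dtype=float)
    ranks[order] = np.arange(values.size)
    pct[mask] = 100.0 * ranks / max(values.size - 1, 1)
    return pct

def tercile_zones(pct, mask):
    """Lowest third -> OMIT, middle third -> CAUTION, top third -> BUILD (assumed)."""
    zones = np.full(pct.shape, OUTSIDE, dtype=np.int8)
    zones[mask & (pct < 100 / 3)] = OMIT
    zones[mask & (pct >= 100 / 3) & (pct < 200 / 3)] = CAUTION
    zones[mask & (pct >= 200 / 3)] = BUILD
    return zones

score = np.arange(27, dtype=float).reshape(3, 3, 3)
mask = score < 9  # nine in-mask voxels with scores 0..8
pct = percentile_in_mask(score, mask)
zones = tercile_zones(pct, mask)
print(np.bincount(zones.ravel(), minlength=4))
```

Ranking inside the mask makes the zones relative to each map, so the thresholds adapt to maps of very different overall quality.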

Package modules: io, map_grid, local_stats, multiscale, half_map_repro, local_fsc, mechanics, reliability, analysis, structure_validation. Path helpers: cryoem_mrc/repo_paths.py.
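For readers who want to see what a windowed half-map correlation computes, here is a generic local Pearson correlation in NumPy. It is a sketch, not cryoem_mrc's half_map_local_metrics: it assumes an odd cubic window measured in voxels and reflect padding at the edges, and it uses sliding_window_view, which is memory-hungry on full-size maps (scipy.ndimage.uniform_filter is the usual choice there).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def box_mean(volume, window):
    """Mean over a cubic window centred on each voxel (reflect padding keeps the shape)."""
    pad = window // 2
    padded = np.pad(volume, pad, mode="reflect")
    views = sliding_window_view(padded, (window, window, window))
    return views.mean(axis=(-3, -2, -1))

def windowed_correlation(half1, half2, window=5, eps=1e-12):
    """Local Pearson correlation between two half-maps in an odd cubic window."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    a = np.asarray(half1, dtype=np.float64)
    b = np.asarray(half2, dtype=np.float64)
    mean_a, mean_b = box_mean(a, window), box_mean(b, window)
    cov = box_mean(a * b, window) - mean_a * mean_b
    var_a = box_mean(a * a, window) - mean_a ** 2
    var_b = box_mean(b * b, window) - mean_b ** 2
    denom = np.sqrt(np.clip(var_a, 0, None) * np.clip(var_b, 0, None))
    # Flat windows (zero variance) get CC = 0 instead of a division by zero.
    return np.where(denom > eps, cov / np.maximum(denom, eps), 0.0)

rng = np.random.default_rng(1)
signal = rng.normal(size=(12, 12, 12))
half1 = signal + 0.3 * rng.normal(size=signal.shape)
half2 = signal + 0.3 * rng.normal(size=signal.shape)
cc = windowed_correlation(half1, half2, window=5)
noise_cc = windowed_correlation(rng.normal(size=signal.shape), rng.normal(size=signal.shape))
print(round(float(cc.mean()), 2), round(float(noise_cc.mean()), 2))
```

Halves that share a signal give local CC close to 1, while independent noise gives values scattered around 0.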


Methods summary

  • Windowed half-map correlation is the fast internal reproducibility target for feature validation; local FSC resolution (Å) is the field-standard reference.
  • Local FSC is computed in-repo (cryoem_mrc.local_fsc); external BlocRes / ResMap / MonoRes maps are not loaded.
  • H_repro (legacy export name) is the windowed gradient-constraint map V. Ranking V within the mask gives reliability_score, an in-mask percentile that is split into terciles for the build/caution/omit zones. Resolvability gating uses windowed half-map CC or local FSC, not a separate disagreement map.
  • Local variance is often the strongest single feature predictor of windowed half-map correlation.
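Local FSC applies the Fourier shell correlation inside a small box around each position. The sketch below shows only the global, per-shell calculation for two cubic half-maps and reads off a resolution at the commonly used 0.143 half-map threshold. It is a generic illustration, not cryoem_mrc.local_fsc; masking, box size, and the threshold that module uses are not stated here, and the helper names are hypothetical.

```python
import numpy as np

def fsc_curve(half1, half2, voxel_size_a):
    """Return (resolution_a, fsc) per Fourier shell for two cubic half-maps."""
    h1 = np.asarray(half1, dtype=np.float64)
    h2 = np.asarray(half2, dtype=np.float64)
    if h1.ndim != 3 or h1.shape != h2.shape or len(set(h1.shape)) != 1:
        raise ValueError("expected two cubic 3D maps with the same shape")
    n = h1.shape[0]
    f1, f2 = np.fft.fftn(h1), np.fft.fftn(h2)
    freqs = np.fft.fftfreq(n)  # cycles per voxel along one axis
    kz, ky, kx = np.meshgrid(freqs, freqs, freqs, indexing="ij")
    shell = np.rint(np.sqrt(kx**2 + ky**2 + kz**2) * n).astype(int).ravel()
    nshells = n // 2  # shell n/2 is Nyquist

    def shell_sum(values):
        # Sum values per integer shell; keep shells 1..Nyquist (skip DC).
        return np.bincount(shell, weights=values.ravel(), minlength=nshells + 1)[1 : nshells + 1]

    num = shell_sum(np.real(f1 * np.conj(f2)))
    denom = np.sqrt(shell_sum(np.abs(f1) ** 2) * shell_sum(np.abs(f2) ** 2))
    fsc = np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)
    resolution_a = n * voxel_size_a / np.arange(1, nshells + 1)
    return resolution_a, fsc

def resolution_at_threshold(resolution_a, fsc, threshold=0.143):
    """Resolution (Å) of the last shell before the FSC first drops below threshold."""
    below = np.flatnonzero(fsc < threshold)
    if below.size == 0:
        return float(resolution_a[-1])  # never drops: limited by Nyquist
    if below[0] == 0:
        return float("nan")  # fails already at the lowest shell
    return float(resolution_a[below[0] - 1])

rng = np.random.default_rng(2)
vol = rng.normal(size=(16, 16, 16))
res, fsc = fsc_curve(vol, vol, voxel_size_a=1.0)
print(resolution_at_threshold(res, fsc))  # → 2.0 (identical maps reach Nyquist)
```

Real half-maps are normally masked before this calculation; an unmasked box full of solvent noise lowers the curve.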

Thesis prose: full narrative draft in docs/THESIS_NARRATIVE.md. Writing guide and defense notes in docs/THESIS_AND_PUBLICATION.md.


Tests

python -m unittest discover -s tests -v

Citation

Before the manuscript is published, cite the software with the Zenodo concept DOI (resolves to the latest release; pin v0.3.2 or a commit hash for exact reproducibility):

@software{mohanty2026cryoem_halfmap_qc,
  author = {Mohanty, Sarthak},
  title = {cryoem-halfmap-qc: local map reliability from cryo-EM density and half-maps},
  year = {2026},
  doi = {10.5281/zenodo.20618526},
  url = {https://doi.org/10.5281/zenodo.20618526},
  version = {0.3.2}
}

GitHub also reads CITATION.cff for the Cite this repository button.

After publication, cite the paper as the primary reference. Also cite this Zenodo archive when you need the exact pipeline version used in the work.

When the manuscript exists, add a preferred-citation block to CITATION.cff (template included there) and drop the BibTeX for the article into this section.

License

MIT License. See LICENSE.



Download files


Source Distribution

cryoem_halfmap_qc-0.3.2.tar.gz (157.8 kB)

Uploaded Source

Built Distribution


cryoem_halfmap_qc-0.3.2-py3-none-any.whl (164.4 kB)

Uploaded Python 3

File details

Details for the file cryoem_halfmap_qc-0.3.2.tar.gz.

File metadata

  • Download URL: cryoem_halfmap_qc-0.3.2.tar.gz
  • Size: 157.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cryoem_halfmap_qc-0.3.2.tar.gz
Algorithm Hash digest
SHA256 46ffc3990866544d0dbdb17a1105fc7f159ca0d950a133633d56602a6d2660b8
MD5 addb462ebf1ca9275ad446b6f323665b
BLAKE2b-256 9be1ec6f6fe18979b6113443df97f0890d7088be41a1adc2c3c7ed51951dfd82


Provenance

The following attestation bundles were made for cryoem_halfmap_qc-0.3.2.tar.gz:

Publisher: publish.yml on sarthaktexas/cryoem-halfmap-qc

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cryoem_halfmap_qc-0.3.2-py3-none-any.whl.


File hashes

Hashes for cryoem_halfmap_qc-0.3.2-py3-none-any.whl
Algorithm Hash digest
SHA256 49a332f156854f2dcb17457c25b2d0426a478ead67ef455d0d02ab9a58519b0a
MD5 58b1abe0ebf7fef1b1e7d7eebeef2849
BLAKE2b-256 4a6c6596365debd12e799c80a3a226acde938e172623bf4ab677063728301a6f


Provenance

The following attestation bundles were made for cryoem_halfmap_qc-0.3.2-py3-none-any.whl:

Publisher: publish.yml on sarthaktexas/cryoem-halfmap-qc

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
