Fast Python implementation of confind — protein side-chain contact-degree analysis.

pyconfind

A modern Python implementation of confind — the rotamer-based protein side-chain contact-degree analysis introduced in Zheng & Grigoryan's work on tertiary structural motifs.

The Python output is byte-for-byte identical to the upstream C++ binary on 248 of 253 real structures tested (100 single-chain PDB + 100 AlphaFold DB + 50 multi-chain + 3 high-resolution; see docs/stress_test_results.md) and on all 11 example structures shipped with the original codebase. The 5 exceptions are structures with insertion codes, for which the C++ ordering relies on undefined behavior (documented).

pyconfind is also faster than the C++ binary, with two interchangeable contact-degree backends (both byte-identical to the reference):

  • a pure NumPy/SciPy reference, which on its own already beats the C++ binary;
  • an optional Numba JIT/multi-threaded backend (pip install "pyconfind[fast]") that is a further ~2-3× faster.

With the Numba backend and the cost of loading the rotamer library amortized across a batch, per-structure analysis is ~8-18× faster than the C++ binary.

[Figure: runtime vs. sequence length]

Runtime scales sub-quadratically with sequence length (the CA-distance cutoff bounds each residue's neighbor count). See docs/benchmark.md for details.
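The effect of a distance cutoff on scaling can be illustrated with a toy cell-list neighbor search. This is a minimal sketch, not pyconfind's implementation: the 10 Å cutoff, the straight-chain geometry with 3.8 Å CA spacing, and the function names are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict
from itertools import product


def neighbor_pairs(coords, cutoff):
    """Return index pairs (i, j) with i < j whose points lie within `cutoff`."""
    # Bin every point into a cubic cell with edge length equal to the cutoff.
    cells = defaultdict(list)
    for idx, point in enumerate(coords):
        cells[tuple(math.floor(v / cutoff) for v in point)].append(idx)

    pairs = []
    for cell, members in cells.items():
        # Only the 27 surrounding cells can hold points within the cutoff.
        for offset in product((-1, 0, 1), repeat=3):
            other = tuple(c + d for c, d in zip(cell, offset))
            for j in cells.get(other, ()):
                for i in members:
                    if i < j and math.dist(coords[i], coords[j]) <= cutoff:
                        pairs.append((i, j))
    return pairs


def toy_chain(n, spacing=3.8):
    """Hypothetical fully extended chain: n CA atoms on a straight line."""
    return [(spacing * k, 0.0, 0.0) for k in range(n)]


for n in (100, 200, 400):
    print(n, len(neighbor_pairs(toy_chain(n), cutoff=10.0)))
```

For this toy chain each residue only reaches its two nearest neighbors on either side, so the pair count is 2n - 3 (197, 397, 797) and doubling the length roughly doubles the work. Real folded structures are denser, but the cutoff still caps how many neighbors any one residue can have.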

Install

pip install -e ".[dev]"      # development install; includes the Numba fast backend
# or, runtime only:
pip install -e .             # pure NumPy/SciPy reference backend
pip install -e ".[fast]"     # reference backend + Numba backend
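After installing, it can be handy to confirm whether the optional Numba dependency is present in the environment. The sketch below uses only the standard library; the helper name is hypothetical, and it reports only whether numba is importable, not which backend pyconfind actually selects at runtime.

```python
import importlib.util


def numba_available():
    """Return True if the optional Numba dependency can be imported."""
    # find_spec returns None when no top-level module with that name exists.
    return importlib.util.find_spec("numba") is not None


print("Numba installed:", numba_available())
```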

Example notebook

examples/pyconfind_demo.ipynb is a runnable walkthrough: install, fetch a PDB, analyze it via the library API, then visualize a contact map, per-residue scores, and a 3D structure colored by contact degree. The notebook can be opened in Colab and runs on a free CPU runtime.

Quick start

CLI (matches the original confind flag names, so it drops into existing pipelines):

pyconfind --p input.pdb --rLib path/to/rotlibs --o out.cont
# Inputs may be PDB or mmCIF (format auto-detected via gemmi):
pyconfind --p input.cif --rLib path/to/rotlibs --o out.cont
# Modern structured output:
pyconfind --p input.pdb --rLib path/to/rotlibs --json --o out.json
# Only consider the native AA at each position (no AA substitution):
pyconfind --p input.pdb --rLib path/to/rotlibs --native-only --o out.cont
# Restrict the computed/output residues (MSL selection language):
pyconfind --p input.pdb --rLib path/to/rotlibs --sel "chain A AND resi 20-60" --o out.cont
# Pre-select part of the structure before anything runs:
pyconfind --p input.pdb --rLib path/to/rotlibs --psel "NAME CA WITHIN 25 OF CHAIN A" --o out.cont
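For scripted use, the commands above can be assembled from Python. This is a minimal sketch that uses only the flags shown above (--p, --rLib, --o, --native-only); the helper names and the one-output-file-per-structure naming scheme are assumptions for illustration, not part of pyconfind.

```python
import subprocess
from pathlib import Path


def build_command(structure, rotlib, out_dir, native_only=False):
    """Build a pyconfind argument list from the flags shown above."""
    structure = Path(structure)
    # Hypothetical naming scheme: <out_dir>/<structure stem>.cont
    out_path = Path(out_dir) / (structure.stem + ".cont")
    cmd = ["pyconfind", "--p", str(structure), "--rLib", str(rotlib), "--o", str(out_path)]
    if native_only:
        cmd.append("--native-only")
    return cmd


def run_batch(structures, rotlib, out_dir, native_only=False):
    """Run pyconfind once per structure; raises CalledProcessError on failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for structure in structures:
        subprocess.run(build_command(structure, rotlib, out_dir, native_only), check=True)


print(build_command("input.pdb", "path/to/rotlibs", "results"))
```

Each invocation here is a separate process, so any per-run startup cost is paid for every structure; the library API is likely the better fit when that matters.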

Library API:

from pyconfind import analyze, format_confind_text

result = analyze("input.pdb", rotamer_library="path/to/rotlibs")
print(format_confind_text(result.positions, result.report))

# Inspect raw contacts:
for c in result.report.contacts:
    pi, pj = result.positions[c.pos_i], result.positions[c.pos_j]
    print(f"{pi.position.chain},{pi.position.resnum} <-> "
          f"{pj.position.chain},{pj.position.resnum}: degree={c.degree}")
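The contact list can also be turned into a dense matrix, for example to plot a contact map. The sketch below relies only on the pos_i, pos_j, and degree attributes used above; the contact_matrix helper is hypothetical, the stand-in contacts carry made-up values, and treating each contact as symmetric is an assumption.

```python
from types import SimpleNamespace


def contact_matrix(n_positions, contacts):
    """Build a symmetric n x n matrix of contact degrees (0.0 where no contact)."""
    matrix = [[0.0] * n_positions for _ in range(n_positions)]
    for c in contacts:
        matrix[c.pos_i][c.pos_j] = c.degree
        matrix[c.pos_j][c.pos_i] = c.degree  # assumption: treat contacts as symmetric
    return matrix


# Stand-in contacts with made-up values; with pyconfind installed, pass
# len(result.positions) and result.report.contacts instead.
demo = [SimpleNamespace(pos_i=0, pos_j=2, degree=0.25),
        SimpleNamespace(pos_i=1, pos_j=2, degree=0.5)]
for row in contact_matrix(3, demo):
    print(row)
```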

Rotamer libraries

Out of the box, pyconfind supports the Dunbrack 2010 MSL-format library that ships with the upstream confind source (EBL.out + BEBL.out). Point --rLib at a directory containing both files (backbone-dependent) or at a single EBL.out-style file (backbone-independent).
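The two accepted layouts can be checked before launching a long run. This is a minimal sketch that mirrors the description above; describe_rotlib is a hypothetical helper, and pyconfind's own detection logic may differ.

```python
import tempfile
from pathlib import Path


def describe_rotlib(path):
    """Classify a rotamer-library path following the layout described above."""
    path = Path(path)
    if path.is_dir():
        # A directory is expected to hold both library files.
        missing = [name for name in ("EBL.out", "BEBL.out") if not (path / name).is_file()]
        if missing:
            raise FileNotFoundError(f"{path} is missing: {', '.join(missing)}")
        return "backbone-dependent"
    if path.is_file():
        return "backbone-independent"
    raise FileNotFoundError(f"No rotamer library found at {path}")


# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("EBL.out", "BEBL.out"):
        (Path(tmp) / name).touch()
    print(describe_rotlib(tmp))                    # backbone-dependent
    print(describe_rotlib(Path(tmp) / "EBL.out"))  # backbone-independent
```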

Modern Dunbrack and Richardson-style libraries are next on the roadmap.

Native-only mode (extension over the C++ binary)

The original C++ confind substitutes in all 18 non-Gly/Pro amino acids at every position and computes contact degree across the full rotamer space. pyconfind adds --native-only: at each position, it places only rotamers of the native amino acid (but still considers every rotamer of that amino acid). This is useful when you want a contact-degree estimate that holds the sequence fixed.
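The difference between the two modes comes down to which amino acids contribute rotamers at a position. The sketch below is illustrative only: the names are hypothetical, and returning no candidates for a native Gly or Pro in native-only mode is an assumption, since that case is not described here.

```python
STANDARD_AA = ("ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
               "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL")
# The 18 amino acids substituted in by default (everything except Gly and Pro).
SUBSTITUTED_AA = tuple(aa for aa in STANDARD_AA if aa not in ("GLY", "PRO"))


def candidate_amino_acids(native, native_only=False):
    """Amino acids whose rotamers would be placed at one position."""
    if not native_only:
        return SUBSTITUTED_AA
    # Assumption: a native Gly or Pro yields no candidates in native-only mode.
    return (native,) if native in SUBSTITUTED_AA else ()


print(len(candidate_amino_acids("LEU")))                 # 18
print(candidate_amino_acids("LEU", native_only=True))    # ('LEU',)
```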

Validation

The C++ reference binary is built from the upstream tarball by:

scripts/build-reference.sh

The byte-identity tests then compare pyconfind's output against the C++ output on every example PDB. To run them yourself:

pytest tests/
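A byte-identity check of this kind can also be run by hand on any pair of output files. The sketch below uses the standard-library filecmp module; byte_identical is a hypothetical helper, not the project's test code, and the file names and contents are placeholders.

```python
import filecmp
import tempfile
from pathlib import Path


def byte_identical(path_a, path_b):
    """Return True if the two files contain exactly the same bytes."""
    # shallow=False forces a content comparison instead of trusting os.stat().
    return filecmp.cmp(path_a, path_b, shallow=False)


# Demonstration with placeholder files standing in for the two outputs.
with tempfile.TemporaryDirectory() as tmp:
    py_out, cpp_out = Path(tmp) / "py.cont", Path(tmp) / "cpp.cont"
    py_out.write_bytes(b"placeholder\n")
    cpp_out.write_bytes(b"placeholder\n")
    print(byte_identical(py_out, cpp_out))  # True
```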

References

  • "Sequence statistics of tertiary structural motifs reflect protein stability", F. Zheng, G. Grigoryan, PLoS ONE, 12(5): e0178272, 2017.

  • "Tertiary Structural Propensities Reveal Fundamental Sequence/Structure Relationships", F. Zheng, J. Zhang, G. Grigoryan, Structure, 23(5): 961-971, 2015.
