Fast Python implementation of confind — protein side-chain contact-degree analysis.

pyconfind

A modern Python implementation of confind — the rotamer-based protein side-chain contact-degree analysis introduced in Zheng & Grigoryan's work on tertiary structural motifs.

The Python output is byte-for-byte identical to the upstream C++ binary on 248 of 253 real structures tested (100 single-chain PDB + 100 AlphaFold DB + 50 multi-chain + 3 high-resolution; see docs/stress_test_results.md), plus a further 100 RCSB entries cross-checked as both PDB and mmCIF. The 5 exceptions are structures with insertion codes, where the C++ residue ordering relies on undefined behavior; these cases are documented. The test suite runs against real PDB/mmCIF structures with committed C++-reference contact maps.

pyconfind is also faster than the C++ binary, with two interchangeable contact-degree backends (both byte-identical to the reference):

  • a pure NumPy/SciPy reference, which on its own already beats the C++ binary;
  • an optional Numba JIT/multi-threaded backend (pip install pyconfind[fast]) that is ~2-3× faster again.

With the Numba backend and the rotamer library amortized across a batch, the per-structure analysis is ~8-18× faster than the C++ binary.

Figure: runtime vs. sequence length

Runtime scales sub-quadratically with sequence length (the CA-distance cutoff bounds each residue's neighbor count). See docs/benchmark.md for details.
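To illustrate why a distance cutoff keeps the work below quadratic, here is a minimal pure-Python cell-list sketch. It is not pyconfind's implementation; the function name `neighbors_within` and the 8.0 Å cutoff in the usage note are assumptions chosen for illustration only. Each point is compared only against points in its own and adjacent grid cells, so the number of distance checks per residue is bounded by local density rather than by chain length.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor


def neighbors_within(coords, cutoff):
    """Return sorted (i, j) index pairs, i < j, whose points lie within `cutoff`.

    `coords` is a sequence of (x, y, z) tuples, e.g. CA positions.
    """
    # Bin each point into a cubic cell whose edge length equals the cutoff.
    cells = defaultdict(list)
    for idx, xyz in enumerate(coords):
        cells[tuple(floor(c / cutoff) for c in xyz)].append(idx)

    pairs = set()
    for cell, members in cells.items():
        # Only the 27 cells around (and including) this one can hold
        # points within the cutoff, so the rest are never visited.
        for offset in product((-1, 0, 1), repeat=3):
            other = tuple(c + o for c, o in zip(cell, offset))
            for i in members:
                for j in cells.get(other, ()):
                    if i < j and dist(coords[i], coords[j]) <= cutoff:
                        pairs.add((i, j))
    return sorted(pairs)
```

For an idealized straight chain with 3.8 Å spacing, `neighbors_within([(3.8 * i, 0.0, 0.0) for i in range(5)], 8.0)` pairs each point only with its nearest two neighbors on either side.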

Install

pip install pyconfind            # pure-Python reference backend
pip install "pyconfind[fast]"    # + Numba JIT/multi-threaded backend

From source (for development):

pip install -e ".[dev]"          # editable install with test/lint tooling

Example notebook

Open In Colab

examples/pyconfind_demo.ipynb is a runnable walkthrough (install → fetch a PDB → analyze via the library API → visualize a contact map, per-residue scores, and a 3D structure colored by contact degree). Click the badge to run it on a free Colab CPU runtime.

Quick start

CLI (uses the original confind flag names, so it drops into existing pipelines):

pyconfind --p input.pdb --rLib path/to/rotlibs --o out.cont
# Inputs may be PDB or mmCIF (format auto-detected via gemmi):
pyconfind --p input.cif --rLib path/to/rotlibs --o out.cont
# Modern structured output:
pyconfind --p input.pdb --rLib path/to/rotlibs --json --o out.json
# Only consider the native AA at each position (no AA substitution):
pyconfind --p input.pdb --rLib path/to/rotlibs --native-only --o out.cont
# Restrict the computed/output residues (MSL selection language):
pyconfind --p input.pdb --rLib path/to/rotlibs --sel "chain A AND resi 20-60" --o out.cont
# Pre-select part of the structure before anything runs:
pyconfind --p input.pdb --rLib path/to/rotlibs --psel "NAME CA WITHIN 25 OF CHAIN A" --o out.cont
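When driving the CLI from a script, the commands above can be assembled programmatically. The sketch below uses only the flags shown above (`--p`, `--rLib`, `--o`, `--json`, `--native-only`, `--sel`); the helper names `build_command` and `run_batch` are hypothetical, and the idea that these flags can be freely combined in one call is an assumption.

```python
import subprocess
from pathlib import Path


def build_command(structure, rotlib, out, *, json_output=False,
                  native_only=False, sel=None):
    """Assemble a pyconfind argument list from the flags shown above."""
    cmd = ["pyconfind", "--p", str(structure), "--rLib", str(rotlib)]
    if json_output:
        cmd.append("--json")
    if native_only:
        cmd.append("--native-only")
    if sel is not None:
        cmd += ["--sel", sel]  # passed as one argument, so no shell quoting needed
    cmd += ["--o", str(out)]
    return cmd


def run_batch(structures, rotlib, out_dir):
    """Run the CLI once per structure, writing <stem>.cont into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for structure in map(Path, structures):
        out = out_dir / (structure.stem + ".cont")
        # check=True raises CalledProcessError if the CLI exits non-zero.
        subprocess.run(build_command(structure, rotlib, out), check=True)
```

Passing the arguments as a list avoids shell quoting problems with selection strings such as `"chain A AND resi 20-60"`.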

Library API:

from pyconfind import analyze, format_confind_text

result = analyze("input.pdb", rotamer_library="path/to/rotlibs")
print(format_confind_text(result.positions, result.report))

# Inspect raw contacts:
for c in result.report.contacts:
    pi, pj = result.positions[c.pos_i], result.positions[c.pos_j]
    print(f"{pi.position.chain},{pi.position.resnum} <-> "
          f"{pj.position.chain},{pj.position.resnum}: degree={c.degree}")
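As a follow-on to the contact loop above, a common next step is to reduce the pairwise contacts to a per-position summary. The sketch below is self-contained: `Contact` is a hypothetical stand-in that mirrors only the three fields used in the loop (`pos_i`, `pos_j`, `degree`), and `strongest_partner` is an illustrative helper, not part of the pyconfind API.

```python
from typing import NamedTuple


class Contact(NamedTuple):
    """Stand-in with the same three fields the loop above reads."""
    pos_i: int
    pos_j: int
    degree: float


def strongest_partner(contacts):
    """Map each position index to (partner index, degree) of its top contact."""
    best = {}
    for c in contacts:
        # Treat each contact as symmetric so both endpoints are updated,
        # whether or not the input lists a pair in both directions.
        for a, b in ((c.pos_i, c.pos_j), (c.pos_j, c.pos_i)):
            if a not in best or c.degree > best[a][1]:
                best[a] = (b, c.degree)
    return best


demo = [Contact(0, 1, 0.2), Contact(0, 2, 0.5), Contact(1, 2, 0.1)]
print(strongest_partner(demo))
```

Because the helper relies only on attribute access, passing `result.report.contacts` should work as long as those objects expose the same three fields.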

Rotamer libraries

Out of the box, pyconfind supports the Dunbrack 2010 MSL-format library that ships with the upstream confind source (EBL.out + BEBL.out). Point --rLib at a directory containing both files (backbone-dependent) or at a single EBL.out-style file (backbone-independent).

Modern Dunbrack and Richardson-style libraries are next on the roadmap.
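The two accepted forms of --rLib can be checked before launching a long batch. This is a minimal sketch of such a pre-flight check based only on the description above; `rotamer_library_mode` is a hypothetical helper, and pyconfind's own validation may differ.

```python
from pathlib import Path


def rotamer_library_mode(path):
    """Classify a --rLib path as backbone-dependent or backbone-independent."""
    p = Path(path)
    if p.is_dir():
        # A directory is expected to hold both library files.
        missing = [name for name in ("EBL.out", "BEBL.out")
                   if not (p / name).is_file()]
        if missing:
            raise FileNotFoundError(f"{p} is missing: {', '.join(missing)}")
        return "backbone-dependent"
    if p.is_file():
        # A single EBL.out-style file.
        return "backbone-independent"
    raise FileNotFoundError(f"{p} does not exist")
```

Failing early with a clear message is cheaper than discovering a missing BEBL.out partway through a run.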

Native-only mode (extension over the C++ binary)

The original C++ confind substitutes in all 18 non-Gly/Pro amino acids at every position and computes contact degree across the full rotamer space. pyconfind adds --native-only: at each position, it places only rotamers of the native amino acid (but still considers every rotamer of that amino acid). This is useful when you want a contact-degree estimate that holds the sequence fixed.
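The difference between the two modes comes down to which amino acids are placed at each position. The sketch below illustrates that choice only; the three-letter codes, the constant names, and `candidate_amino_acids` are assumptions for illustration, and how pyconfind treats a native Gly or Pro in native-only mode is not stated here, so the sketch simply returns the native residue unchanged.

```python
AMINO_ACIDS = (
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
)

# The 18 amino acids left after excluding Gly and Pro.
SUBSTITUTABLE = tuple(aa for aa in AMINO_ACIDS if aa not in ("GLY", "PRO"))


def candidate_amino_acids(native, native_only=False):
    """Return the amino acids whose rotamers would be placed at one position."""
    if native_only:
        # Sequence held fixed: only the native residue type is considered.
        return (native,)
    # Default behavior described above: every non-Gly/Pro amino acid.
    return SUBSTITUTABLE
```

In the default mode the candidate set is the same at every position, while in native-only mode it shrinks to a single residue type, which also reduces the number of rotamers placed.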

Validation

The C++ reference binary is built from the upstream tarball by:

scripts/build-reference.sh

The byte-identity tests then compare pyconfind's output against the C++ output on every example PDB. To run them yourself:

pytest tests/
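For a quick manual check outside the test suite, byte identity between two output files can be verified directly. This is a minimal sketch, not the project's test code; `first_difference` is a hypothetical helper and the file names in the usage note are placeholders.

```python
from pathlib import Path


def first_difference(path_a, path_b):
    """Return the byte offset of the first difference, or None if identical."""
    data_a = Path(path_a).read_bytes()
    data_b = Path(path_b).read_bytes()
    for offset, (x, y) in enumerate(zip(data_a, data_b)):
        if x != y:
            return offset
    if len(data_a) != len(data_b):
        # One file is a strict prefix of the other.
        return min(len(data_a), len(data_b))
    return None
```

For example, `first_difference("python_out.cont", "cpp_out.cont")` returns `None` when the two files match byte for byte, and otherwise points at where to start looking.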

References

  • "Sequence statistics of tertiary structural motifs reflect protein stability", F. Zheng, G. Grigoryan, PLoS ONE, 12(5): e0178272, 2017.

  • "Tertiary Structural Propensities Reveal Fundamental Sequence/Structure Relationships", F. Zheng, J. Zhang, G. Grigoryan, Structure, 23(5): 961-971, 2015.
