
Fast Python implementation of confind — protein side-chain contact-degree analysis.


pyconfind


A modern Python implementation of confind — the rotamer-based protein side-chain contact-degree analysis introduced in Zheng & Grigoryan's work on tertiary structural motifs.

The Python output is byte-for-byte identical to the upstream C++ binary on 248 of 253 real structures tested (100 single-chain PDB + 100 AlphaFold DB + 50 multi-chain + 3 high-resolution; see docs/stress_test_results.md), plus a further 100 RCSB entries cross-checked as both PDB and mmCIF. The 5 exceptions are insertion-code structures where the C++ ordering relies on undefined behavior (documented). The test suite runs against real PDB/mmCIF structures with committed C++-reference contact maps.

pyconfind is also faster than the C++ binary, with two interchangeable contact-degree backends (both byte-identical to the reference):

  • a pure NumPy/SciPy reference, which on its own already beats the C++ binary;
  • an optional Numba JIT/multi-threaded backend (pip install "pyconfind[fast]") that is ~2-3× faster again.

With the Numba backend and the rotamer library amortized across a batch, the per-structure analysis is ~8-18× faster than the C++ binary.

[Figure: runtime vs sequence length]

Runtime scales sub-quadratically with sequence length (the CA-distance cutoff bounds each residue's neighbor count). See docs/benchmark.md for details.
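
To illustrate why a distance cutoff keeps the work below quadratic, here is a minimal stdlib-only sketch of a cell-grid neighbor search over CA coordinates. It is not pyconfind's implementation; the function name `neighbors_within`, the toy coordinates, and the cutoff values are all assumptions made for illustration. Because each point is compared only against points in its own and adjacent grid cells, the number of comparisons per residue is bounded by local density rather than by chain length.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor


def neighbors_within(coords, cutoff):
    """Return sorted (i, j) index pairs, i < j, whose points lie within `cutoff`.

    Points are binned into cubic cells of edge `cutoff`, so each point is only
    compared against points in its own cell and the 26 adjacent cells.
    """
    cells = defaultdict(list)
    for idx, point in enumerate(coords):
        key = tuple(floor(c / cutoff) for c in point)
        cells[key].append(idx)

    pairs = []
    for idx, point in enumerate(coords):
        cx, cy, cz = (floor(c / cutoff) for c in point)
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for other in cells.get((cx + dx, cy + dy, cz + dz), ()):
                # `other > idx` reports each pair once and skips self-pairs.
                if other > idx and dist(point, coords[other]) <= cutoff:
                    pairs.append((idx, other))
    return sorted(pairs)


# Toy "CA trace": four points on a line, 4 units apart (hypothetical data).
ca_coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
print(neighbors_within(ca_coords, cutoff=5.0))  # [(0, 1), (1, 2), (2, 3)]
```

With a fixed cutoff, lengthening the chain adds new cells instead of adding more candidates to each existing cell, which is the intuition behind the scaling claim above.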

Install

pip install pyconfind            # NumPy/SciPy reference backend
pip install "pyconfind[fast]"    # + Numba JIT/multi-threaded backend

From source (for development):

pip install -e ".[dev]"          # editable install with test/lint tooling

Example notebook

Open In Colab

examples/pyconfind_demo.ipynb is a runnable walkthrough (install → fetch a PDB → analyze via the library API → visualize a contact map, per-residue scores, and a 3D structure colored by contact degree). Click the badge to run it on a free Colab CPU runtime.

Quick start

The rotamer library is optional — if you don't pass one, pyconfind downloads the Dunbrack 2010 library once (~6 MB) and caches it per-user (via platformdirs), so the simplest invocation is just:

pyconfind --p input.pdb --o out.cont          # library auto-downloaded + cached

CLI (matches the original confind flag names, so existing pipelines drop in; pass --rLib to use your own library):

# Inputs may be PDB or mmCIF (format auto-detected via gemmi):
pyconfind --p input.cif --o out.cont
# Modern structured output:
pyconfind --p input.pdb --json --o out.json
# Only consider the native AA at each position (no AA substitution):
pyconfind --p input.pdb --native-only --o out.cont
# Restrict the computed/output residues (MSL selection language):
pyconfind --p input.pdb --sel "chain A AND resi 20-60" --o out.cont
# Pre-select part of the structure before anything runs:
pyconfind --p input.pdb --psel "NAME CA WITHIN 25 OF CHAIN A" --o out.cont
# Use your own library:
pyconfind --p input.pdb --rLib path/to/rotlibs --o out.cont
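
For scripted pipelines, the flags above can be assembled from Python. The following is a sketch only: `build_command` and `run_batch` are hypothetical helper names, not part of pyconfind, and the assumption that these flags can be freely combined in one invocation is not confirmed by this page. Only flags shown above are used.

```python
import subprocess
from pathlib import Path


def build_command(structure, output, *, json_output=False, native_only=False,
                  sel=None, psel=None, rlib=None):
    """Assemble a pyconfind argument list using only the flags shown above."""
    cmd = ["pyconfind", "--p", str(structure)]
    if json_output:
        cmd.append("--json")
    if native_only:
        cmd.append("--native-only")
    if sel is not None:
        cmd += ["--sel", sel]
    if psel is not None:
        cmd += ["--psel", psel]
    if rlib is not None:
        cmd += ["--rLib", str(rlib)]
    cmd += ["--o", str(output)]
    return cmd


def run_batch(structures, out_dir, suffix=".cont", **options):
    """Run pyconfind once per structure, writing <stem><suffix> into out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for structure in map(Path, structures):
        target = out_dir / f"{structure.stem}{suffix}"
        # check=True raises CalledProcessError if pyconfind exits non-zero.
        subprocess.run(build_command(structure, target, **options), check=True)


print(build_command("input.pdb", "out.cont", sel="chain A AND resi 20-60"))
```

Passing the arguments as a list avoids shell quoting problems with selection strings that contain spaces.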

Library API:

from pyconfind import analyze

result = analyze("input.pdb")           # library auto-downloaded + cached
positions = result.positions_dataframe()  # one row per residue
contacts  = result.contacts_dataframe()   # one row per residue-residue contact
contacts.nlargest(10, "degree")

analyze() takes an assembly= argument too — by default it picks the first biological assembly, which is what you want for crystal structures whose asymmetric unit contains multiple independent copies of the complex (e.g. antibody/antigen structures like 5TRU). Pass assembly=None to keep the asymmetric unit as-is.
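
When processing several structures from one Python process, a small wrapper around the calls shown above may be convenient. This is a sketch under stated assumptions: `top_contacts` is a hypothetical helper, it relies only on `analyze()`, `contacts_dataframe()`, and the `"degree"` column that appear in the example above, and it forwards any extra keyword arguments (such as `assembly=None`) unchanged. Whether the rotamer library is reused across calls depends on pyconfind internals not described here.

```python
def top_contacts(paths, analyze_fn=None, n=10, **analyze_kwargs):
    """Map each structure path to its n highest-degree contacts.

    analyze_fn defaults to pyconfind.analyze; it is injectable so the helper
    can be exercised with a stand-in when pyconfind is not installed.
    """
    if analyze_fn is None:
        from pyconfind import analyze as analyze_fn  # imported lazily

    results = {}
    for path in paths:
        result = analyze_fn(str(path), **analyze_kwargs)
        results[str(path)] = result.contacts_dataframe().nlargest(n, "degree")
    return results


# Example call (requires pyconfind; file names are placeholders):
#   top_contacts(["input.pdb", "input.cif"], n=5, assembly=None)
```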

Rotamer libraries

Out of the box, pyconfind supports the Dunbrack 2010 MSL-format library that ships with the upstream confind source (EBL.out + BEBL.out). Point --rLib at a directory containing both files (backbone-dependent) or at a single EBL.out-style file (backbone-independent).
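
Before launching a long run, it can help to check which of the two layouts a path matches. The sketch below is a pre-flight check written from the description above; `classify_rlib` is a hypothetical helper and does not reproduce pyconfind's own detection logic. The demo uses empty placeholder files, not real rotamer data.

```python
import tempfile
from pathlib import Path


def classify_rlib(path):
    """Guess which library mode a --rLib path corresponds to.

    Returns "backbone-dependent" for a directory holding both EBL.out and
    BEBL.out, "backbone-independent" for a single file, and raises otherwise.
    """
    path = Path(path)
    if path.is_dir():
        missing = [name for name in ("EBL.out", "BEBL.out")
                   if not (path / name).is_file()]
        if missing:
            raise FileNotFoundError(f"{path} is missing: {', '.join(missing)}")
        return "backbone-dependent"
    if path.is_file():
        return "backbone-independent"
    raise FileNotFoundError(f"{path} does not exist")


# Demo with empty placeholder files (not real rotamer libraries).
with tempfile.TemporaryDirectory() as tmp:
    lib_dir = Path(tmp) / "rotlibs"
    lib_dir.mkdir()
    (lib_dir / "EBL.out").write_text("")
    (lib_dir / "BEBL.out").write_text("")
    dir_mode = classify_rlib(lib_dir)
    file_mode = classify_rlib(lib_dir / "EBL.out")

print(dir_mode, file_mode)
```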

Modern Dunbrack and Richardson-style libraries are next on the roadmap.

Native-only mode (extension over the C++ binary)

The original C++ confind substitutes each of the 18 non-Gly/Pro amino acids at every position and computes contact degree across the full rotamer space. pyconfind adds --native-only, which places only rotamers of the native amino acid at each position (while still considering every rotamer of that amino acid). This is useful when you want a contact-degree estimate that holds the sequence fixed.
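
The difference between the two modes can be pictured as a choice of candidate amino acids per position. The sketch below is illustrative only: `SUBSTITUTABLE` and `candidate_residues` are hypothetical names, and returning an empty tuple for a native Gly or Pro in native-only mode is an assumption, since this page does not say how pyconfind treats that case.

```python
# The 18 amino acids left after excluding Gly and Pro, per the text above.
SUBSTITUTABLE = (
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "SER", "THR", "TRP", "TYR", "VAL",
)


def candidate_residues(native, native_only=False):
    """Return the amino acids whose rotamers would be placed at one position."""
    if not native_only:
        return SUBSTITUTABLE  # default: every non-Gly/Pro amino acid
    # Assumption: a native Gly/Pro contributes no rotamers in this sketch.
    return (native,) if native in SUBSTITUTABLE else ()


print(len(candidate_residues("LEU")))                # 18
print(candidate_residues("LEU", native_only=True))   # ('LEU',)
```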

Validation

The C++ reference binary is built from the upstream tarball by:

scripts/build-reference.sh

The byte-identity tests then compare pyconfind's output against the C++ output on every example PDB. To run them yourself:

pytest tests/
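
For a one-off comparison outside the test suite, byte identity between a reference file and a pyconfind output can be checked directly. This is a minimal sketch, not the project's test code; `first_difference` is a hypothetical helper, and the demo writes placeholder bytes rather than real contact maps.

```python
import tempfile
from pathlib import Path


def first_difference(path_a, path_b):
    """Return the byte offset of the first difference, or None if identical."""
    a = Path(path_a).read_bytes()
    b = Path(path_b).read_bytes()
    if a == b:
        return None
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    return min(len(a), len(b))  # one file is a strict prefix of the other


# Demo with placeholder content (not real confind output).
with tempfile.TemporaryDirectory() as tmp:
    ref = Path(tmp) / "reference.cont"
    new = Path(tmp) / "candidate.cont"
    ref.write_bytes(b"placeholder line 1\nplaceholder line 2\n")
    new.write_bytes(b"placeholder line 1\nplaceholder line 2\n")
    same = first_difference(ref, new)
    new.write_bytes(b"placeholder line 1\nplaceholder line X\n")
    changed = first_difference(ref, new)

print(same, changed)
```

Reporting the offset of the first mismatch, instead of a plain yes/no, makes it easier to locate an ordering or formatting discrepancy.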

References

  • "Sequence statistics of tertiary structural motifs reflect protein stability", F. Zheng, G. Grigoryan, PLoS ONE, 12(5): e0178272, 2017.

  • "Tertiary Structural Propensities Reveal Fundamental Sequence/Structure Relationships", F. Zheng, J. Zhang, G. Grigoryan, Structure, 23(5): 961-971, 2015.
