Fast Python implementation of confind — protein side-chain contact-degree analysis.

pyconfind

A modern Python implementation of ConFind, the rotamer-based protein side-chain contact-degree analysis introduced in Zheng et al. 2015 and Zheng et al. 2017.

The Python output is byte-for-byte identical to the upstream C++ binary on 248 of 253 real structures tested (100 single-chain PDB + 100 AlphaFold DB + 50 multi-chain + 3 high-resolution; see docs/stress_test_results.md), plus a further 100 RCSB entries cross-checked as both PDB and mmCIF. The 5 exceptions are insertion-code structures where the C++ ordering relies on undefined behavior (documented). The test suite runs against real PDB/mmCIF structures with committed C++-reference contact maps.

pyconfind is also faster than the C++ binary, with two interchangeable contact-degree backends (both byte-identical to the reference):

  • a pure NumPy/SciPy reference, which on its own already beats the C++ binary;
  • an optional Numba JIT/multi-threaded backend (pip install pyconfind[fast]) that is ~2-3× faster again.

With the Numba backend and the rotamer library pre-loaded, per-structure analysis is ~5-8× faster than the C++ binary (median ~7.8× over the benchmark set), and native_only=True is a further ~20× faster: sub-second for hundreds of residues.

[Figure: runtime vs sequence length]

Left: full analysis (every position considers all 18 substitutable AAs). Right: native_only=True — only the native AA is placed at each position (see native-only mode). The rotamer library is loaded once before measurement and excluded from every timing, so the numbers reflect per-structure analysis only. See docs/benchmark.md for the structure set and the harness.
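The timing methodology described above (load shared data once, then time only the per-structure work) can be sketched with the standard library alone. This is not the project's harness, which lives in docs/benchmark.md; the function names and the stand-in workload below are hypothetical and exist only so the sketch runs without pyconfind installed.

```python
import statistics
import time


def time_per_structure(load_once, analyze_one, inputs, repeats=3):
    """Return median wall-clock seconds per input, excluding one-time setup."""
    shared = load_once()  # one-time setup, deliberately outside the timed region
    timings = {}
    for item in inputs:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            analyze_one(shared, item)
            samples.append(time.perf_counter() - start)
        timings[item] = statistics.median(samples)
    return timings


# Hypothetical stand-ins for the real work.
def fake_load():
    time.sleep(0.05)  # pretend this is the expensive rotamer-library load
    return {"loaded": True}


def fake_analyze(shared, name):
    return len(name) if shared["loaded"] else 0


timings = time_per_structure(fake_load, fake_analyze, ["1abc", "2xyz"])
```

Because the 50 ms setup cost sits outside the timed loop, it does not appear in any of the reported per-input medians.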

Install

pip install pyconfind            # pure-Python reference backend
pip install "pyconfind[fast]"    # + Numba JIT/multi-threaded backend

From source (for development):

pip install -e ".[dev]"          # editable install with test/lint tooling

Example notebook

Open In Colab

examples/pyconfind_demo.ipynb is a runnable walkthrough (install → fetch a PDB → analyze via the library API → visualize a contact map, per-residue scores, and a 3D structure colored by contact degree). Click the badge to run it on a free Colab CPU runtime.

Quick start

The rotamer library is optional — if you don't pass one, pyconfind downloads the Dunbrack 2010 library once (~6 MB) and caches it per-user (via platformdirs), so the simplest invocation is just:

pyconfind --p input.pdb --o out.cont          # library auto-downloaded + cached

CLI (matches the original confind flag names, so existing pipelines drop in; pass --rLib to use your own library):

# Inputs may be PDB or mmCIF (format auto-detected via gemmi):
pyconfind --p input.cif --o out.cont
# Modern structured output:
pyconfind --p input.pdb --json --o out.json
# Only consider the native AA at each position (no AA substitution):
pyconfind --p input.pdb --native-only --o out.cont
# Restrict the computed/output residues (MSL selection language):
pyconfind --p input.pdb --sel "chain A AND resi 20-60" --o out.cont
# Pre-select part of the structure before anything runs:
pyconfind --p input.pdb --psel "NAME CA WITHIN 25 OF CHAIN A" --o out.cont
# Use your own library:
pyconfind --p input.pdb --rLib path/to/rotlibs --o out.cont
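For batch pipelines driven from Python, the flags above can be assembled into an argument list programmatically. The helper below is a hypothetical sketch, not part of pyconfind; it uses only flags shown in this section, and it assumes (without confirmation from the documentation) that --native-only and --sel can be combined in one invocation.

```python
import shlex


def build_command(structure, output, *, json_output=False, native_only=False,
                  selection=None, rotamer_lib=None):
    """Assemble a pyconfind argv list from the flags listed above."""
    argv = ["pyconfind", "--p", str(structure), "--o", str(output)]
    if json_output:
        argv.append("--json")
    if native_only:
        argv.append("--native-only")
    if selection is not None:
        argv += ["--sel", selection]  # passed as one argument, no shell quoting needed
    if rotamer_lib is not None:
        argv += ["--rLib", str(rotamer_lib)]
    return argv


cmd = build_command("input.pdb", "out.cont", native_only=True,
                    selection="chain A AND resi 20-60")
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

The resulting list can be handed to subprocess.run(cmd, check=True), which avoids shell quoting problems with selections that contain spaces.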

Library API:

from pyconfind import analyze

result = analyze("input.pdb")           # library auto-downloaded + cached
positions = result.positions_dataframe()  # one row per residue
contacts  = result.contacts_dataframe()   # one row per residue-residue contact
contacts.nlargest(10, "degree")

analyze() takes an assembly= argument too — by default it picks the first biological assembly, which is what you want for crystal structures whose asymmetric unit contains multiple independent copies of the complex (e.g. antibody/antigen structures like 5TRU). Pass assembly=None to keep the asymmetric unit as-is.

Rotamer libraries

Out of the box, pyconfind supports the Dunbrack 2010 MSL-format library that ships with the upstream confind source (EBL.out + BEBL.out); leave --rLib unset to auto-download it. Point --rLib at your own directory containing both files to use a different library. Only backbone-dependent libraries are supported.

Modern Dunbrack and Richardson-style libraries are next on the roadmap.

Native-only mode (extension over the C++ binary)

The original C++ confind substitutes in all 18 non-Gly/Pro amino acids at every position and computes contact degree across the full rotamer space. pyconfind adds --native-only: at each position, it places rotamers of the native amino acid only, while still considering every rotamer of that amino acid.
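The difference between the two modes comes down to which amino acids are candidates at each position. The sketch below illustrates that idea only; it is not pyconfind's internal code, the names are hypothetical, and the handling of a native Gly or Pro (returning no candidates) is an assumption, since the behavior for those residues is not described here.

```python
STANDARD_AAS = (
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
)
# The 18 substitutable amino acids: the 20 standard ones minus Gly and Pro.
SUBSTITUTABLE = tuple(aa for aa in STANDARD_AAS if aa not in ("GLY", "PRO"))


def candidate_aas(native, native_only=False):
    """Amino acids whose rotamers would be placed at one position."""
    if not native_only:
        return SUBSTITUTABLE  # full mode: every substitutable amino acid
    # Assumption: a native Gly/Pro yields no candidates in native-only mode.
    return (native,) if native in SUBSTITUTABLE else ()


full = candidate_aas("LEU")
native = candidate_aas("LEU", native_only=True)
```

In full mode each position carries 18 candidate amino acids, whereas native-only mode carries one, which is consistent with the large speedup reported for native_only=True.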

Validation

The C++ reference binary is built from the upstream tarball by:

scripts/build-reference.sh

The byte-identity tests then compare pyconfind's output against the C++ output on every example PDB. To run them yourself:

pytest tests/
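To check byte-identity on a structure of your own, outside the test suite, a comparison along the following lines is enough. This is a minimal standard-library sketch and not the project's test code; the helper name is hypothetical and the file contents are placeholders rather than real contact-map output.

```python
import tempfile
from pathlib import Path


def first_difference(path_a, path_b):
    """Return the byte offset of the first difference, or None if identical."""
    a = Path(path_a).read_bytes()
    b = Path(path_b).read_bytes()
    if a == b:
        return None
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    return min(len(a), len(b))  # one file is a strict prefix of the other


# Demonstration on placeholder files (not real .cont output).
with tempfile.TemporaryDirectory() as tmp:
    ref = Path(tmp, "reference.cont")
    same = Path(tmp, "same.cont")
    diff = Path(tmp, "diff.cont")
    ref.write_bytes(b"line one\nline two\n")
    same.write_bytes(b"line one\nline two\n")
    diff.write_bytes(b"line one\nline 2wo\n")
    identical_offset = first_difference(ref, same)
    differing_offset = first_difference(ref, diff)
```

Reporting the offset of the first mismatch, instead of a bare true/false, makes it easier to find where two outputs diverge.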

References

  • "Sequence statistics of tertiary structural motifs reflect protein stability", F. Zheng, G. Grigoryan, PLoS ONE, 12(5): e0178272, 2017.

  • "Tertiary Structural Propensities Reveal Fundamental Sequence/Structure Relationships", F. Zheng, J. Zhang, G. Grigoryan, Structure, 23(5): 961-971, 2015.
