
Fast Python implementation of confind — protein side-chain contact-degree analysis.


pyconfind


A modern Python implementation of ConFind, the rotamer-based protein side-chain contact-degree analysis introduced in Zheng et al. (2015) and Zheng et al. (2017).

The Python output is byte-for-byte identical to the upstream C++ binary on 248 of 253 real structures tested (100 single-chain PDB + 100 AlphaFold DB + 50 multi-chain + 3 high-resolution; see docs/stress_test_results.md), plus a further 100 RCSB entries cross-checked as both PDB and mmCIF. The 5 exceptions are insertion-code structures where the C++ ordering relies on undefined behavior (documented). The test suite runs against real PDB/mmCIF structures with committed C++-reference contact maps.

pyconfind is also faster than the C++ binary, with two interchangeable contact-degree backends (both byte-identical to the reference):

  • a pure NumPy/SciPy reference, which on its own already beats the C++ binary;
  • an optional Numba JIT/multi-threaded backend (pip install pyconfind[fast]) that is ~3× faster again.

With the Numba backend and the rotamer library pre-loaded, per-structure analysis in full mode is ~7–11× faster than the C++ binary (median ~10× over the benchmark set). Setting native_only=True is a further ~26× faster: under 0.36 s for every structure in the benchmark set (the largest has 555 residues), and ~0.07 s for small ones.

[Figure: runtime vs. sequence length]

Left: full analysis (every position considers all 18 substitutable AAs). Right: native_only=True — only the native AA is placed at each position (see native-only mode). The rotamer library is loaded once before measurement and excluded from every timing, so the numbers reflect per-structure analysis only. See docs/benchmark.md for the structure set and the harness.
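The timing convention described above (pay the one-time load before the clock starts, then time each structure separately) can be sketched with the standard library alone. This is not the harness from docs/benchmark.md; `time_per_item`, `setup` and `analyze_one` are hypothetical names, and the workload is a stand-in rather than a real analysis.

```python
import statistics
import time


def time_per_item(setup, analyze_one, items, repeats=3):
    """Return the median wall-clock seconds per item, excluding setup."""
    shared = setup()  # one-time cost (e.g. loading a library); deliberately not timed
    medians = {}
    for item in items:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            analyze_one(shared, item)
            samples.append(time.perf_counter() - start)
        medians[item] = statistics.median(samples)
    return medians


# Stand-in workload: "setup" builds a table, "analysis" sums a slice of it.
timings = time_per_item(
    setup=lambda: list(range(1000)),
    analyze_one=lambda table, n: sum(table[:n]),
    items=[10, 100, 1000],
)
for size, seconds in timings.items():
    print(f"n={size}: {seconds:.6f} s")
```

Taking the median of a few repeats keeps a single slow run (for example, a first-call JIT compile) from dominating the reported number.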

Install

pip install pyconfind            # pure-Python reference backend
pip install "pyconfind[fast]"    # + Numba JIT/multi-threaded backend
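After installing, a quick way to see whether Numba is importable in the current environment is shown below. This only checks for the `numba` package itself; it does not ask pyconfind which backend it will use, and `numba_available` is a hypothetical helper name.

```python
import importlib.util


def numba_available():
    """True if the `numba` package can be found in this environment."""
    return importlib.util.find_spec("numba") is not None


print("numba installed:", numba_available())
```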

From source (for development):

pip install -e ".[dev]"          # editable install with test/lint tooling

Example notebook

Open In Colab

examples/pyconfind_demo.ipynb is a runnable walkthrough (install → fetch a PDB → analyze via the library API → visualize a contact map, per-residue scores, and a 3D structure colored by contact degree). Click the badge to run it on a free Colab CPU runtime.

Quick start

The rotamer library is optional — if you don't pass one, pyconfind downloads the Dunbrack 2010 library once (~6 MB) and caches it per-user (via platformdirs), so the simplest invocation is just:

pyconfind --p input.pdb --o out.cont          # library auto-downloaded + cached
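The download-once-then-cache behavior can be pictured with a small sketch. It is not pyconfind's implementation: `cached_file` and `fake_fetch` are hypothetical, the cache directory is passed in explicitly instead of being resolved through platformdirs, and no network access is involved.

```python
from pathlib import Path
import tempfile


def cached_file(cache_dir, name, fetch):
    """Return the path to `name` in `cache_dir`, calling `fetch` only if it is missing."""
    path = Path(cache_dir) / name
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        tmp = path.with_name(path.name + ".part")
        tmp.write_bytes(fetch())  # write under a temporary name first
        tmp.replace(path)         # then rename, so a partial download is never used
    return path


calls = []


def fake_fetch():
    """Stand-in for a real download; records how often it is called."""
    calls.append(1)
    return b"rotamer data"


with tempfile.TemporaryDirectory() as d:
    first = cached_file(d, "library.bin", fake_fetch)
    second = cached_file(d, "library.bin", fake_fetch)  # served from the cache
    cached_bytes = second.read_bytes()

print("fetch calls:", len(calls))
```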

CLI (matches the original confind flag names, so existing pipelines drop in; pass --rLib to use your own library):

# Inputs may be PDB or mmCIF (format auto-detected via gemmi):
pyconfind --p input.cif --o out.cont
# Modern structured output:
pyconfind --p input.pdb --json --o out.json
# Only consider the native AA at each position (no AA substitution):
pyconfind --p input.pdb --native-only --o out.cont
# Restrict the computed/output residues (MSL selection language):
pyconfind --p input.pdb --sel "chain A AND resi 20-60" --o out.cont
# Pre-select part of the structure before anything runs:
pyconfind --p input.pdb --psel "NAME CA WITHIN 25 OF CHAIN A" --o out.cont
# Use your own library:
pyconfind --p input.pdb --rLib path/to/rotlibs --o out.cont
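For batch runs it can be convenient to assemble these command lines from Python. The sketch below uses only the flags listed above; `build_command` is a hypothetical helper, not part of pyconfind, and whether every combination of flags is accepted by the CLI is not verified here.

```python
import shlex


def build_command(structure, output, *, json=False, native_only=False,
                  sel=None, psel=None, rlib=None):
    """Assemble a pyconfind argument list from the flags shown above."""
    cmd = ["pyconfind", "--p", str(structure), "--o", str(output)]
    if json:
        cmd.append("--json")
    if native_only:
        cmd.append("--native-only")
    if sel is not None:
        cmd += ["--sel", sel]
    if psel is not None:
        cmd += ["--psel", psel]
    if rlib is not None:
        cmd += ["--rLib", str(rlib)]
    return cmd


cmd = build_command("input.pdb", "out.cont", sel="chain A AND resi 20-60")
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

Passing the list to `subprocess.run(cmd, check=True)` avoids shell quoting problems with selection strings that contain spaces.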

Library API:

from pyconfind import analyze

result = analyze("input.pdb")           # library auto-downloaded + cached
positions = result.positions_dataframe()  # one row per residue
contacts  = result.contacts_dataframe()   # one row per residue-residue contact
contacts.nlargest(10, "degree")

analyze() takes an assembly= argument too — by default it picks the first biological assembly, which is what you want for crystal structures whose asymmetric unit contains multiple independent copies of the complex (e.g. antibody/antigen structures like 5TRU). Pass assembly=None to keep the asymmetric unit as-is.

Rotamer libraries

Out of the box, pyconfind supports the Dunbrack 2010 MSL-format library that ships with the upstream confind source (EBL.out + BEBL.out); leave --rLib unset to auto-download it. Point --rLib at your own directory containing both files to use a different library. Only backbone-dependent libraries are supported.
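Before pointing --rLib at a custom directory, it may help to confirm that both files are present. The check below is a minimal sketch: `missing_library_files` is a hypothetical helper, and it tests only for file names, not for valid MSL-format contents.

```python
from pathlib import Path
import tempfile

REQUIRED_FILES = ("EBL.out", "BEBL.out")


def missing_library_files(directory):
    """Return the required library file names that are absent from `directory`."""
    root = Path(directory)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


# Demo on a temporary directory that contains only one of the two files.
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "EBL.out").write_text("placeholder")
    missing = missing_library_files(d)

print(missing)  # → ['BEBL.out']
```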

Modern Dunbrack and Richardson-style libraries are next on the roadmap.

Native-only mode (extension over the C++ binary)

The original C++ confind substitutes in all 18 non-Gly/Pro amino acids at every position and computes contact degree across the full rotamer space. pyconfind adds --native-only: at each position, it places only rotamers of the native amino acid (but still considers every rotamer of that AA).
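The difference between the two modes comes down to which amino acids are placed at each position. The toy function below illustrates only that candidate set; `candidate_amino_acids` is a hypothetical name, no rotamers or contact degrees are computed, and how positions whose native residue is Gly or Pro are handled is not stated here, so the sketch does not model it.

```python
STANDARD_AAS = [
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
]
# The 18 substitutable amino acids: the 20 standard ones minus Gly and Pro.
SUBSTITUTABLE = [aa for aa in STANDARD_AAS if aa not in ("GLY", "PRO")]


def candidate_amino_acids(native, native_only=False):
    """Amino acids whose rotamers would be placed at one position (illustrative only)."""
    if native_only:
        # Assumption: Gly/Pro natives are not given special treatment in this toy.
        return [native]
    return list(SUBSTITUTABLE)


print(len(candidate_amino_acids("LEU")))               # → 18
print(candidate_amino_acids("LEU", native_only=True))  # → ['LEU']
```

With 18 times fewer amino acids to place per position, far fewer rotamer pairs need to be compared, which is consistent with the large speedup reported for native-only mode.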

Validation

The C++ reference binary is built from the upstream tarball by:

scripts/build-reference.sh

The byte-identity tests then compare pyconfind's output against the C++ output on every example PDB. To run them yourself:

pytest tests/
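A byte-identity check is a stricter test than comparing parsed values: any difference in ordering, whitespace or number formatting fails it. The sketch below shows the idea on two small temporary files; `first_difference` is a hypothetical helper, not taken from the test suite, and the file contents are placeholders rather than real contact maps.

```python
from pathlib import Path
import tempfile


def first_difference(path_a, path_b):
    """Return the byte offset of the first difference, or None if the files are identical."""
    a = Path(path_a).read_bytes()
    b = Path(path_b).read_bytes()
    if a == b:
        return None
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    return min(len(a), len(b))  # one file is a strict prefix of the other


with tempfile.TemporaryDirectory() as d:
    ref = Path(d) / "reference.out"
    same = Path(d) / "same.out"
    diff = Path(d) / "different.out"
    ref.write_bytes(b"alpha\nbeta\n")
    same.write_bytes(b"alpha\nbeta\n")
    diff.write_bytes(b"alpha\nbeta!\n")
    identical = first_difference(ref, same)
    offset = first_difference(ref, diff)

print(identical, offset)  # → None 10
```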

References

  • "Sequence statistics of tertiary structural motifs reflect protein stability", F. Zheng, G. Grigoryan, PLoS ONE, 12(5): e0178272, 2017.

  • "Tertiary Structural Propensities Reveal Fundamental Sequence/Structure Relationships", F. Zheng, J. Zhang, G. Grigoryan, Structure, 23(5): 961-971, 2015.
