pdb_cpp is a Python library for simple operations on PDB coordinate files.

pdb_cpp

pdb_cpp is a structural bioinformatics toolkit with a C++ core and Python API for fast PDB/mmCIF parsing, atom selection, sequence/structure alignment, TM-score, and DockQ evaluation.

What is included

Read/write .pdb, .cif, .pqr, and .gro files
Atom/residue/chain selections (including geometric within queries)
Sequence extraction and pairwise sequence alignment
Sequence-based structural superposition and chain-permutation alignment
TM-align/TM-score through the bundled USalign/TM-align core
DockQ metrics (DockQ, Fnat, Fnonnat, LRMS, iRMS, rRMS)
Hydrogen bond detection (Baker & Hubbard geometric method, no explicit H required)
Secondary structure assignment
Core geometric helpers (e.g., distance matrix)

Installation

From PyPI

python -m pip install pdb-cpp

From source

git clone https://github.com/samuelmurail/pdb_cpp
cd pdb_cpp
python -m pip install -e .

For development:

python -m pip install -r requirements.txt
pytest

Quick start

from pdb_cpp import Coor

# Load from local file
coor = Coor("tests/input/1y0m.cif")

# Or fetch by PDB ID (mmCIF is downloaded and cached)
coor_pdb = Coor(pdb_id="1y0m")

# Or use the RCSB helper for explicit structure choices
from pdb_cpp import rcsb

bio_assembly = rcsb.load("5a9z", structure="biological_assembly", assembly_id=1)
asym_unit = rcsb.load("5a9z", structure="asymmetric_unit")

print(coor.model_num)        # number of models
print(coor.get_aa_seq())     # chain -> sequence

# Write selection/structure back to disk
coor.write("out_structure.pdb")

Selection language (including complex selections)

select_atoms() supports boolean logic, numeric comparisons, and spatial queries:

# Backbone residues 6..58 from chain A
sel_1 = coor.select_atoms("backbone and chain A and residue >= 6 and residue <= 58")

# Interface-like query: atoms of chain A within 5 Å of chain B
sel_2 = coor.select_atoms("chain A and within 5.0 of chain B")

# Combination with negation
sel_3 = coor.select_atoms("name CA and not within 5.0 of resname HOH")

# Numeric filters
sel_4 = coor.select_atoms("resname HOH and x >= 20.0")

Common keywords: name, resname, chain, resid, residue, x, y, z, beta, occ, protein, backbone, noh, within.
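To make the meaning of a within query concrete, here is a minimal pure-Python sketch of a distance filter. It is not pdb_cpp's implementation: the within() helper and the toy coordinates are hypothetical, and treating the cutoff as inclusive is an assumption of this sketch.

```python
import math

def within(candidates, reference, cutoff):
    """Return indices of candidate points at most `cutoff` away from any reference point."""
    kept = []
    for index, point in enumerate(candidates):
        # Keep the point as soon as one reference point is close enough.
        if any(math.dist(point, ref) <= cutoff for ref in reference):
            kept.append(index)
    return kept

# Toy coordinates (in Å) standing in for atoms of two chains.
chain_a = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
chain_b = [(6.0, 0.0, 0.0)]

interface = within(chain_a, chain_b, 5.0)
print(interface)  # only the atom at x = 3.0 is within 5 Å of chain_b
```

A "not within" query is then simply the complement of the returned index list.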

Sequence alignment and structure superposition

from pdb_cpp import Coor, alignment, core, analysis

coor_1 = Coor("tests/input/1u85.pdb")
coor_2 = Coor("tests/input/1ubd.pdb")

seq_1 = coor_1.get_aa_seq()["A"]
seq_2 = coor_2.get_aa_seq()["C"]
aln_1, aln_2, score = alignment.align_seq(seq_1, seq_2)
alignment.print_align_seq(aln_1, aln_2)

# Get atom correspondences and align coordinates in-place
idx_1, idx_2 = core.get_common_atoms(coor_1, coor_2, chain_1=["A"], chain_2=["C"])
core.coor_align(coor_1, coor_2, idx_1, idx_2, frame_ref=0)

# RMSD after alignment
rmsd_values = analysis.rmsd(coor_1, coor_2, index_list=[idx_1, idx_2])
print(rmsd_values[0])
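For readers unfamiliar with the metric, the sketch below shows the standard RMSD formula over paired atoms: the square root of the mean squared distance between corresponding positions. It is a standalone illustration with a hypothetical paired_rmsd() helper, not the code behind analysis.rmsd, and it assumes the two coordinate lists are already superposed and in matching order.

```python
import math

def paired_rmsd(coords_1, coords_2):
    """Root-mean-square deviation between two equally long lists of 3D points."""
    if len(coords_1) != len(coords_2) or not coords_1:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_1, coords_2))
    return math.sqrt(squared / len(coords_1))

# Toy example: every atom is shifted by exactly 1 Å along z.
ref_xyz = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
mob_xyz = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]

value = paired_rmsd(ref_xyz, mob_xyz)
print(value)
```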

For multi-chain complexes with uncertain chain mapping, use chain permutation:

rmsds, index_mappings = alignment.align_chain_permutation(coor_1, coor_2)
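The idea behind chain permutation can be sketched in a few lines: try every assignment of chains in one structure to chains in the other and keep the assignment with the lowest deviation. The example below is a simplified illustration, not the algorithm used by align_chain_permutation. The best_chain_mapping() helper is hypothetical, it compares chain centroids only, and it assumes both structures already share a coordinate frame (a full implementation would superpose the structures for each candidate mapping).

```python
import itertools
import math

def best_chain_mapping(centroids_1, centroids_2):
    """Return (mapping, rmsd) for the chain assignment with the lowest centroid RMSD.

    centroids_*: dict mapping a chain ID to an (x, y, z) centroid.
    """
    chains_1 = sorted(centroids_1)
    if not chains_1:
        raise ValueError("at least one chain is required")
    best_mapping, best_score = None, math.inf
    # Enumerate every ordered choice of chains from the second structure.
    for perm in itertools.permutations(sorted(centroids_2), len(chains_1)):
        squared = sum(
            math.dist(centroids_1[a], centroids_2[b]) ** 2
            for a, b in zip(chains_1, perm)
        )
        score = math.sqrt(squared / len(chains_1))
        if score < best_score:
            best_mapping, best_score = dict(zip(chains_1, perm)), score
    return best_mapping, best_score

# Toy complexes whose chain labels are swapped relative to each other.
model_centroids = {"A": (0.0, 0.0, 0.0), "B": (10.0, 0.0, 0.0)}
native_centroids = {"X": (10.0, 0.0, 0.0), "Y": (0.0, 0.0, 0.0)}

mapping, score = best_chain_mapping(model_centroids, native_centroids)
print(mapping, score)
```

The number of permutations grows factorially with the number of chains, so exhaustive enumeration like this is only practical for small complexes.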

TM-align / TM-score

from pdb_cpp import Coor
from pdb_cpp.core import tmalign_ca

ref = Coor("tests/input/1y0m.cif")
mob = Coor("tests/input/1ubd.pdb")

result = tmalign_ca(ref, mob, chain_1=["A"], chain_2=["C"], mm=1)

print(result.L_ali)  # aligned length
print(result.rmsd)   # RMSD on aligned residues
print(result.TM1)    # TM-score normalized by structure 1
print(result.TM2)    # TM-score normalized by structure 2
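TM1 and TM2 differ only in the length used for normalization. The sketch below evaluates the standard TM-score sum for a fixed set of aligned-pair distances; it is a standalone illustration with a hypothetical tm_score() helper, not the bundled USalign/TM-align code. The real program also searches for the superposition that maximizes this sum, and the 0.5 Å floor applied to d0 for short chains is an assumption of this sketch.

```python
def tm_score(distances, l_norm):
    """TM-score sum for aligned-pair distances (Å), normalized by chain length l_norm."""
    if l_norm <= 0:
        raise ValueError("l_norm must be positive")
    if l_norm > 21:
        # Length-dependent distance scale d0 = 1.24 * (L - 15)^(1/3) - 1.8.
        d0 = 1.24 * (l_norm - 15) ** (1.0 / 3.0) - 1.8
    else:
        # Assumption of this sketch: use a fixed floor for very short chains.
        d0 = 0.5
    d0 = max(d0, 0.5)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_norm

# 100 perfectly superposed residue pairs, normalized by two different lengths.
perfect = [0.0] * 100
tm_by_100 = tm_score(perfect, 100)
tm_by_200 = tm_score(perfect, 200)
print(tm_by_100, tm_by_200)
```

Normalizing the same alignment by a longer chain gives a lower score, which is why the two values are reported separately.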

If you use the USalign/TM-align functionality in pdb_cpp, please cite:

Chengxin Zhang, Morgan Shine, Anna Marie Pyle, Yang Zhang (2022) Nat Methods. 19(9), 1109-1115.
Chengxin Zhang, Anna Marie Pyle (2022) iScience. 25(10), 105218.

Secondary structure helper:

from pdb_cpp import TMalign

ss = TMalign.compute_secondary_structure(ref)
print(ss[0]["A"])  # DSSP-like secondary structure string for chain A

DockQ scoring

from pdb_cpp import Coor, analysis

model = Coor("tests/input/1rxz_colabfold_model_1.pdb")
native = Coor("tests/input/1rxz.pdb")

# Chain roles can be inferred automatically,
# or provided explicitly with rec_chains/lig_chains arguments.
scores = analysis.dockQ(model, native)

print(scores["DockQ"][0])
print(scores["Fnat"][0], scores["Fnonnat"][0])
print(scores["LRMS"][0], scores["iRMS"][0], scores["rRMS"][0])
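As a reminder of how the components relate, the published protein-protein DockQ score is the average of Fnat and two scaled RMSD terms (iRMS with a scale of 1.5 Å, LRMS with a scale of 8.5 Å). The sketch below is a standalone illustration with a hypothetical dockq_from_components() helper; whether pdb_cpp computes the final value exactly this way is an assumption.

```python
def dockq_from_components(fnat, irms, lrms, d_irms=1.5, d_lrms=8.5):
    """Combine Fnat, iRMS and LRMS (Å) into a DockQ score in [0, 1]."""
    def scaled(rms, scale):
        # Maps an RMSD in [0, inf) onto (0, 1]; 0 Å gives 1.0.
        return 1.0 / (1.0 + (rms / scale) ** 2)
    return (fnat + scaled(irms, d_irms) + scaled(lrms, d_lrms)) / 3.0

perfect_model = dockq_from_components(fnat=1.0, irms=0.0, lrms=0.0)
medium_model = dockq_from_components(fnat=0.5, irms=1.5, lrms=8.5)
print(perfect_model, medium_model)
```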

If you use DockQ scoring in pdb_cpp, please cite:

DockQ, DOI: 10.1093/bioinformatics/btae586

Hydrogen bond detection

pdb_cpp.hbond identifies hydrogen bonds using the Baker & Hubbard geometric criteria. Hydrogen positions are reconstructed algebraically when not present in the file, so no pre-processing step is required.

from pdb_cpp import Coor
from pdb_cpp import hbond

coor = Coor("tests/input/2rri.cif")

# One list of HBond objects per model frame
all_bonds = hbond.hbonds(coor)
print(f"Model 0: {len(all_bonds[0])} H-bonds")

# Inspect bond geometry
b = all_bonds[0][0]
print(f"Donor  : {b.donor_chain}{b.donor_resid} {b.donor_heavy_name}")
print(f"Acceptor: {b.acceptor_chain}{b.acceptor_resid} {b.acceptor_name}")
print(f"d(D..A) = {b.dist_DA:.2f} Å  ∠DHA = {b.angle_DHA:.1f}°")

# Cross-selection: protein donors to nucleic-acid acceptors
rna_bonds = hbond.hbonds(coor, donor_sel="protein", acceptor_sel="nucleic")

Default cutoffs follow Baker & Hubbard (1984): dist_DA_cutoff=3.5 Å, dist_HA_cutoff=2.5 Å, angle_cutoff=90°.
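The three cutoffs can be read as a simple geometric test on a donor, hydrogen, and acceptor position. The sketch below is a standalone illustration, not the pdb_cpp.hbond implementation: the is_hbond() and angle_deg() helpers are hypothetical, the parameter names are borrowed from the defaults listed above, and treating the comparisons as inclusive (with the D-H-A angle required to be at least angle_cutoff) is an assumption.

```python
import math

def angle_deg(a, b, c):
    """Angle in degrees at vertex b formed by the points a-b-c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0.0:
        raise ValueError("points must be distinct")
    cosine = sum(x * y for x, y in zip(v1, v2)) / norm
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))

def is_hbond(donor, hydrogen, acceptor,
             dist_DA_cutoff=3.5, dist_HA_cutoff=2.5, angle_cutoff=90.0):
    """Apply the three geometric criteria to one donor/hydrogen/acceptor triplet."""
    return (
        math.dist(donor, acceptor) <= dist_DA_cutoff
        and math.dist(hydrogen, acceptor) <= dist_HA_cutoff
        and angle_deg(donor, hydrogen, acceptor) >= angle_cutoff
    )

donor, hydrogen = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
linear = is_hbond(donor, hydrogen, (2.9, 0.0, 0.0))  # straight D-H...A geometry
bent = is_hbond(donor, hydrogen, (0.5, 2.0, 0.0))    # distances pass, angle is too sharp
print(linear, bent)
```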

Geometry utilities

from pdb_cpp import Coor, geom

coor = Coor("tests/input/1y0m.cif")
ca = coor.select_atoms("name CA")
dist = geom.distance_matrix(ca, ca)
print(dist.shape)
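For reference, a distance matrix simply holds the Euclidean distance between every point of one set and every point of another. The pure-Python sketch below illustrates the concept with a hypothetical pairwise_distances() helper and toy coordinates; it is not the C++ routine behind geom.distance_matrix and returns nested lists rather than an array.

```python
import math

def pairwise_distances(points_1, points_2):
    """Return a len(points_1) x len(points_2) nested list of Euclidean distances."""
    return [[math.dist(p, q) for q in points_2] for p in points_1]

# Toy CA-like positions (in Å).
ca_xyz = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
matrix = pairwise_distances(ca_xyz, ca_xyz)

for row in matrix:
    print(["%.2f" % value for value in row])
```

When both inputs are the same set, the matrix is symmetric with zeros on the diagonal.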

Benchmarks

Benchmark scripts are described in benchmark/README.md and cover:

DockQ vs pdb_cpp implementation
I/O read/write speed
Common operation speed comparisons (pdb_cpp, pdb_numpy, biopython, biotite)

Documentation

API and examples: https://samuelmurail.github.io/pdb_cpp/
Docs sources: docs/source/

Notes for contributors (C++ core)

When adding C++ features:

Add implementation files in src/pdb_cpp/_core/
Register sources in setup.py
Expose bindings in src/pdb_cpp/_core/pybind.cpp
Reinstall extension (pip install -e . --no-build-isolation) and run tests

Release history

0.2.0 (this version): Apr 8, 2026
0.0.1: Feb 20, 2026

Download files

Source distribution: pdb_cpp-0.2.0.tar.gz (175.2 kB)

File metadata

Download URL: pdb_cpp-0.2.0.tar.gz
Upload date: Apr 8, 2026
Size: 175.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.8

File hashes

Hashes for pdb_cpp-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`89be157f17b06729a05f94ac034fd0a4e85e897bd3d8d19d70b9fc41df7bfef2`
MD5	`b82a1558e5014c8381e3021c4fde20fc`
BLAKE2b-256	`810df2f11b9ba60286705ce44eb83de30ec67baca8abb2c30f0df8570b945e8f`
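A downloaded archive can be checked against the SHA256 digest above with the Python standard library. The sketch below is a minimal example: the sha256_of() helper is hypothetical, and the archive is assumed to be in the current directory under its published file name.

```python
import hashlib
import io

# SHA256 digest listed above for pdb_cpp-0.2.0.tar.gz.
EXPECTED_SHA256 = "89be157f17b06729a05f94ac034fd0a4e85e897bd3d8d19d70b9fc41df7bfef2"

def sha256_of(handle, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a binary file-like object, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: handle.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

# Self-contained demonstration on an in-memory stream.
demo_digest = sha256_of(io.BytesIO(b"abc"))
print(demo_digest)

# To check the real archive (assumed to be in the current directory):
# with open("pdb_cpp-0.2.0.tar.gz", "rb") as archive:
#     print(sha256_of(archive) == EXPECTED_SHA256)
```

Reading in chunks keeps memory use constant regardless of file size.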
