High-performance PDB/mmCIF parsing and analysis library with Python bindings

PDBRust Python Bindings

High-performance Python bindings for PDBRust, a Rust library for parsing and analyzing PDB/mmCIF protein structure files.

Installation

pip install pdbrust

Quick Start

import pdbrust

# Parse a PDB file
structure = pdbrust.parse_pdb_file("protein.pdb")
print(f"Loaded {structure.num_atoms} atoms in {structure.num_chains} chains")

# Get chain IDs
chains = structure.get_chain_ids()
print(f"Chains: {chains}")

# Access atoms
for atom in structure.atoms[:5]:
    print(f"{atom.name} {atom.residue_name}{atom.residue_seq}")

Features

Parsing

# Parse different formats
structure = pdbrust.parse_pdb_file("protein.pdb")
structure = pdbrust.parse_mmcif_file("protein.cif")
structure = pdbrust.parse_structure_file("protein.ent")  # auto-detect

# Parse gzip-compressed files
structure = pdbrust.parse_gzip_pdb_file("pdb1ubq.ent.gz")

# Parse from a string (pdb_content is a str holding the text of a PDB file)
structure = pdbrust.parse_pdb_string(pdb_content)
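
For readers unfamiliar with the file format, the fields exposed on each atom (name, residue name, residue sequence number, coordinates) correspond to fixed column ranges in a PDB ATOM/HETATM record. The following is a minimal pure-Python sketch of that column layout, not PDBRust's implementation; `AtomRecord` and `parse_atom_line` are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class AtomRecord(NamedTuple):
    """A few fields of one ATOM/HETATM record (hypothetical helper type)."""
    serial: int
    name: str
    alt_loc: str
    residue_name: str
    chain_id: str
    residue_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> AtomRecord:
    """Parse one fixed-column ATOM/HETATM line of a PDB file."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return AtomRecord(
        serial=int(line[6:11]),            # columns 7-11
        name=line[12:16].strip(),          # columns 13-16
        alt_loc=line[16].strip(),          # column 17 (blank if none)
        residue_name=line[17:20].strip(),  # columns 18-20
        chain_id=line[21].strip(),         # column 22
        residue_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),              # columns 31-38
        y=float(line[38:46]),              # columns 39-46
        z=float(line[46:54]),              # columns 47-54
    )
```

A real parser also has to handle insertion codes, occupancy, B-factors, element symbols, and malformed lines, which this sketch ignores.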

Filtering and Cleaning

# Method chaining for clean code
cleaned = structure.remove_ligands().keep_only_chain("A").keep_only_ca()

# Get CA coordinates
ca_coords = structure.get_ca_coords()  # List of (x, y, z) tuples
ca_coords_chain_a = structure.get_ca_coords("A")  # Specific chain

# Cleaning operations
structure.center_structure()
structure.normalize_chain_ids()
structure.reindex_residues()
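
To illustrate what centering means geometrically, here is a small pure-Python sketch that operates on a list of (x, y, z) tuples, the shape returned by get_ca_coords(). It assumes centering means subtracting the unweighted centroid; whether center_structure() uses the centroid or a mass-weighted center is not stated here, so treat this as an assumption. The helper names `centroid` and `center` are hypothetical.

```python
from typing import List, Tuple

Coord = Tuple[float, float, float]


def centroid(coords: List[Coord]) -> Coord:
    """Unweighted mean position of a non-empty list of coordinates."""
    if not coords:
        raise ValueError("coords must not be empty")
    n = len(coords)
    return (
        sum(c[0] for c in coords) / n,
        sum(c[1] for c in coords) / n,
        sum(c[2] for c in coords) / n,
    )


def center(coords: List[Coord]) -> List[Coord]:
    """Translate coordinates so that their centroid sits at the origin."""
    cx, cy, cz = centroid(coords)
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]
```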

Structural Descriptors

# Individual metrics
rg = structure.radius_of_gyration()
max_dist = structure.max_ca_distance()
composition = structure.aa_composition()

# All descriptors at once
desc = structure.structure_descriptors()
print(f"Rg: {desc.radius_of_gyration:.2f} A")
print(f"Hydrophobic: {desc.hydrophobic_ratio:.1%}")
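
As a reference for what two of these descriptors measure, the sketch below computes them in pure Python from a list of CA coordinates. It assumes the unweighted definition of the radius of gyration (root-mean-square distance from the centroid) and the largest pairwise distance between points; the library's exact definitions may differ. The function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Tuple

Coord = Tuple[float, float, float]


def radius_of_gyration_py(coords: List[Coord]) -> float:
    """Root-mean-square distance of the points from their centroid."""
    n = len(coords)
    if n == 0:
        raise ValueError("coords must not be empty")
    cx = sum(c[0] for c in coords) / n
    cy = sum(c[1] for c in coords) / n
    cz = sum(c[2] for c in coords) / n
    sq_sum = sum(
        (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords
    )
    return math.sqrt(sq_sum / n)


def max_pairwise_distance_py(coords: List[Coord]) -> float:
    """Largest Euclidean distance between any two points (O(n^2) pairs)."""
    return max(
        (math.dist(a, b) for a, b in combinations(coords, 2)),
        default=0.0,  # fewer than two points: no pair to measure
    )
```

The quadratic pair loop in the second function is the kind of work where a compiled implementation tends to pay off most.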

Quality Assessment

# Quick checks
if structure.has_altlocs():
    print("Warning: alternate conformations present")

if structure.has_multiple_models():
    print("NMR ensemble detected")

# Full quality report
report = structure.quality_report()
if report.is_analysis_ready():
    print("Structure is ready for analysis")
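
The two quick checks above map onto simple properties of the PDB text format: an alternate-location indicator is a non-blank character in column 17 of an ATOM/HETATM record, and each model in an ensemble is introduced by a MODEL record. The sketch below applies those rules to raw PDB text in pure Python. It is an illustration under those assumptions, not the criteria PDBRust necessarily applies, and the `_text` function names are hypothetical.

```python
def has_altlocs_text(pdb_text: str) -> bool:
    """True if any ATOM/HETATM record has a non-blank column 17."""
    for line in pdb_text.splitlines():
        if (
            line.startswith(("ATOM", "HETATM"))
            and len(line) > 16
            and line[16] != " "
        ):
            return True
    return False


def count_models_text(pdb_text: str) -> int:
    """Number of MODEL records (record name occupies columns 1-6)."""
    return sum(
        1 for line in pdb_text.splitlines() if line[:6].strip() == "MODEL"
    )


def has_multiple_models_text(pdb_text: str) -> bool:
    """True if the text contains more than one MODEL record."""
    return count_models_text(pdb_text) > 1
```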

RCSB PDB Integration

from pdbrust import SearchQuery, rcsb_search, download_structure, FileFormat

# Download a structure
structure = download_structure("1UBQ", FileFormat.pdb())

# Search RCSB
query = (
    SearchQuery()
    .with_text("kinase")
    .with_organism("Homo sapiens")
    .with_resolution_max(2.0)
)
results = rcsb_search(query, 10)
print(f"Found {results.total_count} structures")
for pdb_id in results.pdb_ids:
    print(f"  {pdb_id}")
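
For context on where such files come from, RCSB serves entries from its public file server at files.rcsb.org. The sketch below only builds the download URL for a classic four-character PDB ID and performs no network access. How download_structure fetches files internally is not described here, so this is an assumption-laden illustration; `rcsb_file_url` is a hypothetical helper.

```python
import re

# Classic PDB IDs: one digit followed by three alphanumeric characters.
_PDB_ID = re.compile(r"^[0-9][A-Za-z0-9]{3}$")


def rcsb_file_url(pdb_id: str, extension: str = "pdb") -> str:
    """Build the RCSB file-server URL for a PDB ("pdb") or mmCIF ("cif") file."""
    if not _PDB_ID.match(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    if extension not in ("pdb", "cif"):
        raise ValueError("extension must be 'pdb' or 'cif'")
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.{extension}"
```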

Performance

PDBRust reports speedups of up to 260x over pure Python implementations, with smaller gains (2-3x) for parsing:

Operation            Speedup vs Python
Parsing              2-3x
get_ca_coords        240x
max_ca_distance      260x
radius_of_gyration   100x
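
Speedup figures like these depend on hardware and input size, so it can be worth measuring them on your own data. The sketch below is one way to compare two callables with the standard-library timeit module; it is not the benchmark used to produce the table, and `best_time` and `speedup` are hypothetical helpers.

```python
import timeit
from typing import Any, Callable


def best_time(func: Callable[..., Any], *args: Any,
              repeat: int = 5, number: int = 10) -> float:
    """Best per-call wall time in seconds over several timing rounds."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number


def speedup(baseline: Callable[..., Any], candidate: Callable[..., Any],
            *args: Any, repeat: int = 5, number: int = 10) -> float:
    """How many times faster candidate is than baseline on the same input."""
    t_base = best_time(baseline, *args, repeat=repeat, number=number)
    t_cand = best_time(candidate, *args, repeat=repeat, number=number)
    # Guard against a zero reading from a very coarse clock.
    return t_base / max(t_cand, 1e-12)
```

For example, passing a pure-Python function as baseline and a function that calls the corresponding PDBRust method as candidate gives a ratio comparable to the table's right-hand column.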

Requirements

  • Python >= 3.9
  • No runtime dependencies (Rust code is compiled into the package)

License

MIT


Built Distribution


pdbrust-0.3.0-cp39-cp39-manylinux_2_34_x86_64.whl (4.5 MB)

Tags: CPython 3.9, manylinux: glibc 2.34+, x86-64


Hashes for pdbrust-0.3.0-cp39-cp39-manylinux_2_34_x86_64.whl
Algorithm     Hash digest
SHA256        bcf7e49263d26b9b814c3119d2a278d2e29122ede940b91617683bee339a2630
MD5           b04f587c5a6517943d6d51d818f98658
BLAKE2b-256   b4e125f66c7d7df1956f8bc2c32390fb9eaeae360d3db84ffd114ad8b07736be
