sdfrust - Python Bindings

Fast Rust-based SDF, MOL2, and XYZ molecular structure file parser with Python bindings, including transparent gzip decompression.

Installation

From source (requires a Rust toolchain and an active Python virtual environment)

cd sdfrust-python
pip install maturin
maturin develop --features numpy

Build a wheel

maturin build --release --features numpy
pip install target/wheels/sdfrust-*.whl

Quick Start

import sdfrust

# Parse a single SDF file
mol = sdfrust.parse_sdf_file("molecule.sdf")
print(f"Name: {mol.name}")
print(f"Atoms: {mol.num_atoms}")
print(f"Formula: {mol.formula()}")
print(f"MW: {mol.molecular_weight():.2f}")

# Parse multiple molecules
mols = sdfrust.parse_sdf_file_multi("database.sdf")
for mol in mols:
    print(f"{mol.name}: {mol.num_atoms} atoms")

# Memory-efficient iteration over large files
for mol in sdfrust.iter_sdf_file("large_database.sdf"):
    print(f"{mol.name}: MW={mol.molecular_weight():.2f}")

Supported Formats

  • SDF V2000: Full support for reading and writing (up to 999 atoms/bonds)
  • SDF V3000: Full support for reading and writing (unlimited atoms/bonds)
  • MOL2 TRIPOS: Full support for reading and writing
  • XYZ: Read support for XYZ coordinate files (single and multi-molecule)
  • Gzip: Transparent decompression of .gz files for all formats
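The README does not say how sdfrust recognizes compressed input. One common way to make decompression transparent is to sniff the two gzip magic bytes (0x1f 0x8b) instead of trusting the .gz extension. The stdlib-only sketch below shows that idea; the helper name open_maybe_gzip is hypothetical and is not part of the sdfrust API.

```python
import gzip
from typing import IO

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes (RFC 1952)


def open_maybe_gzip(path: str) -> IO[str]:
    """Open `path` as UTF-8 text, decompressing on the fly if it is gzip data.

    Hypothetical helper: it checks the magic bytes, so a compressed file
    without a .gz suffix (or a plain file with one) is still handled.
    """
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```

Callers then read both kinds of file the same way, for example `with open_maybe_gzip("molecules.sdf.gz") as fh: text = fh.read()`.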

API Reference

Parsing Functions

SDF Files

# Single molecule
mol = sdfrust.parse_sdf_file("file.sdf")      # V2000
mol = sdfrust.parse_sdf_auto_file("file.sdf") # Auto-detect V2000/V3000
mol = sdfrust.parse_sdf_v3000_file("file.sdf") # V3000 only

# Multiple molecules
mols = sdfrust.parse_sdf_file_multi("file.sdf")
mols = sdfrust.parse_sdf_auto_file_multi("file.sdf")

# From string
mol = sdfrust.parse_sdf_string(content)
mols = sdfrust.parse_sdf_string_multi(content)
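Auto-detection between V2000 and V3000 can key off the molfile counts line (line 4), whose version stamp ends in V2000 or V3000. The sketch below illustrates this and also shows why V2000 stops at 999 atoms/bonds: the counts live in two fixed 3-character fields. It is a plain-Python illustration under the assumption that a missing stamp means V2000; read_counts_line is a hypothetical name, not sdfrust's detector.

```python
from typing import Optional, Tuple


def read_counts_line(molblock: str) -> Tuple[str, Optional[int], Optional[int]]:
    """Return (version, atom_count, bond_count) from a molfile's counts line.

    V2000 stores the atom and bond counts in columns 1-3 and 4-6, so each
    count fits at most 999. V3000 keeps its counts in a separate
    'M  V30 COUNTS' line, so None is returned for them here.
    """
    lines = molblock.splitlines()
    if len(lines) < 4:
        raise ValueError("a molfile needs three header lines and a counts line")
    counts = lines[3]
    if counts.rstrip().endswith("V3000"):
        return "V3000", None, None
    # Assumption: anything without a V3000 stamp is treated as V2000.
    return "V2000", int(counts[0:3]), int(counts[3:6])
```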

MOL2 Files

mol = sdfrust.parse_mol2_file("file.mol2")
mols = sdfrust.parse_mol2_file_multi("file.mol2")
mol = sdfrust.parse_mol2_string(content)
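MOL2 (Tripos) files are organized into sections introduced by @<TRIPOS> tags; atom coordinates and SYBYL atom types sit in the @<TRIPOS>ATOM section. The sketch below reads only that section, to show the record layout a MOL2 parser has to handle. It is not sdfrust's parser, and parse_mol2_atoms is a hypothetical name.

```python
from typing import List, Tuple

Mol2Atom = Tuple[str, str, float, float, float]  # (atom_name, atom_type, x, y, z)


def parse_mol2_atoms(text: str) -> List[Mol2Atom]:
    """Collect atoms from every @<TRIPOS>ATOM section in `text`.

    For a multi-molecule file, atoms of all molecules end up in one list.
    """
    atoms: List[Mol2Atom] = []
    in_atoms = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            in_atoms = stripped == "@<TRIPOS>ATOM"  # entering or leaving a section
            continue
        if in_atoms and stripped:
            # atom_id atom_name x y z atom_type [subst_id subst_name charge ...]
            fields = stripped.split()
            atoms.append((fields[1], fields[5],
                          float(fields[2]), float(fields[3]), float(fields[4])))
    return atoms
```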

Iterators (Memory-Efficient)

for mol in sdfrust.iter_sdf_file("large.sdf"):
    process(mol)

for mol in sdfrust.iter_mol2_file("large.mol2"):
    process(mol)
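The memory saving of the iterators comes from handling one record at a time. In SDF, records are terminated by a line containing only $$$$, so a streaming reader only needs to buffer the current record. The generator below is a pure-Python sketch of that idea; iter_sdf_records is a hypothetical name, and sdfrust's Rust iterator may be structured differently.

```python
from typing import Iterable, Iterator, List


def iter_sdf_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield the raw text of each SDF record, without its '$$$$' terminator.

    Only the record currently being read is held in memory.
    """
    buf: List[str] = []
    for line in lines:
        if line.rstrip() == "$$$$":
            yield "".join(buf)
            buf = []
        else:
            buf.append(line)
    tail = "".join(buf)
    if tail.strip():  # tolerate a final record that lacks '$$$$'
        yield tail
```

Because a file object iterates line by line, `for record in iter_sdf_records(open("large.sdf")): ...` never reads the whole file at once.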

Writing Functions

# Single molecule
sdfrust.write_sdf_file(mol, "output.sdf")
sdfrust.write_sdf_auto_file(mol, "output.sdf")  # Auto V2000/V3000
sdf_string = sdfrust.write_sdf_string(mol)

# Multiple molecules
sdfrust.write_sdf_file_multi(mols, "output.sdf")
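The README says write_sdf_auto_file chooses between V2000 and V3000 but does not state the rule. Given the 999 atom/bond limit of V2000 listed under Supported Formats, a plausible count-based rule is sketched below. This is an assumption: the real writer may also switch to V3000 for other V3000-only features. choose_sdf_version is a hypothetical name.

```python
V2000_MAX_COUNT = 999  # largest value a 3-character V2000 count field can hold


def choose_sdf_version(num_atoms: int, num_bonds: int) -> str:
    """Hypothetical rule: prefer V2000, fall back to V3000 when counts overflow."""
    if num_atoms > V2000_MAX_COUNT or num_bonds > V2000_MAX_COUNT:
        return "V3000"
    return "V2000"
```

Preferring V2000 keeps small molecules readable by older tools, while large structures such as proteins still round-trip.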

Molecule Properties

mol = sdfrust.parse_sdf_file("aspirin.sdf")

# Basic info
print(mol.name)           # Molecule name
print(mol.num_atoms)      # Number of atoms
print(mol.num_bonds)      # Number of bonds
print(mol.formula())      # Molecular formula

# Descriptors
print(mol.molecular_weight())    # Molecular weight
print(mol.exact_mass())          # Monoisotopic mass
print(mol.heavy_atom_count())    # Non-hydrogen atoms
print(mol.ring_count())          # Number of rings
print(mol.rotatable_bond_count()) # Rotatable bonds
print(mol.total_charge())        # Sum of formal charges

# Geometry
centroid = mol.centroid()        # (x, y, z) center
mol.translate(1.0, 0.0, 0.0)     # Move molecule
mol.center()                     # Center at origin

# Properties (from SDF data block)
cid = mol.get_property("PUBCHEM_CID")
mol.set_property("SOURCE", "generated")
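The geometry helpers compose naturally: centering a molecule is a translation by the negated centroid. The sketch below assumes the centroid is the unweighted mean of atom positions (not a mass-weighted center of mass); the README does not say which one mol.centroid() computes, and the function names here are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def centroid(coords: List[Point]) -> Point:
    """Unweighted mean position of the atoms (assumption, see above)."""
    n = len(coords)
    if n == 0:
        raise ValueError("the centroid of an empty molecule is undefined")
    return (sum(c[0] for c in coords) / n,
            sum(c[1] for c in coords) / n,
            sum(c[2] for c in coords) / n)


def translate(coords: List[Point], dx: float, dy: float, dz: float) -> List[Point]:
    """Shift every atom by the same vector."""
    return [(x + dx, y + dy, z + dz) for x, y, z in coords]


def center(coords: List[Point]) -> List[Point]:
    """Move the molecule so that its centroid sits at the origin."""
    cx, cy, cz = centroid(coords)
    return translate(coords, -cx, -cy, -cz)
```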

Atom Access

# Iterate over atoms
for atom in mol.atoms:
    print(f"{atom.element} at ({atom.x}, {atom.y}, {atom.z})")

# Get specific atom
atom = mol.get_atom(0)
print(atom.element)
print(atom.formal_charge)
print(atom.coords())  # (x, y, z) tuple

# Filter atoms
carbons = mol.atoms_by_element("C")
neighbors = mol.neighbors(0)  # Atom indices bonded to atom 0
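A neighbor query like mol.neighbors(i) boils down to an adjacency list built from the bond table, recording each undirected bond in both directions. The sketch below shows that construction for zero-based atom indices; build_neighbors is a hypothetical helper, and whether sdfrust caches such a structure or scans its bonds on each call is not documented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_neighbors(bonds: Iterable[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Map each zero-based atom index to the indices of atoms bonded to it."""
    neighbors: Dict[int, List[int]] = defaultdict(list)
    for a, b in bonds:
        neighbors[a].append(b)  # bonds are undirected,
        neighbors[b].append(a)  # so record both directions
    return dict(neighbors)
```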

Bond Access

# Iterate over bonds
for bond in mol.bonds:
    print(f"{bond.atom1}-{bond.atom2}: {bond.order}")

# Filter bonds
double_bonds = mol.bonds_by_order(sdfrust.BondOrder.double())
aromatic = mol.has_aromatic_bonds()

# Bond properties
bond = mol.bonds[0]
print(bond.is_aromatic())
print(bond.contains_atom(0))
print(bond.other_atom(0))  # Other atom in bond
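In the V2000 bond block, the bond type is a small integer code (1 single, 2 double, 3 triple, 4 aromatic), and each bond joins exactly two atoms. The dataclass below is a hypothetical stand-in showing the semantics of is_aromatic, contains_atom, and other_atom on such a record. sdfrust's own BondOrder type may be represented differently, and the README does not say what other_atom returns for an atom outside the bond; this sketch raises ValueError.

```python
from dataclasses import dataclass

# CTfile V2000 bond type codes (subset).
BOND_TYPE_NAMES = {1: "single", 2: "double", 3: "triple", 4: "aromatic"}


@dataclass(frozen=True)
class SketchBond:
    """Hypothetical bond record with zero-based atom indices."""
    atom1: int
    atom2: int
    order: int  # V2000 bond type code

    def is_aromatic(self) -> bool:
        return self.order == 4

    def contains_atom(self, idx: int) -> bool:
        return idx in (self.atom1, self.atom2)

    def other_atom(self, idx: int) -> int:
        if idx == self.atom1:
            return self.atom2
        if idx == self.atom2:
            return self.atom1
        raise ValueError(f"atom {idx} is not part of this bond")
```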

NumPy Integration

import numpy as np
import sdfrust

mol = sdfrust.parse_sdf_file("molecule.sdf")

# Get coordinates as NumPy array
coords = mol.get_coords_array()  # Shape: (N, 3)
print(coords.shape)

# Modify and set back
coords[:, 0] += 10.0  # Translate in x
mol.set_coords_array(coords)

# Get atomic numbers
atomic_nums = mol.get_atomic_numbers()  # Shape: (N,)
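get_atomic_numbers() amounts to mapping each element symbol to its atomic number, one entry per atom. The pure-Python sketch below shows that mapping with a deliberately partial table of common organic elements (an assumption; extend it for other elements). Passing the result to `np.asarray` gives a 1-D array of shape (N,). The table and function name are hypothetical, not sdfrust internals.

```python
from typing import Iterable, List

# Partial table covering common organic elements only.
ATOMIC_NUMBERS = {
    "H": 1, "B": 5, "C": 6, "N": 7, "O": 8, "F": 9,
    "P": 15, "S": 16, "Cl": 17, "Br": 35, "I": 53,
}


def atomic_numbers(elements: Iterable[str]) -> List[int]:
    """Return the atomic number of each element symbol, in atom order."""
    try:
        return [ATOMIC_NUMBERS[e] for e in elements]
    except KeyError as exc:
        raise ValueError(f"unknown element symbol: {exc.args[0]}") from None
```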

Creating Molecules

import sdfrust

# Create empty molecule
mol = sdfrust.Molecule("water")

# Add atoms
mol.add_atom(sdfrust.Atom(0, "O", 0.0, 0.0, 0.0))
mol.add_atom(sdfrust.Atom(1, "H", 0.96, 0.0, 0.0))
mol.add_atom(sdfrust.Atom(2, "H", -0.24, 0.93, 0.0))

# Add bonds
mol.add_bond(sdfrust.Bond(0, 1, sdfrust.BondOrder.single()))
mol.add_bond(sdfrust.Bond(0, 2, sdfrust.BondOrder.single()))

# Write to file
sdfrust.write_sdf_file(mol, "water.sdf")
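To make the fixed-column V2000 layout concrete, the stdlib-only sketch below emits an SDF record for the same water molecule: three header lines, the counts line, one 10.4f-formatted line per atom, 1-based bond lines, M  END, and the $$$$ terminator. The helper to_v2000_record is hypothetical and handles neutral atoms only (no charges or data fields). sdfrust's writer may fill the header and optional columns differently.

```python
from typing import List, Tuple

AtomRow = Tuple[str, float, float, float]  # (element, x, y, z)
BondRow = Tuple[int, int, int]             # (atom1, atom2, order), zero-based


def to_v2000_record(name: str, atoms: List[AtomRow], bonds: List[BondRow]) -> str:
    """Return one V2000 SDF record. Hypothetical sketch: neutral atoms only."""
    if len(atoms) > 999 or len(bonds) > 999:
        raise ValueError("V2000 counts are limited to 999; use V3000 instead")
    lines = [name, "", ""]  # name, program/timestamp line, comment line
    # Counts line: atoms, bonds, eight zero fields, '999', version stamp.
    lines.append(f"{len(atoms):3d}{len(bonds):3d}" + "  0" * 8 + "999 V2000")
    for element, x, y, z in atoms:
        lines.append(f"{x:10.4f}{y:10.4f}{z:10.4f} {element:<3} 0" + "  0" * 11)
    for a1, a2, order in bonds:
        lines.append(f"{a1 + 1:3d}{a2 + 1:3d}{order:3d}  0")  # 1-based indices
    lines += ["M  END", "$$$$"]
    return "\n".join(lines) + "\n"
```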

Examples

The examples/ directory contains runnable scripts demonstrating real-world usage:

  • basic_usage.py: core API (parsing, writing, atoms, bonds, descriptors, NumPy)
  • format_conversion.py: multi-format detection, XYZ parsing, SDF/MOL2 conversion, round-trips
  • batch_analysis.py: drug library processing (filtering, sorting, Lipinski analysis)
  • geometry_analysis.py: 3D geometry (distance matrices, RMSD, rotation, transforms)

To run them, build with the numpy and geometry features first:

cd sdfrust-python
maturin develop --features numpy,geometry
python examples/basic_usage.py
python examples/format_conversion.py
python examples/batch_analysis.py
python examples/geometry_analysis.py
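geometry_analysis.py covers RMSD; the README does not say whether it superimposes the structures first. For reference, the plain RMSD over matched atom pairs, with no alignment and assuming both coordinate sets list the same atoms in the same order, is sketched below (rmsd is a hypothetical name).

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Root-mean-square deviation between paired atoms, without superposition."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate sets of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(a, b))
    return math.sqrt(squared / len(a))
```

Because no alignment is done, a rigid shift of the whole molecule shows up as a nonzero RMSD. Alignment-based RMSD (for example via the Kabsch algorithm) removes that effect.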

Performance

sdfrust is implemented in Rust for speed. The project reports that it is significantly faster than pure-Python parsers and comparable to C++ implementations.

For large files, use the iterator API (iter_sdf_file, iter_mol2_file) to process molecules one at a time instead of loading the entire file into memory.

License

MIT License

Download files

Source Distribution

  • sdfrust-0.6.0.tar.gz (158.7 kB)

Built Distributions

  • sdfrust-0.6.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (735.3 kB): CPython 3.13, manylinux glibc 2.17+, x86-64
  • sdfrust-0.6.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (730.2 kB): CPython 3.13, manylinux glibc 2.17+, ARM64
  • sdfrust-0.6.0-cp312-cp312-win_amd64.whl (555.2 kB): CPython 3.12, Windows x86-64
  • sdfrust-0.6.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (735.4 kB): CPython 3.12, manylinux glibc 2.17+, x86-64
  • sdfrust-0.6.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (730.7 kB): CPython 3.12, manylinux glibc 2.17+, ARM64
  • sdfrust-0.6.0-cp312-cp312-macosx_11_0_arm64.whl (670.4 kB): CPython 3.12, macOS 11.0+, ARM64
  • sdfrust-0.6.0-cp312-cp312-macosx_10_12_x86_64.whl (688.9 kB): CPython 3.12, macOS 10.12+, x86-64
  • sdfrust-0.6.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (735.9 kB): CPython 3.11, manylinux glibc 2.17+, x86-64
  • sdfrust-0.6.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (730.2 kB): CPython 3.11, manylinux glibc 2.17+, ARM64
  • sdfrust-0.6.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (736.1 kB): CPython 3.10, manylinux glibc 2.17+, x86-64
  • sdfrust-0.6.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (729.5 kB): CPython 3.10, manylinux glibc 2.17+, ARM64
  • sdfrust-0.6.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (736.8 kB): CPython 3.9, manylinux glibc 2.17+, x86-64
  • sdfrust-0.6.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (729.9 kB): CPython 3.9, manylinux glibc 2.17+, ARM64
