sdfrust - Python Bindings

Fast Rust-based SDF, MOL2, and XYZ molecular structure file parser with Python bindings, including transparent gzip decompression.

Installation

From source (requires Rust toolchain)

cd sdfrust-python
pip install maturin
maturin develop --features numpy

Build wheel

maturin build --release --features numpy
pip install target/wheels/sdfrust-*.whl
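
After either install path, a quick check from Python confirms that the package is visible to the interpreter you intend to use. The sketch below relies only on the standard library. `installed_version` is a hypothetical helper name, and it assumes the distribution is registered as `sdfrust`, matching the wheel file name above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "sdfrust") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version:
        print(f"sdfrust {version} is installed")
    else:
        print("sdfrust is not installed in this environment")
```

If this reports that the package is missing right after `maturin develop`, the build most likely went into a different virtual environment than the one running the check.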

Quick Start

import sdfrust

# Parse a single SDF file
mol = sdfrust.parse_sdf_file("molecule.sdf")
print(f"Name: {mol.name}")
print(f"Atoms: {mol.num_atoms}")
print(f"Formula: {mol.formula()}")
print(f"MW: {mol.molecular_weight():.2f}")

# Parse multiple molecules
mols = sdfrust.parse_sdf_file_multi("database.sdf")
for mol in mols:
    print(f"{mol.name}: {mol.num_atoms} atoms")

# Memory-efficient iteration over large files
for mol in sdfrust.iter_sdf_file("large_database.sdf"):
    print(f"{mol.name}: MW={mol.molecular_weight():.2f}")
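
`formula()` and `molecular_weight()` above are computed by the library. To sanity-check the numbers by hand, the sketch below reproduces the arithmetic in plain Python: it counts elements, writes the formula in Hill order (an assumption; the ordering sdfrust uses is not stated here), and sums standard atomic weights from a deliberately small table. `hill_formula`, `molecular_weight`, and `ATOMIC_WEIGHTS` are illustrative names, not part of the sdfrust API.

```python
from collections import Counter

# Standard atomic weights in g/mol, limited to a few common elements.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def hill_formula(elements):
    """Build a molecular formula string from a list of element symbols."""
    counts = Counter(elements)
    ordered = []
    if "C" in counts:
        # Hill order: carbon first, hydrogen second, the rest alphabetically.
        ordered = [e for e in ("C", "H") if e in counts]
    # Without carbon, every element (including H) is sorted alphabetically.
    ordered += sorted(e for e in counts if e not in ordered)
    return "".join(f"{e}{counts[e] if counts[e] > 1 else ''}" for e in ordered)


def molecular_weight(elements):
    """Sum the atomic weights of all atoms (average molecular weight)."""
    return sum(ATOMIC_WEIGHTS[e] for e in elements)


aspirin = ["C"] * 9 + ["H"] * 8 + ["O"] * 4
print(hill_formula(aspirin))               # C9H8O4
print(f"{molecular_weight(aspirin):.2f}")  # 180.16
```

With a parsed molecule, the element list could come from `[atom.element for atom in mol.atoms]`, provided the file lists hydrogens explicitly.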

Supported Formats

  • SDF V2000: Full support for reading and writing (up to 999 atoms/bonds)
  • SDF V3000: Full support for reading and writing (unlimited atoms/bonds)
  • MOL2 TRIPOS: Full support for reading and writing
  • XYZ: Read support (single- and multi-molecule files)
  • Gzip: Transparent decompression of .gz files for all formats
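
To make "transparent decompression" concrete, here is a minimal pure-Python sketch of the idea: look at the first two bytes of the file and choose a plain or gzip reader accordingly. This is not sdfrust's implementation, and whether the library decides by `.gz` extension or by content is not stated here. `open_text_maybe_gzip` is a hypothetical helper.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def open_text_maybe_gzip(path, encoding="utf-8"):
    """Open a file as text, decompressing it if it starts with the gzip magic."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == GZIP_MAGIC
    if is_gzip:
        return gzip.open(path, "rt", encoding=encoding)
    return open(path, "r", encoding=encoding)
```

The caller reads from the returned handle the same way in both cases, which is what makes the decompression transparent.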

API Reference

Parsing Functions

SDF Files

# Single molecule
mol = sdfrust.parse_sdf_file("file.sdf")      # V2000
mol = sdfrust.parse_sdf_auto_file("file.sdf") # Auto-detect V2000/V3000
mol = sdfrust.parse_sdf_v3000_file("file.sdf") # V3000 only

# Multiple molecules
mols = sdfrust.parse_sdf_file_multi("file.sdf")
mols = sdfrust.parse_sdf_auto_file_multi("file.sdf")

# From string
mol = sdfrust.parse_sdf_string(content)
mols = sdfrust.parse_sdf_string_multi(content)
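
The auto-detecting variants have to decide between V2000 and V3000 before parsing. In the molfile layout the fourth line (the counts line) carries a version tag, so a check can be sketched as below. This shows the file-format convention, not sdfrust's internal logic; `detect_sdf_version` is a hypothetical helper.

```python
def detect_sdf_version(record: str) -> str:
    """Return "V2000" or "V3000" based on the counts line of a molfile record."""
    lines = record.splitlines()
    if len(lines) < 4:
        raise ValueError("a molfile needs a 3-line header followed by a counts line")
    counts_line = lines[3].upper()
    if "V3000" in counts_line:
        return "V3000"
    # Counts lines without an explicit tag are conventionally treated as V2000.
    return "V2000"


# Header: name line, program line, comment line, then the counts line.
v2000_record = "water\n  example\n\n  3  2  0  0  0  0  0  0  0  0999 V2000\n"
v3000_record = "water\n  example\n\n  0  0  0     0  0            999 V3000\n"
```

A string that has already been read into memory could be inspected this way before choosing which parse function to call.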

MOL2 Files

mol = sdfrust.parse_mol2_file("file.mol2")
mols = sdfrust.parse_mol2_file_multi("file.mol2")
mol = sdfrust.parse_mol2_string(content)

Iterators (Memory-Efficient)

for mol in sdfrust.iter_sdf_file("large.sdf"):
    process(mol)

for mol in sdfrust.iter_mol2_file("large.mol2"):
    process(mol)
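
When each molecule feeds a step that works better on groups (bulk database inserts, vectorised scoring), the iterators can be consumed in fixed-size batches without materialising the whole file. The helper below is a generic sketch that works on any iterable. `batched` is defined here rather than imported because `itertools.batched` only exists from Python 3.12.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items, consuming the iterable lazily."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Intended use with the iterator API shown above (not executed here):
#     for batch in batched(sdfrust.iter_sdf_file("large.sdf"), 1000):
#         process_many(batch)   # process_many is a placeholder
```

Only one batch is held in memory at a time, so peak memory scales with the batch size rather than the file size.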

Writing Functions

# Single molecule
sdfrust.write_sdf_file(mol, "output.sdf")
sdfrust.write_sdf_auto_file(mol, "output.sdf")  # Auto V2000/V3000
sdf_string = sdfrust.write_sdf_string(mol)

# Multiple molecules
sdfrust.write_sdf_file_multi(mols, "output.sdf")

Molecule Properties

mol = sdfrust.parse_sdf_file("aspirin.sdf")

# Basic info
print(mol.name)           # Molecule name
print(mol.num_atoms)      # Number of atoms
print(mol.num_bonds)      # Number of bonds
print(mol.formula())      # Molecular formula

# Descriptors
print(mol.molecular_weight())    # Molecular weight
print(mol.exact_mass())          # Monoisotopic mass
print(mol.heavy_atom_count())    # Non-hydrogen atoms
print(mol.ring_count())          # Number of rings
print(mol.rotatable_bond_count()) # Rotatable bonds
print(mol.total_charge())        # Sum of formal charges

# Geometry
centroid = mol.centroid()        # (x, y, z) center
mol.translate(1.0, 0.0, 0.0)     # Move molecule
mol.center()                     # Center at origin

# Properties (from SDF data block)
cid = mol.get_property("PUBCHEM_CID")
mol.set_property("SOURCE", "generated")
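
The properties read by `get_property` live in the SDF data block: each item is a header line of the form `> <NAME>`, one or more value lines, and a terminating blank line. The sketch below parses that layout in plain Python to show where the values come from. It is not the library's parser, and `parse_data_block` is a hypothetical name.

```python
import re

# A data header starts with ">" and carries the field name in angle brackets.
_HEADER = re.compile(r"^>.*<([^>]+)>")


def parse_data_block(text):
    """Collect `> <NAME>` data items from one SDF record into a dict."""
    props = {}
    name = None
    value_lines = []
    for line in text.splitlines():
        if line.startswith("$$$$"):
            break  # end of record
        if name is None:
            match = _HEADER.match(line)
            if match:
                name, value_lines = match.group(1), []
        elif line.strip():
            value_lines.append(line)
        else:
            # A blank line terminates the current data item.
            props[name] = "\n".join(value_lines)
            name = None
    if name is not None:
        # Item not followed by a blank line before the record ended.
        props[name] = "\n".join(value_lines)
    return props


record_tail = "M  END\n> <PUBCHEM_CID>\n2244\n\n> <SOURCE>\ngenerated\n\n$$$$\n"
print(parse_data_block(record_tail))
```

Values come back as strings in this sketch, so numeric fields need an explicit conversion such as `int(props["PUBCHEM_CID"])`.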

Atom Access

# Iterate over atoms
for atom in mol.atoms:
    print(f"{atom.element} at ({atom.x}, {atom.y}, {atom.z})")

# Get specific atom
atom = mol.get_atom(0)
print(atom.element)
print(atom.formal_charge)
print(atom.coords())  # (x, y, z) tuple

# Filter atoms
carbons = mol.atoms_by_element("C")
neighbors = mol.neighbors(0)  # Atom indices bonded to atom 0
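
Because `coords()` returns a plain `(x, y, z)` tuple, simple geometric queries need nothing beyond the standard library. As an illustration, the sketch below lists every atom pair closer than a cutoff. `close_pairs` is a hypothetical helper, and the coordinates are stand-ins for `[atom.coords() for atom in mol.atoms]`.

```python
import math
from itertools import combinations


def close_pairs(coords, cutoff):
    """Return (i, j, distance) for every pair of points closer than `cutoff`."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(coords), 2):
        d = math.dist(a, b)  # Euclidean distance between two 3D points
        if d < cutoff:
            pairs.append((i, j, d))
    return pairs


# Stand-in coordinates for a water molecule: O, H, H.
water_coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
for i, j, d in close_pairs(water_coords, cutoff=1.2):
    print(f"atoms {i} and {j}: {d:.3f}")
```

This pairwise loop is quadratic in the number of atoms, which is fine for small molecules but not for very large structures.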

Bond Access

# Iterate over bonds
for bond in mol.bonds:
    print(f"{bond.atom1}-{bond.atom2}: {bond.order}")

# Filter bonds
double_bonds = mol.bonds_by_order(sdfrust.BondOrder.double())
aromatic = mol.has_aromatic_bonds()

# Bond properties
bond = mol.bonds[0]
print(bond.is_aromatic())
print(bond.contains_atom(0))
print(bond.other_atom(0))  # Other atom in bond
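
Bonds expose their endpoints as `atom1` and `atom2`, so a full adjacency map can be built in one pass when `neighbors()` would otherwise be called for every atom. The sketch below assumes those two attributes are integer atom indices, as the examples above suggest. `build_adjacency` is a hypothetical helper.

```python
from collections import defaultdict


def build_adjacency(bond_pairs):
    """Map each atom index to the sorted list of atom indices it is bonded to."""
    adjacency = defaultdict(set)
    for a, b in bond_pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return {atom: sorted(nbrs) for atom, nbrs in adjacency.items()}


# Stand-in for ((bond.atom1, bond.atom2) for bond in mol.bonds) on water.
water_bonds = [(0, 1), (0, 2)]
print(build_adjacency(water_bonds))  # {0: [1, 2], 1: [0], 2: [0]}
```

Atoms without any bond do not appear in the result, so look them up with `.get(index, [])`.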

NumPy Integration

import numpy as np
import sdfrust

mol = sdfrust.parse_sdf_file("molecule.sdf")

# Get coordinates as NumPy array
coords = mol.get_coords_array()  # Shape: (N, 3)
print(coords.shape)

# Modify and set back
coords[:, 0] += 10.0  # Translate in x
mol.set_coords_array(coords)

# Get atomic numbers
atomic_nums = mol.get_atomic_numbers()  # Shape: (N,)

Creating Molecules

import sdfrust

# Create empty molecule
mol = sdfrust.Molecule("water")

# Add atoms
mol.add_atom(sdfrust.Atom(0, "O", 0.0, 0.0, 0.0))
mol.add_atom(sdfrust.Atom(1, "H", 0.96, 0.0, 0.0))
mol.add_atom(sdfrust.Atom(2, "H", -0.24, 0.93, 0.0))

# Add bonds
mol.add_bond(sdfrust.Bond(0, 1, sdfrust.BondOrder.single()))
mol.add_bond(sdfrust.Bond(0, 2, sdfrust.BondOrder.single()))

# Write to file
sdfrust.write_sdf_file(mol, "water.sdf")
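
A quick geometry check shows that the coordinates used for the water example are physically sensible. The sketch below computes the H-O-H angle from the three positions in plain Python. `bond_angle` is a hypothetical helper, not an sdfrust function.

```python
import math


def bond_angle(center, a, b):
    """Angle a-center-b in degrees, from three (x, y, z) points."""
    v1 = [p - c for p, c in zip(a, center)]
    v2 = [p - c for p, c in zip(b, center)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    return math.degrees(math.acos(dot / norm))


# Same coordinates as the atoms added above.
oxygen = (0.0, 0.0, 0.0)
hydrogen_1 = (0.96, 0.0, 0.0)
hydrogen_2 = (-0.24, 0.93, 0.0)

print(f"{bond_angle(oxygen, hydrogen_1, hydrogen_2):.1f}")  # 104.5
```

Both O-H distances come out at about 0.96 Å, and the angle matches the familiar value of roughly 104.5° for water.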

Performance

sdfrust is implemented in Rust for performance. Benchmarks show it is significantly faster than pure-Python parsers and comparable to C++ implementations.

For large files, use the iterator API (iter_sdf_file) to process molecules one at a time without loading the entire file into memory.
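
The reason iteration keeps memory flat is that an SDF file is a sequence of independent records, each terminated by a `$$$$` line, so a reader only ever needs to hold one record. The generator below sketches that idea in plain Python. It illustrates the streaming pattern and is not how `iter_sdf_file` is implemented. `iter_sdf_records` is a hypothetical name.

```python
import io


def iter_sdf_records(lines):
    """Yield one SDF record (as a string) at a time from an iterable of lines."""
    record = []
    for line in lines:
        record.append(line)
        if line.rstrip() == "$$$$":
            yield "".join(record)
            record = []
    # Emit a trailing record that lacks the final delimiter, if it has content.
    if any(part.strip() for part in record):
        yield "".join(record)


# A file object iterates line by line, so this works on open files too.
sample = io.StringIO("mol A\n$$$$\nmol B\n$$$$\n")
for number, text in enumerate(iter_sdf_records(sample), start=1):
    print(number, len(text))
```

Peak memory is bounded by the largest single record, whereas the `*_multi` functions return the complete list of molecules at once.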

License

MIT License

Download files

Source Distribution

  • sdfrust-0.5.0.tar.gz (93.6 kB): source

Built Distributions

  • sdfrust-0.5.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (656.9 kB): CPython 3.13, manylinux (glibc 2.17+), x86-64
  • sdfrust-0.5.0-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (651.0 kB): CPython 3.13, manylinux (glibc 2.17+), ARM64
  • sdfrust-0.5.0-cp312-cp312-win_amd64.whl (483.0 kB): CPython 3.12, Windows, x86-64
  • sdfrust-0.5.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (657.2 kB): CPython 3.12, manylinux (glibc 2.17+), x86-64
  • sdfrust-0.5.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (651.5 kB): CPython 3.12, manylinux (glibc 2.17+), ARM64
  • sdfrust-0.5.0-cp312-cp312-macosx_11_0_arm64.whl (595.6 kB): CPython 3.12, macOS 11.0+, ARM64
  • sdfrust-0.5.0-cp312-cp312-macosx_10_12_x86_64.whl (610.0 kB): CPython 3.12, macOS 10.12+, x86-64
  • sdfrust-0.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (657.5 kB): CPython 3.11, manylinux (glibc 2.17+), x86-64
  • sdfrust-0.5.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (652.0 kB): CPython 3.11, manylinux (glibc 2.17+), ARM64
  • sdfrust-0.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (657.8 kB): CPython 3.10, manylinux (glibc 2.17+), x86-64
  • sdfrust-0.5.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (650.8 kB): CPython 3.10, manylinux (glibc 2.17+), ARM64
  • sdfrust-0.5.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (658.4 kB): CPython 3.9, manylinux (glibc 2.17+), x86-64
  • sdfrust-0.5.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (651.4 kB): CPython 3.9, manylinux (glibc 2.17+), ARM64
