High-performance 3D molecular conformer generation — Python bindings

sci-form

High-performance 3D molecular conformer generation using ETKDG distance geometry, written in Rust.

Generates chemically valid 3D coordinates from SMILES strings, matching RDKit's ETKDGv2 quality while offering native bindings for Rust, Python, TypeScript/JavaScript (WASM), and a cross-platform CLI.

Features

  • ETKDG Distance Geometry: torsion preferences derived from the Cambridge Structural Database (837 SMARTS patterns)
  • High Accuracy: 0.00% of molecules exceed 0.5 Å heavy-atom RMSD vs RDKit on GDB-20 (2000 molecules, ensemble comparison)
  • Fast: 60+ molecules/second in Rust, with parallel batch processing via rayon
  • Multi-platform: Rust library, Python (PyO3), TypeScript/JS (WASM), CLI (Linux/macOS/Windows)
  • Zero runtime dependencies: pure Rust, no C++ toolchain needed
  • SMILES + SMARTS: full SMILES parser and SMARTS pattern matching engine

Quick Start

Rust

# Cargo.toml
[dependencies]
sci-form = "0.1"

// src/main.rs
fn main() {
    // Embed ethanol from its SMILES string with RNG seed 42
    let result = sci_form::embed("CCO", 42);
    println!("Atoms: {}, Coords: {:?}", result.num_atoms, result.coords);
}

Python

# Shell
pip install sci-form

# Python
import sci_form

result = sci_form.embed("CCO")
print(f"Atoms: {result.num_atoms}, Time: {result.time_ms:.1f}ms")
positions = result.get_positions()  # [(x, y, z), ...]
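
Once coordinates are available, interatomic distances follow directly. The sketch below assumes `get_positions()` returns a list of `(x, y, z)` tuples, as the comment above indicates, and that atom indices follow the order of the SMILES string (an assumption). The helper `distance` and the sample coordinates are illustrative and not part of sci-form.

```python
import math

def distance(positions, i, j):
    """Euclidean distance between atoms i and j in a list of (x, y, z) tuples."""
    return math.dist(positions[i], positions[j])

# Hypothetical coordinates standing in for result.get_positions()
positions = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.3, 0.0)]
print(f"d(0, 1) = {distance(positions, 0, 1):.3f}")
```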

TypeScript / JavaScript

# Shell
npm install sci-form

// JavaScript
import { embed } from 'sci-form';

// embed() returns a JSON string, so parse it before use
const result = JSON.parse(embed("CCO", 42));
console.log(`Atoms: ${result.num_atoms}`);

CLI

# Single molecule
sci-form embed "CCO" --format xyz

# Batch processing
sci-form batch -i molecules.smi -o output.sdf --format sdf --threads 8

# Parse only (no 3D)
sci-form parse "c1ccccc1"

Benchmark Results

Diverse Molecules (131 molecules, all chemical functional groups)

| Metric           | Value    |
|------------------|----------|
| Parse success    | 100%     |
| Embed success    | 97.7%    |
| Geometry quality | 97.7%    |
| Throughput       | 60 mol/s |

RDKit Comparison (heavy-atom pairwise-distance RMSD)

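| Metric                       | Value   |
|------------------------------|---------|
| Average RMSD                 | 0.064 Å |
| Median RMSD                  | 0.011 Å |
| Molecules with RMSD < 0.5 Å  | 98.4%   |
| Molecules with RMSD < 0.3 Å  | 94.4%   |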
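
A pairwise-distance RMSD compares two conformers through their interatomic distances rather than their coordinates, so no superposition step is needed. The sketch below shows one common definition, the root mean square of the per-pair distance differences. Whether the benchmark uses exactly this normalization is an assumption, and `pairwise_distance_rmsd` is a hypothetical helper, not a sci-form function.

```python
import math
from itertools import combinations

def pairwise_distance_rmsd(coords_a, coords_b):
    """RMSD between the interatomic distance sets of two conformers.

    Both inputs are lists of (x, y, z) tuples with identical atom ordering
    (for a heavy-atom comparison, pass heavy atoms only).
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("conformers must have the same number of atoms")
    pairs = list(combinations(range(len(coords_a)), 2))
    if not pairs:
        return 0.0
    sq_sum = 0.0
    for i, j in pairs:
        d_a = math.dist(coords_a[i], coords_a[j])
        d_b = math.dist(coords_b[i], coords_b[j])
        sq_sum += (d_a - d_b) ** 2
    return math.sqrt(sq_sum / len(pairs))

# Illustrative data: a conformer and a rigidly translated copy of it
conf_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.0, 1.3, 0.0)]
conf_b = [(x + 1.0, y - 2.0, z + 0.5) for x, y, z in conf_a]
print(pairwise_distance_rmsd(conf_a, conf_b))
```

Because only distances enter the metric, the translated copy scores essentially zero. The same property means mirror-image conformers are also indistinguishable under this measure.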

GDB-20 Ensemble (2000 molecules × 10 seeds vs 21 RDKit seeds)

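| Metric                       | All-atom | Heavy-atom |
|------------------------------|----------|------------|
| Average RMSD                 | 0.035 Å  | 0.018 Å    |
| Molecules with RMSD > 0.5 Å  | 0.95%    | 0.00%      |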

Algorithm

sci-form implements the ETKDGv2 (Experimental-Torsion Knowledge Distance Geometry, version 2) algorithm:

  1. SMILES Parsing: build a molecular graph with atoms, bonds, and hybridization
  2. Bounds Matrix: derive 1-2, 1-3, 1-4, and VdW distance bounds from topology
  3. Triangle Smoothing: enforce the triangle inequality with Floyd-Warshall
  4. Distance Picking: draw random distances from the smoothed bounds (MinstdRand)
  5. Metric Matrix Embedding: eigendecomposition yields 4D coordinates
  6. Bounds Force Field: BFGS minimization in 4D to satisfy the distance constraints
  7. Projection to 3D: drop the lowest-variance dimension
  8. ETKDG 3D Refinement: force field with CSD torsion preferences (837 patterns)
  9. Validation: check tetrahedral centers, planarity, and double-bond geometry
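
Steps 3 and 4 can be illustrated with a short Python sketch. It is not sci-form's implementation. It assumes symmetric lower/upper bound matrices stored as nested lists, uses the parameters of C++ `minstd_rand` (multiplier 48271, modulus 2^31 − 1) for the generator, and maps the state to a uniform number by simple division. Which MinstdRand variant sci-form uses, and how it scales the output, are assumptions here. All names (`triangle_smooth`, `MinStdRand`, `pick_distances`) are hypothetical.

```python
def triangle_smooth(lower, upper):
    """Tighten symmetric bounds in place, Floyd-Warshall style.

    Upper bounds obey u_ij <= u_ik + u_kj; lower bounds obey
    l_ij >= max(l_ik - u_kj, l_kj - u_ik). Returns False if any lower
    bound ends up above its upper bound (inconsistent input).
    """
    n = len(upper)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if i == j or i == k or j == k:
                    continue
                via_k = upper[i][k] + upper[k][j]
                if upper[i][j] > via_k:
                    upper[i][j] = via_k
                implied = max(lower[i][k] - upper[k][j],
                              lower[k][j] - upper[i][k])
                if lower[i][j] < implied:
                    lower[i][j] = implied
    return all(lower[i][j] <= upper[i][j]
               for i in range(n) for j in range(n))


class MinStdRand:
    """Lehmer generator: state <- 48271 * state mod (2**31 - 1)."""
    A = 48271
    M = 2**31 - 1

    def __init__(self, seed):
        # A zero state would stay zero forever, so fall back to 1
        self.state = seed % self.M or 1

    def next_uniform(self):
        self.state = (self.A * self.state) % self.M
        return self.state / self.M  # strictly between 0 and 1


def pick_distances(lower, upper, rng):
    """Draw one distance per atom pair uniformly between its bounds."""
    n = len(upper)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            u = rng.next_uniform()
            d = lower[i][j] + u * (upper[i][j] - lower[i][j])
            dist[i][j] = dist[j][i] = d
    return dist


# Illustrative 3-atom bounds: pairs 0-1 and 1-2 are tight, 0-2 starts loose
lower = [[0.0, 1.4, 0.0], [1.4, 0.0, 1.4], [0.0, 1.4, 0.0]]
upper = [[0.0, 1.6, 10.0], [1.6, 0.0, 1.6], [10.0, 1.6, 0.0]]
ok = triangle_smooth(lower, upper)
dist = pick_distances(lower, upper, MinStdRand(42))
print(ok, upper[0][2], dist[0][2])
```

A fixed seed makes the picked distances, and therefore the resulting conformer, reproducible. That is presumably why the Rust and JavaScript `embed` calls take a seed argument.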

See documentation for detailed algorithm descriptions with mathematical derivations.

Building from Source

# Library + CLI
cargo build --release

# Python bindings
cd crates/python && maturin develop --release

# WASM bindings
cd crates/wasm && wasm-pack build --target bundler --release

Testing

# Unit tests
cargo test --lib

# Diverse molecule benchmark
cargo test --release --test test_diverse_molecules -- --nocapture

# Geometry quality (requires GDB20.50000.smi)
cargo test --release --test test_geometry_quality -- --nocapture

# Gradient correctness
cargo test --release --test test_gradient_check -- --nocapture

License

MIT
