High-performance 3D molecular conformer generation — Python bindings
sci-form
High-performance 3D molecular conformer generation using ETKDG distance geometry, written in Rust.
Generates chemically valid 3D coordinates from SMILES strings, matching RDKit's ETKDGv2 quality while offering native bindings for Rust, Python, TypeScript/JavaScript (WASM), and a cross-platform CLI.
Features
- ETKDG Distance Geometry — Cambridge Structural Database torsion preferences (837 SMARTS patterns)
- High Accuracy — 0.00% of molecules exceed 0.5 Å heavy-atom RMSD vs RDKit on GDB-20 (2000 molecules, ensemble comparison)
- Fast — 60+ molecules/second in Rust, parallel batch processing via rayon
- Multi-platform — Rust lib, Python (PyO3), TypeScript/JS (WASM), CLI (Linux/macOS/Windows)
- Zero runtime dependencies — pure Rust, no C++ toolchain needed
- SMILES + SMARTS — full SMILES parser and SMARTS pattern matching engine
Installation
Rust
[dependencies]
sci-form = "0.1"
let result = sci_form::embed("CCO", 42);
println!("Atoms: {}, Coords: {:?}", result.num_atoms, result.coords);
Python
The Python package is published as sciforma on PyPI.
The import name is sci_form.
pip install sciforma
import sci_form
result = sci_form.embed("CCO") # seed=42 is default
print(f"Atoms: {result.num_atoms}, Time: {result.time_ms:.1f}ms")
positions = result.get_positions() # list of (x, y, z) tuples
# Batch
results = sci_form.embed_batch(["CCO", "c1ccccc1", "CC(=O)O"])
for r in results:
    if r.is_ok():
        print(f"{r.smiles}: {r.num_atoms} atoms")
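Since `get_positions()` returns a plain list of (x, y, z) tuples, the coordinates can be written to other formats without extra dependencies. The sketch below formats them as XYZ text. `to_xyz` is a hypothetical helper, not part of sci-form, and the element symbols are supplied by the caller because this README does not show an accessor for them. The coordinates are stand-in values, not real sci-form output.

```python
def to_xyz(symbols, positions, comment=""):
    """Format element symbols and (x, y, z) tuples as an XYZ-format string."""
    if len(symbols) != len(positions):
        raise ValueError("symbols and positions must have the same length")
    # XYZ layout: atom count, one comment line, then one line per atom.
    lines = [str(len(symbols)), comment]
    for sym, (x, y, z) in zip(symbols, positions):
        lines.append(f"{sym:<2} {x:12.6f} {y:12.6f} {z:12.6f}")
    return "\n".join(lines) + "\n"


# Stand-in data shaped like result.get_positions(); illustrative only.
symbols = ["O", "H", "H"]
positions = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]

xyz_text = to_xyz(symbols, positions, comment="water, illustrative coordinates")
print(xyz_text)
```

In real use, `positions` would come from `result.get_positions()` and `symbols` from whatever source of atom ordering is available for the same molecule.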
TypeScript / JavaScript (Node.js + Browser)
The npm package is sci-form-wasm.
npm install sci-form-wasm
Node.js (CommonJS)
const sci = require('sci-form-wasm');
const result = JSON.parse(sci.embed('CCO', 42));
console.log(`Atoms: ${result.num_atoms}`);
ES Module / TypeScript
import init, { embed } from 'sci-form-wasm';
await init();
const result = JSON.parse(embed('CC(=O)O', 42));
console.log(`Atoms: ${result.num_atoms}`);
CLI
Download a prebuilt binary from GitHub Releases:
| Platform | File |
|---|---|
| Linux x86_64 | sci-form-linux-x86_64 |
| Linux aarch64 | sci-form-linux-aarch64 |
| macOS x86_64 | sci-form-macos-x86_64 |
| macOS Apple Silicon | sci-form-macos-aarch64 |
| Windows x86_64 | sci-form-windows-x86_64.exe |
Or install via cargo:
cargo install sci-form-cli
# Single molecule
sci-form embed "CCO" --format xyz
# Batch processing
sci-form batch -i molecules.smi -o output.sdf --format sdf --threads 8
# Parse only (no 3D)
sci-form parse "c1ccccc1"
# Show version / features
sci-form info
Benchmark Results
Diverse Molecules (131 molecules, all chemical functional groups)
| Metric | Value |
|---|---|
| Parse success | 100% |
| Embed success | 97.7% |
| Geometry quality | 97.7% |
| Throughput | 60 mol/s |
RDKit Comparison (heavy-atom pairwise-distance RMSD)
| Metric | Value |
|---|---|
| Average RMSD | 0.064 Å |
| Median RMSD | 0.011 Å |
| < 0.5 Å | 98.4% |
| < 0.3 Å | 94.4% |
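The exact RMSD definition used by the benchmark is not spelled out here. One common alignment-free reading of "pairwise-distance RMSD" is the root-mean-square difference between corresponding interatomic distances in two conformers. The sketch below assumes that reading. `pairwise_distance_rmsd` is a hypothetical helper, not part of sci-form, and the caller is expected to pass heavy-atom coordinates only, in the same atom order for both conformers.

```python
import math
from itertools import combinations


def pairwise_distance_rmsd(coords_a, coords_b):
    """RMS difference between matching interatomic distances of two conformers.

    Needs no superposition, because interatomic distances are unchanged
    by rotation and translation.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("conformers must have the same number of atoms")
    if len(coords_a) < 2:
        raise ValueError("need at least two atoms")
    sq_sum = 0.0
    n_pairs = 0
    for i, j in combinations(range(len(coords_a)), 2):
        diff = math.dist(coords_a[i], coords_a[j]) - math.dist(coords_b[i], coords_b[j])
        sq_sum += diff * diff
        n_pairs += 1
    return math.sqrt(sq_sum / n_pairs)


# Illustrative coordinates, not benchmark data.
ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0)]
shifted = [(x + 1.0, y + 2.0, z + 3.0) for x, y, z in ref]  # rigid translation
print(pairwise_distance_rmsd(ref, shifted))  # effectively zero

pair = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
stretched = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]  # one distance longer by 0.5
print(pairwise_distance_rmsd(pair, stretched))
```

Because only distances are compared, this measure cannot distinguish a conformer from its mirror image. That is a property of the metric as assumed here, not a statement about how the benchmark handles chirality.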
GDB-20 Ensemble (2000 molecules × 10 seeds vs 21 RDKit seeds)
| Metric | All-atom | Heavy-atom |
|---|---|---|
| Avg RMSD | 0.035 Å | 0.018 Å |
| > 0.5 Å | 0.95% | 0.00% |
Algorithm
sci-form implements ETKDGv2 (Experimental-Torsion Knowledge Distance Geometry). The pipeline runs the following steps in order:
- SMILES Parsing → Molecular graph with atoms, bonds, hybridization
- Bounds Matrix → 1-2, 1-3, 1-4, and VdW distance bounds from topology
- Triangle Smoothing → Floyd-Warshall triangle inequality enforcement
- Distance Picking → Random distances from smoothed bounds (MinstdRand)
- Metric Matrix Embedding → Eigendecomposition → 4D coordinates
- Bounds Force Field → BFGS minimization in 4D to satisfy distance constraints
- Projection to 3D → Drop lowest-variance dimension
- ETKDG 3D Refinement → Force field with CSD torsion preferences (837 patterns)
- Validation → Tetrahedral centers, planarity, double-bond geometry
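The Triangle Smoothing and Distance Picking steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not sci-form's implementation: the function and class names are hypothetical, the generator uses the multiplier 48271 (this README does not say which MinstdRand variant is used), and the mapping from integer state to a uniform float is a simplification that will not reproduce RDKit's random stream.

```python
def smooth_bounds(lower, upper):
    """Tighten symmetric bounds matrices in place with the triangle inequality.

    Upper bounds follow the Floyd-Warshall shortest-path update
    u_ij <= u_ik + u_kj; lower bounds are raised so that
    l_ij >= l_ik - u_kj and l_ij >= l_kj - u_ik. Diagonals must be zero.
    """
    n = len(upper)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                via_k = upper[i][k] + upper[k][j]
                if upper[i][j] > via_k:
                    upper[i][j] = via_k
                lower[i][j] = max(lower[i][j],
                                  lower[i][k] - upper[k][j],
                                  lower[k][j] - upper[i][k])
                if lower[i][j] > upper[i][j]:
                    raise ValueError(f"inconsistent bounds for pair ({i}, {j})")


class MinStdRand:
    """Minimal-standard LCG: x <- 48271 * x mod (2**31 - 1). Multiplier is an assumption."""
    MODULUS = 2**31 - 1
    MULTIPLIER = 48271

    def __init__(self, seed=42):
        self.state = seed % self.MODULUS or 1  # state must never be zero

    def next_float(self):
        self.state = (self.MULTIPLIER * self.state) % self.MODULUS
        return self.state / self.MODULUS  # value in (0, 1)


def pick_distances(lower, upper, rng):
    """Draw one distance per atom pair uniformly between its bounds."""
    n = len(upper)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = lower[i][j] + rng.next_float() * (upper[i][j] - lower[i][j])
            dist[i][j] = dist[j][i] = d
    return dist


# Toy three-atom chain: pairs 0-1 and 1-2 are fixed, pair 0-2 starts loose.
lower = [[0.0, 3.0, 0.5], [3.0, 0.0, 1.0], [0.5, 1.0, 0.0]]
upper = [[0.0, 3.0, 100.0], [3.0, 0.0, 1.0], [100.0, 1.0, 0.0]]
smooth_bounds(lower, upper)
print(lower[0][2], upper[0][2])  # pair 0-2 is now bounded by [2.0, 4.0]

distances = pick_distances(lower, upper, MinStdRand(42))
print(distances[0][2])
```

Using a small deterministic generator means the same seed always yields the same picked distances, which is what makes seeded runs reproducible.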
See the algorithm documentation for mathematical derivations and step-by-step diagrams.
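As a small illustration of the Projection to 3D step listed above, the sketch below computes the per-axis variance of a set of 4D coordinates and drops the axis with the smallest spread. `drop_lowest_variance_dim` is a hypothetical name and the coordinates are made up for the example. The real pipeline may handle the fourth dimension differently in detail.

```python
def drop_lowest_variance_dim(coords):
    """Remove the coordinate axis with the smallest variance.

    Returns the reduced coordinates and the index of the dropped axis.
    """
    n = len(coords)
    dim = len(coords[0])
    means = [sum(p[d] for p in coords) / n for d in range(dim)]
    variances = [sum((p[d] - means[d]) ** 2 for p in coords) / n for d in range(dim)]
    drop = min(range(dim), key=variances.__getitem__)
    reduced = [tuple(v for d, v in enumerate(p) if d != drop) for p in coords]
    return reduced, drop


# Made-up 4D coordinates whose last axis is nearly flat.
coords_4d = [
    (0.0, 0.0, 0.0, 0.01),
    (1.0, 0.0, 0.0, -0.01),
    (0.0, 1.0, 0.0, 0.0),
    (0.0, 0.0, 1.0, 0.0),
]
coords_3d, dropped_axis = drop_lowest_variance_dim(coords_4d)
print(dropped_axis)
print(coords_3d)
```

Dropping the flattest axis discards the least geometric information, which is why the preceding 4D minimization step matters: it leaves little spread in the dimension that gets removed.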
Building from Source
# Library + CLI
cargo build --release
# Python bindings
cd crates/python && pip install maturin && maturin develop --release
# WASM bindings
cd crates/wasm && wasm-pack build --target bundler --release
Testing
# Unit tests
cargo test --lib
# Integration — diverse molecules
cargo test --release --test test_diverse_molecules -- --nocapture
# Integration — geometry quality (requires gdb20_reference.json, see scripts/)
cargo test --release --test test_geometry_quality -- --nocapture
# Integration — gradient correctness
cargo test --release --test test_gradient_check -- --nocapture
# Lint & format
cargo fmt --all -- --check
cargo clippy --all-targets -- -D warnings
Releasing a New Version
Use the provided bump script. It updates all version strings, commits, tags, and pushes:
# Auto-increment patch (0.1.7 → 0.1.8)
./scripts/bump_version.sh
# Set a specific version
./scripts/bump_version.sh 0.2.0
This updates versions in:
- Cargo.toml (root lib)
- crates/cli/Cargo.toml
- crates/python/Cargo.toml
- crates/wasm/Cargo.toml
- crates/python/pyproject.toml
- crates/wasm/pkg/package.json
- pkg/package.json & pkg-node/package.json
Then creates a vX.Y.Z git tag, which triggers the release workflow to publish to crates.io, PyPI, and npm automatically.
Required repository secrets:
| Secret | Used for |
|---|---|
| CARGO_REGISTRY_TOKEN | Publishing to crates.io |
| PYPI_API_TOKEN | Publishing to PyPI (sciforma) |
| NPM_TOKEN | Publishing to npm (sci-form-wasm); must be a Granular Automation token |
License
MIT