High-performance 3D molecular conformer generation — Python bindings

sci-form

High-performance 3D molecular conformer generation using ETKDG distance geometry, written in Rust.

Generates chemically valid 3D coordinates from SMILES strings, matching RDKit's ETKDGv2 quality while offering native bindings for Rust, Python, TypeScript/JavaScript (WASM), and a cross-platform CLI.

Features

  • ETKDG Distance Geometry: torsion preferences derived from the Cambridge Structural Database (837 SMARTS patterns)
  • High Accuracy: no molecule (0.00%) exceeds 0.5 Å heavy-atom RMSD vs RDKit on GDB-20 (2000 molecules, ensemble comparison)
  • Fast: 60+ molecules/second in Rust, with parallel batch processing via rayon
  • Multi-platform: Rust lib, Python (PyO3), TypeScript/JS (WASM), CLI (Linux/macOS/Windows)
  • Zero runtime dependencies: pure Rust, no C++ toolchain needed
  • SMILES + SMARTS: full SMILES parser and SMARTS pattern matching engine

Installation

Rust

# Cargo.toml
[dependencies]
sci-form = "0.1"

fn main() {
    let result = sci_form::embed("CCO", 42);
    println!("Atoms: {}, Coords: {:?}", result.num_atoms, result.coords);
}

Rust API docs · Full guide


Python

The Python package is published as sciforma on PyPI. The import name is sci_form.

pip install sciforma

import sci_form

result = sci_form.embed("CCO")            # seed=42 is default
print(f"Atoms: {result.num_atoms}, Time: {result.time_ms:.1f}ms")
positions = result.get_positions()        # list of (x, y, z) tuples

# Batch
results = sci_form.embed_batch(["CCO", "c1ccccc1", "CC(=O)O"])
for r in results:
    if r.is_ok():
        print(f"{r.smiles}: {r.num_atoms} atoms")
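
Because get_positions() returns a plain list of (x, y, z) tuples, simple geometry checks need only the standard library. The following is a minimal sketch: bond_length is a hypothetical helper (not part of sci_form), the coordinates are made-up placeholders rather than real sci-form output, and the Å unit is assumed from the benchmark tables.

```python
import math


def bond_length(positions, i, j):
    """Euclidean distance between atoms i and j (assumed to be in Å)."""
    return math.dist(positions[i], positions[j])


# Placeholder coordinates standing in for result.get_positions();
# these numbers are illustrative only, not real sci-form output.
positions = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.4, 0.0)]

print(f"d(0,1) = {bond_length(positions, 0, 1):.2f}")
print(f"d(1,2) = {bond_length(positions, 1, 2):.2f}")
```

In real use, positions would come from result.get_positions(), and the indices would refer to whatever atom order the library returns.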

Full Python guide


TypeScript / JavaScript (Node.js + Browser)

The npm package is sci-form-wasm.

npm install sci-form-wasm

Node.js (CommonJS)

const sci = require('sci-form-wasm');
const result = JSON.parse(sci.embed('CCO', 42));
console.log(`Atoms: ${result.num_atoms}`);

ES Module / TypeScript

import init, { embed } from 'sci-form-wasm';
await init();
const result = JSON.parse(embed('CC(=O)O', 42));
console.log(`Atoms: ${result.num_atoms}`);

Full TypeScript guide


CLI

Download a prebuilt binary from GitHub Releases:

| Platform            | File                        |
|---------------------|-----------------------------|
| Linux x86_64        | sci-form-linux-x86_64       |
| Linux aarch64       | sci-form-linux-aarch64      |
| macOS x86_64        | sci-form-macos-x86_64       |
| macOS Apple Silicon | sci-form-macos-aarch64      |
| Windows x86_64      | sci-form-windows-x86_64.exe |

Or install via cargo:

cargo install sci-form-cli

# Single molecule
sci-form embed "CCO" --format xyz

# Batch processing
sci-form batch -i molecules.smi -o output.sdf --format sdf --threads 8

# Parse only (no 3D)
sci-form parse "c1ccccc1"

# Show version / features
sci-form info

Full CLI guide


Benchmark Results

Diverse Molecules (131 molecules, all chemical functional groups)

| Metric           | Value    |
|------------------|----------|
| Parse success    | 100%     |
| Embed success    | 97.7%    |
| Geometry quality | 97.7%    |
| Throughput       | 60 mol/s |

RDKit Comparison (heavy-atom pairwise-distance RMSD)

| Metric       | Value   |
|--------------|---------|
| Average RMSD | 0.064 Å |
| Median RMSD  | 0.011 Å |
| < 0.5 Å      | 98.4%   |
| < 0.3 Å      | 94.4%   |
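
A pairwise-distance RMSD compares two conformers through their interatomic distances, so no rigid-body alignment is needed. The sketch below shows one common way to define it; the exact formula and atom selection used by the sci-form benchmark are not given here, so treat pairwise_distance_rmsd as a hypothetical illustration rather than the project's metric.

```python
import math
from itertools import combinations


def pairwise_distance_rmsd(coords_a, coords_b):
    """RMSD over all atom-pair distances of two conformers.

    Both inputs are lists of (x, y, z) tuples for the same atoms in the
    same order (for a heavy-atom metric, pass heavy atoms only).
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("conformers must have the same number of atoms")
    pairs = list(combinations(range(len(coords_a)), 2))
    if not pairs:
        return 0.0
    squared_error = sum(
        (math.dist(coords_a[i], coords_a[j]) - math.dist(coords_b[i], coords_b[j])) ** 2
        for i, j in pairs
    )
    return math.sqrt(squared_error / len(pairs))


# A translated copy has identical internal distances, so the metric is zero.
conf_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
conf_b = [(x + 2.0, y + 3.0, z + 4.0) for x, y, z in conf_a]
print(pairwise_distance_rmsd(conf_a, conf_b))
```

Because only internal distances enter the sum, the value is unchanged by translation and rotation; note that it is also unchanged by mirror reflection, so it cannot distinguish enantiomeric geometries.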

GDB-20 Ensemble (2000 molecules × 10 seeds vs 21 RDKit seeds)

| Metric   | All-atom | Heavy-atom |
|----------|----------|------------|
| Avg RMSD | 0.035 Å  | 0.018 Å    |
| > 0.5 Å  | 0.95%    | 0.00%      |

Algorithm

sci-form implements ETKDGv2 (Experimental Torsion Knowledge Distance Geometry):

  1. SMILES Parsing: molecular graph with atoms, bonds, and hybridization
  2. Bounds Matrix: 1-2, 1-3, 1-4, and VdW distance bounds derived from topology
  3. Triangle Smoothing: Floyd-Warshall enforcement of the triangle inequality
  4. Distance Picking: random distances drawn from the smoothed bounds (MinstdRand)
  5. Metric Matrix Embedding: eigendecomposition yielding 4D coordinates
  6. Bounds Force Field: BFGS minimization in 4D to satisfy the distance constraints
  7. Projection to 3D: drop the lowest-variance dimension
  8. ETKDG 3D Refinement: force field with CSD torsion preferences (837 patterns)
  9. Validation: tetrahedral centers, planarity, double-bond geometry
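
To make step 3 concrete, here is a minimal sketch of triangle smoothing on a pair of bounds matrices. It is a simplified two-phase illustration written for this README, not the sci-form implementation: triangle_smooth and the matrix layout (full symmetric lists of lists) are assumptions.

```python
def triangle_smooth(lower, upper):
    """Tighten distance bounds in place using the triangle inequality.

    lower and upper are symmetric n x n matrices (lists of lists).
    Returns True if the smoothed bounds are consistent (lower <= upper).
    """
    n = len(upper)
    # Phase 1: Floyd-Warshall shortest paths on the upper bounds,
    # enforcing u_ij <= u_ik + u_kj.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = upper[i][k] + upper[k][j]
                if via_k < upper[i][j]:
                    upper[i][j] = via_k
    # Phase 2: raise lower bounds with the inverse triangle inequality,
    # enforcing l_ij >= l_ik - u_kj and l_ij >= l_kj - u_ik.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                candidate = max(lower[i][k] - upper[k][j],
                                lower[k][j] - upper[i][k])
                if candidate > lower[i][j]:
                    lower[i][j] = candidate
    return all(lower[i][j] <= upper[i][j] for i in range(n) for j in range(n))


# Three points: d(0,1) fixed at 1.0, d(1,2) fixed at 3.0, d(0,2) unknown.
lower = [[0.0, 1.0, 0.0], [1.0, 0.0, 3.0], [0.0, 3.0, 0.0]]
upper = [[0.0, 1.0, 100.0], [1.0, 0.0, 3.0], [100.0, 3.0, 0.0]]
ok = triangle_smooth(lower, upper)
print(ok, lower[0][2], upper[0][2])  # True 2.0 4.0
```

The unknown 0-2 distance ends up bounded by 3.0 - 1.0 and 1.0 + 3.0, which is exactly what the triangle inequality permits.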

See the algorithm documentation for mathematical derivations and step-by-step diagrams.
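
Step 4 can be sketched in a few lines. The generator below follows the standard minstd_rand recurrence (multiplier 48271, modulus 2^31 - 1). How sci-form maps the integer stream to a distance within the bounds is not stated here, so the next_unit scaling and the pick_distances helper are assumptions for illustration only.

```python
class MinstdRand:
    """Lehmer generator: state <- 48271 * state mod (2**31 - 1)."""

    MULTIPLIER = 48271
    MODULUS = 2**31 - 1

    def __init__(self, seed=1):
        # The state must be nonzero; a seed congruent to 0 is mapped to 1.
        self.state = seed % self.MODULUS or 1

    def next_int(self):
        self.state = (self.MULTIPLIER * self.state) % self.MODULUS
        return self.state

    def next_unit(self):
        # Assumed mapping of the integer range [1, MODULUS - 1] onto [0, 1).
        return (self.next_int() - 1) / (self.MODULUS - 1)


def pick_distances(lower, upper, rng):
    """Draw a symmetric matrix of distances with lower <= d <= upper."""
    n = len(lower)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = rng.next_unit()
            d = lower[i][j] + r * (upper[i][j] - lower[i][j])
            dist[i][j] = dist[j][i] = d
    return dist


lower = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
upper = [[0.0, 1.5, 3.0], [1.5, 0.0, 1.5], [3.0, 1.5, 0.0]]
distances = pick_distances(lower, upper, MinstdRand(42))
for row in distances:
    print([round(d, 3) for d in row])
```

A fixed, portable generator is what makes a seed reproducible: the same seed produces the same picked distances on every platform.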


Building from Source

# Library + CLI
cargo build --release

# Python bindings
cd crates/python && pip install maturin && maturin develop --release

# WASM bindings
cd crates/wasm && wasm-pack build --target bundler --release

Testing

# Unit tests
cargo test --lib

# Integration — diverse molecules
cargo test --release --test test_diverse_molecules -- --nocapture

# Integration — geometry quality (requires gdb20_reference.json, see scripts/)
cargo test --release --test test_geometry_quality -- --nocapture

# Integration — gradient correctness
cargo test --release --test test_gradient_check -- --nocapture

# Lint & format
cargo fmt --all -- --check
cargo clippy --all-targets -- -D warnings

Releasing a New Version

Use the provided bump script. It updates all version strings, commits, tags, and pushes:

# Auto-increment patch (0.1.7 → 0.1.8)
./scripts/bump_version.sh

# Set a specific version
./scripts/bump_version.sh 0.2.0

This updates versions in:

  • Cargo.toml (root lib)
  • crates/cli/Cargo.toml
  • crates/python/Cargo.toml
  • crates/wasm/Cargo.toml
  • crates/python/pyproject.toml
  • crates/wasm/pkg/package.json
  • pkg/package.json & pkg-node/package.json

Then creates a vX.Y.Z git tag, which triggers the release workflow to publish to crates.io, PyPI, and npm automatically.

Required repository secrets:

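| Secret               | Used for                                                                    |
|----------------------|-----------------------------------------------------------------------------|
| CARGO_REGISTRY_TOKEN | Publishing to crates.io                                                     |
| PYPI_API_TOKEN       | Publishing to PyPI (sciforma)                                               |
| NPM_TOKEN            | Publishing to npm (sci-form-wasm); must be a Granular Automation token      |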

License

MIT
