A comprehensive CLI tool for RDKit cheminformatics operations

rdkit-cli

A high-performance CLI for cheminformatics workflows, powered by native RDKit (C++ under the hood).

32 commands | 5 I/O formats (CSV, TSV, SMI, SDF, Parquet) | multi-core parallel processing | ~80ms startup

Installation

pip install rdkit-cli

Quick Start

# Quick molecule info — no files needed
rdkit-cli info "c1ccccc1"

# Compute descriptors
rdkit-cli descriptors compute -i molecules.csv -o desc.csv -d MolWt,MolLogP,TPSA

# Filter by drug-likeness
rdkit-cli filter druglike -i molecules.csv -o filtered.csv --rule lipinski

# Similarity search
rdkit-cli similarity search -i library.csv -o hits.csv --query "c1ccccc1" --threshold 0.7

# Standardize structures
rdkit-cli standardize -i molecules.csv -o std.csv --cleanup --neutralize
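
For readers unfamiliar with the `--rule lipinski` filter used above, here is a minimal sketch of Lipinski's rule of five applied to precomputed descriptor values. The thresholds are the commonly cited ones; the `Descriptors` type, the function names, and the "at most one violation" tolerance are assumptions made for illustration, and rdkit-cli's actual implementation may differ.

```python
from typing import NamedTuple


class Descriptors(NamedTuple):
    """Hypothetical container for four precomputed descriptor values."""
    mol_wt: float    # molecular weight in Da
    mol_logp: float  # octanol-water partition coefficient (logP)
    hbd: int         # hydrogen-bond donors
    hba: int         # hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five thresholds are exceeded."""
    checks = [d.mol_wt > 500, d.mol_logp > 5, d.hbd > 5, d.hba > 10]
    return sum(checks)


def passes_lipinski(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumption: a molecule passes with at most `max_violations` violations."""
    return lipinski_violations(d) <= max_violations


# Illustrative values only, not output from rdkit-cli.
small = Descriptors(mol_wt=78.11, mol_logp=1.7, hbd=0, hba=0)
large = Descriptors(mol_wt=812.0, mol_logp=6.3, hbd=2, hba=9)
print(passes_lipinski(small), passes_lipinski(large))
```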

Commands

Usage: rdkit-cli [-h] [-V] <command> ...

Commands:
    align          Align 3D molecules to a reference
    conformers     Generate and optimize 3D conformers
    convert        Convert between molecular file formats
    deduplicate    Remove duplicate molecules
    depict         Generate molecular depictions (SVG/PNG)
    descriptors    Compute molecular descriptors
    diversity      Analyze and select diverse molecules
    energy         Force field energy calculations
    enumerate      Enumerate stereoisomers and tautomers
    filter         Filter by substructure, properties, drug-likeness, PAINS
    fingerprints   Compute fingerprints (Morgan, MACCS, RDKit, AtomPair, Torsion)
    fragment       BRICS/RECAP fragmentation and functional groups
    info           Quick molecule information from SMILES
    mcs            Find Maximum Common Substructure
    merge          Merge multiple molecule files
    mmp            Matched Molecular Pairs analysis
    pharmacophore  Pharmacophore feature analysis
    props          Property column operations (add, rename, drop, keep)
    protonate      Enumerate protonation states
    reactions      Apply SMIRKS transformations and enumerate products
    rgroup         R-group decomposition around a core
    rings          Ring system analysis and extraction
    rmsd           Calculate RMSD between 3D structures
    sample         Randomly sample molecules (reservoir sampling supported)
    sascorer       Synthetic accessibility, QED, and NP-likeness scores
    scaffold       Extract Murcko scaffolds
    similarity     Search, matrix, and clustering
    split          Split files into smaller chunks
    standardize    Standardize and canonicalize molecules
    stats          Calculate dataset statistics
    stereo         Analyze and manipulate stereochemistry
    validate       Validate molecular structures

Use 'rdkit-cli <command> --help' for command-specific options.

Global Options

Option               Description
-i, --input FILE     Input file
-o, --output FILE    Output file
-n, --ncpu N         Number of CPUs (-1 = all, default: 1; auto-scales for heavy commands)
--smiles-column COL  SMILES column name (default: "smiles")
--name-column COL    Name column (optional)
--no-header          Input has no header row
-q, --quiet          Suppress progress output
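
The `-n` convention above (-1 means all cores, default 1) could be resolved to a concrete worker count roughly as follows. This is a sketch, not rdkit-cli's code: `resolve_ncpu` is a hypothetical helper, and capping oversized requests at the core count is an assumption.

```python
import os


def resolve_ncpu(requested: int = 1) -> int:
    """Map a -n/--ncpu style value to a worker count (hypothetical helper)."""
    available = os.cpu_count() or 1  # os.cpu_count() may return None
    if requested == -1:
        return available
    if requested < 1:
        raise ValueError(f"ncpu must be -1 or a positive integer, got {requested}")
    # Assumption: requests above the core count are capped rather than rejected.
    return min(requested, available)


print(resolve_ncpu(), resolve_ncpu(-1))
```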

Example Pipeline

# Validate → deduplicate → standardize → filter → describe → pick diverse subset
rdkit-cli validate -i raw.csv -o valid.csv --valid-only
rdkit-cli deduplicate -i valid.csv -o unique.csv -b inchikey
rdkit-cli standardize -i unique.csv -o std.csv --cleanup --neutralize
rdkit-cli filter druglike -i std.csv -o druglike.csv --rule lipinski
rdkit-cli descriptors compute -i druglike.csv -o desc.csv -d MolWt,MolLogP,TPSA,HBD,HBA
rdkit-cli diversity pick -i druglike.csv -o diverse.csv -k 500
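
The same pipeline can be driven from Python, which stops at the first failing step. The sketch below reuses only the commands and flags shown above and assumes `rdkit-cli` is on `PATH`; `PIPELINE` and `run_pipeline` are illustrative names. With `dry_run=True` (the default here) it only renders the command lines without executing anything.

```python
import shlex
import subprocess
from typing import List

# Each step is the argv of one command from the example pipeline above.
PIPELINE: List[List[str]] = [
    ["rdkit-cli", "validate", "-i", "raw.csv", "-o", "valid.csv", "--valid-only"],
    ["rdkit-cli", "deduplicate", "-i", "valid.csv", "-o", "unique.csv", "-b", "inchikey"],
    ["rdkit-cli", "standardize", "-i", "unique.csv", "-o", "std.csv", "--cleanup", "--neutralize"],
    ["rdkit-cli", "filter", "druglike", "-i", "std.csv", "-o", "druglike.csv", "--rule", "lipinski"],
    ["rdkit-cli", "descriptors", "compute", "-i", "druglike.csv", "-o", "desc.csv",
     "-d", "MolWt,MolLogP,TPSA,HBD,HBA"],
    ["rdkit-cli", "diversity", "pick", "-i", "druglike.csv", "-o", "diverse.csv", "-k", "500"],
]


def run_pipeline(steps: List[List[str]], dry_run: bool = True) -> List[str]:
    """Render each step as a shell command line; execute it unless dry_run is set."""
    rendered = []
    for argv in steps:
        rendered.append(shlex.join(argv))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit status,
            # so later steps never run on a failed step's output.
            subprocess.run(argv, check=True)
    return rendered


for line in run_pipeline(PIPELINE):
    print(line)
```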

Formats

Format   Extension
CSV      .csv
TSV      .tsv
SMILES   .smi
SDF      .sdf
Parquet  .parquet

Formats are auto-detected from file extensions. Override with --in-format / --out-format.
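
Extension-based detection with an explicit override could look like the sketch below. The extension table comes from the list above; the function name `detect_format` and the lowercase format identifiers are assumptions, and the values that `--in-format` / `--out-format` actually accept may differ.

```python
from pathlib import Path
from typing import Optional

# Extension-to-format table taken from the Formats section.
# The lowercase format identifiers are assumed for illustration.
EXTENSION_FORMATS = {
    ".csv": "csv",
    ".tsv": "tsv",
    ".smi": "smi",
    ".sdf": "sdf",
    ".parquet": "parquet",
}


def detect_format(path: str, override: Optional[str] = None) -> str:
    """Return the format for `path`; an explicit override wins over the extension."""
    if override is not None:
        fmt = override.lower()
        if fmt not in EXTENSION_FORMATS.values():
            raise ValueError(f"unknown format: {override!r}")
        return fmt
    suffix = Path(path).suffix.lower()
    if suffix not in EXTENSION_FORMATS:
        raise ValueError(f"cannot detect format from extension {suffix!r}")
    return EXTENSION_FORMATS[suffix]


print(detect_format("molecules.csv"), detect_format("data.txt", override="smi"))
```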

Performance

  • Native RDKit: computation runs in RDKit's C++ core through its Python bindings, so the heavy work is not done in pure Python
  • Smart parallelism: fast commands default to single-threaded execution, which avoids inter-process communication (IPC) overhead; heavy workloads such as descriptors --all auto-scale to all cores. Pass -n to override (for example, -n -1 for all cores)
  • Lazy imports: ~80ms startup time regardless of installed packages
  • Streaming: Memory-efficient reservoir sampling for large datasets
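
To illustrate why reservoir sampling is memory-efficient, here is a minimal sketch of the classic single-pass algorithm (Algorithm R). It holds only `k` items at a time regardless of input size. This is the textbook technique, not necessarily how the `sample` command implements it; `reservoir_sample` is an illustrative name.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Uniformly sample up to k items from a stream in one pass (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item replaces an old one
            # only if the slot falls inside the reservoir.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


# Works on any iterable, including a generator that yields one record at a time.
picked = reservoir_sample((f"mol_{n}" for n in range(100_000)), k=5, seed=42)
print(len(picked))
```

If the stream has fewer than `k` items, the function returns all of them.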

Benchmarks — 27K molecules, Apple M-series (8 cores):

Command                                    Time  Throughput
fingerprints compute --type morgan         3.1s  ~8,700 mol/s
descriptors compute -d MolWt,MolLogP,TPSA  6.4s  ~4,200 mol/s
filter druglike --rule lipinski            6.9s  ~3,900 mol/s
standardize --cleanup --uncharge           7.0s  ~3,900 mol/s
descriptors compute --all (auto-parallel)  55s   ~490 mol/s
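
The throughput column is the molecule count divided by wall-clock time. The sketch below reproduces that arithmetic from the timings above, assuming "27K" means exactly 27,000 molecules (the exact dataset size is not stated).

```python
# Assumption: "27K molecules" is taken as exactly 27,000.
N_MOLECULES = 27_000

# Wall-clock times in seconds, copied from the benchmark table.
TIMINGS_S = {
    "fingerprints compute --type morgan": 3.1,
    "descriptors compute -d MolWt,MolLogP,TPSA": 6.4,
    "filter druglike --rule lipinski": 6.9,
    "standardize --cleanup --uncharge": 7.0,
    "descriptors compute --all (auto-parallel)": 55.0,
}


def throughput(n_molecules: int, seconds: float) -> float:
    """Molecules processed per second."""
    return n_molecules / seconds


for command, seconds in TIMINGS_S.items():
    print(f"{command}: {throughput(N_MOLECULES, seconds):,.0f} mol/s")
```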

Development

git clone https://github.com/vitruves/rdkit-cli
cd rdkit-cli
uv sync --dev
uv run pytest

License

Apache 2.0


File details

Details for the file rdkit_cli-0.3.2.tar.gz.

File metadata

  • Download URL: rdkit_cli-0.3.2.tar.gz
  • Upload date:
  • Size: 130.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rdkit_cli-0.3.2.tar.gz
Algorithm Hash digest
SHA256 a27c998c2fdef56082ccade4f896d87e47039efa4200804d10410534b1c971ea
MD5 3d6dfaa5d6f05d235ba2e664492e50fe
BLAKE2b-256 a4e42e1265aadb01f3208f3e2fe5931b31807fdab2e6422961252eee45ac6c48

Provenance

The following attestation bundles were made for rdkit_cli-0.3.2.tar.gz:

Publisher: publish.yml on Vitruves/rdkit-cli

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rdkit_cli-0.3.2-py3-none-any.whl.

File metadata

  • Download URL: rdkit_cli-0.3.2-py3-none-any.whl
  • Upload date:
  • Size: 138.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rdkit_cli-0.3.2-py3-none-any.whl
Algorithm Hash digest
SHA256 d969ff7f811aba54f151ea520dba21c7aa36ee535790bf561382ef83122f87e6
MD5 38b1b1af694bcd82e4876bedbf311075
BLAKE2b-256 f955126d2c32dadb3a496125f65ae23af8c1e2703cdb730ad67081d5a175846b

Provenance

The following attestation bundles were made for rdkit_cli-0.3.2-py3-none-any.whl:

Publisher: publish.yml on Vitruves/rdkit-cli

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
