A comprehensive CLI tool for RDKit cheminformatics operations

rdkit-cli

A comprehensive, high-performance CLI tool wrapping RDKit functionality for cheminformatics workflows.

Features

  • 19 Command Categories: descriptors, fingerprints, filter, convert, standardize, similarity, conformers, reactions, scaffold, enumerate, fragment, diversity, mcs, depict, stats, split, sample, deduplicate, validate
  • Multiple Input/Output Formats: CSV, TSV, SMI, SDF, Parquet
  • Parallel Processing: Efficient multi-core support via ProcessPoolExecutor
  • Ninja-style Progress: Real-time progress display with speed and ETA

Installation

pip install rdkit-cli

Or with uv:

uv add rdkit-cli

Quick Start

# Compute molecular descriptors
rdkit-cli descriptors compute -i molecules.csv -o descriptors.csv -d MolWt,MolLogP,TPSA

# Generate fingerprints
rdkit-cli fingerprints compute -i molecules.csv -o fingerprints.csv --type morgan

# Filter by drug-likeness
rdkit-cli filter druglike -i molecules.csv -o filtered.csv --rule lipinski

# Standardize molecules
rdkit-cli standardize -i molecules.csv -o standardized.csv --cleanup --neutralize

# Similarity search
rdkit-cli similarity search -i library.csv -o hits.csv --query "c1ccccc1" --threshold 0.7

Commands

descriptors

Compute molecular descriptors.

# List available descriptors
rdkit-cli descriptors list
rdkit-cli descriptors list --all

# Compute specific descriptors
rdkit-cli descriptors compute -i input.csv -o output.csv -d MolWt,MolLogP,TPSA

# Compute all descriptors
rdkit-cli descriptors compute -i input.csv -o output.csv --all

fingerprints

Generate molecular fingerprints.

# List available fingerprint types
rdkit-cli fingerprints list

# Compute Morgan fingerprints (default)
rdkit-cli fingerprints compute -i input.csv -o output.csv --type morgan

# With options
rdkit-cli fingerprints compute -i input.csv -o output.csv \
    --type morgan --radius 3 --bits 4096 --use-chirality

Supported types: morgan, maccs, rdkit, atompair, torsion, pattern

filter

Filter molecules by various criteria.

# Substructure filter
rdkit-cli filter substructure -i input.csv -o output.csv --smarts "c1ccccc1"
rdkit-cli filter substructure -i input.csv -o output.csv --smarts "c1ccccc1" --exclude

# Property filter
rdkit-cli filter property -i input.csv -o output.csv --rule "MolWt < 500"

# Drug-likeness filters
rdkit-cli filter druglike -i input.csv -o output.csv --rule lipinski
rdkit-cli filter druglike -i input.csv -o output.csv --rule veber
rdkit-cli filter druglike -i input.csv -o output.csv --rule ghose

# PAINS filter
rdkit-cli filter pains -i input.csv -o output.csv
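
As a rough illustration of what a drug-likeness rule checks, the sketch below applies the commonly cited Lipinski Rule of Five thresholds (molecular weight at most 500, logP at most 5, at most 5 H-bond donors, at most 10 H-bond acceptors) to descriptor values that are already computed. The function name, the dictionary layout, and the example values are assumptions made for illustration; how rdkit-cli implements the rule, and whether it tolerates a single violation, is not stated here.

```python
# Commonly cited Rule of Five upper limits, keyed by RDKit-style descriptor names.
LIPINSKI_LIMITS = {
    "MolWt": 500.0,
    "MolLogP": 5.0,
    "NumHDonors": 5,
    "NumHAcceptors": 10,
}


def lipinski_violations(props: dict) -> list:
    """Return the names of the descriptors that exceed their Lipinski limit."""
    return [name for name, limit in LIPINSKI_LIMITS.items() if props[name] > limit]


# Illustrative descriptor values (hypothetical, not computed by RDKit).
small = {"MolWt": 180.2, "MolLogP": 1.3, "NumHDonors": 1, "NumHAcceptors": 3}
large = {"MolWt": 712.9, "MolLogP": 6.4, "NumHDonors": 4, "NumHAcceptors": 9}

print(lipinski_violations(small))  # no limit exceeded
print(lipinski_violations(large))  # molecular weight and logP exceeded
```

Returning the list of violations rather than a boolean leaves the pass/fail policy (zero violations, or at most one) to the caller.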

convert

Convert between molecular file formats.

# Auto-detect formats from extensions
rdkit-cli convert -i molecules.csv -o molecules.sdf

# Explicit format specification
rdkit-cli convert -i molecules.csv -o molecules.smi --out-format smi

Supported formats: csv, tsv, smi, sdf, parquet

standardize

Standardize and canonicalize molecules.

# Basic standardization
rdkit-cli standardize -i input.csv -o output.csv

# With options
rdkit-cli standardize -i input.csv -o output.csv \
    --cleanup --neutralize --fragment-parent

similarity

Compute molecular similarity.

# Similarity search
rdkit-cli similarity search -i library.csv -o hits.csv \
    --query "CCO" --threshold 0.7

# Similarity matrix
rdkit-cli similarity matrix -i molecules.csv -o matrix.csv \
    --metric tanimoto

# Clustering
rdkit-cli similarity cluster -i molecules.csv -o clustered.csv \
    --cutoff 0.5
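
To make the `--threshold` value more concrete, here is a minimal sketch of the Tanimoto coefficient, assuming each fingerprint is represented as a set of on-bit indices. The toy fingerprints and names are made up for illustration and are not RDKit output; the sketch does not reflect how rdkit-cli stores or compares fingerprints internally.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient between two sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen in this sketch for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


# Toy fingerprints (hypothetical bit indices).
query = {1, 4, 9, 16}
library = {
    "mol_a": {1, 4, 9, 25},  # shares 3 of 5 distinct bits with the query
    "mol_b": {2, 3, 5},      # shares nothing with the query
    "mol_c": {1, 4, 9, 16},  # identical to the query
}

# Keep only library entries at or above a similarity threshold of 0.6.
scores = {name: tanimoto(query, bits) for name, bits in library.items()}
hits = {name: score for name, score in scores.items() if score >= 0.6}
print(hits)
```

A score of 1.0 means the two fingerprints set exactly the same bits; 0.0 means they share none.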

conformers

Generate and optimize 3D conformers.

# Generate conformers
rdkit-cli conformers generate -i input.csv -o output.sdf --num 10

# Optimize conformers
rdkit-cli conformers optimize -i input.sdf -o optimized.sdf --force-field mmff

reactions

Apply chemical reactions and transformations.

# SMIRKS transformation
rdkit-cli reactions transform -i input.csv -o output.csv \
    --smirks "[OH:1]>>[O-:1]"

# Reaction enumeration
rdkit-cli reactions enumerate -i reactants.csv -o products.csv \
    --template "reaction.rxn"

scaffold

Extract molecular scaffolds.

# Murcko scaffolds
rdkit-cli scaffold murcko -i input.csv -o scaffolds.csv

# Generic scaffolds
rdkit-cli scaffold murcko -i input.csv -o scaffolds.csv --generic

# Scaffold decomposition
rdkit-cli scaffold decompose -i input.csv -o decomposed.csv

enumerate

Enumerate molecular variants.

# Stereoisomers
rdkit-cli enumerate stereoisomers -i input.csv -o isomers.csv --max-isomers 32

# Tautomers
rdkit-cli enumerate tautomers -i input.csv -o tautomers.csv --max-tautomers 50

# Canonical tautomer
rdkit-cli enumerate canonical-tautomer -i input.csv -o canonical.csv

fragment

Fragment molecules.

# BRICS fragmentation
rdkit-cli fragment brics -i input.csv -o fragments.csv

# RECAP fragmentation
rdkit-cli fragment recap -i input.csv -o fragments.csv

# Functional group extraction
rdkit-cli fragment functional-groups -i input.csv -o groups.csv

# Fragment frequency analysis
rdkit-cli fragment analyze -i fragments.csv -o analysis.csv

diversity

Analyze and select diverse molecules.

# Pick diverse subset
rdkit-cli diversity pick -i input.csv -o diverse.csv -k 100

# Analyze diversity
rdkit-cli diversity analyze -i input.csv
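
One common way to pick a diverse subset is greedy MaxMin selection: repeatedly add the molecule whose nearest already-picked neighbour is farthest away. The sketch below is a plain-Python illustration under that assumption, using Tanimoto distance on toy fingerprints stored as sets of on-bit indices. Whether `diversity pick` uses MaxMin is not stated in this README, and the function names here are hypothetical.

```python
def tanimoto_distance(a: set, b: set) -> float:
    """1 - Tanimoto similarity between two sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical in this sketch
    return 1.0 - len(a & b) / union


def maxmin_pick(fps: list, k: int, first: int = 0) -> list:
    """Greedily pick up to k indices so each new pick maximises its
    minimum distance to the indices already picked."""
    if not fps or k <= 0:
        return []
    picked = [first]
    # Distance from every fingerprint to its nearest picked fingerprint.
    min_dist = [tanimoto_distance(fp, fps[first]) for fp in fps]
    while len(picked) < min(k, len(fps)):
        candidates = [i for i in range(len(fps)) if i not in picked]
        best = max(candidates, key=lambda i: min_dist[i])
        picked.append(best)
        for i, fp in enumerate(fps):
            min_dist[i] = min(min_dist[i], tanimoto_distance(fp, fps[best]))
    return picked


# Toy fingerprints (hypothetical bit indices).
fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}]
print(maxmin_pick(fps, k=3))
```

Starting from index 0, the first pick is the fingerprint with no shared bits, followed by the next most distant one.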

mcs

Find Maximum Common Substructure.

# Find MCS across molecules
rdkit-cli mcs -i molecules.csv -o mcs_result.csv

# With options
rdkit-cli mcs -i molecules.csv -o mcs_result.csv \
    --timeout 60 --atom-compare elements

depict

Generate molecular depictions.

# Single molecule
rdkit-cli depict single --smiles "c1ccccc1" -o benzene.svg

# Batch depiction
rdkit-cli depict batch -i molecules.csv -o images/ -f svg

# Grid image
rdkit-cli depict grid -i molecules.csv -o grid.svg --mols-per-row 4

stats

Calculate dataset statistics.

# Basic statistics
rdkit-cli stats -i molecules.csv -o stats.json --format json

# Specific properties
rdkit-cli stats -i molecules.csv -p MolWt,LogP,TPSA

# List available properties
rdkit-cli stats -i molecules.csv --list-properties

split

Split files into smaller chunks.

# Split into N files
rdkit-cli split -i large.csv -o chunks/ -c 10

# Split by chunk size
rdkit-cli split -i large.csv -o chunks/ -s 1000

# With custom prefix
rdkit-cli split -i large.csv -o chunks/ -c 5 --prefix molecules
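
The difference between splitting into a fixed number of files and splitting by chunk size can be sketched as follows. This is an illustration only: the function names are hypothetical, and how rdkit-cli distributes rows across output files (contiguous blocks or otherwise) is an assumption.

```python
def split_into_n(rows: list, n: int) -> list:
    """Split rows into n contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(rows), n)
    chunks, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)  # the first `extra` chunks get one more row
        chunks.append(rows[start:start + size])
        start += size
    return chunks


def split_by_size(rows: list, size: int) -> list:
    """Split rows into chunks of at most `size` rows; the last chunk may be shorter."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


rows = list(range(10))
print(split_into_n(rows, 3))   # three chunks of 4, 3 and 3 rows
print(split_by_size(rows, 4))  # chunks of 4, 4 and 2 rows
```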

sample

Randomly sample molecules.

# Sample by count
rdkit-cli sample -i molecules.csv -o sample.csv -k 100 --seed 42

# Sample by fraction
rdkit-cli sample -i molecules.csv -o sample.csv -f 0.1

# Memory-efficient streaming (reservoir sampling)
rdkit-cli sample -i huge.csv -o sample.csv -k 1000 --stream
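
Reservoir sampling draws a uniform sample of `k` items in a single pass without loading the whole input into memory. The sketch below shows the classic "Algorithm R" variant in plain Python; the function name is hypothetical, and it is not taken from the rdkit-cli source, which may use a different variant.

```python
import random


def reservoir_sample(items, k, seed=None):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for index, item in enumerate(items):
        if index < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item `index` replaces a random slot with probability k / (index + 1).
            j = rng.randint(0, index)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Works on any iterable, including a generator or an open file read line by line.
sample = reservoir_sample((f"mol_{i}" for i in range(100_000)), k=5, seed=42)
print(sample)
```

Memory use is proportional to `k`, not to the size of the input. If the input holds fewer than `k` items, all of them are returned.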

deduplicate

Remove duplicate molecules.

# Deduplicate by canonical SMILES (default)
rdkit-cli deduplicate -i molecules.csv -o unique.csv

# Deduplicate by InChIKey
rdkit-cli deduplicate -i molecules.csv -o unique.csv -b inchikey

# Deduplicate by scaffold
rdkit-cli deduplicate -i molecules.csv -o unique.csv -b scaffold

# Keep last occurrence instead of first
rdkit-cli deduplicate -i molecules.csv -o unique.csv --keep last
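
The effect of keeping the first or the last occurrence can be sketched in plain Python. The example assumes each row already carries a precomputed identity key (for instance a canonical SMILES or an InChIKey); the function name, row layout, and output ordering are assumptions for illustration, not the tool's implementation.

```python
def deduplicate(rows, key, keep="first"):
    """Drop rows whose key was already seen, keeping the first or last occurrence.

    The surviving rows are returned in their original input order.
    """
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    chosen = {}  # key -> (position, row)
    for position, row in enumerate(rows):
        k = key(row)
        if keep == "last" or k not in chosen:
            chosen[k] = (position, row)
    return [row for _, row in sorted(chosen.values(), key=lambda pair: pair[0])]


# Hypothetical rows with a precomputed canonical SMILES column.
rows = [
    {"name": "a", "canonical": "CCO"},
    {"name": "b", "canonical": "c1ccccc1"},
    {"name": "c", "canonical": "CCO"},  # duplicate of "a"
]

first = deduplicate(rows, key=lambda r: r["canonical"])
last = deduplicate(rows, key=lambda r: r["canonical"], keep="last")
print([r["name"] for r in first])
print([r["name"] for r in last])
```

Swapping the key function (canonical SMILES, InChIKey, scaffold) changes what counts as a duplicate without changing the logic.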

validate

Validate molecular structures.

# Basic validation
rdkit-cli validate -i molecules.csv -o validated.csv

# Output only valid molecules
rdkit-cli validate -i molecules.csv -o valid.csv --valid-only

# With constraints
rdkit-cli validate -i molecules.csv -o validated.csv \
    --max-atoms 100 --max-rings 8

# Check allowed elements
rdkit-cli validate -i molecules.csv -o validated.csv \
    --allowed-elements C,H,N,O,S,F,Cl

# Check stereo and show summary
rdkit-cli validate -i molecules.csv -o validated.csv \
    --check-stereo --summary

Global Options

Option                 Description
-n, --ncpu N           Number of CPUs (-1 = all, default: -1)
-i, --input FILE       Input file
-o, --output FILE      Output file
--smiles-column COL    SMILES column name (default: "smiles")
--name-column COL      Name column (optional)
--no-header            Input has no header row
-q, --quiet            Suppress progress output
-V, --version          Show version
-h, --help             Show help

Input/Output Formats

Format     Extension    Notes
CSV        .csv         Comma-separated, with header
TSV        .tsv         Tab-separated, with header
SMI        .smi         SMILES format, space-separated
SDF        .sdf         Structure-Data File
Parquet    .parquet     Apache Parquet format
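
As an illustration of the SMI layout, the sketch below parses one line under the usual convention that the first whitespace-separated token is the SMILES string and the remainder, if any, is the molecule name. The function name is hypothetical, and this is not rdkit-cli's reader; how the tool handles headers or malformed lines is not covered here.

```python
def parse_smi_line(line: str):
    """Split one .smi line into (smiles, name); return None for a blank line."""
    stripped = line.strip()
    if not stripped:
        return None
    parts = stripped.split(None, 1)  # split on the first run of whitespace only
    smiles = parts[0]
    name = parts[1].strip() if len(parts) > 1 else ""
    return smiles, name


lines = [
    "c1ccccc1 benzene\n",
    "CC(=O)Oc1ccccc1C(=O)O acetylsalicylic acid\n",
    "CCO\n",
    "\n",
]
for line in lines:
    print(parse_smi_line(line))
```

Splitting only once keeps names that contain spaces intact.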

Examples

Cheminformatics Pipeline

# 1. Validate and filter input
rdkit-cli validate -i raw.csv -o validated.csv --valid-only

# 2. Deduplicate
rdkit-cli deduplicate -i validated.csv -o unique.csv -b inchikey

# 3. Standardize molecules
rdkit-cli standardize -i unique.csv -o std.csv --cleanup --neutralize

# 4. Filter by drug-likeness
rdkit-cli filter druglike -i std.csv -o druglike.csv --rule lipinski

# 5. Compute descriptors
rdkit-cli descriptors compute -i druglike.csv -o desc.csv -d MolWt,MolLogP,TPSA,HBD,HBA

# 6. Get dataset statistics
rdkit-cli stats -i druglike.csv -o stats.json --format json

# 7. Select diverse subset
rdkit-cli diversity pick -i druglike.csv -o diverse.csv -k 500

# 8. Generate depictions
rdkit-cli depict grid -i diverse.csv -o library.svg --mols-per-row 10

Similarity Screening

# Search for similar compounds
rdkit-cli similarity search -i library.csv -o hits.csv \
    --query "CC(=O)Oc1ccccc1C(=O)O" \
    --threshold 0.6 \
    --type morgan

# Cluster results
rdkit-cli similarity cluster -i hits.csv -o clustered.csv --cutoff 0.4

Scaffold Analysis

# Extract scaffolds
rdkit-cli scaffold murcko -i library.csv -o scaffolds.csv

# Analyze scaffold diversity
rdkit-cli diversity analyze -i scaffolds.csv --smiles-column scaffold

Large Dataset Processing

# Sample from a huge dataset
rdkit-cli sample -i huge_library.csv -o sample.csv -k 10000 --stream

# Split for parallel processing
rdkit-cli split -i library.csv -o batches/ -c 10

# Process batches in parallel (using xargs)
ls batches/*.csv | xargs -P 4 -I {} rdkit-cli descriptors compute -i {} -o {}.desc.csv -d MolWt,MolLogP

Development

# Clone repository
git clone https://github.com/vitruves/rdkit-cli
cd rdkit-cli

# Install with dev dependencies
uv sync --dev

# Run tests
uv run pytest

# Run with coverage
uv run pytest --cov=rdkit_cli

License

Apache 2.0
