
A comprehensive CLI tool for RDKit cheminformatics operations


rdkit-cli

A comprehensive CLI tool wrapping RDKit functionality for cheminformatics workflows, with multi-core parallel processing.

Features

  • 14 Command Categories: descriptors, fingerprints, filter, convert, standardize, similarity, conformers, reactions, scaffold, enumerate, fragment, diversity, mcs, depict
  • Multiple Input/Output Formats: CSV, TSV, SMI, SDF, Parquet
  • Parallel Processing: Multi-core execution via Python's concurrent.futures.ProcessPoolExecutor
  • Ninja-style Progress: Real-time progress display, in the style of the Ninja build system, showing processing speed and ETA

Installation

pip install rdkit-cli

Or, to add it as a dependency of a uv-managed project:

uv add rdkit-cli

Quick Start

# Compute molecular descriptors
rdkit-cli descriptors compute -i molecules.csv -o descriptors.csv -d MolWt,MolLogP,TPSA

# Generate fingerprints
rdkit-cli fingerprints compute -i molecules.csv -o fingerprints.csv --type morgan

# Filter by drug-likeness
rdkit-cli filter druglike -i molecules.csv -o filtered.csv --rule lipinski

# Standardize molecules
rdkit-cli standardize -i molecules.csv -o standardized.csv --cleanup --neutralize

# Similarity search
rdkit-cli similarity search -i library.csv -o hits.csv --query "c1ccccc1" --threshold 0.7

Commands

descriptors

Compute molecular descriptors.

# List available descriptors
rdkit-cli descriptors list
rdkit-cli descriptors list --all

# Compute specific descriptors
rdkit-cli descriptors compute -i input.csv -o output.csv -d MolWt,MolLogP,TPSA

# Compute all descriptors
rdkit-cli descriptors compute -i input.csv -o output.csv --all
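
For intuition about what a descriptor such as MolWt reports, here is a minimal sketch that sums average atomic masses over a molecular formula. The formula dictionary and the small mass table are illustrative assumptions, not rdkit-cli code; RDKit derives the formula from the parsed molecule, including implicit hydrogens, and uses its own periodic-table data.

```python
# Average atomic masses in g/mol for a few elements (illustrative subset only).
AVERAGE_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def mol_weight(formula: dict[str, int]) -> float:
    """Sum average atomic masses over a formula given as {element: count}."""
    return sum(AVERAGE_MASS[element] * count for element, count in formula.items())


# Ethanol, C2H6O
print(round(mol_weight({"C": 2, "H": 6, "O": 1}), 3))  # prints 46.069
```

With the same masses, aspirin (C9H8O4) comes out at about 180.159 g/mol.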

fingerprints

Generate molecular fingerprints.

# List available fingerprint types
rdkit-cli fingerprints list

# Compute Morgan fingerprints (the default --type)
rdkit-cli fingerprints compute -i input.csv -o output.csv --type morgan

# With options
rdkit-cli fingerprints compute -i input.csv -o output.csv \
    --type morgan --radius 3 --bits 4096 --use-chirality

Supported types: morgan, maccs, rdkit, atompair, torsion, pattern
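
The --bits option sets the length of the folded fingerprint. A common way to fold hashed substructure identifiers into a fixed-length bit vector is to take each identifier modulo the vector length, so shorter vectors map more distinct substructures onto the same bit. The sketch below uses made-up integer identifiers and illustrates folding in general; it is not rdkit-cli's or RDKit's code.

```python
def fold(feature_ids, n_bits):
    """Fold integer feature identifiers into the on-bits of an n_bits-long vector."""
    if n_bits <= 0:
        raise ValueError("n_bits must be positive")
    return {feature_id % n_bits for feature_id in feature_ids}


def collisions(feature_ids, n_bits):
    """Number of distinct features lost to bit collisions after folding."""
    unique = set(feature_ids)
    return len(unique) - len(fold(unique, n_bits))


# Hypothetical identifiers: 5 and 4101 differ by exactly 4096.
ids = {5, 9, 4101}
print(sorted(fold(ids, 4096)), collisions(ids, 4096), collisions(ids, 8192))
# prints [5, 9] 1 0
```

Doubling --bits from 4096 to 8192 separates the two colliding identifiers in this toy example.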

filter

Filter molecules by substructure, computed properties, drug-likeness rules, or PAINS alerts.

# Substructure filter
rdkit-cli filter substructure -i input.csv -o output.csv --smarts "c1ccccc1"
rdkit-cli filter substructure -i input.csv -o output.csv --smarts "c1ccccc1" --exclude

# Property filter
rdkit-cli filter property -i input.csv -o output.csv --rule "MolWt < 500"

# Drug-likeness filters
rdkit-cli filter druglike -i input.csv -o output.csv --rule lipinski
rdkit-cli filter druglike -i input.csv -o output.csv --rule veber
rdkit-cli filter druglike -i input.csv -o output.csv --rule ghose

# PAINS filter
rdkit-cli filter pains -i input.csv -o output.csv
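
The rule string "MolWt < 500" suggests rules of the form name, comparison operator, number. Below is a minimal sketch that evaluates such rules against precomputed descriptor values and expresses Lipinski's rule of five (molecular weight at most 500, logP at most 5, at most 5 H-bond donors, at most 10 H-bond acceptors) as four of them. The rule grammar, the RDKit descriptor names used, and the common "at most one violation" convention are assumptions; rdkit-cli's own parser and thresholds may differ.

```python
import operator
import re

# Comparison operators accepted in a rule such as "MolWt < 500".
OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
}

# name, operator, number; two-character operators are tried before "<" and ">".
RULE_RE = re.compile(r"^\s*(\w+)\s*(<=|>=|==|!=|<|>)\s*(-?\d+(?:\.\d+)?)\s*$")


def parse_rule(rule):
    """Split 'MolWt < 500' into (descriptor name, comparison function, threshold)."""
    match = RULE_RE.match(rule)
    if match is None:
        raise ValueError(f"cannot parse rule: {rule!r}")
    name, op, value = match.groups()
    return name, OPS[op], float(value)


def passes(props, rule):
    """True if the descriptor values in props satisfy the rule."""
    name, compare, threshold = parse_rule(rule)
    return compare(props[name], threshold)


# Lipinski's rule of five expressed as property rules (RDKit descriptor names).
LIPINSKI = ["MolWt <= 500", "MolLogP <= 5", "NumHDonors <= 5", "NumHAcceptors <= 10"]


def lipinski_ok(props, max_violations=1):
    """Pass if no more than max_violations of the four rules are broken."""
    violations = sum(not passes(props, rule) for rule in LIPINSKI)
    return violations <= max_violations


# Made-up descriptor values: only MolWt breaks a rule, so one violation is tolerated.
example = {"MolWt": 512.6, "MolLogP": 3.2, "NumHDonors": 2, "NumHAcceptors": 7}
print(passes(example, "MolWt < 500"), lipinski_ok(example))  # prints False True
```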

convert

Convert between molecular file formats.

# Auto-detect formats from extensions
rdkit-cli convert -i molecules.csv -o molecules.sdf

# Explicit format specification
rdkit-cli convert -i molecules.csv -o molecules.smi --out-format smi

Supported formats: csv, tsv, smi, sdf, parquet
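
Auto-detection presumably maps the file extension onto one of the supported formats. A minimal sketch of that mapping, using the extensions from the format table in this README; the function name, the case-insensitive matching, and the error behaviour are hypothetical.

```python
from pathlib import Path

# Extension to format name, following the supported formats listed in this README.
EXTENSION_FORMATS = {
    ".csv": "csv",
    ".tsv": "tsv",
    ".smi": "smi",
    ".sdf": "sdf",
    ".parquet": "parquet",
}


def detect_format(path, explicit=None):
    """Return the explicit format if given, otherwise infer it from the extension."""
    if explicit is not None:
        return explicit
    suffix = Path(path).suffix.lower()
    try:
        return EXTENSION_FORMATS[suffix]
    except KeyError:
        raise ValueError(f"cannot infer format from {path!r}; pass it explicitly") from None


print(detect_format("molecules.sdf"), detect_format("results.out", explicit="smi"))
# prints sdf smi
```

An explicit format (as with --out-format) takes precedence, which is what makes unusual extensions usable.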

standardize

Standardize and canonicalize molecules.

# Basic standardization
rdkit-cli standardize -i input.csv -o output.csv

# With options
rdkit-cli standardize -i input.csv -o output.csv \
    --cleanup --neutralize --fragment-parent

similarity

Search, compare, and cluster molecules by similarity.

# Similarity search
rdkit-cli similarity search -i library.csv -o hits.csv \
    --query "CCO" --threshold 0.7

# Similarity matrix
rdkit-cli similarity matrix -i molecules.csv -o matrix.csv \
    --metric tanimoto

# Clustering
rdkit-cli similarity cluster -i molecules.csv -o clustered.csv \
    --cutoff 0.5
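
A similarity search with --threshold can be read as keeping every library molecule whose fingerprint similarity to the query is at least the threshold. For binary fingerprints the Tanimoto coefficient is the number of shared on-bits divided by the number of bits set in either. The sketch represents fingerprints as Python sets of bit indices with made-up values; whether rdkit-cli compares with ">=" or ">" at the threshold is an assumption.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0  # 0.0 for two empty sets (a choice)


def similarity_search(query, library, threshold):
    """Return (name, similarity) pairs at or above threshold, most similar first."""
    scored = ((name, tanimoto(query, fp)) for name, fp in library.items())
    hits = [(name, sim) for name, sim in scored if sim >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up fingerprints as sets of on-bits.
library = {"mol_a": {1, 2, 3}, "mol_b": {2, 3, 4}, "mol_c": {7, 8}}
print(similarity_search({1, 2, 3}, library, threshold=0.5))
# prints [('mol_a', 1.0), ('mol_b', 0.5)]
```

Here {1, 2, 3} and {2, 3, 4} share two of four distinct bits, giving a similarity of exactly 0.5.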

conformers

Generate and optimize 3D conformers.

# Generate conformers
rdkit-cli conformers generate -i input.csv -o output.sdf --num 10

# Optimize conformers
rdkit-cli conformers optimize -i input.sdf -o optimized.sdf --force-field mmff

reactions

Apply chemical reactions and transformations.

# SMIRKS transformation
rdkit-cli reactions transform -i input.csv -o output.csv \
    --smirks "[OH:1]>>[O-:1]"

# Reaction enumeration
rdkit-cli reactions enumerate -i reactants.csv -o products.csv \
    --template "reaction.rxn"

scaffold

Extract molecular scaffolds.

# Murcko scaffolds
rdkit-cli scaffold murcko -i input.csv -o scaffolds.csv

# Generic scaffolds
rdkit-cli scaffold murcko -i input.csv -o scaffolds.csv --generic

# Scaffold decomposition
rdkit-cli scaffold decompose -i input.csv -o decomposed.csv
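
A Bemis-Murcko scaffold keeps ring systems and the linkers between them and drops acyclic side chains. For a molecule with at least one ring, one way to obtain it is to delete terminal (degree-one) atoms repeatedly until none remain. Below is that idea on a plain adjacency-list graph with hydrogens omitted and made-up atom indices; it ignores element and bond types, and RDKit's own MurckoScaffold module may handle details such as exocyclic double bonds differently.

```python
def murcko_atoms(adjacency):
    """Return the atoms left after repeatedly pruning atoms with at most one neighbour.

    adjacency maps each heavy-atom index to the set of its neighbours.
    """
    graph = {atom: set(nbrs) for atom, nbrs in adjacency.items()}
    leaves = [atom for atom, nbrs in graph.items() if len(nbrs) <= 1]
    while leaves:
        atom = leaves.pop()
        if atom not in graph:
            continue  # already pruned via another path
        for nbr in graph.pop(atom):
            graph[nbr].discard(atom)
            if len(graph[nbr]) <= 1:
                leaves.append(nbr)
    return set(graph)


# Ethylbenzene-like graph: six-membered ring 0-5, side chain 0-6-7.
ethylbenzene = {
    0: {1, 5, 6}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4},
    4: {3, 5}, 5: {4, 0}, 6: {0, 7}, 7: {6},
}
print(sorted(murcko_atoms(ethylbenzene)))  # prints [0, 1, 2, 3, 4, 5]
```

Generic scaffolds (--generic) are commonly defined by additionally turning every atom into carbon and every bond into a single bond; a graph-only sketch like this one already ignores those distinctions. An acyclic molecule prunes down to an empty scaffold.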

enumerate

Enumerate molecular variants.

# Stereoisomers
rdkit-cli enumerate stereoisomers -i input.csv -o isomers.csv --max-isomers 32

# Tautomers
rdkit-cli enumerate tautomers -i input.csv -o tautomers.csv --max-tautomers 50

# Canonical tautomer
rdkit-cli enumerate canonical-tautomer -i input.csv -o canonical.csv
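
With n unassigned stereocentres there are at most 2**n stereoisomers (fewer when symmetry produces meso forms), so --max-isomers acts as a cap on a combinatorial enumeration. The sketch enumerates configuration labels for hypothetical stereocentre indices and stops at the cap; "R" and "S" simply stand for the two configurations, and it neither detects meso forms nor handles double-bond stereo.

```python
from itertools import islice, product


def enumerate_assignments(centres, max_isomers=32):
    """Yield at most max_isomers dicts mapping each stereocentre to 'R' or 'S'."""
    combos = product("RS", repeat=len(centres))  # 2**len(centres) combinations, lazily
    for labels in islice(combos, max_isomers):
        yield dict(zip(centres, labels))


# Hypothetical atom indices of three unassigned stereocentres: 2**3 = 8 assignments.
isomers = list(enumerate_assignments([1, 4, 7]))
print(len(isomers), isomers[0])  # prints 8 {1: 'R', 4: 'R', 7: 'R'}
```

Because the product is consumed lazily, a molecule with many stereocentres stops after max_isomers items instead of materialising all 2**n combinations.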

fragment

Fragment molecules.

# BRICS fragmentation
rdkit-cli fragment brics -i input.csv -o fragments.csv

# RECAP fragmentation
rdkit-cli fragment recap -i input.csv -o fragments.csv

# Functional group extraction
rdkit-cli fragment functional-groups -i input.csv -o groups.csv

# Fragment frequency analysis
rdkit-cli fragment analyze -i fragments.csv -o analysis.csv
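
Fragment frequency analysis amounts to counting how often each fragment occurs across the input. A minimal sketch with collections.Counter, assuming each molecule's fragments are already available as a list of SMILES strings; the fragment strings are made up, and counting each fragment at most once per molecule is one of two reasonable conventions (the other counts every occurrence).

```python
from collections import Counter


def fragment_frequencies(fragments_per_molecule):
    """Count in how many molecules each fragment SMILES occurs."""
    counts = Counter()
    for fragments in fragments_per_molecule:
        counts.update(set(fragments))  # count a fragment at most once per molecule
    return counts


# Made-up fragment lists for three molecules (dummy atoms mark cut points).
data = [
    ["[1*]C(C)=O", "[16*]c1ccccc1"],
    ["[16*]c1ccccc1", "[16*]c1ccccc1"],
    ["[3*]OC"],
]
print(fragment_frequencies(data).most_common(1))  # prints [('[16*]c1ccccc1', 2)]
```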

diversity

Analyze and select diverse molecules.

# Pick diverse subset
rdkit-cli diversity pick -i input.csv -o diverse.csv -k 100

# Analyze diversity
rdkit-cli diversity analyze -i input.csv
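
One widely used way to pick a diverse subset is greedy MaxMin selection (RDKit ships a MaxMinPicker): start from one molecule, then repeatedly add the candidate whose nearest already-picked molecule is farthest away. The sketch uses Tanimoto distance on sets of on-bits; the choice of MaxMin, the fixed starting index, and the fingerprint values are assumptions, not a description of how diversity pick works internally.

```python
def tanimoto_distance(a, b):
    """1 - Tanimoto similarity for fingerprints given as sets of on-bits."""
    union = len(a | b)
    return 1.0 - (len(a & b) / union if union else 0.0)


def maxmin_pick(fps, k, start=0):
    """Greedily pick k indices that are maximally spread out (MaxMin)."""
    n = len(fps)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and len(fps)")
    picked = [start]
    chosen = {start}
    # Distance from each molecule to its closest already-picked molecule.
    nearest = [tanimoto_distance(fps[start], fp) for fp in fps]
    while len(picked) < k:
        candidate = max((i for i in range(n) if i not in chosen), key=nearest.__getitem__)
        picked.append(candidate)
        chosen.add(candidate)
        for i, fp in enumerate(fps):
            nearest[i] = min(nearest[i], tanimoto_distance(fps[candidate], fp))
    return picked


fps = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {1, 2, 3, 4}]
print(maxmin_pick(fps, 3))  # prints [0, 2, 1]
```

Keeping a running "distance to nearest pick" per molecule means each new pick costs one pass over the library rather than a full pairwise matrix.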

mcs

Find the maximum common substructure (MCS) shared by a set of molecules.

# Find MCS across molecules
rdkit-cli mcs -i molecules.csv -o mcs_result.csv

# With options
rdkit-cli mcs -i molecules.csv -o mcs_result.csv \
    --timeout 60 --atom-compare elements

depict

Generate molecular depictions.

# Single molecule
rdkit-cli depict single --smiles "c1ccccc1" -o benzene.svg

# Batch depiction
rdkit-cli depict batch -i molecules.csv -o images/ -f svg

# Grid image
rdkit-cli depict grid -i molecules.csv -o grid.svg --mols-per-row 4
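
For depict grid, --mols-per-row presumably fixes the number of columns, so the number of rows is the molecule count divided by the row width, rounded up. A small sketch of that layout arithmetic; the 200 x 200 pixel cell size is a hypothetical value, not an rdkit-cli default.

```python
import math


def grid_layout(n_mols, mols_per_row, cell=(200, 200)):
    """Return (rows, image_width, image_height) for a grid of molecule cells."""
    if n_mols <= 0 or mols_per_row <= 0:
        raise ValueError("n_mols and mols_per_row must be positive")
    cols = min(n_mols, mols_per_row)
    rows = math.ceil(n_mols / mols_per_row)
    return rows, cols * cell[0], rows * cell[1]


def cell_origin(index, mols_per_row, cell=(200, 200)):
    """Top-left pixel of the cell holding molecule `index` (row-major order)."""
    row, col = divmod(index, mols_per_row)
    return col * cell[0], row * cell[1]


print(grid_layout(10, 4))  # prints (3, 800, 600)
```

By the same arithmetic, 500 molecules at 10 per row make a 50-row grid, which is worth keeping in mind before depicting a large subset in one image.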

Global Options

Option               Description
-n, --ncpu N         Number of CPUs (-1 = all, default: -1)
-i, --input FILE     Input file
-o, --output FILE    Output file
--smiles-column COL  SMILES column name (default: "smiles")
--name-column COL    Name column (optional)
--no-header          Input has no header row
-q, --quiet          Suppress progress output
-V, --version        Show version
-h, --help           Show help
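
The -n/--ncpu convention (-1 meaning all CPUs) maps naturally onto a worker count for concurrent.futures.ProcessPoolExecutor, which the feature list names as the parallel backend. A minimal sketch under that assumption; the helper names, the chunk size, and the serial fallback for a single worker are hypothetical.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def resolve_ncpu(ncpu):
    """Translate the CLI value into a worker count (-1 means all CPUs)."""
    if ncpu == -1:
        return os.cpu_count() or 1  # cpu_count() may return None
    if ncpu < 1:
        raise ValueError("ncpu must be -1 or a positive integer")
    return ncpu


def run_parallel(func, items, ncpu=-1, chunksize=64):
    """Apply a picklable top-level func to items, preserving input order."""
    workers = resolve_ncpu(ncpu)
    if workers == 1:
        return [func(item) for item in items]  # skip process start-up cost
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, items, chunksize=chunksize))


print(run_parallel(abs, [-1, 2, -3], ncpu=1))  # prints [1, 2, 3]
```

Passing a chunksize to pool.map batches many small molecules per inter-process message. On platforms that start workers with spawn (Windows, macOS), code that calls run_parallel with more than one worker must sit under an if __name__ == "__main__": guard.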

Input/Output Formats

Format   Extension  Notes
CSV      .csv       Comma-separated, with header
TSV      .tsv       Tab-separated, with header
SMI      .smi       SMILES format, space-separated
SDF      .sdf       Structure-Data File
Parquet  .parquet   Apache Parquet format
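
A .smi line conventionally holds a SMILES string followed by an optional name, separated by whitespace. A minimal reader sketch under that assumption; rdkit-cli's own reader, and its handling of headers, comment lines, or extra columns, may differ.

```python
def read_smi_lines(lines, has_header=False):
    """Yield (smiles, name) pairs from SMILES-file lines; name may be None."""
    for i, line in enumerate(lines):
        if has_header and i == 0:
            continue  # skip the header row
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(maxsplit=1)  # the name may itself contain spaces
        smiles = parts[0]
        name = parts[1] if len(parts) > 1 else None
        yield smiles, name


print(list(read_smi_lines(["CCO ethanol", "c1ccccc1", "", "CC(=O)O acetic acid"])))
# prints [('CCO', 'ethanol'), ('c1ccccc1', None), ('CC(=O)O', 'acetic acid')]
```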

Examples

Cheminformatics Pipeline

# 1. Standardize input molecules
rdkit-cli standardize -i raw.csv -o std.csv --cleanup --neutralize

# 2. Filter by drug-likeness
rdkit-cli filter druglike -i std.csv -o druglike.csv --rule lipinski

# 3. Compute descriptors
rdkit-cli descriptors compute -i druglike.csv -o desc.csv -d MolWt,MolLogP,TPSA,HBD,HBA

# 4. Select diverse subset
rdkit-cli diversity pick -i druglike.csv -o diverse.csv -k 500

# 5. Generate depictions
rdkit-cli depict grid -i diverse.csv -o library.svg --mols-per-row 10

Similarity Screening

# Search for similar compounds
rdkit-cli similarity search -i library.csv -o hits.csv \
    --query "CC(=O)Oc1ccccc1C(=O)O" \
    --threshold 0.6 \
    --type morgan

# Cluster results
rdkit-cli similarity cluster -i hits.csv -o clustered.csv --cutoff 0.4
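
The --cutoff option is reminiscent of Butina (Taylor-Butina) clustering, which RDKit ships as rdkit.ML.Cluster.Butina and where the cutoff is a distance threshold (1 minus Tanimoto similarity). Assuming rdkit-cli follows that scheme, which this README does not state, the idea is: find each molecule's neighbours within the cutoff, visit molecules from most to fewest neighbours, and let each not-yet-assigned molecule become a centroid whose still-unassigned neighbours join its cluster. A sketch with made-up on-bit sets:

```python
def tanimoto(a, b):
    """Tanimoto similarity of two on-bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def butina_cluster(fps, cutoff):
    """Butina-style clustering of on-bit sets; cutoff is a Tanimoto *distance*."""
    n = len(fps)
    neighbours = [
        [j for j in range(n) if j != i and 1.0 - tanimoto(fps[i], fps[j]) <= cutoff]
        for i in range(n)
    ]
    # Candidate centroids from most to fewest neighbours (stable for ties).
    order = sorted(range(n), key=lambda i: len(neighbours[i]), reverse=True)
    assigned = set()
    clusters = []
    for centroid in order:
        if centroid in assigned:
            continue
        members = [centroid] + [j for j in neighbours[centroid] if j not in assigned]
        assigned.update(members)
        clusters.append(members)
    return clusters


fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 4, 5}, {10, 11}, {10, 11, 12}]
print(butina_cluster(fps, cutoff=0.35))  # prints [[2, 0, 1], [3, 4]]
```

Under this reading, a larger cutoff merges more molecules into each cluster; the first member of each list is its centroid.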

Scaffold Analysis

# Extract scaffolds
rdkit-cli scaffold murcko -i library.csv -o scaffolds.csv

# Analyze scaffold diversity
rdkit-cli diversity analyze -i scaffolds.csv --smiles-column scaffold

Development

# Clone repository
git clone https://github.com/vitruves/rdkit-cli
cd rdkit-cli

# Install with dev dependencies
uv sync --dev

# Run tests
uv run pytest

# Run with coverage
uv run pytest --cov=rdkit_cli

License

Apache 2.0


Download files

Source Distribution

rdkit_cli-0.1.0.tar.gz (60.1 kB)

Built Distribution

rdkit_cli-0.1.0-py3-none-any.whl (70.5 kB)

File details

Details for the file rdkit_cli-0.1.0.tar.gz.

File metadata

  • Download URL: rdkit_cli-0.1.0.tar.gz
  • Size: 60.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rdkit_cli-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       624de0e7fec0135368eb5d57ad6e9fb8b55bfdce5a9311bfbe46ac063b4188b5
MD5          355ba55851fb8cac29b47f842f8362f6
BLAKE2b-256  f2137a50de78a55815129853de2ac99723f16aa0ba74e1d4d7b16c80d3625f95

Provenance

The following attestation bundles were made for rdkit_cli-0.1.0.tar.gz:

Publisher: publish.yml on Vitruves/rdkit-cli

File details

Details for the file rdkit_cli-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: rdkit_cli-0.1.0-py3-none-any.whl
  • Size: 70.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rdkit_cli-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       fff81ef092974d132d1c965efcf38b25a91a86a3f381614fd65cf54e8026b0a1
MD5          45cf7b864ca86d3f5598548f890e5cf9
BLAKE2b-256  0aafbd27e9c8e17e8fc4269a60f0c3b7c845c19636b76a27980eedb8864c4bca

Provenance

The following attestation bundles were made for rdkit_cli-0.1.0-py3-none-any.whl:

Publisher: publish.yml on Vitruves/rdkit-cli
