A comprehensive CLI tool for RDKit cheminformatics operations
rdkit-cli
A high-performance CLI for cheminformatics workflows, powered by native RDKit (C++ under the hood).
32 commands | 5 I/O formats (CSV, TSV, SMI, SDF, Parquet) | multi-core parallel processing | ~80ms startup
Installation
pip install rdkit-cli
Quick Start
# Quick molecule info — no files needed
rdkit-cli info "c1ccccc1"
# Compute descriptors
rdkit-cli descriptors compute -i molecules.csv -o desc.csv -d MolWt,MolLogP,TPSA
# Filter by drug-likeness
rdkit-cli filter druglike -i molecules.csv -o filtered.csv --rule lipinski
# Similarity search
rdkit-cli similarity search -i library.csv -o hits.csv --query "c1ccccc1" --threshold 0.7
# Standardize structures
rdkit-cli standardize -i molecules.csv -o std.csv --cleanup --neutralize
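The `--threshold 0.7` in the similarity search above is a cutoff on a fingerprint similarity score. This README does not say which metric rdkit-cli uses. Assuming it is the Tanimoto coefficient, a common choice for fingerprint comparison, the idea can be sketched in plain Python with fingerprints modeled as sets of on-bit indices. The `tanimoto` helper, the bit sets, and the molecule names are illustrative and not part of rdkit-cli.

```python
from typing import Dict, Set


def tanimoto(bits_a: Set[int], bits_b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    union = bits_a | bits_b
    if not union:
        return 0.0  # convention chosen for this sketch when both sets are empty
    return len(bits_a & bits_b) / len(union)


# Hypothetical fingerprints: each set holds the indices of bits that are on.
query = {1, 4, 9, 16, 25}
library: Dict[str, Set[int]] = {
    "mol_a": {1, 4, 9, 16, 25, 36},  # shares 5 of 6 distinct bits with the query
    "mol_b": {2, 3, 5},              # shares no bits with the query
}

THRESHOLD = 0.7  # same value as --threshold in the example above
scores = {name: tanimoto(query, bits) for name, bits in library.items()}
hits = {name: score for name, score in scores.items() if score >= THRESHOLD}
print(hits)
```

Only `mol_a` clears the cutoff here, since 5 shared bits out of 6 distinct bits gives a score of about 0.83.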
Commands
Usage: rdkit-cli [-h] [-V] <command> ...
Commands:
align Align 3D molecules to a reference
conformers Generate and optimize 3D conformers
convert Convert between molecular file formats
deduplicate Remove duplicate molecules
depict Generate molecular depictions (SVG/PNG)
descriptors Compute molecular descriptors
diversity Analyze and select diverse molecules
energy Force field energy calculations
enumerate Enumerate stereoisomers and tautomers
filter Filter by substructure, properties, drug-likeness, PAINS
fingerprints Compute fingerprints (Morgan, MACCS, RDKit, AtomPair, Torsion)
fragment BRICS/RECAP fragmentation and functional groups
info Quick molecule information from SMILES
mcs Find Maximum Common Substructure
merge Merge multiple molecule files
mmp Matched Molecular Pairs analysis
pharmacophore Pharmacophore feature analysis
props Property column operations (add, rename, drop, keep)
protonate Enumerate protonation states
reactions Apply SMIRKS transformations and enumerate products
rgroup R-group decomposition around a core
rings Ring system analysis and extraction
rmsd Calculate RMSD between 3D structures
sample Randomly sample molecules (reservoir sampling supported)
sascorer Synthetic accessibility, QED, and NP-likeness scores
scaffold Extract Murcko scaffolds
similarity Search, matrix, and clustering
split Split files into smaller chunks
standardize Standardize and canonicalize molecules
stats Calculate dataset statistics
stereo Analyze and manipulate stereochemistry
validate Validate molecular structures
Use 'rdkit-cli <command> --help' for command-specific options.
Global Options
| Option | Description |
|---|---|
| -i, --input FILE | Input file |
| -o, --output FILE | Output file |
| -n, --ncpu N | Number of CPUs (-1 = all, default: 1; auto-scales for heavy commands) |
| --smiles-column COL | SMILES column name (default: "smiles") |
| --name-column COL | Name column (optional) |
| --no-header | Input has no header row |
| -q, --quiet | Suppress progress output |
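When driving rdkit-cli from a script, the options above can be assembled into an argument list for `subprocess.run`. The sketch below is one way to do that. The `build_command` helper is hypothetical, and placing the global options after the subcommand is an assumption based on the Quick Start examples.

```python
import shlex
from typing import List


def build_command(*subcommand: str, input_file: str, output_file: str,
                  ncpu: int = 1, smiles_column: str = "smiles",
                  quiet: bool = False) -> List[str]:
    """Assemble an rdkit-cli argument list from the documented global options.

    Assumption: options follow the subcommand, as in the Quick Start examples.
    """
    argv = ["rdkit-cli", *subcommand,
            "-i", input_file,
            "-o", output_file,
            "-n", str(ncpu),
            "--smiles-column", smiles_column]
    if quiet:
        argv.append("-q")
    return argv


argv = build_command("descriptors", "compute",
                     input_file="molecules.csv", output_file="desc.csv",
                     ncpu=-1, quiet=True)
print(shlex.join(argv))  # shell-quoted form, handy for logging
# To actually run it (requires rdkit-cli to be installed):
#   import subprocess
#   subprocess.run(argv, check=True)
```

Passing a list rather than a single string keeps file names with spaces intact and avoids shell quoting issues.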
Example Pipeline
# Validate → deduplicate → standardize → filter → describe → pick diverse subset
rdkit-cli validate -i raw.csv -o valid.csv --valid-only
rdkit-cli deduplicate -i valid.csv -o unique.csv -b inchikey
rdkit-cli standardize -i unique.csv -o std.csv --cleanup --neutralize
rdkit-cli filter druglike -i std.csv -o druglike.csv --rule lipinski
rdkit-cli descriptors compute -i druglike.csv -o desc.csv -d MolWt,MolLogP,TPSA,HBD,HBA
rdkit-cli diversity pick -i druglike.csv -o diverse.csv -k 500
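The descriptor table from the pipeline can also be checked downstream. The sketch below counts Lipinski rule-of-five violations (molecular weight over 500, logP over 5, more than 5 H-bond donors, more than 10 H-bond acceptors) from CSV rows. It makes several assumptions: the output column names match the names passed to `-d`, the sample rows are invented for illustration rather than real rdkit-cli output, and allowing one violation is a common convention that may differ from what `filter druglike --rule lipinski` applies.

```python
import csv
import io

# Rule-of-five upper limits, keyed by the assumed descriptor column names.
LIPINSKI_LIMITS = {"MolWt": 500.0, "MolLogP": 5.0, "HBD": 5.0, "HBA": 10.0}


def lipinski_violations(row: dict) -> int:
    """Count how many rule-of-five limits a descriptor row exceeds."""
    return sum(1 for column, limit in LIPINSKI_LIMITS.items()
               if float(row[column]) > limit)


# Invented rows standing in for desc.csv; values are illustrative only.
SAMPLE_CSV = """name,MolWt,MolLogP,TPSA,HBD,HBA
mol_1,320.4,2.1,75.0,2,5
mol_2,612.8,6.3,140.2,6,12
mol_3,510.0,3.0,90.0,1,6
"""

MAX_VIOLATIONS = 1  # assumption: tolerate a single violation

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
violations = {row["name"]: lipinski_violations(row) for row in rows}
passing = [name for name, count in violations.items() if count <= MAX_VIOLATIONS]
print(violations)
print(passing)
```

To read a real file, replace `io.StringIO(SAMPLE_CSV)` with an opened file handle for the descriptor output.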
Formats
| Format | Extension |
|---|---|
| CSV | .csv |
| TSV | .tsv |
| SMILES | .smi |
| SDF | .sdf |
| Parquet | .parquet |
Formats are auto-detected from file extensions. Override with --in-format / --out-format.
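Extension-based detection with an explicit override can be pictured as a small lookup. The following is a minimal sketch of that idea, not rdkit-cli's actual implementation. The `detect_format` function and the format labels it returns are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Extensions from the table above, mapped to illustrative format labels.
EXTENSION_TO_FORMAT = {
    ".csv": "csv",
    ".tsv": "tsv",
    ".smi": "smi",
    ".sdf": "sdf",
    ".parquet": "parquet",
}


def detect_format(path: str, override: Optional[str] = None) -> str:
    """Return the format for a path; an explicit override always wins."""
    if override is not None:
        return override.lower()
    suffix = Path(path).suffix.lower()
    if suffix not in EXTENSION_TO_FORMAT:
        raise ValueError(f"cannot infer format from extension {suffix!r}")
    return EXTENSION_TO_FORMAT[suffix]


print(detect_format("library.sdf"))
print(detect_format("export.txt", override="tsv"))
```

The override argument plays the role of --in-format / --out-format: it is needed when a file's extension does not match its contents.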
Performance
- Native RDKit: C++ computation with Python bindings — no performance penalty
- Smart parallelism: defaults to single-threaded for fast commands (avoids IPC overhead), auto-scales to all cores for heavy workloads (descriptors --all); override with -n -1
- Lazy imports: ~80ms startup time regardless of installed packages
- Streaming: Memory-efficient reservoir sampling for large datasets
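Reservoir sampling, mentioned in the last bullet, draws a uniform random sample of k items from a stream of unknown length while holding only k items in memory. The sketch below shows the classic Algorithm R in plain Python. It illustrates the technique and is not taken from rdkit-cli's source; the `reservoir_sample` name and the `seed` parameter are illustrative.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int,
                     seed: Optional[int] = None) -> List[T]:
    """Uniformly sample up to k items from a stream in a single pass."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for index, item in enumerate(items):
        if index < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (index + 1).
            slot = rng.randint(0, index)  # both ends inclusive
            if slot < k:
                reservoir[slot] = item
    return reservoir


# The generator is consumed lazily, so the full stream is never stored.
stream = (f"mol_{i}" for i in range(100_000))
sample = reservoir_sample(stream, k=5, seed=42)
print(len(sample))
```

Because each item is seen once and discarded unless it lands in the reservoir, memory use depends on k rather than on the size of the input file.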
Benchmarks — 27K molecules, Apple M-series (8 cores):
| Command | Time | Throughput |
|---|---|---|
| fingerprints compute --type morgan | 3.1s | ~8,700 mol/s |
| descriptors compute -d MolWt,MolLogP,TPSA | 6.4s | ~4,200 mol/s |
| filter druglike --rule lipinski | 6.9s | ~3,900 mol/s |
| standardize --cleanup --uncharge | 7.0s | ~3,900 mol/s |
| descriptors compute --all (auto-parallel) | 55s | ~490 mol/s |
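The throughput column follows directly from the molecule count divided by wall time. The short check below reproduces it, assuming "27K" means roughly 27,000 molecules; the exact dataset size is not given here, so the results are approximate.

```python
N_MOLECULES = 27_000  # assumption: "27K molecules" taken as 27,000

# Wall times in seconds, copied from the benchmark table.
timings_s = {
    "fingerprints compute --type morgan": 3.1,
    "descriptors compute -d MolWt,MolLogP,TPSA": 6.4,
    "filter druglike --rule lipinski": 6.9,
    "standardize --cleanup --uncharge": 7.0,
    "descriptors compute --all": 55.0,
}

# Throughput in molecules per second for each command.
throughput = {command: N_MOLECULES / seconds
              for command, seconds in timings_s.items()}

for command, rate in throughput.items():
    print(f"{command}: ~{rate:,.0f} mol/s")
```

The computed rates agree with the table to within its rounding.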
Development
git clone https://github.com/vitruves/rdkit-cli
cd rdkit-cli
uv sync --dev
uv run pytest
License
Apache 2.0