Parallel fingerprint computation using the RDKit fingerprint generator API
rdkit-fp
Parallel RDKit fingerprints with a beginner one-liner and an advanced staged pipeline.
Install
pip install rdfp
Quickstart (10 seconds)
rdfp examples/chembl_1k.smi outputs/fps/
With explicit preset:
rdfp examples/chembl_1k.smi outputs/fps/ --preset ecfp4
Running rdfp without a subcommand is equivalent to rdfp fps INPUT OUTPUT.
Common CLI usage
Simple run with explicit workers/chunk size:
rdfp examples/chembl_1k.smi outputs/fps/ --workers -1 --chunk 100000
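As a rough illustration of what the worker and chunk settings imply, here is a minimal sketch. It assumes that `--workers -1` follows the common joblib convention of "use all available cores"; the README does not confirm this. The helpers `resolve_workers` and `iter_chunks` are hypothetical names for illustration only, not part of rdfp.

```python
import os
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def resolve_workers(n_jobs: int) -> int:
    """Turn a worker setting into a positive count.

    Assumption: -1 means "all cores", as in joblib.
    """
    if n_jobs == -1:
        return os.cpu_count() or 1
    if n_jobs < 1:
        raise ValueError("n_jobs must be -1 or a positive integer")
    return n_jobs


def iter_chunks(records: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most chunk_size records."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    it = iter(records)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


chunks = list(iter_chunks(range(5), 2))
print(chunks)  # [[0, 1], [2, 3], [4]]
```

Larger chunks mean fewer output files and less scheduling overhead, while smaller chunks lose less work when a run is interrupted.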
Use IDs from a column and keep stable row mapping sidecars:
rdfp data/compounds.smi outputs/fps/ --smiles-col 0 --id-col 1
Resume a stopped large run:
rdfp data/compounds.smi outputs/fps/ --resume
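The idea behind resuming can be sketched as "skip any chunk whose output file already exists". The snippet below is an assumption-laden illustration, not rdfp's actual logic: `pending_chunk_ids` is a hypothetical helper, and the `fps_NNNN.pkl` naming is taken from the output layout shown later in this README.

```python
import tempfile
from pathlib import Path
from typing import List


def pending_chunk_ids(output_dir: Path, n_chunks: int, suffix: str = ".pkl") -> List[int]:
    """Return the chunk numbers that have no output file yet."""
    return [
        i
        for i in range(n_chunks)
        if not (output_dir / f"fps_{i:04d}{suffix}").exists()
    ]


# Simulate a stopped run in which chunks 0 and 2 were already written.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    (out / "fps_0000.pkl").touch()
    (out / "fps_0002.pkl").touch()
    todo = pending_chunk_ids(out, n_chunks=4)

print(todo)  # [1, 3]
```

A check based only on file existence cannot detect a chunk that was partially written, so a real implementation might also write to a temporary name and rename on completion.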
Run built-in demo data:
rdfp demo
Presets
| Preset | Meaning |
|---|---|
| ecfp4 (default) | Morgan radius 2, 2048 bits |
| ecfp6 | Morgan radius 3, 2048 bits |
| rdkit | RDKit topological, 2048 bits |
| ap | Atom-pair, 2048 bits |
| tt | Topological torsion, 2048 bits |
| pattern | Pattern fingerprint, 2048 bits |
You can override preset values with explicit flags (--fp-type, --fp-size, --radius, --include-chirality).
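One way to picture how explicit flags override a preset is a dictionary merge in which only the flags the user actually supplied replace preset values. This is a minimal sketch under stated assumptions: the `PRESETS` mapping and `resolve_preset` are hypothetical, and of the `fp_type` strings only `"morgan"` appears in this README; the others are placeholders.

```python
from typing import Any, Dict

# Preset values mirror the table above; fp_type names other than
# "morgan" are assumed for illustration.
PRESETS: Dict[str, Dict[str, Any]] = {
    "ecfp4": {"fp_type": "morgan", "radius": 2, "fp_size": 2048},
    "ecfp6": {"fp_type": "morgan", "radius": 3, "fp_size": 2048},
    "rdkit": {"fp_type": "rdkit", "fp_size": 2048},
    "ap": {"fp_type": "ap", "fp_size": 2048},
    "tt": {"fp_type": "tt", "fp_size": 2048},
    "pattern": {"fp_type": "pattern", "fp_size": 2048},
}


def resolve_preset(name: str = "ecfp4", **overrides: Any) -> Dict[str, Any]:
    """Start from a preset and apply only the overrides that are not None."""
    if name not in PRESETS:
        raise ValueError(f"unknown preset: {name!r}")
    params = dict(PRESETS[name])  # copy so the preset table is never mutated
    params.update({k: v for k, v in overrides.items() if v is not None})
    return params


# Equivalent in spirit to: --preset ecfp6 --fp-size 1024
params = resolve_preset("ecfp6", fp_size=1024, radius=None)
print(params)  # {'fp_type': 'morgan', 'radius': 3, 'fp_size': 1024}
```

Treating `None` as "flag not given" lets the preset's radius survive when only the size is overridden.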
Advanced lane
Stage 1 only (SMILES -> mols):
rdfp mols examples/chembl_1k.smi outputs/mols/
Stage 2 only (mols -> fingerprints):
rdfp fps-from-mols outputs/mols/ outputs/fps_from_mols/ --resume
Both stages with mol persistence:
rdfp fps examples/chembl_1k.smi outputs/fps/ --save-mols outputs/mols/
Helpful flags
- --workers (alias of --n-jobs)
- --chunk (alias of --chunk-size)
- --preset ecfp4|ecfp6|rdkit|ap|tt|pattern
- --input-smiles-col (alias of --smiles-col)
- --id-col N: include IDs in .index.json sidecars
- --resume: skip existing chunk outputs
- --no-include-row-mapping: disable row/ID mapping sidecars and mapping metadata
- --format numpy|packed|pickle (pickle default)
Python API
from pathlib import Path

import rdfp

smiles_path = Path("examples/chembl_1k.smi")

rdfp.smiles_to_fps_chunked(
    rdfp.iter_smiles_records(smiles_path),
    output_dir="outputs/fps_api/",
    fp_type="morgan",
    fp_size=2048,
    radius=2,
    fmt="pickle",
    n_jobs=-1,
    chunk_size=100_000,
    include_row_mapping=True,
    resume=True,
)
Output layout
pickle output (default):
outputs/fps/
  fps_0000.pkl
  fps_0000.index.json   <- row_idx / id / valid per input row in chunk
  fps_0001.pkl
  fps_0001.index.json
  metadata.json
Each chunk stores only valid fingerprints in fps; the mapping metadata and .index.json sidecars preserve provenance and record which input rows failed.
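Because a chunk holds only the valid fingerprints, mapping them back to input rows means walking the sidecar and consuming one fingerprint per valid row. The sketch below assumes a sidecar shape that this README does not spell out: a list of records with `row_idx`, `id`, and `valid` keys, with valid fingerprints stored in input order. `align_to_input_rows` is a hypothetical helper, and the strings stand in for real fingerprint objects.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

Row = Tuple[int, Optional[str], Optional[Any]]


def align_to_input_rows(
    index_records: Sequence[Dict[str, Any]],
    fps: Sequence[Any],
) -> List[Row]:
    """Pair every input row with its fingerprint, or None if the row failed.

    Assumes one record per input row and fingerprints stored in the same
    order as the valid records.
    """
    n_valid = sum(1 for rec in index_records if rec["valid"])
    if n_valid != len(fps):
        raise ValueError(
            f"sidecar lists {n_valid} valid rows but chunk has {len(fps)} fingerprints"
        )
    fp_iter = iter(fps)
    aligned: List[Row] = []
    for rec in index_records:
        fp = next(fp_iter) if rec["valid"] else None
        aligned.append((rec["row_idx"], rec.get("id"), fp))
    return aligned


# Stand-in data: row 1 failed to parse, so the chunk has two fingerprints.
records = [
    {"row_idx": 0, "id": "cmpd-a", "valid": True},
    {"row_idx": 1, "id": "cmpd-b", "valid": False},
    {"row_idx": 2, "id": "cmpd-c", "valid": True},
]
aligned = align_to_input_rows(records, ["fp-a", "fp-c"])
print(aligned)
```

In practice the records would come from `json.load` on a `fps_NNNN.index.json` file and the fingerprints from the matching chunk file; check the real sidecar keys before relying on this layout.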
Compatibility and packaging
- Repo name stays rdkit-fp.
- Install/import/CLI are centered on rdfp.
- rdkit_fp remains as a compatibility alias for existing code.
Release flow
- CI runs on pushes/PRs to main.
- PyPI publish runs from tags matching v* (for example v0.1.1).