Parallel fingerprint computation using the RDKit fingerprint generator API

rdkit-fp

Parallel RDKit fingerprints with a beginner one-liner and an advanced staged pipeline.

Install

pip install rdfp

Quickstart (10 seconds)

rdfp examples/chembl_1k.smi outputs/fps/

With explicit preset:

rdfp examples/chembl_1k.smi outputs/fps/ --preset ecfp4

The bare form rdfp INPUT OUTPUT is shorthand for rdfp fps INPUT OUTPUT.

Common CLI usage

Simple run with explicit workers/chunk size:

rdfp examples/chembl_1k.smi outputs/fps/ --workers -1 --chunk 100000

Use IDs from a column and keep stable row mapping sidecars:

rdfp data/compounds.smi outputs/fps/ --smiles-col 0 --id-col 1
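To illustrate what --smiles-col and --id-col select, here is a minimal sketch that splits one line of a whitespace-delimited .smi file into columns. The names SmilesRecord and parse_smi_line, and the whitespace-splitting rule, are assumptions made for this example; they are not taken from rdfp's own parser.

```python
from typing import NamedTuple, Optional


class SmilesRecord(NamedTuple):
    """Hypothetical record type: input row number, SMILES string, optional ID."""
    row_idx: int
    smiles: str
    mol_id: Optional[str]


def parse_smi_line(line: str, row_idx: int, smiles_col: int = 0,
                   id_col: Optional[int] = None) -> Optional[SmilesRecord]:
    """Pick the SMILES column and, if requested, the ID column from one line.

    Assumes whitespace-separated columns; returns None for blank or short lines.
    """
    fields = line.split()
    if len(fields) <= smiles_col:
        return None
    mol_id = None
    if id_col is not None and id_col < len(fields):
        mol_id = fields[id_col]
    return SmilesRecord(row_idx, fields[smiles_col], mol_id)


# Placeholder IDs; the third line is blank and yields None.
lines = ["CCO mol_a", "c1ccccc1 mol_b", ""]
records = [parse_smi_line(text, i, smiles_col=0, id_col=1)
           for i, text in enumerate(lines)]
```

Keeping row_idx on every record is what would let a later stage map each fingerprint back to its input row even when some rows fail to parse.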

Resume a stopped large run:

rdfp data/compounds.smi outputs/fps/ --resume

Run built-in demo data:

rdfp demo

Presets

Preset           Meaning
ecfp4 (default)  Morgan radius 2, 2048 bits
ecfp6            Morgan radius 3, 2048 bits
rdkit            RDKit topological, 2048 bits
ap               Atom-pair, 2048 bits
tt               Topological torsion, 2048 bits
pattern          Pattern fingerprint, 2048 bits

You can override preset values with explicit flags (--fp-type, --fp-size, --radius, --include-chirality).
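One way to picture the preset-plus-override behaviour is a lookup table whose values are replaced by any explicitly supplied flag. The sketch below is illustrative only: the radius and bit counts come from the table above and "morgan" appears in the Python API example, but the other fp_type strings and the resolve_fp_params helper are assumptions, not confirmed rdfp identifiers.

```python
# Preset values copied from the table above. Every fp_type string except
# "morgan" is an illustrative guess, not a confirmed rdfp identifier.
PRESETS = {
    "ecfp4": {"fp_type": "morgan", "fp_size": 2048, "radius": 2},
    "ecfp6": {"fp_type": "morgan", "fp_size": 2048, "radius": 3},
    "rdkit": {"fp_type": "rdkit", "fp_size": 2048},
    "ap": {"fp_type": "atom_pair", "fp_size": 2048},
    "tt": {"fp_type": "topological_torsion", "fp_size": 2048},
    "pattern": {"fp_type": "pattern", "fp_size": 2048},
}


def resolve_fp_params(preset="ecfp4", **overrides):
    """Hypothetical helper: start from a preset, then apply explicit overrides.

    Overrides left as None are treated as "flag not given" and ignored.
    """
    params = dict(PRESETS[preset])  # copy so the preset table is never mutated
    params.update({k: v for k, v in overrides.items() if v is not None})
    return params


print(resolve_fp_params("ecfp6", fp_size=1024))
# → {'fp_type': 'morgan', 'fp_size': 1024, 'radius': 3}
```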

Advanced lane

Stage 1 only (SMILES -> mols):

rdfp mols examples/chembl_1k.smi outputs/mols/

Stage 2 only (mols -> fingerprints):

rdfp fps-from-mols outputs/mols/ outputs/fps_from_mols/ --resume

Both stages with mol persistence:

rdfp fps examples/chembl_1k.smi outputs/fps/ --save-mols outputs/mols/

Helpful flags

  • --workers (alias of --n-jobs)
  • --chunk (alias of --chunk-size)
  • --preset ecfp4|ecfp6|rdkit|ap|tt|pattern
  • --input-smiles-col (alias of --smiles-col)
  • --id-col N: include IDs in .index.json sidecars
  • --resume: skip existing chunk outputs
  • --no-include-row-mapping: disable row/ID mapping sidecars and mapping metadata
  • --format numpy|packed|pickle (default: pickle)
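The interaction between --chunk and --resume can be sketched as follows: the input is cut into fixed-size chunks, each chunk gets a numbered output file, and a resumed run skips any chunk whose file already exists. This is a simplified illustration, not rdfp's implementation; iter_chunks, run_chunked, and the process callback are hypothetical names, and only the fps_0000.pkl naming pattern comes from the output layout documented here.

```python
from itertools import islice
from pathlib import Path
import pickle
import tempfile


def iter_chunks(records, chunk_size):
    """Yield successive lists of at most chunk_size records."""
    it = iter(records)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def run_chunked(records, output_dir, chunk_size, process, resume=False):
    """Hypothetical driver: write one pickle per chunk, skipping existing files on resume."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for i, chunk in enumerate(iter_chunks(records, chunk_size)):
        target = out / f"fps_{i:04d}.pkl"
        if resume and target.exists():
            continue  # chunk already produced by an earlier run
        with target.open("wb") as fh:
            pickle.dump([process(r) for r in chunk], fh)
        written.append(target.name)
    return written


with tempfile.TemporaryDirectory() as tmp:
    # str() stands in for the real fingerprint computation.
    first = run_chunked(range(5), tmp, chunk_size=2, process=str)
    second = run_chunked(range(5), tmp, chunk_size=2, process=str, resume=True)
```

A resume scheme like this only works if chunk boundaries are stable between runs, so the same chunk size and input order should be used when resuming. A production version would also write to a temporary file and rename it, so an interrupted write is not mistaken for a finished chunk.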

Python API

from pathlib import Path
import rdfp

smiles_path = Path("examples/chembl_1k.smi")
rdfp.smiles_to_fps_chunked(
    rdfp.iter_smiles_records(smiles_path),
    output_dir="outputs/fps_api/",
    fp_type="morgan",
    fp_size=2048,
    radius=2,
    fmt="pickle",
    n_jobs=-1,
    chunk_size=100_000,
    include_row_mapping=True,
    resume=True,
)

Output layout

pickle output (default):

outputs/fps/
  fps_0000.pkl
  fps_0000.index.json   <- row_idx / id / valid per input row in chunk
  fps_0001.pkl
  fps_0001.index.json
  metadata.json

Each chunk stores only valid fingerprints in fps, so a chunk file can hold fewer entries than it had input rows. The mapping metadata and the .index.json sidecars preserve provenance and record which rows failed.
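Because failed rows are dropped from the fingerprint list, the sidecar is what lets you line fingerprints back up with input rows. The sketch below assumes the sidecar is a list of objects with row_idx, id, and valid keys, in input order, and that fingerprints are stored in the order of the valid rows. Both the schema and the align_to_input_rows helper are assumptions for illustration; check a real .index.json before relying on them.

```python
def align_to_input_rows(index_entries, fps):
    """Return one slot per input row: its fingerprint, or None if the row failed.

    Assumes fps holds fingerprints for the valid rows only, in input order.
    """
    fp_iter = iter(fps)
    aligned = []
    for entry in index_entries:
        aligned.append(next(fp_iter) if entry["valid"] else None)
    return aligned


# Assumed sidecar shape, with placeholder IDs; row 1 failed to parse.
index_entries = [
    {"row_idx": 0, "id": "mol_a", "valid": True},
    {"row_idx": 1, "id": "mol_b", "valid": False},
    {"row_idx": 2, "id": "mol_c", "valid": True},
]
fps = ["fp_for_row_0", "fp_for_row_2"]  # stand-ins for real fingerprint objects

aligned = align_to_input_rows(index_entries, fps)
```

In real use, index_entries would come from json.load on a file such as fps_0000.index.json and fps from the matching chunk file.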

Compatibility and packaging

  • Repo name stays rdkit-fp.
  • Install/import/CLI are centered on rdfp.
  • rdkit_fp remains as a compatibility alias for existing code.

Release flow

  • CI runs on pushes/PRs to main.
  • PyPI publish runs from tags matching v* (for example v0.1.1).
