chemap - Mapping chemical space

Library for computing molecular similarities from fingerprints and for visualizing chemical space with dimensionality reduction.
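This README does not show chemap's own similarity or visualization functions. As a minimal sketch of the two underlying ideas, assuming binary (0/1) fingerprint rows like the dense arrays returned with `count=False`: Tanimoto (Jaccard) similarity is |A ∩ B| / |A ∪ B| for every pair of molecules, and a 2D map can be obtained by projecting the fingerprints onto two axes. `tanimoto_matrix` and `pca_2d` are hypothetical helpers, not chemap API, and PCA is used here only as a simple linear stand-in; the dimensionality-reduction methods chemap actually uses are not specified on this page.

```python
import numpy as np


def tanimoto_matrix(X) -> np.ndarray:
    """Pairwise Tanimoto (Jaccard) similarity between binary fingerprint rows."""
    bits = (np.asarray(X) > 0).astype(np.float64)  # treat any nonzero entry as a set bit
    intersection = bits @ bits.T                   # shared on-bits for every pair of rows
    on_bits = bits.sum(axis=1)                     # number of on-bits per row
    union = on_bits[:, None] + on_bits[None, :] - intersection
    # Convention (an assumption): two all-zero fingerprints get similarity 1.0.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(union > 0, intersection / union, 1.0)


def pca_2d(X) -> np.ndarray:
    """Project fingerprint rows onto their first two principal components."""
    Xc = np.asarray(X, dtype=np.float64)
    Xc = Xc - Xc.mean(axis=0)                      # center each feature column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                           # (n_molecules, 2) coordinates


fps = np.array([[1, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 0, 0, 0]])
print(tanimoto_matrix(fps).round(3))
print(pca_2d(fps).shape)  # (3, 2)
```

For count fingerprints, a different measure (for example a min/max generalization of Tanimoto) would be needed; the sketch above deliberately binarizes its input.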

Fingerprint computations

Fingerprints can be computed using fingerprint generators from RDKit or fingerprint classes from scikit-fingerprints. Here is a code example:

import numpy as np
import scipy.sparse as sp
from rdkit.Chem import rdFingerprintGenerator
from skfp.fingerprints import MAPFingerprint, AtomPairFingerprint

from chemap import compute_fingerprints, DatasetLoader, FingerprintConfig


ds_loader = DatasetLoader()
smiles = ds_loader.load("tests/data/smiles.csv")

# ----------------------------
# RDKit: Morgan (folded, dense)
# ----------------------------
morgan = rdFingerprintGenerator.GetMorganGenerator(radius=3, fpSize=4096)
X_morgan = compute_fingerprints(
    smiles,
    morgan,
    config=FingerprintConfig(
        count=False,
        folded=True,
        return_csr=False,   # dense numpy
        invalid_policy="raise",
    ),
)
print("RDKit Morgan:", X_morgan.shape, X_morgan.dtype)

# -----------------------------------
# RDKit: RDKitFP (folded, CSR sparse)
# -----------------------------------
rdkitfp = rdFingerprintGenerator.GetRDKitFPGenerator(fpSize=4096)
X_rdkitfp_csr = compute_fingerprints(
    smiles,
    rdkitfp,
    config=FingerprintConfig(
        count=False,
        folded=True,
        return_csr=True,    # SciPy CSR
        invalid_policy="raise",
    ),
)
assert sp.issparse(X_rdkitfp_csr)
print("RDKit RDKitFP (CSR):", X_rdkitfp_csr.shape, X_rdkitfp_csr.dtype, "nnz=", X_rdkitfp_csr.nnz)

# --------------------------------------------------
# scikit-fingerprints: MAPFingerprint (folded, dense)
# --------------------------------------------------
# MAPFingerprint is a MinHash-based fingerprint (distinct from the standalone MAP4 library).
map_fp = MAPFingerprint(fp_size=4096, count=False, sparse=False)
X_map = compute_fingerprints(
    smiles,
    map_fp,
    config=FingerprintConfig(
        count=False,
        folded=True,
        return_csr=False,
        invalid_policy="raise",
    ),
)
print("skfp MAPFingerprint:", X_map.shape, X_map.dtype)

# ----------------------------------------------------
# scikit-fingerprints: AtomPairFingerprint (folded, CSR)
# ----------------------------------------------------
# CSR output is requested via return_csr=True in the config below.
atom_pair = AtomPairFingerprint(fp_size=4096, count=False, sparse=False, use_3D=False)
X_ap_csr = compute_fingerprints(
    smiles,
    atom_pair,
    config=FingerprintConfig(
        count=False,
        folded=True,
        return_csr=True,
        invalid_policy="raise",
    ),
)
assert sp.issparse(X_ap_csr)
print("skfp AtomPair (CSR):", X_ap_csr.shape, X_ap_csr.dtype, "nnz=", X_ap_csr.nnz)

# (Optional) convert CSR -> dense if you need a NumPy array downstream:
X_ap = X_ap_csr.toarray().astype(np.float32, copy=False)
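The `count` and `folded` options of `FingerprintConfig` are used above without explanation. As a hedged illustration of what these terms usually mean for hashed fingerprints (not chemap's actual implementation): folding maps unbounded feature IDs into a fixed number of bins, commonly via modulo, and a binary fingerprint only records whether a bin was hit, while a count fingerprint records how often. `fold_features` is a hypothetical helper and the feature IDs are made-up values.

```python
import numpy as np


def fold_features(feature_counts: dict, n_bits: int, count: bool = False) -> np.ndarray:
    """Fold unbounded integer feature IDs into a fixed-length vector.

    Hypothetical helper for illustration only; not part of chemap.
    Each feature ID is mapped to bin `feature_id % n_bits`, so different
    features can land in the same bin (a "bit collision").
    """
    vec = np.zeros(n_bits, dtype=np.uint32)
    for feature_id, occurrences in feature_counts.items():
        vec[feature_id % n_bits] += occurrences  # counts add up on collisions
    if not count:
        # Binary fingerprint: only record whether a bin was hit at all.
        vec = (vec > 0).astype(np.uint8)
    return vec


# Three hashed substructure IDs with their occurrence counts (made-up values).
features = {3: 2, 11: 1, 42: 1}
print(fold_features(features, n_bits=8, count=True))   # [0 0 1 3 0 0 0 0]
print(fold_features(features, n_bits=8, count=False))  # [0 0 1 1 0 0 0 0]
```

IDs 3 and 11 collide in bin 3, which is why larger `fpSize`/`fp_size` values such as 4096 reduce, but do not eliminate, information loss from folding.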

Download files

Source Distribution

chemap-0.2.1.tar.gz (27.5 kB)

Uploaded Source

Built Distribution

chemap-0.2.1-py3-none-any.whl (29.8 kB)

Uploaded Python 3
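A wheel filename encodes its compatibility tags as `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl` (PEP 427). For the wheel above, `py3-none-any` means a pure-Python wheel for any Python 3 interpreter on any platform. The helper below is a simplified, hypothetical sketch: it does not normalize names or expand compressed tag sets such as `py2.py3`. For real use, `packaging.utils.parse_wheel_filename` from the `packaging` library is the more robust option.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    name: str
    version: str
    build: Optional[str]
    python: str
    abi: str
    platform: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its PEP 427 components (simplified sketch)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        # No optional build tag.
        name, version, python, abi, platform = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python, abi, platform = parts
    else:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return WheelTags(name, version, build, python, abi, platform)


print(parse_wheel_filename("chemap-0.2.1-py3-none-any.whl"))
```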

File details

Details for the file chemap-0.2.1.tar.gz.

File metadata

  • Download URL: chemap-0.2.1.tar.gz
  • Size: 27.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for chemap-0.2.1.tar.gz
  • SHA256: 12a887568451ff43884f299a34ec818318551c2ba700c9e38fea896268a73af3
  • MD5: f09d5b86591729c607784d65fabf5e15
  • BLAKE2b-256: 742b9447276d5e1c0f7efa689714a46468db44384216b8dcee5d60fdd3e0ba71
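A downloaded file can be checked against the published SHA256 digest with Python's standard `hashlib` module. This is a minimal sketch; `sha256_of` and `verify` are hypothetical helper names, and the example call assumes the sdist was saved to the current directory under its original name.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: str | Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str | Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumes chemap-0.2.1.tar.gz was downloaded into the current directory.
    sdist = Path("chemap-0.2.1.tar.gz")
    if sdist.exists():
        print(verify(sdist, "12a887568451ff43884f299a34ec818318551c2ba700c9e38fea896268a73af3"))
```

pip can enforce the same check at install time through hash-pinned requirements (`--require-hashes`).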

Provenance

The following attestation bundles were made for chemap-0.2.1.tar.gz:

Publisher: CI_publish.yaml on matchms/chemap

File details

Details for the file chemap-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: chemap-0.2.1-py3-none-any.whl
  • Size: 29.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for chemap-0.2.1-py3-none-any.whl
  • SHA256: c5561b108ebcb33ddd09f77b6f2f864e7346584cda899b674c39c39d5a7bfb11
  • MD5: e1111d1060968bc88175f010dcfee01a
  • BLAKE2b-256: bb9d49b564e4ade2a268f7018a52b15b8b0a99a770fc12723230339f2ad9c5b3

Provenance

The following attestation bundles were made for chemap-0.2.1-py3-none-any.whl:

Publisher: CI_publish.yaml on matchms/chemap
