chemap - Chemical Mapping

Library for computing molecular fingerprint based similarities as well as dimensionality reduction based chemical space visualizations.

Fingerprint computations

Fingerprints can be computed using generators from RDKit or scikit-fingerprints. Here is a code example:

import numpy as np
import scipy.sparse as sp
from rdkit.Chem import rdFingerprintGenerator
from skfp.fingerprints import MAPFingerprint, AtomPairFingerprint

from chemap import compute_fingerprints, DatasetLoader, FingerprintConfig


ds_loader = DatasetLoader()
smiles = ds_loader.load("tests/data/smiles.csv")

# ----------------------------
# RDKit: Morgan (folded, dense)
# ----------------------------
morgan = rdFingerprintGenerator.GetMorganGenerator(radius=3, fpSize=4096)
X_morgan = compute_fingerprints(
    smiles,
    morgan,
    config=FingerprintConfig(
        count=False,
        folded=True,
        return_csr=False,   # dense numpy
        invalid_policy="raise",
    ),
)
print("RDKit Morgan:", X_morgan.shape, X_morgan.dtype)

# -----------------------------------
# RDKit: RDKitFP (folded, CSR sparse)
# -----------------------------------
rdkitfp = rdFingerprintGenerator.GetRDKitFPGenerator(fpSize=4096)
X_rdkitfp_csr = compute_fingerprints(
    smiles,
    rdkitfp,
    config=FingerprintConfig(
        count=False,
        folded=True,
        return_csr=True,    # SciPy CSR
        invalid_policy="raise",
    ),
)
assert sp.issparse(X_rdkitfp_csr)
print("RDKit RDKitFP (CSR):", X_rdkitfp_csr.shape, X_rdkitfp_csr.dtype, "nnz=", X_rdkitfp_csr.nnz)

# --------------------------------------------------
# scikit-fingerprints: MAPFingerprint (folded, dense)
# --------------------------------------------------
# MAPFingerprint is a MinHash-like fingerprint (different from MAP4 lib).
map_fp = MAPFingerprint(fp_size=4096, count=False, sparse=False)
X_map = compute_fingerprints(
    smiles,
    map_fp,
    config=FingerprintConfig(
        count=False,
        folded=True,
        return_csr=False,
        invalid_policy="raise",
    ),
)
print("skfp MAPFingerprint:", X_map.shape, X_map.dtype)

# ----------------------------------------------------
# scikit-fingerprints: AtomPairFingerprint (folded, CSR)
# ----------------------------------------------------
atom_pair = AtomPairFingerprint(fp_size=4096, count=False, sparse=False, use_3D=False)
X_ap_csr = compute_fingerprints(
    smiles,
    atom_pair,
    config=FingerprintConfig(
        count=False,
        folded=True,
        return_csr=True,
        invalid_policy="raise",
    ),
)
assert sp.issparse(X_ap_csr)
print("skfp AtomPair (CSR):", X_ap_csr.shape, X_ap_csr.dtype, "nnz=", X_ap_csr.nnz)

# (Optional) convert CSR -> dense if you need a NumPy array downstream:
X_ap = X_ap_csr.toarray().astype(np.float32, copy=False)
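
The description above mentions fingerprint-based similarities, but this section does not show chemap's own similarity functions. As a rough, library-independent sketch, pairwise Tanimoto similarity between the rows of a dense binary fingerprint matrix (such as `X_morgan`) could be computed with plain NumPy as follows. The names `tanimoto_matrix` and `X_demo` are hypothetical and are not part of chemap; the toy matrix only stands in for real fingerprints.

```python
import numpy as np


def tanimoto_matrix(X):
    """Pairwise Tanimoto (Jaccard) similarity between rows of a dense fingerprint matrix.

    Hypothetical helper, not part of chemap. Any non-zero entry counts as a
    set bit, so count vectors are binarized first.
    """
    bits = (np.asarray(X) != 0).astype(np.float64)
    intersection = bits @ bits.T                    # shared on-bits for each pair of rows
    n_on = bits.sum(axis=1)                         # number of on-bits per row
    union = n_on[:, None] + n_on[None, :] - intersection
    sim = np.zeros_like(intersection)
    # Convention chosen here: pairs of all-zero fingerprints get similarity 0.
    np.divide(intersection, union, out=sim, where=union > 0)
    return sim


# Toy 3 x 4 binary matrix standing in for an output such as X_morgan.
X_demo = np.array(
    [
        [1, 1, 0, 0],
        [1, 0, 1, 0],
        [1, 1, 0, 0],
    ],
    dtype=np.uint8,
)
S = tanimoto_matrix(X_demo)
print(np.round(S, 3))
```

For a CSR result such as `X_ap_csr`, the matrix can first be converted with `.toarray()` as shown above. For large datasets the dense intermediate may not fit in memory, so a sparse or chunked variant would be preferable.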
