SMSD -- Substructure & MCS search for chemical graphs

SMSD Python Bindings

Python interface for SMSD (Small Molecule Substructure Detector) -- a high-performance library for substructure search, Maximum Common Substructure (MCS), and molecular similarity.

Features

  • SMILES parsing and writing -- built-in OpenSMILES parser, no RDKit or CDK required
  • Substructure search -- VF2++ subgraph isomorphism
  • MCS search -- McSplit with seed-and-extend, orbit pruning, coverage-driven termination
  • Tautomer-aware matching -- keto/enol, amide, imidazole tautomer equivalence
  • RASCAL screening -- O(V+E) Tanimoto-like similarity upper bound
  • Fingerprints -- path-based and MCS-aware fingerprints for pre-screening

Requirements

  • Python >= 3.8
  • C++17 compiler (GCC 7+, Clang 5+, MSVC 2019+)
  • CMake >= 3.15
  • pybind11 >= 2.12

Installation

From source (recommended)

cd python/
pip install .

Development install

pip install -e ".[dev]"

Build with specific compiler

CMAKE_ARGS="-DCMAKE_CXX_COMPILER=g++-13" pip install .

Quick Start

from smsd import parse_smiles, find_mcs, is_substructure, similarity

# Parse SMILES strings
benzene = parse_smiles("c1ccccc1")
phenol  = parse_smiles("c1ccc(O)cc1")

# Substructure search
assert is_substructure(benzene, phenol)  # benzene is in phenol

# Maximum Common Substructure
mcs = find_mcs(benzene, phenol)
print(f"MCS size: {len(mcs)} atoms")  # 6

# Similarity
sim = similarity("c1ccccc1", "c1ccc(O)cc1")  # convenience wrapper takes SMILES strings
print(f"Similarity: {sim:.3f}")  # ~0.857
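
The `~0.857` above is consistent with an atom-count Tanimoto coefficient computed from the MCS size. The sketch below reproduces that arithmetic in plain Python; it assumes the formula `common / (n1 + n2 - common)`, which the README does not state explicitly, and `tanimoto_from_mcs` is a hypothetical helper, not part of the SMSD API.

```python
def tanimoto_from_mcs(n_query: int, n_target: int, n_common: int) -> float:
    """Atom-count Tanimoto: common / (query + target - common).

    Assumed formula for illustration; not taken from the SMSD source.
    """
    union = n_query + n_target - n_common
    return n_common / union if union else 0.0


# Benzene has 6 heavy atoms, phenol has 7, and their MCS covers 6 atoms.
score = tanimoto_from_mcs(6, 7, 6)
print(f"{score:.3f}")  # 0.857
```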

API Reference

SMILES Parsing

from smsd import parse_smiles, to_smiles

mol = parse_smiles("c1ccccc1")
print(mol.n)              # 6 atoms
print(mol.atomic_num)     # [6, 6, 6, 6, 6, 6]
print(mol.aromatic)       # [True, True, True, True, True, True]

smi = to_smiles(mol)      # canonical SMILES string

Substructure Search

from smsd import parse_smiles, is_substructure, find_substructure, ChemOptions

query  = parse_smiles("c1ccccc1")
target = parse_smiles("c1ccc(O)cc1")

# Boolean check
if is_substructure(query, target):
    print("Query is a substructure of target")

# Get atom mapping
mapping = find_substructure(query, target)
# Returns list of (query_atom, target_atom) pairs
for qi, ti in mapping:
    print(f"  query atom {qi} -> target atom {ti}")

# Custom options
opts = ChemOptions()
opts.ring_matches_ring_only = True
is_substructure(query, target, opts=opts, timeout_ms=5000)
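
To make the `(query_atom, target_atom)` mapping concrete, here is a minimal pure-Python sketch of substructure matching by plain backtracking. It is not the VF2++ implementation used by SMSD and has none of its pruning; the names `find_mapping` and `ring`, and the label/adjacency representation, are assumptions made for this illustration only.

```python
from typing import Dict, List, Optional, Set, Tuple


def find_mapping(
    q_labels: List[str], q_adj: List[Set[int]],
    t_labels: List[str], t_adj: List[Set[int]],
) -> Optional[List[Tuple[int, int]]]:
    """Return one (query_atom, target_atom) mapping, or None if no match exists."""
    mapping: Dict[int, int] = {}
    used: Set[int] = set()

    def compatible(qi: int, ti: int) -> bool:
        if q_labels[qi] != t_labels[ti]:
            return False
        # Every already-mapped query neighbour must land on a target neighbour.
        return all(mapping[qn] in t_adj[ti] for qn in q_adj[qi] if qn in mapping)

    def extend(qi: int) -> bool:
        if qi == len(q_labels):
            return True  # all query atoms placed
        for ti in range(len(t_labels)):
            if ti not in used and compatible(qi, ti):
                mapping[qi] = ti
                used.add(ti)
                if extend(qi + 1):
                    return True
                del mapping[qi]  # undo and try the next candidate
                used.discard(ti)
        return False

    return sorted(mapping.items()) if extend(0) else None


def ring(n: int) -> List[Set[int]]:
    """Adjacency sets for an n-membered ring."""
    return [{(i - 1) % n, (i + 1) % n} for i in range(n)]


# Benzene: six ring carbons. Phenol: the same ring plus an oxygen (atom 6) on atom 3.
benzene_labels, benzene_adj = ["C"] * 6, ring(6)
phenol_labels, phenol_adj = ["C"] * 6 + ["O"], ring(6) + [{3}]
phenol_adj[3].add(6)

print(find_mapping(benzene_labels, benzene_adj, phenol_labels, phenol_adj))
print(find_mapping(phenol_labels, phenol_adj, benzene_labels, benzene_adj))  # None
```

The second call returns `None` because phenol has an oxygen that benzene cannot accommodate, which mirrors the asymmetry of `is_substructure(query, target)`.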

MCS Search

from smsd import parse_smiles, find_mcs, ChemOptions, McsOptions

g1 = parse_smiles("CC(=O)Oc1ccccc1C(=O)O")   # aspirin
g2 = parse_smiles("CC(=O)Nc1ccc(O)cc1")       # acetaminophen

# Default MCS
mapping = find_mcs(g1, g2)
print(f"MCS size: {len(mapping)}")

# Tautomer-aware MCS
taut = ChemOptions.tautomer_profile()
mapping = find_mcs(g1, g2, chem=taut)

# With MCS options
mcs_opts = McsOptions()
mcs_opts.timeout_ms = 5000
mcs_opts.connected_only = True
mapping = find_mcs(g1, g2, opts=mcs_opts)

# Convenience wrapper (accepts SMILES strings directly)
from smsd import mcs
mapping = mcs("c1ccccc1", "Cc1ccccc1", tautomer_aware=True)

Similarity and Screening

from smsd import parse_smiles, similarity_upper_bound, screen_targets, similarity

g1 = parse_smiles("c1ccccc1")
g2 = parse_smiles("Cc1ccccc1")

# Single pair
sim = similarity_upper_bound(g1, g2)
print(f"Similarity: {sim:.3f}")

# Convenience wrapper (accepts SMILES)
sim = similarity("c1ccccc1", "Cc1ccccc1")

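# Batch screening
smiles_list = ["Cc1ccccc1", "c1ccc(O)cc1", "CCO"]  # illustrative library; substitute your own SMILES
library = [parse_smiles(s) for s in smiles_list]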
query = parse_smiles("c1ccccc1")
hits = screen_targets(query, library, threshold=0.5)
# Returns indices of molecules with similarity >= 0.5
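
The idea behind threshold screening with an upper bound can be sketched without SMSD. The example below uses a deliberately crude bound based only on element counts: the MCS can never contain more atoms of an element than the smaller of the two molecules has, so the resulting Tanimoto value cannot be exceeded by the true atom-based score. This is an assumption-laden illustration, not the RASCAL-style bound SMSD computes, and `element_upper_bound` and `screen` are hypothetical names.

```python
from collections import Counter
from typing import List, Sequence


def element_upper_bound(elems_a: Sequence[int], elems_b: Sequence[int]) -> float:
    """Upper bound on atom-count Tanimoto from per-element atom counts alone."""
    # Counter intersection keeps the minimum count of each element.
    max_common = sum((Counter(elems_a) & Counter(elems_b)).values())
    union = len(elems_a) + len(elems_b) - max_common
    return max_common / union if union else 0.0


def screen(query: Sequence[int], library: List[Sequence[int]], threshold: float) -> List[int]:
    """Indices of library entries whose upper bound reaches the threshold."""
    return [i for i, t in enumerate(library)
            if element_upper_bound(query, t) >= threshold]


# Molecules as lists of atomic numbers (heavy atoms only).
benzene = [6] * 6
toluene = [6] * 7
ethanol = [6, 6, 8]
water = [8]

print(round(element_upper_bound(benzene, toluene), 3))  # 0.857
print(screen(benzene, [toluene, ethanol, water], threshold=0.5))  # [0]
```

Anything rejected by the bound cannot reach the threshold under the exact score, so the expensive MCS computation only needs to run on the surviving indices.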

Fingerprints

from smsd import (
    parse_smiles,
    path_fingerprint, mcs_fingerprint,
    fingerprint_subset, analyze_fp_quality,
    fingerprint, tanimoto,
)

mol = parse_smiles("c1ccccc1")

# Path fingerprint (returns set bit positions)
fp = path_fingerprint(mol, path_length=7, fp_size=2048)

# MCS-aware fingerprint
fp_mcs = mcs_fingerprint(mol, path_length=7, fp_size=2048)

# Subset check (for substructure pre-screening)
query_fp = path_fingerprint(parse_smiles("c1ccccc1"))
target_fp = path_fingerprint(parse_smiles("c1ccc(O)cc1"))
assert fingerprint_subset(query_fp, target_fp)

# Quality analysis
quality = analyze_fp_quality(fp)
print(quality)  # {'set_bits': 12, 'density': 0.006, ...}

# Convenience wrappers
fp = fingerprint("CCO", kind="mcs")
sim = tanimoto(fp, fingerprint("CCCO"))
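
Since the fingerprints are described as lists of set bit positions, the subset check and Tanimoto score reduce to ordinary set operations. The sketch below shows that logic in plain Python; it assumes fingerprints are iterables of integer bit indices, the bit values are made up, and `is_subset` and `tanimoto_bits` are hypothetical stand-ins rather than the SMSD functions.

```python
from typing import Iterable


def is_subset(query_bits: Iterable[int], target_bits: Iterable[int]) -> bool:
    """True if every bit set in the query is also set in the target."""
    return set(query_bits) <= set(target_bits)


def tanimoto_bits(a: Iterable[int], b: Iterable[int]) -> float:
    """Shared bits divided by total distinct bits (0.0 for two empty inputs)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


# Made-up bit positions, for illustration only.
query_bits = [3, 17, 42]
target_bits = [3, 17, 42, 99]

print(is_subset(query_bits, target_bits))      # True
print(is_subset(target_bits, query_bits))      # False
print(tanimoto_bits(query_bits, target_bits))  # 0.75
```

A failed subset check rules out a substructure match outright, while a passing check only means the full search is still worth running.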

MolGraph Builder

Build molecules directly without SMILES:

from smsd import MolGraphBuilder

builder = MolGraphBuilder(6)  # 6 atoms
for i in range(6):
    builder.atom(i, 6, charge=0, aromatic=True, in_ring=True)
for i in range(6):
    builder.bond(i, (i + 1) % 6, order=1, in_ring=True, aromatic=True)
benzene = builder.build()

Configuration

from smsd import ChemOptions, McsOptions, BondOrderMode, RingFusionMode

# ChemOptions controls atom/bond matching
chem = ChemOptions()
chem.match_atom_type = True
chem.match_formal_charge = True
chem.tautomer_aware = True
chem.complete_rings_only = True
chem.match_bond_order = BondOrderMode.LOOSE
chem.ring_fusion_mode = RingFusionMode.STRICT

# Named profiles
chem = ChemOptions.tautomer_profile()   # tautomer-aware defaults
chem = ChemOptions.profile("strict")    # strict matching

# McsOptions controls MCS algorithm behavior
opts = McsOptions()
opts.connected_only = True
opts.timeout_ms = 10000
opts.maximize_bonds = True  # MCES mode

Running Tests

cd python/
pip install -e ".[dev]"
pytest tests/ -v

License

Apache 2.0. Copyright (c) 2009-2026 Syed Asad Rahman, BioInception Labs.
