SMSD -- Substructure & MCS search for chemical graphs

SMSD Python Bindings

Python interface for SMSD (Small Molecule Substructure Detector) -- a high-performance library for substructure search, Maximum Common Substructure (MCS), and molecular similarity.

Features

  • SMILES parsing and writing -- built-in OpenSMILES parser, no RDKit or CDK required
  • Substructure search -- VF2++ subgraph isomorphism
  • MCS search -- McSplit with seed-and-extend, orbit pruning, coverage-driven termination
  • Tautomer-aware matching -- keto/enol, amide, imidazole tautomer equivalence
  • RASCAL screening -- O(V+E) Tanimoto-like similarity upper bound
  • Fingerprints -- path-based and MCS-aware fingerprints for pre-screening

Requirements

  • Python >= 3.8
  • C++17 compiler (GCC 7+, Clang 5+, MSVC 2019+)
  • CMake >= 3.15
  • pybind11 >= 2.12
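The source-build prerequisites above can be checked before running pip. The following is a minimal sketch using only the Python standard library. The helper name `check_build_prereqs` is hypothetical and not part of SMSD, and it checks presence only: it does not verify the C++ compiler or the CMake and pybind11 versions.

```python
import importlib.util
import shutil
import sys


def check_build_prereqs():
    """Report which source-build prerequisites appear to be available.

    Hypothetical helper, not part of SMSD. It checks presence only and does
    not verify the CMake or pybind11 version, nor the C++ compiler.
    """
    return {
        "python>=3.8": sys.version_info >= (3, 8),
        "cmake on PATH": shutil.which("cmake") is not None,
        "pybind11 importable": importlib.util.find_spec("pybind11") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_build_prereqs().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```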

Installation

From source (recommended)

cd python/
pip install .

Development install

pip install -e ".[dev]"

Build with specific compiler

CMAKE_ARGS="-DCMAKE_CXX_COMPILER=g++-13" pip install .

Quick Start

from smsd import parse_smiles, find_mcs, is_substructure, similarity

# Parse SMILES strings
benzene = parse_smiles("c1ccccc1")
phenol  = parse_smiles("c1ccc(O)cc1")

# Substructure search
assert is_substructure(benzene, phenol)  # benzene is in phenol

# Maximum Common Substructure
mcs = find_mcs(benzene, phenol)
print(f"MCS size: {len(mcs)} atoms")  # 6

# Similarity
sim = similarity(benzene, phenol)
print(f"Similarity: {sim:.3f}")  # ~0.857

API Reference

SMILES Parsing

from smsd import parse_smiles, to_smiles

mol = parse_smiles("c1ccccc1")
print(mol.n)              # 6 atoms
print(mol.atomic_num)     # [6, 6, 6, 6, 6, 6]
print(mol.aromatic)       # [True, True, True, True, True, True]

smi = to_smiles(mol)      # canonical SMILES string

Substructure Search

from smsd import parse_smiles, is_substructure, find_substructure, ChemOptions

query  = parse_smiles("c1ccccc1")
target = parse_smiles("c1ccc(O)cc1")

# Boolean check
if is_substructure(query, target):
    print("Query is a substructure of target")

# Get atom mapping
mapping = find_substructure(query, target)
# Returns list of (query_atom, target_atom) pairs
for qi, ti in mapping:
    print(f"  query atom {qi} -> target atom {ti}")

# Custom options
opts = ChemOptions()
opts.ring_matches_ring_only = True
is_substructure(query, target, opts=opts, timeout_ms=5000)
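The mapping is described above as a list of (query_atom, target_atom) pairs. As a sketch of how such pairs might be post-processed, the snippet below turns them into a lookup table and lists the target atoms left unmatched. It is plain Python and does not call SMSD. The example mapping and the helper name `unmatched_target_atoms` are illustrative assumptions, not output from the library.

```python
def unmatched_target_atoms(mapping, n_target_atoms):
    """Return sorted target atom indices not covered by the mapping.

    mapping: iterable of (query_atom, target_atom) index pairs.
    """
    matched = {ti for _, ti in mapping}
    return [i for i in range(n_target_atoms) if i not in matched]


# Illustrative mapping of a 6-atom query onto a 7-atom target (made-up values).
example_mapping = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 5), (5, 6)]
query_to_target = dict(example_mapping)

print(query_to_target[4])                          # 5
print(unmatched_target_atoms(example_mapping, 7))  # [4]
```

The unmatched indices are the part of the target that lies outside the query, which can be useful for highlighting substituents.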

MCS Search

from smsd import parse_smiles, find_mcs, ChemOptions, McsOptions

g1 = parse_smiles("CC(=O)Oc1ccccc1C(=O)O")   # aspirin
g2 = parse_smiles("CC(=O)Nc1ccc(O)cc1")       # acetaminophen

# Default MCS
mapping = find_mcs(g1, g2)
print(f"MCS size: {len(mapping)}")

# Tautomer-aware MCS
taut = ChemOptions.tautomer_profile()
mapping = find_mcs(g1, g2, chem=taut)

# With MCS options
mcs_opts = McsOptions()
mcs_opts.timeout_ms = 5000
mcs_opts.connected_only = True
mapping = find_mcs(g1, g2, opts=mcs_opts)

# Convenience wrapper (accepts SMILES strings directly)
from smsd import mcs
mapping = mcs("c1ccccc1", "Cc1ccccc1", tautomer_aware=True)

Similarity and Screening

from smsd import parse_smiles, similarity_upper_bound, screen_targets, similarity

g1 = parse_smiles("c1ccccc1")
g2 = parse_smiles("Cc1ccccc1")

# Single pair (fast upper bound, not an exact similarity)
sim = similarity_upper_bound(g1, g2)
print(f"Similarity upper bound: {sim:.3f}")

# Convenience wrapper (accepts SMILES)
sim = similarity("c1ccccc1", "Cc1ccccc1")

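# Batch screening (smiles_list is a placeholder library; substitute your own SMILES)
smiles_list = ["c1ccccc1", "Cc1ccccc1", "CCO"]
library = [parse_smiles(s) for s in smiles_list]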
query = parse_smiles("c1ccccc1")
hits = screen_targets(query, library, threshold=0.5)
# Returns indices of molecules with similarity >= 0.5
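To illustrate why a similarity upper bound can run in O(V+E), here is a minimal pure-Python sketch that compares atom and bond label counts without computing any atom mapping. It is an assumption-laden illustration of the general idea, not SMSD's implementation. The names `label_upper_bound` and `screen`, and the (atoms, bonds) tuple representation, are hypothetical.

```python
from collections import Counter


def label_upper_bound(atoms1, bonds1, atoms2, bonds2):
    """Tanimoto-like upper bound from atom and bond label counts.

    atoms*: list of element symbols; bonds*: list of (i, j, order) tuples.
    No mapping is computed, so the value can only overestimate a
    similarity derived from the true common substructure.
    """
    def bond_labels(atoms, bonds):
        # Label each bond by its unordered element pair and its bond order.
        return Counter(
            (tuple(sorted((atoms[i], atoms[j]))), order) for i, j, order in bonds
        )

    a1, a2 = Counter(atoms1), Counter(atoms2)
    b1, b2 = bond_labels(atoms1, bonds1), bond_labels(atoms2, bonds2)
    # Counter & Counter keeps the minimum count per label.
    common = sum((a1 & a2).values()) + sum((b1 & b2).values())
    size1 = len(atoms1) + len(bonds1)
    size2 = len(atoms2) + len(bonds2)
    union = size1 + size2 - common
    return common / union if union else 1.0


def screen(query, library, threshold):
    """Return indices of library entries whose upper bound reaches threshold."""
    return [
        i for i, mol in enumerate(library)
        if label_upper_bound(*query, *mol) >= threshold
    ]


# Hand-built (atoms, bonds) tuples; "ar" marks an aromatic bond.
benzene = (["C"] * 6, [(i, (i + 1) % 6, "ar") for i in range(6)])
phenol = (benzene[0] + ["O"], benzene[1] + [(3, 6, 1)])
ethanol = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])

print(round(label_upper_bound(*benzene, *phenol), 3))  # 0.857
print(screen(benzene, [benzene, phenol, ethanol], 0.5))  # [0, 1]
```

Because the bound never underestimates, molecules that fall below the threshold can be discarded before any exact MCS search is attempted.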

Fingerprints

from smsd import (
    parse_smiles,
    path_fingerprint, mcs_fingerprint,
    fingerprint_subset, analyze_fp_quality,
    fingerprint, tanimoto,
)

mol = parse_smiles("c1ccccc1")

# Path fingerprint (returns set bit positions)
fp = path_fingerprint(mol, path_length=7, fp_size=2048)

# MCS-aware fingerprint
fp_mcs = mcs_fingerprint(mol, path_length=7, fp_size=2048)

# Subset check (for substructure pre-screening)
query_fp = path_fingerprint(parse_smiles("c1ccccc1"))
target_fp = path_fingerprint(parse_smiles("c1ccc(O)cc1"))
assert fingerprint_subset(query_fp, target_fp)

# Quality analysis
quality = analyze_fp_quality(fp)
print(quality)  # {'set_bits': 12, 'density': 0.006, ...}

# Convenience wrappers
fp = fingerprint("CCO", kind="mcs")
sim = tanimoto(fp, fingerprint("CCCO"))
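Since the path fingerprint is described above as a collection of set bit positions, the subset check and Tanimoto coefficient can be sketched with plain Python sets. This is an illustration of the underlying arithmetic only, not the library's code. The function names `tanimoto_bits` and `is_bit_subset` and the bit positions are made up for the example.

```python
def tanimoto_bits(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as set-bit positions."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    # Convention chosen here: two empty fingerprints score 0.0.
    return len(a & b) / len(union) if union else 0.0


def is_bit_subset(query_fp, target_fp):
    """True if every bit set in the query is also set in the target."""
    return set(query_fp) <= set(target_fp)


query_bits = [5, 9]          # made-up bit positions
target_bits = [1, 5, 9, 42]  # made-up bit positions

print(tanimoto_bits(query_bits, target_bits))  # 0.5
print(is_bit_subset(query_bits, target_bits))  # True
```

For path-based fingerprints, a failed subset test rules out a substructure match, while a passed test still needs confirmation by an exact search because different paths can hash to the same bit.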

MolGraph Builder

Build molecules directly without SMILES:

from smsd import MolGraphBuilder

builder = MolGraphBuilder(6)  # 6 atoms
for i in range(6):
    builder.atom(i, 6, charge=0, aromatic=True, in_ring=True)
for i in range(6):
    builder.bond(i, (i + 1) % 6, order=1, in_ring=True, aromatic=True)
benzene = builder.build()
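The `(i + 1) % 6` index in the bond loop is what closes the ring: the last bond joins atom 5 back to atom 0. The short sketch below reproduces only that indexing in plain Python, without SMSD, and checks that every atom ends up with two neighbours.

```python
n = 6
# Same ring-closure pattern as the builder loop: bond i joins atom i to (i + 1) % n.
bonds = [(i, (i + 1) % n) for i in range(n)]

# Build an adjacency list to check the result is a single closed ring.
adjacency = {i: [] for i in range(n)}
for a, b in bonds:
    adjacency[a].append(b)
    adjacency[b].append(a)

print(bonds)         # [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
print(adjacency[0])  # [1, 5]
```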

Configuration

from smsd import ChemOptions, McsOptions, BondOrderMode, RingFusionMode

# ChemOptions controls atom/bond matching
chem = ChemOptions()
chem.match_atom_type = True
chem.match_formal_charge = True
chem.tautomer_aware = True
chem.complete_rings_only = True
chem.match_bond_order = BondOrderMode.LOOSE
chem.ring_fusion_mode = RingFusionMode.STRICT

# Named profiles
chem = ChemOptions.tautomer_profile()   # tautomer-aware defaults
chem = ChemOptions.profile("strict")    # strict matching

# McsOptions controls MCS algorithm behavior
opts = McsOptions()
opts.connected_only = True
opts.timeout_ms = 10000
opts.maximize_bonds = True  # MCES mode

Running Tests

cd python/
pip install -e ".[dev]"
pytest tests/ -v

License

Apache 2.0. Copyright (c) 2009-2026 Syed Asad Rahman, BioInception Labs.
