SMSD -- Substructure & MCS search for chemical graphs

SMSD Python Bindings

Python interface for SMSD (Small Molecule Substructure Detector) -- a high-performance library for substructure search, Maximum Common Substructure (MCS), and molecular similarity.

Features

  • SMILES parsing and writing -- built-in OpenSMILES parser, no RDKit or CDK required
  • Substructure search -- VF2++ subgraph isomorphism
  • MCS search -- McSplit with seed-and-extend, orbit pruning, coverage-driven termination
  • Tautomer-aware matching -- keto/enol, amide, imidazole tautomer equivalence
  • RASCAL screening -- O(V+E) Tanimoto-like similarity upper bound
  • Fingerprints -- path-based and MCS-aware fingerprints for pre-screening

Requirements

  • Python >= 3.8
  • C++17 compiler (GCC 7+, Clang 5+, MSVC 2019+)
  • CMake >= 3.15
  • pybind11 >= 2.12

Installation

From source (recommended)

cd python/
pip install .

Development install

pip install -e ".[dev]"

Build with specific compiler

CMAKE_ARGS="-DCMAKE_CXX_COMPILER=g++-13" pip install .

Quick Start

from smsd import parse_smiles, find_mcs, is_substructure, similarity

# Parse SMILES strings
benzene = parse_smiles("c1ccccc1")
phenol  = parse_smiles("c1ccc(O)cc1")

# Substructure search
assert is_substructure(benzene, phenol)  # benzene is in phenol

# Maximum Common Substructure
mcs = find_mcs(benzene, phenol)
print(f"MCS size: {len(mcs)} atoms")  # 6

# Similarity
sim = similarity(benzene, phenol)
print(f"Similarity: {sim:.3f}")  # ~0.857
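
The `~0.857` in the comment above is consistent with a Tanimoto-style score over heavy-atom counts: 6 shared atoms out of 6 + 7 - 6 = 7. Whether `similarity` computes exactly this is an assumption; the sketch below only reproduces the arithmetic in plain Python, and `tanimoto_from_mcs` is a hypothetical helper, not part of smsd.

```python
def tanimoto_from_mcs(n_mcs, n_a, n_b):
    """Tanimoto-style score: shared atoms over the union of both atom sets."""
    return n_mcs / (n_a + n_b - n_mcs)

# Benzene has 6 heavy atoms, phenol has 7, and the MCS covers 6 of them.
score = tanimoto_from_mcs(6, 6, 7)
print(round(score, 3))  # 0.857
```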

API Reference

SMILES Parsing

from smsd import parse_smiles, to_smiles

mol = parse_smiles("c1ccccc1")
print(mol.n)              # 6 atoms
print(mol.atomic_num)     # [6, 6, 6, 6, 6, 6]
print(mol.aromatic)       # [True, True, True, True, True, True]

smi = to_smiles(mol)      # canonical SMILES string

Substructure Search

from smsd import parse_smiles, is_substructure, find_substructure, ChemOptions

query  = parse_smiles("c1ccccc1")
target = parse_smiles("c1ccc(O)cc1")

# Boolean check
if is_substructure(query, target):
    print("Query is a substructure of target")

# Get atom mapping
mapping = find_substructure(query, target)
# Returns list of (query_atom, target_atom) pairs
for qi, ti in mapping:
    print(f"  query atom {qi} -> target atom {ti}")

# Custom options
opts = ChemOptions()
opts.ring_matches_ring_only = True
is_substructure(query, target, opts=opts, timeout_ms=5000)
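
To make the shape of the returned mapping concrete, here is a minimal pure-Python sketch of substructure matching by naive backtracking. It is not VF2++ and not SMSD's implementation; the graph representation (a list of atom labels plus an adjacency dict) and the names `ring` and `find_mapping` are assumptions made for illustration, and the atom indices are hand-assigned rather than taken from any SMILES parser.

```python
def ring(n):
    """Adjacency dict for an n-membered ring: atom i bonds to i-1 and i+1."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def find_mapping(query, target):
    """Return sorted (query_atom, target_atom) pairs, or [] if no match exists."""
    q_labels, q_adj = query
    t_labels, t_adj = target
    assignment = {}  # query atom -> target atom
    used = set()     # target atoms already taken

    def extend(qi):
        if qi == len(q_labels):
            return True  # every query atom has been placed
        for ti in range(len(t_labels)):
            if ti in used or q_labels[qi] != t_labels[ti]:
                continue
            # Each already-mapped query neighbour must land on a neighbour of ti.
            if all(assignment[qn] in t_adj[ti]
                   for qn in q_adj[qi] if qn in assignment):
                assignment[qi] = ti
                used.add(ti)
                if extend(qi + 1):
                    return True
                del assignment[qi]  # undo and try the next candidate
                used.discard(ti)
        return False

    return sorted(assignment.items()) if extend(0) else []

# Benzene ring as the query; phenol is the same ring with an O on atom 3.
benzene = (["c"] * 6, ring(6))
phenol_adj = ring(6)
phenol_adj[3].add(6)
phenol_adj[6] = {3}
phenol = (["c"] * 6 + ["O"], phenol_adj)

mapping = find_mapping(benzene, phenol)
print(mapping)  # [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]
```

The search is exponential in the worst case, which is why production matchers add candidate ordering and pruning on top of the same basic idea.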

MCS Search

from smsd import parse_smiles, find_mcs, ChemOptions, McsOptions

g1 = parse_smiles("CC(=O)Oc1ccccc1C(=O)O")   # aspirin
g2 = parse_smiles("CC(=O)Nc1ccc(O)cc1")       # acetaminophen

# Default MCS
mapping = find_mcs(g1, g2)
print(f"MCS size: {len(mapping)}")

# Tautomer-aware MCS
taut = ChemOptions.tautomer_profile()
mapping = find_mcs(g1, g2, chem=taut)

# With MCS options
mcs_opts = McsOptions()
mcs_opts.timeout_ms = 5000
mcs_opts.connected_only = True
mapping = find_mcs(g1, g2, opts=mcs_opts)

# Convenience wrapper (accepts SMILES strings directly)
from smsd import mcs
mapping = mcs("c1ccccc1", "Cc1ccccc1", tautomer_aware=True)
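
A common follow-up to an MCS call is to ask which atoms fall outside the common core. The text above only shows `len(mapping)`, so the exact return type of `find_mcs` is not assumed here: the hypothetical helper below accepts either a dict or an iterable of `(atom_in_first, atom_in_second)` pairs, and the example mapping is written by hand rather than produced by smsd.

```python
def unmatched_atoms(mapping, n_first, n_second):
    """Return the atom indices of each molecule that the mapping leaves out."""
    pairs = dict(mapping)  # works for a dict or for a list of (i, j) pairs
    first_left = sorted(set(range(n_first)) - set(pairs.keys()))
    second_left = sorted(set(range(n_second)) - set(pairs.values()))
    return first_left, second_left

# Hypothetical mapping: benzene atoms 0-5 onto the ring atoms 1-6 of toluene,
# where toluene atom 0 is taken to be the methyl carbon.
example = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
leftover = unmatched_atoms(example, 6, 7)
print(leftover)  # ([], [0])
```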

Similarity and Screening

from smsd import parse_smiles, similarity_upper_bound, screen_targets, similarity

g1 = parse_smiles("c1ccccc1")
g2 = parse_smiles("Cc1ccccc1")

# Single pair
sim = similarity_upper_bound(g1, g2)
print(f"Similarity: {sim:.3f}")

# Convenience wrapper (accepts SMILES)
sim = similarity("c1ccccc1", "Cc1ccccc1")

# Batch screening
smiles_list = ["c1ccccc1", "Cc1ccccc1", "CCO"]  # example library
library = [parse_smiles(s) for s in smiles_list]
query = parse_smiles("c1ccccc1")
hits = screen_targets(query, library, threshold=0.5)
# Returns indices of library molecules with similarity >= 0.5
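
The idea behind a cheap similarity upper bound can be sketched without the library. A common substructure can use each atom or bond label at most as often as it occurs in both molecules, so counting shared labels bounds the size of any MCS from above, and a Tanimoto-style ratio of that count bounds the similarity. The code below is an illustration under that assumption, not SMSD's actual formula; the label strings and the names `shared_labels`, `upper_bound`, and `screen` are hypothetical.

```python
from collections import Counter

def shared_labels(labels_a, labels_b):
    """Size of the multiset intersection of two label lists."""
    return sum((Counter(labels_a) & Counter(labels_b)).values())

def upper_bound(mol_a, mol_b):
    """Tanimoto-style bound; each molecule is (atom_labels, bond_labels)."""
    atoms_a, bonds_a = mol_a
    atoms_b, bonds_b = mol_b
    shared = shared_labels(atoms_a, atoms_b) + shared_labels(bonds_a, bonds_b)
    total = len(atoms_a) + len(bonds_a) + len(atoms_b) + len(bonds_b)
    union = total - shared
    return shared / union if union else 1.0

def screen(query, library, threshold):
    """Indices of library molecules whose bound reaches the threshold."""
    return [i for i, mol in enumerate(library)
            if upper_bound(query, mol) >= threshold]

# Hand-written label lists standing in for parsed molecules.
benzene = (["c"] * 6, ["c:c"] * 6)
toluene = (["c"] * 6 + ["C"], ["c:c"] * 6 + ["c-C"])
ethanol = (["C", "C", "O"], ["C-C", "C-O"])

hits = screen(benzene, [benzene, toluene, ethanol], threshold=0.5)
print(hits)  # [0, 1]
```

Because the bound never underestimates, molecules it rejects can be skipped safely, and only the survivors need an exact MCS computation.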

Fingerprints

from smsd import (
    parse_smiles,
    path_fingerprint, mcs_fingerprint,
    fingerprint_subset, analyze_fp_quality,
    fingerprint, tanimoto,
)

mol = parse_smiles("c1ccccc1")

# Path fingerprint (returns set bit positions)
fp = path_fingerprint(mol, path_length=7, fp_size=2048)

# MCS-aware fingerprint
fp_mcs = mcs_fingerprint(mol, path_length=7, fp_size=2048)

# Subset check (for substructure pre-screening)
query_fp = path_fingerprint(parse_smiles("c1ccccc1"))
target_fp = path_fingerprint(parse_smiles("c1ccc(O)cc1"))
assert fingerprint_subset(query_fp, target_fp)

# Quality analysis
quality = analyze_fp_quality(fp)
print(quality)  # {'set_bits': 12, 'density': 0.006, ...}

# Convenience wrappers
fp = fingerprint("CCO", kind="mcs")
sim = tanimoto(fp, fingerprint("CCCO"))
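
Since a path fingerprint is described above as a collection of set bit positions, both the Tanimoto coefficient and the subset pre-screen reduce to set operations. The sketch below shows that reduction in plain Python; `tanimoto_bits` and `is_bit_subset` are hypothetical stand-ins for illustration, the bit positions are made up, and returning 1.0 for two empty fingerprints is a convention chosen here, not something the text specifies.

```python
def tanimoto_bits(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as set bit positions."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def is_bit_subset(query_fp, target_fp):
    """True if every bit set in the query is also set in the target."""
    return set(query_fp) <= set(target_fp)

# Made-up bit positions standing in for real fingerprints.
query_bits = [3, 17, 42]
target_bits = [3, 17, 42, 99, 512]

score = tanimoto_bits(query_bits, target_bits)
print(score)                                   # 0.6
print(is_bit_subset(query_bits, target_bits))  # True
print(is_bit_subset(target_bits, query_bits))  # False
```

The subset test is a necessary condition only: a failed test rules out a substructure match, while a passed test still needs confirmation by an exact search because different paths can hash to the same bit.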

MolGraph Builder

Build molecules directly without SMILES:

from smsd import MolGraphBuilder

builder = MolGraphBuilder(6)  # 6 atoms
for i in range(6):
    builder.atom(i, 6, charge=0, aromatic=True, in_ring=True)
for i in range(6):
    builder.bond(i, (i + 1) % 6, order=1, in_ring=True, aromatic=True)
benzene = builder.build()

Configuration

from smsd import ChemOptions, McsOptions, BondOrderMode, RingFusionMode

# ChemOptions controls atom/bond matching
chem = ChemOptions()
chem.match_atom_type = True
chem.match_formal_charge = True
chem.tautomer_aware = True
chem.complete_rings_only = True
chem.match_bond_order = BondOrderMode.LOOSE
chem.ring_fusion_mode = RingFusionMode.STRICT

# Named profiles
chem = ChemOptions.tautomer_profile()   # tautomer-aware defaults
chem = ChemOptions.profile("strict")    # strict matching

# McsOptions controls MCS algorithm behavior
opts = McsOptions()
opts.connected_only = True
opts.timeout_ms = 10000
opts.maximize_bonds = True  # MCES mode

Running Tests

cd python/
pip install -e ".[dev]"
pytest tests/ -v

License

Apache 2.0. Copyright (c) 2009-2026 Syed Asad Rahman, BioInception Labs.
