SMSD -- Substructure & MCS search for chemical graphs

SMSD Python Bindings

Python interface for SMSD (Small Molecule Substructure Detector) -- a high-performance library for substructure search, Maximum Common Substructure (MCS), and molecular similarity.

Features

  • SMILES parsing and writing -- built-in OpenSMILES parser, no RDKit or CDK required
  • Substructure search -- VF2++ subgraph isomorphism
  • MCS search -- McSplit with seed-and-extend, orbit pruning, coverage-driven termination
  • Tautomer-aware matching -- keto/enol, amide, imidazole tautomer equivalence
  • RASCAL screening -- O(V+E) Tanimoto-like similarity upper bound
  • Fingerprints -- path-based and MCS-aware fingerprints for pre-screening

Requirements

  • Python >= 3.8
  • C++17 compiler (GCC 7+, Clang 5+, MSVC 2019+)
  • CMake >= 3.15
  • pybind11 >= 2.12

Installation

From source (recommended)

cd python/
pip install .

Development install

pip install -e ".[dev]"

Build with specific compiler

CMAKE_ARGS="-DCMAKE_CXX_COMPILER=g++-13" pip install .

Quick Start

from smsd import parse_smiles, find_mcs, is_substructure, similarity

# Parse SMILES strings
benzene = parse_smiles("c1ccccc1")
phenol  = parse_smiles("c1ccc(O)cc1")

# Substructure search
assert is_substructure(benzene, phenol)  # benzene is in phenol

# Maximum Common Substructure
mcs = find_mcs(benzene, phenol)
print(f"MCS size: {len(mcs)} atoms")  # 6

# Similarity
sim = similarity(benzene, phenol)
print(f"Similarity: {sim:.3f}")  # ~0.857

API Reference

SMILES Parsing

from smsd import parse_smiles, to_smiles

mol = parse_smiles("c1ccccc1")
print(mol.n)              # 6 atoms
print(mol.atomic_num)     # [6, 6, 6, 6, 6, 6]
print(mol.aromatic)       # [True, True, True, True, True, True]

smi = to_smiles(mol)      # canonical SMILES string

Substructure Search

from smsd import parse_smiles, is_substructure, find_substructure, ChemOptions

query  = parse_smiles("c1ccccc1")
target = parse_smiles("c1ccc(O)cc1")

# Boolean check
if is_substructure(query, target):
    print("Query is a substructure of target")

# Get atom mapping
mapping = find_substructure(query, target)
# Returns list of (query_atom, target_atom) pairs
for qi, ti in mapping:
    print(f"  query atom {qi} -> target atom {ti}")

# Custom options
opts = ChemOptions()
opts.ring_matches_ring_only = True
is_substructure(query, target, opts=opts, timeout_ms=5000)
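To make the shape of the returned mapping concrete, here is a minimal pure-Python sketch of label-preserving subgraph matching by plain backtracking. It is not the VF2++ implementation that SMSD uses and it does not call the smsd package. The function name `find_mapping_sketch` and the atoms/bonds list representation are assumptions made only for this illustration.

```python
def find_mapping_sketch(q_atoms, q_bonds, t_atoms, t_bonds):
    """Return a list of (query_atom, target_atom) pairs, or [] if no match.

    atoms: list of element symbols; bonds: list of (i, j) atom-index pairs.
    Hypothetical helper for illustration; plain backtracking, not VF2++.
    """
    def adjacency(n, bonds):
        adj = {i: set() for i in range(n)}
        for i, j in bonds:
            adj[i].add(j)
            adj[j].add(i)
        return adj

    q_adj = adjacency(len(q_atoms), q_bonds)
    t_adj = adjacency(len(t_atoms), t_bonds)
    mapping = {}   # query index -> target index
    used = set()   # target atoms already taken

    def extend(qi):
        if qi == len(q_atoms):
            return True  # every query atom has been placed
        for ti in range(len(t_atoms)):
            if ti in used or q_atoms[qi] != t_atoms[ti]:
                continue
            # Each already-mapped query neighbour must land on a target neighbour.
            if all(mapping[n] in t_adj[ti] for n in q_adj[qi] if n in mapping):
                mapping[qi] = ti
                used.add(ti)
                if extend(qi + 1):
                    return True
                del mapping[qi]
                used.discard(ti)
        return False

    return sorted(mapping.items()) if extend(0) else []


# Benzene ring as the query, phenol (ring plus O on atom 3) as the target.
ring_bonds = [(i, (i + 1) % 6) for i in range(6)]
benzene = (["C"] * 6, ring_bonds)
phenol = (["C"] * 6 + ["O"], ring_bonds + [(3, 6)])

pairs = find_mapping_sketch(*benzene, *phenol)
for qi, ti in pairs:
    print(f"  query atom {qi} -> target atom {ti}")
```

The sketch only compares element symbols and bond presence. The options shown above (ring constraints, bond order, timeouts) have no counterpart in it.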

MCS Search

from smsd import parse_smiles, find_mcs, ChemOptions, McsOptions

g1 = parse_smiles("CC(=O)Oc1ccccc1C(=O)O")   # aspirin
g2 = parse_smiles("CC(=O)Nc1ccc(O)cc1")       # acetaminophen

# Default MCS
mapping = find_mcs(g1, g2)
print(f"MCS size: {len(mapping)}")

# Tautomer-aware MCS
taut = ChemOptions.tautomer_profile()
mapping = find_mcs(g1, g2, chem=taut)

# With MCS options
mcs_opts = McsOptions()
mcs_opts.timeout_ms = 5000
mcs_opts.connected_only = True
mapping = find_mcs(g1, g2, opts=mcs_opts)

# Convenience wrapper (accepts SMILES strings directly)
from smsd import mcs
mapping = mcs("c1ccccc1", "Cc1ccccc1", tautomer_aware=True)
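For intuition about what an MCS mapping contains, the following is a small exhaustive branch-and-bound sketch in pure Python. It finds a maximum common induced subgraph on tiny inputs and is explicitly not McSplit, nor SMSD's implementation; it ignores connectivity, bond order, and tautomers. The name `max_common_subgraph_sketch` and the atoms/bonds representation are assumptions for this example only.

```python
def max_common_subgraph_sketch(atoms1, bonds1, atoms2, bonds2):
    """Return a largest label- and adjacency-preserving mapping as sorted pairs.

    atoms: list of element symbols; bonds: list of (i, j) atom-index pairs.
    Hypothetical helper; exponential time, suitable for toy molecules only.
    """
    e1 = {frozenset(b) for b in bonds1}
    e2 = {frozenset(b) for b in bonds2}
    n1, n2 = len(atoms1), len(atoms2)
    best = {}
    mapping = {}   # atom in molecule 1 -> atom in molecule 2
    used = set()   # atoms of molecule 2 already mapped

    def search(i):
        nonlocal best
        # Bound: even mapping every remaining atom cannot beat the best so far.
        if len(mapping) + (n1 - i) <= len(best):
            return
        if i == n1:
            best = dict(mapping)
            return
        for j in range(n2):
            if j in used or atoms1[i] != atoms2[j]:
                continue
            # Induced condition: bonded in one molecule iff bonded in the other.
            if all((frozenset((i, k)) in e1) == (frozenset((j, m)) in e2)
                   for k, m in mapping.items()):
                mapping[i] = j
                used.add(j)
                search(i + 1)
                del mapping[i]
                used.discard(j)
        search(i + 1)  # also try leaving atom i unmapped

    search(0)
    return sorted(best.items())


# Ethanol (C-C-O) versus 1-propanol (C-C-C-O), heavy atoms only.
ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
propanol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])

common = max_common_subgraph_sketch(*ethanol, *propanol)
print(f"MCS size: {len(common)}")
```

As with the library call, the length of the returned mapping is the MCS size in atoms.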

Similarity and Screening

from smsd import parse_smiles, similarity_upper_bound, screen_targets, similarity

g1 = parse_smiles("c1ccccc1")
g2 = parse_smiles("Cc1ccccc1")

# Single pair
sim = similarity_upper_bound(g1, g2)
print(f"Similarity: {sim:.3f}")

# Convenience wrapper (accepts SMILES)
sim = similarity("c1ccccc1", "Cc1ccccc1")

# Batch screening
smiles_list = ["c1ccccc1", "Cc1ccccc1", "CCO"]  # example library; substitute your own SMILES
library = [parse_smiles(s) for s in smiles_list]
query = parse_smiles("c1ccccc1")
hits = screen_targets(query, library, threshold=0.5)
# Returns indices of molecules with similarity >= 0.5
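The idea behind an O(V+E) similarity upper bound can be shown without the library. The sketch below is an illustration under stated assumptions, not SMSD's implementation. It assumes a similarity of the form (common atoms + common bonds)^2 / ((V1 + E1) * (V2 + E2)) and bounds the numerator by counting labels. The function name `upper_bound_sketch` and the atoms/bonds representation are hypothetical.

```python
from collections import Counter


def upper_bound_sketch(atoms1, bonds1, atoms2, bonds2):
    """Cheap upper bound on (common atoms + common bonds)^2 / ((V1+E1) * (V2+E2)).

    atoms: list of element symbols; bonds: list of (i, j) atom-index pairs.
    """
    def bond_labels(atoms, bonds):
        # Label each bond by the unordered pair of its end-atom symbols.
        return Counter(tuple(sorted((atoms[i], atoms[j]))) for i, j in bonds)

    # Multiset intersections: no common subgraph can contain more atoms or
    # bonds of a given label than the smaller count across the two molecules.
    common_atoms = sum((Counter(atoms1) & Counter(atoms2)).values())
    common_bonds = sum(
        (bond_labels(atoms1, bonds1) & bond_labels(atoms2, bonds2)).values()
    )

    denom = (len(atoms1) + len(bonds1)) * (len(atoms2) + len(bonds2))
    return (common_atoms + common_bonds) ** 2 / denom if denom else 0.0


# Benzene (6 C, 6 ring bonds) versus toluene (benzene plus one methyl carbon).
benzene_atoms = ["C"] * 6
benzene_bonds = [(i, (i + 1) % 6) for i in range(6)]
toluene_atoms = ["C"] * 7
toluene_bonds = benzene_bonds + [(0, 6)]

bound = upper_bound_sketch(benzene_atoms, benzene_bonds, toluene_atoms, toluene_bonds)
print(f"Upper bound: {bound:.3f}")  # 144 / 168, prints 0.857
```

Because such a bound never underestimates the exact value, a pair whose bound falls below the screening threshold can be skipped without running the exact search.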

Fingerprints

from smsd import (
    parse_smiles,
    path_fingerprint, mcs_fingerprint,
    fingerprint_subset, analyze_fp_quality,
    fingerprint, tanimoto,
)

mol = parse_smiles("c1ccccc1")

# Path fingerprint (returns set bit positions)
fp = path_fingerprint(mol, path_length=7, fp_size=2048)

# MCS-aware fingerprint
fp_mcs = mcs_fingerprint(mol, path_length=7, fp_size=2048)

# Subset check (for substructure pre-screening)
query_fp = path_fingerprint(parse_smiles("c1ccccc1"))
target_fp = path_fingerprint(parse_smiles("c1ccc(O)cc1"))
assert fingerprint_subset(query_fp, target_fp)

# Quality analysis
quality = analyze_fp_quality(fp)
print(quality)  # {'set_bits': 12, 'density': 0.006, ...}

# Convenience wrappers
fp = fingerprint("CCO", kind="mcs")
sim = tanimoto(fp, fingerprint("CCCO"))
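To show why a subset check works as a substructure pre-screen, here is a self-contained sketch of a path-based fingerprint stored as a set of bit positions, together with subset and Tanimoto helpers. It does not reproduce SMSD's hashing or bit layout. The names `path_fingerprint_sketch`, `is_subset_sketch`, and `tanimoto_sketch`, and the use of CRC32 for hashing, are assumptions for illustration.

```python
import zlib


def path_fingerprint_sketch(atoms, bonds, path_length=7, fp_size=2048):
    """Return set bit positions for all simple paths of up to path_length atoms.

    atoms: list of element symbols; bonds: list of (i, j) atom-index pairs.
    """
    adj = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)

    bits = set()

    def walk(path):
        label = tuple(atoms[i] for i in path)
        # A path and its reverse are the same fragment; keep one spelling.
        label = min(label, label[::-1])
        # CRC32 is deterministic across runs, unlike Python's built-in hash().
        bits.add(zlib.crc32("-".join(label).encode()) % fp_size)
        if len(path) < path_length:
            for nxt in adj[path[-1]]:
                if nxt not in path:
                    walk(path + [nxt])

    for start in range(len(atoms)):
        walk([start])
    return bits


def is_subset_sketch(query_bits, target_bits):
    """True if every bit set in the query is also set in the target."""
    return set(query_bits) <= set(target_bits)


def tanimoto_sketch(a, b):
    """Tanimoto coefficient of two bit-position sets: |a & b| / |a | b|."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


ring_bonds = [(i, (i + 1) % 6) for i in range(6)]
benzene_fp = path_fingerprint_sketch(["C"] * 6, ring_bonds)
phenol_fp = path_fingerprint_sketch(["C"] * 6 + ["O"], ring_bonds + [(3, 6)])

print(is_subset_sketch(benzene_fp, phenol_fp))
print(f"{tanimoto_sketch(benzene_fp, phenol_fp):.3f}")
```

Every path in a substructure also occurs in the molecule that contains it, so a failed subset check rules out a match. A passing check does not prove one, since unrelated paths can hash to the same bit, and the exact search is still needed.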

MolGraph Builder

Build molecules directly without SMILES:

from smsd import MolGraphBuilder

builder = MolGraphBuilder(6)  # 6 atoms
for i in range(6):
    builder.atom(i, 6, charge=0, aromatic=True, in_ring=True)
for i in range(6):
    builder.bond(i, (i + 1) % 6, order=1, in_ring=True, aromatic=True)
benzene = builder.build()

Configuration

from smsd import ChemOptions, McsOptions, BondOrderMode, RingFusionMode

# ChemOptions controls atom/bond matching
chem = ChemOptions()
chem.match_atom_type = True
chem.match_formal_charge = True
chem.tautomer_aware = True
chem.complete_rings_only = True
chem.match_bond_order = BondOrderMode.LOOSE
chem.ring_fusion_mode = RingFusionMode.STRICT

# Named profiles
chem = ChemOptions.tautomer_profile()   # tautomer-aware defaults
chem = ChemOptions.profile("strict")    # strict matching

# McsOptions controls MCS algorithm behavior
opts = McsOptions()
opts.connected_only = True
opts.timeout_ms = 10000
opts.maximize_bonds = True  # MCES mode

Running Tests

cd python/
pip install -e ".[dev]"
pytest tests/ -v

License

Apache 2.0. Copyright (c) 2018-2026 BioInception PVT LTD. Algorithm Copyright (c) 2009-2026 Syed Asad Rahman.
