SMSD -- Substructure & MCS search for chemical graphs

SMSD Python Bindings

Python interface for SMSD (Small Molecule Substructure Detector) -- a high-performance library for substructure search, Maximum Common Substructure (MCS), and molecular similarity.

Features

  • SMILES parsing and writing -- built-in OpenSMILES parser, no RDKit or CDK required
  • Substructure search -- VF2++ subgraph isomorphism
  • MCS search -- McSplit with seed-and-extend, orbit pruning, coverage-driven termination
  • Tautomer-aware matching -- keto/enol, amide, imidazole tautomer equivalence
  • RASCAL screening -- O(V+E) Tanimoto-like similarity upper bound
  • Fingerprints -- path-based and MCS-aware fingerprints for pre-screening

Requirements

  • Python >= 3.8
  • C++17 compiler (GCC 7+, Clang 5+, MSVC 2019+)
  • CMake >= 3.15
  • pybind11 >= 2.12

Installation

From source (recommended)

cd python/
pip install .

Development install

pip install -e ".[dev]"

Build with specific compiler

CMAKE_ARGS="-DCMAKE_CXX_COMPILER=g++-13" pip install .

Quick Start

from smsd import parse_smiles, find_mcs, is_substructure, similarity

# Parse SMILES strings
benzene = parse_smiles("c1ccccc1")
phenol  = parse_smiles("c1ccc(O)cc1")

# Substructure search
assert is_substructure(benzene, phenol)  # benzene is in phenol

# Maximum Common Substructure
mcs = find_mcs(benzene, phenol)
print(f"MCS size: {len(mcs)} atoms")  # 6

# Similarity (convenience wrapper; takes SMILES strings, see "Similarity and Screening")
sim = similarity("c1ccccc1", "c1ccc(O)cc1")
print(f"Similarity: {sim:.3f}")  # ~0.857

API Reference

SMILES Parsing

from smsd import parse_smiles, to_smiles

mol = parse_smiles("c1ccccc1")
print(mol.n)              # 6 atoms
print(mol.atomic_num)     # [6, 6, 6, 6, 6, 6]
print(mol.aromatic)       # [True, True, True, True, True, True]

smi = to_smiles(mol)      # canonical SMILES string

Substructure Search

from smsd import parse_smiles, is_substructure, find_substructure, ChemOptions

query  = parse_smiles("c1ccccc1")
target = parse_smiles("c1ccc(O)cc1")

# Boolean check
if is_substructure(query, target):
    print("Query is a substructure of target")

# Get atom mapping
mapping = find_substructure(query, target)
# Returns list of (query_atom, target_atom) pairs
for qi, ti in mapping:
    print(f"  query atom {qi} -> target atom {ti}")

# Custom options
opts = ChemOptions()
opts.ring_matches_ring_only = True
is_substructure(query, target, opts=opts, timeout_ms=5000)
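
The mapping returned above is described as a list of (query_atom, target_atom) pairs. As a rough illustration of what such a mapping has to satisfy, here is a minimal pure-Python sketch that checks a candidate mapping against hand-written bond lists. It does not call SMSD; the helper name `is_valid_embedding` and the atom numbering (atoms indexed in SMILES order) are assumptions made for this example only.

```python
from typing import Iterable, Tuple

Bond = Tuple[int, int]


def is_valid_embedding(
    mapping: Iterable[Tuple[int, int]],
    query_bonds: Iterable[Bond],
    target_bonds: Iterable[Bond],
) -> bool:
    """Check that a (query_atom, target_atom) mapping preserves every query bond.

    Hypothetical helper for illustration; not part of the smsd package.
    """
    pairs = list(mapping)
    q_to_t = dict(pairs)
    # The mapping must be one-to-one: no query or target atom used twice.
    if len(q_to_t) != len(pairs) or len(set(q_to_t.values())) != len(pairs):
        return False
    target_set = {frozenset(bond) for bond in target_bonds}
    for a, b in query_bonds:
        if a not in q_to_t or b not in q_to_t:
            return False  # a bonded query atom was left unmapped
        if frozenset((q_to_t[a], q_to_t[b])) not in target_set:
            return False  # the bond has no counterpart in the target
    return True


# Benzene "c1ccccc1": six ring atoms 0..5 (assumed SMILES order).
benzene_bonds = [(i, (i + 1) % 6) for i in range(6)]
# Phenol "c1ccc(O)cc1": atom 4 is the oxygen, attached to ring atom 3.
phenol_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (3, 5), (5, 6), (6, 0)]

good = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 5), (5, 6)]
bad = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 6)]  # maps a ring atom onto O

print(is_valid_embedding(good, benzene_bonds, phenol_bonds))  # True
print(is_valid_embedding(bad, benzene_bonds, phenol_bonds))   # False
```

This checks graph structure only. Atom types, bond orders, and the ChemOptions settings shown above are deliberately ignored in the sketch.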

MCS Search

from smsd import parse_smiles, find_mcs, ChemOptions, McsOptions

g1 = parse_smiles("CC(=O)Oc1ccccc1C(=O)O")   # aspirin
g2 = parse_smiles("CC(=O)Nc1ccc(O)cc1")       # acetaminophen

# Default MCS
mapping = find_mcs(g1, g2)
print(f"MCS size: {len(mapping)}")

# Tautomer-aware MCS
taut = ChemOptions.tautomer_profile()
mapping = find_mcs(g1, g2, chem=taut)

# With MCS options
mcs_opts = McsOptions()
mcs_opts.timeout_ms = 5000
mcs_opts.connected_only = True
mapping = find_mcs(g1, g2, opts=mcs_opts)

# Convenience wrapper (accepts SMILES strings directly)
from smsd import mcs
mapping = mcs("c1ccccc1", "Cc1ccccc1", tautomer_aware=True)
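
The `connected_only` option above restricts the result to a single connected fragment. As a hedged illustration of what "connected" means for a set of matched atoms, the sketch below runs a breadth-first search over a hand-written bond list. It is independent of SMSD; `is_connected` is a hypothetical helper, and the atom numbering for acetaminophen assumes atoms are indexed in SMILES order.

```python
from collections import deque
from typing import Iterable, Tuple


def is_connected(atoms: Iterable[int], bonds: Iterable[Tuple[int, int]]) -> bool:
    """Return True if `atoms` form one connected fragment using only `bonds`
    whose two ends are both in `atoms`. Hypothetical helper, not smsd API."""
    atom_set = set(atoms)
    if not atom_set:
        return True
    # Adjacency restricted to the selected atoms.
    adj = {a: set() for a in atom_set}
    for a, b in bonds:
        if a in atom_set and b in atom_set:
            adj[a].add(b)
            adj[b].add(a)
    start = next(iter(atom_set))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adj[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen == atom_set


# Acetaminophen "CC(=O)Nc1ccc(O)cc1", atoms numbered in assumed SMILES order:
# 0 C, 1 C, 2 O, 3 N, 4-7 ring c, 8 O, 9-10 ring c.
acetaminophen_bonds = [
    (0, 1), (1, 2), (1, 3), (3, 4), (4, 5), (5, 6),
    (6, 7), (7, 8), (7, 9), (9, 10), (10, 4),
]

ring = {4, 5, 6, 7, 9, 10}
ring_plus_methyl = ring | {0}  # methyl carbon without the atoms linking it

print(is_connected(ring, acetaminophen_bonds))              # True
print(is_connected(ring_plus_methyl, acetaminophen_bonds))  # False
```

A disconnected match such as the second one is the kind of result a connected-only search would exclude.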

Similarity and Screening

from smsd import parse_smiles, similarity_upper_bound, screen_targets, similarity

g1 = parse_smiles("c1ccccc1")
g2 = parse_smiles("Cc1ccccc1")

# Single pair
sim = similarity_upper_bound(g1, g2)
print(f"Similarity: {sim:.3f}")

# Convenience wrapper (accepts SMILES)
sim = similarity("c1ccccc1", "Cc1ccccc1")

# Batch screening
smiles_list = ["Cc1ccccc1", "c1ccc(O)cc1", "CCO"]  # example library
library = [parse_smiles(s) for s in smiles_list]
query = parse_smiles("c1ccccc1")
hits = screen_targets(query, library, threshold=0.5)
# Returns indices of molecules with similarity >= 0.5
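
To show why a cheap upper bound is useful for screening, here is a simplified pure-Python sketch. It bounds a Tanimoto-like similarity using element counts only, then keeps the library entries whose bound reaches the threshold. This is an assumption-laden illustration, not SMSD's RASCAL implementation: the function names are hypothetical, molecules are represented as plain lists of element symbols, and bonds are ignored.

```python
from collections import Counter
from typing import List, Sequence


def element_upper_bound(a: Sequence[str], b: Sequence[str]) -> float:
    """Upper bound on |common| / (|a| + |b| - |common|) from element counts.

    No common substructure can contain more atoms of an element than the
    smaller of the two counts, so the multiset intersection caps its size.
    """
    common = sum((Counter(a) & Counter(b)).values())
    union = len(a) + len(b) - common
    return common / union if union else 1.0


def screen(query: Sequence[str], library: List[Sequence[str]], threshold: float) -> List[int]:
    """Return indices of library entries whose upper bound is >= threshold."""
    return [
        i for i, mol in enumerate(library)
        if element_upper_bound(query, mol) >= threshold
    ]


benzene = ["C"] * 6           # heavy atoms only
toluene = ["C"] * 7
phenol = ["C"] * 6 + ["O"]
water = ["O"]

library = [toluene, phenol, water]
hits = screen(benzene, library, threshold=0.5)
print(hits)  # [0, 1]
print(round(element_upper_bound(benzene, phenol), 3))  # 0.857
```

Entries rejected by the bound (water, in this example) can be skipped without running a full MCS search, because their true similarity cannot exceed the bound.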

Fingerprints

from smsd import (
    path_fingerprint, mcs_fingerprint,
    fingerprint_subset, analyze_fp_quality,
    fingerprint, tanimoto, parse_smiles,
)

mol = parse_smiles("c1ccccc1")

# Path fingerprint (returns set bit positions)
fp = path_fingerprint(mol, path_length=7, fp_size=2048)

# MCS-aware fingerprint
fp_mcs = mcs_fingerprint(mol, path_length=7, fp_size=2048)

# Subset check (for substructure pre-screening)
query_fp = path_fingerprint(parse_smiles("c1ccccc1"))
target_fp = path_fingerprint(parse_smiles("c1ccc(O)cc1"))
assert fingerprint_subset(query_fp, target_fp)

# Quality analysis
quality = analyze_fp_quality(fp)
print(quality)  # {'set_bits': 12, 'density': 0.006, ...}

# Convenience wrappers
fp = fingerprint("CCO", kind="mcs")
sim = tanimoto(fp, fingerprint("CCCO"))
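
The path fingerprint above is described as a collection of set bit positions. Assuming that representation, the subset pre-screen and the Tanimoto coefficient reduce to ordinary set operations. The sketch below is a pure-Python illustration with made-up bit positions; `may_contain` and `tanimoto_bits` are hypothetical names, not the library's functions.

```python
from typing import Iterable


def may_contain(query_bits: Iterable[int], target_bits: Iterable[int]) -> bool:
    """Pre-screen: a target can contain the query only if every bit set in the
    query fingerprint is also set in the target fingerprint."""
    return set(query_bits) <= set(target_bits)


def tanimoto_bits(a: Iterable[int], b: Iterable[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| over set bit positions."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    # Two empty fingerprints are treated as identical here (a convention).
    return len(set_a & set_b) / len(union) if union else 1.0


# Made-up bit positions for illustration only.
query_bits = {3, 17, 42}
target_bits = {3, 17, 42, 99, 512}

print(may_contain(query_bits, target_bits))    # True
print(may_contain(target_bits, query_bits))    # False
print(tanimoto_bits(query_bits, target_bits))  # 0.6
```

A passing subset check does not prove a substructure match, since different paths can hash to the same bit. It only rules out targets that cannot match, so a full substructure search is still needed for the survivors.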

MolGraph Builder

Build molecules directly without SMILES:

from smsd import MolGraphBuilder

builder = MolGraphBuilder(6)  # 6 atoms
for i in range(6):
    builder.atom(i, 6, charge=0, aromatic=True, in_ring=True)
for i in range(6):
    builder.bond(i, (i + 1) % 6, order=1, in_ring=True, aromatic=True)
benzene = builder.build()
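
The bond loop above relies on `(i + 1) % 6` to wrap the last atom back to the first and close the ring. A small standalone sketch of the same indexing, using a hypothetical `ring_bonds` helper that is not part of smsd, makes the generated atom pairs explicit:

```python
from typing import List, Tuple


def ring_bonds(n: int) -> List[Tuple[int, int]]:
    """Atom index pairs for a simple n-membered ring (hypothetical helper).

    The modulo wraps the final atom back to atom 0, closing the ring.
    """
    return [(i, (i + 1) % n) for i in range(n)]


print(ring_bonds(6))  # [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
```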

Configuration

from smsd import ChemOptions, McsOptions, BondOrderMode, RingFusionMode

# ChemOptions controls atom/bond matching
chem = ChemOptions()
chem.match_atom_type = True
chem.match_formal_charge = True
chem.tautomer_aware = True
chem.complete_rings_only = True
chem.match_bond_order = BondOrderMode.LOOSE
chem.ring_fusion_mode = RingFusionMode.STRICT

# Named profiles
chem = ChemOptions.tautomer_profile()   # tautomer-aware defaults
chem = ChemOptions.profile("strict")    # strict matching

# McsOptions controls MCS algorithm behavior
opts = McsOptions()
opts.connected_only = True
opts.timeout_ms = 10000
opts.maximize_bonds = True  # MCES mode

Running Tests

cd python/
pip install -e ".[dev]"
pytest tests/ -v

License

Apache 2.0. Copyright (c) 2018-2026 BioInception PVT LTD. Algorithm Copyright (c) 2009-2026 Syed Asad Rahman.

Download files

Source Distribution

  • smsd-5.7.0.tar.gz (450.7 kB)

Built Distributions

  • smsd-5.7.0-cp313-cp313-win_amd64.whl (444.9 kB) -- CPython 3.13, Windows x86-64
  • smsd-5.7.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (503.8 kB) -- CPython 3.13, manylinux: glibc 2.17+ x86-64
  • smsd-5.7.0-cp313-cp313-macosx_11_0_arm64.whl (453.6 kB) -- CPython 3.13, macOS 11.0+ ARM64
  • smsd-5.7.0-cp312-cp312-win_amd64.whl (444.9 kB) -- CPython 3.12, Windows x86-64
  • smsd-5.7.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (503.7 kB) -- CPython 3.12, manylinux: glibc 2.17+ x86-64
  • smsd-5.7.0-cp312-cp312-macosx_11_0_arm64.whl (453.6 kB) -- CPython 3.12, macOS 11.0+ ARM64
  • smsd-5.7.0-cp311-cp311-win_amd64.whl (443.4 kB) -- CPython 3.11, Windows x86-64
  • smsd-5.7.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (502.9 kB) -- CPython 3.11, manylinux: glibc 2.17+ x86-64
  • smsd-5.7.0-cp311-cp311-macosx_11_0_arm64.whl (453.4 kB) -- CPython 3.11, macOS 11.0+ ARM64
  • smsd-5.7.0-cp310-cp310-win_amd64.whl (442.7 kB) -- CPython 3.10, Windows x86-64
  • smsd-5.7.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (501.9 kB) -- CPython 3.10, manylinux: glibc 2.17+ x86-64
  • smsd-5.7.0-cp310-cp310-macosx_11_0_arm64.whl (452.2 kB) -- CPython 3.10, macOS 11.0+ ARM64

