Python bindings for COSMolKit

COSMolKit Python

COSMolKit is a Python package for molecule graph workflows, SMILES/SDF/MOL IO, coordinate access, Morgan fingerprints, molecule depiction, and high-throughput batch processing.

Current Python documentation: https://kit.cosmol.org/

Current note: COSMolKit already preserves supported MOL/SDF query semantics internally in Rust, but the Python package does not yet expose a public query AST or query-matching API. That surface is still pending design.

API Model: Copy-On-Write (COW) Molecule Values

COSMolKit's Python API uses copy-on-write (COW) value semantics for molecule transforms. Methods such as with_hydrogens(), without_hydrogens(), with_kekulized_bonds(), and with_2d_coords() return a new Molecule; they do not mutate the original object in place.

This is intentionally different from common RDKit Python workflows, where many operations mutate an RWMol or update a molecule object directly. In COSMolKit, keep the returned value:

from cosmolkit import Molecule

mol = Molecule.from_smiles("CCO")
mol_h = mol.with_hydrogens()

assert mol is not mol_h
print(mol.to_smiles())
print(mol_h.to_smiles())
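
The observable contract described above (a transform returns a new value and leaves the original untouched) can be illustrated without COSMolKit at all. The following is a minimal sketch using a frozen dataclass; ToyMolecule and its fields are hypothetical stand-ins, and the sketch shows only the value-style behaviour, not how COSMolKit shares or copies data internally.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyMolecule:
    """Hypothetical stand-in for a value-style molecule (not the cosmolkit class)."""

    atoms: tuple  # element symbols, e.g. ("C", "C", "O")
    explicit_h: int = 0  # number of explicit hydrogens in this toy model

    def with_hydrogens(self, count: int) -> "ToyMolecule":
        # Build and return a new value instead of mutating self.
        return replace(self, explicit_h=count)


mol = ToyMolecule(atoms=("C", "C", "O"))
mol_h = mol.with_hydrogens(6)

print(mol.explicit_h)    # the original is unchanged
print(mol_h.explicit_h)  # the returned value carries the change
```

Forgetting to bind the return value (calling mol.with_hydrogens(6) on its own line) silently discards the result, which is the main habit to adjust when coming from in-place APIs.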

Installation

pip install cosmolkit

Quick Start

from cosmolkit import Molecule

mol = Molecule.from_smiles("c1ccccc1O")
drawn = mol.with_2d_coords()

print(mol.to_smiles())
print(drawn.atoms()[0])

drawn.write_png("python/examples/output/phenol.png", width=400, height=300)

Morgan fingerprints are exposed as RDKit-style sparse bit vectors. The on_bits() output is a list of bit indexes set to 1 inside a fixed-length binary vector, not a dense neural embedding:

fp = mol.fingerprint_morgan(radius=2, n_bits=2048)

print(fp.n_bits())
print(fp.on_bits())
print(fp.tanimoto(Molecule.from_smiles("c1ccccc1").fingerprint_morgan()))
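
To make the sparse representation concrete, Tanimoto similarity can be computed directly from two lists of on-bit indexes as the number of shared bits divided by the number of distinct bits. The helper below is a plain-Python sketch, not COSMolKit's implementation; the function name, the sample indexes, and the choice to return 0.0 for two empty vectors are assumptions made for illustration.

```python
def tanimoto_from_on_bits(a, b):
    """Tanimoto similarity of two bit vectors given as lists of set-bit indexes."""
    set_a, set_b = set(a), set(b)
    union = len(set_a | set_b)
    if union == 0:
        # Both vectors are empty; returning 0.0 is a convention chosen for this sketch.
        return 0.0
    return len(set_a & set_b) / union


fp_a = [3, 17, 256, 1024]  # hypothetical on-bit indexes
fp_b = [3, 256, 900]
print(tanimoto_from_on_bits(fp_a, fp_b))  # 2 shared bits / 5 distinct bits = 0.4
```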

Additional output mirrors RDKit's Morgan provenance helpers:

result = mol.fingerprint_morgan_with_output(radius=2, n_bits=2048)
info = result.additional_output()

print(result.fingerprint().on_bits())
print(info.atom_counts())
print(info.bit_info_map())

Chiral tags and bond orders are Python IntEnum values, so code can compare against typed constants instead of strings:

from cosmolkit import BondOrder, ChiralTag, Molecule

chiral = Molecule.from_smiles("F[C@H](Cl)Br")
print(chiral.to_smiles())
print(chiral.to_smiles(isomeric_smiles=False))

for atom in chiral.atoms():
    if atom.chiral_tag() != ChiralTag.CHI_UNSPECIFIED:
        print(atom.idx(), atom.chiral_tag().name)

for bond in Molecule.from_smiles("C=C").bonds():
    if bond.bond_type() == BondOrder.DOUBLE:
        print("double bond:", bond.begin_atom_idx(), bond.end_atom_idx())

Read-only maps such as BOND_ORDER_MAP and CHIRAL_TAG_MAP convert external string names to the enum members when needed.
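
The general pattern of a read-only name-to-enum map can be sketched with the standard library alone. Everything below is hypothetical: ToyBondOrder, its members and integer values, and the upper-case keys are illustrative and may not match the contents of BOND_ORDER_MAP.

```python
from enum import IntEnum
from types import MappingProxyType


class ToyBondOrder(IntEnum):
    # Hypothetical members and values, for illustration only.
    SINGLE = 1
    DOUBLE = 2
    TRIPLE = 3


# MappingProxyType is a read-only view: lookups work, item assignment raises TypeError.
TOY_BOND_ORDER_MAP = MappingProxyType({member.name: member for member in ToyBondOrder})

order = TOY_BOND_ORDER_MAP["DOUBLE"]
print(order is ToyBondOrder.DOUBLE)
print(order == 2)  # IntEnum members also compare equal to plain integers
```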

Use Molecule.edit() when you want an explicit editing workflow:

editor = mol.edit()
cl = editor.add_atom("Cl")
editor.add_bond(0, cl, order="single")
mol2 = editor.commit()

Batch Workflows

from cosmolkit import BatchErrorMode, BatchErrorType, MoleculeBatch

smiles = ["CCO", "c1ccccc1", "not-smiles"]
batch = MoleculeBatch.from_smiles_list(
    smiles,
    errors=BatchErrorMode.KEEP,
).with_parallel_jobs(8)

for error in batch.errors():
    if error.error_type() == BatchErrorType.SMILES_PARSE:
        print(error.index(), error.message())

prepared = batch.add_hydrogens(errors=BatchErrorMode.KEEP).compute_2d_coords(
    errors=BatchErrorMode.KEEP,
)
report = prepared.to_images(
    "python/examples/output/molecule_images",
    format="png",
    errors=BatchErrorMode.SKIP,
    filenames=["ethanol", "benzene", "invalid"],
)
sdf_files = prepared.to_sdf_files(
    "python/examples/output/molecule_sdf_records",
    format="v2000",
    errors=BatchErrorMode.SKIP,
    filenames=["ethanol", "benzene", "invalid"],
)
fingerprints = prepared.fingerprint_morgan_list(n_bits=2048)

print(prepared.valid_mask())
print(prepared.errors())
print([fp.on_bits() if fp is not None else None for fp in fingerprints])
print([mol.to_smiles() if mol is not None else None for mol in prepared])
print(report)
print(sdf_files)

The errors option controls invalid records:

  • errors="raise" raises on the first batch validation failure.
  • errors="keep" preserves failed records and exposes structured errors.
  • errors="skip" omits failed records from the returned result or export.
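
The difference between the three modes is easiest to see on a toy parser. The sketch below is plain Python and does not use COSMolKit; parse_batch, toy_parse, and the (index, message) error tuples are hypothetical, and whether COSMolKit records errors for skipped records is not something this sketch claims.

```python
def parse_batch(records, parse, errors="raise"):
    """Apply `parse` to each record under a raise/keep/skip policy.

    Returns (results, failures); failures holds (index, message) pairs.
    """
    if errors not in ("raise", "keep", "skip"):
        raise ValueError(f"unknown errors mode: {errors!r}")
    results, failures = [], []
    for index, record in enumerate(records):
        try:
            results.append(parse(record))
        except ValueError as exc:
            if errors == "raise":
                raise  # stop at the first failure
            failures.append((index, str(exc)))
            if errors == "keep":
                results.append(None)  # placeholder keeps positions aligned with the input
            # "skip": append nothing, so the result is shorter than the input
    return results, failures


def toy_parse(text):
    # Hypothetical validator: accepts only alphanumeric strings.
    if not text.isalnum():
        raise ValueError(f"cannot parse {text!r}")
    return text.upper()


records = ["CCO", "c1ccccc1", "not-smiles"]
kept, kept_errors = parse_batch(records, toy_parse, errors="keep")
skipped, _ = parse_batch(records, toy_parse, errors="skip")
print(kept)         # same length as the input, with None for the failed record
print(skipped)      # failed record omitted
print(kept_errors)  # index and message of each failure
```

With "keep", downstream code can rely on positions matching the input list, at the cost of handling None; with "skip", every element is valid but indexes no longer line up with the input.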

Batch SMILES output accepts formatting options:

canonical = prepared.to_smiles_list(canonical=True)
explicit = prepared.to_smiles_list(
    all_bonds_explicit=True,
    all_hs_explicit=True,
)
without_maps = prepared.to_smiles_list(ignore_atom_map_numbers=True)

Single-molecule SMILES output accepts the same writer options:

benzene = Molecule.from_smiles("c1ccccc1")
ethanol = Molecule.from_smiles("CCO")

print(benzene.to_smiles(kekule=True))
print(ethanol.to_smiles(all_bonds_explicit=True))
print(ethanol.to_smiles(canonical=False, rooted_at_atom=2))

with_parallel_jobs() sets the default worker count for later batch operations while keeping the batch's value-style semantics. A method-level n_jobs argument can still override that default for a single call. prepared.to_list() returns list[Molecule | None] when a Python list is more convenient than iteration.

with_progress_bar(True) configures Rust-side progress bars for later batch operations. Method-level progress_bar=True or False overrides the batch default for a single call, and the progress output is written to stderr.

Directory exports accept optional filenames lists. Entries are aligned with the batch records, None keeps the default numbered filename, and missing extensions are filled from the selected output format.
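
The alignment rules for filenames can be sketched as a small resolver. This is an illustration in plain Python, not COSMolKit's code: resolve_filenames is a hypothetical helper, and the zero-padded default name is an assumption, since the actual default numbering scheme is not stated here.

```python
import os


def resolve_filenames(count, filenames=None, extension="png"):
    """Return one output filename per batch record."""
    if filenames is None:
        filenames = [None] * count
    if len(filenames) != count:
        raise ValueError("filenames must align with the batch records")
    resolved = []
    for index, name in enumerate(filenames):
        if name is None:
            name = f"{index:06d}"  # hypothetical default numbered name
        if not os.path.splitext(name)[1]:
            name = f"{name}.{extension}"  # fill in the missing extension
        resolved.append(name)
    return resolved


print(resolve_filenames(3, ["ethanol", None, "invalid.png"]))
```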

SDF and Arrays

mol = Molecule.from_smiles("CCO").with_2d_coords()

sdf_text = mol.to_2d_sdf_string(format="v2000", include_stereo=True, kekulize=True)
restored = Molecule.read_sdf_from_str(sdf_text, coordinate_dim="2d")

coords = restored.coords_2d()
bounds = restored.dg_bounds_matrix()

print(coords.shape)
print(bounds.shape)

Main Features

  • SMILES parsing and writing with Molecule.from_smiles() and to_smiles()
  • RDKit-style SMILES writer options on single molecules and batches
  • copy-on-write molecule value semantics for transforms
  • SDF and MOL file/string IO
  • atom and bond feature inspection
  • hydrogen add/remove transforms
  • Kekule bond representation
  • CW/CCW chiral tags, chiral centers, and tetrahedral stereo inspection
  • 2D coordinate generation and NumPy coordinate arrays
  • distance-geometry bounds matrix export
  • Morgan fingerprint bit vectors, Tanimoto similarity, and AdditionalOutput
  • SVG and PNG molecule depictions
  • ordered batch construction, transformation, filtering, and export
  • batch Morgan fingerprint generation with Rust-side parallel scheduling
  • batch-level default parallelism with with_parallel_jobs()
  • Python-style batch iteration, slicing, integer-list selection, and boolean-mask selection
  • custom per-record filenames for batch image and SDF directory exports
  • explicit molecule editing with Molecule.edit()
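
The selection styles named in the list above (slicing, integer-list selection, boolean-mask selection) follow a pattern familiar from NumPy. The sketch below shows one way such a __getitem__ can be written in plain Python; ToyBatch is hypothetical and says nothing about how MoleculeBatch is implemented or which edge cases it accepts.

```python
class ToyBatch:
    """Hypothetical ordered container; not the cosmolkit MoleculeBatch class."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)

    def __getitem__(self, key):
        if isinstance(key, slice):
            return ToyBatch(self._items[key])
        if isinstance(key, int):
            return self._items[key]
        key = list(key)
        # Check for a boolean mask first, because bool is a subclass of int.
        if key and all(isinstance(k, bool) for k in key):
            if len(key) != len(self._items):
                raise IndexError("boolean mask length must match batch length")
            return ToyBatch(item for item, keep in zip(self._items, key) if keep)
        # Otherwise treat the key as a list of integer positions.
        return ToyBatch(self._items[k] for k in key)


batch = ToyBatch(["CCO", "c1ccccc1", "CC(=O)O"])
print(list(batch[1:]))
print(list(batch[[2, 0]]))
print(list(batch[[True, False, True]]))
```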

Download files

Source Distribution

  • cosmolkit-0.1.3.tar.gz (551.6 kB)

Built Distributions

  • cosmolkit-0.1.3-cp39-abi3-win_amd64.whl (2.4 MB): CPython 3.9+, Windows x86-64
  • cosmolkit-0.1.3-cp39-abi3-manylinux_2_34_x86_64.whl (2.6 MB): CPython 3.9+, manylinux (glibc 2.34+), x86-64
  • cosmolkit-0.1.3-cp39-abi3-macosx_11_0_arm64.whl (2.3 MB): CPython 3.9+, macOS 11.0+, ARM64
