
Python bindings for COSMolKit

Project description

COSMolKit Python

COSMolKit is a Python package for molecule graph workflows, SMILES/SDF IO, coordinate access, Morgan fingerprints, molecule depiction, and high-throughput batch processing.

Current Python documentation: https://kit.cosmol.org/

API Model: Copy-On-Write (COW) Molecule Values

COSMolKit's Python API uses copy-on-write (COW) value semantics for molecule transforms. Methods such as with_hydrogens(), without_hydrogens(), with_kekulized_bonds(), and with_2d_coords() return a new Molecule; they do not mutate the original object in place.

This is intentionally different from common RDKit Python workflows, where many operations mutate an RWMol or update a molecule object directly. In COSMolKit, keep the returned value:

from cosmolkit import Molecule

mol = Molecule.from_smiles("CCO")
mol_h = mol.with_hydrogens()  # a new Molecule; mol itself is not modified

assert mol is not mol_h
print(mol.to_smiles())
print(mol_h.to_smiles())
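To illustrate what copy-on-write value semantics mean in plain Python, here is a minimal model built on a frozen dataclass. It makes no claim about how COSMolKit implements this internally: `ToyMolecule` and its `with_hydrogens(count)` method are hypothetical stand-ins that build a new value with `dataclasses.replace` and leave the original untouched.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class ToyMolecule:
    """Hypothetical immutable molecule value: a tuple of element symbols."""

    atoms: Tuple[str, ...]

    def with_hydrogens(self, count: int) -> "ToyMolecule":
        # Build a new value instead of mutating self; the original keeps its atoms.
        return replace(self, atoms=self.atoms + ("H",) * count)


mol = ToyMolecule(atoms=("C", "C", "O"))
mol_h = mol.with_hydrogens(6)

assert mol is not mol_h
print(mol.atoms)         # ('C', 'C', 'O')
print(len(mol_h.atoms))  # 9
```

Because the dataclass is frozen, assigning `mol.atoms = ...` raises `dataclasses.FrozenInstanceError`. This is one way to make accidental in-place mutation fail loudly.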

Installation

pip install cosmolkit

Quick Start

from cosmolkit import Molecule

mol = Molecule.from_smiles("c1ccccc1O")
drawn = mol.with_2d_coords()

print(mol.to_smiles())
print(drawn.atoms()[0])

drawn.write_png("phenol.png", width=400, height=300)

Morgan fingerprints are exposed as RDKit-style sparse bit vectors. The on_bits() output is a list of bit indexes set to 1 inside a fixed-length binary vector, not a dense neural embedding:

fp = mol.fingerprint_morgan(radius=2, n_bits=2048)

print(fp.n_bits())
print(fp.on_bits())
print(fp.tanimoto(Molecule.from_smiles("c1ccccc1").fingerprint_morgan(radius=2, n_bits=2048)))
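Tanimoto similarity on bit vectors is the size of the intersection of the two on-bit sets divided by the size of their union. The sketch below computes it directly from two on_bits()-style index lists in pure Python. It illustrates the formula only; `tanimoto_from_on_bits` is a hypothetical helper, not COSMolKit's implementation, and the bit indices are made up.

```python
from typing import Iterable


def tanimoto_from_on_bits(a: Iterable[int], b: Iterable[int]) -> float:
    """Tanimoto (Jaccard) similarity of two bit vectors given as on-bit indices."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints score 0.0.
        return 0.0
    return len(set_a & set_b) / len(union)


# Made-up on-bit lists, as on_bits() might return for two 2048-bit fingerprints.
fp_a = [1, 80, 650, 1750]
fp_b = [1, 80, 1024]
print(tanimoto_from_on_bits(fp_a, fp_b))  # 0.4  (2 shared bits / 5 bits in the union)
```

Comparing on-bit indices is only meaningful when both fingerprints were generated with the same n_bits.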

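fingerprint_morgan_with_output() returns the fingerprint together with additional output that mirrors RDKit's Morgan provenance helpers (AdditionalOutput), such as atom counts and the bit-info map: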

result = mol.fingerprint_morgan_with_output(radius=2, n_bits=2048)
info = result.additional_output()

print(result.fingerprint().on_bits())
print(info.atom_counts())
print(info.bit_info_map())
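In RDKit, the Morgan bit-info map sends each on bit to the (center atom index, radius) pairs that set it. Assuming COSMolKit's bit_info_map() returns a dict with that same layout (an assumption; confirm it against the documentation at https://kit.cosmol.org/), a hypothetical helper such as `environments_by_radius` can group those environments by radius. The sample map below is hand-written, not real output.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Assumed layout (RDKit-style): bit -> sequence of (center_atom_idx, radius) pairs.
BitInfoMap = Dict[int, Sequence[Tuple[int, int]]]


def environments_by_radius(bit_info: BitInfoMap) -> Dict[int, List[Tuple[int, int]]]:
    """Group (bit, center_atom_idx) pairs by the radius of their environment."""
    grouped: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for bit, environments in sorted(bit_info.items()):
        for center, radius in environments:
            grouped[radius].append((bit, center))
    return dict(grouped)


# Hand-written example, not real COSMolKit output.
sample = {98: [(0, 0), (2, 0)], 650: [(1, 1)], 1750: [(1, 2)]}
print(environments_by_radius(sample))
# {0: [(98, 0), (98, 2)], 1: [(650, 1)], 2: [(1750, 1)]}
```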

Chiral tags and bond orders are Python IntEnum values, so code can compare against typed constants instead of strings:

from cosmolkit import BondOrder, ChiralTag, Molecule

chiral = Molecule.from_smiles("F[C@H](Cl)Br")
print(chiral.to_smiles())
print(chiral.to_smiles(isomeric_smiles=False))

for atom in chiral.atoms():
    if atom.chiral_tag() != ChiralTag.CHI_UNSPECIFIED:
        print(atom.idx(), atom.chiral_tag().name)

for bond in Molecule.from_smiles("C=C").bonds():
    if bond.bond_type() == BondOrder.DOUBLE:
        print("double bond:", bond.begin_atom_idx(), bond.end_atom_idx())

Read-only maps such as BOND_ORDER_MAP and CHIRAL_TAG_MAP convert external string names to the enum members when needed.
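The general pattern can be reproduced with standard-library tools alone: an IntEnum plus a `types.MappingProxyType` that exposes a read-only name-to-member lookup. In this sketch, `ToyBondOrder`, its member values, and `TOY_BOND_ORDER_MAP` (including its lowercase keys) are hypothetical; they are not COSMolKit's actual enum values or map keys.

```python
from enum import IntEnum
from types import MappingProxyType


class ToyBondOrder(IntEnum):
    # Hypothetical members and values, for illustration only.
    SINGLE = 1
    DOUBLE = 2
    TRIPLE = 3


# Read-only view: lookups work, but item assignment raises TypeError.
TOY_BOND_ORDER_MAP = MappingProxyType(
    {member.name.lower(): member for member in ToyBondOrder}
)

order = TOY_BOND_ORDER_MAP["double"]
print(order is ToyBondOrder.DOUBLE)  # True
print(order == 2)                    # True: IntEnum members compare equal to ints
```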

Use Molecule.edit() when you want an explicit editing workflow:

editor = mol.edit()
cl = editor.add_atom("Cl")
editor.add_bond(0, cl, order="single")
mol2 = editor.commit()
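The editing workflow follows a builder pattern: changes accumulate on a separate editor object, and commit() produces a new molecule value. The pure-Python model below is a hypothetical sketch of that pattern. `ToyMol`, `ToyEditor`, and their tuple-based storage are illustrative, not COSMolKit's implementation. As in the example above, add_atom() returns the new atom's index so it can be passed to add_bond().

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ToyMol:
    atoms: Tuple[str, ...]
    bonds: Tuple[Tuple[int, int, str], ...]

    def edit(self) -> "ToyEditor":
        return ToyEditor(self)


class ToyEditor:
    """Hypothetical editor: collects changes, then commits a new ToyMol."""

    def __init__(self, base: ToyMol) -> None:
        # Mutable working copies; the base molecule itself is never modified.
        self._atoms: List[str] = list(base.atoms)
        self._bonds: List[Tuple[int, int, str]] = list(base.bonds)

    def add_atom(self, symbol: str) -> int:
        self._atoms.append(symbol)
        return len(self._atoms) - 1  # index of the new atom

    def add_bond(self, begin: int, end: int, order: str = "single") -> None:
        n = len(self._atoms)
        if not (0 <= begin < n and 0 <= end < n):
            raise IndexError("bond references a missing atom")
        self._bonds.append((begin, end, order))

    def commit(self) -> ToyMol:
        return ToyMol(atoms=tuple(self._atoms), bonds=tuple(self._bonds))


mol = ToyMol(atoms=("C", "C", "O"), bonds=((0, 1, "single"), (1, 2, "single")))
editor = mol.edit()
cl = editor.add_atom("Cl")
editor.add_bond(0, cl, order="single")
mol2 = editor.commit()

print(len(mol.atoms), len(mol2.atoms))  # 3 4
```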

Batch Workflows

from cosmolkit import BatchErrorMode, BatchErrorType, MoleculeBatch

smiles = ["CCO", "c1ccccc1", "not-smiles"]
batch = MoleculeBatch.from_smiles_list(
    smiles,
    errors=BatchErrorMode.KEEP,
).with_parallel_jobs(8)

for error in batch.errors():
    if error.error_type() == BatchErrorType.SMILES_PARSE:
        print(error.index(), error.message())

prepared = batch.add_hydrogens(errors=BatchErrorMode.KEEP).compute_2d_coords(
    errors=BatchErrorMode.KEEP,
)
report = prepared.to_images(
    "molecule_images",
    format="png",
    errors=BatchErrorMode.SKIP,
    filenames=["ethanol", "benzene", "invalid"],
)
sdf_files = prepared.to_sdf_files(
    "molecule_sdf_records",
    format="v2000",
    errors=BatchErrorMode.SKIP,
    filenames=["ethanol", "benzene", "invalid"],
)
fingerprints = prepared.fingerprint_morgan_list(n_bits=2048)

print(prepared.valid_mask())
print(prepared.errors())
print([fp.on_bits() if fp is not None else None for fp in fingerprints])
print([mol.to_smiles() if mol is not None else None for mol in prepared])
print(report)
print(sdf_files)

The errors option controls how invalid records are handled:

  • errors="raise" raises on the first batch validation failure.
  • errors="keep" preserves failed records and exposes structured errors.
  • errors="skip" omits failed records from the returned result or export.
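The three modes can be modeled over a plain list of inputs. The sketch below uses a hypothetical `apply_with_errors` function, a `RecordError` record, and a `toy_parse` stand-in validator (not a SMILES parser). It only illustrates the documented raise/keep/skip behavior and is not COSMolKit's batch engine.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


@dataclass
class RecordError:
    index: int
    message: str


def apply_with_errors(
    inputs: List[str], parse: Callable[[str], T], errors: str = "raise"
) -> Tuple[List[Optional[T]], List[RecordError]]:
    """Hypothetical model of the errors="raise" / "keep" / "skip" modes."""
    if errors not in ("raise", "keep", "skip"):
        raise ValueError(f"unknown errors mode: {errors!r}")
    results: List[Optional[T]] = []
    failures: List[RecordError] = []
    for i, item in enumerate(inputs):
        try:
            results.append(parse(item))
        except ValueError as exc:
            if errors == "raise":
                raise  # stop at the first failing record
            failures.append(RecordError(i, str(exc)))
            if errors == "keep":
                results.append(None)  # placeholder keeps positions aligned
            # errors == "skip": the failed record is left out of results


    return results, failures


def toy_parse(text: str) -> str:
    # Stand-in validator (NOT a SMILES parser): accept only letters and digits.
    if not text.isalnum():
        raise ValueError(f"cannot parse {text!r}")
    return text


inputs = ["CCO", "c1ccccc1", "not-smiles"]
print(apply_with_errors(inputs, toy_parse, errors="keep")[0])  # ['CCO', 'c1ccccc1', None]
print(apply_with_errors(inputs, toy_parse, errors="skip")[0])  # ['CCO', 'c1ccccc1']
```

The None placeholder in "keep" mode is what lets results stay index-aligned with the inputs, which matches how the batch example above checks `mol is not None` while iterating.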

Batch SMILES output accepts formatting options:

canonical = prepared.to_smiles_list(canonical=True)
explicit = prepared.to_smiles_list(
    all_bonds_explicit=True,
    all_hs_explicit=True,
)
without_maps = prepared.to_smiles_list(ignore_atom_map_numbers=True)

Single-molecule SMILES output accepts the same writer options:

benzene = Molecule.from_smiles("c1ccccc1")
ethanol = Molecule.from_smiles("CCO")

print(benzene.to_smiles(kekule=True))
print(ethanol.to_smiles(all_bonds_explicit=True))
print(ethanol.to_smiles(canonical=False, rooted_at_atom=2))

with_parallel_jobs() sets the default worker count for later batch operations and returns the configured batch, keeping the batch API value-style. A method-level n_jobs argument can still override that default for a single call. When a Python list is more convenient than iteration, prepared.to_list() returns list[Molecule | None], where None marks a failed record that was kept.
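The override rule amounts to this: a per-call n_jobs wins; otherwise the batch default from with_parallel_jobs() applies. The sketch below models that rule together with an order-preserving parallel map from the standard library. `resolve_jobs`, `parallel_map`, and the fallback of one worker are assumptions made for illustration. According to the feature list, COSMolKit schedules batch fingerprinting on the Rust side, not with Python threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def resolve_jobs(call_n_jobs: Optional[int], batch_default: Optional[int]) -> int:
    """Per-call n_jobs overrides the batch default; fall back to 1 worker."""
    if call_n_jobs is not None:
        return call_n_jobs
    if batch_default is not None:
        return batch_default
    return 1


def parallel_map(fn: Callable[[T], R], items: Sequence[T], n_jobs: int) -> List[R]:
    # Executor.map yields results in input order, so record positions are preserved.
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(fn, items))


batch_default = 8  # as if configured with with_parallel_jobs(8)
print(resolve_jobs(None, batch_default))  # 8
print(resolve_jobs(2, batch_default))     # 2
print(parallel_map(len, ["CCO", "c1ccccc1"], resolve_jobs(None, batch_default)))  # [3, 8]
```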

Directory exports accept an optional filenames list. Entries are aligned with the batch records by position: a None entry keeps the default numbered filename, and an entry without an extension gets one from the selected output format.
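Those rules can be written as a small resolution function. In this sketch, `resolve_filenames`, the zero-padded numbering used for None entries, and the length check are assumptions made for illustration; the default names COSMolKit actually generates may differ.

```python
from pathlib import PurePath
from typing import List, Optional, Sequence


def resolve_filenames(
    names: Optional[Sequence[Optional[str]]], n_records: int, fmt: str
) -> List[str]:
    """Hypothetical model of per-record filename resolution for directory exports."""
    if names is None:
        names = [None] * n_records
    if len(names) != n_records:
        raise ValueError("filenames must have one entry per batch record")
    resolved = []
    for i, name in enumerate(names):
        if name is None:
            name = f"{i:05d}"  # assumed default numbering scheme
        if not PurePath(name).suffix:
            name = f"{name}.{fmt}"  # fill in the missing extension
        resolved.append(name)
    return resolved


print(resolve_filenames(["ethanol", None, "benzene.png"], 3, "png"))
# ['ethanol.png', '00001.png', 'benzene.png']
```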

SDF and Arrays

from cosmolkit import Molecule

mol = Molecule.from_smiles("CCO").with_2d_coords()

sdf_text = mol.to_sdf_string(format="v2000")
restored = Molecule.read_sdf_record_from_str(sdf_text, coordinate_dim="2d")

coords = restored.coords_2d()
bounds = restored.dg_bounds_matrix()

print(coords.shape)
print(bounds.shape)
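In RDKit, a distance-geometry bounds matrix stores upper distance bounds in its upper triangle and lower bounds in its lower triangle. Assuming dg_bounds_matrix() follows the same convention (an assumption; confirm it in the COSMolKit documentation), the NumPy sketch below checks whether a coordinate array satisfies every pairwise bound. The `satisfies_bounds` helper, the coordinates, and the 3-atom matrix are made-up illustrations.

```python
import numpy as np


def satisfies_bounds(coords: np.ndarray, bounds: np.ndarray, tol: float = 1e-6) -> bool:
    """Check pairwise distances against an RDKit-style bounds matrix.

    Assumed convention: bounds[i, j] with i < j is the upper bound and
    bounds[j, i] is the lower bound for the distance between atoms i and j.
    """
    # Pairwise Euclidean distance matrix, shape (n_atoms, n_atoms).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    i, j = np.triu_indices(len(coords), k=1)
    upper = bounds[i, j]
    lower = bounds[j, i]
    d = dist[i, j]
    return bool(np.all(d <= upper + tol) and np.all(d >= lower - tol))


# Made-up 2D coordinates for three atoms and a made-up bounds matrix.
coords = np.array([[0.0, 0.0], [1.5, 0.0], [1.5, 1.5]])
bounds = np.array(
    [
        [0.0, 1.6, 2.2],
        [1.4, 0.0, 1.6],
        [2.0, 1.4, 0.0],
    ]
)
print(satisfies_bounds(coords, bounds))  # True
```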

Main Features

  • SMILES parsing and writing with Molecule.from_smiles() and to_smiles()
  • RDKit-style SMILES writer options on single molecules and batches
  • copy-on-write molecule value semantics for transforms
  • SDF file and string IO
  • atom and bond feature inspection
  • hydrogen add/remove transforms
  • Kekule bond representation
  • CW/CCW chiral tags, chiral centers, and tetrahedral stereo inspection
  • 2D coordinate generation and NumPy coordinate arrays
  • distance-geometry bounds matrix export
  • Morgan fingerprint bit vectors, Tanimoto similarity, and AdditionalOutput
  • SVG and PNG molecule depictions
  • ordered batch construction, transformation, filtering, and export
  • batch Morgan fingerprint generation with Rust-side parallel scheduling
  • batch-level default parallelism with with_parallel_jobs()
  • Python-style batch iteration, slicing, integer-list selection, and boolean-mask selection
  • custom per-record filenames for batch image and SDF directory exports
  • explicit molecule editing with Molecule.edit()


Download files

Source Distribution

cosmolkit-0.1.2.tar.gz (532.2 kB)

Uploaded Source

Built Distributions

cosmolkit-0.1.2-cp39-abi3-win_amd64.whl (2.3 MB)

Uploaded CPython 3.9+Windows x86-64

cosmolkit-0.1.2-cp39-abi3-manylinux_2_34_x86_64.whl (2.5 MB)

Uploaded CPython 3.9+manylinux: glibc 2.34+ x86-64

cosmolkit-0.1.2-cp39-abi3-macosx_11_0_arm64.whl (2.3 MB)

Uploaded CPython 3.9+macOS 11.0+ ARM64

File details

Details for the file cosmolkit-0.1.2.tar.gz.

File metadata

  • Download URL: cosmolkit-0.1.2.tar.gz
  • Upload date:
  • Size: 532.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cosmolkit-0.1.2.tar.gz
  • SHA256: d022992e4f1473d1aa9e14a663890896f332f2ac569f8027937d64c0d17520e9
  • MD5: 56378dcd9bb813237170add2111c20be
  • BLAKE2b-256: 210abffcd035b3de687573287953ecd4656df7eb0aa6c347ae58450a91390e62

File details

Details for the file cosmolkit-0.1.2-cp39-abi3-win_amd64.whl.

File metadata

  • Download URL: cosmolkit-0.1.2-cp39-abi3-win_amd64.whl
  • Upload date:
  • Size: 2.3 MB
  • Tags: CPython 3.9+, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for cosmolkit-0.1.2-cp39-abi3-win_amd64.whl
  • SHA256: 01a6f4f4a9b4aef1f509c9ebafcfe12b4560f1805f54a161b3c73549bd2c6214
  • MD5: cbbf36d2ba57b7908fb4f406c39c6b7c
  • BLAKE2b-256: 319b1b7dd9af5b30af2edc4174c5a072b07fbb60d284d43c29d3a08164aa1a0b

File details

Details for the file cosmolkit-0.1.2-cp39-abi3-manylinux_2_34_x86_64.whl.

File hashes

Hashes for cosmolkit-0.1.2-cp39-abi3-manylinux_2_34_x86_64.whl
  • SHA256: f159dc737feb93e87f8a57e6fef0d7edc681f27a82c7bbb92cd9a408ae2619a1
  • MD5: e8cba7159c403a5478409f7f97740ccc
  • BLAKE2b-256: bc89991a0fb35fb06182b6badbf61e13265a242b9f82cc6e9e5249f159749051

File details

Details for the file cosmolkit-0.1.2-cp39-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for cosmolkit-0.1.2-cp39-abi3-macosx_11_0_arm64.whl
  • SHA256: 4efba4ec2e0f3178da8b33b93229ee70ae4fd00e7baca9ab002aa8bf799bf09b
  • MD5: 7fe25d9c6a610e2e7943979969624269
  • BLAKE2b-256: bc25eba4be6744fc90d2a91399b03a5dd7dcfb85ee9805cf96025c7c535bb182
