Python bindings for COSMolKit

COSMolKit Python

COSMolKit is a Python package for molecule graph workflows, SMILES/SDF IO, coordinate access, Morgan fingerprints, molecule depiction, and high-throughput batch processing.

Current Python documentation: https://kit.cosmol.org/

API Model: Copy-On-Write (COW) Molecule Values

COSMolKit's Python API uses copy-on-write (COW) value semantics for molecule transforms. Methods such as with_hydrogens(), without_hydrogens(), with_kekulized_bonds(), and with_2d_coords() return a new Molecule; they do not mutate the original object in place.

This is intentionally different from common RDKit Python workflows, where many operations mutate an RWMol or update a molecule object directly. In COSMolKit, keep the returned value:

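from cosmolkit import Molecule

mol = Molecule.from_smiles("CCO")
mol_h = mol.with_hydrogens()  # returns a new Molecule; mol is unchanged

assert mol is not mol_h
print(mol.to_smiles())    # original molecule
print(mol_h.to_smiles())  # new molecule with hydrogens added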
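
The "transform returns a new value" pattern can be illustrated without COSMolKit itself. The following is a minimal sketch in plain Python. ToyMolecule and its with_atom() method are hypothetical stand-ins invented for illustration and are not part of the COSMolKit API. The sketch only shows that the receiver is left untouched; it does not model how COSMolKit shares internal storage between copies.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class ToyMolecule:
    """Hypothetical stand-in for a value-semantics molecule (not the COSMolKit class)."""

    atoms: Tuple[str, ...]

    def with_atom(self, symbol: str) -> "ToyMolecule":
        # Build and return a new value; the receiver is never modified.
        return replace(self, atoms=self.atoms + (symbol,))


original = ToyMolecule(atoms=("C", "C", "O"))
extended = original.with_atom("H")

print(original.atoms)  # still the three original atoms
print(extended.atoms)  # the new value carries the extra atom
```

Because the dataclass is frozen, forgetting to keep the returned value simply discards the change, which is the same pitfall to watch for with with_hydrogens() and the other transforms.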

Installation

pip install cosmolkit

Quick Start

from cosmolkit import Molecule

mol = Molecule.from_smiles("c1ccccc1O")
drawn = mol.with_2d_coords()

print(mol.to_smiles())
print(drawn.atoms()[0])

drawn.write_png("phenol.png", width=400, height=300)

Morgan fingerprints are exposed as RDKit-style sparse bit vectors. The on_bits() output is a list of bit indexes set to 1 inside a fixed-length binary vector, not a dense neural embedding:

fp = mol.fingerprint_morgan(radius=2, n_bits=2048)

print(fp.n_bits())
print(fp.on_bits())
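# Compare fingerprints generated with the same radius and length.
benzene_fp = Molecule.from_smiles("c1ccccc1").fingerprint_morgan(radius=2, n_bits=2048)
print(fp.tanimoto(benzene_fp))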
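
To make the sparse representation concrete, here is a small sketch in plain Python of how a list of on-bit indexes relates to a dense 0/1 vector and to Tanimoto similarity. The helper names tanimoto_from_on_bits and to_dense are hypothetical, the bit indexes are made up, and the handling of two empty vectors is a convention chosen for this sketch rather than documented COSMolKit behavior.

```python
from typing import Iterable, List


def tanimoto_from_on_bits(a: Iterable[int], b: Iterable[int]) -> float:
    """Tanimoto similarity of two bit vectors given as lists of set-bit indexes."""
    set_a, set_b = set(a), set(b)
    union = len(set_a | set_b)
    if union == 0:
        # Convention chosen for this sketch: two empty vectors count as identical.
        return 1.0
    return len(set_a & set_b) / union


def to_dense(on_bits: Iterable[int], n_bits: int) -> List[int]:
    """Expand set-bit indexes into a 0/1 list of length n_bits."""
    dense = [0] * n_bits
    for index in on_bits:
        dense[index] = 1
    return dense


# Made-up bit indexes for illustration; real values would come from fp.on_bits().
bits_a = [1, 5, 9, 12]
bits_b = [5, 9, 20]

print(tanimoto_from_on_bits(bits_a, bits_b))  # 2 shared / 5 distinct = 0.4
print(to_dense(bits_b, n_bits=24))
```

With COSMolKit, the inputs would be the on_bits() lists of two fingerprints generated with the same radius and n_bits.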

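fingerprint_morgan_with_output() returns the fingerprint together with additional output, such as per-atom counts and a bit info map, that mirrors RDKit's Morgan provenance helpers: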

result = mol.fingerprint_morgan_with_output(radius=2, n_bits=2048)
info = result.additional_output()

print(result.fingerprint().on_bits())
print(info.atom_counts())
print(info.bit_info_map())
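
One common use of this kind of provenance data is to ask which bits each atom contributed to. The sketch below assumes, without confirmation from the COSMolKit documentation, that bit_info_map() can be converted to the shape RDKit uses for bitInfo: a mapping from bit index to a list of (atom index, radius) pairs. The data and the bits_by_atom helper are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Made-up example shaped like RDKit's bitInfo: bit index -> [(atom index, radius), ...].
# Assumption: info.bit_info_map() yields something convertible to this shape.
bit_info: Dict[int, List[Tuple[int, int]]] = {
    80: [(0, 0), (1, 0)],
    294: [(1, 1)],
    1057: [(2, 0)],
}


def bits_by_atom(info: Dict[int, List[Tuple[int, int]]]) -> Dict[int, List[int]]:
    """Invert a bit -> (atom, radius) map into atom -> sorted list of bits."""
    per_atom = defaultdict(set)
    for bit, environments in info.items():
        for atom_index, _radius in environments:
            per_atom[atom_index].add(bit)
    return {atom: sorted(bits) for atom, bits in sorted(per_atom.items())}


print(bits_by_atom(bit_info))
```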

Chiral tags are available directly on atoms, so code written against SMILES/RDKit-style CW and CCW tags does not need to switch to the ordered tetrahedral view:

chiral = Molecule.from_smiles("F[C@H](Cl)Br")

print(chiral.to_smiles())
print(chiral.to_smiles(isomeric_smiles=False))

for atom in chiral.atoms():
    if atom.chiral_tag() != "CHI_UNSPECIFIED":
        print(atom.idx(), atom.chiral_tag())

Use Molecule.edit() when you want an explicit editing workflow:

editor = mol.edit()
cl = editor.add_atom("Cl")
editor.add_bond(0, cl, order="single")
mol2 = editor.commit()

Batch Workflows

from cosmolkit import MoleculeBatch

smiles = ["CCO", "c1ccccc1", "not-smiles"]
batch = MoleculeBatch.from_smiles_list(smiles, errors="keep")

prepared = batch.add_hydrogens(errors="keep").compute_2d_coords(errors="keep")
report = prepared.to_images("molecule_images", format="png", errors="skip")
fingerprints = prepared.fingerprint_morgan_list(n_bits=2048, n_jobs=8)

print(prepared.valid_mask())
print(prepared.errors())
print([fp.on_bits() if fp is not None else None for fp in fingerprints])
print(report)

The errors option controls how invalid records are handled:

  • errors="raise" raises on the first batch validation failure.
  • errors="keep" preserves failed records and exposes structured errors.
  • errors="skip" omits failed records from the returned result or export.
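
The three policies listed above can be sketched in plain Python. This is an illustration of the general raise/keep/skip pattern, not COSMolKit's implementation. parse_all and toy_parse are hypothetical names, and toy_parse is a trivial validator standing in for a real SMILES parser.

```python
from typing import Callable, List, Optional, Tuple


def parse_all(
    records: List[str],
    parse: Callable[[str], object],
    errors: str = "raise",
) -> Tuple[List[Optional[object]], List[Tuple[int, str]]]:
    """Apply parse to each record under a raise/keep/skip error policy."""
    if errors not in ("raise", "keep", "skip"):
        raise ValueError(f"unknown errors policy: {errors!r}")
    results: List[Optional[object]] = []
    failures: List[Tuple[int, str]] = []
    for index, record in enumerate(records):
        try:
            results.append(parse(record))
        except ValueError as exc:
            if errors == "raise":
                raise  # stop at the first failure
            failures.append((index, str(exc)))
            if errors == "keep":
                results.append(None)  # placeholder keeps positions aligned with the input
            # errors == "skip": the failed record is omitted from results
    return results, failures


def toy_parse(text: str) -> str:
    # Hypothetical validator standing in for a real SMILES parser.
    if "-" in text:
        raise ValueError(f"cannot parse {text!r}")
    return text


smiles = ["CCO", "c1ccccc1", "not-smiles"]
kept, kept_errors = parse_all(smiles, toy_parse, errors="keep")
skipped, _ = parse_all(smiles, toy_parse, errors="skip")

print(kept)         # same length as the input, with None for the failed record
print(kept_errors)  # (index, message) pairs for the failures
print(skipped)      # failed record dropped, so indexes no longer match the input
```

The practical difference is index alignment: "keep" preserves a one-to-one correspondence with the input list, which is why the batch example above checks each fingerprint for None, while "skip" returns a shorter result.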

Batch SMILES output accepts formatting options:

canonical = prepared.to_smiles_list(canonical=True)
explicit = prepared.to_smiles_list(
    all_bonds_explicit=True,
    all_hs_explicit=True,
)
without_maps = prepared.to_smiles_list(ignore_atom_map_numbers=True)

SDF and Arrays

from cosmolkit import Molecule

mol = Molecule.from_smiles("CCO").with_2d_coords()

sdf_text = mol.to_sdf_string(format="v2000")
restored = Molecule.read_sdf_record_from_str(sdf_text, coordinate_dim="2d")

coords = restored.coords_2d()
bounds = restored.dg_bounds_matrix()

print(coords.shape)
print(bounds.shape)

Main Features

  • SMILES parsing and writing with Molecule.from_smiles() and to_smiles()
  • copy-on-write molecule value semantics for transforms
  • SDF file and string IO
  • atom and bond feature inspection
  • hydrogen add/remove transforms
  • Kekule bond representation
  • CW/CCW chiral tags, chiral centers, and tetrahedral stereo inspection
  • 2D coordinate generation and NumPy coordinate arrays
  • distance-geometry bounds matrix export
  • Morgan fingerprint bit vectors, Tanimoto similarity, and AdditionalOutput
  • SVG and PNG molecule depictions
  • ordered batch construction, transformation, filtering, and export
  • batch Morgan fingerprint generation with Rust-side parallel scheduling
  • explicit molecule editing with Molecule.edit()

Download files

Source Distribution

  • cosmolkit-0.1.1.tar.gz (519.7 kB): source

Built Distributions

  • cosmolkit-0.1.1-cp39-abi3-win_amd64.whl (2.3 MB): CPython 3.9+, Windows x86-64
  • cosmolkit-0.1.1-cp39-abi3-manylinux_2_34_x86_64.whl (2.5 MB): CPython 3.9+, manylinux (glibc 2.34+), x86-64
  • cosmolkit-0.1.1-cp39-abi3-macosx_11_0_arm64.whl (2.2 MB): CPython 3.9+, macOS 11.0+, ARM64
