Python bindings for COSMolKit
COSMolKit Python
COSMolKit is a Python package for molecule graph workflows, SMILES/SDF IO, coordinate access, Morgan fingerprints, molecule depiction, and high-throughput batch processing.
Current Python documentation: https://kit.cosmol.org/
API Model: Copy-On-Write (COW) Molecule Values
COSMolKit's Python API uses copy-on-write (COW) value semantics for molecule
transforms. Methods such as with_hydrogens(), without_hydrogens(),
with_kekulized_bonds(), and with_2d_coords() return a new Molecule; they
do not mutate the original object in place.
This is intentionally different from common RDKit Python workflows, where many
operations mutate an RWMol or update a molecule object directly. In COSMolKit,
keep the returned value:
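from cosmolkit import Molecule

mol = Molecule.from_smiles("CCO")
mol_h = mol.with_hydrogens()
assert mol is not mol_h
print(mol.to_smiles())    # original molecule, unchanged
print(mol_h.to_smiles())  # new molecule returned by with_hydrogens()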
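The "keep the returned value" rule can be illustrated without COSMolKit at all. The following is a minimal pure-Python sketch: `ToyMol` and `with_atom()` are hypothetical stand-ins, not COSMolKit classes or methods. It shows only the observable behaviour (the original value is unchanged) and does not model how COSMolKit shares data internally.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyMol:
    """Hypothetical stand-in for a value-style molecule (not a COSMolKit class)."""

    atoms: tuple

    def with_atom(self, symbol: str) -> "ToyMol":
        # Return a new value; the receiver is left untouched.
        return replace(self, atoms=self.atoms + (symbol,))


mol = ToyMol(atoms=("C", "C", "O"))
mol_cl = mol.with_atom("Cl")

print(mol.atoms)     # the original is unchanged
print(mol_cl.atoms)  # the transformed value carries the extra atom

# A common mistake when coming from a mutating API: discarding the result.
mol.with_atom("Br")  # has no effect on `mol`
print(mol.atoms)
```

Code ported from a mutating style usually needs only one change: assign the result of each transform to a name instead of calling it for its side effect.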
Installation
pip install cosmolkit
Quick Start
from cosmolkit import Molecule
mol = Molecule.from_smiles("c1ccccc1O")
drawn = mol.with_2d_coords()
print(mol.to_smiles())
print(drawn.atoms()[0])
drawn.write_png("phenol.png", width=400, height=300)
Morgan fingerprints are exposed as RDKit-style sparse bit vectors. The
on_bits() output is a list of bit indexes set to 1 inside a fixed-length
binary vector, not a dense neural embedding:
fp = mol.fingerprint_morgan(radius=2, n_bits=2048)
print(fp.n_bits())
print(fp.on_bits())
print(fp.tanimoto(Molecule.from_smiles("c1ccccc1").fingerprint_morgan()))
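To make the sparse representation concrete, here is a small pure-Python sketch of how a Tanimoto score can be computed from two on-bit lists, and how such a list expands into a dense 0/1 vector. The helper names `tanimoto` and `to_dense` and the example bit lists are made up for illustration; this is not COSMolKit's implementation, and the value returned for two empty vectors is a convention chosen here.

```python
def tanimoto(on_bits_a, on_bits_b):
    """Tanimoto similarity of two bit vectors given as lists of on-bit indexes."""
    a, b = set(on_bits_a), set(on_bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen for this sketch; libraries may differ
    return len(a & b) / len(union)


def to_dense(on_bits, n_bits):
    """Expand on-bit indexes into a dense 0/1 list of length n_bits."""
    dense = [0] * n_bits
    for i in on_bits:
        dense[i] = 1
    return dense


# Made-up on-bit lists, not real Morgan fingerprints.
fp_a = [1, 5, 9, 12]
fp_b = [5, 9, 20]

print(tanimoto(fp_a, fp_b))  # 2 shared bits / 5 distinct bits = 0.4
print(to_dense(fp_b, 24))
```

Working on the index lists keeps the cost proportional to the number of set bits rather than to the vector length.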
Additional output mirrors RDKit's Morgan provenance helpers:
result = mol.fingerprint_morgan_with_output(radius=2, n_bits=2048)
info = result.additional_output()
print(result.fingerprint().on_bits())
print(info.atom_counts())
print(info.bit_info_map())
Chiral tags are available directly on atoms, so code that follows the SMILES/RDKit-style CW/CCW convention does not need to switch to the ordered tetrahedral view:
chiral = Molecule.from_smiles("F[C@H](Cl)Br")
print(chiral.to_smiles())
print(chiral.to_smiles(isomeric_smiles=False))
for atom in chiral.atoms():
    if atom.chiral_tag() != "CHI_UNSPECIFIED":
        print(atom.idx(), atom.chiral_tag())
Use Molecule.edit() when you want an explicit editing workflow:
editor = mol.edit()
cl = editor.add_atom("Cl")
editor.add_bond(0, cl, order="single")
mol2 = editor.commit()
Batch Workflows
from cosmolkit import MoleculeBatch
smiles = ["CCO", "c1ccccc1", "not-smiles"]
batch = MoleculeBatch.from_smiles_list(smiles, errors="keep")
prepared = batch.add_hydrogens(errors="keep").compute_2d_coords(errors="keep")
report = prepared.to_images("molecule_images", format="png", errors="skip")
fingerprints = prepared.fingerprint_morgan_list(n_bits=2048, n_jobs=8)
print(prepared.valid_mask())
print(prepared.errors())
print([fp.on_bits() if fp is not None else None for fp in fingerprints])
print(report)
The errors option controls invalid records:
- errors="raise" raises on the first batch validation failure.
- errors="keep" preserves failed records and exposes structured errors.
- errors="skip" omits failed records from the returned result or export.
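The three policies can be sketched in plain Python. The helper `apply_batch` and the validator `toy_parse` below are hypothetical and exist only for this illustration; they are not part of COSMolKit, and the real batch objects expose failures through their own methods such as valid_mask() and errors().

```python
def apply_batch(records, fn, errors="raise"):
    """Apply fn to each record under one of three error policies.

    Hypothetical helper for illustration; not part of COSMolKit.
    Returns (results, failures), where failures maps input index -> message.
    """
    if errors not in ("raise", "keep", "skip"):
        raise ValueError(f"unknown errors mode: {errors!r}")
    results, failures = [], {}
    for i, record in enumerate(records):
        try:
            results.append(fn(record))
        except ValueError as exc:
            if errors == "raise":
                raise  # stop at the first failure
            failures[i] = str(exc)
            if errors == "keep":
                results.append(None)  # placeholder keeps positions aligned
            # errors == "skip": the record is dropped from the results
    return results, failures


def toy_parse(text):
    # Stand-in validator: rejects strings containing a hyphen and
    # returns the string length as a dummy result.
    if "-" in text:
        raise ValueError(f"cannot parse {text!r}")
    return len(text)


smiles = ["CCO", "c1ccccc1", "not-smiles"]
kept, kept_errors = apply_batch(smiles, toy_parse, errors="keep")
skipped, _ = apply_batch(smiles, toy_parse, errors="skip")

print(kept)         # [3, 8, None]
print(kept_errors)  # {2: "cannot parse 'not-smiles'"}
print(skipped)      # [3, 8]
```

With "keep", output positions stay aligned with the input list, which matters when results are joined back to external identifiers. With "skip", the output is shorter and that alignment is lost.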
Batch SMILES output accepts formatting options:
canonical = prepared.to_smiles_list(canonical=True)
explicit = prepared.to_smiles_list(
    all_bonds_explicit=True,
    all_hs_explicit=True,
)
without_maps = prepared.to_smiles_list(ignore_atom_map_numbers=True)
SDF and Arrays
mol = Molecule.from_smiles("CCO").with_2d_coords()
sdf_text = mol.to_sdf_string(format="v2000")
restored = Molecule.read_sdf_record_from_str(sdf_text, coordinate_dim="2d")
coords = restored.coords_2d()
bounds = restored.dg_bounds_matrix()
print(coords.shape)
print(bounds.shape)
Main Features
- SMILES parsing and writing with Molecule.from_smiles() and to_smiles()
- copy-on-write molecule value semantics for transforms
- SDF file and string IO
- atom and bond feature inspection
- hydrogen add/remove transforms
- Kekule bond representation
- CW/CCW chiral tags, chiral centers, and tetrahedral stereo inspection
- 2D coordinate generation and NumPy coordinate arrays
- distance-geometry bounds matrix export
- Morgan fingerprint bit vectors, Tanimoto similarity, and AdditionalOutput
- SVG and PNG molecule depictions
- ordered batch construction, transformation, filtering, and export
- batch Morgan fingerprint generation with Rust-side parallel scheduling
- explicit molecule editing with Molecule.edit()
File details
Details for the file cosmolkit-0.1.1.tar.gz.
File metadata
- Download URL: cosmolkit-0.1.1.tar.gz
- Upload date:
- Size: 519.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e10db27685f8e7b520af63cdc4317e18db4770c71f5350614d699ff3b04b04a8 |
| MD5 | ef6fba97569aec94953d054e6eff8b46 |
| BLAKE2b-256 | 3c8a2aa070bc3e67fe8cdc3cf129d6921c3297e8ce7394762cb13e186b611757 |
File details
Details for the file cosmolkit-0.1.1-cp39-abi3-win_amd64.whl.
File metadata
- Download URL: cosmolkit-0.1.1-cp39-abi3-win_amd64.whl
- Upload date:
- Size: 2.3 MB
- Tags: CPython 3.9+, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 886b9ebb17281f65b578857b2c124fb54b3637e6eb87a5e7920b0448a2a8feb0 |
| MD5 | ac371833c986e4f4dbc4673d0c339a68 |
| BLAKE2b-256 | 0a0586d231d7b696b75982f34667cacaf81f9d3ed1d9779b44346c0d95c86bb3 |
File details
Details for the file cosmolkit-0.1.1-cp39-abi3-manylinux_2_34_x86_64.whl.
File metadata
- Download URL: cosmolkit-0.1.1-cp39-abi3-manylinux_2_34_x86_64.whl
- Upload date:
- Size: 2.5 MB
- Tags: CPython 3.9+, manylinux: glibc 2.34+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cadf02971f28925d868ff72d3eb2c8227aaf2bb9d6a28db0d583029b646c23de |
| MD5 | f00b96005fe717ee68a8ca1821f1276b |
| BLAKE2b-256 | 51d2de841ca4d77b81bfef5b1a32a38e51fd2be154fd75ea9dea4139912201a7 |
File details
Details for the file cosmolkit-0.1.1-cp39-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: cosmolkit-0.1.1-cp39-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 2.2 MB
- Tags: CPython 3.9+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a9ed4209aacb7e0cc24f6b43ca90df7782b0e302eeeaaa27ef75bdafccb1203c |
| MD5 | 18b2efc341d3b7d01a7321013a860fbc |
| BLAKE2b-256 | 6908f5ca4d442a0fd890cb87df17236f4e542ee21f840fe650a073126c92e39f |