Python bindings for COSMolKit
COSMolKit Python
COSMolKit is a Python package for molecule graph workflows, SMILES/SDF IO, coordinate access, Morgan fingerprints, molecule depiction, and high-throughput batch processing.
Current Python documentation: https://kit.cosmol.org/
API Model: Copy-On-Write (COW) Molecule Values
COSMolKit's Python API uses copy-on-write (COW) value semantics for molecule
transforms. Methods such as with_hydrogens(), without_hydrogens(),
with_kekulized_bonds(), and with_2d_coords() return a new Molecule; they
do not mutate the original object in place.
This is intentionally different from common RDKit Python workflows, where many
operations mutate an RWMol or update a molecule object directly. In COSMolKit,
keep the returned value:
from cosmolkit import Molecule

mol = Molecule.from_smiles("CCO")
mol_h = mol.with_hydrogens()  # returns a new Molecule; mol is left unchanged
assert mol is not mol_h
print(mol.to_smiles())
print(mol_h.to_smiles())
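The value-semantics pattern can also be shown in isolation. The following is a minimal sketch that uses only the standard library; `ToyMolecule` and its fields are hypothetical stand-ins and say nothing about how COSMolKit implements `Molecule` internally.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyMolecule:
    """Hypothetical value-style molecule (not COSMolKit's class)."""

    atoms: tuple
    explicit_h: bool = False

    def with_hydrogens(self) -> "ToyMolecule":
        # Return a new value; the unchanged atoms tuple is reused, not copied.
        return replace(self, explicit_h=True)


mol = ToyMolecule(atoms=("C", "C", "O"))
mol_h = mol.with_hydrogens()

assert mol is not mol_h          # a distinct object is returned
assert mol.explicit_h is False   # the original is untouched
assert mol_h.atoms is mol.atoms  # unchanged data is shared between both values
```

Because the transform never mutates its receiver, discarding the return value silently discards the change, which is why the returned value has to be kept.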
Installation
pip install cosmolkit
Quick Start
from cosmolkit import Molecule
mol = Molecule.from_smiles("c1ccccc1O")
drawn = mol.with_2d_coords()
print(mol.to_smiles())
print(drawn.atoms()[0])
drawn.write_png("phenol.png", width=400, height=300)
Morgan fingerprints are exposed as RDKit-style sparse bit vectors. The
on_bits() output is a list of bit indexes set to 1 inside a fixed-length
binary vector, not a dense neural embedding:
fp = mol.fingerprint_morgan(radius=2, n_bits=2048)
print(fp.n_bits())
print(fp.on_bits())
print(fp.tanimoto(Molecule.from_smiles("c1ccccc1").fingerprint_morgan()))
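For intuition about what a Tanimoto score over sparse bit vectors measures, here is a minimal pure-Python sketch that works directly on lists of on-bit indexes. The helper name `tanimoto_from_on_bits` is hypothetical, and returning 0.0 for two empty vectors is a convention chosen for this sketch, not a statement about COSMolKit's behaviour.

```python
def tanimoto_from_on_bits(a, b):
    """Tanimoto similarity of two bit vectors given as on-bit index lists."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # assumption: two all-zero vectors score 0.0 in this sketch
    # Shared on-bits divided by on-bits present in either vector.
    return len(set_a & set_b) / len(union)


# Two bits are shared (5, 9) out of four distinct on-bits (1, 5, 9, 12).
print(tanimoto_from_on_bits([1, 5, 9], [5, 9, 12]))  # 0.5
```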
Additional output mirrors RDKit's Morgan provenance helpers:
result = mol.fingerprint_morgan_with_output(radius=2, n_bits=2048)
info = result.additional_output()
print(result.fingerprint().on_bits())
print(info.atom_counts())
print(info.bit_info_map())
Chiral tags and bond orders are Python IntEnum values, so code can compare
against typed constants instead of strings:
from cosmolkit import BondOrder, ChiralTag, Molecule
chiral = Molecule.from_smiles("F[C@H](Cl)Br")
print(chiral.to_smiles())
print(chiral.to_smiles(isomeric_smiles=False))
for atom in chiral.atoms():
    if atom.chiral_tag() != ChiralTag.CHI_UNSPECIFIED:
        print(atom.idx(), atom.chiral_tag().name)
for bond in Molecule.from_smiles("C=C").bonds():
    if bond.bond_type() == BondOrder.DOUBLE:
        print("double bond:", bond.begin_atom_idx(), bond.end_atom_idx())
Read-only maps such as BOND_ORDER_MAP and CHIRAL_TAG_MAP convert external
string names to the enum members when needed.
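The general idea of pairing an IntEnum with a read-only name lookup can be sketched with the standard library alone. Everything below is hypothetical: `ToyBondOrder`, its numeric values, and the upper-case key format are assumptions for illustration and are not taken from COSMolKit's BOND_ORDER_MAP.

```python
from enum import IntEnum
from types import MappingProxyType


class ToyBondOrder(IntEnum):
    # Hypothetical members and values, for illustration only.
    SINGLE = 1
    DOUBLE = 2
    TRIPLE = 3


# MappingProxyType exposes a read-only view of the underlying dict.
TOY_BOND_ORDER_MAP = MappingProxyType({m.name: m for m in ToyBondOrder})

order = TOY_BOND_ORDER_MAP["DOUBLE"]
assert order is ToyBondOrder.DOUBLE
assert order == 2  # IntEnum members also compare equal to plain ints

try:
    TOY_BOND_ORDER_MAP["QUADRUPLE"] = 4
except TypeError:
    print("map is read-only")
```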
Use Molecule.edit() when you want an explicit editing workflow:
editor = mol.edit()
cl = editor.add_atom("Cl")
editor.add_bond(0, cl, order="single")
mol2 = editor.commit()
Batch Workflows
from cosmolkit import BatchErrorMode, BatchErrorType, MoleculeBatch

smiles = ["CCO", "c1ccccc1", "not-smiles"]
batch = MoleculeBatch.from_smiles_list(
    smiles,
    errors=BatchErrorMode.KEEP,
).with_parallel_jobs(8)

for error in batch.errors():
    if error.error_type() == BatchErrorType.SMILES_PARSE:
        print(error.index(), error.message())

prepared = batch.add_hydrogens(errors=BatchErrorMode.KEEP).compute_2d_coords(
    errors=BatchErrorMode.KEEP,
)
report = prepared.to_images(
    "molecule_images",
    format="png",
    errors=BatchErrorMode.SKIP,
    filenames=["ethanol", "benzene", "invalid"],
)
sdf_files = prepared.to_sdf_files(
    "molecule_sdf_records",
    format="v2000",
    errors=BatchErrorMode.SKIP,
    filenames=["ethanol", "benzene", "invalid"],
)
fingerprints = prepared.fingerprint_morgan_list(n_bits=2048)

print(prepared.valid_mask())
print(prepared.errors())
print([fp.on_bits() if fp is not None else None for fp in fingerprints])
print([mol.to_smiles() if mol is not None else None for mol in prepared])
print(report)
print(sdf_files)
The errors option controls invalid records:
- errors="raise" raises on the first batch validation failure.
- errors="keep" preserves failed records and exposes structured errors.
- errors="skip" omits failed records from the returned result or export.
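The three modes can be illustrated with a small pure-Python sketch. `toy_parse` and `parse_all` are hypothetical helpers written for this example; they mimic the raise/keep/skip behaviour described above but are not COSMolKit code, and the toy validity rule (letters and digits only) is an assumption.

```python
def toy_parse(text):
    """Hypothetical parser: accepts only strings made of letters and digits."""
    if not text.isalnum():
        raise ValueError(f"cannot parse {text!r}")
    return text


def parse_all(records, errors="raise"):
    """Parse every record, handling failures according to the errors mode."""
    if errors not in ("raise", "keep", "skip"):
        raise ValueError(f"unknown errors mode: {errors!r}")
    results, failures = [], []
    for index, text in enumerate(records):
        try:
            results.append(toy_parse(text))
        except ValueError as exc:
            if errors == "raise":
                raise  # stop at the first failure
            failures.append((index, str(exc)))
            if errors == "keep":
                results.append(None)  # placeholder keeps positions aligned
            # errors == "skip": the failed record is omitted from results
    return results, failures


records = ["CCO", "c1ccccc1", "not-smiles"]
print(parse_all(records, errors="keep"))
print(parse_all(records, errors="skip"))
```

With "keep", the output stays index-aligned with the input, which is what makes a validity mask meaningful; with "skip", positions shift, so downstream code should not assume alignment with the original list.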
Batch SMILES output accepts formatting options:
canonical = prepared.to_smiles_list(canonical=True)
explicit = prepared.to_smiles_list(
all_bonds_explicit=True,
all_hs_explicit=True,
)
without_maps = prepared.to_smiles_list(ignore_atom_map_numbers=True)
Single-molecule SMILES output accepts the same writer options:
benzene = Molecule.from_smiles("c1ccccc1")
ethanol = Molecule.from_smiles("CCO")
print(benzene.to_smiles(kekule=True))
print(ethanol.to_smiles(all_bonds_explicit=True))
print(ethanol.to_smiles(canonical=False, rooted_at_atom=2))
with_parallel_jobs() sets the default worker count for later batch
operations while preserving the batch's value-style semantics. A method-level
n_jobs argument can still override that default for a single call.
prepared.to_list() returns list[Molecule | None] when a Python list is more
convenient than iteration.
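The relationship between a batch-level default and a per-call override can be sketched as follows. `ToyBatch` and `effective_jobs` are hypothetical names used only to show the precedence rule; they are not part of COSMolKit.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ToyBatch:
    """Hypothetical value-style batch carrying a default worker count."""

    items: tuple
    default_jobs: int = 1

    def with_parallel_jobs(self, n_jobs: int) -> "ToyBatch":
        # Value-style: return a new batch instead of mutating this one.
        return replace(self, default_jobs=n_jobs)

    def effective_jobs(self, n_jobs: Optional[int] = None) -> int:
        # A per-call n_jobs wins; otherwise fall back to the batch default.
        return self.default_jobs if n_jobs is None else n_jobs


batch = ToyBatch(items=("CCO", "c1ccccc1"))
parallel = batch.with_parallel_jobs(8)

print(batch.effective_jobs())            # 1
print(parallel.effective_jobs())         # 8
print(parallel.effective_jobs(n_jobs=2)) # 2
```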
Directory exports accept optional filenames lists. Entries are aligned with
the batch records, None keeps the default numbered filename, and missing
extensions are filled from the selected output format.
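The filename rules above can be sketched in pure Python. `resolve_filenames` is a hypothetical helper; the default name used here (the zero-based record index) and the test for a missing extension (no suffix at all) are assumptions of this sketch, since the text does not spell out the exact default naming scheme.

```python
from pathlib import Path


def resolve_filenames(count, extension, filenames=None):
    """Return one output filename per batch record."""
    if filenames is None:
        filenames = [None] * count
    if len(filenames) != count:
        raise ValueError("filenames must align with the batch records")
    resolved = []
    for index, name in enumerate(filenames):
        if name is None:
            name = str(index)  # assumed default: numbered by record index
        if not Path(name).suffix:
            name = f"{name}.{extension}"  # fill in the missing extension
        resolved.append(name)
    return resolved


print(resolve_filenames(3, "png", ["ethanol", None, "invalid.png"]))
# ['ethanol.png', '1.png', 'invalid.png']
```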
SDF and Arrays
mol = Molecule.from_smiles("CCO").with_2d_coords()
sdf_text = mol.to_sdf_string(format="v2000")
restored = Molecule.read_sdf_record_from_str(sdf_text, coordinate_dim="2d")
coords = restored.coords_2d()
bounds = restored.dg_bounds_matrix()
print(coords.shape)
print(bounds.shape)
Main Features
- SMILES parsing and writing with Molecule.from_smiles() and to_smiles()
- RDKit-style SMILES writer options on single molecules and batches
- copy-on-write molecule value semantics for transforms
- SDF file and string IO
- atom and bond feature inspection
- hydrogen add/remove transforms
- Kekule bond representation
- CW/CCW chiral tags, chiral centers, and tetrahedral stereo inspection
- 2D coordinate generation and NumPy coordinate arrays
- distance-geometry bounds matrix export
- Morgan fingerprint bit vectors, Tanimoto similarity, and AdditionalOutput
- SVG and PNG molecule depictions
- ordered batch construction, transformation, filtering, and export
- batch Morgan fingerprint generation with Rust-side parallel scheduling
- batch-level default parallelism with with_parallel_jobs()
- Python-style batch iteration, slicing, integer-list selection, and boolean-mask selection
- custom per-record filenames for batch image and SDF directory exports
- explicit molecule editing with Molecule.edit()
Download files
File details
Details for the file cosmolkit-0.1.2.tar.gz.
File metadata
- Download URL: cosmolkit-0.1.2.tar.gz
- Upload date:
- Size: 532.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d022992e4f1473d1aa9e14a663890896f332f2ac569f8027937d64c0d17520e9 |
| MD5 | 56378dcd9bb813237170add2111c20be |
| BLAKE2b-256 | 210abffcd035b3de687573287953ecd4656df7eb0aa6c347ae58450a91390e62 |
File details
Details for the file cosmolkit-0.1.2-cp39-abi3-win_amd64.whl.
File metadata
- Download URL: cosmolkit-0.1.2-cp39-abi3-win_amd64.whl
- Upload date:
- Size: 2.3 MB
- Tags: CPython 3.9+, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 01a6f4f4a9b4aef1f509c9ebafcfe12b4560f1805f54a161b3c73549bd2c6214 |
| MD5 | cbbf36d2ba57b7908fb4f406c39c6b7c |
| BLAKE2b-256 | 319b1b7dd9af5b30af2edc4174c5a072b07fbb60d284d43c29d3a08164aa1a0b |
File details
Details for the file cosmolkit-0.1.2-cp39-abi3-manylinux_2_34_x86_64.whl.
File metadata
- Download URL: cosmolkit-0.1.2-cp39-abi3-manylinux_2_34_x86_64.whl
- Upload date:
- Size: 2.5 MB
- Tags: CPython 3.9+, manylinux: glibc 2.34+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f159dc737feb93e87f8a57e6fef0d7edc681f27a82c7bbb92cd9a408ae2619a1 |
| MD5 | e8cba7159c403a5478409f7f97740ccc |
| BLAKE2b-256 | bc89991a0fb35fb06182b6badbf61e13265a242b9f82cc6e9e5249f159749051 |
File details
Details for the file cosmolkit-0.1.2-cp39-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: cosmolkit-0.1.2-cp39-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 2.3 MB
- Tags: CPython 3.9+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4efba4ec2e0f3178da8b33b93229ee70ae4fd00e7baca9ab002aa8bf799bf09b |
| MD5 | 7fe25d9c6a610e2e7943979969624269 |
| BLAKE2b-256 | bc25eba4be6744fc90d2a91399b03a5dd7dcfb85ee9805cf96025c7c535bb182 |