Extensions for SQLAlchemy to work with chemical cartridges
molalchemy - Making chemical databases as easy as regular databases!
molalchemy provides seamless integration between Python and chemical databases, enabling chemical structure storage, indexing, and querying. The library supports two popular chemical cartridges (Bingo PostgreSQL and RDKit PostgreSQL) and provides a unified API for chemical database operations.
This project was originally supposed to be part of the RDKit UGM 2025 hackathon, but COVID had other plans for me. It is currently in the alpha stage as a proof of concept. Contributions are welcome!
To give it a hackathon vibe, I built this PoC in a couple of hours, so expect some rough edges and missing features.
Features
- Chemical Data Types: Custom SQLAlchemy types for molecules, reactions and fingerprints
- Chemical Cartridge Integration: Support for Bingo and RDKit PostgreSQL cartridges
- Substructure Search: Efficient substructure and similarity searching
- Chemical Indexing: High-performance chemical structure indexing
- Input Validation: Molecules and reactions are validated before being sent to the database
- Similarity Threshold Management: Get/set Tanimoto and Dice thresholds with a context manager
- Alembic Integration: Automatic handling of extensions, types, and indexes in database migrations
- Typing: As many type hints as possible - no need to remember yet another abstract function name
- Easy Integration: Drop-in replacement for standard SQLAlchemy types
Installation
Using pip
pip install molalchemy
From source
pip install git+https://github.com/asiomchen/molalchemy.git
# or clone the repo and install
git clone https://github.com/asiomchen/molalchemy.git
cd molalchemy
pip install .
Prerequisites
- Python 3.10+
- SQLAlchemy 2.0+
- rdkit 2024.3.1+
- A running PostgreSQL instance with a chemical cartridge (Bingo or RDKit); see docker-compose.yaml for a ready-to-use setup
For development or testing, you can use the provided Docker setup:
# For RDKit cartridge
docker-compose up rdkit
# For Bingo cartridge
docker-compose up bingo
Project Structure
molalchemy/
├── src/molalchemy/
│   ├── types.py             # Base type definitions
│   ├── exceptions.py        # Custom exception hierarchy
│   ├── helpers.py           # Common utilities
│   ├── alembic_helpers.py   # Alembic integration utilities
│   ├── bingo/               # Bingo PostgreSQL cartridge support
│   │   ├── types.py         # Bingo-specific types
│   │   ├── index.py         # Bingo indexing
│   │   ├── comparators.py   # SQLAlchemy comparators
│   │   └── functions/       # Bingo database functions
│   └── rdkit/               # RDKit PostgreSQL cartridge support
│       ├── types.py         # RDKit-specific types
│       ├── index.py         # RDKit indexing
│       ├── comparators.py   # SQLAlchemy comparators
│       ├── settings.py      # Similarity threshold management
│       └── functions/       # RDKit database functions
├── tests/                   # Test suite
├── docs/                    # Documentation
└── dev_scripts/             # Development utilities
Quick Start
To learn how to use molalchemy, check out the tutorials in the documentation:
- Quick Start - RDKit ORM - Molecules, substructure search, fingerprints, similarity
- Quick Start - RDKit Core - Same features using SQLAlchemy Core API
- Quick Start - Bingo ORM - Bingo cartridge with ORM
- Similarity Thresholds - Managing RDKit similarity thresholds
- Chemical Reactions - Storing and querying reactions
Supported Cartridges
Bingo Cartridge
from molalchemy.bingo.types import (
BingoMol, # Text-based molecule storage (SMILES/Molfile)
BingoBinaryMol, # Binary molecule storage with format conversion
BingoReaction, # Reaction storage (reaction SMILES/Rxnfile)
BingoBinaryReaction # Binary reaction storage
)
from molalchemy.bingo.index import (
BingoMolIndex, # Molecule indexing
BingoBinaryMolIndex, # Binary molecule indexing
BingoRxnIndex, # Reaction indexing
BingoBinaryRxnIndex # Binary reaction indexing
)
# Functions are imported individually; see the documentation
# for the complete list of chemical analysis functions, for example:
from molalchemy.bingo.functions import smiles, getweight, gross, inchikey
RDKit Cartridge
from molalchemy.rdkit.types import (
RdkitMol, # RDKit molecule type with configurable return formats
RdkitBitFingerprint, # Binary fingerprints (bfp)
RdkitSparseFingerprint,# Sparse fingerprints (sfp)
RdkitReaction, # Chemical reactions with input validation
RdkitQMol, # Query molecules
RdkitXQMol, # Extended query molecules
)
from molalchemy.rdkit.index import (
RdkitIndex, # RDKit molecule indexing (GIST index)
)
from molalchemy.rdkit.settings import (
get_tanimoto_threshold, set_tanimoto_threshold, # Tanimoto threshold management
get_dice_threshold, set_dice_threshold, # Dice threshold management
similarity_threshold, # Context manager for temporary thresholds
)
# Functions are imported individually; see the documentation
# for the complete list of 150+ RDKit functions, for example:
from molalchemy.rdkit.functions import mol_amw, mol_formula, mol_inchikey
Advanced Features
Chemical Indexing
from sqlalchemy import Integer, String
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
from molalchemy.bingo.index import BingoMolIndex
from molalchemy.bingo.types import BingoMol

# Declarative base shared by the models in this README
class Base(DeclarativeBase):
    pass

class Molecule(Base):
    __tablename__ = 'molecules'
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    structure: Mapped[str] = mapped_column(BingoMol)
    name: Mapped[str] = mapped_column(String(100))
    # Add chemical index for faster searching
    __table_args__ = (
        BingoMolIndex('mol_idx', 'structure'),
    )
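For intuition, a cartridge index is an ordinary PostgreSQL index that uses a custom access method. The sketch below is not molalchemy's implementation: it uses plain SQLAlchemy `Index` options to print the kind of DDL described in the cartridge manuals (`USING bingo_idx` with the `bingo.molecule` operator class for Bingo, `USING gist` for RDKit). The table and index names are illustrative, and the exact DDL emitted by `BingoMolIndex` or `RdkitIndex` is an assumption.

```python
from sqlalchemy import Column, Index, Integer, MetaData, Table, Text
from sqlalchemy.dialects import postgresql
from sqlalchemy.schema import CreateIndex

metadata = MetaData()
# Text stands in for the cartridge column type; only the DDL text matters here.
molecules = Table(
    "molecules",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("structure", Text),
)

# Bingo-style index: custom access method plus an operator class on the column.
bingo_style = Index(
    "mol_idx",
    molecules.c.structure,
    postgresql_using="bingo_idx",
    postgresql_ops={"structure": "bingo.molecule"},
)

# RDKit-style index: a GiST index on the molecule column.
rdkit_style = Index("mol_gist_idx", molecules.c.structure, postgresql_using="gist")


def render_ddl(index: Index) -> str:
    """Compile CREATE INDEX for PostgreSQL without connecting to a database."""
    return str(CreateIndex(index).compile(dialect=postgresql.dialect()))


bingo_ddl = render_ddl(bingo_style)
rdkit_ddl = render_ddl(rdkit_style)
print(bingo_ddl)
print(rdkit_ddl)
```

Nothing here needs a running database, so it is a cheap way to preview what a migration would send to PostgreSQL.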
Configurable Return Types
from rdkit import Chem
from molalchemy.rdkit.types import RdkitMol

class MoleculeWithFormats(Base):
    __tablename__ = 'molecules_formatted'
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    # Return as SMILES string (default)
    structure_smiles: Mapped[str] = mapped_column(RdkitMol())
    # Return as RDKit Mol object
    structure_mol: Mapped[Chem.Mol] = mapped_column(RdkitMol(return_type="mol"))
    # Return as raw bytes
    structure_bytes: Mapped[bytes] = mapped_column(RdkitMol(return_type="bytes"))
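The general mechanism behind a column type that validates input and converts results can be sketched with SQLAlchemy's `TypeDecorator`. This is a minimal illustration, not molalchemy's code: `SmilesSketch`, its whitespace check, and the in-memory SQLite table are all hypothetical stand-ins, and real validation would parse the structure with RDKit.

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, insert, select
from sqlalchemy.types import TypeDecorator


class SmilesSketch(TypeDecorator):
    """Hypothetical text-backed type with a configurable return format."""

    impl = String
    cache_ok = True

    def __init__(self, return_type: str = "smiles"):
        super().__init__()
        self.return_type = return_type

    def process_bind_param(self, value, dialect):
        # Runs before the value is sent to the database.
        # Placeholder check only: reject empty strings and embedded whitespace.
        if value is not None and (not value or any(ch.isspace() for ch in value)):
            raise ValueError(f"not a plausible SMILES string: {value!r}")
        return value

    def process_result_value(self, value, dialect):
        # Runs on every value read back; converts according to return_type.
        if value is None or self.return_type == "smiles":
            return value
        return value.encode("ascii")


engine = create_engine("sqlite://")  # in-memory database, for illustration only
metadata = MetaData()
mols = Table(
    "mols",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("as_text", SmilesSketch()),
    Column("as_bytes", SmilesSketch(return_type="bytes")),
)
metadata.create_all(engine)

with engine.begin() as conn:
    conn.execute(insert(mols).values(as_text="CCO", as_bytes="CCO"))
    row = conn.execute(select(mols.c.as_text, mols.c.as_bytes)).one()

print(row.as_text, row.as_bytes)
```

When a bind-time check like this fails inside `execute()`, SQLAlchemy wraps the error in `StatementError`; a library can raise its own exception class from the same hook.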
Similarity Threshold Management
RDKit PostgreSQL uses GUC (configuration) variables to control similarity search behavior. molalchemy provides helpers to read and change these thresholds:
from molalchemy.rdkit.settings import (
get_tanimoto_threshold,
set_tanimoto_threshold,
similarity_threshold,
)
# Get/set thresholds directly
print(get_tanimoto_threshold(session)) # 0.5 (default)
set_tanimoto_threshold(session, 0.3)
# Use context manager for temporary changes
with similarity_threshold(session, tanimoto=0.1, dice=0.2):
    # Thresholds are active inside the block
    results = session.execute(query).all()
# Original thresholds are restored automatically
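To make the threshold semantics concrete, here is a database-free sketch of the two similarity measures and of the save/override/restore pattern a threshold context manager follows. Everything in it is illustrative: fingerprints are modelled as sets of on-bit positions, the `settings` dict stands in for the session's GUC values, and the `>=` comparison is an assumption about how a threshold is applied.

```python
from contextlib import contextmanager


def tanimoto(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B| over the sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dice(a: set, b: set) -> float:
    """2 * |A ∩ B| / (|A| + |B|) over the sets of on-bits."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


# Hypothetical stand-in for per-session threshold settings.
settings = {"tanimoto": 0.5, "dice": 0.5}


@contextmanager
def temporary_thresholds(**overrides):
    """Apply overrides inside the block and restore the old values afterwards."""
    previous = {key: settings[key] for key in overrides}
    settings.update(overrides)
    try:
        yield settings
    finally:
        settings.update(previous)  # restored even if the block raises


def similar(query: set, candidates: dict) -> list:
    """Names of candidates whose Tanimoto similarity meets the current threshold."""
    return [
        name
        for name, fingerprint in candidates.items()
        if tanimoto(query, fingerprint) >= settings["tanimoto"]
    ]


query_fp = {1, 2, 3, 4}
candidates = {"close": {1, 2, 3, 5}, "far": {1, 7, 8, 9}}

strict_hits = similar(query_fp, candidates)      # threshold 0.5
with temporary_thresholds(tanimoto=0.1):
    loose_hits = similar(query_fp, candidates)   # threshold 0.1
restored = settings["tanimoto"]

print(strict_hits, loose_hits, restored)
```

Here "close" shares 3 of 5 distinct bits with the query (Tanimoto 0.6), while "far" shares 1 of 7, so lowering the threshold to 0.1 admits both.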
Chemical Reactions
Store and query chemical reactions using RdkitReaction:
from sqlalchemy import select
from molalchemy.rdkit.types import RdkitReaction
from molalchemy.rdkit.functions import rxn_has_smarts, reaction_numreactants

class Reaction(Base):
    __tablename__ = 'reactions'
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    name: Mapped[str] = mapped_column(String(100))
    rxn: Mapped[str] = mapped_column(RdkitReaction())

# Insert with validation (invalid SMARTS raises InvalidReactionError)
session.add(Reaction(name="Amide formation", rxn="[C:1](=O)[OH].[N:2]>>[C:1](=O)[N:2]"))

# Reaction substructure search
results = session.execute(
    select(Reaction).where(rxn_has_smarts(Reaction.rxn, ">>[C:1][N:2]"))
).all()
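The reaction strings above follow the `reactants>agents>products` layout, with `.` separating the components on each side. The helper below is a deliberately naive, string-only sketch of that layout (the function name is hypothetical); it ignores SMARTS features such as component-level grouping, so real parsing and validation should be left to RDKit or the cartridge.

```python
def split_reaction(rxn: str):
    """Split a reaction string into (reactants, agents, products) lists.

    Naive sketch: assumes exactly two '>' separators and that '.' only
    appears as a component separator.
    """
    parts = rxn.split(">")
    if len(parts) != 3:
        raise ValueError(f"expected 'reactants>agents>products', got {rxn!r}")
    reactants, agents, products = (
        [component for component in part.split(".") if component] for part in parts
    )
    return reactants, agents, products


reactants, agents, products = split_reaction("[C:1](=O)[OH].[N:2]>>[C:1](=O)[N:2]")
print(len(reactants), len(agents), len(products))

# A product-only query pattern, like the one used in the search above,
# has empty reactant and agent sides.
query_parts = split_reaction(">>[C:1][N:2]")
print(query_parts)
```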
Using Chemical Functions
The chemical functions are available as individual imports from the functions modules. Under the hood they use SQLAlchemy's func to call the corresponding database functions, and provide type hints and syntax highlighting in IDEs.
from molalchemy.bingo.functions import checkmolecule, getweight, gross, inchikey, smiles

# Calculate molecular properties using Bingo functions
results = session.query(
    Molecule.name,
    getweight(Molecule.structure).label('molecular_weight'),
    gross(Molecule.structure).label('formula'),
    smiles(Molecule.structure).label('canonical_smiles'),
).all()

# Validate molecular structures (select rows where checkmolecule reports a problem)
invalid_molecules = session.query(Molecule).filter(
    checkmolecule(Molecule.structure).isnot(None)
).all()

# Format conversions
inchi_keys = session.query(
    Molecule.id,
    inchikey(Molecule.structure).label('inchikey'),
).all()
For RDKit functions:
from molalchemy.rdkit.functions import mol_amw, mol_formula, mol_inchikey

# Calculate molecular properties using RDKit functions
results = session.query(
    Molecule.name,
    mol_amw(Molecule.structure).label('molecular_weight'),
    mol_formula(Molecule.structure).label('formula'),
    mol_inchikey(Molecule.structure).label('inchikey'),
).all()
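As a rough picture of what "a typed wrapper around `func`" means, the sketch below builds one by hand and prints the SQL it produces. The wrapper name `mol_amw_sketch` and the plain `Text` column are assumptions made for illustration; the generated molalchemy functions may differ in signature and return type.

```python
from __future__ import annotations

from sqlalchemy import Column, Float, Integer, MetaData, Table, Text, func, select
from sqlalchemy.sql.elements import ColumnElement

metadata = MetaData()
# Text stands in for the cartridge's molecule type.
molecules = Table(
    "molecules",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("structure", Text),
)


def mol_amw_sketch(mol: ColumnElement) -> ColumnElement[float]:
    """Hypothetical wrapper: calls the database function mol_amw(mol)."""
    # func.<name>(...) renders a SQL call to <name>; type_ sets the result type.
    return func.mol_amw(mol, type_=Float)


stmt = select(mol_amw_sketch(molecules.c.structure).label("molecular_weight"))
sql = str(stmt)
print(sql)
```

The named, annotated function is what gives editors something to autocomplete and type-check, whereas a bare `func.mol_amw(...)` accepts any attribute name without complaint.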
Alembic Database Migrations
molalchemy provides utilities for Alembic integration. For automatic import handling in migrations, the library provides a type-rendering hook that ensures the proper import statements are generated for molalchemy types. Pass it as render_item to context.configure in your Alembic env.py:
# ...
from molalchemy import alembic_helpers
# ...

def run_migrations_offline():
    # ...
    context.configure(
        # ...
        render_item=alembic_helpers.render_item,
    )
    # ...

def run_migrations_online():
    # ...
    context.configure(
        # ...
        render_item=alembic_helpers.render_item,
    )
    # ...
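For readers unfamiliar with Alembic's `render_item` hook: Alembic calls it as `render_item(type_, obj, autogen_context)` during autogenerate, and the hook either returns the source text to write into the migration or `False` to fall back to default rendering. The function below is a sketch of that contract, not the body of `alembic_helpers.render_item`; the module-prefix test and the fake objects are assumptions used so the example runs without Alembic or molalchemy installed.

```python
from types import SimpleNamespace


def render_item_sketch(type_, obj, autogen_context):
    """Render custom types with a fully qualified name and register the import."""
    module = type(obj).__module__
    if type_ == "type" and module.startswith("molalchemy"):
        # Lines added to autogen_context.imports end up at the top of the migration.
        autogen_context.imports.add(f"import {module}")
        return f"{module}.{obj!r}"
    return False  # let Alembic render everything else as usual


class FakeMol:
    """Stand-in for a molalchemy type instance (hypothetical)."""

    __module__ = "molalchemy.rdkit.types"

    def __repr__(self):
        return "RdkitMol()"


# Only the `imports` set of the real autogenerate context is used here.
fake_context = SimpleNamespace(imports=set())

rendered = render_item_sketch("type", FakeMol(), fake_context)
fallback = render_item_sketch("type", 42, fake_context)
print(rendered, fake_context.imports, fallback)
```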
Development
Setting Up Development Environment
- Clone the repository:
git clone https://github.com/asiomchen/molalchemy.git
cd molalchemy
- Install dependencies:
uv sync
- Activate the virtual environment:
source .venv/bin/activate
Running Tests
# Run all tests with coverage
make test
# Or use uv directly
uv run pytest
# Run specific test module
uv run pytest tests/bingo/
# Run with coverage
uv run pytest --cov=src/molalchemy
Code Quality
This project uses modern Python development tools:
- uv: For virtual environment and dependency management
- Ruff: For linting and formatting
- pytest: For testing
Building Function Bindings
The chemical function bindings are automatically generated from cartridge documentation:
# Update RDKit function bindings
make update-rdkit-func
# Update Bingo function bindings
make update-bingo-func
# Update all function bindings
make update-func
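The generator scripts themselves are not shown here, but the idea of producing bindings from a function list can be sketched in a few lines. Everything below is hypothetical (the `FunctionSpec` record, the template, and the rendered wrapper shape); it only illustrates turning a documented function name and parameter list into Python source.

```python
from dataclasses import dataclass


@dataclass
class FunctionSpec:
    """One documented database function (hypothetical record format)."""

    name: str
    params: list
    doc: str


TEMPLATE = (
    "def {name}({params}):\n"
    '    """{doc}"""\n'
    "    return func.{name}({params})\n"
)


def render_binding(spec: FunctionSpec) -> str:
    """Render the source code of a thin wrapper around SQLAlchemy's func."""
    return TEMPLATE.format(
        name=spec.name,
        params=", ".join(spec.params),
        doc=spec.doc,
    )


spec = FunctionSpec("mol_amw", ["mol"], "Return the average molecular weight of a molecule.")
source = render_binding(spec)

# Syntax-check the generated code without executing it.
compile(source, "<generated>", "exec")
print(source)
```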
Documentation
- Project Roadmap - Development phases, timeline, and contribution opportunities
- Contributing Guide - How to contribute to the project
- API Reference - Complete API documentation
- Bingo Manual - Bingo PostgreSQL cartridge guide
- RDKit Manual - RDKit PostgreSQL cartridge guide
Contributing
We welcome contributions! molalchemy offers many opportunities for developers interested in chemical informatics:
- New to the project? Check out good first issues
- Chemical expertise? Help complete RDKit integration or add ChemAxon support
- DevOps skills? Optimize our Docker containers and CI/CD pipeline
- Love documentation? Create tutorials and improve API docs
Read our Contributing Guide for detailed instructions on getting started.
License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Acknowledgments
Core Technologies
- RDKit - Open-source cheminformatics toolkit
- Bingo - Chemical database cartridge by EPAM
- SQLAlchemy - Python SQL toolkit and ORM
Inspiration and Similar Projects
- GeoAlchemy2 - Spatial extension for SQLAlchemy, served as architectural inspiration for cartridge integration patterns
- ord-schema - Open Reaction Database schema, one of the few projects using custom chemical types with SQLAlchemy
- Riccardo Vianello - His work on django-rdkit and razi provided valuable insights for chemical database integration (discovered after starting this project)
Contact
- Author: Anton Siomchen
- Email: anton.siomchen+molalchemy@gmail.com
- GitHub: @asiomchen
- LinkedIn: Anton Siomchen