
Extensions for SQLAlchemy to work with chemical cartridges

Project description

MolAlchemy

molalchemy - Making chemical databases as easy as regular databases! 🧪✨

molalchemy integrates Python with chemical databases, providing chemical structure storage, indexing, and querying. The library supports two popular chemical cartridges, Bingo PostgreSQL and RDKit PostgreSQL, and exposes a unified API for chemical database operations.

This project was originally meant to be part of the RDKit UGM 2025 hackathon, but COVID had other plans for me. It is currently in the alpha stage, as a proof of concept. Contributions are welcome!

To give it a hackathon vibe, I built this PoC in a couple of hours, so expect some rough edges and missing features.

🚀 Features

  • Chemical Data Types: Custom SQLAlchemy types for molecules, reactions and fingerprints
  • Chemical Cartridge Integration: Support for Bingo and RDKit PostgreSQL cartridges
  • Substructure Search: Efficient substructure and similarity searching
  • Chemical Indexing: High-performance chemical structure indexing
  • Input Validation: Molecules and reactions are validated before being sent to the database
  • Similarity Threshold Management: Get/set Tanimoto and Dice thresholds with a context manager
  • Alembic Integration: Automatic handling of extensions, types, and indexes in database migrations
  • Typing: Type hints wherever possible, so there is no need to remember yet another abstract function name
  • Easy Integration: Drop-in replacement for standard SQLAlchemy types

📦 Installation

Using pip

pip install molalchemy

From source

pip install git+https://github.com/asiomchen/molalchemy.git

# or clone the repo and install
git clone https://github.com/asiomchen/molalchemy.git
cd molalchemy
pip install .

Prerequisites

  • Python 3.10+
  • SQLAlchemy 2.0+
  • rdkit 2024.3.1+
  • A running PostgreSQL instance with a chemical cartridge (Bingo or RDKit); see docker-compose.yaml for a ready-to-use setup

For development or testing, you can use the provided Docker setup:

# For RDKit cartridge
docker-compose up rdkit

# For Bingo cartridge  
docker-compose up bingo

๐Ÿ“ Project Structure

molalchemy/
├── src/molalchemy/
│   ├── types.py              # Base type definitions
│   ├── exceptions.py         # Custom exception hierarchy
│   ├── helpers.py            # Common utilities
│   ├── alembic_helpers.py    # Alembic integration utilities
│   ├── bingo/                # Bingo PostgreSQL cartridge support
│   │   ├── types.py          # Bingo-specific types
│   │   ├── index.py          # Bingo indexing
│   │   ├── comparators.py    # SQLAlchemy comparators
│   │   └── functions/        # Bingo database functions
│   └── rdkit/                # RDKit PostgreSQL cartridge support
│       ├── types.py          # RDKit-specific types
│       ├── index.py          # RDKit indexing
│       ├── comparators.py    # SQLAlchemy comparators
│       ├── settings.py       # Similarity threshold management
│       └── functions/        # RDKit database functions
├── tests/                    # Test suite
├── docs/                     # Documentation
└── dev_scripts/              # Development utilities

🔧 Quick Start

To learn how to use molalchemy, check out the tutorials in the documentation.

๐Ÿ—๏ธ Supported Cartridges

Bingo Cartridge

from molalchemy.bingo.types import (
    BingoMol,              # Text-based molecule storage (SMILES/Molfile)
    BingoBinaryMol,        # Binary molecule storage with format conversion
    BingoReaction,         # Reaction storage (reaction SMILES/Rxnfile)
    BingoBinaryReaction    # Binary reaction storage
)
from molalchemy.bingo.index import (
    BingoMolIndex,         # Molecule indexing
    BingoBinaryMolIndex,   # Binary molecule indexing
    BingoRxnIndex,         # Reaction indexing
    BingoBinaryRxnIndex    # Binary reaction indexing
)
from molalchemy.bingo.functions import (
    checkmolecule, getweight, gross, inchikey, smiles,
    # ...and more; see the documentation for the complete list
    # of chemical analysis functions
)

RDKit Cartridge

from molalchemy.rdkit.types import (
    RdkitMol,              # RDKit molecule type with configurable return formats
    RdkitBitFingerprint,   # Binary fingerprints (bfp)
    RdkitSparseFingerprint, # Sparse fingerprints (sfp)
    RdkitReaction,         # Chemical reactions with input validation
    RdkitQMol,             # Query molecules
    RdkitXQMol,            # Extended query molecules
)
from molalchemy.rdkit.index import (
    RdkitIndex,            # RDKit molecule indexing (GIST index)
)
from molalchemy.rdkit.settings import (
    get_tanimoto_threshold, set_tanimoto_threshold,  # Tanimoto threshold management
    get_dice_threshold, set_dice_threshold,          # Dice threshold management
    similarity_threshold,                            # Context manager for temporary thresholds
)
from molalchemy.rdkit.functions import (
    mol_amw, mol_formula, mol_inchikey, rxn_has_smarts,
    # ...and more; see the documentation for the complete list
    # of 150+ RDKit functions
)
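The Tanimoto and Dice threshold helpers from molalchemy.rdkit.settings are cut-offs on the similarity between fingerprints, such as those stored in RdkitBitFingerprint columns. As a reminder of what those numbers measure, here is a plain-Python sketch that treats a fingerprint as the set of its on-bit positions: with c shared bits, Tanimoto is c / (|A| + |B| - c) and Dice is 2c / (|A| + |B|). It is only an illustration outside the database, not the cartridge's implementation, and the hits helper is hypothetical.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit positions."""
    common = len(a & b)
    union = len(a) + len(b) - common
    # Two empty fingerprints: this sketch returns 0.0 by choice
    return common / union if union else 0.0


def dice(a: set[int], b: set[int]) -> float:
    """Dice similarity: twice the shared bits over the total number of on-bits."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def hits(query: set[int], library: dict[str, set[int]], threshold: float) -> list[str]:
    """Hypothetical helper: names whose Tanimoto similarity is at least the threshold."""
    return [name for name, fp in library.items() if tanimoto(query, fp) >= threshold]


query = {1, 2, 3, 4}
library = {"a": {3, 4, 5}, "b": {1, 2, 3, 4, 5}, "c": {9}}
print(tanimoto(query, library["a"]))  # 2 shared / 5 in union = 0.4
print(dice(query, library["a"]))      # 2*2 / (4 + 3), about 0.571
print(hits(query, library, 0.5))      # ['b']
```

With a threshold of 0.4, "a" would also qualify, which is why lowering the Tanimoto threshold widens a similarity search (this sketch compares with >=).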

🎯 Advanced Features

Chemical Indexing

from sqlalchemy import Integer, String
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

from molalchemy.bingo.index import BingoMolIndex
from molalchemy.bingo.types import BingoMol


class Base(DeclarativeBase):
    pass


class Molecule(Base):
    __tablename__ = 'molecules'

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    structure: Mapped[str] = mapped_column(BingoMol)
    name: Mapped[str] = mapped_column(String(100))

    # Add chemical index for faster searching
    __table_args__ = (
        BingoMolIndex('mol_idx', 'structure'),
    )
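A cartridge index is an ordinary PostgreSQL index that uses the cartridge's access method. The sketch below only builds the kind of CREATE INDEX statements each cartridge documents (the bingo_idx access method with a bingo.* operator class for Bingo, GiST for RDKit). The helper names are hypothetical, and the exact DDL that BingoMolIndex and RdkitIndex emit may differ.

```python
# Operator classes for the Bingo cartridge's bingo_idx access method
BINGO_OPCLASSES = {
    "molecule": "bingo.molecule",
    "binary_molecule": "bingo.bmolecule",
    "reaction": "bingo.reaction",
    "binary_reaction": "bingo.breaction",
}


def bingo_index_ddl(index_name: str, table: str, column: str, kind: str = "molecule") -> str:
    """Hypothetical helper: CREATE INDEX statement for a Bingo-indexed column."""
    opclass = BINGO_OPCLASSES[kind]
    # Identifiers are not quoted or escaped in this sketch
    return f"CREATE INDEX {index_name} ON {table} USING bingo_idx ({column} {opclass})"


def rdkit_index_ddl(index_name: str, table: str, column: str) -> str:
    """Hypothetical helper: the RDKit cartridge indexes mol and bfp columns with GiST."""
    return f"CREATE INDEX {index_name} ON {table} USING gist ({column})"


print(bingo_index_ddl("mol_idx", "molecules", "structure"))
# CREATE INDEX mol_idx ON molecules USING bingo_idx (structure bingo.molecule)
print(rdkit_index_ddl("mol_idx", "molecules_formatted", "structure_smiles"))
# CREATE INDEX mol_idx ON molecules_formatted USING gist (structure_smiles)
```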

Configurable Return Types

from rdkit import Chem
from sqlalchemy import Integer
from sqlalchemy.orm import Mapped, mapped_column

from molalchemy.rdkit.types import RdkitMol

# Base is your project's declarative base (a DeclarativeBase subclass)

class MoleculeWithFormats(Base):
    __tablename__ = 'molecules_formatted'

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    # Return as SMILES string (default)
    structure_smiles: Mapped[str] = mapped_column(RdkitMol())
    # Return as RDKit Mol object
    structure_mol: Mapped[Chem.Mol] = mapped_column(RdkitMol(return_type="mol"))
    # Return as raw bytes
    structure_bytes: Mapped[bytes] = mapped_column(RdkitMol(return_type="bytes"))

Similarity Threshold Management

The RDKit PostgreSQL cartridge controls similarity search through the GUC (configuration) variables rdkit.tanimoto_threshold and rdkit.dice_threshold. molalchemy provides helpers to manage these thresholds:

from molalchemy.rdkit.settings import (
    get_tanimoto_threshold,
    set_tanimoto_threshold,
    similarity_threshold,
)

# `session` is an open SQLAlchemy Session on a database with the RDKit cartridge;
# `query` is any similarity-search statement

# Get/set thresholds directly
print(get_tanimoto_threshold(session))  # 0.5 (default)
set_tanimoto_threshold(session, 0.3)

# Use context manager for temporary changes
with similarity_threshold(session, tanimoto=0.1, dice=0.2):
    # Thresholds are active inside the block
    results = session.execute(query).all()
# Original thresholds are restored automatically
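The restore-on-exit behaviour described for similarity_threshold can be implemented with a common pattern: read the current values, apply the new ones, and put the old ones back in a finally block so they return even if the query fails. Below is a minimal stdlib sketch of that pattern, assuming a hypothetical FakeGucSession that stands in for a database connection (a real session would issue SHOW and SET for the two GUC variables). It is not molalchemy's actual implementation.

```python
from contextlib import contextmanager
from typing import Iterator, Optional

TANIMOTO = "rdkit.tanimoto_threshold"
DICE = "rdkit.dice_threshold"


class FakeGucSession:
    """Hypothetical stand-in for a session; a real one would run SHOW/SET."""

    def __init__(self) -> None:
        # Start from the RDKit cartridge defaults
        self.settings = {TANIMOTO: 0.5, DICE: 0.5}

    def show(self, name: str) -> float:
        return self.settings[name]

    def set(self, name: str, value: float) -> None:
        self.settings[name] = value


@contextmanager
def temporary_thresholds(
    session: FakeGucSession,
    tanimoto: Optional[float] = None,
    dice: Optional[float] = None,
) -> Iterator[None]:
    """Apply thresholds for the duration of a with-block, then restore them."""
    candidates = {TANIMOTO: tanimoto, DICE: dice}
    requested = {name: value for name, value in candidates.items() if value is not None}
    # Save the current value of every setting we are about to change
    previous = {name: session.show(name) for name in requested}
    try:
        for name, value in requested.items():
            session.set(name, value)
        yield
    finally:
        # Runs on normal exit and when the block raises
        for name, value in previous.items():
            session.set(name, value)


session = FakeGucSession()
with temporary_thresholds(session, tanimoto=0.1, dice=0.2):
    print(session.settings)  # {'rdkit.tanimoto_threshold': 0.1, 'rdkit.dice_threshold': 0.2}
print(session.settings)      # {'rdkit.tanimoto_threshold': 0.5, 'rdkit.dice_threshold': 0.5}
```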

Chemical Reactions

Store and query chemical reactions using RdkitReaction:

from sqlalchemy import Integer, String, select
from sqlalchemy.orm import Mapped, mapped_column

from molalchemy.rdkit.types import RdkitReaction
from molalchemy.rdkit.functions import rxn_has_smarts, reaction_numreactants

# Base is your declarative base; session is an open Session
class Reaction(Base):
    __tablename__ = 'reactions'
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    name: Mapped[str] = mapped_column(String(100))
    rxn: Mapped[str] = mapped_column(RdkitReaction())

# Insert with validation (invalid SMARTS raises InvalidReactionError)
session.add(Reaction(name="Amide formation", rxn="[C:1](=O)[OH].[N:2]>>[C:1](=O)[N:2]"))

# Reaction substructure search
results = session.execute(
    select(Reaction).where(rxn_has_smarts(Reaction.rxn, ">>[C:1][N:2]"))
).all()

# Number of reactants in each stored reaction
counts = session.execute(
    select(Reaction.name, reaction_numreactants(Reaction.rxn))
).all()
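Reaction SMILES and SMARTS follow the layout reactants>agents>products, where the agents part may be empty (hence the >> in the examples). The toy validator below is a stdlib-only sketch that checks only that layout and raises a local stand-in for InvalidReactionError. It is not molalchemy's validator; a real check would parse the string with RDKit, for example with rdkit.Chem.AllChem.ReactionFromSmarts.

```python
class InvalidReactionError(ValueError):
    """Local stand-in for molalchemy's exception of the same name."""


def check_reaction_layout(rxn: str) -> str:
    """Toy check that a reaction string has the reactants>agents>products layout.

    It does not look at atoms or bonds at all.
    """
    parts = rxn.split(">")
    if len(parts) != 3:
        raise InvalidReactionError(
            f"expected 2 '>' separators, found {len(parts) - 1}: {rxn!r}"
        )
    return rxn


print(check_reaction_layout("[C:1](=O)[OH].[N:2]>>[C:1](=O)[N:2]"))
try:
    check_reaction_layout("CC(=O)O.NC")  # no '>' separators at all
except InvalidReactionError as exc:
    print(exc)
```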

Using Chemical Functions

The chemical functions are available as individual imports from the functions modules. Under the hood they use SQLAlchemy's func to call the corresponding database functions, and they carry type hints, so IDEs can autocomplete and check them.

from molalchemy.bingo.functions import checkmolecule, getweight, gross, inchikey, smiles

# Calculate molecular properties using Bingo functions
# (Molecule is the Bingo-backed model from the Chemical Indexing example)
results = session.query(
    Molecule.name,
    getweight(Molecule.structure).label('molecular_weight'),
    gross(Molecule.structure).label('formula'),
    smiles(Molecule.structure).label('canonical_smiles')
).all()

# Find invalid structures: checkmolecule returns NULL for a valid molecule
# and an error message otherwise
invalid_molecules = session.query(Molecule).filter(
    checkmolecule(Molecule.structure).is_not(None)
).all()

# Format conversions
inchi_keys = session.query(
    Molecule.id,
    inchikey(Molecule.structure).label('inchikey')
).all()

For RDKit functions:

from molalchemy.rdkit.functions import mol_amw, mol_formula, mol_inchikey

# Calculate molecular properties using RDKit functions
# (assumes a model whose `structure` column is an RdkitMol, not a BingoMol)
results = session.query(
    Molecule.name,
    mol_amw(Molecule.structure).label('molecular_weight'),
    mol_formula(Molecule.structure).label('formula'),
    mol_inchikey(Molecule.structure).label('inchikey')
).all()

Alembic Database Migrations

molalchemy provides utilities for Alembic integration. For automatic import handling in migrations, its type rendering hook makes autogenerated migration scripts include the import statements that molalchemy types need. Register it in both migration modes of your env.py:

# ...
from molalchemy import alembic_helpers
# ...

def run_migrations_offline():
    # ...
    context.configure(
        # ...
        render_item=alembic_helpers.render_item,
    )
    # ...


def run_migrations_online():
    # ...
    context.configure(
        # ...
        render_item=alembic_helpers.render_item,
    )
    # ...
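Alembic calls a render_item(type_, obj, autogen_context) hook while autogenerating a migration: returning a string overrides the rendered text, returning False falls back to the default rendering, and entries added to autogen_context.imports are written at the top of the migration file. The sketch below shows how such a hook could render types from the molalchemy package. It is hypothetical and may differ from what alembic_helpers.render_item actually does; the RdkitMol class in it is a stand-in, not the real type.

```python
from types import SimpleNamespace


def render_item(type_, obj, autogen_context):
    """Hypothetical Alembic render_item hook for types defined in molalchemy."""
    if type_ == "type":
        module = type(obj).__module__
        if module.startswith("molalchemy."):
            # Make the migration file import the module that defines the type
            autogen_context.imports.add(f"import {module}")
            # Qualify the type's constructor-style repr with its module path
            return f"{module}.{obj!r}"
    # False tells Alembic to use its default rendering
    return False


# Stand-in for a molalchemy type, used only for this demonstration
class RdkitMol:
    def __repr__(self) -> str:
        return "RdkitMol()"


RdkitMol.__module__ = "molalchemy.rdkit.types"

# In Alembic, autogen_context is an AutogenContext; a namespace with an
# `imports` set is enough to exercise the hook here
ctx = SimpleNamespace(imports=set())
print(render_item("type", RdkitMol(), ctx))  # molalchemy.rdkit.types.RdkitMol()
print(ctx.imports)                           # {'import molalchemy.rdkit.types'}
```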

🧪 Development

Setting Up Development Environment

  1. Clone the repository:
     git clone https://github.com/asiomchen/molalchemy.git
     cd molalchemy
  2. Install dependencies:
     uv sync
  3. Activate the virtual environment:
     source .venv/bin/activate

Running Tests

# Run all tests with coverage
make test

# Or use uv directly
uv run pytest

# Run specific test module
uv run pytest tests/bingo/

# Run with coverage
uv run pytest --cov=src/molalchemy

Code Quality

This project uses modern Python development tools:

  • uv: For virtual environment and dependency management
  • Ruff: For linting and formatting
  • pytest: For testing

Building Function Bindings

The chemical function bindings are automatically generated from cartridge documentation:

# Update RDKit function bindings
make update-rdkit-func

# Update Bingo function bindings  
make update-bingo-func

# Update all function bindings
make update-func
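As a rough illustration of generating bindings, the sketch below turns a hypothetical list of function specifications (SQL name, argument names, docstring) into Python source that wraps each SQL function with SQLAlchemy's func. The spec format, template, and render_bindings name are assumptions for illustration, not the project's actual generator. The sketch only produces the text and syntax-checks it, so SQLAlchemy is not needed to run it.

```python
FUNCTION_SPECS = [
    # (SQL function name, argument names, docstring): hypothetical entries
    ("mol_amw", ["mol"], "Average molecular weight of a molecule."),
    ("mol_formula", ["mol"], "Molecular formula of a molecule."),
    ("tanimoto_sml", ["fp1", "fp2"], "Tanimoto similarity of two fingerprints."),
]

FUNCTION_TEMPLATE = '''def {name}({params}):
    """{doc}"""
    return func.{name}({params})
'''


def render_bindings(specs):
    """Render a Python module that wraps each SQL function with SQLAlchemy's func."""
    chunks = ["from sqlalchemy import func\n"]
    for name, args, doc in specs:
        chunks.append(FUNCTION_TEMPLATE.format(name=name, params=", ".join(args), doc=doc))
    return "\n\n".join(chunks)


source = render_bindings(FUNCTION_SPECS)
# compile() only checks the syntax; nothing is imported or executed
compile(source, "<generated>", "exec")
print(source)
```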

📚 Documentation

๐Ÿค Contributing

We welcome contributions! molalchemy offers many opportunities for developers interested in chemical informatics:

  • 🔰 New to the project? Check out good first issues
  • 🔬 Chemical expertise? Help complete RDKit integration or add ChemAxon support
  • 🐳 DevOps skills? Optimize our Docker containers and CI/CD pipeline
  • 📚 Love documentation? Create tutorials and improve API docs

Read our Contributing Guide for detailed instructions on getting started.

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

Core Technologies

  • RDKit - Open-source cheminformatics toolkit
  • Bingo - Chemical database cartridge by EPAM
  • SQLAlchemy - Python SQL toolkit and ORM

Inspiration and Similar Projects

  • GeoAlchemy2 - Spatial extension for SQLAlchemy, served as architectural inspiration for cartridge integration patterns
  • ord-schema - Open Reaction Database schema; one of the few projects using custom chemical types with SQLAlchemy
  • Riccardo Vianello - His work on django-rdkit and razi provided valuable insights for chemical database integration (discovered after starting this project)

📧 Contact


Download files

Source Distribution

molalchemy-0.0.6.tar.gz (206.1 kB)

Uploaded Source

Built Distribution

molalchemy-0.0.6-py3-none-any.whl (50.6 kB)

Uploaded Python 3

File details

Details for the file molalchemy-0.0.6.tar.gz.

File metadata

  • Download URL: molalchemy-0.0.6.tar.gz
  • Upload date:
  • Size: 206.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for molalchemy-0.0.6.tar.gz
Algorithm Hash digest
SHA256 63e59bcfd3c565397edcde070da2141d7da2fd66dba0a5fa78a57149a335ce0a
MD5 62a9ff3b65b1d292ab9ede7c01b9aa0e
BLAKE2b-256 4afcd9e4981d9be756d0bce9d8f7ed1b002a3f50f5e8210c75536e660ded030f

Provenance

The following attestation bundles were made for molalchemy-0.0.6.tar.gz:

Publisher: ci.yaml on asiomchen/molalchemy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file molalchemy-0.0.6-py3-none-any.whl.

File metadata

  • Download URL: molalchemy-0.0.6-py3-none-any.whl
  • Upload date:
  • Size: 50.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for molalchemy-0.0.6-py3-none-any.whl
Algorithm Hash digest
SHA256 58bb77d9ebcc922f843b5fe13b22ad398271a83a8a7174da424f9026e52735e8
MD5 9dfee63cc5b67f74f59099037eff0428
BLAKE2b-256 99ef6fd93406c25311ccbd1a15cb5330ac5ee04d90b17ef48f6b0b5d8ff6d295

Provenance

The following attestation bundles were made for molalchemy-0.0.6-py3-none-any.whl:

Publisher: ci.yaml on asiomchen/molalchemy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
