
Transparent, flexible and reproducible ChEMBL data curation, aggregation and analysis.



The ChEMBL data curator that flags issues instead of silently dropping them.


Inspired by the Portuguese word "capricho". Doing something with capricho means doing it meticulously, with care and attention to detail.

CAPRICHO (ChEMBL Aggregation Package with Robust Inspection and Curation Handling Options) is a Python package that streamlines fetching, curating, and aggregating ChEMBL data into a machine-learning-ready format for drug discovery, in a flexible and reproducible manner. Instead of making opinionated decisions about the source data, CAPRICHO curates it with quality-control filters that the user selects. Its guiding principle is to never silently drop data: entries that don't meet the criteria are flagged, so the user can analyze how each curation step affects the comparability of assay readouts for the same compound.
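The flag-instead-of-drop principle can be illustrated with a minimal, self-contained sketch. It is not CAPRICHO's API: the `Activity` record, the two filter functions, and the flag labels are hypothetical. Each filter only annotates the records it rejects, so the full dataset survives and the user decides afterwards what to exclude.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Activity:
    # Hypothetical, simplified bioactivity record.
    molecule_id: str
    assay_id: str
    standard_type: str
    pchembl_value: Optional[float]
    flags: List[str] = field(default_factory=list)


# A filter inspects one record and returns a flag label, or None if the record passes.
Filter = Callable[[Activity], Optional[str]]


def missing_value(rec: Activity) -> Optional[str]:
    return "missing_pchembl" if rec.pchembl_value is None else None


def unexpected_type(rec: Activity) -> Optional[str]:
    return "unexpected_type" if rec.standard_type not in {"IC50", "Ki"} else None


def flag_records(records: List[Activity], filters: List[Filter]) -> List[Activity]:
    """Append the label of every failed filter to each record (in place); nothing is removed."""
    for rec in records:
        for check in filters:
            label = check(rec)
            if label is not None:
                rec.flags.append(label)
    return records


records = [
    Activity("CHEMBL1", "A1", "IC50", 7.2),
    Activity("CHEMBL2", "A1", "EC50", 6.1),
    Activity("CHEMBL3", "A2", "Ki", None),
]
flag_records(records, [missing_value, unexpected_type])

# Both subsets remain available for inspection.
clean = [r for r in records if not r.flags]
flagged = [r for r in records if r.flags]
```

Keeping the flags as a list per record (rather than a single boolean) preserves *why* an entry was set aside, which is what makes each curation step auditable.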

🎯 Goals

The development of CAPRICHO is guided by two core principles:

  • Transparency Above All: Data curation should never be a black box. Removed data points should be kept for the user to scrutinize, and the original data should always be preserved to ensure data integrity.
  • Flexibility by Design: Every modeling project is unique. Aggregation should be stratified by ChEMBL metadata columns, so that repeated measurements of a compound are aggregated only within the scope you define as comparable.
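To make the stratification idea concrete, here is a small standard-library sketch: repeated measurements of a molecule are grouped by its ID plus whichever metadata columns the user declares comparable, and summarized only within each group. The column names (`molecule_chembl_id`, `standard_type`, `assay_chembl_id`, `pchembl_value`) mirror ChEMBL fields, but the `aggregate` function and its output layout are hypothetical, not CAPRICHO's API.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, object]


def aggregate(
    rows: Iterable[Row], strata: Sequence[str], value_col: str = "pchembl_value"
) -> Dict[Tuple, Dict[str, float]]:
    """Summarize repeated measurements per molecule, only within identical strata."""
    groups: Dict[Tuple, List[float]] = defaultdict(list)
    for row in rows:
        # The grouping key is the molecule plus every user-chosen metadata column.
        key = (row["molecule_chembl_id"],) + tuple(row[c] for c in strata)
        groups[key].append(float(row[value_col]))
    return {
        key: {"mean": mean(vals), "median": median(vals), "n": len(vals)}
        for key, vals in groups.items()
    }


rows = [
    {"molecule_chembl_id": "CHEMBL1", "standard_type": "IC50", "assay_chembl_id": "A1", "pchembl_value": 7.0},
    {"molecule_chembl_id": "CHEMBL1", "standard_type": "IC50", "assay_chembl_id": "A2", "pchembl_value": 8.0},
    {"molecule_chembl_id": "CHEMBL1", "standard_type": "Ki", "assay_chembl_id": "A3", "pchembl_value": 9.0},
]

by_type = aggregate(rows, strata=["standard_type"])
by_assay = aggregate(rows, strata=["standard_type", "assay_chembl_id"])
```

Stratifying by `standard_type` alone merges the two IC50 readouts (mean 7.5, n = 2) but keeps the Ki value separate; adding `assay_chembl_id` keeps all three measurements apart.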

✨ Features

  • Data retrieval by any ChEMBL identifier (molecule IDs, target IDs, assay IDs, or document IDs)
  • ADMET data curation support with unit conversion and non-pChEMBL aggregation
  • Quality control through data flagging — never silently drops data
  • Customizable filtering options, including the maximal curation settings proposed by Landrum & Riniker (2024)
  • Configurable data aggregation options
  • Binary classification support with censored data handling
  • Save a fetching and processing recipe for reproducibility
  • Command-line interface for easy use
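Unit conversion toward pChEMBL-style values follows a simple convention: a potency such as an IC50 is expressed in molar units and reported as its negative base-10 logarithm, so 100 nM (1e-7 M) becomes about 7. A minimal sketch of that arithmetic follows, restricted to concentration units; the `TO_MOLAR` table and `to_p_value` function are illustrative assumptions, and CAPRICHO's own unit handling (which also targets ADMET endpoints) may differ.

```python
import math

# Molar scale factors for common concentration units (illustrative subset).
TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "µM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def to_p_value(value: float, unit: str) -> float:
    """Convert a positive concentration (e.g. an IC50) to -log10 of its molar value."""
    if value <= 0:
        raise ValueError("concentration must be positive")
    try:
        factor = TO_MOLAR[unit]
    except KeyError:
        raise ValueError(f"unsupported unit: {unit!r}") from None
    return -math.log10(value * factor)


p_ic50 = to_p_value(100, "nM")  # approximately 7.0, up to floating-point rounding
```

Rejecting non-positive values and unknown units with an explicit error, rather than returning NaN, fits the document's preference for surfacing problems instead of hiding them.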

⚙️ Installation

The most recent release can be installed from PyPI with uv:

uv pip install capricho

or with pip:

python -m pip install capricho

Alternatively, install directly from the GitHub repository with uv using the command:

uv pip install git+https://github.com/David-Araripe/Capricho.git

or with pip:

python -m pip install git+https://github.com/David-Araripe/Capricho.git

🚀 Quick Start

Basic Usage

# Download the ChEMBL database
capricho download

# Get bioactivity data for EGFR
capricho get --target-ids CHEMBL203 --output-path egfr_data.csv

# Get high-confidence data for multiple targets
capricho get --target-ids CHEMBL203,CHEMBL204 --confidence-scores 8,9 --output-path results.csv
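When the same query has to be repeated for many target lists, it can be convenient to drive the CLI from a script. The sketch below only assembles the `capricho get` options shown above (`--target-ids`, `--confidence-scores`, `--output-path`), joining multiple values with commas as in the multi-target example. The helpers `build_get_command` and `run_get` are hypothetical and not part of CAPRICHO, and actually running the command assumes the `capricho` executable is installed and on your `PATH`.

```python
import subprocess
from typing import List, Optional, Sequence


def build_get_command(
    target_ids: Sequence[str],
    output_path: str,
    confidence_scores: Optional[Sequence[int]] = None,
) -> List[str]:
    """Assemble a `capricho get` call (hypothetical helper, not part of CAPRICHO)."""
    cmd = ["capricho", "get", "--target-ids", ",".join(target_ids)]
    if confidence_scores:
        # Multiple values are passed as a single comma-separated argument.
        cmd += ["--confidence-scores", ",".join(str(s) for s in confidence_scores)]
    cmd += ["--output-path", output_path]
    return cmd


def run_get(cmd: List[str]) -> None:
    """Execute the command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True)


cmd = build_get_command(["CHEMBL203", "CHEMBL204"], "results.csv", confidence_scores=[8, 9])
# run_get(cmd)  # uncomment to execute; requires the capricho CLI to be installed
```

Passing the arguments as a list (not a single shell string) avoids quoting issues and shell injection when target IDs come from a file.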

Tab Completion

Our CLI supports tab completion for commands and options. To enable it, run the following command in your terminal:

capricho --install-completion

Key Features

  • Five main commands: download, explore, get, prepare, binarize
  • Flexible filtering: By confidence, assay type, bioactivity type
  • Transparent processing: All filtering steps are logged and flagged
  • Reproducible workflows: Automatic recipe generation
  • Multiple backends: Local SQL or web API
  • Binary classification support: Convert continuous activity values to binary labels
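Censored values need care when binarizing. A record reported as IC50 > 10 µM only says that its p-value is below 5; at an activity threshold of 6.5 it is safely inactive. A record reported as IC50 < 1 µM (p-value above 6), however, cannot be labeled at the same threshold, because its true value may fall on either side. The sketch below works on the p-scale, where the relation is flipped relative to concentration. The threshold of 6.5 and the `binarize` function are assumptions for illustration, not CAPRICHO's defaults or API.

```python
from typing import Optional


def binarize(p_value: float, relation: str, threshold: float = 6.5) -> Optional[int]:
    """Return 1 (active), 0 (inactive) or None (undetermined) for a p-scale value.

    `relation` qualifies the value on the p-scale: '=' exact, '>' the true
    value is higher than `p_value`, '<' the true value is lower than `p_value`.
    Activity is defined as a true value >= `threshold` (hypothetical default).
    """
    if relation == "=":
        return int(p_value >= threshold)
    if relation == ">":
        # A lower bound proves activity only if it already reaches the threshold.
        return 1 if p_value >= threshold else None
    if relation == "<":
        # An upper bound proves inactivity only if it is at or below the threshold.
        return 0 if p_value <= threshold else None
    raise ValueError(f"unknown relation: {relation!r}")
```

Returning `None` instead of guessing a label keeps undetermined records visible, in line with flagging rather than silently dropping data.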

📖 Documentation

For comprehensive documentation including detailed CLI options, advanced usage, tutorials, and API reference, visit our full documentation.

Quick Links:

License

This project is licensed under the MIT License - see the LICENSE file for details.



Download files


Source Distribution

capricho-1.0.1.tar.gz (1.0 MB)

Built Distribution


capricho-1.0.1-py3-none-any.whl (107.5 kB)

File details

Details for the file capricho-1.0.1.tar.gz.

File metadata

  • Download URL: capricho-1.0.1.tar.gz
  • Upload date:
  • Size: 1.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.18

File hashes

Hashes for capricho-1.0.1.tar.gz
Algorithm Hash digest
SHA256 8c81662778c5ad58e3a430d3d305d9e2a367051fce77bd58ac0f24e55e641ee4
MD5 a7fd411c87caba1f3f1e1b100cd9721e
BLAKE2b-256 42a1406add2f0d80d7516cc8806ed15f27119c97a6d8a44a0b7805340205966c


File details

Details for the file capricho-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: capricho-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 107.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.18

File hashes

Hashes for capricho-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 c36d6f1626862a2b907d606b7ba285215e71dcf33ea3c2e7e2bcce0a98878b8f
MD5 421f8573ef787965b56172031d99e8b4
BLAKE2b-256 e2aa0852e3dab61fc75a5d215911facc116bad6b968198972417c0a1d05982bb
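A downloaded distribution file can be checked against the SHA256 digests listed above with a short standard-library sketch. The digests are copied from the tables above; the helper names `sha256_of` and `verify` are illustrative. Files are read in chunks so large archives do not have to fit in memory.

```python
import hashlib
from pathlib import Path

# SHA256 digests published above for the capricho 1.0.1 distribution files.
EXPECTED_SHA256 = {
    "capricho-1.0.1.tar.gz": "8c81662778c5ad58e3a430d3d305d9e2a367051fce77bd58ac0f24e55e641ee4",
    "capricho-1.0.1-py3-none-any.whl": "c36d6f1626862a2b907d606b7ba285215e71dcf33ea3c2e7e2bcce0a98878b8f",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file incrementally, one chunk at a time."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Compare a file's digest with the published one, looked up by file name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published hash for {path.name}")
    return sha256_of(path) == expected
```

For example, `verify(Path("capricho-1.0.1.tar.gz"))` returns `True` only if the local file matches the published digest byte for byte.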

