
Transparent, flexible and reproducible ChEMBL data curation, aggregation and analysis.

The ChEMBL data curator that flags issues instead of silently dropping them.

Inspired by the Portuguese word "capricho". Doing something with capricho means doing it meticulously, with care and attention to detail.

CAPRICHO (ChEMBL Aggregation Package with Robust Inspection and Curation Handling Options) is a Python package that streamlines fetching, curating, and aggregating ChEMBL data into a machine learning-ready format for drug discovery in a flexible and reproducible manner. Instead of making opinionated decisions on the source data, CAPRICHO curates it with quality control filters chosen by the user. Its guiding principle is to never silently drop data. Entries that don't meet the criteria are flagged rather than removed, allowing the user to analyze how each curation step affects the comparability of assay readouts for the same compound.

🎯 Goals

The development of CAPRICHO is guided by two core principles:

  • Transparency Above All: Data curation should never be a black box. Removed data points should be saved so the user can scrutinize them, and the original data should always be preserved to ensure data integrity.
  • Flexibility by Design: Every modeling project is unique. The tool must support flexible data collection and aggregation, allowing any ChEMBL metadata column to be taken into account when aggregating bioactivity values for the same compound.
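
The "flag, don't drop" idea behind the transparency principle can be illustrated with a minimal sketch. This is not CAPRICHO's implementation: the `Record` class, its field names, and the `flag_low_confidence` helper are hypothetical and exist only to show the pattern of annotating rows instead of deleting them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Record:
    """One bioactivity measurement (hypothetical field names)."""
    compound_id: str
    pchembl: float
    confidence_score: int
    flags: List[str] = field(default_factory=list)


def flag_low_confidence(records, min_score=8):
    """Annotate records below min_score instead of removing them."""
    for rec in records:
        if rec.confidence_score < min_score:
            rec.flags.append(f"confidence_score<{min_score}")
    return records


records = [
    Record("CPD-1", 7.2, 9),
    Record("CPD-1", 6.1, 4),
    Record("CPD-2", 5.5, 8),
]
flagged = flag_low_confidence(records)

# Every input row is still present; the caller decides what to exclude.
kept = [r for r in flagged if not r.flags]
print(len(flagged), len(kept))
```

Because the flagged row stays in the dataset together with the reason it was flagged, the effect of each filter can be inspected or reversed later.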

✨ Features:

  • Data retrieval by any ChEMBL identifier (molecule IDs, target IDs, assay IDs, or document IDs)
  • ADMET data curation support with unit conversion and non-pChEMBL aggregation
  • Quality control through data flagging — never silently drops data
  • Customizable filtering options, including the "maximal curation" standards introduced by Landrum & Riniker (2024)
  • Configurable data aggregation options
  • Binary classification support with censored data handling
  • Save a fetching and processing recipe for reproducibility
  • Command-line interface for easy use
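
To make the unit conversion and aggregation features more concrete, here is a small standard-library sketch. It assumes concentrations reported in nM, converts them to the negative log10 molar scale (the scale pChEMBL values use), and averages repeated measurements per compound. The function name `nm_to_p`, the compound IDs, and the output layout are illustrative assumptions, not CAPRICHO's API or output format.

```python
import math
from collections import defaultdict
from statistics import mean


def nm_to_p(value_nm):
    """Convert a concentration in nM to -log10 of the molar concentration."""
    # 1 nM = 1e-9 M, so -log10(value_nm * 1e-9) = 9 - log10(value_nm)
    return 9.0 - math.log10(value_nm)


# (compound_id, concentration in nM); made-up values for illustration
measurements = [
    ("CPD-1", 100.0),
    ("CPD-1", 1000.0),
    ("CPD-2", 10.0),
]

grouped = defaultdict(list)
for compound_id, value_nm in measurements:
    grouped[compound_id].append(nm_to_p(value_nm))

# Keep the individual values next to the aggregate so nothing is hidden.
aggregated = {
    cid: {"mean": mean(vals), "n": len(vals), "values": vals}
    for cid, vals in grouped.items()
}
print(aggregated)
```

Storing the individual values and the count alongside the mean keeps the aggregation auditable, in the same spirit as flagging instead of dropping.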

⚙️ Installation

The most recent release can be installed from PyPI with uv:

uv pip install capricho

or with pip:

python -m pip install capricho

Alternatively, install directly from the GitHub repository with uv using the command:

uv pip install git+https://github.com/David-Araripe/Capricho.git

or with pip:

python -m pip install git+https://github.com/David-Araripe/Capricho.git

🚀 Quick Start

Basic Usage

# Download ChEMBL database
capricho download

# Get bioactivity data for EGFR
capricho get --target-ids CHEMBL203 --output-path egfr_data.csv

# Get high-confidence data for multiple targets
capricho get --target-ids CHEMBL203,CHEMBL204 --confidence-scores 8,9 --output-path results.csv
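
When the CLI is driven from a Python script, for example to loop over several target sets, the command line can be assembled programmatically. The sketch below only uses the flags shown above (`--target-ids`, `--confidence-scores`, `--output-path`); the helper `build_get_command` is hypothetical and not part of CAPRICHO. It prints the command rather than running it.

```python
import shlex


def build_get_command(target_ids, output_path, confidence_scores=None):
    """Build the argument list for a `capricho get` call (hypothetical helper)."""
    cmd = ["capricho", "get", "--target-ids", ",".join(target_ids)]
    if confidence_scores:
        scores = ",".join(str(s) for s in confidence_scores)
        cmd += ["--confidence-scores", scores]
    cmd += ["--output-path", output_path]
    return cmd


cmd = build_get_command(["CHEMBL203", "CHEMBL204"], "results.csv", [8, 9])
print(shlex.join(cmd))

# To execute it (requires capricho to be installed and on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues if an output path contains spaces.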

Tab Completion

Our CLI supports tab completion for commands and options. To enable it, run the following command in your terminal:

capricho --install-completion

Key Features

  • Five main commands: download, explore, get, prepare, binarize
  • Flexible filtering: By confidence, assay type, bioactivity type
  • Transparent processing: All filtering steps are logged and flagged
  • Reproducible workflows: Automatic recipe generation
  • Multiple backends: Local SQL or web API
  • Binary classification support: Convert continuous activity values to binary labels
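
Binarizing censored measurements needs some care, because a value reported as "less than" or "greater than" a bound only yields a definite label when the bound lies on the right side of the activity threshold. The following sketch shows one way to reason about it. It is an assumption-laden illustration, not the logic of the `binarize` command: the function signature, the threshold of 6.0, and the convention that the relation refers to the value on the -log10 molar scale (higher means more potent) are all chosen for this example.

```python
def binarize(p_value, relation="=", threshold=6.0):
    """Return 1 (active), 0 (inactive), or None if the label is undetermined.

    relation describes the true value relative to p_value on the
    -log10 molar scale: "=" exact, "<" true value is below p_value,
    ">" true value is above p_value.
    """
    if relation == "=":
        return 1 if p_value >= threshold else 0
    if relation == "<":
        # True value is below p_value: inactive only if the bound
        # itself does not exceed the threshold.
        return 0 if p_value <= threshold else None
    if relation == ">":
        # True value is above p_value: active only if the bound
        # itself reaches the threshold.
        return 1 if p_value >= threshold else None
    raise ValueError(f"unsupported relation: {relation!r}")


examples = [(7.0, "="), (5.0, "<"), (7.0, "<"), (6.5, ">"), (5.0, ">")]
labels = [binarize(p, rel) for p, rel in examples]
print(labels)
```

Returning `None` for the ambiguous cases, rather than guessing a label, lets those rows be flagged and reviewed instead of silently mislabeled.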

📖 Documentation

For comprehensive documentation including detailed CLI options, advanced usage, tutorials, and API reference, visit our full documentation.

License

This project is licensed under the MIT License - see the LICENSE file for details.
