Transparent, flexible and reproducible ChEMBL data curation, aggregation and analysis.
Inspired by the Portuguese word "capricho" 🔊. Doing something with capricho means doing it meticulously, with care and attention to detail.
CAPRICHO (ChEMBL Aggregation Package with Robust Inspection and Curation Handling Options) is a Python package that streamlines fetching, curating, and aggregating ChEMBL data into a machine learning-ready format for drug discovery in a flexible and reproducible manner. Instead of making opinionated decisions about the source data, CAPRICHO curates it with quality control filters chosen by the user. Its guiding principle is to never silently drop data. Entries that don't meet the criteria are marked, allowing the user to analyze how each curation step affects the comparability of assay readouts for the same compound.
🎯 Goals
The development of CAPRICHO is guided by two core principles:
- Transparency Above All: Data curation should never be a black box. Removed data points should be saved so the user can scrutinize them, and the original data should always be preserved to ensure data integrity.
- Flexibility by Design: Every modeling project is unique. The tool must support flexible data collection and aggregation, allowing any ChEMBL metadata column to be taken into account when aggregating same-compound bioactivity values.
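The flag-instead-of-drop idea behind the first principle can be illustrated with a minimal sketch. This is not CAPRICHO's implementation or API: the record fields (`molecule_id`, `pchembl`, `confidence_score`), the `flags` field, and the function `flag_low_confidence` are hypothetical names chosen only for illustration.

```python
from copy import deepcopy


def flag_low_confidence(records, min_score=8):
    """Return copies of the records, marking (not removing) low-confidence ones.

    All field names are hypothetical and do not reflect CAPRICHO's schema.
    """
    flagged = deepcopy(records)  # the original records stay untouched
    for row in flagged:
        row.setdefault("flags", [])
        if row["confidence_score"] < min_score:
            row["flags"].append(f"confidence_below_{min_score}")
    return flagged


records = [
    {"molecule_id": "mol_A", "pchembl": 7.2, "confidence_score": 9},
    {"molecule_id": "mol_B", "pchembl": 6.1, "confidence_score": 5},
]

curated = flag_low_confidence(records)

# Every row survives curation; the user decides later what to do with flagged rows.
passing = [row for row in curated if not row["flags"]]
flagged_rows = [row for row in curated if row["flags"]]
print(len(curated), len(passing), len(flagged_rows))
```

Because the flagged rows remain in the table, the effect of each filter can be inspected afterwards by comparing summaries computed with and without them.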
✨ Features:
- Data retrieval by any ChEMBL identifier (molecule IDs, target IDs, assay IDs, or document IDs)
- ADMET data curation support with unit conversion and non-pChEMBL aggregation
- Quality control through data flagging: data is never silently dropped
- Customizable filtering options with max curation standards introduced by Landrum & Riniker (2024)
- Configurable data aggregation options
- Binary classification support with censored data handling
- Save a fetching and processing recipe for reproducibility
- Command-line interface for easy use
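To make the aggregation feature more concrete, here is a small sketch of what aggregating repeated measurements for the same compound can look like. It uses only the Python standard library and is an assumption about the general technique, not a description of how CAPRICHO aggregates data; `aggregate_by_compound` and the compound labels are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median, stdev


def aggregate_by_compound(measurements):
    """Group (compound, value) pairs and summarise each group.

    The individual values are kept next to the summary statistics so that
    the spread between assays stays visible after aggregation.
    """
    groups = defaultdict(list)
    for compound, value in measurements:
        groups[compound].append(value)

    summary = {}
    for compound, values in groups.items():
        summary[compound] = {
            "values": values,
            "n": len(values),
            "mean": mean(values),
            "median": median(values),
            # The sample standard deviation needs at least two values.
            "std": stdev(values) if len(values) > 1 else 0.0,
        }
    return summary


measurements = [("mol_A", 7.0), ("mol_A", 7.4), ("mol_A", 6.9), ("mol_B", 5.5)]
summary = aggregate_by_compound(measurements)
for compound, stats in summary.items():
    print(compound, stats["n"], stats["median"])
```

Keeping the raw values alongside the aggregate makes it possible to check whether a mean or median hides a large disagreement between assays.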
⚙️ Installation
The most recent release can be installed from PyPI with uv:
uv pip install capricho
or with pip:
python -m pip install capricho
Alternatively, install directly from the GitHub repository with uv using the command:
uv pip install git+https://github.com/David-Araripe/Capricho.git
or with pip:
python -m pip install git+https://github.com/David-Araripe/Capricho.git
🚀 Quick Start
Basic Usage
# Download ChEMBL database
capricho download
# Get bioactivity data for EGFR
capricho get --target-ids CHEMBL203 --output-path egfr_data.csv
# Get high-confidence data for multiple targets
capricho get --target-ids CHEMBL203,CHEMBL204 --confidence-scores 8,9 --output-path results.csv
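When the same query needs to be repeated for several target sets, the command line can be assembled from Python. The sketch below only builds the argument list, using the `get` command and the `--target-ids`, `--confidence-scores`, and `--output-path` options shown above; the helper `build_get_command` is a hypothetical name and not part of CAPRICHO.

```python
import shlex


def build_get_command(target_ids, output_path, confidence_scores=None):
    """Assemble a `capricho get` invocation as an argument list.

    Only options that appear in the Basic Usage examples are used here.
    """
    cmd = ["capricho", "get", "--target-ids", ",".join(target_ids)]
    if confidence_scores:
        scores = ",".join(str(score) for score in confidence_scores)
        cmd += ["--confidence-scores", scores]
    cmd += ["--output-path", output_path]
    return cmd


cmd = build_get_command(
    ["CHEMBL203", "CHEMBL204"], "results.csv", confidence_scores=[8, 9]
)
print(shlex.join(cmd))
# prints: capricho get --target-ids CHEMBL203,CHEMBL204 --confidence-scores 8,9 --output-path results.csv
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would execute it, provided CAPRICHO is installed in the active environment.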
Tab Completion
Our CLI supports tab completion for commands and options. To enable it, run the following command in your terminal:
capricho --install-completion
Key Features
- Five main commands: download, explore, get, prepare, binarize
- Flexible filtering: by confidence, assay type, bioactivity type
- Transparent processing: All filtering steps are logged and flagged
- Reproducible workflows: Automatic recipe generation
- Multiple backends: Local SQL or web API
- Binary classification support: Convert continuous activity values to binary labels
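Binarizing continuous activity values becomes less obvious once censored measurements (values reported as "greater than" or "less than" a bound) are involved. The following is a sketch of one possible rule set, assuming a potency scale where higher means more active and a threshold of 6.0; the function `binarize`, the threshold, and the handling rules are illustrative assumptions, not CAPRICHO's documented behavior.

```python
def binarize(value, relation="=", threshold=6.0):
    """Return 1 (active), 0 (inactive), or None (cannot be decided).

    `value` is on a scale where higher means more potent. `relation` states
    how the true value relates to `value`: "=", ">" or "<".
    """
    if relation == "=":
        return 1 if value >= threshold else 0
    if relation == ">":
        # The true value lies above the bound, so the label is certain only
        # when the bound itself already reaches the threshold.
        return 1 if value >= threshold else None
    if relation == "<":
        # The true value lies below the bound, so the label is certain only
        # when the bound itself does not exceed the threshold.
        return 0 if value <= threshold else None
    raise ValueError(f"unsupported relation: {relation!r}")


examples = [(7.0, "="), (5.0, "="), (5.0, "<"), (7.0, "<"), (6.5, ">"), (4.0, ">")]
labels = [binarize(value, relation) for value, relation in examples]
print(labels)
# prints: [1, 0, 0, None, 1, None]
```

Returning `None` for undecidable cases, rather than guessing a label, keeps the ambiguous measurements visible so they can be flagged instead of silently dropped.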
📖 Documentation
For comprehensive documentation including detailed CLI options, advanced usage, tutorials, and API reference, visit our full documentation.
License
This project is licensed under the MIT License - see the LICENSE file for details.