Thompson Sampling-Assisted Chemical Targeting and Iterative Compound Selection for Drug Discovery
TACTICS: Thompson Sampling-Assisted Chemical Targeting and Iterative Compound Selection for Drug Discovery
A comprehensive library for Thompson Sampling-based optimization of chemical combinatorial libraries, featuring a unified architecture with flexible strategy selection, modern Pydantic configuration, and preset configurations for out-of-the-box usage.
Quick Start with Interactive Tutorials
TACTICS includes interactive marimo notebooks for learning and exploration. For full documentation, see the TACTICS Documentation.
Installation
pip install "chem-tactics[tutorials]" # Includes marimo; quotes keep shells such as zsh from expanding the brackets
Running Tutorials
As an interactive app (recommended for exploration):
marimo run tutorials/thompson_sampling_tutorial.py
In edit mode (for learning/modification):
marimo edit tutorials/thompson_sampling_tutorial.py
Available Tutorials
| Tutorial | Description |
|---|---|
| library_enumeration_tutorial.py | SynthesisPipeline and enumeration |
| thompson_sampling_tutorial.py | Selection strategies comparison |
| reaction_config_builder.py | ReactionConfig builder |
| library_component_comparison.py | Library component analysis |
| legacy_vs_current_comparison.py | Legacy vs current benchmark |
Note: Tutorials default to the bundled Thrombin dataset. Select "Local Data" mode to use your own files.
Key Features
- Unified Thompson Sampling Framework: Single ThompsonSampler with pluggable selection strategies
- Multiple Selection Strategies:
  - Greedy (pure exploitation)
  - Roulette Wheel (adaptive thermal cycling)
  - UCB (Upper Confidence Bound)
  - Epsilon-Greedy (balanced exploration/exploitation)
  - Bayes-UCB (Bayesian upper confidence bound)
  - Boltzmann (temperature-based selection)
- Warmup Strategies: Balanced (recommended), Standard, Enhanced
- Preset Configurations: 5 ready-to-use presets for common use cases
- Modern Pydantic Configuration: Type-safe configuration with full validation
- Parallel Processing: Batch mode with multiprocessing for expensive evaluators
- Multiple Evaluators: Lookup, Database, ROCS, Fred, ML classifiers, and more
- Synthesis Pipeline: SynthesisPipeline architecture for single-step, alternative SMARTS, and multi-step reactions
- SMARTS Toolkit: ReactionDef with built-in validation, visualization, and protecting group support
- Library Enumeration: Efficient generation of combinatorial reaction products with write_enumerated_library()
- Library Analysis: Comprehensive analysis and visualization tools
- Polars DataFrames: Fast, efficient data handling throughout
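For readers new to the method, the core Thompson Sampling loop behind these features can be sketched in a few lines of standard-library Python. This is a generic toy for a single reaction component, not the TACTICS implementation: the function names, the Gaussian belief model, the prior, and the noise level are all assumptions made for this example, and the "evaluator" is simulated noise around made-up reagent effects.

```python
import random

def posterior(observations, prior_mean=0.0, prior_std=1.0, noise_std=1.0):
    """Gaussian belief (mean, std) about a reagent's typical score, given scores seen so far."""
    precision = 1.0 / prior_std**2 + len(observations) / noise_std**2
    mean = (prior_mean / prior_std**2 + sum(observations) / noise_std**2) / precision
    return mean, precision**-0.5

def thompson_pick(beliefs, rng, maximize=True):
    """Draw one sample from each reagent's belief and return the index of the best draw."""
    draws = [rng.gauss(mean, std) for mean, std in beliefs]
    best = max if maximize else min
    return best(range(len(draws)), key=draws.__getitem__)

def run_toy_search(true_effects, n_cycles=200, seed=0):
    """Toy loop: pick a reagent, score it, update its belief. Returns pick counts."""
    rng = random.Random(seed)
    observations = [[] for _ in true_effects]
    for _ in range(n_cycles):
        beliefs = [posterior(obs) for obs in observations]
        choice = thompson_pick(beliefs, rng)
        # Stand-in for an expensive evaluator: hidden effect plus unit noise.
        observations[choice].append(rng.gauss(true_effects[choice], 1.0))
    return [len(obs) for obs in observations]

# Made-up reagent effects; the last reagent is the best one.
counts = run_toy_search([0.0, 0.5, 3.0])
print(counts)
```

Because each pick is a random draw from the current beliefs, poorly characterized reagents still get tried occasionally, while sampling tends to concentrate on reagents whose scores look best.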
Package Structure
TACTICS/
├── thompson_sampling/
│   ├── config.py                 # ThompsonSamplingConfig (Pydantic v2)
│   ├── presets.py                # Preset configurations
│   ├── factories.py              # Factory functions for component creation
│   ├── core/                     # Core unified sampler
│   │   ├── sampler.py            # ThompsonSampler (unified)
│   │   ├── evaluators.py         # All evaluator classes
│   │   └── evaluator_config.py   # Evaluator Pydantic configs
│   ├── strategies/               # Selection strategies
│   │   ├── greedy.py
│   │   ├── roulette_wheel.py
│   │   ├── ucb.py
│   │   ├── epsilon_greedy.py
│   │   ├── bayes_ucb.py
│   │   └── config.py             # Strategy Pydantic configs
│   ├── warmup/                   # Warmup strategies
│   │   └── config.py             # Warmup Pydantic configs (Balanced, Standard, Enhanced)
│   └── baseline.py               # Random baseline sampling
├── library_enumeration/          # Library generation tools
│   ├── synthesis_pipeline.py     # SynthesisPipeline - main entry point
│   ├── enumeration_utils.py      # EnumerationResult, EnumerationError
│   ├── file_writer.py            # write_enumerated_library()
│   ├── generate_products.py      # Product generation utilities
│   └── smarts_toolkit/           # SMARTS validation and configuration
│       ├── config.py             # ReactionDef, ReactionConfig, StepInput, DeprotectionSpec
│       ├── _validator.py         # ValidationResult, internal validation
│       └── constants.py          # Protecting groups, salt fragments
└── library_analysis/             # Analysis and visualization
Repository Structure
TACTICS/
├── src/TACTICS/                  # Core package (pip installable)
│   ├── thompson_sampling/        # Thompson Sampling algorithms
│   ├── library_enumeration/      # Library generation tools
│   ├── library_analysis/         # Analysis and visualization
│   └── data/                     # Bundled tutorial datasets
│       └── thrombin/             # Thrombin inhibitor dataset
│
├── tutorials/                    # Interactive marimo tutorials
├── tests/                        # Unit and integration tests
└── docs/                         # Sphinx documentation
Quick Start
Simple Out-of-the-Box Usage with Presets (Recommended)
The easiest way to get started is using presets with SynthesisPipeline:
from TACTICS.library_enumeration import SynthesisPipeline, ReactionConfig, ReactionDef
from TACTICS.thompson_sampling import ThompsonSampler, get_preset
from TACTICS.thompson_sampling.core.evaluator_config import LookupEvaluatorConfig
# 1. Create synthesis pipeline (single source of truth for reactions)
rxn_config = ReactionConfig(
    reactions=[ReactionDef(
        reaction_smarts="[C:1](=O)[OH].[NH2:2]>>[C:1](=O)[NH:2]",
        step_index=0,
        description="Amide coupling"
    )],
    reagent_file_list=["acids.smi", "amines.smi"]
)
pipeline = SynthesisPipeline(rxn_config)
# 2. Create evaluator config
evaluator = LookupEvaluatorConfig(ref_filename="scores.csv")
# 3. Get a preset configuration
config = get_preset(
    "fast_exploration",  # Quick screening with epsilon-greedy
    synthesis_pipeline=pipeline,
    evaluator_config=evaluator,
    mode="minimize",  # Use "minimize" for docking scores
    num_iterations=1000
)
# 4. Create sampler from config and run
sampler = ThompsonSampler.from_config(config)
warmup_df = sampler.warm_up(num_warmup_trials=config.num_warmup_trials)
results_df = sampler.search(num_cycles=config.num_ts_iterations)
sampler.close()
# 5. Analyze top results
print(results_df.sort("score").head(10))
Available Presets:
- "fast_exploration" - Epsilon-greedy strategy, quick screening
- "parallel_batch" - Batch processing with multiprocessing (for slow evaluators)
- "conservative_exploit" - Greedy strategy, focus on best reagents
- "balanced_sampling" - UCB strategy with theoretical guarantees
- "diverse_coverage" - Maximum diversity exploration
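The presets name their selection strategies only briefly. As a rough illustration of how two of them trade exploration against exploitation, here is a generic, library-independent sketch of epsilon-greedy and UCB1-style selection over per-reagent mean scores. The function names and the exploration constant `c` are assumptions made for this example, not part of the TACTICS API, and the sketch assumes higher scores are better.

```python
import math
import random

def epsilon_greedy_pick(mean_scores, epsilon, rng):
    """With probability epsilon pick a random reagent, otherwise the current best."""
    if rng.random() < epsilon:
        return rng.randrange(len(mean_scores))
    return max(range(len(mean_scores)), key=mean_scores.__getitem__)

def ucb_pick(mean_scores, counts, c=2.0):
    """UCB1-style rule: mean score plus a bonus that shrinks as a reagent is sampled more."""
    total = sum(counts)

    def bound(i):
        if counts[i] == 0:
            return math.inf  # unsampled reagents are always tried first
        return mean_scores[i] + math.sqrt(c * math.log(total) / counts[i])

    return max(range(len(mean_scores)), key=bound)

rng = random.Random(0)
means = [0.5, 0.4]          # made-up running mean scores for two reagents
counts = [100, 1]           # how often each reagent has been sampled
print(epsilon_greedy_pick(means, epsilon=0.0, rng=rng))  # pure exploitation
print(ucb_pick(means, counts))                           # favors the rarely sampled reagent
```

For a minimization problem such as docking, the same rules can be applied to negated scores.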
Parallel Batch Processing (for Expensive Evaluators)
For slow evaluators (docking, ML models), use batch mode with multiprocessing:
from TACTICS.library_enumeration import SynthesisPipeline, ReactionConfig, ReactionDef
from TACTICS.thompson_sampling import ThompsonSampler, get_preset
from TACTICS.thompson_sampling.core.evaluator_config import FredEvaluatorConfig
# Create synthesis pipeline
rxn_config = ReactionConfig(
    reactions=[ReactionDef(
        reaction_smarts="[C:1](=O)[OH].[NH2:2]>>[C:1](=O)[NH:2]",
        step_index=0
    )],
    reagent_file_list=["acids.smi", "amines.smi"]
)
pipeline = SynthesisPipeline(rxn_config)
# Configure slow evaluator (molecular docking)
evaluator = FredEvaluatorConfig(design_unit_file="receptor.oedu")
# Get parallel batch preset
config = get_preset(
    "parallel_batch",
    synthesis_pipeline=pipeline,
    evaluator_config=evaluator,
    mode="minimize",  # Docking scores (lower is better)
    batch_size=100,   # Sample 100 compounds per cycle
)
# Create sampler and run
sampler = ThompsonSampler.from_config(config)
warmup_df = sampler.warm_up(num_warmup_trials=config.num_warmup_trials)
results_df = sampler.search(num_cycles=config.num_ts_iterations)
sampler.close()
Custom Configuration (Advanced)
For full control, create custom configurations:
from TACTICS.library_enumeration import SynthesisPipeline, ReactionConfig, ReactionDef
from TACTICS.thompson_sampling import ThompsonSampler, ThompsonSamplingConfig
from TACTICS.thompson_sampling.strategies.config import RouletteWheelConfig
from TACTICS.thompson_sampling.warmup.config import BalancedWarmupConfig
from TACTICS.thompson_sampling.core.evaluator_config import LookupEvaluatorConfig
# Create synthesis pipeline
rxn_config = ReactionConfig(
    reactions=[ReactionDef(
        reaction_smarts="[C:1](=O)[OH].[NH2:2]>>[C:1](=O)[NH:2]",
        step_index=0
    )],
    reagent_file_list=["acids.smi", "amines.smi"]
)
pipeline = SynthesisPipeline(rxn_config)
# Create fully customized configuration
config = ThompsonSamplingConfig(
    synthesis_pipeline=pipeline,
    num_ts_iterations=5000,
    num_warmup_trials=5,
    strategy_config=RouletteWheelConfig(
        mode="maximize",
        alpha=0.1,  # Initial heating temperature
        beta=0.1,   # Initial cooling temperature
    ),
    warmup_config=BalancedWarmupConfig(
        observations_per_reagent=5,
        use_per_reagent_variance=True,
    ),
    evaluator_config=LookupEvaluatorConfig(
        ref_filename="scores.csv",
        score_col="binding_affinity"
    ),
    batch_size=10,
    log_filename="optimization.log"
)
# Create sampler and run
sampler = ThompsonSampler.from_config(config)
warmup_df = sampler.warm_up(num_warmup_trials=config.num_warmup_trials)
results_df = sampler.search(num_cycles=config.num_ts_iterations)
sampler.close()
# Save results
results_df.write_csv("my_results.csv")
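The roulette-wheel strategy configured above is temperature-based. As a loose illustration of that general idea only, the sketch below shows a plain Boltzmann-weighted roulette wheel: reagents are picked with probability proportional to exp(score / temperature). It does not reproduce how TACTICS uses `alpha` and `beta` or its thermal cycling; the function names and the `temperature` parameter are assumptions made for this example.

```python
import math
import random

def boltzmann_weights(scores, temperature):
    """Softmax weights; a lower temperature concentrates weight on the top score."""
    top = max(scores)  # subtract the max so exp() never overflows
    raw = [math.exp((s - top) / temperature) for s in scores]
    total = sum(raw)
    return [w / total for w in raw]

def roulette_pick(scores, temperature, rng):
    """Spin the wheel once: return a reagent index drawn with Boltzmann weights."""
    weights = boltzmann_weights(scores, temperature)
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]

rng = random.Random(0)
scores = [1.0, 2.0, 1.5]  # made-up mean scores for three reagents
print(boltzmann_weights(scores, temperature=5.0))   # hot: close to uniform
print(boltzmann_weights(scores, temperature=0.05))  # cold: nearly greedy
print(roulette_pick(scores, temperature=1.0, rng=rng))
```

Raising the temperature flattens the wheel and encourages exploration; lowering it makes the choice approach greedy selection.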
Random Baseline Sampling
from TACTICS.library_enumeration import SynthesisPipeline, ReactionConfig, ReactionDef
from TACTICS.thompson_sampling import RandomBaselineConfig, run_random_baseline
from TACTICS.thompson_sampling.core.evaluator_config import LookupEvaluatorConfig
# Create synthesis pipeline
rxn_config = ReactionConfig(
    reactions=[ReactionDef(
        reaction_smarts="[C:1](=O)[OH].[NH2:2]>>[C:1](=O)[NH:2]",
        step_index=0
    )],
    reagent_file_list=["acids.smi", "amines.smi"]
)
pipeline = SynthesisPipeline(rxn_config)
config = RandomBaselineConfig(
    synthesis_pipeline=pipeline,
    evaluator_config=LookupEvaluatorConfig(ref_filename="scores.csv"),
    num_trials=1000,
    num_to_save=100,
    ascending_output=False,
    outfile_name="random_results.csv"
)
results_df = run_random_baseline(config)
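Conceptually, a random baseline draws reagent combinations uniformly at random, scores them, and keeps the best few, which gives a reference point for judging whether a guided search is worth its cost. The standard-library sketch below illustrates that idea under stated assumptions: `random_baseline`, `toy_score`, and the reagent labels and effects are all hypothetical, and only the parameter names `num_trials` and `num_to_save` are borrowed from the configuration above. It is not the code behind `run_random_baseline`.

```python
import heapq
import random

def random_baseline(reagent_lists, score_fn, num_trials, num_to_save, seed=0):
    """Score randomly drawn reagent combinations and keep the highest-scoring ones."""
    rng = random.Random(seed)
    scored = {}
    for _ in range(num_trials):
        combo = tuple(rng.choice(reagents) for reagents in reagent_lists)
        if combo not in scored:  # do not pay for the same product twice
            scored[combo] = score_fn(combo)
    # Highest scores first, like ascending_output=False.
    return heapq.nlargest(num_to_save, scored.items(), key=lambda item: item[1])

# Made-up reagent labels and per-reagent effects, purely for illustration.
acids = ["acid_1", "acid_2", "acid_3"]
amines = ["amine_1", "amine_2"]
toy_effect = {"acid_1": 0.2, "acid_2": 0.9, "acid_3": 0.5, "amine_1": 0.1, "amine_2": 0.7}

def toy_score(combo):
    """Hypothetical evaluator: the score of a product is the sum of its reagent effects."""
    return sum(toy_effect[name] for name in combo)

top = random_baseline([acids, amines], toy_score, num_trials=50, num_to_save=3)
for combo, score in top:
    print(combo, round(score, 2))
```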
Configuration
Pydantic Configuration Models
The package uses Pydantic v2 for robust configuration validation:
from TACTICS.library_enumeration import SynthesisPipeline, ReactionConfig, ReactionDef
from TACTICS.thompson_sampling import ThompsonSamplingConfig
from TACTICS.thompson_sampling.strategies.config import EpsilonGreedyConfig
from TACTICS.thompson_sampling.warmup.config import BalancedWarmupConfig
from TACTICS.thompson_sampling.core.evaluator_config import LookupEvaluatorConfig
# Create synthesis pipeline
rxn_config = ReactionConfig(
    reactions=[ReactionDef(
        reaction_smarts="[C:1](=O)[OH].[NH2:2]>>[C:1](=O)[NH:2]",
        step_index=0
    )],
    reagent_file_list=["acids.smi", "amines.smi"]
)
pipeline = SynthesisPipeline(rxn_config)
# Automatic validation and type checking
config = ThompsonSamplingConfig(
    synthesis_pipeline=pipeline,  # Required: single source of truth
    num_ts_iterations=1000,
    strategy_config=EpsilonGreedyConfig(mode="maximize", epsilon=0.2),
    warmup_config=BalancedWarmupConfig(),
    evaluator_config=LookupEvaluatorConfig(ref_filename="scores.csv"),
)
Configuration Validation
from pydantic import ValidationError
from TACTICS.library_enumeration import ReactionDef
# Invalid configuration raises ValidationError
try:
    rxn = ReactionDef(
        reaction_smarts="invalid-smarts",  # ValidationError: Invalid SMARTS
        step_index=0
    )
except ValidationError as e:
    print(f"Configuration error: {e}")
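To make concrete what a well-formed reaction SMARTS looks like, the sketch below runs two cheap textual checks on the amide coupling used throughout this README: the number of dot-separated reactant templates should match the number of reagent files, and every atom-map number in the product should also appear among the reactants. This is a hypothetical pre-check written for illustration, not the validation that ReactionDef performs; `precheck_reaction_smarts` is an invented name, and splitting on "." ignores component-level grouping, so it only suits simple patterns.

```python
import re

ATOM_MAP = re.compile(r":(\d+)\]")  # captures the map number in atoms like [C:1]

def precheck_reaction_smarts(smarts, n_reagent_files):
    """Return a list of problems; an empty list means the cheap checks passed."""
    if smarts.count(">>") != 1:
        return ["expected exactly one '>>' between reactants and products"]
    reactants, products = smarts.split(">>")
    problems = []
    n_templates = len(reactants.split("."))
    if n_templates != n_reagent_files:
        problems.append(f"{n_templates} reactant templates but {n_reagent_files} reagent files")
    unmapped = set(ATOM_MAP.findall(products)) - set(ATOM_MAP.findall(reactants))
    if unmapped:
        problems.append(f"product atom maps missing from reactants: {sorted(unmapped)}")
    return problems

amide = "[C:1](=O)[OH].[NH2:2]>>[C:1](=O)[NH:2]"
print(precheck_reaction_smarts(amide, 2))             # acids.smi and amines.smi
print(precheck_reaction_smarts("invalid-smarts", 2))
```

A real validator needs a cheminformatics toolkit to parse the pattern; this check only catches structural slips such as a missing ">>" or a mismatched reagent count.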
Testing
The package includes comprehensive tests for configuration validation:
# Run all tests
pytest tests/
# Run configuration tests
pytest tests/test_config_validation.py -v
# Run with coverage
pytest tests/ --cov=TACTICS --cov-report=html
Documentation
- Full Documentation: TACTICS Documentation
- Interactive Tutorials: See tutorials/ for marimo notebooks
- API Reference: Build locally with cd docs && make html
Installation
# Clone repository and install package in development mode
git clone https://github.com/aakankschit/TACTICS.git
cd TACTICS
pip install -e .
# With interactive tutorials (marimo):
pip install -e ".[tutorials]"
# With test dependencies:
pip install -e ".[test]"
Requirements
- Python 3.11+
- Multiprocessing support
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Citation
If you use TACTICS in your research, please cite:
@software{tactics,
  title={TACTICS: Thompson Sampling-Assisted Chemical Targeting and Iterative Compound Selection for Drug Discovery},
  author={Aakankschit Nandkeolyar},
  year={2025},
  url={https://github.com/aakankschit/TACTICS}
}
Support
For questions and support:
- Open an issue on GitHub
- Contact: anandkeo@uci.edu
This work is based on previous work by Patrick Walters. This project is a collaboration between the University of California Irvine, Leiden University and Groningen University.