Dimensio: A flexible configuration space compression library for Bayesian Optimization
English | 简体中文
A flexible configuration space compression library designed for Bayesian Optimization. Supports combining multiple compression strategies through a Pipeline architecture to significantly improve the efficiency of high-dimensional hyperparameter optimization.
Table of Contents
- Features
- Overview
- Installation
- Quick Start
- Examples
- Compression Strategies
- Visualization
- Integration with Bayesian Optimization
- API Documentation
- Advanced Usage
Features
✨ Pipeline Architecture: Build flexible compression strategies by combining multiple compression steps
🎯 Multiple Compression Strategies: Support dimension selection, range compression, and projection transformations
🔄 Adaptive Updates: Dynamically adjust compression strategies during optimization
🎨 Rich Visualizations: Provide various visualization tools for compression process and parameter importance
📊 Transfer Learning Support: Dynamically transform multi-source historical data to adapt to compression strategy changes
🔧 Extensible Design: Easy to add custom compression steps and filling strategies
Overview
Space Concepts
Original Space
- Complete, uncompressed configuration space
- Contains all parameters with their original ranges
Sample Space
- Space used for sampling new configurations
- Affected by dimension selection and range compression
- Low-dimensional if projection steps are used
Surrogate Space
- Space used for surrogate model training and prediction
- Final output space of the pipeline
Unprojected Space
- Space before projection
- Used to map low-dimensional configs back to high-dimensional space for evaluation
- If dimension compression or range compression was performed before the projection step, this space is the space after dimension/range compression and before projection; otherwise, it is the original space.
Compression Flow
Original Space
↓ [Dimension Selection - DimensionSelectionStep]
Dimension-reduced Space
↓ [Range Compression - RangeCompressionStep]
Range-compressed Space
↓ [Projection - ProjectionStep]
Final Returned Compressed Space
├── Sample Space: for generating new configurations
└── Surrogate Space: for model training
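The flow above can be sketched in a few lines of plain Python. This is an illustrative mock, not the Dimensio API: each "step" maps a space (here just a dict of parameter ranges) to a narrower one, and the pipeline chains them in order.

```python
# Illustrative sketch of the compression flow; all names here are
# hypothetical, not the actual Dimensio classes.

def dimension_selection(space, keep):
    # Keep only the selected parameters (Dimension Selection step).
    return {k: v for k, v in space.items() if k in keep}

def range_compression(space, shrink=0.5):
    # Shrink each range around its midpoint (Range Compression step).
    out = {}
    for k, (lo, hi) in space.items():
        mid, half = (lo + hi) / 2, (hi - lo) / 2 * shrink
        out[k] = (mid - half, mid + half)
    return out

def run_pipeline(space, steps):
    # Each step consumes the previous step's output space.
    for step in steps:
        space = step(space)
    return space

original = {'x1': (1, 100), 'x2': (-5, 1028), 'x3': (3140, 7890)}
steps = [
    lambda s: dimension_selection(s, keep={'x1', 'x3'}),
    range_compression,
]
compressed = run_pipeline(original, steps)
print(compressed)
```

A projection step would fit the same shape: one more callable at the end of the list, whose output becomes the surrogate space.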
Installation
From PyPI
pip install dimensio
From Source
git clone https://github.com/Elubrazione/dimensio.git
cd dimensio
pip install -e .
Quick Start
💡 See Full Examples: The examples/ directory contains multiple runnable examples covering all features and use cases. See examples/README.md for detailed documentation.
Basic Usage
We strongly recommend assembling the compression steps yourself:
from dimensio import Compressor, SHAPDimensionStep, BoundaryRangeStep
from ConfigSpace import ConfigurationSpace, UniformFloatHyperparameter
# 1. Create configuration space
config_space = ConfigurationSpace()
config_space.add_hyperparameter(UniformFloatHyperparameter('x1', 1, 100))
config_space.add_hyperparameter(UniformFloatHyperparameter('x2', -5, 1028))
config_space.add_hyperparameter(UniformFloatHyperparameter('x3', 3140, 7890))
# 2. Define compression steps
steps = [
SHAPDimensionStep(strategy='shap', topk=2),
BoundaryRangeStep(method='boundary', top_ratio=0.8)
]
# 3. Create compressor
compressor = Compressor(
config_space=config_space,
steps=steps,
save_compression_info=True,
output_dir='./results/compression'
)
# 4. Compress configuration space
surrogate_space, sample_space = compressor.compress_space(space_history=None)
print(f"Original dimensions: {len(config_space.get_hyperparameters())}")
print(f"Surrogate space dimensions: {len(surrogate_space.get_hyperparameters())}")
print(f"Sample space dimensions: {len(sample_space.get_hyperparameters())}")
Using Convenience Functions
from dimensio import get_compressor
# LlamaTune strategy (quantization + projection)
compressor = get_compressor(
compressor_type='llamatune',
config_space=config_space,
adapter_alias='rembo', # or 'hesbo'
le_low_dim=10,
max_num_values=50
)
# Expert knowledge strategy
compressor = get_compressor(
compressor_type='expert',
config_space=config_space,
expert_params=['x1', 'x3'],
top_ratio=0.9
)
Configure Logging
from dimensio import setup_logging, disable_logging
import logging
# Set log level
setup_logging(level=logging.INFO)
# Or save to file
setup_logging(level=logging.DEBUG, log_file='dimensio.log')
# Disable logging
disable_logging()
Examples
The examples/ directory contains comprehensive, runnable examples:
1. Quick Start (quick_start.py)
A simple example demonstrating basic usage:
- Creating configuration spaces
- Generating mock history data
- Using convenience functions and custom steps
- Basic visualization
Run: python examples/quick_start.py
2. Comprehensive Examples (comprehensive.py)
Six complete examples covering different compression strategies:
- Example 1: SHAP dimension selection + standard boundary range compression
- Example 2: Correlation dimension selection + SHAP range compression
- Example 3: KDE range compression (retain all dimensions)
- Example 4: Quantization + REMBO projection
- Example 5: Expert knowledge-based compression
- Example 6: Using convenience functions
Run: python examples/comprehensive.py
3. Adaptive Update Strategies (adaptive_strategies.py)
Compares four different adaptive update strategies:
- Periodic Update: Updates at fixed intervals
- Stagnation Detection: Triggers when optimization stagnates
- Improvement Detection: Triggers on consecutive improvements
- Composite Strategy: Combines multiple strategies (demonstrates Stagnation + Improvement)
Run: python examples/adaptive_strategies.py
4. Multi-Source Transfer Learning (multi_single_source.py)
Demonstrates transfer learning with multiple source tasks:
- Generating historical data from multiple source tasks
- Calculating task similarities
- Comparing single-source vs multi-source compression
- Visualizing transfer learning effects
Run: python examples/multi_single_source.py
For detailed documentation on all examples, see examples/README.md.
Compression Strategies
1. Dimension Selection
Reduce the number of parameters by keeping the most important ones.
SHAPDimensionStep
Parameter selection based on SHAP values. Supports multi-source transfer learning.
from dimensio import SHAPDimensionStep
step = SHAPDimensionStep(
strategy='shap',
topk=10 # Select top-10 important parameters
)
How it works:
- Train a Random Forest regression model using historical evaluation data
- Calculate SHAP values to quantify each parameter's importance
- Select the top-k most important parameters
Transfer learning support:
- Can leverage historical data from multiple source tasks
- Automatically weight different sources by task similarity
CorrelationDimensionStep
Parameter selection based on Spearman or Pearson correlation. Supports multi-source transfer learning.
from dimensio import CorrelationDimensionStep
step = CorrelationDimensionStep(
method='spearman', # or 'pearson'
topk=10
)
How it works:
- Calculate correlation (Spearman or Pearson) between each parameter and objective
- Select parameters with highest correlation
Transfer learning support:
- Can leverage historical data from multiple source tasks
- Automatically weight different sources by task similarity
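The mechanism described above amounts to ranking parameters by the absolute correlation between their values and the objective, then keeping the top-k. A stdlib-only sketch (the helper names and data are illustrative, not the Dimensio implementation):

```python
# Pure-Python sketch of correlation-based top-k parameter selection.
from math import sqrt

def pearson(xs, ys):
    # Standard Pearson correlation coefficient.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def select_topk(history, objective, topk):
    # history: {param_name: [value per observation]}
    scores = {p: abs(pearson(vals, objective)) for p, vals in history.items()}
    return sorted(scores, key=scores.get, reverse=True)[:topk]

history = {
    'x1': [1, 2, 3, 4, 5],    # tracks the objective closely
    'x2': [5, 1, 4, 2, 3],    # weakly related
    'x3': [2, 4, 6, 8, 10],   # tracks the objective closely
}
objective = [1.1, 2.0, 2.9, 4.2, 5.0]
print(select_topk(history, objective, topk=2))
```

Spearman correlation follows the same pattern with the values replaced by their ranks before computing the coefficient.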
ExpertDimensionStep
Parameter selection based on expert knowledge.
from dimensio import ExpertDimensionStep
step = ExpertDimensionStep(
strategy='expert',
expert_params=['param1', 'param2', 'param3']
)
AdaptiveDimensionStep
Adaptively adjust the number of parameters. Configurable importance calculator and update strategy.
from dimensio import AdaptiveDimensionStep
from dimensio.steps.dimension import SHAPImportanceCalculator
from dimensio.core.update import PeriodicUpdateStrategy
step = AdaptiveDimensionStep(
importance_calculator=SHAPImportanceCalculator(), # Optional, default SHAP
update_strategy=PeriodicUpdateStrategy(period=5), # Update every 5 iterations
initial_topk=30,
reduction_ratio=0.2,
min_dimensions=5,
max_dimensions=50 # Optional
)
Parameters:
- importance_calculator: Importance calculator (default: SHAP)
- update_strategy: Update strategy (default: update every 5 iterations); see below for options
- initial_topk: Initial number of parameters
- reduction_ratio: Ratio by which the dimension count is adjusted per update (increase or decrease)
- min_dimensions: Minimum number of dimensions
- max_dimensions: Maximum number of dimensions (optional)
Supported Update Strategies:
1. PeriodicUpdateStrategy (Periodic Update)
Execute updates at fixed iteration intervals, gradually reducing parameter count.
from dimensio.core.update import PeriodicUpdateStrategy
update_strategy = PeriodicUpdateStrategy(period=10) # Update every 10 iterations
Behavior: Every period iterations, reduce by current_topk × reduction_ratio parameters.
2. StagnationUpdateStrategy (Stagnation Detection)
Increase parameter count when optimization stagnates to expand search space.
from dimensio.core.update import StagnationUpdateStrategy
update_strategy = StagnationUpdateStrategy(threshold=5) # Trigger after 5 stagnant iterations
Behavior: When best value hasn't improved for threshold consecutive iterations, increase by current_topk × reduction_ratio parameters.
3. ImprovementUpdateStrategy (Improvement Detection)
Reduce parameter count when improvements are detected to focus search.
from dimensio.core.update import ImprovementUpdateStrategy
update_strategy = ImprovementUpdateStrategy(threshold=3) # Trigger after 3 consecutive improvements
Behavior: When best value improves for threshold consecutive iterations, reduce by current_topk × reduction_ratio parameters.
4. HybridUpdateStrategy (Hybrid Strategy)
Combines periodic, stagnation detection, and improvement detection strategies.
from dimensio.core.update import HybridUpdateStrategy
update_strategy = HybridUpdateStrategy(
period=10, # Base period: every 10 iterations
stagnation_threshold=5, # Stagnation: 5 iterations without improvement
improvement_threshold=3 # Improvement: 3 consecutive improvements
)
Behavior:
- Priority: Stagnation > Improvement > Periodic
- On stagnation: increase dimensions
- On improvement: reduce dimensions
- On period reached: reduce dimensions
5. CompositeUpdateStrategy (Composite Strategy)
Freely combine multiple strategies, update when any strategy triggers.
from dimensio.core.update import CompositeUpdateStrategy, StagnationUpdateStrategy, ImprovementUpdateStrategy
update_strategy = CompositeUpdateStrategy(
StagnationUpdateStrategy(threshold=5),
ImprovementUpdateStrategy(threshold=3)
)
Behavior: Check each strategy in order, the first triggered strategy determines how to update dimensions.
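The stagnation/improvement triggers can be sketched as simple predicates over the running-best history (minimization assumed); a composite strategy then fires when any of them does. The function names here are illustrative, not the Dimensio classes:

```python
# Illustrative sketch of stagnation/improvement/composite triggering.

def stagnation_trigger(best_values, threshold):
    # True if the best value has not improved for `threshold` iterations.
    if len(best_values) <= threshold:
        return False
    recent = best_values[-(threshold + 1):]
    return min(recent[1:]) >= recent[0]

def improvement_trigger(best_values, threshold):
    # True if the best value improved on each of the last `threshold` iterations.
    if len(best_values) <= threshold:
        return False
    tail = best_values[-(threshold + 1):]
    return all(b < a for a, b in zip(tail, tail[1:]))

def composite_should_update(best_values, triggers):
    # Composite strategy: the first trigger that fires causes an update.
    return any(t(best_values) for t in triggers)

triggers = [
    lambda h: stagnation_trigger(h, threshold=3),
    lambda h: improvement_trigger(h, threshold=2),
]
stagnant = [5.0, 4.0, 4.0, 4.0, 4.0]   # 3 iterations without improvement
improving = [5.0, 4.5, 4.0]            # 2 consecutive improvements
print(composite_should_update(stagnant, triggers))
print(composite_should_update(improving, triggers))
```

In Dimensio the triggered strategy also decides the direction of the adjustment (increase on stagnation, decrease on improvement), via `compute_new_topk`.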
Usage Recommendations:
- Stable optimization: Use PeriodicUpdateStrategy for smooth dimension reduction
- Prone to stagnation: Use StagnationUpdateStrategy or HybridUpdateStrategy
- Fast convergence: Use ImprovementUpdateStrategy
- Complex scenarios: Use HybridUpdateStrategy or CompositeUpdateStrategy
2. Range Compression
Narrow parameter value ranges to focus on high-value regions.
BoundaryRangeStep
Range compression based on mean and standard deviation of best configurations.
from dimensio import BoundaryRangeStep
step = BoundaryRangeStep(
method='boundary',
top_ratio=0.8, # Use top-80% configs to compute bounds
sigma=2.0 # Standard deviation multiplier (μ ± 2σ)
)
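The μ ± σ·std idea can be sketched directly: take the best fraction of observed configs for a parameter, compute mean and standard deviation of their values, and use those to shrink the bounds. This is an illustrative stand-in, not the Dimensio implementation:

```python
# Sketch of boundary-based range compression (mean ± sigma * std of the
# top-ratio best configs); minimization assumed, names illustrative.
from statistics import mean, pstdev

def boundary_compress(values, objectives, top_ratio=0.8, sigma=2.0,
                      lower=None, upper=None):
    # Rank configs by objective (lower = better) and keep the top fraction.
    ranked = [v for _, v in sorted(zip(objectives, values))]
    top = ranked[:max(1, int(len(ranked) * top_ratio))]
    mu, sd = mean(top), pstdev(top)
    lo, hi = mu - sigma * sd, mu + sigma * sd
    # Never widen beyond the original bounds.
    if lower is not None:
        lo = max(lo, lower)
    if upper is not None:
        hi = min(hi, upper)
    return lo, hi

values = [10, 12, 11, 50, 90]             # observed values of one parameter
objectives = [0.1, 0.2, 0.15, 0.9, 1.0]   # the first three configs are best
lo, hi = boundary_compress(values, objectives, top_ratio=0.6, sigma=2.0,
                           lower=0, upper=100)
print(lo, hi)
```

The compressed range concentrates around the cluster of good values (10–12) and ignores the poor outliers at 50 and 90.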
SHAPBoundaryRangeStep
SHAP-weighted range compression. Supports multi-source transfer learning.
from dimensio import SHAPBoundaryRangeStep
step = SHAPBoundaryRangeStep(
method='shap_boundary',
top_ratio=0.8,
sigma=2.0
)
How it works:
- Adjust compression level based on parameter importance
- Important parameters retain larger search ranges
KDEBoundaryRangeStep
Range compression based on kernel density estimation. Supports multi-source transfer learning.
from dimensio import KDEBoundaryRangeStep
step = KDEBoundaryRangeStep(
method='kde_boundary',
source_top_ratio=0.3, # Use top-30% configs from source tasks
kde_coverage=0.6 # KDE coverage ratio (retain 60% of probability density)
)
How it works:
- Use KDE (Kernel Density Estimation) to estimate parameter probability density distribution
- Determine the high-density region retention ratio based on kde_coverage
- For multi-source task data, weight by task similarity
ExpertRangeStep
Expert-specified parameter ranges.
from dimensio import ExpertRangeStep
step = ExpertRangeStep(
method='expert',
expert_ranges={
'param1': (0, 10),
'param2': (5, 15)
}
)
3. Projection
Transform parameter representation to reduce search complexity.
QuantizationProjectionStep
Integer parameter quantization, compressing large-range integer parameters to smaller discrete value sets.
from dimensio import QuantizationProjectionStep
step = QuantizationProjectionStep(
method='quantization',
max_num_values=50, # Maximum number of discrete values
adaptive=False # Whether to adaptively adjust
)
How it works:
- Only quantizes UniformIntegerHyperparameter with range larger than max_num_values
- Maps original range [lower, upper] to quantized range [1, max_num_values]
- Samples integers in quantized space, unprojects back to original range for evaluation
- Other parameter types remain unchanged
Example:
- Original parameter: x ∈ [1000, 5000] (4001 values)
- Quantized: x_q ∈ [1, 50] (50 values)
- Compression ratio: 50/4001 ≈ 1.25%
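The quantize/unproject mapping can be sketched as a linear rescaling between the two ranges. The function names here are illustrative, not the Dimensio API:

```python
# Sketch of integer quantization and its inverse mapping; illustrative only.

def quantize_bounds(lower, upper, max_num_values):
    # The quantized space is [1, max_num_values] when the original
    # integer range has more than max_num_values distinct values.
    if (upper - lower + 1) > max_num_values:
        return (1, max_num_values)
    return (lower, upper)

def unproject(q, lower, upper, max_num_values):
    # Map a quantized value in [1, max_num_values] back to [lower, upper].
    frac = (q - 1) / (max_num_values - 1)
    return round(lower + frac * (upper - lower))

lower, upper, m = 1000, 5000, 50
print(quantize_bounds(lower, upper, m))   # (1, 50)
print(unproject(1, lower, upper, m))      # maps to the lower bound, 1000
print(unproject(50, lower, upper, m))     # maps to the upper bound, 5000
print(unproject(25, lower, upper, m))     # an interior quantized value
```

Note the round-trip is lossy by design: many original values share one quantized value, which is exactly what shrinks the search space.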
REMBOProjectionStep
Random Embedding Bayesian Optimization.
from dimensio import REMBOProjectionStep
step = REMBOProjectionStep(
method='rembo',
low_dim=10, # Low-dimensional space dimension
max_num_values=50 # Use with quantization
)
How it works:
- Assumes high-dimensional space has low-dimensional effective subspace
- Project d dimensions to d_e dimensions via random matrix (d_e << d)
- Low-dim sampling range: [-√d_e, √d_e]
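The random-embedding idea can be sketched in pure Python: sample a Gaussian matrix A of shape (d, d_e), optimize in the low-dimensional y-space, and unproject each candidate via x = A·y clipped to the normalized box. This is an illustrative sketch of the REMBO idea, not the Dimensio step:

```python
# Sketch of a REMBO-style random embedding; illustrative only.
import math
import random

def make_embedding(d, d_e, seed=0):
    # Random Gaussian projection matrix A with shape (d, d_e).
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(d_e)] for _ in range(d)]

def unproject(y, A):
    # x = A @ y, clipped to the normalized high-dim box [-1, 1]^d.
    x = [sum(a * b for a, b in zip(row, y)) for row in A]
    return [max(-1.0, min(1.0, v)) for v in x]

d, d_e = 20, 3
A = make_embedding(d, d_e)
bound = math.sqrt(d_e)            # low-dim sampling range [-sqrt(d_e), sqrt(d_e)]
y = [0.5 * bound, -0.2 * bound, 0.1 * bound]   # a low-dim sample
x = unproject(y, A)
print(len(x), all(-1.0 <= v <= 1.0 for v in x))
```

Because A is fixed for the whole run, the surrogate model only ever sees the d_e-dimensional y-space.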
HesBOProjectionStep
Hashing Embedding Bayesian Optimization.
from dimensio import HesBOProjectionStep
step = HesBOProjectionStep(
method='hesbo',
low_dim=10,
max_num_values=50
)
How it works:
- Use hashing functions for dimension mapping
- Low-dim sampling range: [-1, 1]
- More memory-efficient than REMBO
KPCAProjectionStep
Kernel Principal Component Analysis projection.
from dimensio import KPCAProjectionStep
step = KPCAProjectionStep(
method='kpca',
n_components=10,
kernel='rbf'
)
How it works:
- Extract nonlinear principal components using kernel methods
- Note: This method only uses the extracted principal component dimensions to train the surrogate model; the returned sample space is still the space before this step
Visualization
Dimensio provides rich visualization tools to analyze compression effects. The visualization system automatically detects which compression steps are used and generates relevant plots.
Automatic Visualization
from dimensio.viz import visualize_compression_details
# Automatically generates all relevant visualizations based on used steps
visualize_compression_details(
compressor=compressor,
save_dir='./results/visualization'
)
Generated plots (automatically selected based on compression steps):
- compression_summary.png: Compression summary
  - Dimension changes across steps
  - Compression ratio statistics
  - Range compression statistics
  - Text summary
- range_compression_step_*.png: Detailed view for each range compression step
  - ✅ Auto-detected when using BoundaryRangeStep, SHAPBoundaryRangeStep, or KDEBoundaryRangeStep
  - Original range vs compressed range
  - Compression ratio for each parameter
  - Quantization info (if used)
- parameter_importance_step_*.png: Parameter importance visualization
  - ✅ Auto-detected when using SHAPDimensionStep, CorrelationDimensionStep, or AdaptiveDimensionStep
  - Top-K parameter importance scores
- dimension_evolution.png: Dimension evolution curve
  - ✅ Auto-detected when using AdaptiveDimensionStep with update history
  - Shows dimension changes over iterations
  - Highlights each dimension adjustment
- source_task_similarities.png: Source task similarities
  - ✅ Auto-detected when using multi-source (≥2 tasks) transfer learning with source_similarities provided
  - Bar chart of similarity scores between source tasks and the target task
- multi_task_importance_heatmap_step_*.png: Multi-task importance heatmap
  - ✅ Auto-detected when using SHAP/Correlation-based dimension compression with multiple source tasks
  - Heatmap comparing parameter importance across tasks
  - Useful for discovering common important parameters and task-specific key parameters
Manual Visualization
You can also call specific visualization functions:
from dimensio.viz import visualize_parameter_importance, visualize_importance_heatmap
# Single-task parameter importance
visualize_parameter_importance(
param_names=['x1', 'x2', 'x3', ...],
importances=[0.5, 0.3, 0.2, ...],
save_path='./results/parameter_importance.png',
topk=20
)
# Multi-task importance heatmap
import numpy as np
importances = np.array([
[0.5, 0.3, 0.2, ...], # Task 1
[0.4, 0.4, 0.2, ...], # Task 2
[0.6, 0.2, 0.2, ...] # Task 3
])
visualize_importance_heatmap(
param_names=['x1', 'x2', 'x3', ...],
importances=importances,
save_path='./results/importance_heatmap.png',
tasks=['Task 1', 'Task 2', 'Task 3']
)
Integration with Bayesian Optimization
Dimensio can be seamlessly integrated into Bayesian Optimization systems:
- Use the compressor's transformation interface transform_source_data to transform historical data automatically
- Use surrogate_space to train surrogate models and sample_space for sampling new data
- If sampled configurations are projected, convert them back via compressor.unproject_point(); the unprojected space is available via compressor.get_unprojected_space()
Integration with Advisor
from openbox import Advisor
from dimensio import get_compressor
# 1. Create compressor
compressor = get_compressor(
compressor_type='shap',
config_space=config_space,
topk=10,
top_ratio=0.8
)
# 2. Compress configuration space
surrogate_space, sample_space = compressor.compress_space()
# 3. Create Advisor (use compressed space)
advisor = Advisor(
config_space=surrogate_space,
num_objectives=1,
num_constraints=0,
# ... other parameters
)
# 4. Use in optimization loop
for iteration in range(max_iterations):
# 4.1 Get suggested configs from sample space
sampling_strategy = compressor.get_sampling_strategy()
configs = sampling_strategy.sample(n=batch_size)
# 4.2 Unproject if using projection
eval_configs = compressor.unproject_points(configs)
# 4.3 Evaluate configs
results = []
for config in eval_configs:
obj_value = objective_function(config)
results.append((config, obj_value))
# 4.4 Update Advisor
for eval_config, obj_value in results:
# Convert to surrogate space
surrogate_config = compressor.convert_config_to_surrogate_space(eval_config)
advisor.update_observation(
observation=(surrogate_config, obj_value)
)
# 4.5 Adaptive update compression (optional)
if iteration % update_interval == 0:
updated = compressor.update_compression(advisor.history)
if updated:
# Update Advisor's config space
advisor.config_space = compressor.surrogate_space
# Transform history
advisor.history = transform_history(
advisor.history,
compressor.surrogate_space
)
Integration with Optimizer (with Transfer Learning)
from openbox import Optimizer
from dimensio import get_compressor
class CompressedOptimizer:
def __init__(self, config_space, compressor_config, **kwargs):
# 1. Create compressor
self.compressor = get_compressor(
config_space=config_space,
**compressor_config
)
# 2. Load source task histories
source_hpo_data = self.load_source_histories()
# 3. Compress configuration space
surrogate_space, sample_space = self.compressor.compress_space(
space_history=source_hpo_data
)
# 4. Transform source data to compressed space
transformed_source_data = self.compressor.transform_source_data(
source_hpo_data
)
# 5. Create Optimizer
self.optimizer = Optimizer(
config_space=surrogate_space,
source_hpo_data=transformed_source_data,
**kwargs
)
def optimize(self, objective_function, max_iterations):
for iteration in range(max_iterations):
# Get suggested config
config = self.optimizer.ask()
# Unproject if needed
if self.compressor.needs_unproject():
eval_config = self.compressor.unproject_point(config)
else:
eval_config = config
# Evaluate
obj_value = objective_function(eval_config)
# Tell result (use compressed space config)
self.optimizer.tell(config, obj_value)
# Adaptive update
if iteration % 10 == 0:
self.compressor.update_compression(self.optimizer.history)
return self.optimizer.get_incumbent()
Complete Example
See integration examples in multique_fidelity_spark project:
multique_fidelity_spark/
├── Compressor/ # Early version of Dimensio
├── Advisor/ # Advisor with compressor integration
├── Optimizer/ # Optimizer implementation
└── main.py # Complete usage example
API Documentation
Compressor
Main compressor class that manages compression pipeline and configuration space transformations.
class Compressor:
def __init__(
self,
config_space: ConfigurationSpace,
steps: Optional[List[CompressionStep]] = None,
filling_strategy: Optional[FillingStrategy] = None,
save_compression_info: bool = False,
output_dir: str = './results/compression',
**kwargs
):
"""
Args:
config_space: Original configuration space
steps: List of compression steps
filling_strategy: Filling strategy (default: uses search space default values)
save_compression_info: Whether to save compression info
output_dir: Output directory
"""
Main methods:
def compress_space(
self,
space_history: Optional[List] = None,
source_similarities: Optional[Dict[int, float]] = None
) -> Tuple[ConfigurationSpace, ConfigurationSpace]:
"""
Compress configuration space
Args:
space_history: Historical data (for SHAP, KDE, etc.)
source_similarities: Source task similarities (for transfer learning)
Returns:
(surrogate_space, sample_space)
"""
def convert_config_to_surrogate_space(
self,
config: Configuration
) -> Configuration:
"""Convert config to surrogate space"""
def unproject_point(self, point: Configuration) -> Configuration:
"""Unproject config (projection step -> original space)"""
def update_compression(self, history: History) -> bool:
"""Adaptive update of compression strategy"""
def get_sampling_strategy(self) -> SamplingStrategy:
"""Get sampling strategy"""
def transform_source_data(
self,
source_hpo_data: Optional[List[History]]
) -> Optional[List[History]]:
"""Transform source task data to current compressed space"""
def get_compression_summary(self) -> dict:
"""Get compression summary info"""
CompressionStep
Base class for compression steps.
class CompressionStep(ABC):
@abstractmethod
def compress(
self,
config_space: ConfigurationSpace,
space_history: Optional[List] = None,
source_similarities: Optional[Dict[int, float]] = None
) -> ConfigurationSpace:
"""Execute compression"""
def affects_sampling_space(self) -> bool:
"""Whether affects sampling space"""
def needs_unproject(self) -> bool:
"""Whether needs unprojection"""
def supports_adaptive_update(self) -> bool:
"""Whether supports adaptive update"""
def get_sampling_strategy(self) -> Optional[SamplingStrategy]:
"""Get sampling strategy"""
SamplingStrategy
Sampling strategy interface.
class SamplingStrategy(ABC):
@abstractmethod
def sample(self, n: int = 1) -> List[Configuration]:
"""Sample n configurations"""
# Standard sampling strategy
class StandardSamplingStrategy(SamplingStrategy):
def __init__(self, config_space: ConfigurationSpace, seed: int = 42):
...
FillingStrategy
Filling strategy interface for handling parameter filling during dimension changes.
class FillingStrategy(ABC):
@abstractmethod
def fill_missing_parameters(
self,
config_dict: Dict[str, Any],
target_space: ConfigurationSpace
) -> Dict[str, Any]:
"""Fill missing parameters"""
# Default filling (use search space default values)
class DefaultValueFilling(FillingStrategy):
...
# Clipping filling (clip to range)
class ClippingValueFilling(FillingStrategy):
...
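The two built-in filling behaviors can be sketched directly: default-value filling completes a compressed config with the full space's defaults, while clipping forces values back into the target ranges. The names and data here are illustrative stand-ins, not the Dimensio implementations:

```python
# Illustrative sketch of the two filling behaviors; plain dicts stand in
# for Configuration objects, and all names are hypothetical.

def fill_missing(config, defaults):
    # Default-value filling: start from the full space's defaults,
    # then overwrite with the compressed config's values.
    out = dict(defaults)
    out.update(config)
    return out

def clip_to_range(config, bounds):
    # Clipping-style filling: force each value into its target range.
    return {k: min(max(v, bounds[k][0]), bounds[k][1])
            for k, v in config.items()}

defaults = {'x1': 50.0, 'x2': 500.0, 'x3': 5500.0}
compressed_config = {'x1': 10.0, 'x3': 6000.0}   # x2 was dropped by compression
full = fill_missing(compressed_config, defaults)
print(full)

clipped = clip_to_range({'x1': -3.0}, {'x1': (1, 100)})
print(clipped)
```

This is the operation needed whenever a low-dimensional config has to be evaluated in the original space: the dropped parameters must come from somewhere, and the FillingStrategy decides where.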
UpdateStrategy
Update strategy interface for adaptive updates in AdaptiveDimensionStep.
class UpdateStrategy(ABC):
@abstractmethod
def should_update(self, progress: OptimizerProgress, history: History) -> bool:
"""Determine if update should be performed"""
@abstractmethod
def compute_new_topk(
self,
current_topk: int,
reduction_ratio: float,
min_dimensions: int,
max_dimensions: Optional[int],
progress: OptimizerProgress
) -> Tuple[int, str]:
"""Compute new parameter count"""
# Periodic update strategy
class PeriodicUpdateStrategy(UpdateStrategy):
def __init__(self, period: int = 10):
"""period: Update period (number of iterations)"""
# Stagnation detection update strategy
class StagnationUpdateStrategy(UpdateStrategy):
def __init__(self, threshold: int = 5):
"""threshold: Stagnation threshold (consecutive iterations without improvement)"""
# Improvement detection update strategy
class ImprovementUpdateStrategy(UpdateStrategy):
def __init__(self, threshold: int = 3):
"""threshold: Improvement threshold (consecutive improvements)"""
# Hybrid update strategy
class HybridUpdateStrategy(UpdateStrategy):
def __init__(
self,
period: int = 10,
stagnation_threshold: Optional[int] = None,
improvement_threshold: Optional[int] = None
):
"""Combines periodic, stagnation, and improvement detection"""
# Composite update strategy
class CompositeUpdateStrategy(UpdateStrategy):
def __init__(self, *strategies: UpdateStrategy):
"""Freely combine multiple strategies"""
Advanced Usage
Custom Compression Step
from dimensio import CompressionStep
from ConfigSpace import ConfigurationSpace
class MyCustomStep(CompressionStep):
def __init__(self, my_param):
super().__init__(name='CustomStep', method='custom')
self.my_param = my_param
def compress(
self,
config_space: ConfigurationSpace,
space_history=None,
source_similarities=None
) -> ConfigurationSpace:
# Implement your compression logic
compressed_space = ...  # your processing here
return compressed_space
def affects_sampling_space(self) -> bool:
return True # Whether affects sampling space
def needs_unproject(self) -> bool:
return False # Whether needs unprojection
# Use custom step
steps = [
MyCustomStep(my_param=42),
BoundaryRangeStep(method='boundary', top_ratio=0.8)
]
compressor = Compressor(config_space=config_space, steps=steps)
Combining Multiple Strategies
# Example 1: Dimension selection + Range compression + Projection
steps = [
SHAPDimensionStep(strategy='shap', topk=20), # Select 20 important params
BoundaryRangeStep(method='boundary', top_ratio=0.8), # Compress to top-80% range
REMBOProjectionStep(method='rembo', low_dim=10) # Project to 10 dims
]
# Example 2: Quantization + Projection only (LlamaTune style)
steps = [
QuantizationProjectionStep(method='quantization', max_num_values=50),
HesBOProjectionStep(method='hesbo', low_dim=15)
]
# Example 3: Expert knowledge + Adaptive range compression
steps = [
ExpertDimensionStep(strategy='expert', expert_params=['x1', 'x2', 'x3']),
SHAPBoundaryRangeStep(method='shap_boundary', top_ratio=0.9)
]
Handling Multi-source Transfer Learning Data
from openbox.utils.history import History
# 1. Load multiple source task histories
source_histories = [history1, history2, history3]
# 2. Calculate task similarities (optional, for weighting)
source_similarities = {
0: 0.8, # Similarity of source task 0
1: 0.6,
2: 0.4
}
# 3. Compress with multi-source data
surrogate_space, sample_space = compressor.compress_space(
space_history=source_histories,
source_similarities=source_similarities
)
# 4. Transform source data to compressed space
transformed_histories = compressor.transform_source_data(source_histories)
Dynamic Compression Strategy Update
Integration in BO:
def _get_surrogate_config_array(self):
X_surrogate = []
for obs in self.history.observations:
surrogate_config = self.compressor. \
convert_config_to_surrogate_space(obs.config)
X_surrogate.append(surrogate_config.get_array())
return np.array(X_surrogate)
def update_compression(self, history):
updated = self.compressor.update_compression(history)
if updated:
# compressor.update_compression already updated the spaces
# Rebuild surrogate model with new space dimensions
self.surrogate = build_my_surrogate(
# use surrogate_space here
config_space=self.compressor.surrogate_space,
# transform_source_data
transfer_learning_history= \
self.compressor.transform_source_data(
self.source_hpo_data
),
...
)
self.acq_optimizer = InterleavedLocalAndRandomSearch(
acquisition_function=self.acq_func,
# use sample_space here
config_space=self.compressor.sample_space,
...
)
X_surrogate = self._get_surrogate_config_array()
Y = self.history.get_objectives()
self.surrogate.train(X_surrogate, Y)
self.acq_func.update(
model=self.surrogate,
eta=self.history.get_incumbent_value(),
num_data=len(self.history)
)
return True
return False
from dimensio import AdaptiveDimensionStep, Compressor
from dimensio.steps.dimension import SHAPImportanceCalculator
from dimensio.core.update import PeriodicUpdateStrategy
# Create adaptive dimension selection step
step = AdaptiveDimensionStep(
importance_calculator=SHAPImportanceCalculator(),
update_strategy=PeriodicUpdateStrategy(period=10), # Check every 10 iterations
initial_topk=30,
reduction_ratio=0.2, # Reduce by 20% each time
min_dimensions=5,
max_dimensions=50
)
compressor = Compressor(config_space=config_space, steps=[step])
# Then mount to advisor
# Auto-update in optimization loop
for iteration in range(max_iterations):
# ... optimization logic
# Periodically check if update needed
updated = self.advisor.update_compression(history)
if updated:
print(f"Compression strategy updated (iteration {iteration})")
# Update sampling strategy
sampling_strategy = compressor.get_sampling_strategy()
Save and Analyze Compression Info
# 1. Enable compression info saving
compressor = Compressor(
config_space=config_space,
steps=steps,
save_compression_info=True,
output_dir='./results/compression'
)
# 2. Perform compression
compressor.compress_space()
# 3. Get compression summary
summary = compressor.get_compression_summary()
print(f"Original dimensions: {summary['original_dimensions']}")
print(f"Compressed dimensions: {summary['surrogate_dimensions']}")
print(f"Compression ratio: {summary['surrogate_compression_ratio']:.2%}")
# 4. View saved detailed info
# ./results/compression/compression_initial_compression_*.json
# ./results/compression/compression_history.json
# 5. Visualize
from dimensio.viz import visualize_compression_details
visualize_compression_details(compressor, save_dir='./results/viz')
Dependencies
- numpy >= 1.19.0
- pandas >= 1.2.0
- scikit-learn >= 0.24.0
- ConfigSpace >= 0.6.0
- shap >= 0.41.0
- openbox >= 0.8.0
- matplotlib >= 3.3.0
- seaborn >= 0.11.0
License
MIT License - see LICENSE file for details
Contributing
Issues and Pull Requests are welcome!
Author
Lingching Tung - lingchingtung@stu.pku.edu.cn
Changelog
0.2.1 (2025-11-17)
Fixed
- Resolved back-projection bug to keep high/low dimensional mappings consistent
0.2.0 (2025-11-15)
Added
- Enhanced compression visualization coverage
- Added visualization tracking functionality
- Added documentation (README.md)
- Added example code directory (examples/)
- Quick start example
- Adaptive strategy example
- Multi-source/single-source data example
- Comprehensive examples
Fixed
- Fixed bug with duplicate names in utility module (logger => _logger)
0.1.0 (2025-11-13)
Added
- 🎉 Initial release of Dimensio
- Core compressor class Compressor
- Compression pipeline CompressionPipeline
- Three major compression strategy categories
- Flexible sampling strategies
- Filling strategies
- Standard logging system (based on Python logging)
- Convenience function get_compressor()
- Optimization progress tracking
- Multiple update strategies (periodic, stagnation detection, improvement detection, etc.)
Citation
If you use this project in your research, please cite:
@software{dimensio2025,
author = {Lingching Tung},
title = {Dimensio: Configuration Space Compression for Bayesian Optimization},
year = {2025},
url = {https://github.com/Elubrazione/dimensio}
}
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file dimensio-0.2.1.tar.gz.
File metadata
- Download URL: dimensio-0.2.1.tar.gz
- Upload date:
- Size: 74.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c33bb4c875f7212f975603e3d03e79b6e1ea9de8af8a586f792cf84a7035347a |
| MD5 | 2f39f7ac4a1b024450d53ce2274c9ab9 |
| BLAKE2b-256 | 5907081940f1ec5cf1908521256ed287055e5dc5d4641f8c2e80537b6465fd63 |
File details
Details for the file dimensio-0.2.1-py3-none-any.whl.
File metadata
- Download URL: dimensio-0.2.1-py3-none-any.whl
- Upload date:
- Size: 65.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 073b51908ff0cfb811a7977973f025a2db4fdae3caa00df2961dd7a6275be0ca |
| MD5 | 26050528bf5bb1cea556da90975a6e4f |
| BLAKE2b-256 | 8f99848f7ec8d098fbd13d3c410f5109be9491096bc96ffa9b2d72bd48aea15a |