Sample-Importance-Aware Selection (SIAS) - Continual learning and coreset selection algorithms
SAGE SIAS (Sample-Importance-Aware Selection)
Independent package for sample-importance-aware selection, continual learning, and coreset algorithms
🎯 Overview
sage-sias provides Sample-Importance-Aware Selection algorithms for:
- Continual Learning: Efficient sample selection for continual/lifelong learning scenarios
- Coreset Selection: Select representative subsets from large datasets
- Active Learning: Importance-based data selection strategies
- Tool/Trajectory Curation: Select important samples for agent training
📦 Installation
# Basic installation
pip install isage-sias
# With PyTorch support (quotes keep shells such as zsh from expanding the brackets)
pip install "isage-sias[torch]"
# Development installation
pip install "isage-sias[dev]"
🚀 Quick Start
Continual Learning
from sage_sias import ContinualLearner
# Create continual learner
learner = ContinualLearner(
    buffer_size=1000,
    selection_strategy="importance"
)
# Add samples
for data, label in stream:
    learner.add_sample(data, label)
# Get selected samples
important_samples = learner.get_buffer()
Coreset Selection
from sage_sias import CoresetSelector
# Create coreset selector
selector = CoresetSelector(
    target_size=100,
    method="kmeans++"
)
# Select representative samples
coreset = selector.select(dataset, features)
📚 Key Components
1. Continual Learner (continual_learner.py)
Manages sample selection for continual learning:
- Buffer management with importance-based eviction
- Multiple selection strategies (random, importance, diversity)
- Support for experience replay
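To make "importance-based eviction" concrete, here is a minimal sketch of a fixed-capacity buffer that drops its least important sample when a more important one arrives. It is an illustration only and not the code in continual_learner.py: the class name ImportanceBuffer, its methods, and the assumption that the caller supplies a numeric importance score are all hypothetical.

```python
import heapq
import itertools
from typing import Any, List, Tuple


class ImportanceBuffer:
    """Fixed-capacity buffer that evicts the least important sample first.

    Hypothetical illustration; not the sage_sias ContinualLearner class.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # Min-heap of (importance, insertion_order, sample): the root is
        # always the least important sample currently stored.
        self._heap: List[Tuple[float, int, Any]] = []
        # Unique counter breaks ties so samples themselves are never compared.
        self._counter = itertools.count()

    def add(self, sample: Any, importance: float) -> bool:
        """Insert a sample; return True if it was kept in the buffer."""
        entry = (importance, next(self._counter), sample)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
            return True
        if importance <= self._heap[0][0]:
            return False  # not more important than the current minimum
        heapq.heapreplace(self._heap, entry)  # evict the current minimum
        return True

    def samples(self) -> List[Any]:
        """Return stored samples, most important first."""
        ordered = sorted(self._heap, key=lambda e: (-e[0], e[1]))
        return [sample for _, _, sample in ordered]


buf = ImportanceBuffer(capacity=2)
buf.add("a", 0.1)
buf.add("b", 0.9)
buf.add("c", 0.5)   # evicts "a", the least important entry
buf.add("d", 0.05)  # rejected: lower than everything in the buffer
print(buf.samples())
```

A min-heap keeps both insertion and eviction at O(log n), which matters when the buffer is fed from a long stream.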
2. Coreset Selector (coreset_selector.py)
Selects representative subsets:
- K-means++ based selection
- Diversity-aware sampling
- Importance scoring
- Support for large-scale datasets
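As a rough picture of k-means++ based selection, the sketch below picks subset indices with the standard k-means++ seeding rule: the first point is drawn uniformly, and each later point is drawn with probability proportional to its squared distance from the nearest point already chosen. This is not the implementation in coreset_selector.py; the function name kmeanspp_select and its parameters are assumptions made for the example, and it uses only the standard library.

```python
import random
from typing import List, Optional, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_select(
    features: Sequence[Sequence[float]],
    target_size: int,
    seed: Optional[int] = None,
) -> List[int]:
    """Return indices of target_size points chosen by k-means++ seeding.

    Hypothetical helper for illustration; not part of the sage_sias API.
    """
    n = len(features)
    if not 0 < target_size <= n:
        raise ValueError("target_size must be between 1 and len(features)")
    rng = random.Random(seed)

    # The first pick is uniform at random.
    chosen = [rng.randrange(n)]
    # d2[i] = squared distance from point i to its nearest chosen point.
    d2 = [squared_distance(f, features[chosen[0]]) for f in features]

    while len(chosen) < target_size:
        # Only points with positive distance can be drawn, so an already
        # chosen point (distance 0) is never selected twice.
        candidates = [i for i in range(n) if d2[i] > 0.0]
        if candidates:
            weights = [d2[i] for i in candidates]
            pick = rng.choices(candidates, weights=weights)[0]
        else:
            # Every remaining point duplicates a chosen one; take any of them.
            taken = set(chosen)
            pick = next(i for i in range(n) if i not in taken)
        chosen.append(pick)
        # Refresh nearest-chosen distances against the new pick.
        d2 = [min(d, squared_distance(f, features[pick]))
              for d, f in zip(d2, features)]
    return chosen


points = [[0.0, 0.0], [0.0, 0.1], [10.0, 10.0], [10.0, 10.1]]
print(kmeanspp_select(points, target_size=2, seed=0))
```

Because distant points receive more weight, the selected subset tends to spread across clusters rather than concentrate in one dense region.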
3. Types (types.py)
Common data types and protocols:
- Sample representation
- Importance scoring interfaces
- Selection strategies
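The actual definitions in types.py are not shown here, so the following is only a sketch of how a sample record and an importance-scoring interface could be expressed with a dataclass and a typing.Protocol. The names Sample, ImportanceScorer, LossScorer, and select_top_k are hypothetical and may not match the package.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Protocol, Sequence, runtime_checkable


@dataclass
class Sample:
    """Hypothetical sample record: payload, optional label, free-form metadata."""
    data: Any
    label: Any = None
    metadata: Dict[str, Any] = field(default_factory=dict)


@runtime_checkable
class ImportanceScorer(Protocol):
    """Anything with a score(sample) -> float method satisfies this protocol."""

    def score(self, sample: Sample) -> float:
        ...


class LossScorer:
    """Example scorer: treats a precomputed loss in metadata as importance."""

    def score(self, sample: Sample) -> float:
        return float(sample.metadata.get("loss", 0.0))


def select_top_k(
    samples: Sequence[Sample], scorer: ImportanceScorer, k: int
) -> List[Sample]:
    """Return the k highest-scoring samples, highest first."""
    return sorted(samples, key=scorer.score, reverse=True)[:k]


samples = [
    Sample("a", metadata={"loss": 0.2}),
    Sample("b", metadata={"loss": 1.5}),
    Sample("c", metadata={"loss": 0.7}),
]
top = select_top_k(samples, LossScorer(), k=2)
print([s.data for s in top])
```

Using a structural protocol means a scorer does not need to inherit from a base class; any object with a compatible score method can be passed to a selection routine.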
🔧 Architecture
sage_sias/
├── continual_learner.py # Continual learning with buffer management
├── coreset_selector.py # Coreset selection algorithms
├── types.py # Common types and protocols
└── __init__.py # Public API exports
🎓 Use Cases
- Agent Training: Select important trajectories for fine-tuning
- Data Pruning: Reduce dataset size while maintaining performance
- Active Learning: Query most informative samples
- Memory Management: Maintain representative samples in limited buffers
- Transfer Learning: Select relevant samples for adaptation
🔗 Integration with SAGE
This package is part of the SAGE ecosystem but can be used independently:
# Standalone usage
from sage_sias import ContinualLearner, CoresetSelector
# With SAGE agentic (optional)
from sage_agentic import AgentTrainer
from sage_sias import CoresetSelector
trainer = AgentTrainer()
selector = CoresetSelector(target_size=100)
important_trajectories = selector.select(all_trajectories)
trainer.train(important_trajectories)
📖 Documentation
- Repository: https://github.com/intellistream/sage-sias
- SAGE Documentation: https://intellistream.github.io/SAGE-Pub/
- Issues: https://github.com/intellistream/sage-sias/issues
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
📄 License
MIT License - see LICENSE file for details.
🙏 Acknowledgments
Originally part of the SAGE framework, now maintained as an independent package for broader community use.
📧 Contact
- Team: IntelliStream Team
- Email: shuhao_zhang@hust.edu.cn
- GitHub: https://github.com/intellistream
File details
Details for the file isage_sias-0.1.0-cp311-none-any.whl.
File metadata
- Download URL: isage_sias-0.1.0-cp311-none-any.whl
- Upload date:
- Size: 19.2 kB
- Tags: CPython 3.11
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d2775343d8da667c3bf2a19a6915058db1156fa6998910d0dc273b87a6e806fe |
| MD5 | 9593b272043c54469af54b6cc8ca2001 |
| BLAKE2b-256 | b513a66fc46315531d758a2b2c902909e4001e6e0e5cbba3f634dde2d51a5695 |