Exemplar Sample Selection Framework
A flexible framework for experimenting with and evaluating different sample selection strategies. This framework allows you to:
- Compare different selection strategies
- Evaluate using multiple metrics
- Work with various datasets
- Extend with custom strategies and metrics
Installation
- Clone the repository:
git clone https://github.com/yourusername/exemplar-sample-selection.git
cd exemplar-sample-selection
- Install dependencies using Poetry:
# Install Poetry if you haven't already:
# curl -sSL https://install.python-poetry.org | python3 -
# Install dependencies and create virtual environment
poetry install
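- Activate the virtual environment:
poetry shell
# Note: in Poetry 2.0 and later, `poetry shell` is provided by the poetry-plugin-shell plugin;
# `poetry run <command>` works without it.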
Quick Start
Run the example experiment:
python examples/run_experiment.py
This will:
- Load the IMDB dataset
- Extract features using a sentence transformer
- Run random selection (baseline strategy)
- Evaluate using coverage metrics
- Save results to outputs/imdb_random/
Framework Structure
exemplar-sample-selection/
├── src/
│   ├── data/          # Dataset handling
│   ├── selection/     # Selection strategies
│   ├── metrics/       # Evaluation metrics
│   ├── experiments/   # Experiment management
│   └── utils/         # Utilities
├── tests/             # Unit tests
├── configs/           # Experiment configs
├── examples/          # Example scripts
└── docs/              # Documentation
Core Components
1. Dataset Management
- Standardized dataset interface
- Built-in support for text datasets
- Feature extraction and caching
- Easy extension to other data types
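The feature extraction and caching idea can be sketched as follows. This is a minimal illustration, not the framework's implementation: `cached_features`, `toy_extractor`, and the JSON cache layout are all hypothetical names and choices, and a real setup would likely call a sentence-transformer encoder instead of the toy extractor.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cached_features(texts, extract_fn, cache_dir):
    """Return features for `texts`, computing them only on a cache miss."""
    # Key the cache on the exact input texts so a changed dataset is recomputed.
    key = hashlib.sha256(json.dumps(texts).encode("utf-8")).hexdigest()[:16]
    cache_file = Path(cache_dir) / f"features_{key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    features = [extract_fn(text) for text in texts]
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(features))
    return features


def toy_extractor(text):
    # Stand-in for a real encoder: text length and vowel count.
    return [float(len(text)), float(sum(ch in "aeiou" for ch in text.lower()))]


calls = []  # records every text that actually reaches the extractor


def counting_extractor(text):
    calls.append(text)
    return toy_extractor(text)


with tempfile.TemporaryDirectory() as tmp:
    texts = ["great film", "dull plot"]
    first = cached_features(texts, counting_extractor, tmp)   # computes and stores
    second = cached_features(texts, counting_extractor, tmp)  # served from cache
```

After both calls, `calls` holds only two entries because the second call never reaches the extractor.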
2. Selection Strategies
- Base strategy interface
- Random selection baseline
- Support for both supervised and unsupervised selection
- Easy addition of new strategies
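To make the base interface and the random baseline concrete, here is a small sketch. The `select` signature follows the one shown under "Adding a New Selection Strategy"; everything else (the abstract base class, the `seed` argument, the class name `RandomSelection`) is an assumption for illustration and may differ from the framework's own classes.

```python
import random
from abc import ABC, abstractmethod


class SelectionStrategy(ABC):
    """Hypothetical stand-in for the framework's base strategy interface."""

    @abstractmethod
    def select(self, features, labels=None, n_samples=100):
        """Return the indices of the chosen samples."""


class RandomSelection(SelectionStrategy):
    """Uniform random baseline; ignores labels, so it is unsupervised."""

    def __init__(self, seed=0):
        # A private RNG keeps the selection reproducible and avoids global state.
        self.rng = random.Random(seed)

    def select(self, features, labels=None, n_samples=100):
        n_samples = min(n_samples, len(features))
        return sorted(self.rng.sample(range(len(features)), n_samples))


features = [[float(i), float(i % 3)] for i in range(20)]
picked = RandomSelection(seed=42).select(features, n_samples=5)
picked_again = RandomSelection(seed=42).select(features, n_samples=5)
```

Because `labels` is optional, the same interface can serve supervised strategies that use labels and unsupervised ones that do not.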
3. Evaluation Metrics
- Coverage metrics
- Distribution matching
- Performance metrics
- Extensible metric system
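The source does not define its coverage metrics, so the following is only one plausible reading: measure how far each sample in the full dataset is from its nearest selected sample. The function name `coverage_radius` and the returned keys are hypothetical.

```python
import math


def coverage_radius(selected_features, full_features):
    """Mean and max distance from each sample to its nearest selected sample."""
    nearest = [
        min(math.dist(point, chosen) for chosen in selected_features)
        for point in full_features
    ]
    return {
        "mean_nearest_distance": sum(nearest) / len(nearest),
        "max_nearest_distance": max(nearest),
    }


full = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [4.0, 4.0]]
selected = [full[0], full[3]]  # pretend a strategy picked samples 0 and 3
scores = coverage_radius(selected, full)
```

Lower values mean the selected subset sits closer to the rest of the data. For large datasets this brute-force loop would normally be replaced by a vectorized or approximate nearest-neighbor search.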
4. Experiment Management
- Configuration-based setup
- Automated logging
- Result tracking
- Reproducible experiments
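One common way to get configuration-based, reproducible runs is to keep every source of randomness behind a seed stored in the configuration, and to save that configuration next to the results. The sketch below assumes this approach; `ToyConfig` and `run_toy_experiment` are hypothetical and are not the framework's `ExperimentConfig` or `ExperimentRunner`.

```python
import json
import random
from dataclasses import asdict, dataclass


@dataclass
class ToyConfig:
    name: str
    n_samples: int
    seed: int


def run_toy_experiment(config, pool_size=50):
    # A local RNG seeded from the config makes the run repeatable.
    rng = random.Random(config.seed)
    indices = sorted(rng.sample(range(pool_size), config.n_samples))
    # Storing the config with the results records how they were produced.
    return {"config": asdict(config), "selected_indices": indices}


config = ToyConfig(name="toy_random", n_samples=5, seed=123)
first_run = run_toy_experiment(config)
second_run = run_toy_experiment(config)
serialized = json.dumps(first_run, indent=2)
```

Running the same configuration twice yields identical results, and the serialized record can be written to disk for later comparison.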
Adding New Components
Adding a New Selection Strategy
- Create a new file in src/selection/:
from .base import SelectionStrategy

class MyStrategy(SelectionStrategy):
    def select(self, features, labels=None, n_samples=100):
        # Implement your selection logic here. As a placeholder, this
        # returns the first n_samples indices.
        selected_indices = list(range(min(n_samples, len(features))))
        return selected_indices
- Register in src/selection/__init__.py
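As an example of what the selection logic might look like, here is a sketch of greedy k-center selection, which repeatedly picks the sample farthest from everything chosen so far. It is not part of the framework as described; the local `SelectionStrategy` class is a stand-in for `src/selection/base.py` so the snippet runs on its own, and starting from index 0 is an arbitrary choice.

```python
import math


class SelectionStrategy:
    """Stand-in for src/selection/base.py so the sketch runs on its own."""

    def select(self, features, labels=None, n_samples=100):
        raise NotImplementedError


class KCenterGreedy(SelectionStrategy):
    def select(self, features, labels=None, n_samples=100):
        n_samples = min(n_samples, len(features))
        if n_samples == 0:
            return []
        selected = [0]  # arbitrary starting point
        remaining = set(range(1, len(features)))
        # Distance from every sample to its nearest selected sample so far.
        min_dist = [math.dist(f, features[0]) for f in features]
        while len(selected) < n_samples:
            # Pick the unselected sample that is currently covered worst.
            farthest = max(sorted(remaining), key=min_dist.__getitem__)
            remaining.discard(farthest)
            selected.append(farthest)
            min_dist = [
                min(d, math.dist(f, features[farthest]))
                for d, f in zip(min_dist, features)
            ]
        return selected


features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]]
selected_indices = KCenterGreedy().select(features, n_samples=3)
```

On this toy data the strategy spreads its picks across the three clusters instead of taking near-duplicates.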
Adding a New Metric
- Create a new file in src/metrics/:
from .base import Metric

class MyMetric(Metric):
    def compute(self, selected_features, full_features,
                selected_labels=None, full_labels=None):
        # Implement your metric computation here. As a placeholder, this
        # reports the fraction of the dataset that was selected.
        value = len(selected_features) / len(full_features)
        return {'my_metric': value}
- Register in src/metrics/__init__.py
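A custom metric that uses the optional label arguments could look like the sketch below, which compares the label distribution of the selected subset with that of the full dataset using total variation distance. The `compute` signature matches the one above; the local `Metric` class is a stand-in for `src/metrics/base.py`, and the metric name and key are hypothetical.

```python
from collections import Counter


class Metric:
    """Stand-in for src/metrics/base.py so the sketch runs on its own."""

    def compute(self, selected_features, full_features,
                selected_labels=None, full_labels=None):
        raise NotImplementedError


class LabelDistributionGap(Metric):
    def compute(self, selected_features, full_features,
                selected_labels=None, full_labels=None):
        # Without labels on both sides the metric is undefined.
        if not selected_labels or not full_labels:
            return {"label_tv_distance": None}
        sel = Counter(selected_labels)
        full = Counter(full_labels)
        # Total variation distance: half the summed absolute frequency gaps.
        tv = 0.5 * sum(
            abs(sel[label] / len(selected_labels) - full[label] / len(full_labels))
            for label in set(sel) | set(full)
        )
        return {"label_tv_distance": tv}


full_labels = ["pos"] * 5 + ["neg"] * 5
selected_labels = ["pos", "pos", "pos", "neg"]
result = LabelDistributionGap().compute(
    selected_features=[[0.0]] * 4,
    full_features=[[0.0]] * 10,
    selected_labels=selected_labels,
    full_labels=full_labels,
)
```

A value of 0 means the subset mirrors the full label distribution, and 1 means the two share no labels.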
Running Experiments
1. Create Configuration
from src.experiments import ExperimentConfig
from src.experiments.config import DatasetConfig, SelectionConfig, MetricConfig

# dataset_name, strategy_name, metric_name, param1, and value1 are
# placeholders; replace them with your own values.
config = ExperimentConfig(
    name="My Experiment",
    dataset=DatasetConfig(
        name="dataset_name",
        split="train"
    ),
    selection=SelectionConfig(
        name="strategy_name",
        params={"param1": value1},
        n_samples=1000
    ),
    metrics=[
        MetricConfig(
            name="metric_name",
            params={"param1": value1}
        )
    ]
)
2. Run Experiment
from src.experiments import ExperimentRunner
runner = ExperimentRunner(config)
results = runner.run()
3. Examine Results
Results are saved in the output directory:
- config.json: Experiment configuration
- results.json: Detailed results
- summary.txt: Human-readable summary
- experiment.log: Execution log
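The JSON files can be inspected with a few lines of standard-library code. The sketch below assumes only the file names listed above; the helper `load_experiment_outputs` is hypothetical, and the contents written to the temporary directory are made up purely to exercise the loader, since the actual layout of results.json is not documented here.

```python
import json
import tempfile
from pathlib import Path


def load_experiment_outputs(output_dir):
    """Load whichever JSON artifacts exist in an experiment output directory."""
    output_dir = Path(output_dir)
    loaded = {}
    for filename in ("config.json", "results.json"):
        path = output_dir / filename
        if path.exists():
            loaded[filename] = json.loads(path.read_text())
    return loaded


with tempfile.TemporaryDirectory() as tmp:
    # Made-up contents, only to demonstrate the loader.
    (Path(tmp) / "results.json").write_text(json.dumps({"my_metric": 0.5}))
    outputs = load_experiment_outputs(tmp)
```

For a real run, pass the experiment's output directory (for example the one used by the Quick Start) instead of the temporary directory.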
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
License
MIT License - see LICENSE file for details