
Project description

Exemplar Sample Selection Framework

A flexible framework for experimenting with and evaluating different sample selection strategies. This framework allows you to:

  • Compare different selection strategies
  • Evaluate using multiple metrics
  • Work with various datasets
  • Extend with custom strategies and metrics

Installation

  1. Clone the repository:

git clone https://github.com/yourusername/exemplar-sample-selection.git
cd exemplar-sample-selection

  2. Install dependencies using Poetry:

# Install Poetry if you haven't already:
# curl -sSL https://install.python-poetry.org | python3 -

# Install dependencies and create virtual environment
poetry install

  3. Activate the virtual environment:

poetry shell

Quick Start

Run the example experiment:

python examples/run_experiment.py

This will:

  1. Load the IMDB dataset
  2. Extract features using a sentence transformer
  3. Run random selection (baseline strategy)
  4. Evaluate using coverage metrics
  5. Save results to outputs/imdb_random/
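The random baseline in step 3 reduces to uniform sampling without replacement. A minimal sketch of that idea (the function name and seed handling here are illustrative, not the framework's actual API):

```python
import random

def random_select(n_total, n_samples, seed=0):
    # Draw n_samples distinct indices uniformly at random;
    # a fixed seed keeps the selection reproducible across runs.
    rng = random.Random(seed)
    return rng.sample(range(n_total), k=min(n_samples, n_total))

indices = random_select(n_total=25000, n_samples=100, seed=42)
print(len(indices))              # 100
print(len(set(indices)) == 100)  # True: no duplicate indices
```

Because the seed is fixed, repeated runs select the same samples, which is what makes a random baseline comparable across experiments.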

Framework Structure

exemplar-sample-selection/
├── src/
│   ├── data/              # Dataset handling
│   ├── selection/         # Selection strategies
│   ├── metrics/           # Evaluation metrics
│   ├── experiments/       # Experiment management
│   └── utils/             # Utilities
├── tests/                 # Unit tests
├── configs/               # Experiment configs
├── examples/              # Example scripts
└── docs/                  # Documentation

Core Components

1. Dataset Management

  • Standardized dataset interface
  • Built-in support for text datasets
  • Feature extraction and caching
  • Easy extension to other data types
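Feature caching can be as simple as keying extracted features by dataset name and split, so repeated experiments skip the expensive extraction step. A sketch of that idea (the helper name, cache layout, and JSON storage are assumptions for illustration; the framework's actual cache format is not shown in this README):

```python
import hashlib
import json
import pathlib
import tempfile

def cached_features(name, split, compute, cache_dir):
    # Key the cache file by a hash of the dataset identity
    key = hashlib.sha256(f"{name}:{split}".encode()).hexdigest()[:16]
    path = pathlib.Path(cache_dir) / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())  # cache hit: skip extraction
    features = compute()                     # cache miss: extract and store
    path.write_text(json.dumps(features))
    return features

cache = tempfile.mkdtemp()
calls = []

def extract():
    calls.append(1)  # count how often extraction actually runs
    return [[0.1, 0.2], [0.3, 0.4]]

first = cached_features("imdb", "train", extract, cache)
second = cached_features("imdb", "train", extract, cache)
print(first == second, len(calls))  # True 1  (extractor ran only once)
```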

2. Selection Strategies

  • Base strategy interface
  • Random selection baseline
  • Support for both supervised and unsupervised selection
  • Easy addition of new strategies

3. Evaluation Metrics

  • Coverage metrics
  • Distribution matching
  • Performance metrics
  • Extensible metric system
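A coverage metric typically measures how close every point in the full dataset is to its nearest selected point. A minimal sketch of one such metric (the function name and exact formula are illustrative, not necessarily what this framework computes):

```python
import math

def coverage_distance(selected, full):
    # Mean distance from each point in the full set to its nearest
    # selected point; 0.0 means the selection covers the data perfectly.
    return sum(
        min(math.dist(p, s) for s in selected) for p in full
    ) / len(full)

full = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(coverage_distance(full, full))          # 0.0
print(coverage_distance([[0.0, 0.0]], full))  # > 0: one point covers less
```

Lower is better here; distribution-matching metrics instead compare summary statistics (e.g. feature means) between the selected subset and the full set.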

4. Experiment Management

  • Configuration-based setup
  • Automated logging
  • Result tracking
  • Reproducible experiments

Adding New Components

Adding a New Selection Strategy

  1. Create a new file in src/selection/:

from .base import SelectionStrategy

class MyStrategy(SelectionStrategy):
    def select(self, features, labels=None, n_samples=100):
        # Implement your selection logic here and return the indices
        # of the chosen samples, for example:
        selected_indices = list(range(min(n_samples, len(features))))
        return selected_indices

  2. Register the new strategy in src/selection/__init__.py
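The registration step makes the strategy addressable by the name used in experiment configs. This README does not show the actual mechanism; a common pattern is a name-to-class registry, sketched here with hypothetical names:

```python
# Hypothetical registry pattern for src/selection/__init__.py; the
# framework's actual registration mechanism may differ.
STRATEGIES = {}

def register(name):
    def wrap(cls):
        STRATEGIES[name] = cls  # map config name -> strategy class
        return cls
    return wrap

@register("my_strategy")
class MyStrategy:
    def select(self, features, labels=None, n_samples=100):
        return list(range(min(n_samples, len(features))))

# An experiment runner can now look the strategy up by its config name
strategy = STRATEGIES["my_strategy"]()
print(strategy.select(["a", "b", "c"], n_samples=2))  # [0, 1]
```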

Adding a New Metric

  1. Create a new file in src/metrics/:

from .base import Metric

class MyMetric(Metric):
    def compute(self, selected_features, full_features,
                selected_labels=None, full_labels=None):
        # Implement your metric computation here and return a dict
        # of named values, for example:
        value = len(selected_features) / len(full_features)
        return {'my_metric': value}

  2. Register the new metric in src/metrics/__init__.py

Running Experiments

1. Create Configuration

from src.experiments import ExperimentConfig
from src.experiments.config import DatasetConfig, SelectionConfig, MetricConfig

config = ExperimentConfig(
    name="My Experiment",
    dataset=DatasetConfig(
        name="dataset_name",
        split="train"
    ),
    selection=SelectionConfig(
        name="strategy_name",
        params={"param1": "value1"},
        n_samples=1000
    ),
    metrics=[
        MetricConfig(
            name="metric_name",
            params={"param1": "value1"}
        )
    ]
)

2. Run Experiment

from src.experiments import ExperimentRunner

runner = ExperimentRunner(config)
results = runner.run()

3. Examine Results

Results are saved in the output directory:

  • config.json: Experiment configuration
  • results.json: Detailed results
  • summary.txt: Human-readable summary
  • experiment.log: Execution log
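The JSON outputs can be inspected with the standard library. The payload below is illustrative only, since the actual keys in results.json depend on which metrics were configured:

```python
import json
import pathlib
import tempfile

out = pathlib.Path(tempfile.mkdtemp())
# Illustrative payload; real keys depend on the configured metrics
(out / "results.json").write_text(json.dumps({"metrics": {"coverage": 0.12}}))

results = json.loads((out / "results.json").read_text())
print(results["metrics"]["coverage"])  # 0.12
```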

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

License

MIT License - see LICENSE file for details

Download files

Download the file for your platform.

Source Distribution

coreset-0.0.1.tar.gz (12.1 kB)

Uploaded Source

Built Distribution


coreset-0.0.1-py3-none-any.whl (16.5 kB)

Uploaded Python 3

File details

Details for the file coreset-0.0.1.tar.gz.

File metadata

  • Download URL: coreset-0.0.1.tar.gz
  • Size: 12.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.9.9

File hashes

Hashes for coreset-0.0.1.tar.gz:

  • SHA256: 6e803a1baf065314ea930013f13cc2836a8fba3badaa5e2d2ee30ebe500f6261
  • MD5: a70a35c451ab214fd136a38b5dab3537
  • BLAKE2b-256: 858b186c7ab171c013885d0892948f439355da2641cb431a4ca4e929388ac6cc


File details

Details for the file coreset-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: coreset-0.0.1-py3-none-any.whl
  • Size: 16.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.9.9

File hashes

Hashes for coreset-0.0.1-py3-none-any.whl:

  • SHA256: 83d4cf8f805545c443c02eb0c9701c77156e432730e547ff7397680beb9ebe5a
  • MD5: ed6f2324e8588bcf7c1b23c36865c616
  • BLAKE2b-256: 5fb2b4f554ccb5e5fe3b39a34ad895e4aa0d1fc88065e2d7e2005c5a80daf79e

