
Project description

Exemplar Sample Selection Framework

A flexible framework for experimenting with and evaluating different sample selection strategies. This framework allows you to:

  • Compare different selection strategies
  • Evaluate using multiple metrics
  • Work with various datasets
  • Extend with custom strategies and metrics

Installation

  1. Clone the repository:

git clone https://github.com/yourusername/exemplar-sample-selection.git
cd exemplar-sample-selection

  2. Install dependencies using Poetry:

# Install Poetry if you haven't already:
# curl -sSL https://install.python-poetry.org | python3 -

# Install dependencies and create virtual environment
poetry install

  3. Activate the virtual environment:

poetry shell

Quick Start

Run the example experiment:

python examples/run_experiment.py

This will:

  1. Load the IMDB dataset
  2. Extract features using a sentence transformer
  3. Run random selection (baseline strategy)
  4. Evaluate using coverage metrics
  5. Save results to outputs/imdb_random/
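The random baseline in step 3 reduces to uniform sampling without replacement. A minimal sketch of that idea (the function name and seed handling here are illustrative, not the framework's actual API):

```python
import random

def random_select(n_total, n_samples, seed=0):
    # Draw n_samples distinct indices uniformly at random;
    # a fixed seed keeps the selection reproducible across runs.
    rng = random.Random(seed)
    return rng.sample(range(n_total), k=min(n_samples, n_total))

indices = random_select(n_total=25000, n_samples=100, seed=42)
print(len(indices))              # 100
print(len(set(indices)) == 100)  # True: no duplicate indices
```

Because the seed is fixed, repeated runs select the same samples, which is what makes a random baseline comparable across experiments.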

Framework Structure

exemplar-sample-selection/
├── src/
│   ├── data/              # Dataset handling
│   ├── selection/         # Selection strategies
│   ├── metrics/           # Evaluation metrics
│   ├── experiments/       # Experiment management
│   └── utils/             # Utilities
├── tests/                 # Unit tests
├── configs/               # Experiment configs
├── examples/              # Example scripts
└── docs/                  # Documentation

Core Components

1. Dataset Management

  • Standardized dataset interface
  • Built-in support for text datasets
  • Feature extraction and caching
  • Easy extension to other data types
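Feature caching can be as simple as keying extracted features by dataset name and split, so repeated experiments skip the expensive extraction step. A sketch of that idea (the helper name, cache layout, and JSON storage are assumptions for illustration; the framework's actual cache format is not shown in this README):

```python
import hashlib
import json
import pathlib
import tempfile

def cached_features(name, split, compute, cache_dir):
    # Key the cache file by a hash of the dataset identity
    key = hashlib.sha256(f"{name}:{split}".encode()).hexdigest()[:16]
    path = pathlib.Path(cache_dir) / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())  # cache hit: skip extraction
    features = compute()                     # cache miss: extract and store
    path.write_text(json.dumps(features))
    return features

cache = tempfile.mkdtemp()
calls = []

def extract():
    calls.append(1)  # count how often extraction actually runs
    return [[0.1, 0.2], [0.3, 0.4]]

first = cached_features("imdb", "train", extract, cache)
second = cached_features("imdb", "train", extract, cache)
print(first == second, len(calls))  # True 1  (extractor ran only once)
```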

2. Selection Strategies

  • Base strategy interface
  • Random selection baseline
  • Support for both supervised and unsupervised selection
  • Easy addition of new strategies

3. Evaluation Metrics

  • Coverage metrics
  • Distribution matching
  • Performance metrics
  • Extensible metric system
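A coverage metric typically measures how close every point in the full dataset is to its nearest selected point. A minimal sketch of one such metric (the function name and exact formula are illustrative, not necessarily what this framework computes):

```python
import math

def coverage_distance(selected, full):
    # Mean distance from each point in the full set to its nearest
    # selected point; 0.0 means the selection covers the data perfectly.
    return sum(
        min(math.dist(p, s) for s in selected) for p in full
    ) / len(full)

full = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(coverage_distance(full, full))          # 0.0
print(coverage_distance([[0.0, 0.0]], full))  # > 0: one point covers less
```

Lower is better here; distribution-matching metrics instead compare summary statistics (e.g. feature means) between the selected subset and the full set.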

4. Experiment Management

  • Configuration-based setup
  • Automated logging
  • Result tracking
  • Reproducible experiments

Adding New Components

Adding a New Selection Strategy

  1. Create a new file in src/selection/:

from .base import SelectionStrategy

class MyStrategy(SelectionStrategy):
    def select(self, features, labels=None, n_samples=100):
        # Implement your selection logic here and return the indices
        # of the chosen samples, for example:
        selected_indices = list(range(min(n_samples, len(features))))
        return selected_indices

  2. Register the new strategy in src/selection/__init__.py
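The registration step makes the strategy addressable by the name used in experiment configs. This README does not show the actual mechanism; a common pattern is a name-to-class registry, sketched here with hypothetical names:

```python
# Hypothetical registry pattern for src/selection/__init__.py; the
# framework's actual registration mechanism may differ.
STRATEGIES = {}

def register(name):
    def wrap(cls):
        STRATEGIES[name] = cls  # map config name -> strategy class
        return cls
    return wrap

@register("my_strategy")
class MyStrategy:
    def select(self, features, labels=None, n_samples=100):
        return list(range(min(n_samples, len(features))))

# An experiment runner can now look the strategy up by its config name
strategy = STRATEGIES["my_strategy"]()
print(strategy.select(["a", "b", "c"], n_samples=2))  # [0, 1]
```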

Adding a New Metric

  1. Create a new file in src/metrics/:

from .base import Metric

class MyMetric(Metric):
    def compute(self, selected_features, full_features,
                selected_labels=None, full_labels=None):
        # Implement your metric computation here and return a dict
        # of named values, for example:
        value = len(selected_features) / len(full_features)
        return {'my_metric': value}

  2. Register the new metric in src/metrics/__init__.py

Running Experiments

1. Create Configuration

from src.experiments import ExperimentConfig
from src.experiments.config import DatasetConfig, SelectionConfig, MetricConfig

config = ExperimentConfig(
    name="My Experiment",
    dataset=DatasetConfig(
        name="dataset_name",
        split="train"
    ),
    selection=SelectionConfig(
        name="strategy_name",
        params={"param1": "value1"},
        n_samples=1000
    ),
    metrics=[
        MetricConfig(
            name="metric_name",
            params={"param1": "value1"}
        )
    ]
)

2. Run Experiment

from src.experiments import ExperimentRunner

runner = ExperimentRunner(config)
results = runner.run()

3. Examine Results

Results are saved in the output directory:

  • config.json: Experiment configuration
  • results.json: Detailed results
  • summary.txt: Human-readable summary
  • experiment.log: Execution log
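The JSON outputs can be inspected with the standard library. The payload below is illustrative only, since the actual keys in results.json depend on which metrics were configured:

```python
import json
import pathlib
import tempfile

out = pathlib.Path(tempfile.mkdtemp())
# Illustrative payload; real keys depend on the configured metrics
(out / "results.json").write_text(json.dumps({"metrics": {"coverage": 0.12}}))

results = json.loads((out / "results.json").read_text())
print(results["metrics"]["coverage"])  # 0.12
```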

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

License

MIT License - see LICENSE file for details

Download files

Download the file for your platform.

Source Distribution

coreset-0.0.1.tar.gz (12.1 kB)

Uploaded Source

Built Distribution


coreset-0.0.1-py3-none-any.whl (16.5 kB)

Uploaded Python 3

File details

Details for the file coreset-0.0.1.tar.gz.

File metadata

  • Download URL: coreset-0.0.1.tar.gz
  • Size: 12.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.9.9

File hashes

Hashes for coreset-0.0.1.tar.gz:

  • SHA256: 6e803a1baf065314ea930013f13cc2836a8fba3badaa5e2d2ee30ebe500f6261
  • MD5: a70a35c451ab214fd136a38b5dab3537
  • BLAKE2b-256: 858b186c7ab171c013885d0892948f439355da2641cb431a4ca4e929388ac6cc


File details

Details for the file coreset-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: coreset-0.0.1-py3-none-any.whl
  • Size: 16.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.9.9

File hashes

Hashes for coreset-0.0.1-py3-none-any.whl:

  • SHA256: 83d4cf8f805545c443c02eb0c9701c77156e432730e547ff7397680beb9ebe5a
  • MD5: ed6f2324e8588bcf7c1b23c36865c616
  • BLAKE2b-256: 5fb2b4f554ccb5e5fe3b39a34ad895e4aa0d1fc88065e2d7e2005c5a80daf79e

