BOLT: Benchmarking Open-world Learning for Text classification

Project description

BOLT-Lab

BOLT-Lab is a self-contained Python package for benchmarking open-world learning (OWL) in text classification. It wraps 19 baseline methods (11 generalized category discovery (GCD) + 8 open-set) via subprocess calls and provides a unified grid experiment runner.


1. Installation

Requirements

  • Linux + NVIDIA GPU
  • Python 3.10
  • NVIDIA driver installed (nvidia-smi works)

Steps (run in order)

  1. Install bolt-lab
pip install bolt_lab
  2. Install PyTorch (for CUDA 12.6, use the cu126 wheel index)
pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu126
  3. Install NVCC (use conda only for this step)
conda install -c nvidia cuda-nvcc -y
  4. Install the remaining Python dependencies
pip install -r requirements.txt
  5. Install flash-attn (installed separately to avoid build failures)
mkdir -p ~/tmp/pip
TMPDIR=~/tmp/pip pip install --no-build-isolation --no-cache-dir flash-attn==2.8.3

Quick self-check

python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"
python -c "from bolt_lab.methods import list_methods; print(list_methods())"
bolt-grid --help

2. Environment Variables

| Variable | Description | Example |
|---|---|---|
| BOLT_DATA_DIR | Path to BOLT datasets | /path/to/bolt/data |
| BOLT_PRETRAINED_MODELS | Path to pretrained models directory | /path/to/pretrained_models |
| BOLT_INTEGRATION | Set to 1 to run integration tests | 1 |

Set them in your shell; the pretrained-models path can also be passed on the command line via --model-dir:

export BOLT_DATA_DIR=/path/to/bolt/data
export BOLT_PRETRAINED_MODELS=/path/to/pretrained_models
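As an illustration of how these variables are consumed, the sketch below resolves both paths from the environment and fails early with a clear message when one is missing. The helper name `resolve_bolt_paths` is hypothetical, not part of bolt_lab's public API:

```python
import os
from pathlib import Path

def resolve_bolt_paths():
    """Illustrative helper (not bolt_lab's API): read the BOLT_* environment
    variables and fail early if one is unset."""
    data_dir = os.environ.get("BOLT_DATA_DIR")
    model_dir = os.environ.get("BOLT_PRETRAINED_MODELS")
    if data_dir is None:
        raise RuntimeError("BOLT_DATA_DIR is not set")
    if model_dir is None:
        raise RuntimeError("BOLT_PRETRAINED_MODELS is not set")
    return Path(data_dir), Path(model_dir)
```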

3. Usage

Initialize workspace

bolt-grid --init-only --output-dir ./bolt_workspace --model-dir /path/to/pretrained_models

This creates the directory structure and copies editable configs to ./bolt_workspace/configs/.

Run experiments

bolt-grid --config grid_gcd.yaml --output-dir ./bolt_workspace --model-dir /path/to/pretrained_models

Arguments

| Argument | Description |
|---|---|
| --config | Grid config YAML. Bare names are resolved from output-dir/configs/, then package builtins. |
| --output-dir | Working directory for all outputs/results/logs. |
| --model-dir | Pretrained models directory (bert-base-uncased, etc.). |
| --init-only | Initialize workspace only, do not run experiments. |
| --overwrite-configs | Re-copy config files from package to output-dir. |

Typical workflow

# 1. Initialize and edit configs
bolt-grid --init-only --output-dir ./bolt_workspace --model-dir /path/to/pretrained_models
vim ./bolt_workspace/configs/grid_gcd.yaml

# 2. Run
bolt-grid --config grid_gcd.yaml --output-dir ./bolt_workspace --model-dir /path/to/pretrained_models

4. Grid Config Example

methods: [loop, glean, alup, geoid, sdc, dpn, deepaligned, tan]
datasets: [banking, clinc, stackoverflow]
result_file: summary_gcd

grid:
  known_cls_ratio: [0.25, 0.5, 0.75]
  labeled_ratio: [0.1, 0.5, 1.0]
  seeds: [2025]
  fold_types: [fold]
  fold_idxs: [0,1,2,3,4]
  fold_nums: [5]
  cluster_num_factor: [1.0]

run:
  gpus: [0,1,2,3]
  max_workers: 4
  num_pretrain_epochs: 100
  num_train_epochs: 50
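Assuming the grid section is expanded as a full Cartesian product over methods, datasets, and every grid axis (the usual semantics of a grid runner; the exact expansion is bolt_lab's internal detail), the config above defines 8 × 3 × 3 × 3 × 5 = 1080 runs. A minimal sketch of that expansion, with the YAML inlined as a dict:

```python
from itertools import product

# Mirrors the grid config above; Cartesian-product expansion is an assumption.
config = {
    "methods": ["loop", "glean", "alup", "geoid", "sdc", "dpn", "deepaligned", "tan"],
    "datasets": ["banking", "clinc", "stackoverflow"],
    "grid": {
        "known_cls_ratio": [0.25, 0.5, 0.75],
        "labeled_ratio": [0.1, 0.5, 1.0],
        "seeds": [2025],
        "fold_types": ["fold"],
        "fold_idxs": [0, 1, 2, 3, 4],
        "fold_nums": [5],
        "cluster_num_factor": [1.0],
    },
}

# One tuple per experiment: (method, dataset, known_cls_ratio, ...)
axes = [config["methods"], config["datasets"], *config["grid"].values()]
runs = list(product(*axes))
print(len(runs))  # 8 * 3 * 3 * 3 * 1 * 1 * 5 * 1 * 1 = 1080
```

With max_workers: 4 across gpus: [0,1,2,3], such a grid runs four experiments concurrently, one per GPU.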

5. Output Structure

After running with --output-dir ./bolt_workspace:

bolt_workspace/
├── configs/          # Editable YAML configs (safe to modify)
├── outputs/          # Training artifacts (models, predictions)
├── results/          # Result CSVs + _index.json (dedup index)
├── logs/             # Experiment logs
├── data -> ...       # Symlink to dataset directory
└── pretrained_models -> ...  # Symlink to model directory

Deduplication

Completed experiments are tracked in results/<task>/<method>/results.csv. Re-running the same grid config will automatically skip finished experiments based on matching method, dataset, known_cls_ratio, labeled_ratio, seed, and fold parameters.
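The dedup check can be pictured as building a set of completed-run keys from results.csv and skipping any grid point whose key is already present. The key fields below follow the list above, but the exact CSV column names are an assumption, not bolt_lab's actual schema:

```python
import csv

# Columns forming the dedup key (per the list above); illustrative schema.
KEY_FIELDS = ["method", "dataset", "known_cls_ratio", "labeled_ratio",
              "seed", "fold_type", "fold_idx", "fold_num"]

def completed_keys(results_csv_path):
    """Collect the keys of experiments already recorded in results.csv."""
    done = set()
    with open(results_csv_path, newline="") as f:
        for row in csv.DictReader(f):
            done.add(tuple(str(row[k]) for k in KEY_FIELDS))
    return done

def should_skip(run, done):
    """True if this grid point already has a recorded result."""
    return tuple(str(run[k]) for k in KEY_FIELDS) in done
```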


6. Methods

GCD (11 methods)

| Name | Description |
|---|---|
| loop | KNN + SupConLoss + MLM pretrain |
| glean | KNN + DistillLoss + LLM cluster characterization |
| alup | Active Learning with LLM labeling |
| geoid | GeoID clustering |
| sdc | Self-paced Deep Clustering |
| dpn | Deep Pairwise Network |
| deepaligned | DeepAligned Clustering |
| tan | TAN method |
| tlsa | TLSA method |
| plm_gcd | PLM-based GCD |
| llm4openssl | Llama-based GCD (SFTTrainer + LoRA) |

Open-set (8 methods)

| Name | Description |
|---|---|
| ab | Adaptive Boundary |
| adb | Adaptive Decision Boundary |
| doc | DOC method |
| deepunk | DeepUnk (TF/Keras) |
| scl | Supervised Contrastive Learning (TF/Keras) |
| dyen | Dynamic Ensemble |
| knncon | KNN-Contrastive |
| unllm | Llama-based open-set (SFTTrainer + LoRA) |

All methods are subprocess wrappers. Training source code is bundled in _builtin/_src/.
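A subprocess wrapper of this kind can be sketched as follows. The function name, script path, and flags are hypothetical; the sketch only shows the general pattern of launching a bundled training script in its own process, pinned to one GPU via CUDA_VISIBLE_DEVICES, with output redirected to a log file:

```python
import os
import subprocess
import sys

def run_method(script, args, gpu_id, log_path):
    """Illustrative wrapper (not bolt_lab's real entry point): run a bundled
    training script as a subprocess, pinned to one GPU, logging to a file."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    cmd = [sys.executable, script, *args]
    with open(log_path, "w") as log:
        proc = subprocess.run(cmd, env=env, stdout=log, stderr=subprocess.STDOUT)
    return proc.returncode
```

Running each method in a subprocess isolates its dependencies (e.g. the TF/Keras methods) from the main process and lets the grid runner schedule one experiment per GPU.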


7. Notes

  • Do not point --output-dir to the outputs/ directory itself. Point it to an experiment root so outputs/results/logs/ are created as subdirectories.
  • data/ and pretrained_models/ under output-dir are symlinks. Do not edit them directly.
  • If flash-attn installation fails: check that torch.cuda.is_available() returns True, that the installed CUDA version matches the PyTorch build, and that TMPDIR has enough free disk space.

8. Updating the Package

cd /path/to/bolt-lab
# Edit version in pyproject.toml if needed
pip install -e .

Since bolt-lab is installed in editable mode (-e), code changes take effect immediately. Only re-run pip install -e . after changing pyproject.toml.

Download files

Download the file for your platform.

Source Distribution

bolt_lab-1.0.2.tar.gz (26.9 MB)

Built Distribution


bolt_lab-1.0.2-py3-none-any.whl (27.5 MB)

File details

Details for the file bolt_lab-1.0.2.tar.gz.

File metadata

  • Download URL: bolt_lab-1.0.2.tar.gz
  • Size: 26.9 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for bolt_lab-1.0.2.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | fc87c6b121a78516f8d2ce6452a48683738fc273ecca90d5f44afa0d44df1d0a |
| MD5 | 86a43a1863d62b12ec6629d1dd6c8ac6 |
| BLAKE2b-256 | 8a854a97aeb2116f2306785045f969c9d2f5d69706fe03f7e09c5892e30f156e |


File details

Details for the file bolt_lab-1.0.2-py3-none-any.whl.

File metadata

  • Download URL: bolt_lab-1.0.2-py3-none-any.whl
  • Size: 27.5 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for bolt_lab-1.0.2-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 02138b84fc60f2bce4204ad8e90f601155501ee2faa99e495a59e2e6bcde2392 |
| MD5 | fb5d4d59a7fc05137ec52b65362b8189 |
| BLAKE2b-256 | cf137d1d901ad0fec82834a90f200dc08b5ae59578b1bd5633254526adad13d6 |

