
A lightweight Python library that automates TinyBERT fine-tuning with Genetic Algorithm hyperparameter optimisation

Project description

tinyLMTune

Genetic-Algorithm-Optimised TinyBERT Fine-Tuning in one function call.

tinyLMTune automates the full pipeline: (1) dataset building, (2) GA hyperparameter search, (3) TinyBERT fine-tuning and (4) model export.
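To make the GA search step concrete, here is a minimal, self-contained sketch of a genetic algorithm over a discrete hyperparameter space, using tournament selection, uniform crossover, random-reset mutation and elitism. It is an illustrative assumption, not tinyLMTune's actual ga_optimizer implementation: the search-space values, population size, generation count and the toy fitness function are all hypothetical. In a real pipeline the fitness of a configuration would be obtained by fine-tuning TinyBERT with it and scoring the result on held-out data.

```python
import random

# Hypothetical search space; tinyLMTune's own space (in its internal
# constants module) may use different keys and values.
SEARCH_SPACE = {
    "learning_rate": [1e-5, 2e-5, 3e-5, 5e-5],
    "batch_size": [8, 16, 32],
    "num_epochs": [2, 3, 4, 5],
}


def random_individual(rng):
    """Sample one configuration uniformly from the search space."""
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    """Uniform crossover: each gene is taken from either parent with p=0.5."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SEARCH_SPACE}


def mutate(ind, rng, rate=0.2):
    """Random-reset mutation: resample each gene with probability `rate`."""
    return {k: (rng.choice(SEARCH_SPACE[k]) if rng.random() < rate else v)
            for k, v in ind.items()}


def tournament(pop, scores, rng, k=3):
    """Return the fittest of k randomly chosen individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: scores[i])]


def genetic_search(fitness, pop_size=8, generations=5, seed=0):
    """Maximise `fitness` over SEARCH_SPACE; returns (best_config, best_score)."""
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        for ind, s in zip(pop, scores):
            if s > best_score:
                best, best_score = ind, s
        # Elitism: the best configuration seen so far survives unchanged.
        next_pop = [dict(best)]
        while len(next_pop) < pop_size:
            child = crossover(tournament(pop, scores, rng),
                              tournament(pop, scores, rng), rng)
            next_pop.append(mutate(child, rng))
        pop = next_pop
    return best, best_score


def toy_fitness(cfg):
    """Hypothetical stand-in for 'fine-tune, then return validation accuracy'."""
    return -abs(cfg["learning_rate"] - 3e-5) * 1e4 - abs(cfg["num_epochs"] - 4)


best_cfg, best_score = genetic_search(toy_fitness)
print(best_cfg, best_score)
```

Evaluating fitness dominates the cost (in this sketch, one evaluation per individual per generation), which is why small populations and few generations are typical when each evaluation is a full fine-tuning run.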

Bring your own data, or let tinyLMTune generate synthetic training data automatically.

Installation

pip install tinylmtune

Quick Start

Option 1 — Bring your own data (recommended)

from tinylmtune import optimize_slm, TinyInference

my_data = [
    {"text": "Loved this product!", "label": "positive"},
    {"text": "Broke after one day.", "label": "negative"},
    {"text": "Works fine, nothing special.", "label": "neutral"},
    # ... at least 50 records recommended
]

best = optimize_slm(
    task="classification",
    user_data=my_data,
    output_dir="my_model",
)

model = TinyInference("my_model")
print(model.predict("Absolutely amazing quality!"))

Option 2 — Use a JSONL file

from tinylmtune import optimize_slm

best = optimize_slm(
    task="classification",
    corpus_path="my_data.jsonl",   # one JSON object per line
    output_dir="my_model",
)
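If your data already exists as a Python list of dicts (as in Option 1), a minimal standard-library sketch for producing the JSONL file follows. The helper names to_jsonl, from_jsonl and write_jsonl are hypothetical and are not part of tinyLMTune's API.

```python
import json


def to_jsonl(records):
    """Serialise a list of dicts as JSONL: exactly one JSON object per line."""
    return "".join(json.dumps(rec, ensure_ascii=False) + "\n" for rec in records)


def from_jsonl(text):
    """Parse JSONL text back into a list of dicts, ignoring blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def write_jsonl(records, path):
    """Write records to `path` as UTF-8 JSONL."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(to_jsonl(records))


# Usage: write_jsonl(my_data, "my_data.jsonl"), then pass
# corpus_path="my_data.jsonl" to optimize_slm().
```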

Option 3 — Auto-generate synthetic data (default fallback)

If neither user_data nor corpus_path is provided, tinyLMTune generates synthetic training data via a local Ollama/Mistral model:

from tinylmtune import optimize_slm

best = optimize_slm(
    task="classification",
    corpus_prompt="Generate movie-review sentiment examples",
    n_examples=200,
    output_dir="my_model",
)

Data Format

Each record is a dict (or a JSON line in a .jsonl file). The required keys depend on the task:

| Task | Required keys | Example |
|---|---|---|
| classification | text, label | {"text": "Great film!", "label": "positive"} |
| summarization | text, summary | {"text": "Long article...", "summary": "Short version."} |
| qna | question, answer | {"question": "What is X?", "answer": "X is..."} |
| generation | prompt, completion | {"prompt": "Once upon a", "completion": "time there was..."} |
| ner | text, entities | {"text": "John in NYC", "entities": [{"text": "John", "label": "PER", "start": 0, "end": 4}]} |

Records missing required keys are skipped with a warning. If all records are invalid, an error is raised showing the expected format.
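The skip-with-warning behaviour can be sketched as follows. The required-key mapping mirrors the table above; the function names, the error message and the extra NER check (that text[start:end] equals each entity's text, using the end-exclusive offsets shown in the example) are assumptions for illustration, not documented tinyLMTune behaviour.

```python
import warnings

# Mirrors the Data Format table.
REQUIRED_KEYS = {
    "classification": ("text", "label"),
    "summarization": ("text", "summary"),
    "qna": ("question", "answer"),
    "generation": ("prompt", "completion"),
    "ner": ("text", "entities"),
}


def entity_spans_ok(record):
    """Hypothetical NER check: each entity's [start, end) slice must match its text."""
    text = record["text"]
    try:
        return all(text[e["start"]:e["end"]] == e["text"] for e in record["entities"])
    except (KeyError, TypeError):
        return False


def validate_records(records, task):
    """Keep valid records, warn about skipped ones, raise if none survive."""
    required = REQUIRED_KEYS[task]
    valid = []
    for i, rec in enumerate(records):
        missing = [k for k in required if k not in rec]
        if missing:
            warnings.warn(f"record {i}: missing keys {missing}; skipped")
            continue
        if task == "ner" and not entity_spans_ok(rec):
            warnings.warn(f"record {i}: entity offsets do not match text; skipped")
            continue
        valid.append(rec)
    if not valid:
        example = {k: "..." for k in required}
        raise ValueError(f"no valid records for task {task!r}; expected format: {example}")
    return valid
```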

Public API

Only two symbols are exported:

| Symbol | Purpose |
|---|---|
| optimize_slm() | Full training pipeline; returns the best configuration found |
| TinyInference | Loads a saved model and runs predictions |

All internal modules live in tinylmtune._internal and are not part of the public API.

Project Structure

tinylmtune/
├── __init__.py              # Public API: optimize_slm, TinyInference
├── _internal/               # Private — do not import directly
│   ├── __init__.py
│   ├── constants.py         # Model names, task mappings, GA search space
│   ├── cleaner.py           # Text cleaning utilities
│   ├── corpus_gen.py        # Synthetic data generation (Ollama/Mistral)
│   ├── dataset.py           # list[dict] / JSONL → HuggingFace Dataset
│   ├── model_builder.py     # TinyBERT model instantiation
│   ├── trainer.py           # HuggingFace Trainer wrapper
│   ├── ga_optimizer.py      # Genetic algorithm hyperparameter search
│   ├── inference.py         # TinyInference class + save_best_model
│   └── pipeline.py          # optimize_slm() orchestrator
├── setup.py
└── README.md



Download files

Download the file for your platform.

Source Distribution

tinylmtune-0.0.3.tar.gz (33.3 kB)

Uploaded Source

Built Distribution


tinylmtune-0.0.3-py3-none-any.whl (38.1 kB)

Uploaded Python 3

File details

Details for the file tinylmtune-0.0.3.tar.gz.

File metadata

  • Download URL: tinylmtune-0.0.3.tar.gz
  • Upload date:
  • Size: 33.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.12

File hashes

Hashes for tinylmtune-0.0.3.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6f443cfb386c1f9a4c7b9447220f5dd424f65bb1c4ece56ea7668a4c817df95e |
| MD5 | 0f08f62cfeaeab9e70a77b87becbdef0 |
| BLAKE2b-256 | a8b08e9461e382357cf5cd5ba743d4ac949becbe323022ae2c2a50c063e53c52 |


File details

Details for the file tinylmtune-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: tinylmtune-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 38.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.12

File hashes

Hashes for tinylmtune-0.0.3-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | d2e4b4387d02d0702e0ed576424bd69b9855bbcc53c5b37938677de1e13a0ea0 |
| MD5 | 7ad0d5ee34919cd8e8c467b94d5f24fc |
| BLAKE2b-256 | 78c78b96d898f695b85352a18d03308ed2cf9d12726afa1c86981c89debd48b4 |
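To check a downloaded file against the published SHA256 digests, a minimal standard-library sketch streams the file through hashlib.sha256 and compares hex digests. The helper names sha256_of and verify are hypothetical; the expected digests are copied from the hash tables for the two 0.0.3 files.

```python
import hashlib

# SHA256 digests published for the two tinylmtune 0.0.3 distribution files.
EXPECTED = {
    "tinylmtune-0.0.3.tar.gz":
        "6f443cfb386c1f9a4c7b9447220f5dd424f65bb1c4ece56ea7668a4c817df95e",
    "tinylmtune-0.0.3-py3-none-any.whl":
        "d2e4b4387d02d0702e0ed576424bd69b9855bbcc53c5b37938677de1e13a0ea0",
}


def sha256_of(path, chunk_size=1 << 16):
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected_hex):
    """True if the file's SHA256 matches the expected (case-insensitive) hex digest."""
    return sha256_of(path) == expected_hex.lower()


# Usage: verify("tinylmtune-0.0.3.tar.gz", EXPECTED["tinylmtune-0.0.3.tar.gz"])
```

pip can also enforce digests itself: in hash-checking mode (pip install --require-hashes -r requirements.txt), each requirement line carries a --hash=sha256:<digest> option.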

