A lightweight Python library that automates TinyBERT fine-tuning with Genetic Algorithm hyperparameter optimisation

tinyLMTune

Genetic-Algorithm-Optimised TinyBERT Fine-Tuning in one function call.

tinyLMTune automates the full pipeline:

(1) dataset building
(2) GA hyperparameter search
(3) TinyBERT fine-tuning
(4) model export

Bring your own data or let it generate synthetic training data automatically.

Installation

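pip install tinylmtune

To install from a local checkout of the source in editable mode instead, run this from the project root:

pip install -e .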

Quick Start

Option 1 — Bring your own data (recommended)

from tinylmtune import optimize_slm, TinyInference

my_data = [
    {"text": "Loved this product!", "label": "positive"},
    {"text": "Broke after one day.", "label": "negative"},
    {"text": "Works fine, nothing special.", "label": "neutral"},
    # ... at least 50 records recommended
]

best = optimize_slm(
    task="classification",
    user_data=my_data,
    output_dir="my_model",
)

model = TinyInference("my_model")
print(model.predict("Absolutely amazing quality!"))

Option 2 — Use a JSONL file

best = optimize_slm(
    task="classification",
    corpus_path="my_data.jsonl",   # one JSON object per line
    output_dir="my_model",
)
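
If your records currently live in a Python list, a JSONL file for corpus_path can be produced with the standard library alone. The following is a minimal sketch: write_jsonl and read_jsonl are hypothetical helper names for illustration and are not part of tinyLMTune.

```python
import json

records = [
    {"text": "Loved this product!", "label": "positive"},
    {"text": "Broke after one day.", "label": "negative"},
]


def write_jsonl(records, path):
    """Write each record as one JSON object per line (UTF-8)."""
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            # ensure_ascii=False keeps non-ASCII text readable in the file.
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


write_jsonl(records, "my_data.jsonl")
```

Reading the file back with read_jsonl is a quick way to confirm that every line parses before starting a training run.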

Option 3 — Auto-generate synthetic data (default fallback)

If neither user_data nor corpus_path is provided, tinyLMTune generates synthetic training data via a local Ollama/Mistral model:

best = optimize_slm(
    task="classification",
    corpus_prompt="Generate movie-review sentiment examples",
    n_examples=200,
    output_dir="my_model",
)

Data Format

Each record is a dict (or a JSON line in a .jsonl file). The required keys depend on the task:

Task            Required keys        Example
classification  text, label          {"text": "Great film!", "label": "positive"}
summarization   text, summary        {"text": "Long article...", "summary": "Short version."}
qna             question, answer     {"question": "What is X?", "answer": "X is..."}
generation      prompt, completion   {"prompt": "Once upon a", "completion": "time there was..."}
ner             text, entities       {"text": "John in NYC", "entities": [{"text": "John", "label": "PER", "start": 0, "end": 4}]}

Records missing required keys are skipped with a warning. If all records are invalid, an error is raised showing the expected format.
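
To check your data before training, the skip-and-warn behaviour described above can be approximated as follows. This is only a sketch of the documented behaviour, not the library's internal code: filter_valid_records and REQUIRED_KEYS are hypothetical names, and the exact warning and error messages tinyLMTune emits may differ.

```python
import warnings

# Required keys per task, copied from the Data Format table.
REQUIRED_KEYS = {
    "classification": ("text", "label"),
    "summarization": ("text", "summary"),
    "qna": ("question", "answer"),
    "generation": ("prompt", "completion"),
    "ner": ("text", "entities"),
}


def filter_valid_records(records, task):
    """Hypothetical helper: keep only records that have every required key."""
    required = REQUIRED_KEYS[task]
    valid = []
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            warnings.warn(f"Skipping record {index}: not a dict")
            continue
        missing = [key for key in required if key not in record]
        if missing:
            warnings.warn(f"Skipping record {index}: missing keys {missing}")
            continue
        valid.append(record)
    if not valid:
        # Mirror the documented behaviour: fail loudly when nothing is usable.
        raise ValueError(
            f"No valid records for task {task!r}; "
            f"each record needs the keys {list(required)}"
        )
    return valid
```

Running this over your own list before calling optimize_slm shows how many records would survive the required-key check.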

Public API

Only two symbols are exported:

Symbol          Purpose
optimize_slm()  Runs the full training pipeline and returns the best configuration
TinyInference   Loads a saved model and runs predictions

All internal modules live in tinylmtune._internal and are not part of the public API.

Project Structure

tinylmtune/
├── __init__.py              # Public API: optimize_slm, TinyInference
├── _internal/               # Private — do not import directly
│   ├── __init__.py
│   ├── constants.py         # Model names, task mappings, GA search space
│   ├── cleaner.py           # Text cleaning utilities
│   ├── corpus_gen.py        # Synthetic data generation (Ollama/Mistral)
│   ├── dataset.py           # list[dict] / JSONL → HuggingFace Dataset
│   ├── model_builder.py     # TinyBERT model instantiation
│   ├── trainer.py           # HuggingFace Trainer wrapper
│   ├── ga_optimizer.py      # Genetic algorithm hyperparameter search
│   ├── inference.py         # TinyInference class + save_best_model
│   └── pipeline.py          # optimize_slm() orchestrator
├── setup.py
└── README.md
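
For readers unfamiliar with the kind of search that ga_optimizer.py performs, the sketch below shows a generic genetic algorithm over a small hyperparameter grid. Everything in it is an assumption made for illustration: the search space, the selection, crossover and mutation operators, and the toy fitness function (which stands in for "fine-tune TinyBERT and return a validation score") are not taken from tinyLMTune's source.

```python
import random

# Hypothetical search space; the real one is defined in _internal/constants.py.
SEARCH_SPACE = {
    "learning_rate": [1e-5, 2e-5, 3e-5, 5e-5],
    "batch_size": [8, 16, 32],
    "num_epochs": [2, 3, 4],
}


def random_config(rng):
    """Sample one value per hyperparameter."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each gene is taken from one parent at random."""
    return {name: rng.choice([parent_a[name], parent_b[name]]) for name in SEARCH_SPACE}


def mutate(config, rng, rate=0.2):
    """Resample each gene from the search space with probability `rate`."""
    return {
        name: rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value
        for name, value in config.items()
    }


def toy_fitness(config):
    """Stand-in for an expensive fine-tuning run; higher is better."""
    return (
        -abs(config["learning_rate"] - 3e-5) * 1e5
        - abs(config["batch_size"] - 16) / 16
        - abs(config["num_epochs"] - 3)
    )


def genetic_search(fitness, population_size=8, generations=5, seed=0):
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: population_size // 2]  # truncation selection
        children = [
            mutate(crossover(*rng.sample(parents, 2), rng), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children  # parents survive (elitism)
    return max(population, key=fitness)


best = genetic_search(toy_fitness)
print(best)
```

In a real run each fitness evaluation is a full fine-tuning job, so population size and generation count dominate the total cost of the search.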
