A lightweight Python library that automates TinyBERT fine-tuning with Genetic Algorithm hyperparameter optimisation

tinyLMTune

Genetic-Algorithm-Optimised TinyBERT Fine-Tuning in one function call.

tinyLMTune automates the full pipeline: (1) dataset building, (2) genetic-algorithm (GA) hyperparameter search, (3) TinyBERT fine-tuning, and (4) model export.
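The GA search in step (2) can be illustrated with a minimal, self-contained genetic algorithm. This is a sketch under stated assumptions, not tinyLMTune's implementation: the search space (learning rate, batch size, epoch count) is hypothetical, and `fitness` is a toy stand-in for "fine-tune TinyBERT with this configuration and return its validation score".

```python
import random

# Hypothetical search space; tinyLMTune keeps its own in _internal/constants.py.
SEARCH_SPACE = {
    "learning_rate": [1e-5, 2e-5, 3e-5, 5e-5],
    "batch_size": [8, 16, 32],
    "num_epochs": [2, 3, 4],
}


def random_config(rng):
    """Sample one candidate by picking a random value for every hyperparameter."""
    return {key: rng.choice(values) for key, values in SEARCH_SPACE.items()}


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each gene is inherited from one parent at random."""
    return {key: rng.choice([parent_a[key], parent_b[key]]) for key in SEARCH_SPACE}


def mutate(config, rng, rate=0.2):
    """With probability `rate`, resample each gene from the search space."""
    return {
        key: rng.choice(SEARCH_SPACE[key]) if rng.random() < rate else value
        for key, value in config.items()
    }


def fitness(config):
    """Toy stand-in for a fine-tune-and-evaluate run (higher is better).

    Peaks at learning_rate=3e-5, batch_size=16, num_epochs=3.
    """
    return -(
        abs(config["learning_rate"] - 3e-5) * 1e5
        + abs(config["batch_size"] - 16) / 8
        + abs(config["num_epochs"] - 3)
    )


def genetic_search(pop_size=8, generations=10, elite=2, seed=0):
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Rank the population by fitness, best first.
        population.sort(key=fitness, reverse=True)
        # Elitism: the top `elite` configs survive unchanged.
        next_gen = population[:elite]
        # Parents are drawn from the better half of the population.
        parents = population[: pop_size // 2]
        while len(next_gen) < pop_size:
            a, b = rng.sample(parents, 2)
            next_gen.append(mutate(crossover(a, b, rng), rng))
        population = next_gen
    return max(population, key=fitness)


best = genetic_search()
print(best)
```

Elitism guarantees the best configuration found so far is never lost, so each extra generation can only keep or improve the top score. In a real run every `fitness` call is a full fine-tuning job, which is why population size and generation count dominate the cost.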

Bring your own data, or let tinyLMTune generate synthetic training data automatically.

Installation

pip install tinylmtune

Quick Start

Option 1 — Bring your own data (recommended)

from tinylmtune import optimize_slm, TinyInference

my_data = [
    {"text": "Loved this product!", "label": "positive"},
    {"text": "Broke after one day.", "label": "negative"},
    {"text": "Works fine, nothing special.", "label": "neutral"},
    # ... 50 or more records recommended
]

best = optimize_slm(
    task="classification",
    user_data=my_data,
    output_dir="my_model",
)

model = TinyInference("my_model")
print(model.predict("Absolutely amazing quality!"))

Option 2 — Use a JSONL file

from tinylmtune import optimize_slm

best = optimize_slm(
    task="classification",
    corpus_path="my_data.jsonl",   # one JSON object per line
    output_dir="my_model",
)
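If your records start out as a Python list, you can produce the `my_data.jsonl` file used above with the standard `json` module. This sketch writes one JSON object per line and reads the file back to check it; the helper names `write_jsonl` and `read_jsonl` are illustrative and not part of tinyLMTune.

```python
import json


def write_jsonl(records, path):
    """Write each record as one JSON object per line (UTF-8)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


my_data = [
    {"text": "Loved this product!", "label": "positive"},
    {"text": "Broke after one day.", "label": "negative"},
]
write_jsonl(my_data, "my_data.jsonl")
assert read_jsonl("my_data.jsonl") == my_data
```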

Option 3 — Auto-generate synthetic data (default fallback)

If neither user_data nor corpus_path is provided, tinyLMTune generates synthetic training data with a Mistral model served locally by Ollama:

from tinylmtune import optimize_slm

best = optimize_slm(
    task="classification",
    corpus_prompt="Generate movie-review sentiment examples",
    n_examples=200,
    output_dir="my_model",
)
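For readers curious how synthetic generation through a local model can work, here is a sketch that calls Ollama's HTTP API directly. It is not tinyLMTune's `corpus_gen.py`: it assumes Ollama is listening on its default address (`http://localhost:11434`) with the `mistral` model pulled, and the prompt wording and the helpers `build_request`, `parse_records`, and `generate` are hypothetical. Asking the model for one JSON object per line makes the reply easy to filter down to well-formed records.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint


def build_request(corpus_prompt, n_examples, model="mistral"):
    """Build a non-streaming /api/generate request asking for JSONL output."""
    prompt = (
        f"{corpus_prompt}. Return exactly {n_examples} lines, each a JSON object "
        'with keys "text" and "label". Output nothing else.'
    )
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_records(raw_text, required=("text", "label")):
    """Keep only lines that parse as JSON objects containing every required key."""
    records = []
    for line in raw_text.splitlines():
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip prose, code fences, or malformed lines
        if isinstance(obj, dict) and all(key in obj for key in required):
            records.append(obj)
    return records


def generate(corpus_prompt, n_examples):
    """Send the request (needs a running Ollama server) and parse the reply."""
    with urllib.request.urlopen(build_request(corpus_prompt, n_examples)) as resp:
        reply = json.loads(resp.read())
    return parse_records(reply["response"])
```

Calling `generate("Generate movie-review sentiment examples", 200)` only works with the Ollama server running; `parse_records` can be exercised offline on any captured model output.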

Data Format

Each record is a dict (or a JSON line in a .jsonl file). The required keys depend on the task:

  • classification: text, label. Example: {"text": "Great film!", "label": "positive"}
  • summarization: text, summary. Example: {"text": "Long article...", "summary": "Short version."}
  • qna: question, answer. Example: {"question": "What is X?", "answer": "X is..."}
  • generation: prompt, completion. Example: {"prompt": "Once upon a", "completion": "time there was..."}
  • ner: text, entities. Example: {"text": "John in NYC", "entities": [{"text": "John", "label": "PER", "start": 0, "end": 4}]}

Records missing required keys are skipped with a warning. If all records are invalid, an error is raised showing the expected format.
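The validation rule above can be sketched as follows. The required-key mapping mirrors the Data Format list in this README; the function name `validate_records`, the warning text, and the error message are illustrative assumptions, not tinyLMTune's actual code. The last lines also confirm that the NER example's `start`/`end` offsets behave like a Python slice with an exclusive end.

```python
import warnings

# Required keys per task, taken from the Data Format list in this README.
REQUIRED_KEYS = {
    "classification": ("text", "label"),
    "summarization": ("text", "summary"),
    "qna": ("question", "answer"),
    "generation": ("prompt", "completion"),
    "ner": ("text", "entities"),
}


def validate_records(records, task):
    """Drop records missing required keys (with a warning); raise if none remain."""
    required = REQUIRED_KEYS[task]
    valid = []
    for i, record in enumerate(records):
        missing = [key for key in required if key not in record]
        if missing:
            warnings.warn(f"Skipping record {i}: missing keys {missing}")
            continue
        valid.append(record)
    if not valid:
        example = {key: "..." for key in required}
        raise ValueError(f"No valid records for task {task!r}; expected e.g. {example}")
    return valid


ner_record = {
    "text": "John in NYC",
    "entities": [{"text": "John", "label": "PER", "start": 0, "end": 4}],
}
assert validate_records([ner_record], "ner") == [ner_record]
# "John in NYC"[0:4] == "John": the end offset is exclusive.
entity = ner_record["entities"][0]
assert ner_record["text"][entity["start"]:entity["end"]] == entity["text"]
```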

Public API

Only two symbols are exported:

  • optimize_slm(): runs the full training pipeline and returns the best configuration found
  • TinyInference: loads a saved model and runs predictions with it

All internal modules live in tinylmtune._internal and are not part of the public API.

Project Structure

tinylmtune/
├── __init__.py              # Public API: optimize_slm, TinyInference
├── _internal/               # Private — do not import directly
│   ├── __init__.py
│   ├── constants.py         # Model names, task mappings, GA search space
│   ├── cleaner.py           # Text cleaning utilities
│   ├── corpus_gen.py        # Synthetic data generation (Ollama/Mistral)
│   ├── dataset.py           # list[dict] / JSONL → HuggingFace Dataset
│   ├── model_builder.py     # TinyBERT model instantiation
│   ├── trainer.py           # HuggingFace Trainer wrapper
│   ├── ga_optimizer.py      # Genetic algorithm hyperparameter search
│   ├── inference.py         # TinyInference class + save_best_model
│   └── pipeline.py          # optimize_slm() orchestrator
├── setup.py
└── README.md
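Classification records carry string labels such as "positive", while a sequence-classification head predicts integer class ids, so the dataset stage (`dataset.py` in the tree above) has to map between the two. The sketch below shows one common way to build that mapping; the `label2id`/`id2label` names follow the Hugging Face Transformers config convention, but whether tinyLMTune does exactly this is an assumption, and both helper functions are hypothetical.

```python
def build_label_maps(records):
    """Assign each distinct label an integer id, in sorted order for reproducibility."""
    labels = sorted({record["label"] for record in records})
    label2id = {label: i for i, label in enumerate(labels)}
    id2label = {i: label for label, i in label2id.items()}
    return label2id, id2label


def encode_labels(records, label2id):
    """Return copies of the records with the string label replaced by its id."""
    return [{**record, "label": label2id[record["label"]]} for record in records]


my_data = [
    {"text": "Loved this product!", "label": "positive"},
    {"text": "Broke after one day.", "label": "negative"},
    {"text": "Works fine, nothing special.", "label": "neutral"},
]
label2id, id2label = build_label_maps(my_data)
print(label2id)  # {'negative': 0, 'neutral': 1, 'positive': 2}
encoded = encode_labels(my_data, label2id)
```

Sorting the label set keeps the ids stable across runs, which matters when a saved model's `id2label` must translate predictions back into the original strings.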

Download files

Source Distribution

tinylmtune-0.0.2.tar.gz (4.0 kB)

Built Distribution

tinylmtune-0.0.2-py3-none-any.whl (3.8 kB)

File details

Details for the file tinylmtune-0.0.2.tar.gz.

File metadata

  • Download URL: tinylmtune-0.0.2.tar.gz
  • Upload date:
  • Size: 4.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.12

File hashes

Hashes for tinylmtune-0.0.2.tar.gz
  • SHA256: 3c0a87657fbe1723ed6e68da88ee72ba9ec7fa76880fcc9548143cf98d85bf92
  • MD5: 5b04bdeef775af6ab14ef27b8c35cd19
  • BLAKE2b-256: b9db161583b9cb53fc5eb67a8ac120d6183a5fb09ca7724eae591633f9642e34
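To check that a downloaded file matches its published SHA256 digest, you can hash it locally with Python's standard `hashlib`. A minimal sketch; `sha256_of` and `verify_sha256` are illustrative helper names, and the commented usage line refers to the source distribution listed above. pip's hash-checking mode (for example a `tinylmtune==0.0.2 --hash=sha256:...` line in a requirements file) can enforce the same check at install time.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 16):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, expected):
    """Compare against a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()


# Usage, after downloading the sdist into the current directory:
# verify_sha256("tinylmtune-0.0.2.tar.gz",
#               "3c0a87657fbe1723ed6e68da88ee72ba9ec7fa76880fcc9548143cf98d85bf92")
```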

File details

Details for the file tinylmtune-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: tinylmtune-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 3.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.12

File hashes

Hashes for tinylmtune-0.0.2-py3-none-any.whl
  • SHA256: a66301769d98667cb69b2f157aba43350edd2c0d079fee97455d136b90b39af3
  • MD5: 6290ad2acda729cdd42c8cef2ed441b5
  • BLAKE2b-256: da21d7f4d6e7119870770bfe6168969f8cfb04fc37d4c6049bc260c65228b526
