A lightweight Python library that automates TinyBERT fine-tuning with Genetic Algorithm hyperparameter optimisation
tinyLMTune
Genetic-Algorithm-Optimised TinyBERT Fine-Tuning in one function call.
tinyLMTune automates the full pipeline:
(1) dataset building
(2) GA hyperparameter search
(3) TinyBERT fine-tuning
(4) model export
Bring your own data or let it generate synthetic training data automatically.
Installation
pip install tinylmtune
Quick Start
Option 1 — Bring your own data (recommended)
from tinylmtune import optimize_slm, TinyInference
my_data = [
    {"text": "Loved this product!", "label": "positive"},
    {"text": "Broke after one day.", "label": "negative"},
    {"text": "Works fine, nothing special.", "label": "neutral"},
    # ... 50+ records recommended
]
best = optimize_slm(
    task="classification",
    user_data=my_data,
    output_dir="my_model",
)
model = TinyInference("my_model")
print(model.predict("Absolutely amazing quality!"))
Option 2 — Use a JSONL file
best = optimize_slm(
    task="classification",
    corpus_path="my_data.jsonl",  # one JSON object per line
    output_dir="my_model",
)
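Option 2 expects one JSON object per line. If your records currently live in a Python list, a minimal sketch for writing and re-reading such a file with the standard library could look like this. The helper names `write_jsonl` and `read_jsonl` are illustrative and are not part of tinyLMTune.

```python
import json
from pathlib import Path


def write_jsonl(records, path):
    """Write each record as one JSON object per line (UTF-8)."""
    with Path(path).open("w", encoding="utf-8") as fh:
        for record in records:
            # ensure_ascii=False keeps non-ASCII text readable in the file.
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Calling `write_jsonl(my_data, "my_data.jsonl")` would produce a file that can then be passed as `corpus_path`.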
Option 3 — Auto-generate synthetic data (default fallback)
If neither user_data nor corpus_path is provided, tinyLMTune generates
synthetic training data via a local Ollama/Mistral model:
best = optimize_slm(
    task="classification",
    corpus_prompt="Generate movie-review sentiment examples",
    n_examples=200,
    output_dir="my_model",
)
Data Format
Each record is a dict (or a JSON line in a .jsonl file). The required keys depend on the task:
| Task | Required keys | Example |
|---|---|---|
| classification | text, label | {"text": "Great film!", "label": "positive"} |
| summarization | text, summary | {"text": "Long article...", "summary": "Short version."} |
| qna | question, answer | {"question": "What is X?", "answer": "X is..."} |
| generation | prompt, completion | {"prompt": "Once upon a", "completion": "time there was..."} |
| ner | text, entities | {"text": "John in NYC", "entities": [{"text": "John", "label": "PER", "start": 0, "end": 4}]} |
Records missing required keys are skipped with a warning. If all records are invalid, an error is raised showing the expected format.
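To check your data before training, the rules above can be approximated in a few lines. This is a sketch under stated assumptions, not tinyLMTune's own validation code: `filter_valid_records` and `entity_spans_match` are hypothetical helpers, the required keys are copied from the table, and the end-exclusive reading of `start`/`end` is inferred from the single `ner` example.

```python
import warnings

# Required keys per task, copied from the Data Format table.
REQUIRED_KEYS = {
    "classification": ("text", "label"),
    "summarization": ("text", "summary"),
    "qna": ("question", "answer"),
    "generation": ("prompt", "completion"),
    "ner": ("text", "entities"),
}


def filter_valid_records(records, task):
    """Keep records that have every required key for `task`.

    Hypothetical helper: warns for each skipped record and raises
    ValueError if no record survives.
    """
    required = REQUIRED_KEYS[task]
    valid = []
    for index, record in enumerate(records):
        missing = [key for key in required if key not in record]
        if missing:
            warnings.warn(f"Skipping record {index}: missing keys {missing}")
            continue
        valid.append(record)
    if not valid:
        raise ValueError(
            f"No valid records for task {task!r}; expected keys: {list(required)}"
        )
    return valid


def entity_spans_match(record):
    """Check that each entity's start/end offsets slice out its text.

    Assumes end-exclusive offsets, as in text[start:end].
    """
    return all(
        record["text"][ent["start"]:ent["end"]] == ent["text"]
        for ent in record["entities"]
    )
```

Running both checks up front surfaces malformed records before a long training run starts.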
Public API
Only two symbols are exported:
| Symbol | Purpose |
|---|---|
| optimize_slm() | Full training pipeline; returns the best config |
| TinyInference | Load and predict with a saved model |
All internal modules live in tinylmtune._internal and are not part of the public API.
Project Structure
tinylmtune/
├── __init__.py # Public API: optimize_slm, TinyInference
├── _internal/ # Private — do not import directly
│ ├── __init__.py
│ ├── constants.py # Model names, task mappings, GA search space
│ ├── cleaner.py # Text cleaning utilities
│ ├── corpus_gen.py # Synthetic data generation (Ollama/Mistral)
│ ├── dataset.py # list[dict] / JSONL → HuggingFace Dataset
│ ├── model_builder.py # TinyBERT model instantiation
│ ├── trainer.py # HuggingFace Trainer wrapper
│ ├── ga_optimizer.py # Genetic algorithm hyperparameter search
│ ├── inference.py # TinyInference class + save_best_model
│ └── pipeline.py # optimize_slm() orchestrator
├── setup.py
└── README.md
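For readers unfamiliar with the technique behind ga_optimizer.py, the following is a generic sketch of genetic-algorithm hyperparameter search. It is not the library's implementation: the search space, the function names, and the `toy_fitness` stand-in (used here in place of an actual fine-tuning run) are all assumptions made for illustration.

```python
import random

# Hypothetical search space; the real one lives in constants.py and may differ.
SEARCH_SPACE = {
    "learning_rate": [1e-5, 2e-5, 3e-5, 5e-5],
    "batch_size": [8, 16, 32],
    "num_epochs": [2, 3, 4],
}


def random_config(rng):
    """Sample one value per hyperparameter."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each gene comes from one parent at random."""
    return {name: rng.choice([parent_a[name], parent_b[name]]) for name in SEARCH_SPACE}


def mutate(config, rng, rate=0.2):
    """Resample each gene from the search space with probability `rate`."""
    return {
        name: rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value
        for name, value in config.items()
    }


def genetic_search(fitness, generations=10, population_size=8, elite=2, seed=0):
    """Return the best config found; higher `fitness` is better."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: population_size // 2]  # truncation selection
        children = []
        while len(children) < population_size - elite:
            parent_a, parent_b = rng.sample(parents, 2)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        # Elitism: the top configs survive unchanged into the next generation.
        population = ranked[:elite] + children
    return max(population, key=fitness)


def toy_fitness(config):
    """Stand-in for a validation score from a short fine-tuning run."""
    return -abs(config["learning_rate"] - 3e-5) * 1e5 - abs(config["batch_size"] - 16) / 16


best_config = genetic_search(toy_fitness)
```

In a real run the fitness function would train and evaluate a model for each candidate config, which is why keeping the population and generation counts small matters.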