Fast C-backed trigram language model for word prediction and sentence completion


trigram-llm 🧠

A fast, production-ready Python library for next-word prediction and sentence completion, powered by a hand-written C engine using a Prefix Trie, DJB2 HashMap, and Stupid Backoff smoothing.
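For readers unfamiliar with DJB2, it is a simple string hash often used to pick a hash-map bucket. The sketch below is a Python rendition of the classic algorithm (seed 5381, multiply by 33, add each byte). The function name `djb2`, the 32-bit truncation, and the bucket count are assumptions made for illustration; the C engine's actual variant may differ.

```python
def djb2(key: str) -> int:
    """Classic DJB2 string hash, truncated to 32 bits (assumed width)."""
    h = 5381
    for byte in key.encode("utf-8"):
        # h * 33 + byte, kept within an unsigned 32-bit range
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


NUM_BUCKETS = 1024  # hypothetical table size, not taken from the library

bucket = djb2("quick") % NUM_BUCKETS
print(bucket)
```

The same key always maps to the same bucket, which is what lets a hash map find a stored word without scanning the whole table.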

Sub-millisecond predictions · Zero dependencies · Pure ctypes · Thread-safe

Features

Feature	Description
`train_from_text(text)`	Train from any Python string
`train_from_file(path)`	Train from a text file (incremental)
`train_from_list(words)`	Train from a pre-tokenised word list
`predict_next(w1, w2)`	Greedy single-word prediction (< 1ms)
`predict_top_n(w1, w2, n, temperature)`	Top-N predictions with probabilities
`complete_sentence(prompt, num_words, beam_width)`	Beam search sentence generation
`greedy_generate(prompt, num_words)`	Fastest sentence completion
`perplexity(text)`	Evaluate model quality on held-out text
`vocabulary()`	Returns all known words as a Python `set`
`get_stats()`	Dict with trigram count, vocab size, etc.
`save(path)` / `TrigramModel.load(path)`	Binary model persistence
`reset()`	Clear all training data so the model can be retrained from scratch
`"the quick" in model`	Check if a bigram context was seen
`len(model)`	Total number of stored trigrams
Thread-safe	All predictions guarded by a `threading.Lock`
Context manager	`with TrigramModel.load(path) as m:`

Installation

Prerequisites

Python 3.8+
GCC (macOS: xcode-select --install, Ubuntu: sudo apt install gcc)

Install (one command)

cd /path/to/Trigrams
pip install -e .

This compiles the C engine into trigram/_trigram_c.dylib (or .so on Linux) and installs the package in editable mode.

Quickstart

from trigram import TrigramModel

# 1. Create and train
model = TrigramModel()
model.train_from_text("""
    The quick brown fox jumps over the lazy dog.
    The quick brown fox was nimble and swift.
    The lazy dog slept peacefully under the old oak tree.
""")

# 2. Predict next word (greedy)
word = model.predict_next("the", "quick")
print(word)  # → "brown"

# 3. Top-N predictions with probabilities
preds = model.predict_top_n("the", "quick", n=3, temperature=1.0)
# "brown" is the only word that follows "the quick" in this corpus, e.g.:
# [{"word": "brown", "probability": ..., "count": 2}]

# 4. Sentence completion (beam search)
completions = model.complete_sentence("the quick", num_words=4, beam_width=3)
# Illustrative shape: [{"sentence": "the quick brown fox jumps", "probability": 0.012}, ...]

# 5. Greedy generation (fastest)
sentence = model.greedy_generate("the quick", num_words=3)
# Illustrative: "the quick brown fox"

# 6. Evaluate quality
ppl = model.perplexity("the quick brown fox")
print(f"Perplexity: {ppl:.2f}")

# 7. Inspect model
print(len(model))          # → total trigrams
print("the quick" in model)  # → True
print(model.vocabulary())  # → {"the", "quick", "brown", ...}
print(model.get_stats())   # → {"total_trigrams": ..., "unique_first_words": ..., "vocabulary_size": ...}

Training from a File

model = TrigramModel()
model.train_from_file("path/to/my_corpus.txt")

# Incremental training — add more data later
model.train_from_file("path/to/more_data.txt")

Saving and Loading Models

# Save
model.save("my_model.bin")

# Load (class method)
model2 = TrigramModel.load("my_model.bin")

# Context manager (auto-frees on exit)
with TrigramModel.load("my_model.bin") as m:
    print(m.predict_next("the", "quick"))

Temperature Sampling

The temperature parameter rescales the prediction probabilities: values below 1.0 concentrate probability on the most frequent continuations, and values above 1.0 spread it across more candidates.

# Low temperature: close to deterministic, the most common word dominates
model.predict_top_n("the", "quick", temperature=0.1)

# Standard probability distribution
model.predict_top_n("the", "quick", temperature=1.0)

# More diverse / creative
model.predict_top_n("the", "quick", temperature=2.0)
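To make the effect of temperature concrete, here is a minimal sketch of one common way to apply it: raise each probability to the power 1/T and renormalise. This is an assumption about the general technique, not a description of the library's C code, and `apply_temperature` and the example distribution are hypothetical.

```python
def apply_temperature(probs, temperature):
    """Rescale a distribution: p ** (1 / T) for each word, then renormalise."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {word: p ** (1.0 / temperature) for word, p in probs.items()}
    total = sum(scaled.values())
    return {word: s / total for word, s in scaled.items()}


# Hypothetical starting distribution for some context
base = {"brown": 0.75, "red": 0.25}

for t in (0.1, 1.0, 2.0):
    print(t, apply_temperature(base, t))
```

At T = 1.0 the distribution is unchanged, at T = 0.1 nearly all of the mass moves to the most likely word, and at T = 2.0 the two candidates move closer together.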

Advanced Usage

Train from a word list (custom tokenisation)

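# Requires the third-party nltk package and its tokenizer data (optional, not needed by trigram-llm itself)
import nltk
from trigram import TrigramModel

tokens = nltk.word_tokenize("The quick brown fox")
tokens = [t.lower() for t in tokens if t.isalpha()]

model = TrigramModel()
model.train_from_list(tokens)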

Thread-safe batch prediction

import threading

from trigram import TrigramModel

def worker(model, results, idx):
    # Each thread writes to its own slot, so the results list needs no extra locking
    results[idx] = model.predict_top_n("the", "quick", n=5)

model = TrigramModel.load("model.bin")
results = [None] * 10
threads = [threading.Thread(target=worker, args=(model, results, i)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
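The pattern behind "guarded by a `threading.Lock`" can be sketched without the library. The class below is a hypothetical stand-in (a plain dict instead of the C engine) that serialises every lookup through one lock; it only illustrates the idea and is not how `TrigramModel` is actually written.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedPredictor:
    """Hypothetical wrapper: one lock serialises access to shared state."""

    def __init__(self, table):
        self._table = table
        self._lock = threading.Lock()

    def predict_next(self, w1, w2):
        # Only one thread at a time may touch the underlying table
        with self._lock:
            return self._table.get((w1, w2))


predictor = LockedPredictor({("the", "quick"): "brown"})

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda _: predictor.predict_next("the", "quick"), range(10)))

print(results)
```

A single lock keeps the code simple at the cost of running predictions one at a time, which is usually acceptable when each call is very short.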

Check if a context exists before predicting

if "the quick" in model:
    result = model.predict_next("the", "quick")

API Reference

`TrigramModel()`

Creates a new empty model.

`train_from_text(text: str) → int`

Train on a raw text string. Returns the number of trigrams inserted.

`train_from_file(path) → int`

Train from a text file. Returns the number of trigrams inserted.

`train_from_list(words: list) → int`

Train from a pre-tokenised word list. Returns the number of trigrams inserted.
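As a rough picture of what training on a word list involves, the sketch below slides a window of three over the tokens and counts each triple. `count_trigrams` is a hypothetical pure-Python helper; the library does this work in C and may treat sentence boundaries or casing differently.

```python
from collections import Counter


def count_trigrams(words):
    """Slide a window of three over the tokens and count each triple."""
    return Counter(zip(words, words[1:], words[2:]))


words = ["the", "quick", "brown", "fox", "the", "quick", "brown", "dog"]
counts = count_trigrams(words)

print(sum(counts.values()))               # number of windows: len(words) - 2
print(counts[("the", "quick", "brown")])  # how often this triple occurred
print(len(counts))                        # number of distinct trigrams
```

A list of n words yields n - 2 windows, so lists shorter than three words contribute no trigrams in this sketch.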

`predict_next(w1, w2) → str | None`

Return the single most-likely next word or None.
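The project description names Stupid Backoff as the smoothing scheme. A minimal sketch of that scheme follows: use the trigram relative frequency when the trigram was seen, otherwise fall back to the bigram and then the unigram, multiplying by a fixed factor at each step. The factor 0.4 is the value commonly used with this technique and is an assumption here, as are the helper names; the resulting scores are not normalised probabilities.

```python
from collections import Counter

ALPHA = 0.4  # backoff factor, assumed; the library's value is not documented here


def ngram_counts(tokens):
    """Return unigram, bigram, and trigram counters for a token list."""
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    tri = Counter(zip(tokens, tokens[1:], tokens[2:]))
    return uni, bi, tri


def stupid_backoff(w1, w2, w, uni, bi, tri):
    """Unnormalised score of w following the context (w1, w2)."""
    if tri[(w1, w2, w)] > 0:
        return tri[(w1, w2, w)] / bi[(w1, w2)]
    if bi[(w2, w)] > 0:
        return ALPHA * bi[(w2, w)] / uni[w2]
    return ALPHA * ALPHA * uni[w] / sum(uni.values())


tokens = "the quick brown fox jumps over the lazy dog".split()
uni, bi, tri = ngram_counts(tokens)

print(stupid_backoff("the", "quick", "brown", uni, bi, tri))  # seen trigram
print(stupid_backoff("a", "lazy", "dog", uni, bi, tri))       # backs off to the bigram
print(stupid_backoff("a", "b", "cat", uni, bi, tri))          # unseen word
```

Under this sketch an unseen context still yields a score as long as the candidate word itself appeared in training.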

`predict_top_n(w1, w2, n=5, temperature=1.0) → list[dict]`

Return up to N predictions sorted by probability descending. Each dict: {"word": str, "probability": float, "count": int}.

`complete_sentence(prompt, num_words=5, beam_width=3) → list[dict]`

Generate sentence completions via beam search. Each dict: {"sentence": str, "probability": float}.
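Beam search keeps the `beam_width` best partial sentences at each step instead of only the single best one. The sketch below runs it over a small hand-written next-word table. The table, the function name, and the assumptions that `num_words` counts appended words and that a sentence's probability is the product of its step probabilities are all illustrative, not taken from the library.

```python
import math

# Hypothetical next-word table: (w1, w2) -> {next_word: probability}
TABLE = {
    ("the", "quick"): {"brown": 0.7, "red": 0.3},
    ("quick", "brown"): {"fox": 0.9, "dog": 0.1},
    ("quick", "red"): {"fox": 1.0},
    ("brown", "fox"): {"jumps": 0.6, "sleeps": 0.4},
    ("red", "fox"): {"runs": 1.0},
}


def beam_search(prompt, num_words, beam_width):
    """Return the best completions as dicts with sentence and probability."""
    beams = [(0.0, prompt.split())]  # (log probability, words so far)
    for _ in range(num_words):
        candidates = []
        for logp, words in beams:
            options = TABLE.get(tuple(words[-2:]), {})
            if not options:
                candidates.append((logp, words))  # dead end: carry over unchanged
                continue
            for word, p in options.items():
                candidates.append((logp + math.log(p), words + [word]))
        # Keep only the highest-scoring partial sentences
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return [{"sentence": " ".join(w), "probability": math.exp(lp)} for lp, w in beams]


for result in beam_search("the quick", num_words=3, beam_width=2):
    print(result)
```

With `beam_width=1` this sketch reduces to greedy decoding, which is why the greedy variant is the faster of the two.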

`greedy_generate(prompt, num_words=5) → str`

Fastest sentence completion using greedy decoding.

`perplexity(text) → float`

Compute per-token perplexity on held-out text. Lower is better.
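Per-token perplexity is conventionally the exponential of the negative mean log-probability the model assigns to each token. The sketch below computes it from a list of per-token probabilities; `perplexity_from_probs` is a hypothetical helper, and how the library obtains those probabilities (and handles unseen words) is not covered here.

```python
import math


def perplexity_from_probs(token_probs):
    """exp of the negative mean log-probability of the tokens."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    log_sum = sum(math.log(p) for p in token_probs)
    return math.exp(-log_sum / len(token_probs))


print(perplexity_from_probs([0.5, 0.5, 0.5, 0.5]))
print(perplexity_from_probs([0.9, 0.8, 0.95]))
```

A model that assigns probability 0.5 to every token has a perplexity of about 2, meaning it is as uncertain as a fair choice between two words; more confident predictions push the value toward 1.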

`vocabulary() → set[str]`

All words seen in the first-word position of training trigrams.

`get_stats() → dict`

{"total_trigrams": int, "unique_first_words": int, "vocabulary_size": int}.

`save(path) → None`

Save model to binary file. Compatible with the C CLI tool.

`TrigramModel.load(path) → TrigramModel` (classmethod)

Load a pre-trained binary model. Supports context manager protocol.

`reset() → None`

Clear all training data.

`len(model)` → int

Total stored trigrams.

`"w1 w2" in model` / `("w1", "w2") in model` → bool

Check if a bigram context exists.
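Supporting both the string form and the tuple form comes down to normalising the key inside `__contains__`. The class below is a hypothetical stand-in that shows the idea with a plain set of contexts; it does not lowercase or otherwise preprocess its input, which the real model may do.

```python
class ContextIndex:
    """Hypothetical container that accepts "w1 w2" or ("w1", "w2") keys."""

    def __init__(self, tokens):
        # A context is the first two words of any trigram in the token list
        self._contexts = {(a, b) for a, b, _ in zip(tokens, tokens[1:], tokens[2:])}

    def __contains__(self, key):
        if isinstance(key, str):
            key = tuple(key.split())
        return len(key) == 2 and tuple(key) in self._contexts


index = ContextIndex("the quick brown fox jumps over the lazy dog".split())

print("the quick" in index)
print(("the", "quick") in index)
print("lazy dog" in index)
```

In this sketch the final two words of the corpus do not count as a context, because no third word follows them.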

`repr(model)`

Returns a summary string such as `TrigramModel(trigrams=11,062,203, vocab=97,277)`.

Performance

Operation	Latency
Single word prediction	< 1ms
Top-5 predictions	1–2ms
Beam search (5 words, width 3)	5–10ms
Training (1M words)	~30s
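Latency figures like these depend on hardware and corpus size, so it can be worth measuring on your own machine. The helper below is a hypothetical sketch that times any zero-argument callable with `time.perf_counter` and reports the median; here it times a plain dict lookup as a placeholder, and you would pass a call such as a model prediction instead.

```python
import statistics
import time


def median_latency_ms(fn, repeats=1000):
    """Call fn repeatedly and return the median wall-clock time in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Placeholder workload; replace the lambda with the call you want to measure
lookup = {("the", "quick"): "brown"}
latency = median_latency_ms(lambda: lookup.get(("the", "quick")))
print(f"{latency:.4f} ms")
```

The median is used rather than the mean so that a few slow outliers, for example from the operating system scheduler, do not dominate the result.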

Running Tests

pip install pytest
pytest tests/ -v

Project Structure

Trigrams/
├── trigram/                  # Python library
│   ├── __init__.py
│   ├── _lib.py               # ctypes bindings
│   ├── model.py              # TrigramModel class
│   ├── utils.py              # Text preprocessing
│   └── _trigram_c.dylib      # Compiled C engine (auto-generated)
├── trigram_llm/
│   ├── src/                  # C source files
│   └── include/              # C headers
├── tests/                    # pytest test suite
├── setup.py                  # Build script
└── pyproject.toml

License

MIT License — feel free to use, modify, and distribute.

Project details


This version

0.1.0

Jun 12, 2026

Download files

Source Distribution

trigram_llm-0.1.0.tar.gz (55.0 kB view details)

Uploaded Jun 12, 2026 Source

File details

Details for the file trigram_llm-0.1.0.tar.gz.

File metadata

Download URL: trigram_llm-0.1.0.tar.gz
Upload date: Jun 12, 2026
Size: 55.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for trigram_llm-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`6e5da04f2cfc928b2681e719fe6180e914fab96c3a13f6b29a18c63a2729a518`
MD5	`0d82bf8a619a160bbbe0024e616620df`
BLAKE2b-256	`d7f456a31465332dba005e77771ae98adc8da48784594d74b2fe389ba2bfbfbf`
