Controlled-vocabulary prompts plus portable GBNF and JSON Schema resources for small-word English generation.
smallwords
smallwords is a tiny Python package for controlled-vocabulary prompting plus
portable output resources. It keeps one wordlist at the center of the workflow
so prompt text, GBNF, JSON Schema, and post-generation validation all stay in
sync.
The package ships with a small set of bundled wordlists: direct source-backed
lists such as moby_898, basic_850, and special_english_1475, plus a
couple of intentionally themed remixes. By default, the built-ins also allow
slight family variants such as go, goes, and going.
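To make the idea of "slight family variants" concrete, here is a minimal sketch of one way a base word could be expanded into a few inflected forms. It is not the package's actual rule set: the helper names `family_variants` and `expand` are hypothetical, and the suffix rules are deliberately naive, so they also produce non-words such as "goed".

```python
def family_variants(word: str) -> set[str]:
    """Return a naive set of inflected forms for a base word (illustrative only)."""
    forms = {word, word + "s", word + "ing", word + "ed"}
    # Words such as "go" or "watch" take "-es" in the third person.
    if word.endswith(("s", "x", "z", "ch", "sh", "o")):
        forms.add(word + "es")
    # Words ending in a silent "e" usually drop it before "-ing".
    if word.endswith("e"):
        forms.add(word[:-1] + "ing")
        forms.add(word + "d")
    return forms


def expand(words) -> set[str]:
    """Union the variant sets of every word in a list."""
    allowed: set[str] = set()
    for word in words:
        allowed |= family_variants(word.lower())
    return allowed


allowed = expand(["go", "make"])
print(sorted(allowed))
```

Irregular forms such as "went" are not covered by suffix rules like these; a real implementation would need an explicit table or a morphological library.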
It supports Python 3.10 and newer.
The hosted API-and-examples docs live at
cmccomb.github.io/smallwords.
Installation
pip install smallwords
For local development, create and activate a virtualenv first:
python -m venv .venv
source .venv/bin/activate
python -m pip install -e ".[dev]"
Quick Start
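from smallwords import OutputResources, OutputShape, allow_input_words, is_compliant
from smallwords.prompts import build_prompt

# Limit the answer to a single line of at most 24 words.
shape = OutputShape(max_words_per_line=24, max_lines=1)
# Start from basic_850 and also allow the words that appear in the question.
spec = allow_input_words("basic_850", "How does a bridge work?")
# Hard constraints (GBNF and JSON Schema) built from the same spec.
resources = OutputResources.from_wordlist(spec, shape=shape)
# Soft instruction layer: the prompt text.
prompt = build_prompt("explain", "How does a bridge work?", wordlist=spec)
schema = resources.json_schema(key="answer", title="bridge_explanation")
# Offline check of a candidate answer against the spec.
text = "A bridge is a structure that helps people and things move across a river or a deep place."
ok = is_compliant(text, spec)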
The contrast is the point. build_prompt(...) is the soft instruction layer.
OutputResources gives you the matching hard constraints in both GBNF and JSON
Schema form. is_compliant(...) is the lightweight offline check.
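For readers unfamiliar with the two hard-constraint formats, the sketch below shows roughly what a word-level grammar and a matching JSON Schema could look like for a three-word vocabulary. It is a standalone illustration and makes no claim about what OutputResources actually emits: the functions `gbnf_for_words` and `schema_for_words` are hypothetical, and the sketch ignores capitalization and punctuation.

```python
import json
import re


def gbnf_for_words(words, max_words):
    """Build a tiny GBNF grammar: one word, then up to max_words - 1 more."""
    # json.dumps gives a double-quoted literal, which GBNF accepts for plain words.
    alternatives = " | ".join(json.dumps(w) for w in sorted(words))
    return "\n".join([
        f'root ::= word (" " word){{0,{max_words - 1}}}',
        f"word ::= {alternatives}",
    ])


def schema_for_words(words, max_words, key="answer"):
    """Build a JSON Schema whose string pattern accepts the same word sequences."""
    group = "(?:" + "|".join(re.escape(w) for w in sorted(words)) + ")"
    # JSON Schema patterns are unanchored by default, so anchor explicitly.
    pattern = f"^{group}(?: {group}){{0,{max_words - 1}}}$"
    return {
        "type": "object",
        "properties": {key: {"type": "string", "pattern": pattern}},
        "required": [key],
        "additionalProperties": False,
    }


words = {"a", "bridge", "is"}
grammar = gbnf_for_words(words, max_words=24)
schema = schema_for_words(words, max_words=24)
print(grammar)
print(json.dumps(schema, indent=2))
```

Deriving both artifacts from one word set is what keeps them in sync: a change to the vocabulary changes the grammar and the schema together.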
If you want the model to be able to repeat topic or question terms such as
bridge, neighbor, or order, use allow_input_words(...) once and pass
that derived spec into the prompt, resources, and validation helpers together.
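The reason for deriving the spec once can be sketched without the package at all. The example below uses plain sets and hypothetical helpers (`with_input_words`, `violations`) to show how a check against the base list rejects a topic word, while a check against the derived list accepts it. The tokenizer is an assumption and only handles lowercase ASCII letters.

```python
import re


def words_in(text):
    """Split text into lowercase alphabetic tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def with_input_words(base, *texts):
    """Return the base vocabulary extended with every word found in the inputs."""
    extra = {word for text in texts for word in words_in(text)}
    return frozenset(base) | extra


def violations(text, allowed):
    """List the words in text that are not in the allowed vocabulary."""
    return [word for word in words_in(text) if word not in allowed]


base = {"a", "is", "thing", "that", "helps", "people"}
derived = with_input_words(base, "How does a bridge work?")

answer = "A bridge is a thing that helps people"
print(violations(answer, base))     # the topic word is rejected
print(violations(answer, derived))  # the topic word is allowed
```

If the prompt were built from the derived list but validation used the base list, the model would be told it may say "bridge" and then be marked non-compliant for doing so.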
Built-In Wordlists
- moby_898: the full normalized alpha-only Moby Words II frequency list
- basic_850: Charles Ogden's Basic English 850
- special_english_1475: Voice of America Special English
- caveman_898: a size-neutral surface-only moby_898 remix with caveman adjustments
- pirate_898: a size-neutral moby_898 remix with pirate adjustments
The bundled text files live in src/smallwords/data/. moby_898,
basic_850, and special_english_1475 are direct source-backed lists.
caveman_898 and pirate_898 are derived size-neutral remixes built on top of
moby_898.
The themed remixes live in src/smallwords/themes/caveman.py and
src/smallwords/themes/pirate.py. If you want to build your own, use
remix_wordlist(...) with a base list plus curated additions and removals.
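As a rough illustration of what a size-neutral remix involves, the sketch below swaps words in and out of a base list while keeping its length unchanged. It does not use remix_wordlist(...) and does not reproduce its signature: the function `remix` is hypothetical, and reading "size-neutral" as "equal numbers of additions and removals" is an assumption.

```python
def remix(base, additions, removals):
    """Return a sorted word list with removals swapped out for additions."""
    base_set = set(base)
    add_set = set(additions)
    remove_set = set(removals)

    missing = remove_set - base_set
    if missing:
        raise ValueError(f"cannot remove words not in the base list: {sorted(missing)}")
    clash = add_set & base_set
    if clash:
        raise ValueError(f"additions already in the base list: {sorted(clash)}")
    # Assumption: size-neutral means the remix has as many words as the base.
    if len(add_set) != len(remove_set):
        raise ValueError("a size-neutral remix needs equal additions and removals")

    return sorted((base_set - remove_set) | add_set)


base = ["computer", "friend", "hello"]
pirate = remix(base, additions=["ahoy"], removals=["computer"])
print(pirate)
```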
Contrastive Example
This is the clearest way to see what smallwords is trying to do. Both blocks
below are genuine local Qwen outputs from April 5, 2026. The first uses a plain
prompt. The second uses the same base prompt plus an explicit basic_850
vocabulary list, the topic word bridge, and the generated GBNF.
A plain prompt stays fairly natural:
A bridge connects two points, usually across a body of water or a gap, allowing people and vehicles to cross safely.
A run constrained to basic_850 plus the topic words stays simpler while still
sounding reasonably natural:
A bridge is a structure that helps people and things move across a river or a deep place.
These runs use llama-server from llama.cpp and Qwen/Qwen3-8B-GGUF via
bartowski/Qwen_Qwen3-8B-GGUF.
Reproduce that comparison from a clone of the repository with an activated virtualenv:
llama-server -hf bartowski/Qwen_Qwen3-8B-GGUF:q4_k_m --host 127.0.0.1 --port 8080 --reasoning-budget 0 --log-disable
python examples/readme_bridge_contrast.py
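The example script itself is not reproduced here. As a hedged sketch of the general mechanism, the code below builds a request for llama-server's `/completion` endpoint, which accepts a `grammar` field alongside the prompt. The helper names, the toy grammar, and the `n_predict` value are assumptions for illustration, the host and port simply mirror the command above, and the real script may use a different endpoint or apply a chat template.

```python
import json
import urllib.request


def build_completion_payload(prompt, grammar, n_predict=64):
    """Assemble the JSON body for a grammar-constrained completion request."""
    return {"prompt": prompt, "grammar": grammar, "n_predict": n_predict}


def complete(payload, base_url="http://127.0.0.1:8080"):
    """POST the payload to a running llama-server and return the generated text."""
    request = urllib.request.Request(
        f"{base_url}/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["content"]


# A toy grammar; in practice this would be the grammar generated from the wordlist.
toy_grammar = 'root ::= word (" " word)*\nword ::= "a" | "bridge" | "is"'
payload = build_completion_payload("How does a bridge work?", toy_grammar)

# Uncomment once llama-server is listening on 127.0.0.1:8080:
# print(complete(payload))
```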
Examples
See the repository's examples/README.md for the runnable examples. The current
example set is live-model based: the README bridge contrast, a focused pirate
greeting, and a focused technical rewrite all call a live llama-server model
with a prompt plus generated grammar.
Development
Run these commands from an activated virtualenv:
python -m pip install -e ".[dev]"
python -m ruff check .
python -m ruff format --check .
python -m pytest
python scripts/check_documentation.py
python -m sphinx -W --keep-going -b html docs docs/_build/html
python -m build
python -m twine check --strict dist/*
CI runs linting, tests, the documentation policy check, a >=90% coverage
gate, a Sphinx docs build, and a package build on GitHub Actions.
For release steps and Trusted Publishing setup, see
RELEASING.md.