Controlled-vocabulary prompts plus portable GBNF and JSON Schema resources for small-word English generation.

Project description

smallwords

smallwords is a tiny Python package for controlled-vocabulary prompting plus portable output resources. It keeps one wordlist at the center of the workflow so prompt text, GBNF, JSON Schema, and post-generation validation all stay in sync.

The package ships with a small set of bundled wordlists: direct source-backed lists such as moby_898, basic_850, and special_english_1475, plus a couple of intentionally themed remixes. By default, the built-ins also allow slight family variants such as go, goes, and going.
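To illustrate the variant idea (this is a naive standalone sketch, not the package's actual variant rules), a simple expander might accept common inflected forms of each base word:

```python
# Illustrative sketch only: a naive family-variant expander.
# smallwords' real rules are not documented here; this just shows the idea
# of accepting forms like "go", "goes", and "going" from one base entry.

def expand_variants(word: str) -> set[str]:
    """Return a small set of common inflected forms for one base word."""
    variants = {word, word + "s", word + "es", word + "ed", word + "ing"}
    if word.endswith("e"):
        # e.g. "move" -> "moving", "moved"
        variants.update({word[:-1] + "ing", word + "d"})
    return variants

print(sorted(expand_variants("go")))
```

A naive suffix rule like this overgenerates (it also emits non-words), which is presumably why a curated package handles variants more carefully.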

It supports Python 3.10 and newer.

The hosted API-and-examples docs live at cmccomb.github.io/smallwords.

Installation

pip install smallwords

For local development, create and activate a virtualenv first:

python -m venv .venv
source .venv/bin/activate
python -m pip install -e ".[dev]"

Quick Start

from smallwords import OutputResources, OutputShape, allow_input_words, is_compliant
from smallwords.prompts import build_prompt

shape = OutputShape(max_words_per_line=24, max_lines=1)
spec = allow_input_words("basic_850", "How does a bridge work?")
resources = OutputResources.from_wordlist(spec, shape=shape)
prompt = build_prompt("explain", "How does a bridge work?", wordlist=spec)
schema = resources.json_schema(key="answer", title="bridge_explanation")

text = "A bridge is a structure that helps people and things move across a river or a deep place."
ok = is_compliant(text, spec)

The contrast is the point: build_prompt(...) is the soft instruction layer, OutputResources supplies the matching hard constraints in both GBNF and JSON Schema form, and is_compliant(...) is the lightweight offline check.

If you want the model to be able to repeat topic or question terms such as bridge, neighbor, or order, use allow_input_words(...) once and pass that derived spec into the prompt, resources, and validation helpers together.
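The core workflow can be sketched in plain Python. The functions below reuse the library's names but are illustrative reimplementations of the idea, not smallwords' own code: one allowed-word set drives both topic-word injection and the final check.

```python
import re

# Standalone sketch of the compliance idea, not smallwords' own code.

def allow_input_words(base_words: set[str], question: str) -> set[str]:
    """Derive an allowed set: the base list plus words from the input text."""
    return base_words | set(re.findall(r"[a-z]+", question.lower()))

def is_compliant(text: str, allowed: set[str]) -> bool:
    """True if every alphabetic token in text is in the allowed set."""
    return all(tok in allowed for tok in re.findall(r"[a-z]+", text.lower()))

base = {"a", "is", "that", "helps", "people", "move", "across", "river"}
spec = allow_input_words(base, "How does a bridge work?")
print(is_compliant("A bridge is across a river", spec))  # "bridge" comes from the question
```

Because the same derived set feeds the prompt, the grammar, and the validator, a word the model is told it may use is never rejected after generation.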

Built-In Wordlists

  • moby_898: the full normalized alpha-only Moby Words II frequency list
  • basic_850: Charles Ogden's Basic English 850
  • special_english_1475: Voice of America Special English
  • caveman_898: a size-neutral surface-only moby_898 remix with caveman adjustments
  • pirate_898: a size-neutral moby_898 remix with pirate adjustments

The bundled text files live in src/smallwords/data/. moby_898, basic_850, and special_english_1475 are direct source-backed lists. caveman_898 and pirate_898 are derived size-neutral remixes built on top of moby_898.

The themed remixes live in src/smallwords/themes/caveman.py and src/smallwords/themes/pirate.py. If you want to build your own, use remix_wordlist(...) with a base list plus curated additions and removals.
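In spirit, a remix is just set arithmetic over the base list. The helper below is a hypothetical sketch; the actual remix_wordlist(...) signature may differ:

```python
# Hypothetical sketch in the spirit of remix_wordlist(...);
# the real smallwords signature may differ.

def remix_wordlist(base: set[str], additions: set[str], removals: set[str]) -> set[str]:
    """Return a new list: the base words minus removals, plus curated additions."""
    return (base - removals) | additions

base = {"hello", "friend", "ship", "gold"}
pirate = remix_wordlist(base, additions={"ahoy", "matey"}, removals={"hello"})
print(sorted(pirate))
```

Applying additions and removals as one operation keeps a remix "size-neutral" only if the two curated sets are the same size, which matches how caveman_898 and pirate_898 keep the 898-word count of their base list.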

Contrastive Example

This is the clearest way to see what smallwords is trying to do. Both blocks below are genuine local Qwen outputs from April 5, 2026. The first uses a plain prompt. The second uses the same base prompt plus an explicit basic_850 vocabulary list, the topic word bridge, and the generated GBNF.

A plain prompt stays fairly natural:

A bridge connects two points, usually across a body of water or a gap, allowing people and vehicles to cross safely.

A constrained basic_850 + topic words run stays simpler while still sounding reasonably natural:

A bridge is a structure that helps people and things move across a river or a deep place.

These runs use llama-server from llama.cpp with Qwen3-8B in GGUF form (bartowski/Qwen_Qwen3-8B-GGUF).

Reproduce that comparison from a clone of the repository with an activated virtualenv:

llama-server -hf bartowski/Qwen_Qwen3-8B-GGUF:q4_k_m --host 127.0.0.1 --port 8080 --reasoning-budget 0 --log-disable
python examples/readme_bridge_contrast.py

Examples

See the repository's examples/README.md for the runnable examples. All current examples run against a live model: the README bridge contrast, a focused pirate greeting, and a focused technical rewrite each call a live llama-server with a prompt plus a generated grammar.
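A minimal client for those examples can be sketched with the standard library. llama-server's /completion endpoint accepts a "grammar" field holding GBNF text; the URL and sampling values below are assumptions for illustration, and the exact fields the bundled examples send may differ:

```python
import json
import urllib.request

# Sketch of a grammar-constrained call to a local llama-server.
# SERVER_URL and n_predict are illustrative assumptions.
SERVER_URL = "http://127.0.0.1:8080/completion"

def build_payload(prompt: str, gbnf_grammar: str, n_predict: int = 64) -> dict:
    """Assemble the JSON body for a grammar-constrained completion request."""
    return {"prompt": prompt, "grammar": gbnf_grammar, "n_predict": n_predict}

def complete(prompt: str, gbnf_grammar: str) -> str:
    """POST to a running llama-server and return the generated text."""
    body = json.dumps(build_payload(prompt, gbnf_grammar)).encode()
    req = urllib.request.Request(
        SERVER_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]
```

The grammar string would come from the OutputResources object built in the Quick Start, so the server enforces at decode time exactly what is_compliant(...) checks after the fact.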

Development

Run these commands from an activated virtualenv:

python -m pip install -e ".[dev]"
python -m ruff check .
python -m ruff format --check .
python -m pytest
python scripts/check_documentation.py
python -m sphinx -W --keep-going -b html docs docs/_build/html
python -m build
python -m twine check --strict dist/*

CI runs linting, tests, the documentation policy check, a >=90% coverage gate, a Sphinx docs build, and a package build on GitHub Actions.

For release steps and Trusted Publishing setup, see RELEASING.md.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

smallwords-0.1.0.tar.gz (42.7 kB)

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

smallwords-0.1.0-py3-none-any.whl (36.1 kB)

File details

Details for the file smallwords-0.1.0.tar.gz.

File metadata

  • Download URL: smallwords-0.1.0.tar.gz
  • Upload date:
  • Size: 42.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for smallwords-0.1.0.tar.gz
Algorithm Hash digest
SHA256 fe5d52f35adf5ede03d9ebda9fb2a70dcbb18b02af2f5c0874c9de702e12c95e
MD5 6ee10d61d2c4084e4b7357fdcf3fa825
BLAKE2b-256 21b86f7e17c2c813d2a5e3fc7af15b04db8478939f59ef139cee3dd8902249b7

See more details on using hashes here.

Provenance

The following attestation bundles were made for smallwords-0.1.0.tar.gz:

Publisher: workflow.yml on cmccomb/smallwords

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file smallwords-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: smallwords-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 36.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for smallwords-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 0b2fee234eb27bd4fb7d984d2c8864b06f36a5ae920f0eff843b5786e404e363
MD5 bd4e05e29ba4e8f49c6f862476fe0424
BLAKE2b-256 a5b900fc390da576124f111cdcfe0e6937a9d949b1e95968bf57e7cf4d8b7412

See more details on using hashes here.

Provenance

The following attestation bundles were made for smallwords-0.1.0-py3-none-any.whl:

Publisher: workflow.yml on cmccomb/smallwords

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
