DSCL — Dictionary-Scoped Composition Layer
A model-native linguistic constraint layer for LLM output.
The narrative is the input. The meaning already exists.
DSCL governs how that meaning gets represented — at the word, sentence, paragraph, and page level — using frequency-grounded vocabulary, plain language standards, and structured composition rules.
Each word carries weight, every syllable carries tone, every sentence carries context, and every paragraph carries narrative.
The problem
LLMs default to a recognisable pattern: passive voice, long intro clauses, hedged language, abstract words, buried main points. The same five adjectives appear in every document. The reading grade sits at 12 when the domain needs 6. The output sounds like a machine wrote it because a machine did — and nothing constrained how it wrote.
DSCL intercepts before generation and constrains how meaning gets represented. Not what the meaning is — that is always the user's. Only how it is written.
Narrative → Words → Sentences → Paragraphs → Page
How it works
At each level, a different set of rules fires:
| Level | What it controls | Source files |
|---|---|---|
| Word | Which words are permitted. Simple word first — always. Frequency-grounded against COCA data. | primary_vocab.json, domain_exceptions.json, register_tags.json |
| Sentence | How words form sentences. Subject leads. Actor before action. Main point first. | sentence_structure.json, clause_rules.json, voice_rules.json |
| Paragraph | How sentences build paragraphs. One idea per block. Topic sentence leads. No drift. | paragraph_structure.json, sentence_length.json |
| Page | How paragraphs assemble into a finished page a person can read and act on. | domain_register.json, grade_targets.json, conflict_resolution.json |
The LLM does not invent meaning. The user already has it. DSCL constrains the representation — not the content.
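The word-level rule in the table above can be sketched in a few lines. This is an illustration of the principle only, not DSCL's implementation; the ranks and synonym pairs below are made-up stand-ins for COCA data.

```python
# Illustrative frequency ranks and synonym pairs — placeholders, not COCA data.
FREQ_RANK = {"use": 92, "utilize": 4870, "help": 130, "facilitate": 3650}
SYNONYMS = {"utilize": "use", "facilitate": "help"}

def prefer_simple(word: str, cutoff: int = 5000) -> str:
    """Swap a word for its simpler synonym when the original is rarer."""
    simple = SYNONYMS.get(word.lower())
    if simple and FREQ_RANK.get(word.lower(), cutoff) > FREQ_RANK.get(simple, cutoff):
        return simple
    return word

print(prefer_simple("utilize"))  # use
print(prefer_simple("strait"))  # strait — no simpler synonym, kept as-is
```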
Install
```shell
pip install dscl
# or
uv add dscl
```
For more accurate structural checking (passive voice, nominalization detection):
```shell
pip install "dscl[nlp]"
dscl setup-nlp
```
Quick start
```python
from dscl import DSCL

dscl = DSCL()
constrained = dscl.prepare("Write a travel risk assessment for Beirut.")

print(constrained.domain)     # "travel_risk"
print(constrained.fk_target)  # "5-7"
print(constrained.register)   # "abstract_general"
```
Pass the constrained context directly to any model:
```python
# Anthropic
import anthropic
from dscl import DSCL

client = anthropic.Anthropic()
dscl = DSCL()
constrained = dscl.prepare("Write a travel risk assessment for Beirut.")

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=constrained.system_prompt,
    messages=[{"role": "user", "content": constrained.prompt}],
)

result = dscl.validate(response.content[0].text, domain=constrained.domain)
print(result.output)
print(result.fk_grade)            # 5.8
print(result.violations)          # [] — clean pass
print(result.fixes_applied)       # ["SS_01", "VR_01"]
print(result.sentence_variation)  # 0.43 — natural variation
print(result.vocabulary_variety)  # 0.89 — good range
```
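DSCL does not publish the exact formulas behind sentence_variation and vocabulary_variety. One plausible reading, sketched below under that assumption, is the coefficient of variation of sentence lengths and the type-token ratio; DSCL's own definitions may differ.

```python
import re
from statistics import mean, stdev

def sentence_variation(text: str) -> float:
    """Coefficient of variation of sentence lengths (one plausible definition)."""
    lengths = [len(s.split()) for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    return stdev(lengths) / mean(lengths) if len(lengths) > 1 else 0.0

def vocabulary_variety(text: str) -> float:
    """Type-token ratio: unique words over total words."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(set(words)) / len(words) if words else 0.0

text = ("Iran closed the strait. Traffic fell hard. "
        "Shipping lines suspended all transits through the route.")
print(round(sentence_variation(text), 2))  # 0.53
print(round(vocabulary_variety(text), 2))  # 0.93
```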
```python
# OpenAI
from openai import OpenAI
from dscl import DSCL

client = OpenAI()
dscl = DSCL()
constrained = dscl.prepare("Summarize the risks of this contract.")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": constrained.system_prompt},
        {"role": "user", "content": constrained.prompt},
    ],
)

result = dscl.validate(response.choices[0].message.content, domain=constrained.domain)
print(result.output)
```
```python
# Ollama (local models)
import ollama
from dscl import DSCL

dscl = DSCL()
constrained = dscl.prepare("Explain this medical procedure to a patient.")

# Recommended inference options for Qwen3 8B in non-thinking mode.
# Adjust per model — see your model's documentation for optimal settings.
response = ollama.chat(
    model="qwen3:8b",
    options={
        "temperature": 0.7,
        "top_p": 0.85,
        "top_k": 40,
        "min_p": 0.0,
        "repeat_penalty": 1.5,
        "num_predict": 2048,
    },
    messages=[
        {"role": "system", "content": constrained.system_prompt},
        {"role": "user", "content": constrained.prompt},
    ],
)

result = dscl.validate(response["message"]["content"], domain=constrained.domain)
print(result.output)
```
Flags
only_text — plain prose output
Instructs the model to write in paragraphs only. No markdown headers, bullet points, bold text, tables, or numbered lists. Use when your pipeline or app renders plain text.
```python
constrained = dscl.prepare(narrative, only_text=True)
```
live — current information
Signals that the model has web access and should use real-time information for time-sensitive topics. Prevents the model from confidently stating stale training data as current fact. For models without web access, instructs the model to flag uncertainty and direct the reader to current sources.
```python
constrained = dscl.prepare(narrative, live=True)
```
Combined
```python
constrained = dscl.prepare(narrative, only_text=True, live=True)
```
The combination for live-feed product pipelines — current information rendered as clean plain prose.
Test script flags
test_dscl.py ships with the repository and covers the full diagnostic workflow.
```shell
python test_dscl.py
python test_dscl.py --narrative "Write a travel advisory for Dubai."
python test_dscl.py --only-text
python test_dscl.py --live
python test_dscl.py --compare
python test_dscl.py --live --only-text --compare
```
| Flag | What it does |
|---|---|
| --narrative "..." | The narrative to process. Defaults to a software licence summary if omitted. |
| --only-text | Output plain prose only. No markdown, bullets, headers, or tables. |
| --live | Injects a currency instruction. Use with models that have web access. |
| --compare | Runs the narrative twice — once with DSCL, once without — and prints a side-by-side comparison of FK grade, sentence variation, vocabulary displacement, violations, and word count. |
| --live --only-text | Current information rendered as plain prose. The combination for live-feed pipelines. |
| --live --only-text --compare | Full diagnostic run. Constrained vs baseline with live signal and plain prose output. |
Template commands
Prefix your narrative with @template_name to use a specific report template:
```python
narrative = "@travel_risk Write a country risk assessment for Dubai..."
narrative = "@corporate_security_intel India-Pakistan water treaty dispute..."
```
Any key in dictionary/vocabulary/templates.json is automatically a valid command. Add a template to the JSON and it is available immediately — no code changes needed.
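A minimal sketch of how that @command dispatch can work. This is hypothetical, not DSCL's parser; the TEMPLATES dict stands in for the contents of dictionary/vocabulary/templates.json.

```python
import re

# Placeholder for the templates JSON — any key here is a valid @command.
TEMPLATES = {
    "travel_risk": {"description": "country risk assessment"},
    "corporate_security_intel": {"description": "corporate security brief"},
}

def resolve_template(narrative: str):
    """Split a leading @command off the narrative and look it up."""
    match = re.match(r"@(\w+)\s+(.*)", narrative, re.DOTALL)
    if match and match.group(1) in TEMPLATES:
        return TEMPLATES[match.group(1)], match.group(2)
    return None, narrative

template, text = resolve_template("@travel_risk Write a country risk assessment for Dubai.")
print(template["description"])  # country risk assessment
print(text)                     # Write a country risk assessment for Dubai.
```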
Override domain manually
```python
constrained = dscl.prepare(
    "Draft a liability clause for a SaaS agreement.",
    domain="legal_contract",
    audience="general_public",
)
```
Validate existing output
```python
from dscl import DSCL

dscl = DSCL()
result = dscl.validate(existing_text, domain="travel_risk")

print(result.fk_grade)                 # Flesch-Kincaid grade
print(result.grade_target_closeness)   # above 0.65 = pass
print(result.sentence_variation)       # above 0.40 = natural variation
print(result.vocabulary_variety)       # above 0.58 = good vocabulary range
print(result.vocabulary_displacement)  # avg COCA rank — lower = plainer language
print(result.violations)               # rule violations found
print(result.fixes_applied)            # rule IDs applied automatically
print(result.passed)                   # True if no hard violations
```
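Those thresholds lend themselves to a hard gate in a production pipeline. The wrapper below is a hypothetical sketch built on the documented threshold values, not part of the DSCL API; `result` stands in for the object validate() returns.

```python
from types import SimpleNamespace

class GateError(Exception):
    pass

def gate(result) -> None:
    """Raise when any validation metric falls below its documented floor."""
    checks = {
        "grade_target_closeness": (result.grade_target_closeness, 0.65),
        "sentence_variation": (result.sentence_variation, 0.40),
        "vocabulary_variety": (result.vocabulary_variety, 0.58),
    }
    failed = [name for name, (score, floor) in checks.items() if score < floor]
    if failed or not result.passed:
        raise GateError(f"validation failed: {failed or result.violations}")

# Stand-in for a real validate() result:
sample = SimpleNamespace(grade_target_closeness=1.0, sentence_variation=0.43,
                         vocabulary_variety=0.89, passed=True, violations=[])
gate(sample)  # passes silently
```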
What the validator measures
```
Word-level checks
  Banned words found      : 0
  Replacements applied    : 2
  Vocabulary displacement : rank 2,840 — plain language, strong

Sentence-level checks
  Passive voice instances : 0
  Nominalizations found   : 0
  Sentences checked       : 12

Readability (reference)
  FK grade                : 6.9    target 6–8
  Grade target closeness  : 1.0    above 0.65 threshold — pass
  Sentence variation      : 0.429  above 0.40 — natural variation
  Vocabulary variety      : 0.890  above 0.58 — good range

Result
  Passed        : True
  Fixes applied : ['RD_01']
  Violations    : []
```
On the FK grade: it is a reference number, not the primary target. 65% compliance with the target range is the realistic ceiling for natural language. A score above that threshold counts as a full pass. Perfect scores do not exist in real prose.
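For reference, the Flesch-Kincaid Grade Level formula is 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below pairs it with a naive vowel-cluster syllable counter; the real formula, but a stand-in counter that over- or under-counts some words (e.g. silent e).

```python
import re

def syllables(word: str) -> int:
    """Naive vowel-cluster count — a rough heuristic, not a dictionary lookup."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level: 0.39*(words/sents) + 11.8*(syls/words) - 15.59."""
    sents = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syls = sum(syllables(w) for w in words)
    return 0.39 * (len(words) / len(sents)) + 11.8 * (syls / len(words)) - 15.59

print(round(fk_grade("Iran closed the strait. Ship traffic fell fast."), 1))
```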
Real output example
Input narrative — user-supplied, current events:
USS Spruance fired on and seized Iranian cargo vessel MV Touska on April 19 in the Arabian Sea after a six-hour standoff. Iran closed the Strait of Hormuz to vessels bound for US and allied ports. Ship transits dropped 95% from 130 per day in February to 6 in March. Maersk, MSC, Hapag-Lloyd, and CMA CGM have suspended transits. QatarEnergy declared force majeure March 4 removing 20% of global LNG supply overnight. The IEA called this the largest supply disruption in oil market history. Brent crude is above 95 dollars. UNCTAD projects global trade growth falling from 4.7% in 2025 to 1.5–2.5% in 2026.
DSCL output (only_text=True):
On April 19, the USS Spruance fired on and seized the Iranian cargo vessel MV Touska in the Arabian Sea. The standoff lasted six hours before US forces disabled the ship's engines and took it into custody. This action occurred under a US naval blockade of Iranian ports that began on April 13.
Iran responded by closing the Strait of Hormuz to ships bound for US and allied ports. Iran has also threatened direct retaliation. A ceasefire between the US and Iran expires today, April 21.
Ship traffic through the strait has collapsed. Daily transits fell from 130 in February to just 6 in March, a drop of 95%. Major shipping lines including Maersk, MSC, Hapag-Lloyd, and CMA CGM have all suspended transits.
Energy markets have been hit hard. QatarEnergy declared force majeure on March 4, removing 20% of global LNG supply at once. Brent crude has risen above 95 dollars per barrel. The IEA called this the largest supply disruption in oil market history.
The disruption reaches beyond energy. One third of global fertilizer trade and 85% of Middle East polyethylene exports move through this route. UNCTAD projects global trade growth falling from 4.7% in 2025 to between 1.5% and 2.5% in 2026.
Validation result:
```
FK grade               : 6.9    target 6–8 ✓
Grade target closeness : 1.0    ✓ pass
Sentence variation     : 0.429  ✓ natural
Vocabulary variety     : 0.890  ✓ good
Passed                 : True
```
The meaning carried from the user's narrative is unchanged. Only the representation was constrained.
The dictionary
```
dictionary/
├── system/
│   ├── pipeline.json             master load sequence
│   ├── entry_rules.json          three entry facts before anything else
│   ├── scope.json                what DSCL does not control
│   └── defaults.json             fallback values
├── vocabulary/
│   ├── primary_vocab.json        COCA top-5k frequency words
│   ├── fallback_vocab.json       COCA 5k–10k
│   ├── domain_vocab.json         preferred words per domain
│   ├── domain_exceptions.json    words allowed despite vocabulary rules
│   └── templates.json            report structure templates
├── context/
│   ├── domain_register.json      domain → register and FK target mapping
│   ├── register_tags.json        register definitions
│   └── rules.json                global and domain-scoped language rules
├── syntax/
│   ├── sentence_structure.json
│   ├── clause_rules.json
│   ├── paragraph_structure.json
│   └── punctuation_function.json
├── grammar/
│   ├── voice_rules.json
│   ├── verb_placement.json
│   ├── modifier_rules.json
│   └── nominalization_blocklist.json
├── readability/
│   ├── grade_targets.json
│   ├── sentence_length.json
│   └── syllable_load.json
└── meta/
    ├── conflict_resolution.json
    └── weight_definitions.json
```
The dictionary is the core of DSCL. The library code is the interface to it. Every word added, every domain mapped, every template built compounds over time.
Adding custom templates
Copy _template_blank from dictionary/vocabulary/templates.json, give it a new key, fill in the structure. It becomes available immediately as an @command — no code changes needed.
```json
"my_report": {
  "description": "...",
  "composition_order": ["section_1", "section_2"],
  "render_order": ["section_1", "section_2"],
  "sections": { ... }
}
```
Then use it:
```python
constrained = dscl.prepare("@my_report ...")
```
Custom templates go in dictionary/custom/templates/ and are never overwritten by library updates.
NLP accuracy
| Check | Without [nlp] | With [nlp] |
|---|---|---|
| Passive voice | Pattern matching — catches common forms | Dependency parsing — catches all forms |
| Nominalizations | Exact word matching | Morphological analysis — catches all inflected forms |
| Word scanning | FlashText single pass | FlashText single pass |
| FK grade | Vowel cluster syllable algorithm | Same algorithm |
The core library works without [nlp]. Install it when output accuracy matters more than install size.
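The pattern-matching tier for passive voice can be as small as one regular expression: a be-verb followed by a likely past participle. This is a sketch of the idea, not DSCL's actual pattern; it catches common forms but misses irregular participles and split auxiliaries, which is what the [nlp] dependency-parsing tier exists to fix.

```python
import re

# Be-verb followed by a word ending in -ed/-en: a crude passive-voice signal.
PASSIVE = re.compile(
    r"\b(?:am|is|are|was|were|be|been|being)\s+\w+(?:ed|en)\b",
    re.IGNORECASE,
)

def passive_hits(text: str) -> list[str]:
    """Return the passive-looking spans found by the pattern."""
    return PASSIVE.findall(text)

print(passive_hits("The vessel was seized. US forces disabled the engines."))
# ['was seized'] — the active second sentence is correctly ignored
```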
Grounded sources
The dictionary is built on established linguistic sources:
- Corpus of Contemporary American English (COCA) — frequency data
- Flesch-Kincaid Grade Level formula — readability standard
- Plain Writing Act of 2010
- Strunk & White, The Elements of Style
- AP Stylebook
- WordNet
- Penn Treebank
- BNC/COCA word family lists
- Google Trillion Word Corpus
License
MIT