Parseltongue: a DSL for systems which refuse to speak falsehood
Parseltongue
A DSL for systems that refuse to speak falsehood.
03.03: CLI tool beta released! Install with:

```bash
pip install "parseltongue-dsl[cli]"
```
Facts shown in red were hallucinated by Claude 4.6 Sonnet:
Explanation: The screenshot shows the critique that the LLM provided in the markdown document for validating the core module. The critique has no factual basis: it was hallucinated by one of the best LLMs on the market, and the ungrounded facts are highlighted in red.
Rationale - Why?
LLMs hallucinate. They produce fluent, confident text that may have no basis in the source material. Traditional approaches treat this as a retrieval problem — feed the model better context and hope for the best. But even with perfect retrieval, nothing stops the model from inventing facts, misquoting sources, or drawing conclusions that don't follow from the evidence.
Parseltongue takes a different approach: instead of asking an LLM to summarize documents, we ask it to encode each of the documents as a logic system. Every extracted fact must cite a verbatim quote. Every conclusion must derive from stated premises. And every derivation is checked.
This gives us two things that prose summaries cannot:
- Hallucination detection. Every claim traces back to a quote in a source document. If the LLM fabricates a fact, quote verification fails, and that failure propagates automatically to every conclusion that depends on it. You don't just catch the lie; you see everything it contaminates. It also means the user only needs to verify the foundation, the basic facts: the conclusions are guaranteed to follow from them.
- Cross-document consistency checking. Put plainly, we check whether the ground truth itself can be trusted. The formal system makes it possible to compute the same value via independent paths, for example a reported growth percentage versus one calculated from absolute revenue figures in a different document. When these paths disagree, the system flags a divergence. This catches not only LLM errors but also genuine inconsistencies in the source documents.
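The propagation described in the first point can be pictured as a walk over a dependency graph. The sketch below is not Parseltongue's implementation: `Claim`, `verify_quote`, and `contaminated` are hypothetical names, the sample document is invented, and quote verification is reduced to a plain substring test.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Claim:
    """A fact (backed by a quote) or a conclusion (backed by premises)."""
    name: str
    quote: Optional[str] = None                         # verbatim quote, for facts
    premises: List[str] = field(default_factory=list)   # names this claim depends on


def verify_quote(quote: str, document: str) -> bool:
    # Simplification: an exact substring match stands in for real quote verification.
    return quote in document


def contaminated(claims: Dict[str, Claim], document: str) -> Set[str]:
    """Return every claim that is ungrounded or depends on an ungrounded claim."""
    bad = {
        name for name, claim in claims.items()
        if claim.quote is not None and not verify_quote(claim.quote, document)
    }
    changed = True
    while changed:  # keep propagating until no new claim is tainted
        changed = False
        for name, claim in claims.items():
            if name not in bad and any(p in bad for p in claim.premises):
                bad.add(name)
                changed = True
    return bad


document = "Q3 revenue was $15M. Q2 revenue was $12M."
claims = {
    "q3": Claim("q3", quote="Q3 revenue was $15M"),
    "q2": Claim("q2", quote="Q2 revenue was $10M"),   # fabricated quote
    "growth": Claim("growth", premises=["q3", "q2"]),
    "q3_large": Claim("q3_large", premises=["q3"]),
}
print(sorted(contaminated(claims, document)))  # ['growth', 'q2']
```

Only `growth` inherits the failure from `q2`; `q3_large` depends solely on a verified fact and stays clean.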
The result is a system where the LLM does what it's good at (reading documents, identifying relevant facts, understanding relationships) while the formal engine does what LLMs are bad at (tracking provenance, checking logical consistency, propagating uncertainty).
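The cross-document check comes down to computing one value along two independent paths and comparing the results. A minimal illustration in plain Python follows; the figures and the tolerance are invented for the example, and `growth_percent` and `diverges` are hypothetical helpers rather than part of the Parseltongue API.

```python
def growth_percent(previous: float, current: float) -> float:
    """Growth computed from absolute revenue figures."""
    return (current - previous) / previous * 100.0


def diverges(reported: float, computed: float, tolerance: float = 0.5) -> bool:
    """Flag a divergence when two independent paths disagree beyond a tolerance."""
    return abs(reported - computed) > tolerance


# Path 1: a growth percentage stated directly in one document (invented value).
reported_growth = 15.0
# Path 2: the same quantity recomputed from revenue figures in another document.
computed_growth = growth_percent(previous=200.0, current=220.0)

print(diverges(reported_growth, computed_growth))  # True
```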
The same approach works well for validating documentation and checking code.
Quick Start
```bash
pip install "parseltongue-dsl[cli]"
parseltongue
```
This launches the interactive TUI. On first run, a configuration wizard asks for your API endpoint, key, and model. Any OpenAI-compatible endpoint works (OpenRouter, OpenAI, Azure, local servers like vLLM or Ollama).
From the main menu: pick documents, type a question, and the pipeline runs four passes — extraction, blinded derivation, fact-checking, and answer generation. You can review, retry with feedback, or skip each pass interactively.
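The review loop around the four passes can be sketched as below. This is an assumption-laden illustration of the control flow only: `run_pipeline`, `run_pass`, and `review` are hypothetical names, and treating a skipped pass as producing `None` is a guess, not documented behaviour.

```python
from typing import Callable, Dict, Optional, Tuple

# Pass names as listed in the text above.
PASSES = ["extraction", "blinded derivation", "fact-checking", "answer generation"]


def run_pipeline(
    run_pass: Callable[[str, Optional[str]], str],
    review: Callable[[str, str], Tuple[str, Optional[str]]],
) -> Dict[str, Optional[str]]:
    """Run each pass in order; the reviewer accepts, retries with feedback, or skips."""
    results: Dict[str, Optional[str]] = {}
    for name in PASSES:
        feedback: Optional[str] = None
        while True:
            output = run_pass(name, feedback)
            decision, feedback = review(name, output)
            if decision == "accept":
                results[name] = output
                break
            if decision == "skip":
                results[name] = None  # assumption: a skipped pass yields nothing
                break
            # Any other decision is treated as "retry" with the reviewer's feedback.
    return results


def fake_pass(name: str, feedback: Optional[str]) -> str:
    # Stand-in for an LLM call.
    return f"{name} output" + (f" (revised: {feedback})" if feedback else "")


attempts = {"extraction": 0}


def reviewer(name: str, output: str) -> Tuple[str, Optional[str]]:
    # Reject the first extraction attempt once, then accept everything.
    if name == "extraction" and attempts["extraction"] == 0:
        attempts["extraction"] += 1
        return "retry", "cite page numbers"
    return "accept", None


results = run_pipeline(fake_pass, reviewer)
print(results["extraction"])  # extraction output (revised: cite page numbers)
```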
You can also run directly from the command line:
```bash
parseltongue run \
  -d "Q3 Report:q3_report.pdf" \
  -d "Targets:targets_memo.txt" \
  -q "Did we beat the growth target?" \
  --model anthropic/claude-sonnet-4.6
```
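When the command is driven from a script, it can help to assemble the argument list programmatically. The helper below is a hypothetical convenience, not part of the package; it uses only the `-d`, `-q`, and `--model` flags shown above and assumes the `Label:path` form for documents.

```python
import shlex
from typing import Dict, List, Optional


def build_run_command(
    documents: Dict[str, str], question: str, model: Optional[str] = None
) -> List[str]:
    """Build the argv for `parseltongue run` from labelled document paths."""
    argv = ["parseltongue", "run"]
    for label, path in documents.items():
        argv += ["-d", f"{label}:{path}"]
    argv += ["-q", question]
    if model is not None:
        argv += ["--model", model]
    return argv


argv = build_run_command(
    {"Q3 Report": "q3_report.pdf", "Targets": "targets_memo.txt"},
    "Did we beat the growth target?",
    model="anthropic/claude-sonnet-4.6",
)
# Print a shell-safe rendering; pass `argv` to subprocess.run(argv, check=True) to execute.
print(shlex.join(argv))
```

Passing a list to `subprocess.run` avoids shell quoting problems with labels that contain spaces.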
| Command | Description |
|---|---|
| `parseltongue` | Launch the interactive TUI |
| `parseltongue run -d ... -q ...` | Run pipeline directly on documents |
| `parseltongue inspect file.pdf` | Preview document conversion |
| `parseltongue history` | Browse past runs |
| `parseltongue configure` | Re-run the configuration wizard |
Supports PDF, DOCX, PPTX, XLSX, HTML (via Docling), plus all plain text and code formats.
See the full CLI documentation for TUI navigation, keybindings, screenshots of every screen, and configuration details.
Python API
```bash
pip install "parseltongue-dsl[llm]"
export OPENROUTER_API_KEY=sk-...
```

```python
from parseltongue import System, Pipeline
from parseltongue.llm.openrouter import OpenRouterProvider

system = System(overridable=True)
provider = OpenRouterProvider()
pipeline = Pipeline(system, provider)

# Register the source documents under human-readable labels
pipeline.add_document("Q3 Report", path="q3_report.pdf")
pipeline.add_document("Targets Memo", path="targets_memo.txt")

# Run the pipeline on a question
result = pipeline.run("Did we beat the growth target? What is the bonus?")
```
- `result.output.markdown`: grounded report with `[[type:name]]` references linking every claim to source quotes
- `result.output.references`: resolved references: value, provenance chain, and source quotes
- `result.output.consistency`: unverified evidence, fabrication chains, diff divergences
- `result.system`: the full formal system for inspection via `system.provenance(name)`, `system.eval_diff(name)`, etc.
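To see which claims a report cites, the `[[type:name]]` markers can be pulled out of the markdown with a regular expression. This is a rough sketch: the sample text and names are invented, and the pattern assumes that neither part contains brackets and that the type contains no colon. In practice `result.output.references` already provides the resolved references.

```python
import re
from typing import List, Tuple

# Matches [[type:name]]; assumes no brackets in either part and no colon in the type.
REFERENCE = re.compile(r"\[\[([^:\[\]]+):([^\[\]]+)\]\]")


def extract_references(markdown: str) -> List[Tuple[str, str]]:
    """Return (type, name) pairs in order of appearance."""
    return REFERENCE.findall(markdown)


sample = (
    "Revenue grew 15% [[fact:q3-growth]], "
    "which beats the target [[derive:beat-target]]."
)
print(extract_references(sample))
# [('fact', 'q3-growth'), ('derive', 'beat-target')]
```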
See the full LLM pipeline documentation for the four-pass architecture, provider interface, extended thinking, and reference resolution.
Core Engine
The core engine is the DSL that the pipeline builds under the hood. It has five directive types (fact, axiom, defterm, derive, diff), each grounded in evidence with verbatim quotes. It can be used standalone, without any LLM dependency.

```bash
pip install parseltongue-dsl
```
See the full core documentation for directive types, evidence grounding, quote verification, custom environments, and consistency checking.
Project Structure
```
parseltongue/
├── core/                  # formal engine: evaluation, evidence, consistency
│   ├── quote_verifier/    # inverted-index quote matching with 6-step normalization
│   ├── demos/             # apples (Peano arithmetic), revenue, biomarkers
│   └── tests/             # core unit tests (300+)
├── llm/                   # four-pass LLM pipeline: extract → derive → factcheck → answer
│   ├── demos/             # end-to-end revenue demo
│   └── tests/             # llm unit tests (~100)
└── cli/                   # terminal interface: TUI, document ingestion, history
    ├── tui/               # Textual screens, widgets, tree builders
    └── demo/              # sample PDF for testing
```
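The general idea behind normalized, index-based quote matching (the job of `quote_verifier/`) can be sketched as follows. The six normalization steps are not listed on this page, so the steps below are illustrative stand-ins, and `normalize`, `build_index`, and `find_quote` are hypothetical names, not the package's API.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, List


def normalize(text: str) -> List[str]:
    """Illustrative normalization (not the library's six steps)."""
    text = unicodedata.normalize("NFKC", text)  # unify Unicode forms
    text = text.lower()                         # ignore case
    text = re.sub(r"[^\w\s]", " ", text)        # drop punctuation
    return text.split()                         # collapse whitespace and tokenize


def build_index(tokens: List[str]) -> Dict[str, List[int]]:
    """Inverted index: token -> positions where it occurs."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos, tok in enumerate(tokens):
        index[tok].append(pos)
    return index


def find_quote(quote: str, document: str) -> bool:
    """Check whether the normalized quote appears contiguously in the document."""
    doc_tokens = normalize(document)
    quote_tokens = normalize(quote)
    if not quote_tokens:
        return False
    index = build_index(doc_tokens)  # a real verifier would build this once per document
    # Only positions of the quote's first token are candidate starting points.
    for start in index.get(quote_tokens[0], []):
        if doc_tokens[start:start + len(quote_tokens)] == quote_tokens:
            return True
    return False


document = "Revenue grew by 15% in Q3,\nexceeding the target."
print(find_quote("revenue grew by 15% in q3, exceeding", document))  # True
print(find_quote("Revenue grew by 20%", document))                   # False
```

The index limits comparisons to positions where the quote could actually start, and normalization keeps line breaks and punctuation differences from causing false mismatches.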
Demos
```bash
# Core demos (no LLM needed)
python -m parseltongue.core.demos.apples.demo
python -m parseltongue.core.demos.revenue_reports.demo
python -m parseltongue.core.demos.biomarkers.demo

# LLM pipeline demo
python -m parseltongue.llm.demos.revenue.demo

# CLI demo: run the pipeline on the included PDF
parseltongue run -d "parseltongue/cli/demo/nejm.pdf" -q "Find any inconsistencies or red flags."
```
A sample PDF (cli/demo/nejm.pdf) is included for testing the CLI — it's the document used in the screenshots above.
Tests
```bash
pip install -e ".[dev,llm]"
pytest                            # all tests
pytest parseltongue/core/tests/   # core only
pytest parseltongue/llm/tests/    # llm only
```
Acknowledgments
Alan Turing: On Computable Numbers (1936), Systems of Logic Based on Ordinals (1939). For inspiration, formalisation, and the main principles of this work.
Kurt Gödel: the incompleteness theorems, and the proof that no sufficiently powerful consistent system can prove its own consistency. Without him we wouldn't know where to stop.
Eliezer Yudkowsky: for the hint about the language and the name:
"There is a simple answer, and I would have enforced it upon you in any case. Ssnakes can't lie. And since I have a tremendous distaste for stupidity, I suggest you do not say anything like 'What do you mean?' You are smarter than that, and I do not have time for such conversations as ordinary people inflict on one another."
Harry swallowed. Snakes can't lie. "Two pluss two equalss four." Harry had tried to say that two plus two equalled three, and the word four had slipped out instead.
License
Apache 2.0