A package for automating the creation of comprehensive and organized domain knowledge bases for AI applications.
Compendium Scribe
Compendium Scribe is a Click-driven command line tool and library that builds sourced research compendiums through a bounded OpenAI Agents SDK workflow. It decomposes a topic into planning, web research, verification, and synthesis stages, then renders the final Compendium as Markdown, XML, HTML, or PDF.
Features
- Agents SDK research workflow - Runs planner, research manager, section researcher, verifier, and synthesis agents with structured Pydantic outputs.
- Hosted web search where it belongs - Enables web search for research manager, section research, and verification agents; planner and synthesis stay source-controlled.
- Stable renderer contract - Final agent output is validated and passed through the existing Compendium.from_payload() shape.
- Citation ledger - Deduplicates URLs, assigns citation IDs, tracks section usage, and rejects final citations that are not ledger-backed.
- Recoverable sidecars - Writes <base>.research.json after accepted artifacts and <base>.costs.json for usage/cost telemetry.
- Local cost estimates - Uses a checked-in pricing catalog for GPT-5.5 and GPT-5.4 family token rates, long-context uplifts, and built-in tool call pricing when usage metadata is available.
- Compendium Library publishing - Optionally publishes XML, Markdown, and metadata cards into a movable filesystem library with a root catalog.json.
- Re-rendering - Ingests existing XML compendiums to generate new output formats without re-running research.
- Offline tests - The workflow uses a runner adapter so tests can stub Agents SDK runs without live API calls.
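The citation ledger behavior listed above can be pictured with a minimal sketch. The class name, method names, URL normalization, and the C01-style ID format below are assumptions for illustration, not the package's actual implementation:

```python
from dataclasses import dataclass, field


@dataclass
class CitationLedger:
    """Hypothetical ledger: one citation ID per unique URL."""

    ids_by_url: dict[str, str] = field(default_factory=dict)
    sections_by_id: dict[str, set[str]] = field(default_factory=dict)

    def register(self, url: str, section_id: str) -> str:
        """Return the citation ID for a URL, creating one on first sight."""
        key = url.strip().rstrip("/")  # naive normalization (assumption)
        if key not in self.ids_by_url:
            self.ids_by_url[key] = f"C{len(self.ids_by_url) + 1:02d}"
        citation_id = self.ids_by_url[key]
        # Track which sections rely on this citation.
        self.sections_by_id.setdefault(citation_id, set()).add(section_id)
        return citation_id

    def validate(self, cited_ids: list[str]) -> None:
        """Reject any final citation that the ledger never issued."""
        unknown = sorted(set(cited_ids) - set(self.sections_by_id))
        if unknown:
            raise ValueError(f"Citations not backed by the ledger: {unknown}")


ledger = CitationLedger()
first = ledger.register("https://example.com/recycling-benchmark", "S01")
again = ledger.register("https://example.com/recycling-benchmark/", "S02")
ledger.validate([first, again])  # passes: both IDs came from the ledger
```

Registering the same URL from two sections yields a single ID whose usage set contains both sections, which is the deduplication and usage tracking the feature list describes.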
Quick Start
1. Install
pdm install --dev
Ensure PDM_HOME points to a writable location when developing within a sandboxed environment.
2. Configure credentials
Create a .env file (untracked) with your OpenAI credentials and explicit research model settings:
OPENAI_API_KEY=sk-...
PLANNER_AGENT_MODEL=gpt-5.5
RESEARCH_AGENT_MODEL=gpt-5.5
VERIFIER_AGENT_MODEL=gpt-5.5
SYNTHESIS_AGENT_MODEL=gpt-5.5
MAX_AGENT_TURNS=12
All four model variables are required. If any is missing or blank, Compendium Scribe names the missing setting and stops before client setup, cost report initialization, or research begins.
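A fail-fast check of this kind might look like the sketch below. The variable names come from the .env example; the function names and the exception class are hypothetical, and the sketch checks an already-loaded mapping rather than reading the .env file itself:

```python
import os
from collections.abc import Mapping

REQUIRED_MODEL_VARS = (
    "PLANNER_AGENT_MODEL",
    "RESEARCH_AGENT_MODEL",
    "VERIFIER_AGENT_MODEL",
    "SYNTHESIS_AGENT_MODEL",
)


class MissingSettingError(RuntimeError):
    """Hypothetical error raised before any client or research setup."""


def missing_model_settings(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required model variables that are unset or blank."""
    return [name for name in REQUIRED_MODEL_VARS if not env.get(name, "").strip()]


def require_model_settings(env: Mapping[str, str] = os.environ) -> None:
    """Stop early and name every missing setting."""
    missing = missing_model_settings(env)
    if missing:
        raise MissingSettingError("Missing required setting(s): " + ", ".join(missing))
```

Treating whitespace-only values as missing mirrors the "missing or blank" wording above.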
The research workflow uses the OpenAI Agents SDK with hosted web search enabled on the manager, section, and verifier agents.
Cost reports use the local catalog in src/compendiumscribe/research/data/pricing.standard.json. The catalog currently covers GPT-5.5 and GPT-5.4 family token pricing, long-context rates above the documented threshold, web search calls, and Responses API file search calls. If a model is missing from the catalog, token usage is still recorded, but USD estimates are reported as unavailable.
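The fallback for uncataloged models can be illustrated with a small sketch. The catalog layout, the key names, and the rates are placeholders invented for this example; they do not reflect the real pricing.standard.json schema or real prices:

```python
from typing import Optional

# Placeholder rates in USD per million tokens. NOT real prices.
HYPOTHETICAL_CATALOG = {
    "example-model": {"input_per_million": 1.00, "output_per_million": 4.00},
}


def estimate_usd(
    model: str,
    input_tokens: int,
    output_tokens: int,
    catalog: dict = HYPOTHETICAL_CATALOG,
) -> Optional[float]:
    """Return a USD estimate, or None when the model has no catalog entry."""
    rates = catalog.get(model)
    if rates is None:
        return None  # usage is still recorded elsewhere; only USD is unavailable
    total = (
        input_tokens * rates["input_per_million"]
        + output_tokens * rates["output_per_million"]
    )
    return total / 1_000_000


known = estimate_usd("example-model", 1_000_000, 500_000)
unknown = estimate_usd("uncataloged-model", 1_000_000, 500_000)
```

Returning None instead of zero keeps "no estimate available" distinct from "this run cost nothing".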
3. Generate a compendium
pdm run compendium create "Lithium-ion battery recycling"
Options:
- --output PATH - Base path/filename for the output. The extension is ignored.
- --format FORMAT - Output format, defaulting to md. Available: md, xml, html, pdf. Repeat for multiple outputs.
- --library PATH - Also publish the finished compendium into a Compendium Library directory.
If you pass --output report.md, Compendium Scribe writes:
- report.md or the requested render formats
- report.research.json
- report.costs.json
Without --output, the base name is the slugified topic plus a UTC timestamp.
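As a rough illustration of that default, the sketch below derives a base name from a topic. The slug rules and the timestamp format are assumptions; the tool's actual output may differ in both:

```python
import re
from datetime import datetime, timezone
from typing import Optional


def slugify(text: str) -> str:
    """Lowercase the text and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def default_base_name(topic: str, now: Optional[datetime] = None) -> str:
    """Combine the topic slug with a UTC timestamp (format is an assumption)."""
    now = now or datetime.now(timezone.utc)
    return f"{slugify(topic)}-{now.strftime('%Y%m%dT%H%M%SZ')}"


fixed = datetime(2026, 4, 23, 14, 32, 33, tzinfo=timezone.utc)
base = default_base_name("Lithium-ion battery recycling", fixed)
```

Passing the clock in as a parameter keeps the function deterministic under test.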
4. Publish to a Compendium Library
A Compendium Library is a directory agents can scan progressively. The root
catalog.json is the compact card catalog. Each entry points to canonical XML,
readable Markdown, and a richer card for one compendium:
research-library/
├── catalog.json
└── compendiums/
    └── lithium-ion-battery-recycling/
        ├── compendium.xml
        ├── compendium.md
        └── card.json
Creation works the same as usual unless --library is provided. When it is
provided, requested outputs are still written normally, and the final compendium
is also upserted into the library:
pdm run compendium create "Lithium-ion battery recycling" \
  --output report.md \
  --format md \
  --format xml \
  --library research-library
Import an existing XML compendium:
pdm run compendium library import research-library report.xml
Library entries are idempotent by slugified title. Re-publishing the same title
updates the existing compendium.xml, compendium.md, card.json, and
catalog.json entry. If another title would use the same slug, the new entry
gets a numeric suffix such as -2.
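The slug rule can be sketched as follows. The in-memory catalog (a mapping from slug to title), the function names, and the slug rules are hypothetical stand-ins for the real catalog.json handling:

```python
import re


def slugify(title: str) -> str:
    """Lowercase the title and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def upsert_slug(title: str, catalog: dict[str, str]) -> str:
    """Return the slug for a title, reusing it on re-publish.

    A different title that maps to an occupied slug gets -2, -3, and so on.
    """
    base = slugify(title)
    candidate, suffix = base, 2
    while candidate in catalog and catalog[candidate] != title:
        candidate = f"{base}-{suffix}"
        suffix += 1
    catalog[candidate] = title  # insert or overwrite in place
    return candidate


catalog: dict[str, str] = {}
first = upsert_slug("Lithium-ion battery recycling", catalog)
repeat = upsert_slug("Lithium-ion battery recycling", catalog)
clash = upsert_slug("Lithium ion battery recycling", catalog)
clash_repeat = upsert_slug("Lithium ion battery recycling", catalog)
```

Here the second title differs only by a hyphen, so it slugifies to the same base and receives the -2 suffix, while re-publishing either title returns its existing slug.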
5. Recover a research run
Recovery resumes from the next incomplete stage in the sidecar state file:
pdm run compendium recover --input report.research.json
The recover command writes outputs using the same base path as the sidecar. For example, report.research.json renders to report.md when the stored format is Markdown.
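The base-path rule can be expressed in a few lines. This is a sketch of the naming convention only; the function name is hypothetical and the sidecar's internal schema is not modeled:

```python
from pathlib import Path

SIDECAR_SUFFIX = ".research.json"


def output_path_for(sidecar: str, fmt: str) -> Path:
    """Map a sidecar path to the rendered output path for one format."""
    path = Path(sidecar)
    if not path.name.endswith(SIDECAR_SUFFIX):
        raise ValueError(f"Not a research sidecar: {sidecar}")
    base = path.name[: -len(SIDECAR_SUFFIX)]  # strip the sidecar suffix
    return path.with_name(f"{base}.{fmt}")


markdown_path = output_path_for("report.research.json", "md")
html_path = output_path_for("runs/report.research.json", "html")
```

Stripping the whole .research.json suffix, rather than calling Path.stem, avoids leaving a stray .research in the output name.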
6. Render formats from existing XML
pdm run compendium render my-topic.xml --format html
Options:
- --format FORMAT - Output format(s) to generate: md, xml, html, pdf.
- --output PATH - Base path/filename for the output.
Python API Usage
from compendiumscribe import build_compendium, ResearchConfig, DeepResearchError

try:
    compendium = build_compendium(
        "Emerging pathogen surveillance",
        config=ResearchConfig(
            planner_agent_model="gpt-5.5",
            research_agent_model="gpt-5.5",
            verifier_agent_model="gpt-5.5",
            synthesis_agent_model="gpt-5.5",
        ),
    )
except DeepResearchError:
    raise

xml_payload = compendium.to_xml_string()
markdown_doc = compendium.to_markdown()
html_files = compendium.to_html_site()
pdf_bytes = compendium.to_pdf_bytes()
The returned Compendium object contains structured sections, insights, citations, and open questions.
Data Model Overview
Compendium Scribe produces XML shaped like:
<compendium topic="Lithium-ion Battery Recycling" generated_at="2026-04-23T14:32:33+00:00">
  <overview><![CDATA[Comprehensive synthesis of the state of lithium-ion recycling...]]></overview>
  <methodology>
    <step><![CDATA[Surveyed peer-reviewed literature and company disclosures.]]></step>
  </methodology>
  <sections>
    <section id="S01">
      <title><![CDATA[Technology Landscape]]></title>
      <summary><![CDATA[Dominant recycling modalities and throughput metrics...]]></summary>
      <insights>
        <insight>
          <title><![CDATA[Hydrometallurgy remains the throughput leader]]></title>
          <evidence><![CDATA[Commercial operators report high recovery rates for core battery metals.]]></evidence>
          <citations>
            <ref>C01</ref>
          </citations>
        </insight>
      </insights>
    </section>
  </sections>
  <citations>
    <citation id="C01">
      <title><![CDATA[Example Recycling Benchmark]]></title>
      <url><![CDATA[https://example.com/recycling-benchmark]]></url>
      <publisher><![CDATA[Example Publisher]]></publisher>
    </citation>
  </citations>
</compendium>
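Downstream tools can read this shape with the standard library alone. The sketch below parses a trimmed copy of the example above and resolves each insight's ref elements to citation URLs; it assumes the element names shown here and is independent of the package's own loaders:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<compendium topic="Lithium-ion Battery Recycling">
  <sections>
    <section id="S01">
      <title><![CDATA[Technology Landscape]]></title>
      <insights>
        <insight>
          <title><![CDATA[Hydrometallurgy remains the throughput leader]]></title>
          <citations><ref>C01</ref></citations>
        </insight>
      </insights>
    </section>
  </sections>
  <citations>
    <citation id="C01">
      <title><![CDATA[Example Recycling Benchmark]]></title>
      <url><![CDATA[https://example.com/recycling-benchmark]]></url>
    </citation>
  </citations>
</compendium>"""

root = ET.fromstring(SAMPLE)

# Index the top-level citation list by ID (CDATA arrives as plain text).
urls_by_id = {
    citation.get("id"): citation.findtext("url")
    for citation in root.findall("./citations/citation")
}

# Resolve every insight's refs to the URLs they point at.
resolved = []
for section in root.findall("./sections/section"):
    for insight in section.findall("./insights/insight"):
        refs = [ref.text for ref in insight.findall("./citations/ref")]
        resolved.append(
            (section.get("id"), insight.findtext("title"), [urls_by_id[r] for r in refs])
        )
```

The path ./citations/citation matches only the top-level citation list, so the per-insight citations elements, which hold ref children, are not picked up by mistake.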
Testing & Quality
- pdm run test - Executes the unit suite. Tests stub Agents SDK runs, so they run offline.
- pdm run lint - Linting.
- pdm run ruff check src tests - Direct lint command.
- pdm build - Produce distributable artifacts.
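The idea of stubbing agent runs behind an adapter might look like the sketch below. The AgentRunner protocol, StubRunner class, and plan_sections helper are hypothetical names for illustration; the project's real adapter interface may differ:

```python
from typing import Any, Protocol


class AgentRunner(Protocol):
    """Hypothetical adapter interface the workflow depends on."""

    def run(self, agent_name: str, prompt: str) -> dict[str, Any]: ...


class StubRunner:
    """Test double that returns canned outputs and records each call."""

    def __init__(self, canned: dict[str, dict[str, Any]]) -> None:
        self.canned = canned
        self.calls: list[tuple[str, str]] = []

    def run(self, agent_name: str, prompt: str) -> dict[str, Any]:
        self.calls.append((agent_name, prompt))
        return self.canned[agent_name]


def plan_sections(runner: AgentRunner, topic: str) -> list[str]:
    """Workflow code sees only the adapter, never the SDK directly."""
    return runner.run("planner", topic)["sections"]


stub = StubRunner({"planner": {"sections": ["Technology Landscape"]}})
sections = plan_sections(stub, "Lithium-ion battery recycling")
```

Because the workflow depends only on the adapter, a test can swap in the stub and make assertions without network access or an API key.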
Before marking implementation work complete, run:
pdm run pytest
pdm run ruff check src tests
pdm build
Contributing
- Fork and clone the repository.
- Run pdm install --group dev.
- Make changes following the style guide and update/add tests.
- Run pdm run pytest, pdm run ruff check src tests, and pdm build.
- Raise a pull request with a concise description, verification commands, and representative output samples when user-facing structure changes.