Compendium Scribe

A package for automating the creation of comprehensive and organized domain knowledge bases for AI applications.

Supports Python 3.12+.

Compendium Scribe is a Click-driven command line tool and library that builds sourced research compendiums through a bounded OpenAI Agents SDK workflow. It decomposes a topic into planning, web research, verification, and synthesis stages, then renders the final compendium as Markdown, XML, HTML, or PDF.


Features

  • Agents SDK research workflow - Runs planner, research manager, section researcher, verifier, and synthesis agents with structured Pydantic outputs.
  • Hosted web search where it belongs - Enables web search for research manager, section research, and verification agents; planner and synthesis stay source-controlled.
  • Stable renderer contract - Final agent output is validated and passed through the existing Compendium.from_payload() shape.
  • Citation ledger - Deduplicates URLs, assigns citation IDs, tracks section usage, and rejects final citations that are not ledger-backed.
  • Recoverable sidecars - Writes <base>.research.json after accepted artifacts and <base>.costs.json for usage/cost telemetry.
  • Local cost estimates - Uses a checked-in pricing catalog for GPT-5.5 and GPT-5.4 family token rates, long-context uplifts, and built-in tool call pricing when usage metadata is available.
  • Compendium Library publishing - Optionally publishes XML, Markdown, and metadata cards into a movable filesystem library with a root catalog.json.
  • Re-rendering - Ingest existing XML compendiums to generate new output formats without re-running research.
  • Offline tests - The workflow uses a runner adapter so tests can stub Agents SDK runs without live API calls.
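The citation ledger behavior described above can be sketched in a few lines. This is a simplified illustration of the dedup-and-validate idea, not the package's actual implementation; the class and method names are hypothetical:

```python
class CitationLedger:
    """Deduplicates URLs, assigns stable citation IDs, and tracks section usage."""

    def __init__(self):
        self._ids_by_url = {}       # url -> citation id
        self._sections_by_id = {}   # citation id -> set of section ids

    def record(self, url, section_id):
        # Reuse the existing ID for a known URL; otherwise mint a new one.
        cid = self._ids_by_url.get(url)
        if cid is None:
            cid = f"C{len(self._ids_by_url) + 1:02d}"
            self._ids_by_url[url] = cid
        self._sections_by_id.setdefault(cid, set()).add(section_id)
        return cid

    def validate(self, final_ids):
        # Reject any final citation that is not ledger-backed.
        unknown = set(final_ids) - set(self._sections_by_id)
        if unknown:
            raise ValueError(f"citations not in ledger: {sorted(unknown)}")
```

Recording the same URL from two sections returns the same ID, and validation fails fast on any citation the ledger never saw.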

Quick Start

1. Install

pdm install --dev

Ensure PDM_HOME points to a writable location when developing within a sandboxed environment.

2. Configure credentials

Create a .env file (untracked) with your OpenAI credentials and explicit research model settings:

OPENAI_API_KEY=sk-...
PLANNER_AGENT_MODEL=gpt-5.5
RESEARCH_AGENT_MODEL=gpt-5.5
VERIFIER_AGENT_MODEL=gpt-5.5
SYNTHESIS_AGENT_MODEL=gpt-5.5
MAX_AGENT_TURNS=12

All four model variables are required. If any is missing or blank, Compendium Scribe stops before client setup, cost-report initialization, or research begins, and names the missing setting.
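The fail-fast check described above amounts to scanning the required settings and reporting the first missing one. A minimal sketch (variable and function names here are illustrative, not the package's internals):

```python
import os

REQUIRED_SETTINGS = (
    "OPENAI_API_KEY",
    "PLANNER_AGENT_MODEL",
    "RESEARCH_AGENT_MODEL",
    "VERIFIER_AGENT_MODEL",
    "SYNTHESIS_AGENT_MODEL",
)

def check_settings(env=os.environ):
    # Fail fast before any client setup, naming the first missing or blank setting.
    for name in REQUIRED_SETTINGS:
        if not env.get(name, "").strip():
            raise RuntimeError(f"missing required setting: {name}")
```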

The research workflow uses the OpenAI Agents SDK with hosted web search enabled on the manager, section, and verifier agents.

Cost reports use the local catalog in src/compendiumscribe/research/data/pricing.standard.json. The catalog currently covers GPT-5.5, GPT-5.4 family token pricing, long-context rates above the documented threshold, web search calls, and Responses API file search calls. If a model is missing from the catalog, token usage is still recorded and USD estimates are left unavailable.
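In rough terms, a catalog-based estimate multiplies token counts by per-million-token rates and leaves the estimate unavailable for uncataloged models. The sketch below illustrates that shape only; the rates and schema are invented for the example and do not reflect the shipped pricing.standard.json:

```python
# Hypothetical per-million-token rates keyed by model; NOT the shipped catalog.
PRICING = {
    "gpt-5.5": {"input_per_million": 2.00, "output_per_million": 8.00},
}

def estimate_usd(model, input_tokens, output_tokens):
    rates = PRICING.get(model)
    if rates is None:
        # Unknown model: token usage is still recorded, but no USD estimate.
        return None
    return (input_tokens * rates["input_per_million"]
            + output_tokens * rates["output_per_million"]) / 1_000_000
```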

3. Generate a compendium

pdm run compendium create "Lithium-ion battery recycling"

Options:

  • --output PATH - Base path/filename for the output. The extension is ignored.
  • --format FORMAT - Output format, defaulting to md. Available: md, xml, html, pdf. Repeat for multiple outputs.
  • --library PATH - Also publish the finished compendium into a Compendium Library directory.

If you pass --output report.md, Compendium Scribe writes:

  • report.md or the requested render formats
  • report.research.json
  • report.costs.json

Without --output, the base name is the slugified topic plus a UTC timestamp.
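The default base name is a slug of the topic plus a UTC timestamp. A sketch of one way to derive it, assuming a simple lowercase-hyphen slug and a compact timestamp format (the package's exact slug rules and timestamp layout may differ):

```python
import re
from datetime import datetime, timezone

def default_base_name(topic, now=None):
    # Slugify: lowercase, keep alphanumerics, collapse everything else to hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    now = now or datetime.now(timezone.utc)
    return f"{slug}-{now.strftime('%Y%m%dT%H%M%SZ')}"
```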

4. Publish to a Compendium Library

A Compendium Library is a directory agents can scan progressively. The root catalog.json is the compact card catalog. Each entry points to canonical XML, readable Markdown, and a richer card for one compendium:

research-library/
├── catalog.json
└── compendiums/
    └── lithium-ion-battery-recycling/
        ├── compendium.xml
        ├── compendium.md
        └── card.json

Creation behaves as usual; when --library is provided, the requested outputs are still written normally and the final compendium is also upserted into the library:

pdm run compendium create "Lithium-ion battery recycling" \
  --output report.md \
  --format md \
  --format xml \
  --library research-library

Import an existing XML compendium:

pdm run compendium library import research-library report.xml

Library entries are idempotent by slugified title. Re-publishing the same title updates the existing compendium.xml, compendium.md, card.json, and catalog.json entry. If another title would use the same slug, the new entry gets a numeric suffix such as -2.
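The slug rules above (idempotent per title, numeric suffix on collisions between different titles) can be sketched as follows. The function name and catalog shape are hypothetical; only the collision behavior mirrors the description:

```python
import re

def resolve_slug(title, entries):
    # entries: mapping of existing slug -> title in the current catalog.
    base = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    slug, n = base, 2
    # Same title reuses its slug (an update); a different title gets -2, -3, ...
    while slug in entries and entries[slug] != title:
        slug = f"{base}-{n}"
        n += 1
    return slug
```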

5. Recover a research run

Recovery resumes from the next incomplete stage in the sidecar state file:

pdm run compendium recover --input report.research.json

The recover command writes outputs using the same base path as the sidecar. For example, report.research.json renders to report.md when the stored format is Markdown.
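Deriving the output base from the sidecar path is just a matter of stripping the .research.json suffix. A small sketch of that mapping (the helper name is illustrative):

```python
from pathlib import Path

def output_base_from_sidecar(sidecar):
    # report.research.json -> report (which then renders to report.md, etc.)
    path = Path(sidecar)
    suffix = ".research.json"
    if not path.name.endswith(suffix):
        raise ValueError(f"not a research sidecar: {sidecar}")
    return path.with_name(path.name[: -len(suffix)])
```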

6. Render formats from existing XML

pdm run compendium render my-topic.xml --format html

Options:

  • --format FORMAT - Output format(s) to generate: md, xml, html, pdf.
  • --output PATH - Base path/filename for the output.

Python API Usage

from compendiumscribe import build_compendium, ResearchConfig, DeepResearchError

try:
    compendium = build_compendium(
        "Emerging pathogen surveillance",
        config=ResearchConfig(
            planner_agent_model="gpt-5.5",
            research_agent_model="gpt-5.5",
            verifier_agent_model="gpt-5.5",
            synthesis_agent_model="gpt-5.5",
        ),
    )
except DeepResearchError:
    # The sidecar files written during the run allow resuming via `compendium recover`.
    raise

xml_payload = compendium.to_xml_string()
markdown_doc = compendium.to_markdown()
html_files = compendium.to_html_site()
pdf_bytes = compendium.to_pdf_bytes()

The returned Compendium object contains structured sections, insights, citations, and open questions.


Data Model Overview

Compendium Scribe produces XML shaped like:

<compendium topic="Lithium-ion Battery Recycling" generated_at="2026-04-23T14:32:33+00:00">
  <overview><![CDATA[Comprehensive synthesis of the state of lithium-ion recycling...]]></overview>
  <methodology>
    <step><![CDATA[Surveyed peer-reviewed literature and company disclosures.]]></step>
  </methodology>
  <sections>
    <section id="S01">
      <title><![CDATA[Technology Landscape]]></title>
      <summary><![CDATA[Dominant recycling modalities and throughput metrics...]]></summary>
      <insights>
        <insight>
          <title><![CDATA[Hydrometallurgy remains the throughput leader]]></title>
          <evidence><![CDATA[Commercial operators report high recovery rates for core battery metals.]]></evidence>
          <citations>
            <ref>C01</ref>
          </citations>
        </insight>
      </insights>
    </section>
  </sections>
  <citations>
    <citation id="C01">
      <title><![CDATA[Example Recycling Benchmark]]></title>
      <url><![CDATA[https://example.com/recycling-benchmark]]></url>
      <publisher><![CDATA[Example Publisher]]></publisher>
    </citation>
  </citations>
</compendium>
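To illustrate consuming this shape, here is a short reader-side ElementTree sketch (not part of the package) that pulls citation IDs and URLs out of a compendium document; ElementTree returns CDATA content as plain text:

```python
import xml.etree.ElementTree as ET

SAMPLE = """<compendium topic="Demo">
  <citations>
    <citation id="C01">
      <title><![CDATA[Example Recycling Benchmark]]></title>
      <url><![CDATA[https://example.com/recycling-benchmark]]></url>
    </citation>
  </citations>
</compendium>"""

def citation_urls(xml_text):
    root = ET.fromstring(xml_text)
    # Map each citation's id attribute to the text of its <url> child.
    return {c.get("id"): c.findtext("url") for c in root.iter("citation")}
```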

Testing & Quality

  • pdm run test - Executes the unit suite. Tests stub Agents SDK runs, so they run offline.
  • pdm run lint - Linting.
  • pdm run ruff check src tests - Direct lint command.
  • pdm build - Produce distributable artifacts.

Before marking implementation work complete, run:

pdm run pytest
pdm run ruff check src tests
pdm build

Contributing

  1. Fork and clone the repository.
  2. Run pdm install --group dev.
  3. Make changes following the style guide and update/add tests.
  4. Run pdm run pytest, pdm run ruff check src tests, and pdm build.
  5. Raise a pull request with a concise description, verification commands, and representative output samples when user-facing structure changes.
