AI-optimized Confluence evidence package extractor
Project description
pull-cli
pull-cli installs the pull command, a read-only Confluence extractor for AI-consumable evidence packages. It is rendered-page-first: page Markdown, and the optional Markdown bundle in full mode, are based on the current published page as visible to the authenticated user, while storage XML is kept for macro recovery, provenance, and fallback.
The default output mode is simple: a quiet agent-facing package with the root AI Markdown file, per-page Markdown files, assets/sidecars, and validation control files. Use --output-mode full when you also want bundle.md, page HTML snapshots, and storage-source sidecars.
Confluence access is implemented through atlassian-python-api behind a small pull_cli.clients protocol. The extraction, redaction, manifest, asset, link, and validation contracts remain owned by pull-cli.
Install
uvx pull-cli --help
uv tool install pull-cli
pip install pull-cli
The package name is pull-cli. The import package is pull_cli. Console scripts are pull and pull-cli.
Quickstart
Cloud:
set PULL_URL=https://example.atlassian.net/wiki
set PULL_USER=you@example.com
set PULL_TOKEN=your-api-token
pull 123456 -o pulled-confluence
Data Center or Server:
set PULL_URL=https://confluence.example.com/confluence
set PULL_TOKEN=your-personal-access-token
pull --page-id 123456 -o pulled-confluence
CONFPUB_URL, CONFPUB_USER, CONFPUB_TOKEN, and CONFPUB_SSL_VERIFY are accepted as compatibility fallbacks after PULL_* variables.
CLI Examples
pull 123456 -o pulled
pull "https://example.atlassian.net/wiki/spaces/EA/pages/123456/Architecture" -o pulled
pull --space EA --title "Architecture Overview" -o pulled
pull --page-id 123456 --tree --depth 3 --max-pages 100 -o tree
pull --page-id 123456 --tree --assets all --extract-attachments -o offline
pull --page-id 123456 --tree --comments -o with-comments
pull --page-id 123456 --output-mode full -o full-evidence
pull --page-id 123456 --output-mode simple --bundle -o simple-with-bundle
pull --page-id 123456 --json -o pulled
pull validate pulled
pull guide --json
Selector resolution order is: explicit --page-id, explicit --url, positional URL, positional numeric page ID, then --space plus --title.
Output Package
Default simple mode:
pulled-confluence/
├── page-title.md
├── page-title.yaml
├── manifest.yaml
├── pages/
│ └── 0001-page-slug/
│ ├── index.md
│ ├── page.json
│ ├── comments.md # with --comments, only when comments exist
│ └── assets/
└── diagnostics/
├── warnings.jsonl
└── unresolved-links.md
page-title.md is named from the sanitized root page title and is the recommended first file to give another AI agent. In simple mode it links only the reading/navigation surface: page Markdown paths, assets, sidecars, and explicitly requested agent-facing extras such as bundle.md or chunks.jsonl. Warning counts are shown, but control files are not linked from the root AI Markdown.
page-title.yaml is the machine-readable version of that AI navigation manifest, also named from the sanitized root page title. It intentionally omits noisy provenance and raw API details; use manifest.yaml when you need full validation/provenance data. The exact generated filenames are recorded in manifest.yaml under paths.ai_entry and paths.ai_manifest. AI navigation paths are package-root-relative: resolve them against the directory containing the root AI Markdown/YAML file, not the caller's shell working directory.
manifest.yaml, page.json, and diagnostics files are still written in simple mode so pull validate <output-dir> and provenance checks work. --force never deletes stale files from earlier runs; use --clean when switching modes if you need the physical tree to contain only files from the new mode.
--output-mode full adds the full evidence artifacts:
pulled-confluence/
├── bundle.md
└── pages/
└── 0001-page-slug/
├── index.html
└── source.storage.xml
bundle.md concatenates pages in page/tree order with stable delimiters for AI use; local links embedded in the bundle are rebased to the package root. index.html and source.storage.xml are raw/reference artifacts, not the primary navigation surface.
For tree pulls, nested page paths are the default. The manifest always carries stable numeric ordering.
Auth and Config
Resolution order:
- CLI flags such as
--base-url,--user,--token,--ssl-verify. PULL_*environment variables.- Optional YAML config from
--config. CONFPUB_*compatibility environment variables.
--ssl-verify accepts true, false, or a CA bundle path.
Macro, Asset, and Link Behavior
The extractor uses a macro adapter registry. Current adapters cover panels/admonitions, code/noformat, status, expand, tabs, layout flattening, TOC placeholders, children/page tree links when in scope, include/excerpt placeholders or inline source when available, attachments, displayed files, Jira placeholders, diagram snapshots, dynamic snapshots, HTML macro sanitization, and unknown macro placeholders.
Asset policy defaults to visible: rendered images, visible attachment links, file macros, and rendered diagram images where discoverable. --assets page downloads all page attachments. --assets all includes visible/referenced assets plus all page attachments and macro-listed files where discoverable. --no-assets skips downloads and preserves source links with warnings.
Local links to pages in the pulled tree are rewritten to relative index.md paths. Downloaded asset links are rewritten to local files. External, mailto, Jira, and out-of-scope Confluence links are preserved. Same-page anchors are normalized where possible; unresolved anchors become diagnostics.
Comments
Comments are skipped by default. Use --comments to fetch page-level and inline comments for each pulled page. When comments exist, pull writes a page-local comments.md sidecar with agent-readable metadata and Markdown-converted comment bodies.
Comment sidecars are agent-facing reading surfaces: the root AI Markdown page hierarchy links them in simple mode, the page index.md header links the local sidecar, and the AI YAML includes the optional comments path and count. If one page's comments cannot be fetched, the pull continues with W_COMMENTS_FETCH_FAILED and validation can still pass for the partial package.
JSON Mode
With --json or LLM=true, stdout is exactly one JSON object with:
{
"schema_version": "1.0",
"request_id": "req_...",
"ok": true,
"command": "pull",
"target": {},
"result": {},
"warnings": [],
"errors": [],
"metrics": {}
}
Progress, retries, warnings, and debug output belong on stderr.
Security
pull is read-only. It does not mutate Confluence, fetch drafts by default, bypass permissions, or call LLM services. Tokens, Authorization headers, cookies, signed download query parameters, and token-like strings are redacted before JSON envelopes, manifests, page metadata, and diagnostics are written.
Rendered HTML snapshots are sanitized by removing executable tags and event attributes. HTML macro content is made inert before conversion.
Validation
pull validate pulled-confluence
pull validate pulled-confluence/manifest.yaml --json
Validation checks manifest shape, AI navigation manifest paths, relative paths, page files, optional comment sidecars, asset checksums, diagnostics JSONL, Markdown local links, and token-like markers in text outputs.
Development
uv sync --all-extras
uv run ruff check .
uv run pytest
uv build
uv run pull --help
uv run pull guide --json
uv run python tests/generate_fixture_output.py .tmp/generated-fixture
uv run pull validate .tmp/generated-fixture
Live smoke testing requires a readable Confluence page and credentials through PULL_* or CONFPUB_*.
Releasing
Versions are managed from src/pull_cli/__init__.py through Hatch. Use uv run hatch version patch, uv run hatch version minor, or uv run hatch version major; pull --version, built package metadata, and GitHub release tags are expected to match. See RELEASING.md for the PyPI trusted publisher setup and release flow.
License
MIT. See LICENSE.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file pull_cli-0.1.0.tar.gz.
File metadata
- Download URL: pull_cli-0.1.0.tar.gz
- Upload date:
- Size: 148.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
0ef0f38a066a99f3cefe3680a3b21ebead4c2fc7ea0f4bef223124fa2cac9c4b
|
|
| MD5 |
1e6f8a88f585be1a78a0d66b7edd94d4
|
|
| BLAKE2b-256 |
7777e4e9366511bb21a6091a86893ac3e7850f5fde21bb8a956a62328ea464be
|
Provenance
The following attestation bundles were made for pull_cli-0.1.0.tar.gz:
Publisher:
publish.yml on ThomasRohde/pull-cli
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
pull_cli-0.1.0.tar.gz -
Subject digest:
0ef0f38a066a99f3cefe3680a3b21ebead4c2fc7ea0f4bef223124fa2cac9c4b - Sigstore transparency entry: 1731908366
- Sigstore integration time:
-
Permalink:
ThomasRohde/pull-cli@cda3cc5d3a4c0023209e698245942d99eb26e389 -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/ThomasRohde
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@cda3cc5d3a4c0023209e698245942d99eb26e389 -
Trigger Event:
release
-
Statement type:
File details
Details for the file pull_cli-0.1.0-py3-none-any.whl.
File metadata
- Download URL: pull_cli-0.1.0-py3-none-any.whl
- Upload date:
- Size: 55.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
18eed1b4edbfbdc87de700f1436b0a3140707b838d1f8b65f27dd236981157f3
|
|
| MD5 |
d5ef9e75655c14ee3648d68b29275aac
|
|
| BLAKE2b-256 |
72a601096539a6d76f8e4f7255ba414e84a41282d2908ce5eadf444b0fb8a925
|
Provenance
The following attestation bundles were made for pull_cli-0.1.0-py3-none-any.whl:
Publisher:
publish.yml on ThomasRohde/pull-cli
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
pull_cli-0.1.0-py3-none-any.whl -
Subject digest:
18eed1b4edbfbdc87de700f1436b0a3140707b838d1f8b65f27dd236981157f3 - Sigstore transparency entry: 1731908520
- Sigstore integration time:
-
Permalink:
ThomasRohde/pull-cli@cda3cc5d3a4c0023209e698245942d99eb26e389 -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/ThomasRohde
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@cda3cc5d3a4c0023209e698245942d99eb26e389 -
Trigger Event:
release
-
Statement type: