Content-extraction provider runtime for arcus — turn a URL or file into normalized markdown + structured metadata.
arcus-provider-runtime
The content-extraction kernel behind arcus: give it one URL or one file path, get back normalized markdown plus structured metadata. No vault, no database, no project awareness — a pure download + extraction layer you can drop into any pipeline (RAG ingest, knowledge bases, LLM context building).
Install
pip install "arcus-provider-runtime[html,pdf,office]"
Extras pull in only the heavy dependencies you need:
| Extra | Adds | For |
|---|---|---|
| html | playwright | JS-rendered pages, X.com / LinkedIn, SPA articles |
| pdf | pymupdf4llm | PDF → markdown extraction |
| office | python-docx, python-pptx, openpyxl | DOCX / PPTX / XLSX / EPUB |
| all | everything above | - |
The base install needs no extras and covers YouTube transcripts via yt-dlp.
The HTML provider needs two things beyond the html extra: Chromium
(python -m playwright install chromium) and node on PATH, which runs the
vendored html2md.mjs converter.
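A quick preflight check can catch a missing extra before the first extraction fails. The sketch below is not part of arcus: the helper names are hypothetical, and the import names (`docx` for python-docx, `pptx` for python-pptx) are the ones those packages are normally imported under. It does not verify that the Chromium browser itself has been downloaded.

```python
import importlib.util
import shutil

# Import names for the packages listed in the extras table (assumed mapping).
EXTRA_MODULES = {
    "html": ["playwright"],
    "pdf": ["pymupdf4llm"],
    "office": ["docx", "pptx", "openpyxl"],
}


def missing_modules(extra: str) -> list[str]:
    """Return the import names for `extra` that cannot be found."""
    return [
        name
        for name in EXTRA_MODULES[extra]
        if importlib.util.find_spec(name) is None
    ]


def html_prerequisites() -> dict[str, bool]:
    """Report the two HTML prerequisites that can be checked cheaply."""
    return {
        "playwright_importable": importlib.util.find_spec("playwright") is not None,
        "node_on_path": shutil.which("node") is not None,
    }


if __name__ == "__main__":
    for extra in EXTRA_MODULES:
        print(extra, "missing:", missing_modules(extra))
    print(html_prerequisites())
```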
Use
from arcus.provider_runtime import Factory
result = Factory().run("https://example.com/article", out_dir="./out")
# result.markdown_path → ./out/<slug>.md (frontmatter + readable body)
# result.metadata_path → ./out/<slug>.json (segments, timing, provenance)
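To consume the two output files downstream, something like the following may be enough. It is a sketch under assumptions: the frontmatter is taken to be delimited by `---` lines (the README only says "frontmatter"), the JSON schema is not inspected, and `split_frontmatter` and `load_artifact` are hypothetical helper names rather than arcus functions.

```python
import json
from pathlib import Path


def split_frontmatter(text: str) -> tuple[str, str]:
    """Split '---'-delimited frontmatter from the body.

    Returns ("", text) when no complete frontmatter block is found.
    """
    if not text.startswith("---\n"):
        return "", text
    head, sep, body = text[4:].partition("\n---\n")
    if not sep:
        return "", text
    return head, body.lstrip("\n")


def load_artifact(markdown_path, metadata_path):
    """Read one extracted artifact: (frontmatter, body, metadata dict)."""
    text = Path(markdown_path).read_text(encoding="utf-8")
    frontmatter, body = split_frontmatter(text)
    metadata = json.loads(Path(metadata_path).read_text(encoding="utf-8"))
    return frontmatter, body, metadata
```

With the `result` object from the snippet above, this would be called as `load_artifact(result.markdown_path, result.metadata_path)`.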
One Factory.run() entry point dispatches to the right provider by inspecting
the input. Providers live under
arcus.provider_runtime.providers.<kind>/ and are individually registerable.
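The idea of dispatching by inspecting the input can be illustrated with a small registry. This is not the arcus implementation: the registry, the `register` decorator, the matching rules, and `pick_provider` are all hypothetical, and the provider kinds are borrowed from the extras table only for illustration.

```python
from pathlib import Path
from typing import Callable
from urllib.parse import urlparse

# Hypothetical registry: provider kind -> predicate over the raw input string.
# Insertion order is the match order, so specific kinds are registered first.
_REGISTRY: dict[str, Callable[[str], bool]] = {}


def register(kind: str):
    """Decorator that registers a predicate under a provider kind."""
    def decorator(fn: Callable[[str], bool]) -> Callable[[str], bool]:
        _REGISTRY[kind] = fn
        return fn
    return decorator


def _is_url(source: str) -> bool:
    return urlparse(source).scheme in ("http", "https")


@register("youtube")
def _youtube(source: str) -> bool:
    host = urlparse(source).hostname or ""
    return _is_url(source) and (
        host in ("youtu.be", "youtube.com") or host.endswith(".youtube.com")
    )


@register("pdf")
def _pdf(source: str) -> bool:
    return not _is_url(source) and Path(source).suffix.lower() == ".pdf"


@register("office")
def _office(source: str) -> bool:
    suffixes = {".docx", ".pptx", ".xlsx", ".epub"}
    return not _is_url(source) and Path(source).suffix.lower() in suffixes


@register("html")
def _html(source: str) -> bool:
    # Fallback for any other http(s) URL.
    return _is_url(source)


def pick_provider(source: str) -> str:
    """Return the first registered kind whose predicate accepts the input."""
    for kind, matches in _REGISTRY.items():
        if matches(source):
            return kind
    raise ValueError(f"no provider for {source!r}")
```

The sketch deliberately keeps the rules simple; for example, a URL that points at a PDF would land on the fallback kind here, which a real dispatcher would likely handle differently.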
What it deliberately does NOT do
arcus has zero awareness of any consuming app's storage, topics, or wiki. One input in, one extracted artifact out. Vault-aware orchestration (dedup, cross-referencing, synthesis) belongs in the consumer, not here.
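As an illustration of where that boundary sits, deduplication could live in a thin wrapper on the consumer side. The class below is a hypothetical sketch, not something arcus ships: it keys on a SHA-256 of the trimmed input string and keeps results in memory, whereas a real consumer would probably persist the keys and normalize URLs more carefully.

```python
import hashlib
from typing import Callable


class DedupingIngest:
    """Consumer-side wrapper that extracts each distinct input only once."""

    def __init__(self, extract: Callable[[str], object]):
        self._extract = extract
        self._seen: dict[str, object] = {}

    @staticmethod
    def key(source: str) -> str:
        """Stable key for an input: SHA-256 of the trimmed string."""
        return hashlib.sha256(source.strip().encode("utf-8")).hexdigest()

    def ingest(self, source: str) -> object:
        """Run extraction for new inputs; return the cached result otherwise."""
        k = self.key(source)
        if k not in self._seen:
            self._seen[k] = self._extract(source)
        return self._seen[k]
```

A consumer might wire it up as `DedupingIngest(lambda src: Factory().run(src, out_dir="./out"))`, leaving the runtime itself unaware that any deduplication is happening.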
License
MIT © 2026 POLLEO.AI
File details
Details for the file arcus_provider_runtime-0.6.0.tar.gz.
File metadata
- Download URL: arcus_provider_runtime-0.6.0.tar.gz
- Upload date:
- Size: 151.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | be3efbc4322a8fe7391b1c63b35fe20d2ddb0cf81874284b3d2f7efbf8fbe4bf |
| MD5 | 6b3ece968b7338b88665f2b79922bfaa |
| BLAKE2b-256 | 062a402c12e5c880a5525e980e66f2de5e53c3f10ae4977f5b6703a3b592218d |
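A downloaded file can be checked against the published SHA256 digest with the standard library alone. The helper names below are hypothetical, and the file is read in chunks so that large archives do not have to fit in memory.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex: str) -> bool:
    """Compare the file's digest with the expected hex string."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For the source distribution this would be called as `verify("arcus_provider_runtime-0.6.0.tar.gz", "be3efbc4322a8fe7391b1c63b35fe20d2ddb0cf81874284b3d2f7efbf8fbe4bf")`.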
Provenance
The following attestation bundles were made for arcus_provider_runtime-0.6.0.tar.gz:
Publisher: release.yml on polleoai/arcus
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: arcus_provider_runtime-0.6.0.tar.gz
  - Subject digest: be3efbc4322a8fe7391b1c63b35fe20d2ddb0cf81874284b3d2f7efbf8fbe4bf
  - Sigstore transparency entry: 1637011242
  - Sigstore integration time:
- Permalink: polleoai/arcus@017ca222073bf4eb19404fac28d9524499f31ed6
- Branch / Tag: refs/tags/0.6.0
- Owner: https://github.com/polleoai
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@017ca222073bf4eb19404fac28d9524499f31ed6
- Trigger Event: push
File details
Details for the file arcus_provider_runtime-0.6.0-py3-none-any.whl.
File metadata
- Download URL: arcus_provider_runtime-0.6.0-py3-none-any.whl
- Upload date:
- Size: 75.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bb15e955557eddd94d66db97f8b64bb24028b0494017e352b9be645f159d6a21 |
| MD5 | 1b6b9c04b7f286accdff5d28defc1c3d |
| BLAKE2b-256 | 9e7d2ff71d9b6385188ee494322a65d338fcab9b48450117860f06c234c4c055 |
Provenance
The following attestation bundles were made for arcus_provider_runtime-0.6.0-py3-none-any.whl:
Publisher: release.yml on polleoai/arcus
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: arcus_provider_runtime-0.6.0-py3-none-any.whl
  - Subject digest: bb15e955557eddd94d66db97f8b64bb24028b0494017e352b9be645f159d6a21
  - Sigstore transparency entry: 1637011371
  - Sigstore integration time:
- Permalink: polleoai/arcus@017ca222073bf4eb19404fac28d9524499f31ed6
- Branch / Tag: refs/tags/0.6.0
- Owner: https://github.com/polleoai
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@017ca222073bf4eb19404fac28d9524499f31ed6
- Trigger Event: push