
Content-extraction provider runtime for arcus — turn a URL or file into normalized markdown + structured metadata.


arcus-provider-runtime

The content-extraction kernel behind arcus: give it one URL or one file path, get back normalized markdown plus structured metadata. No vault, no database, no project awareness — a pure download + extraction layer you can drop into any pipeline (RAG ingest, knowledge bases, LLM context building).

Install

pip install "arcus-provider-runtime[html,pdf,office]"

Extras pull in only the heavy dependencies you need:

Extra    Adds                                  For
html     playwright                            JS-rendered pages, X.com / LinkedIn, SPA articles
pdf      pymupdf4llm                           PDF → markdown extraction
office   python-docx, python-pptx, openpyxl    DOCX / PPTX / XLSX / EPUB
all      everything above
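To see which extras are already importable before calling Factory.run(), a small preflight check can probe the import names of the packages listed in the table. This is a minimal sketch and not part of arcus: the EXTRA_MODULES mapping and the installed_extras() helper are hypothetical, and the sketch assumes the usual import names (docx for python-docx, pptx for python-pptx).

```python
import importlib.util

# Hypothetical mapping from each extra to the import names of the packages it adds,
# taken from the table above (python-docx imports as "docx", python-pptx as "pptx").
EXTRA_MODULES: dict[str, tuple[str, ...]] = {
    "html": ("playwright",),
    "pdf": ("pymupdf4llm",),
    "office": ("docx", "pptx", "openpyxl"),
}


def installed_extras() -> dict[str, bool]:
    """Report, per extra, whether every package it adds can be imported."""
    return {
        extra: all(importlib.util.find_spec(name) is not None for name in modules)
        for extra, modules in EXTRA_MODULES.items()
    }


if __name__ == "__main__":
    for extra, ok in installed_extras().items():
        print(f"{extra:7} {'installed' if ok else 'missing'}")
```

find_spec() only locates the top-level module without importing it, so the check stays cheap even when the heavy packages are present.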

The base install has no extras and covers YouTube transcripts via yt-dlp. The HTML provider additionally needs Chromium (python -m playwright install chromium) and node on PATH to run the vendored html2md.mjs converter.
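The HTML provider's non-Python prerequisites can be checked up front. A minimal sketch: missing_html_prerequisites() is a hypothetical helper, not part of arcus. It only confirms that node resolves on PATH and that playwright is importable; it cannot tell whether the Chromium download has been done, so it prints the install command as a reminder.

```python
import importlib.util
import shutil


def missing_html_prerequisites() -> list[str]:
    """Hypothetical preflight for the HTML provider; returns human-readable problems."""
    problems: list[str] = []
    # The vendored html2md.mjs converter needs a `node` executable on PATH.
    if shutil.which("node") is None:
        problems.append("node is not on PATH (needed for the vendored html2md.mjs converter)")
    # Playwright comes from the [html] extra; Chromium itself is a separate download.
    if importlib.util.find_spec("playwright") is None:
        problems.append('playwright is missing: pip install "arcus-provider-runtime[html]"')
    return problems


if __name__ == "__main__":
    issues = missing_html_prerequisites()
    for issue in issues:
        print("-", issue)
    if not issues:
        print("node and playwright found; run `python -m playwright install chromium` if not done yet")
```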

Use

from arcus.provider_runtime import Factory

result = Factory().run("https://example.com/article", out_dir="./out")
# result.markdown_path  → ./out/<slug>.md   (frontmatter + readable body)
# result.metadata_path  → ./out/<slug>.json (segments, timing, provenance)
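Since the markdown artifact places frontmatter ahead of the readable body, a consumer will often want the two apart. A minimal sketch, assuming the frontmatter follows the common convention of a leading block fenced by lines containing only --- (the README does not state the delimiter). split_frontmatter() and read_markdown_artifact() are hypothetical helpers, not part of arcus.

```python
from __future__ import annotations

from pathlib import Path


def split_frontmatter(text: str) -> tuple[str, str]:
    """Split a markdown document into (frontmatter, body).

    Assumes a leading block fenced by lines that contain only '---';
    if no complete block is present, frontmatter is '' and the text is all body.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "".join(lines[1:i]), "".join(lines[i + 1:])
    # Opening fence without a closing one: treat the whole file as body.
    return "", text


def read_markdown_artifact(markdown_path: str | Path) -> tuple[str, str]:
    """Read the file at result.markdown_path and separate frontmatter from body."""
    return split_frontmatter(Path(markdown_path).read_text(encoding="utf-8"))
```

After the Factory().run() call, frontmatter, body = read_markdown_artifact(result.markdown_path) yields the raw frontmatter text, which you can hand to a YAML parser if it turns out to be YAML.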

A single Factory.run() entry point inspects the input and dispatches it to the matching provider. Providers live under arcus.provider_runtime.providers.<kind>/ and can be registered individually.
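The dispatch step can be pictured as a small registry that tries each provider's matcher in order and hands the input to the first one that accepts it. This is a hypothetical sketch of the pattern, not arcus's actual registration API: Registry, register(), dispatch() and the matching rules (YouTube host names, file suffixes, http/https as the HTML fallback) are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import PurePath
from typing import Callable
from urllib.parse import urlparse

Matcher = Callable[[str], bool]


@dataclass
class Registry:
    """Hypothetical provider registry: ordered (kind, matcher) pairs, first match wins."""

    providers: list[tuple[str, Matcher]] = field(default_factory=list)

    def register(self, kind: str, matcher: Matcher) -> None:
        self.providers.append((kind, matcher))

    def dispatch(self, source: str) -> str:
        for kind, matches in self.providers:
            if matches(source):
                return kind
        raise ValueError(f"no provider accepts {source!r}")


def _host(source: str) -> str:
    return (urlparse(source).hostname or "").lower()


def _suffix(source: str) -> str:
    # Works for local paths and for URL paths such as https://host/paper.pdf.
    return PurePath(urlparse(source).path).suffix.lower()


registry = Registry()
# Order matters: specific matchers are registered before the generic HTML fallback.
registry.register("youtube", lambda s: _host(s) in {"youtube.com", "www.youtube.com", "youtu.be"})
registry.register("pdf", lambda s: _suffix(s) == ".pdf")
registry.register("office", lambda s: _suffix(s) in {".docx", ".pptx", ".xlsx", ".epub"})
registry.register("html", lambda s: urlparse(s).scheme in {"http", "https"})
```

The sketch returns only the provider kind so the matching logic stays visible; a real runtime would map that kind to the provider module that performs the download and extraction.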

What it deliberately does NOT do

arcus has zero awareness of any consuming app's storage, topics, or wiki. One input in, one extracted artifact out. Vault-aware orchestration (dedup, cross-referencing, synthesis) belongs in the consumer, not here.
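Because arcus stops at one extracted artifact per input, deduplication lives in the consuming application. A minimal consumer-side sketch, assuming dedup by exact content hash of the extracted body text; the SeenIndex class is hypothetical and not part of arcus.

```python
import hashlib


class SeenIndex:
    """Hypothetical consumer-side dedup: remembers SHA-256 digests of ingested markdown."""

    def __init__(self) -> None:
        self._digests: set[str] = set()

    @staticmethod
    def digest(markdown: str) -> str:
        # Normalize line endings, trailing spaces and surrounding blank lines
        # so trivially different copies of the same text hash identically.
        normalized = "\n".join(line.rstrip() for line in markdown.splitlines()).strip()
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def add(self, markdown: str) -> bool:
        """Return True if this content is new (and record it), False if already seen."""
        key = self.digest(markdown)
        if key in self._digests:
            return False
        self._digests.add(key)
        return True
```

Keeping this index in the consumer, next to its own storage, preserves the one-input, one-artifact contract of the runtime.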

License

MIT © 2026 POLLEO.AI



Download files


Source Distribution

arcus_provider_runtime-0.3.1.tar.gz (125.8 kB)

Built Distribution


arcus_provider_runtime-0.3.1-py3-none-any.whl (57.8 kB, Python 3)

File details

Details for the file arcus_provider_runtime-0.3.1.tar.gz.

File metadata

  • Download URL: arcus_provider_runtime-0.3.1.tar.gz
  • Size: 125.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for arcus_provider_runtime-0.3.1.tar.gz
Algorithm Hash digest
SHA256 38900fa91724821be8227deb86afde66b51b338f1f57ff535f196ae40ced64ee
MD5 4a021fe63805c5ed0c79a9c772536397
BLAKE2b-256 2896fedb201ddf47517757f5b6c69de5af5b6c91ad59db10ea599201332816a9
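To check a downloaded sdist against the published SHA256 digest, hash the file locally and compare. A minimal sketch using only hashlib; sha256_of() and verify() are hypothetical helper names. pip can enforce the same check at install time through its hash-checking mode (--require-hashes, with --hash=sha256:<digest> entries in a requirements file).

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# SHA256 digest published above for arcus_provider_runtime-0.3.1.tar.gz.
EXPECTED_SDIST_SHA256 = "38900fa91724821be8227deb86afde66b51b338f1f57ff535f196ae40ced64ee"


def sha256_of(path: str | Path, chunk_size: int = 1 << 16) -> str:
    """Stream the file in chunks so large archives need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str | Path, expected_hex: str) -> bool:
    """Compare the local digest with the published one (hex, case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For an intact download, verify("arcus_provider_runtime-0.3.1.tar.gz", EXPECTED_SDIST_SHA256) should return True; any other result means the file differs from what was published.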


Provenance

The following attestation bundles were made for arcus_provider_runtime-0.3.1.tar.gz:

Publisher: release.yml on polleoai/arcus

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file arcus_provider_runtime-0.3.1-py3-none-any.whl.


File hashes

Hashes for arcus_provider_runtime-0.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 e8a817b9d39c5cbf385fce01d1a3e6852ab62541c4287adb05a34fa3c66a8721
MD5 23b95519f06bd1dfbeed4a32624c0738
BLAKE2b-256 ca9e97bd7a3d7196dec942f3544c7cb59edb38d318d634cd26345d269ab535b1


Provenance

The following attestation bundles were made for arcus_provider_runtime-0.3.1-py3-none-any.whl:

Publisher: release.yml on polleoai/arcus

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
