Content-extraction provider runtime for arcus — turn a URL or file into normalized markdown + structured metadata.

arcus-provider-runtime

The content-extraction kernel behind arcus: give it one URL or one file path, get back normalized markdown plus structured metadata. No vault, no database, no project awareness — a pure download + extraction layer you can drop into any pipeline (RAG ingest, knowledge bases, LLM context building).

Install

pip install "arcus-provider-runtime[html,pdf,office]"

Extras pull in only the heavy dependencies you need:

| Extra | Adds | For |
|---|---|---|
| html | playwright | JS-rendered pages, X.com / LinkedIn, SPA articles |
| pdf | pymupdf4llm | PDF → markdown extraction |
| office | python-docx, python-pptx, openpyxl | DOCX / PPTX / XLSX / EPUB |
| all | everything above | |

The base install needs no extras and covers YouTube transcripts via yt-dlp. The HTML provider additionally needs Chromium (python -m playwright install chromium) and node on PATH, which the vendored html2md.mjs converter runs under.
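Before the first HTML extraction, it can help to confirm that those external pieces are in place. The following is a minimal sketch, not part of arcus: the function name check_html_prerequisites is hypothetical, and it only checks that the playwright package is importable and that a node executable is on PATH. It does not verify that the Chromium browser itself has been downloaded.

```python
import importlib.util
import shutil


def check_html_prerequisites() -> dict[str, bool]:
    """Report whether the HTML provider's external requirements look present.

    Hypothetical helper for illustration; arcus does not ship this function.
    """
    return {
        # The `html` extra installs the playwright Python package.
        "playwright": importlib.util.find_spec("playwright") is not None,
        # The vendored html2md.mjs converter needs a `node` executable on PATH.
        "node": shutil.which("node") is not None,
    }


if __name__ == "__main__":
    for name, present in check_html_prerequisites().items():
        print(f"{name}: {'found' if present else 'MISSING'}")
```

If either entry reports MISSING, install the html extra or Node.js before pointing the runtime at a JS-rendered page.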

Use

from arcus.provider_runtime import Factory

result = Factory().run("https://example.com/article", out_dir="./out")
# result.markdown_path  → ./out/<slug>.md   (frontmatter + readable body)
# result.metadata_path  → ./out/<slug>.json (segments, timing, provenance)

One Factory.run() entry point dispatches to the right provider by inspecting the input. Providers live under arcus.provider_runtime.providers.<kind>/ and are individually registerable.
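To make "dispatches by inspecting the input" concrete, here is a rough sketch of how such a decision could be made. It is an assumption for illustration only and does not reproduce the actual logic inside Factory: the function guess_provider_kind, the kind names, the suffix table, and the host list are all hypothetical.

```python
from pathlib import Path
from urllib.parse import urlparse

# Hypothetical suffix-to-kind table; the real provider registry may differ.
_SUFFIX_TO_KIND = {
    ".pdf": "pdf",
    ".docx": "office",
    ".pptx": "office",
    ".xlsx": "office",
    ".epub": "office",
    ".html": "html",
    ".htm": "html",
}

# Hypothetical set of hosts treated as YouTube inputs.
_YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def guess_provider_kind(source: str) -> str:
    """Guess a provider kind from a URL or file path (illustrative only)."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        host = (parsed.hostname or "").lower()
        if host in _YOUTUBE_HOSTS:
            return "youtube"
        # A URL ending in a known document suffix goes to that provider;
        # anything else is treated as a web page.
        suffix = Path(parsed.path).suffix.lower()
        return _SUFFIX_TO_KIND.get(suffix, "html")

    # Anything that is not an http(s) URL is treated as a local file path.
    suffix = Path(source).suffix.lower()
    if suffix not in _SUFFIX_TO_KIND:
        raise ValueError(f"no provider known for input: {source!r}")
    return _SUFFIX_TO_KIND[suffix]
```

Keeping the decision in one pure function like this makes the routing easy to test without touching the network, which is one plausible reason to put dispatch behind a single entry point.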

What it deliberately does NOT do

arcus has zero awareness of any consuming app's storage, topics, or wiki. One input in, one extracted artifact out. Vault-aware orchestration (dedup, cross-referencing, synthesis) belongs in the consumer, not here.
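As an illustration of that boundary, a consumer that wants dedup can wrap the extraction call itself. The sketch below is hypothetical and not part of arcus: DedupingIngest and its extract parameter are made-up names, and extract stands in for whatever the consumer calls to run an extraction (for example a small function around Factory().run).

```python
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class DedupingIngest(Generic[T]):
    """Consumer-side wrapper that extracts each distinct source only once.

    Hypothetical example; the extraction callable is supplied by the consumer.
    """

    def __init__(self, extract: Callable[[str], T]) -> None:
        self._extract = extract
        self._seen: Dict[str, T] = {}

    def ingest(self, source: str) -> T:
        # Use the whitespace-trimmed source string as the dedup key.
        key = source.strip()
        if key not in self._seen:
            self._seen[key] = self._extract(key)
        return self._seen[key]
```

A real consumer would likely persist the seen set and normalize URLs more carefully; the point is only that this state lives outside the extraction layer.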

License

MIT © 2026 POLLEO.AI

Download files

Source Distribution

arcus_provider_runtime-0.5.0.tar.gz (149.0 kB)

Built Distribution

arcus_provider_runtime-0.5.0-py3-none-any.whl (74.0 kB)

File details

Details for the file arcus_provider_runtime-0.5.0.tar.gz.

File metadata

  • Download URL: arcus_provider_runtime-0.5.0.tar.gz
  • Size: 149.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for arcus_provider_runtime-0.5.0.tar.gz
Algorithm Hash digest
SHA256 f9c162e7be419190023862323b3e9f1685cecadd3e4329cd656489328c29e4a2
MD5 35d0abd5fdc10059801210ccd006cb2d
BLAKE2b-256 560623769362c55bb132a069332fb334a5b326b354466a7d6f1c6a657ad96739

Provenance

The following attestation bundles were made for arcus_provider_runtime-0.5.0.tar.gz:

Publisher: release.yml on polleoai/arcus

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file arcus_provider_runtime-0.5.0-py3-none-any.whl.

File hashes

Hashes for arcus_provider_runtime-0.5.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ddc06e18877a3a9ce3e3c1234d95cd758078900b2f589958ca2ff44ca844804c
MD5 64979d5b8ed07f79675f20f2f071bcf6
BLAKE2b-256 dbc0af740d1ca53f4dc1bd3b616b3c5327bc4dd9ab92230be9d88ac45781b37f

Provenance

The following attestation bundles were made for arcus_provider_runtime-0.5.0-py3-none-any.whl:

Publisher: release.yml on polleoai/arcus
