
KAOS-native source discovery and materialization — filesystem, archive, HTTP, browser, plus REST connectors for Federal Register / eCFR / EDGAR / GovInfo / GLEIF and forensic parsers for VCard / EML / MBOX / PACER / EXIF


kaos-source

Part of Kelvin Agentic OS (KAOS) — open agentic infrastructure for legal work, built by 273 Ventures. See the full KAOS package map for the rest of the stack.


kaos-source is the source discovery and materialization layer for KAOS — filesystem, archive, HTTP, and browser transport connectors, plus REST clients for the Federal Register, eCFR, EDGAR, GovInfo, and GLEIF, and forensic parsers for VCard, EML / MBOX email, PACER docket HTML, and image EXIF.

It is the layer between "I have a URL / path / docket number" and "give me a typed SourceDescriptor plus an artifact handle in kaos-core's VFS." Every fetch goes through a strict-by-default SSRF guard, every response body is size-capped, and every archive iteration enforces decompression-ratio limits and symlink protection. Configuration lives in the KAOS_SECURITY_* and KAOS_SOURCE_* environment variables.
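To make the SSRF guard concrete, here is a minimal standard-library sketch of the kind of check such a guard performs. It is not the kaos-core implementation: the names check_outbound_url, resolve_host, and BlockedUrlError are hypothetical, and the policy shown (http/https only, globally routable addresses only) is an assumption about what "strict by default" could mean.

```python
import ipaddress
import socket
from collections.abc import Callable
from urllib.parse import urlsplit

# Assumed policy for this sketch: only plain web schemes are allowed.
ALLOWED_SCHEMES = frozenset({"http", "https"})


class BlockedUrlError(ValueError):
    """Hypothetical error type for URLs that fail the outbound check."""


def resolve_host(host: str) -> list[str]:
    """Resolve a hostname to IP address strings with the system resolver."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return [info[4][0] for info in infos]


def check_outbound_url(
    url: str,
    resolver: Callable[[str], list[str]] = resolve_host,
) -> str:
    """Return the URL unchanged if it looks safe to fetch, else raise."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        raise BlockedUrlError(f"scheme not allowed: {parts.scheme!r}")
    host = parts.hostname
    if not host:
        raise BlockedUrlError("URL has no host")
    try:
        # A literal IP in the URL is checked directly, without DNS.
        addresses = [str(ipaddress.ip_address(host))]
    except ValueError:
        addresses = resolver(host)
    if not addresses:
        raise BlockedUrlError(f"host did not resolve: {host!r}")
    for address in addresses:
        # is_global is False for loopback, private, link-local, and
        # other non-routable ranges (for example 169.254.169.254).
        if not ipaddress.ip_address(address).is_global:
            raise BlockedUrlError(f"non-public address for {host!r}: {address}")
    return url
```

A production guard also has to run the same check on every redirect hop and connect to the address it validated, otherwise a DNS answer that changes between check and connect can bypass it. The sketch leaves both out.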

The base install carries only httpx, kaos-core, and pydantic; most of the heavier dependencies (lxml, pillow, playwright, kaos-content, kaos-nlp-core) are gated behind opt-in extras ([browser], [content], [pacer]).

Install

uv add kaos-source
# or
pip install kaos-source

Optional extras (all additive — none of the base functionality requires them):

uv add 'kaos-source[browser]'   # Playwright-backed browser fetches
uv add 'kaos-source[content]'   # parse-into-ContentDocument bridges
uv add 'kaos-source[pacer]'     # lxml-backed PACER docket parser

kaos-source requires Python 3.13 or newer.

Quick start

Discover, preview, and materialize a local file through the in-memory SourceService:

import asyncio
from pathlib import Path

from kaos_core import KaosContext, KaosRuntime
from kaos_core.protocol.roots import Root
from kaos_source import (
    SourceDiscoverOptions,
    SourceLocator,
    SourcePreviewOptions,
    SourceService,
)


async def main() -> None:
    runtime = KaosRuntime()
    service = SourceService()  # registers the five default connectors
    workspace = Path.cwd()
    context = KaosContext.create(
        session_id="quickstart",
        runtime=runtime,
        roots=[Root(uri=workspace.as_uri(), name="cwd")],
    )

    page = await service.discover(
        SourceLocator.filesystem(workspace),
        context,
        SourceDiscoverOptions(limit=5, patterns=["*.py"]),
    )
    print([item.name for item in page.items])

    if page.items:
        preview = await service.preview(
            page.items[0].locator,
            context,
            SourcePreviewOptions(max_bytes=120),
        )
        print(preview.text_preview)


asyncio.run(main())

The same SourceService API also handles archive://, http(s)://, browser://, and memory:// locators — only the Root allowlist and the SSRF guard change behaviour per scheme.

Concepts

The package is organized around three layers — contracts, runtime, and domain-specific catalogues — that auto-register on import.

| Concept | What it is |
| --- | --- |
| SourceConnector / ApiConnector / SourceParser | Three ABCs in kaos_source.base. Connectors handle URI-addressed transports (filesystem, archive, HTTP, browser, memory). API connectors handle parameterized REST APIs (Federal Register, eCFR, EDGAR, GovInfo, GLEIF). Parsers handle byte-stream formats (VCard, EML, MBOX, PACER, EXIF). |
| SourceLocator / SourceDescriptor | The locator is the addressable input (SourceLocator.http("https://…"), SourceLocator.archive_member(path, "docs/x.pdf")). The descriptor is the metadata-first response: name, MIME, size, provenance, capability flags. Discovery is metadata-first by design; bodies don't load until materialize. |
| SourceService | Runtime that routes operations across registered connectors. Subclasses of SourceConnector register themselves at import time via default_connector_registry. Custom connectors register explicitly with default_connector_registry.register(...). |
| SourceMaterialization | The artifact-handle return type from service.materialize(...). Bodies move through kaos-core's artifact store, never inline. The descriptor's metadata carries archive_format, cik, lei, etc. depending on the connector. |
| KaosSourceHttpSettings and friends | Per-connector ModuleSettings subclasses with the KAOS_SOURCE_* env prefix. Each carries connector-specific knobs (timeout, retry, allowed_hosts, EDGAR User-Agent, GovInfo SecretStr API key). All are read from the environment at the edge of the call graph and threaded through to the connector. |
| SSRF + size-cap guards | The HTTP connector and every API client run through kaos_core.security.validate_outbound_url (per-request, including each redirect hop) and kaos_core.security.read_capped_json (streamed, with Content-Length pre-flight + running byte budget). Strict-by-default; configurable via KAOS_SECURITY_* env vars. |

CLI

kaos-source ships a kaos-source administrative CLI plus a kaos-source-serve MCP launcher. Every structured command supports --json for machine-readable output:

kaos-source discover ./data/ --recursive --pattern "*.pdf"  # list sources
kaos-source preview document.pdf --max-bytes 2048           # bounded preview
kaos-source info document.pdf --json                        # source metadata
kaos-source materialize document.pdf --name my-artifact     # stage to artifact store
kaos-source inspect-archive bundle.zip                      # list archive members

kaos-source-serve --http --port 8765                        # MCP server over HTTP (stdio is the default transport)
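Archive listing is where the decompression-ratio and symlink protections mentioned in the introduction apply. The sketch below shows one way to perform those two checks on a ZIP file with the standard library alone. It is an illustration under stated assumptions, not the kaos-source archive connector: the function names, the UnsafeArchiveError type, and the 100:1 ratio limit are hypothetical.

```python
import io
import stat
import zipfile

# Assumed limit for this sketch; the real default is not stated in the README.
MAX_RATIO = 100.0


class UnsafeArchiveError(ValueError):
    """Hypothetical error type for archives that fail a safety check."""


def is_symlink(info: zipfile.ZipInfo) -> bool:
    """Unix-built ZIPs keep the file mode in the high 16 bits of external_attr."""
    return stat.S_ISLNK(info.external_attr >> 16)


def compression_ratio(info: zipfile.ZipInfo) -> float:
    """Declared uncompressed size divided by compressed size."""
    if info.compress_size == 0:
        return 0.0 if info.file_size == 0 else float("inf")
    return info.file_size / info.compress_size


def list_safe_members(archive: zipfile.ZipFile, max_ratio: float = MAX_RATIO) -> list[str]:
    """Return member names, raising on symlinks or suspicious ratios."""
    names: list[str] = []
    for info in archive.infolist():
        if is_symlink(info):
            raise UnsafeArchiveError(f"symlink member: {info.filename}")
        if compression_ratio(info) > max_ratio:
            raise UnsafeArchiveError(f"compression ratio too high: {info.filename}")
        names.append(info.filename)
    return names


def inspect_zip_bytes(data: bytes, max_ratio: float = MAX_RATIO) -> list[str]:
    """Convenience wrapper for an archive that is already in memory."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return list_safe_members(archive, max_ratio)
```

The sizes checked here come from the archive's own headers, which a hostile archive can misstate, so a complete guard would also count bytes while extracting.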

Compatibility & status

| Aspect | Details |
| --- | --- |
| Python | 3.13, 3.14 (informational matrix entries for 3.14t free-threaded and 3.15-dev) |
| OS | Linux, macOS, Windows (pure-Python wheel; no native code) |
| Maturity | Alpha. The public API is documented in kaos_source.__all__ (56 symbols). |
| Stability policy | Pre-1.0: minor bumps may change behaviour. Every change is documented in CHANGELOG.md. The MCP tool surface, KAOS_SOURCE_* and KAOS_SECURITY_* environment-variable namespaces are public API. |
| Test coverage | 411 unit tests across connectors, API clients, parsers, settings, and security regressions. Live integration tests gated behind --include-live. |
| Type checker | Validated with ty, Astral's Python type checker. |

Companion packages

kaos-source is one of the packages in the Kelvin Agentic OS. The broader stack:

| Package | Layer | What it does |
| --- | --- | --- |
| kaos-core | Core | Foundational runtime, MCP-native types, registries, execution engine, VFS |
| kaos-content | Core | Typed document AST: Block/Inline, provenance, views |
| kaos-mcp | Bridge | FastMCP server, kaos management CLI, MCP resource templates |
| kaos-pdf | Extraction | PDF → AST with provenance |
| kaos-web | Extraction | Web extraction, browser automation, search, domain intelligence |
| kaos-office | Extraction | DOCX / PPTX / XLSX readers + writers to AST |
| kaos-tabular | Extraction | DuckDB-powered SQL analytics |
| kaos-source | Data | Government + financial data connectors (Federal Register, eCFR, EDGAR, GovInfo, PACER, GLEIF) |
| kaos-llm-client | LLM | Multi-provider LLM transport |
| kaos-llm-core | LLM | Typed LLM programming (Signatures, Programs, Optimizers) |
| kaos-nlp-core | Primitives (Rust) | High-performance NLP primitives |
| kaos-nlp-transformers | ML | Dense embeddings + retrieval |
| kaos-graph | Primitives (Rust) | Graph algorithms + RDF/SPARQL |
| kaos-ml-core | Primitives (Rust) | Classical ML on the document AST |
| kaos-citations | Legal | Legal citation extraction, resolution, verification |
| kaos-agents | Agentic | Agent runtime, memory, recipes |
| kaos-reference | Sample | Reference module for module authors |

Every package depends on kaos-core; everything else is opt-in. Mix and match the ones you need.

Development

git clone https://github.com/273v/kaos-source
cd kaos-source
uv sync --group dev

Install pre-commit hooks (recommended — they run the same checks as CI on every commit, scoped to staged files):

uvx pre-commit install
uvx pre-commit run --all-files     # one-time full sweep

Manual QA commands (the same set CI runs):

uv run ruff format --check kaos_source tests
uv run ruff check kaos_source tests
uv run ty check kaos_source tests
uv run pytest -m "not live and not network and not slow"

Build from source

uv build
uv pip install dist/*.whl

Contributing

Issues and pull requests are welcome. By contributing you certify the Developer Certificate of Origin v1.1 — sign every commit with git commit -s. Please open an issue before starting on a non-trivial change so we can align on scope.

Security

For security issues, please do not file a public issue. Report privately via GitHub Private Vulnerability Reporting or email security@273ventures.com. See SECURITY.md for the full disclosure policy.

License

Apache License 2.0 — see LICENSE and NOTICE.

Copyright 2026 273 Ventures LLC. Built for kelvin.legal.
