KAOS-native source discovery and materialization — filesystem, archive, HTTP, browser, plus REST connectors for Federal Register / eCFR / EDGAR / GovInfo / GLEIF and forensic parsers for VCard / EML / MBOX / PACER / EXIF
kaos-source
Part of Kelvin Agentic OS (KAOS) — open agentic infrastructure for legal work, built by 273 Ventures. See the full KAOS package map for the rest of the stack.
kaos-source is the source discovery and materialization layer for KAOS —
filesystem, archive, HTTP, and browser transport connectors, plus REST clients
for the Federal Register, eCFR, EDGAR, GovInfo, and GLEIF, and forensic
parsers for VCard, EML / MBOX email, PACER docket HTML, and image EXIF.
It is the layer between "I have a URL / path / docket number" and "give me a
typed SourceDescriptor plus an artifact handle in kaos-core's VFS." Every
fetch goes through a strict-by-default SSRF guard, every response body is
size-capped, and every archive iteration enforces decompression-ratio limits
and symlink protection. These guards are configured through the
KAOS_SECURITY_* and KAOS_SOURCE_* environment variables.
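To make the archive protections concrete, here is a minimal standard-library sketch of a decompression-ratio and symlink check over a ZIP file. It is not kaos-source's implementation: the `unsafe_members` helper, the `MAX_RATIO` threshold, and the demo archive are all hypothetical names and values chosen for illustration.

```python
import io
import stat
import zipfile

MAX_RATIO = 100.0  # hypothetical cap on uncompressed / compressed size


def unsafe_members(archive: zipfile.ZipFile, max_ratio: float = MAX_RATIO) -> list[str]:
    """Return names of members that are symlinks or exceed the ratio cap."""
    flagged = []
    for info in archive.infolist():
        # For Unix-created entries the upper 16 bits of external_attr hold the file mode.
        mode = info.external_attr >> 16
        is_symlink = stat.S_ISLNK(mode)
        # max(..., 1) avoids dividing by zero for empty members.
        ratio = info.file_size / max(info.compress_size, 1)
        if is_symlink or ratio > max_ratio:
            flagged.append(info.filename)
    return flagged


def build_demo_archive() -> bytes:
    """Build an in-memory ZIP with one benign file, one bomb-like file, one symlink."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("notes.txt", "short, barely compressible text")
        zf.writestr("bomb.bin", b"\0" * 1_000_000)  # a megabyte of zeros compresses heavily
        link = zipfile.ZipInfo("link")
        link.external_attr = (stat.S_IFLNK | 0o777) << 16  # mark the entry as a symlink
        zf.writestr(link, "/etc/passwd")
    return buffer.getvalue()


with zipfile.ZipFile(io.BytesIO(build_demo_archive())) as zf:
    print(unsafe_members(zf))  # ['bomb.bin', 'link']
```

The check reads only the central directory, so nothing is decompressed before a member is rejected. A production guard would likely also track a cumulative byte budget across members, which this sketch omits.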
The base install carries only httpx, kaos-core, and pydantic. Most of the
heavier dependencies (lxml, pillow, playwright, kaos-content, kaos-nlp-core)
are gated behind opt-in extras ([browser], [content], [pacer]).
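If you want to know at runtime which optional dependencies are present, a check along these lines may help. It is only a sketch: the mapping from extra name to importable module is an assumption inferred from the install comments (in particular, `kaos_content` as the import name of kaos-content is a guess), and `available_extras` is a hypothetical helper, not part of kaos-source.

```python
from importlib.util import find_spec

# Assumed mapping: extra name -> top-level module that extra is expected to install.
EXTRA_MODULES = {
    "browser": "playwright",
    "content": "kaos_content",  # assumed import name for kaos-content
    "pacer": "lxml",
}


def available_extras(modules: dict[str, str] = EXTRA_MODULES) -> dict[str, bool]:
    """Report which optional modules can be imported, without importing them."""
    return {extra: find_spec(module) is not None for extra, module in modules.items()}


print(available_extras())
```

`find_spec` only locates the module, so the check stays cheap even for heavy packages such as Playwright.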
Install
uv add kaos-source
# or
pip install kaos-source
Optional extras (all additive — none of the base functionality requires them):
uv add 'kaos-source[browser]' # Playwright-backed browser fetches
uv add 'kaos-source[content]' # parse-into-ContentDocument bridges
uv add 'kaos-source[pacer]' # lxml-backed PACER docket parser
kaos-source requires Python 3.13 or newer.
Quick start
Discover, preview, and materialize a local file through the in-memory
SourceService:
import asyncio
from pathlib import Path

from kaos_core import KaosContext, KaosRuntime
from kaos_core.protocol.roots import Root
from kaos_source import (
    SourceDiscoverOptions,
    SourceLocator,
    SourcePreviewOptions,
    SourceService,
)


async def main() -> None:
    runtime = KaosRuntime()
    service = SourceService()  # registers the five default connectors
    workspace = Path.cwd()
    context = KaosContext.create(
        session_id="quickstart",
        runtime=runtime,
        roots=[Root(uri=workspace.as_uri(), name="cwd")],
    )
    page = await service.discover(
        SourceLocator.filesystem(workspace),
        context,
        SourceDiscoverOptions(limit=5, patterns=["*.py"]),
    )
    print([item.name for item in page.items])
    if page.items:
        preview = await service.preview(
            page.items[0].locator,
            context,
            SourcePreviewOptions(max_bytes=120),
        )
        print(preview.text_preview)


asyncio.run(main())
The same SourceService API also handles archive://, http(s)://,
browser://, and memory:// locators — only the Root allowlist and the
SSRF guard change behaviour per scheme.
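As a rough picture of what a Root allowlist means for filesystem locators, the sketch below tests whether a candidate path resolves inside one of a set of file:// root URIs. It is an illustration under assumptions, not kaos-core's enforcement logic; `root_path` and `is_within_roots` are hypothetical helpers.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def root_path(root_uri: str) -> Path:
    """Convert a file:// root URI into a resolved local path."""
    parsed = urlparse(root_uri)
    if parsed.scheme != "file":
        raise ValueError(f"unsupported root scheme: {parsed.scheme!r}")
    # url2pathname undoes percent-encoding and handles Windows drive letters.
    return Path(url2pathname(parsed.path)).resolve()


def is_within_roots(candidate: Path, root_uris: list[str]) -> bool:
    """True if the candidate, after resolving '..' and symlinks, sits under a root."""
    resolved = candidate.resolve()
    return any(resolved.is_relative_to(root_path(uri)) for uri in root_uris)


workspace = Path.cwd()
roots = [workspace.as_uri()]
print(is_within_roots(workspace / "README.md", roots))         # True
print(is_within_roots(workspace / ".." / "other.txt", roots))  # False
```

Resolving before comparing is the important step: a plain string-prefix check would accept `root/../other.txt` even though it escapes the root.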
Concepts
The package is organized around three layers — contracts, runtime, and domain-specific catalogues — that auto-register on import.
| Concept | What it is |
|---|---|
| SourceConnector / ApiConnector / SourceParser | Three ABCs in kaos_source.base. Connectors handle URI-addressed transports (filesystem, archive, HTTP, browser, memory). API connectors handle parameterized REST APIs (Federal Register, eCFR, EDGAR, GovInfo, GLEIF). Parsers handle byte-stream formats (VCard, EML, MBOX, PACER, EXIF). |
| SourceLocator / SourceDescriptor | The locator is the addressable input (SourceLocator.http("https://…"), SourceLocator.archive_member(path, "docs/x.pdf")). The descriptor is the metadata-first response: name, MIME, size, provenance, capability flags. Discovery is metadata-first by design: bodies don't load until materialize. |
| SourceService | Runtime that routes operations across registered connectors. Subclasses of SourceConnector register themselves at import time via default_connector_registry. Custom connectors register explicitly with default_connector_registry.register(...). |
| SourceMaterialization | The artifact-handle return type from service.materialize(...). Bodies move through kaos-core's artifact store, never inline. The descriptor's metadata carries archive_format, cik, lei, etc. depending on the connector. |
| KaosSourceHttpSettings and friends | Per-connector ModuleSettings subclasses with the KAOS_SOURCE_* env prefix. Each carries connector-specific knobs (timeout, retry, allowed_hosts, EDGAR User-Agent, GovInfo SecretStr API key). All are read from the environment at the edge of the call graph and threaded through to the connector. |
| SSRF + size-cap guards | The HTTP connector and every API client run through kaos_core.security.validate_outbound_url (per-request, including each redirect hop) and kaos_core.security.read_capped_json (streamed, with Content-Length pre-flight + running byte budget). Strict-by-default; configurable via KAOS_SECURITY_* env vars. |
CLI
kaos-source ships a kaos-source administrative CLI plus a
kaos-source-serve MCP launcher. Every structured command supports
--json for machine-readable output:
kaos-source discover ./data/ --recursive --pattern "*.pdf" # list sources
kaos-source preview document.pdf --max-bytes 2048 # bounded preview
kaos-source info document.pdf --json # source metadata
kaos-source materialize document.pdf --name my-artifact # stage to artifact store
kaos-source inspect-archive bundle.zip # list archive members
kaos-source-serve --http --port 8765 # MCP server (stdio default)
Compatibility & status
| Aspect | Details |
|---|---|
| Python | 3.13, 3.14 (informational matrix entries for 3.14t free-threaded and 3.15-dev) |
| OS | Linux, macOS, Windows (pure-Python wheel; no native code) |
| Maturity | Alpha. The public API is documented in kaos_source.__all__ (56 symbols). |
| Stability policy | Pre-1.0: minor bumps may change behaviour. Every change is documented in CHANGELOG.md. The MCP tool surface, KAOS_SOURCE_* and KAOS_SECURITY_* environment-variable namespaces are public API. |
| Test coverage | 411 unit tests across connectors, API clients, parsers, settings, and security regressions. Live integration tests gated behind --include-live. |
| Type checker | Validated with ty, Astral's Python type checker. |
Companion packages
kaos-source is one of the packages in the
Kelvin Agentic OS. The broader stack:
| Package | Layer | What it does |
|---|---|---|
| kaos-core | Core | Foundational runtime, MCP-native types, registries, execution engine, VFS |
| kaos-content | Core | Typed document AST: Block/Inline, provenance, views |
| kaos-mcp | Bridge | FastMCP server, kaos management CLI, MCP resource templates |
| kaos-pdf | Extraction | PDF → AST with provenance |
| kaos-web | Extraction | Web extraction, browser automation, search, domain intelligence |
| kaos-office | Extraction | DOCX / PPTX / XLSX readers + writers to AST |
| kaos-tabular | Extraction | DuckDB-powered SQL analytics |
| kaos-source | Data | Government + financial data connectors (Federal Register, eCFR, EDGAR, GovInfo, PACER, GLEIF) |
| kaos-llm-client | LLM | Multi-provider LLM transport |
| kaos-llm-core | LLM | Typed LLM programming (Signatures, Programs, Optimizers) |
| kaos-nlp-core | Primitives (Rust) | High-performance NLP primitives |
| kaos-nlp-transformers | ML | Dense embeddings + retrieval |
| kaos-graph | Primitives (Rust) | Graph algorithms + RDF/SPARQL |
| kaos-ml-core | Primitives (Rust) | Classical ML on the document AST |
| kaos-citations | Legal | Legal citation extraction, resolution, verification |
| kaos-agents | Agentic | Agent runtime, memory, recipes |
| kaos-reference | Sample | Reference module for module authors |
Every package depends on kaos-core; everything else is opt-in. Mix and match
the ones you need.
Development
git clone https://github.com/273v/kaos-source
cd kaos-source
uv sync --group dev
Install pre-commit hooks (recommended — they run the same checks as CI on every commit, scoped to staged files):
uvx pre-commit install
uvx pre-commit run --all-files # one-time full sweep
Manual QA commands (the same set CI runs):
uv run ruff format --check kaos_source tests
uv run ruff check kaos_source tests
uv run ty check kaos_source tests
uv run pytest -m "not live and not network and not slow"
Build from source
uv build
uv pip install dist/*.whl
Contributing
Issues and pull requests are welcome. By contributing you certify the
Developer Certificate of Origin v1.1 —
sign every commit with git commit -s. Please open an issue before starting
on a non-trivial change so we can align on scope.
Security
For security issues, please do not file a public issue. Report privately via GitHub Private Vulnerability Reporting or email security@273ventures.com. See SECURITY.md for the full disclosure policy.
License
Apache License 2.0 — see LICENSE and NOTICE.
Copyright 2026 273 Ventures LLC. Built for kelvin.legal.