companyctx
Deterministic B2B context router. Zero keys. Schema-locked JSON your agent pipelines can actually trust.
pipx install companyctx # v0.1 coming soon — see Status below
companyctx acme-bakery.com --json
{
  "status": "ok",
  "data": {
    "site": "acme-bakery.com",
    "fetched_at": "2026-04-20T18:42:11Z",
    "pages": { "homepage_text": "...", "services": ["cakes", "catering"], "tech_stack": ["WordPress", "Elementor"] },
    "reviews": { "count": 142, "rating": 4.6, "source": "reviews_google_places" },
    "social": { "handles": { "instagram": "@acmebakery" }, "follower_counts": {} },
    "signals": { "copyright_year": 2024, "last_blog_post_at": "2026-02-11T00:00:00Z", "team_size_claim": "team of 6" }
  },
  "provenance": {
    "site_text_trafilatura": { "status": "ok", "latency_ms": 412, "error": null, "provider_version": "0.1.0" },
    "reviews_google_places": { "status": "not_configured", "latency_ms": 0, "error": "GOOGLE_PLACES_API_KEY not set", "provider_version": "0.1.0" }
  }
}
One site in. One schema-locked JSON object out. No API keys for the zero-key path. Graceful partials on anti-bot blocks. A local SQLite cache that compounds into a queryable B2B dataset over time.
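As a rough illustration of how a consumer might read the `provenance` block, the sketch below groups provider slugs by their recorded status. It uses a trimmed copy of the example envelope above; `providers_by_status` is a hypothetical helper for this README, not part of the companyctx package.

```python
import json

# Trimmed copy of the example envelope above (only the fields used here).
ENVELOPE_JSON = """
{
  "status": "ok",
  "data": {"site": "acme-bakery.com"},
  "provenance": {
    "site_text_trafilatura": {"status": "ok", "latency_ms": 412, "error": null},
    "reviews_google_places": {"status": "not_configured", "latency_ms": 0,
                              "error": "GOOGLE_PLACES_API_KEY not set"}
  }
}
"""


def providers_by_status(envelope: dict) -> dict[str, list[str]]:
    """Group provider slugs by the status recorded in `provenance` (hypothetical helper)."""
    grouped: dict[str, list[str]] = {}
    for slug, meta in envelope["provenance"].items():
        grouped.setdefault(meta["status"], []).append(slug)
    return grouped


envelope = json.loads(ENVELOPE_JSON)
grouped = providers_by_status(envelope)
print(grouped)
```

A pipeline could log the `not_configured` group once per run to show which optional keys are missing, without treating the run as a failure.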
Status: v0.1 in development. Not yet on PyPI. Schema + CLI surface are committed in
docs/SPEC.md and docs/SCHEMA.md; architecture in docs/ARCHITECTURE.md.
What this is (and isn't)
IS
- A schema-locked context router. The Pydantic v2 envelope is the product. Providers come and go; the contract agents consume does not.
- A Deterministic Waterfall. Zero-key stealth fetch first → smart-proxy provider (if configured) → direct-API provider (if configured). Every attempt returns the same shape.
- A local-first memory layer. The SQLite cache is not just speed — it compounds into a queryable local B2B dataset as a byproduct of normal use.
- A narrow muscle in the brains-and-muscles pattern. Your frontier model is the brain; companyctx is one of many CLIs it pipes through.
ISN'T
- Not a scraper competing on scale. Residential-proxy / headless-browser infrastructure is a commodity layer: we compose it via a SmartProxyProvider interface, we don't out-build it.
- Not an agent framework. Orchestration lives upstream.
- Not a hosted service. Local pipx CLI. No rented infra. No credits.
- Not a synthesis engine. Our output is the input for synthesis.
- Not a Cloudflare bypass. Zero-key covers the majority of small-biz homepages. It won't defeat serious anti-bot — see the coverage matrix.
- Not a people-data tool. Companies only. Contact enrichment belongs upstream (Apollo, Clearbit, manual).
- Not an MCP server, ever, in our roadmap. MCP's ~50k-token schema dump defeats a muscle built to save tokens. Agents find us via SKILL.md; the CLI + jq + stdout are the composition layer. See decisions/2026-04-20-skill-md-not-mcp.md.
The Deterministic Waterfall
Attempt 1 — Zero-key stealth (TLS+HTTP/2 impersonation + trafilatura + extruct)
↓ 403 / challenge / timeout?
Attempt 2 — Smart-proxy provider (user-keyed, vendor-agnostic)
↓ still blocked?
Attempt 3 — Direct-API provider (user-keyed — Google Places, Yelp Fusion, YouTube)
↓
{ status, data, provenance, error?, suggestion? }
Every attempt maps to the same Pydantic schema. Downstream pipelines
never branch on which attempt succeeded — they branch on the envelope's
status: ok | partial | degraded.
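The control flow above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the package's implementation: `run_waterfall`, the `Attempt` signature, the `"blocked"` and `"all_attempts_failed"` error strings, and the two stand-in attempt functions are all hypothetical.

```python
from typing import Callable, Optional

# Hypothetical signature: an attempt takes a site and returns a data dict,
# returns None when blocked, or raises on an unexpected failure.
Attempt = Callable[[str], Optional[dict]]


def run_waterfall(site: str, attempts: list[tuple[str, Attempt]]) -> dict:
    """Try each attempt in order; always return the same envelope shape."""
    provenance: dict[str, dict] = {}
    for name, attempt in attempts:
        try:
            data = attempt(site)
            error = None if data is not None else "blocked"
        except Exception as exc:  # a failing attempt must not crash the run
            data, error = None, f"{type(exc).__name__}: {exc}"
        provenance[name] = {"status": "ok" if data is not None else "failed", "error": error}
        if data is not None:
            return {"status": "ok", "data": data, "provenance": provenance}
    # Every attempt failed: still a well-formed envelope, just a partial one.
    return {
        "status": "partial",
        "data": {"site": site},
        "provenance": provenance,
        "error": "all_attempts_failed",
        "suggestion": "configure another provider or skip this prospect",
    }


def zero_key(site: str) -> Optional[dict]:
    return None  # stand-in for a blocked zero-key fetch


def smart_proxy(site: str) -> Optional[dict]:
    return {"site": site, "pages": {"homepage_text": "..."}}  # stand-in success


result = run_waterfall("example.com", [("zero_key", zero_key), ("smart_proxy", smart_proxy)])
blocked = run_waterfall("example.com", [("zero_key", zero_key)])
```

In this sketch the caller sees the same keys whether the first or the second attempt produced the data; only `provenance` records which path was taken.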
On full block with no Attempt-2/3 providers configured:
{
  "status": "partial",
  "data": { "site": "example.com", "fetched_at": "...", "pages": null, "reviews": null, ... },
  "provenance": { "site_text_trafilatura": { "status": "failed", "error": "blocked_by_antibot (HTTP 403)", ... } },
  "error": "blocked_by_antibot",
  "suggestion": "configure a smart-proxy provider key or skip this prospect"
}
Never raises. Never crashes your pipeline. Every run comes back well-formed.
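Because every run returns a well-formed envelope, a consumer can dispatch on `status` alone. The sketch below is one possible shape for that dispatch. `next_action` and its return strings are hypothetical, and treating `partial` and `degraded` identically is an assumption, since this README does not spell out how they differ.

```python
def next_action(envelope: dict) -> str:
    """Decide what a pipeline does next, based only on the envelope status (hypothetical helper)."""
    status = envelope.get("status")
    if status == "ok":
        return "synthesize"
    if status in ("partial", "degraded"):
        # Assumption: both statuses are handled the same way in this sketch.
        hint = envelope.get("suggestion") or "no suggestion provided"
        return f"review: {hint}"
    # An unknown status means the contract changed; fail loudly on the consumer side.
    raise ValueError(f"unexpected envelope status: {status!r}")


blocked_envelope = {
    "status": "partial",
    "error": "blocked_by_antibot",
    "suggestion": "configure a smart-proxy provider key or skip this prospect",
}
action = next_action(blocked_envelope)
```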
See docs/ARCHITECTURE.md for the full picture and
docs/ZERO-KEY.md for honest anti-bot scoping.
Zero-key coverage — honestly
| Site class | Zero-key outcome |
|---|---|
| Small-biz WordPress / Squarespace / Wix / Webflow / agency custom | Full payload. Expected status: "ok" on the majority of prospects in this segment.* |
| Cloudflare Turnstile / DataDome / Akamai / PerimeterX | Often blocked. Returns status: "partial" with actionable suggestion. |
| JS-heavy SPAs needing a real browser | HTML shell only. Render-dependent fields come back null. Configure a smart-proxy provider to fill the gap. |
| Aggregator pages (Yelp / Houzz / G2 / Birdeye) | Not the target — use the direct-API providers (reviews_google_places, reviews_yelp_fusion) instead. |
* Exact coverage number lands in
docs/ZERO-KEY.md after the M1 stealth-fetcher spike
against the 30-prospect fixtures corpus. Numbers come from measurement, not
marketing.
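Once the fixtures corpus exists, the coverage figure is a simple ratio of `ok` envelopes to total runs. The sketch below shows that arithmetic only. The `coverage` helper is hypothetical, and the status list is an invented placeholder, not a measurement.

```python
from collections import Counter


def coverage(statuses: list[str]) -> float:
    """Fraction of runs whose envelope status was "ok" (0.0 for an empty list)."""
    if not statuses:
        return 0.0
    return Counter(statuses)["ok"] / len(statuses)


# Placeholder values for illustration only; real numbers come from the corpus run.
placeholder_statuses = ["ok", "ok", "partial", "ok"]
ratio = coverage(placeholder_statuses)
print(f"{ratio:.0%}")
```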
Brains-and-muscles pipe
companyctx acme-bakery.com --json \
| jq '.data | {site, signals, reviews}' \
| claude -p "write a 6-section outreach brief from this context"
import json
import subprocess

# Run the CLI and parse the JSON envelope it prints to stdout.
ctx = json.loads(subprocess.check_output(["companyctx", "acme-bakery.com", "--json"]))
if ctx["status"] == "partial":
    print(f"heads up: {ctx['error']} - {ctx['suggestion']}")
brief = synthesize(ctx["data"])  # your synthesis call, your prompts, your weights
companyctx never calls an LLM. The brain upstream decides what the
context means.
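For pipelines that stay in Python, the `jq` projection from the shell example can be approximated with a small helper. This is a sketch: `project` is a hypothetical name, and the sample envelope is a trimmed copy of the example at the top of this README.

```python
def project(envelope: dict, keys: tuple[str, ...] = ("site", "signals", "reviews")) -> dict:
    """Keep only the selected keys of `data`, like jq's `.data | {site, signals, reviews}`."""
    data = envelope.get("data") or {}
    # Missing keys come back as None, mirroring jq's null for absent fields.
    return {key: data.get(key) for key in keys}


sample = {
    "status": "ok",
    "data": {
        "site": "acme-bakery.com",
        "pages": {"homepage_text": "..."},
        "reviews": {"count": 142, "rating": 4.6, "source": "reviews_google_places"},
        "signals": {"copyright_year": 2024},
    },
}
slim = project(sample)
```

Trimming the payload before it reaches the model keeps the prompt small, which is the point of the jq step in the shell version.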
Install (during v0.1 dev)
git clone https://github.com/dmthepm/companyctx.git
cd companyctx
pip install -e ".[dev,extract,reviews,youtube]"
companyctx --help
Once v0.1.0 ships:
pipx install companyctx
companyctx fetch example.com --mock --json
Design invariants
- Schema is the product. Providers are replaceable; the CompanyContext envelope is not. Raw observations only; inference lives in the downstream synthesis layer.
- Graceful-partial always. Providers never raise uncaught. Every failure maps to ProviderRunMetadata.status per provider and the top-level status on the envelope.
- Vertical Memory. Every run persists the full normalized payload to SQLite under XDG paths. --refresh forces a re-fetch; --from-cache is a cache-only read. A companyctx query ... DSL on the cache is v0.2 scope, not v0.1.
- Provider pluggability. Every deterministic call class is discovered via Python entry points (companyctx.providers). Day-one providers include bus-factor fallbacks (trafilatura + readability-lxml both wired for site text). See docs/PROVIDERS.md.
- robots.txt respected by default. --ignore-robots is an explicit CLI-only flag; never settable via TOML or env.
- Deterministic mocks. fixtures/<site>/ drives --mock; re-runs produce byte-identical output modulo fetched_at.
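The Vertical Memory semantics above (persist every run, `--refresh` forces a re-fetch, `--from-cache` reads only) can be sketched with the standard-library `sqlite3` module. Everything here is an assumption for illustration: the `runs` table, the `open_cache` and `get_context` helpers, the in-memory database, and the choice that `refresh` wins when both flags are set. The real cache layout lives in the package and its docs.

```python
import json
import sqlite3
from typing import Callable, Optional


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a cache with one row per site. Hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS runs (site TEXT PRIMARY KEY, payload TEXT NOT NULL)")
    return conn


def get_context(
    conn: sqlite3.Connection,
    site: str,
    fetch: Callable[[str], dict],
    refresh: bool = False,
    from_cache: bool = False,
) -> Optional[dict]:
    """Return the cached payload, or fetch and persist it. `refresh` takes precedence here."""
    if not refresh:
        row = conn.execute("SELECT payload FROM runs WHERE site = ?", (site,)).fetchone()
        if row is not None:
            return json.loads(row[0])
        if from_cache:
            return None  # cache-only read: never touch the network
    payload = fetch(site)
    conn.execute(
        "INSERT OR REPLACE INTO runs (site, payload) VALUES (?, ?)",
        (site, json.dumps(payload, sort_keys=True)),
    )
    conn.commit()
    return payload


calls: list[str] = []


def fake_fetch(site: str) -> dict:
    """Stand-in for a real fetch; records each call so cache hits are visible."""
    calls.append(site)
    return {"status": "ok", "data": {"site": site}}


conn = open_cache()
miss = get_context(conn, "acme-bakery.com", fake_fetch, from_cache=True)  # nothing cached yet
first = get_context(conn, "acme-bakery.com", fake_fetch)                  # fetches and stores
second = get_context(conn, "acme-bakery.com", fake_fetch)                 # served from cache
third = get_context(conn, "acme-bakery.com", fake_fetch, refresh=True)    # forced re-fetch
```

Because each payload is stored whole, the same table doubles as the local dataset that a later query layer could read.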
Providers (committed for v0.1)
| Slug | Layer | Category | Key | Cost |
|---|---|---|---|---|
| site_text_trafilatura | Zero-key | site_text | - | free |
| site_text_readability | Zero-key | site_text (fallback) | - | free |
| site_meta_extruct | Zero-key | site_meta | - | free |
| social_discovery_site | Zero-key | social_discovery | - | free |
| signals_site_heuristic | Zero-key | signals | - | free |
| reviews_google_places | Direct-API | reviews | GOOGLE_PLACES_API_KEY | per-1k |
| reviews_yelp_fusion | Direct-API | reviews | YELP_API_KEY | per-call |
| social_counts_youtube | Direct-API | social_counts | YOUTUBE_API_KEY | free w/ quota |
| mentions_brave_stub | Direct-API | mentions | BRAVE_SEARCH_API_KEY | per-call |
Full table + SmartProxyProvider interface in
docs/PROVIDERS.md.
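A quick way to see which direct-API providers would report `not_configured` is to check the environment variables from the table above. The sketch below is a hypothetical preflight helper, not a companyctx command. The slug-to-variable mapping is copied from the table, and the entry shape mirrors the `reviews_google_places` provenance entry in the example at the top.

```python
import os
from typing import Mapping

# Slug -> environment variable, copied from the providers table above.
DIRECT_API_KEYS = {
    "reviews_google_places": "GOOGLE_PLACES_API_KEY",
    "reviews_yelp_fusion": "YELP_API_KEY",
    "social_counts_youtube": "YOUTUBE_API_KEY",
    "mentions_brave_stub": "BRAVE_SEARCH_API_KEY",
}


def preflight(env: Mapping[str, str] = os.environ) -> dict[str, dict]:
    """Return a provenance-style entry for each direct-API provider whose key is missing."""
    missing: dict[str, dict] = {}
    for slug, var in DIRECT_API_KEYS.items():
        if not env.get(var):
            missing[slug] = {"status": "not_configured", "latency_ms": 0, "error": f"{var} not set"}
    return missing


# Example with an explicit mapping instead of the real environment.
report = preflight({"YELP_API_KEY": "example-value"})
```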
Layout
companyctx/              # package
  cli.py                 # Typer app
  schema.py              # pydantic v2 models: the JSON contract
  config.py              # pydantic-settings + TOML, XDG-compliant paths
  cache.py               # SQLite fetch cache (Vertical Memory)
  http.py                # stealth fetcher foundation
  robots.py              # robots.txt enforcement
  providers/
    __init__.py          # plugin loader (importlib.metadata.entry_points)
    base.py              # ProviderBase, ProviderError, ProviderRunMetadata
SKILL.md                 # ~150-token agent-discovery surface (not MCP)
docs/
  SPEC.md                # frozen v0.1 spec snapshot
  SCHEMA.md              # Pydantic envelope in detail
  ARCHITECTURE.md        # brains-and-muscles + Deterministic Waterfall + Vertical Memory
  ZERO-KEY.md            # honest anti-bot coverage + graceful-partial contract
  PROVIDERS.md           # provider list + SmartProxyProvider interface
  VALIDATION.md          # two-phase acceptance protocol
  REFERENCES.md          # upstream OSS deps
decisions/               # in-repo ADRs (walks-the-walk for OSS readers)
fixtures/                # per-site raw HTML + API responses + expected.json
tests/                   # pytest, hypothesis where useful
Contributing
See CONTRIBUTING.md. Conventional commits, ~400 LOC per
PR, one PR per milestone or provider. ruff + mypy strict + pytest ≥70% cov.
Architecture-shape changes go through decisions/ first.
License
MIT. Copyright 2026 Noontide Collective LLC.
Support
companyctx is open source and free to use. If it earns a place in your
pipeline, consider supporting development by joining Noontide's Main Branch
community on Skool (there's a free trial): https://skool.com/main.