companyctx

Deterministic B2B context router. Zero keys. Schema-locked JSON your agent pipelines can actually trust.

pipx install companyctx          # v0.1 coming soon — see Status below
companyctx acme-bakery.com --json

{
  "status": "ok",
  "data": {
    "site": "acme-bakery.com",
    "fetched_at": "2026-04-20T18:42:11Z",
    "pages":   { "homepage_text": "...", "services": ["cakes", "catering"], "tech_stack": ["WordPress", "Elementor"] },
    "reviews": { "count": 142, "rating": 4.6, "source": "reviews_google_places" },
    "social":  { "handles": { "instagram": "@acmebakery" }, "follower_counts": {} },
    "signals": { "copyright_year": 2024, "last_blog_post_at": "2026-02-11T00:00:00Z", "team_size_claim": "team of 6" }
  },
  "provenance": {
    "site_text_trafilatura": { "status": "ok",            "latency_ms": 412, "error": null,                       "provider_version": "0.1.0" },
    "reviews_google_places": { "status": "not_configured","latency_ms": 0,   "error": "GOOGLE_PLACES_API_KEY not set", "provider_version": "0.1.0" }
  }
}

One site in. One schema-locked JSON object out. No API keys for the zero-key path. Graceful partials on anti-bot blocks. A local SQLite cache that compounds into a queryable B2B dataset over time.
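For pipelines that want typed access instead of raw dicts, the envelope can be mirrored on the consumer side. The following is a minimal sketch using only the standard library. The `ProviderRun` and `Envelope` classes and the `parse_envelope` helper are hypothetical names, not part of companyctx (the project's own models are Pydantic v2 and live in schema.py); the field names are copied from the sample output.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ProviderRun:
    """One provenance entry (hypothetical consumer-side mirror)."""
    status: str
    latency_ms: int
    error: Optional[str]
    provider_version: str


@dataclass(frozen=True)
class Envelope:
    """Top-level envelope fields as they appear in the sample output."""
    status: str
    data: dict[str, Any]
    provenance: dict[str, ProviderRun]
    error: Optional[str] = None
    suggestion: Optional[str] = None


def parse_envelope(raw: str) -> Envelope:
    """Parse CLI stdout into an Envelope; a missing top-level key raises KeyError."""
    doc = json.loads(raw)
    provenance = {
        slug: ProviderRun(
            status=run["status"],
            latency_ms=run.get("latency_ms", 0),
            error=run.get("error"),
            provider_version=run.get("provider_version", ""),
        )
        for slug, run in doc["provenance"].items()
    }
    return Envelope(
        status=doc["status"],
        data=doc["data"],
        provenance=provenance,
        error=doc.get("error"),
        suggestion=doc.get("suggestion"),
    )


# Trimmed copy of the sample output, used here instead of a live CLI call.
SAMPLE = """
{
  "status": "ok",
  "data": {"site": "acme-bakery.com", "reviews": {"count": 142, "rating": 4.6}},
  "provenance": {
    "site_text_trafilatura": {"status": "ok", "latency_ms": 412,
                              "error": null, "provider_version": "0.1.0"},
    "reviews_google_places": {"status": "not_configured", "latency_ms": 0,
                              "error": "GOOGLE_PLACES_API_KEY not set",
                              "provider_version": "0.1.0"}
  }
}
"""

envelope = parse_envelope(SAMPLE)
# Providers that did not return "ok", regardless of the top-level status.
not_ok = [slug for slug, run in envelope.provenance.items() if run.status != "ok"]
print(envelope.status, not_ok)
```

Reading provenance per provider, as above, is how a caller can tell that a run was "ok" overall while one keyed provider was simply not configured.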

Status: v0.1 in development. Not yet on PyPI. Schema + CLI surface are committed in docs/SPEC.md and docs/SCHEMA.md; architecture in docs/ARCHITECTURE.md.

What this is (and isn't)

IS

  • A schema-locked context router. The Pydantic v2 envelope is the product. Providers come and go; the contract agents consume does not.
  • A Deterministic Waterfall. Zero-key stealth fetch first → smart-proxy provider (if configured) → direct-API provider (if configured). Every attempt returns the same shape.
  • A local-first memory layer. The SQLite cache is not just speed — it compounds into a queryable local B2B dataset as a byproduct of normal use.
  • A narrow muscle in the brains-and-muscles pattern. Your frontier model is the brain; companyctx is one of many CLIs it pipes through.

ISN'T

  • Not a scraper competing on scale. Residential-proxy / headless-browser infrastructure is a commodity layer — we compose it via a SmartProxyProvider interface, we don't out-build it.
  • Not an agent framework. Orchestration lives upstream.
  • Not a hosted service. Local pipx CLI. No rented infra. No credits.
  • Not a synthesis engine. Our output is the input for synthesis.
  • Not a Cloudflare bypass. Zero-key covers the majority of small-biz homepages. It won't defeat serious anti-bot — see the coverage matrix.
  • Not a people-data tool. Companies only. Contact enrichment belongs upstream (Apollo, Clearbit, manual).
  • Not an MCP server — ever, in our roadmap. MCP's ~50k-token schema dump defeats a muscle built to save tokens. Agents find us via SKILL.md; the CLI + jq + stdout are the composition layer. See decisions/2026-04-20-skill-md-not-mcp.md.

The Deterministic Waterfall

  Attempt 1 — Zero-key stealth       (TLS+HTTP/2 impersonation + trafilatura + extruct)
            ↓  403 / challenge / timeout?
  Attempt 2 — Smart-proxy provider   (user-keyed, vendor-agnostic)
            ↓  still blocked?
  Attempt 3 — Direct-API provider    (user-keyed — Google Places, Yelp Fusion, YouTube)
            ↓
  { status, data, provenance, error?, suggestion? }

Every attempt maps to the same Pydantic schema. Downstream pipelines never branch on which attempt succeeded — they branch on the envelope's status: ok | partial | degraded.
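The waterfall can be sketched as a loop over attempts that all return the same envelope shape. This is an illustration, not companyctx's implementation: `run_waterfall`, the `Attempt` alias, the three stub functions, and the `no_attempt_ran` error string are hypothetical, and the rule "stop at the first attempt whose status is ok" is an assumption.

```python
from typing import Callable, Optional

# An attempt takes a site and returns an envelope dict, or None when it is not configured.
Attempt = Callable[[str], Optional[dict]]


def run_waterfall(site: str, attempts: list[Attempt]) -> dict:
    """Try each attempt in order; return the first "ok" envelope, else the last one seen."""
    last: dict = {
        "status": "partial",
        "data": {"site": site},
        "provenance": {},
        "error": "no_attempt_ran",  # hypothetical placeholder, not a documented error code
    }
    for attempt in attempts:
        envelope = attempt(site)
        if envelope is None:  # provider not configured: fall through to the next attempt
            continue
        last = envelope
        if envelope["status"] == "ok":
            break
    return last


def zero_key(site: str) -> dict:
    # Stub: pretend the stealth fetch was blocked.
    return {
        "status": "partial",
        "data": {"site": site, "pages": None},
        "provenance": {"site_text_trafilatura": {"status": "failed"}},
        "error": "blocked_by_antibot",
    }


def smart_proxy(site: str) -> Optional[dict]:
    return None  # Stub: no smart-proxy key configured.


def direct_api(site: str) -> dict:
    # Stub: a keyed provider succeeds.
    return {
        "status": "ok",
        "data": {"site": site, "reviews": {"count": 142}},
        "provenance": {"reviews_google_places": {"status": "ok"}},
    }


result = run_waterfall("example.com", [zero_key, smart_proxy, direct_api])
blocked = run_waterfall("example.com", [zero_key, smart_proxy])
print(result["status"], blocked["status"], blocked["error"])
```

The caller never inspects which attempt produced the result; it only reads `status` on whatever comes back.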

On full block with no Attempt-2/3 providers configured:

{
  "status": "partial",
  "data": { "site": "example.com", "fetched_at": "...", "pages": null, "reviews": null, ... },
  "provenance": { "site_text_trafilatura": { "status": "failed", "error": "blocked_by_antibot (HTTP 403)", ... } },
  "error": "blocked_by_antibot",
  "suggestion": "configure a smart-proxy provider key or skip this prospect"
}

Never raises. Never crashes your pipeline. Every run comes back well-formed.
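One way to get this never-raises behavior is to wrap every provider call so that exceptions become data. A minimal sketch under assumptions: `run_provider_safely` and `blocked_fetch` are hypothetical names, and the metadata keys mirror the provenance entries in the samples rather than the real ProviderRunMetadata model.

```python
import time
from typing import Any, Callable


def run_provider_safely(
    fetch: Callable[[str], Any],
    site: str,
    provider_version: str = "0.1.0",
) -> tuple[Any, dict]:
    """Call fetch(site); return (payload, metadata) without propagating exceptions."""
    started = time.perf_counter()
    payload, status, error = None, "ok", None
    try:
        payload = fetch(site)
    except Exception as exc:  # deliberately broad: a failure becomes data, not a crash
        status, error = "failed", f"{type(exc).__name__}: {exc}"
    latency_ms = int((time.perf_counter() - started) * 1000)
    metadata = {
        "status": status,
        "latency_ms": latency_ms,
        "error": error,
        "provider_version": provider_version,
    }
    return payload, metadata


def blocked_fetch(site: str) -> str:
    # Stub provider that always fails, standing in for an anti-bot block.
    raise PermissionError("blocked_by_antibot (HTTP 403)")


payload, meta = run_provider_safely(blocked_fetch, "example.com")
print(payload, meta["status"], meta["error"])
```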

See docs/ARCHITECTURE.md for the full picture and docs/ZERO-KEY.md for honest anti-bot scoping.

Zero-key coverage — honestly

Site class | Zero-key outcome
-----------|-----------------
Small-biz WordPress / Squarespace / Wix / Webflow / agency custom | Full payload. Expected status: "ok" on the majority of prospects in this segment.*
Cloudflare Turnstile / DataDome / Akamai / PerimeterX | Often blocked. Returns status: "partial" with actionable suggestion.
JS-heavy SPAs needing a real browser | HTML shell only. Render-dependent fields come back null. Configure a smart-proxy provider to fill the gap.
Aggregator pages (Yelp / Houzz / G2 / Birdeye) | Not the target; use the direct-API providers (reviews_google_places, reviews_yelp_fusion) instead.

* Exact coverage number lands in docs/ZERO-KEY.md after the M1 stealth-fetcher spike against the 30-prospect fixtures corpus. Numbers come from measurement, not marketing.

Brains-and-muscles pipe

companyctx acme-bakery.com --json \
  | jq '.data | {site, signals, reviews}' \
  | claude -p "write a 6-section outreach brief from this context"

The same pipe from Python:

import json
import subprocess

# Run the CLI and parse the envelope it prints to stdout.
ctx = json.loads(subprocess.check_output(["companyctx", "acme-bakery.com", "--json"]))
if ctx["status"] == "partial":
    print(f"heads up: {ctx['error']}: {ctx['suggestion']}")
brief = synthesize(ctx["data"])   # your synthesis call, your prompts, your weights

companyctx never calls an LLM. The brain upstream decides what the context means.

Install (during v0.1 dev)

git clone https://github.com/dmthepm/companyctx.git
cd companyctx
pip install -e ".[dev,extract,reviews,youtube]"
companyctx --help

Once v0.1.0 ships:

pipx install companyctx
companyctx fetch example.com --mock --json

Design invariants

  • Schema is the product. Providers are replaceable; the CompanyContext envelope is not. Raw observations only — inference lives in the downstream synthesis layer.
  • Graceful-partial always. Providers never raise uncaught. Every failure maps to ProviderRunMetadata.status per provider and the top-level status on the envelope.
  • Vertical Memory. Every run persists the full normalized payload to SQLite under XDG paths. --refresh forces a re-fetch; --from-cache is a cache-only read. A companyctx query ... DSL on the cache is v0.2 scope, not v0.1.
  • Provider pluggability. Every deterministic call class is discovered via Python entry points (companyctx.providers). Day-one providers include bus-factor fallbacks (trafilatura + readability-lxml both wired for site text). See docs/PROVIDERS.md.
  • robots.txt respected by default. --ignore-robots is an explicit CLI-only flag; never settable via TOML or env.
  • Deterministic mocks. fixtures/<site>/ drives --mock; re-runs produce byte-identical output modulo fetched_at.

Providers (committed for v0.1)

Slug                   | Layer      | Category             | Key                  | Cost
-----------------------|------------|----------------------|----------------------|-------------
site_text_trafilatura  | Zero-key   | site_text            | none                 | free
site_text_readability  | Zero-key   | site_text (fallback) | none                 | free
site_meta_extruct      | Zero-key   | site_meta            | none                 | free
social_discovery_site  | Zero-key   | social_discovery     | none                 | free
signals_site_heuristic | Zero-key   | signals              | none                 | free
reviews_google_places  | Direct-API | reviews              | GOOGLE_PLACES_API_KEY | per-1k
reviews_yelp_fusion    | Direct-API | reviews              | YELP_API_KEY         | per-call
social_counts_youtube  | Direct-API | social_counts        | YOUTUBE_API_KEY      | free w/ quota
mentions_brave_stub    | Direct-API | mentions             | BRAVE_SEARCH_API_KEY | per-call

Full table + SmartProxyProvider interface in docs/PROVIDERS.md.

Layout

companyctx/            # package
  cli.py               # Typer app
  schema.py            # pydantic v2 models — the JSON contract
  config.py            # pydantic-settings + TOML, XDG-compliant paths
  cache.py             # SQLite fetch cache (Vertical Memory)
  http.py              # stealth fetcher foundation
  robots.py            # robots.txt enforcement
  providers/
    __init__.py        # plugin loader (importlib.metadata.entry_points)
    base.py            # ProviderBase, ProviderError, ProviderRunMetadata
SKILL.md               # ~150-token agent-discovery surface (not MCP)
docs/
  SPEC.md              # frozen v0.1 spec snapshot
  SCHEMA.md            # Pydantic envelope in detail
  ARCHITECTURE.md      # brains-and-muscles + Deterministic Waterfall + Vertical Memory
  ZERO-KEY.md          # honest anti-bot coverage + graceful-partial contract
  PROVIDERS.md         # provider list + SmartProxyProvider interface
  VALIDATION.md        # two-phase acceptance protocol
  REFERENCES.md        # upstream OSS deps
decisions/             # in-repo ADRs (walks-the-walk for OSS readers)
fixtures/              # per-site raw HTML + API responses + expected.json
tests/                 # pytest, hypothesis where useful

Contributing

See CONTRIBUTING.md. Conventional commits, ~400 LOC per PR, one PR per milestone or provider. ruff + mypy strict + pytest ≥70% cov. Architecture-shape changes go through decisions/ first.

License

MIT. Copyright 2026 Noontide Collective LLC.

Support

companyctx is open source and free to use. If it earns a place in your pipeline, consider supporting development by joining Noontide's Main Branch community on Skool (there's a free trial): https://skool.com/main.
