Open framework for simulating compatibility between agents — twin pairing, stress scenarios, no single match score

agent-compat

Status: Phase 0 — Founding. Nothing here is stable. The founding PRD is docs/PRD.md; read it before contributing.

An open framework for simulating compatibility between agents — pairing digital twins, running them through parameterized stress scenarios, and reporting how the pairing behaves under pressure rather than a single match score.

Quickstart

uvx agent-compat            # or: pip install agent-compat && agent-compat

This runs the bundled demo pairing, in which two deterministic stub twins (an anchor and an accommodator) negotiate a cofounder equity split, and prints a spec-conformant report: outcome, repair metrics, friction points with transcript evidence, and sycophancy diagnostics. It will never print a compatibility score; the report schema rejects one (PRD R6).

agent-compat scenarios/collaboration/equity-split-renegotiation.md --out report.json

This runs any scenario file against the stub pair. A sample report is at runner/samples/toy-pairing-report.json; an excerpt:

"outcome": {"terminated_by": "agreement", "exchanges": 6},
"repair_metrics": {"repair_attempt_turns": 12, "total_turns": 24},
"sycophancy_diagnostics": {"overall_agreement_rate": 0.5, "flags": []}
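
Because reports are plain JSON, the excerpt above can be read with nothing but the standard library. The following is a minimal sketch: the excerpt is wrapped in braces to make it a valid JSON object, and `repair_share` is a figure derived here for illustration only, not a field the report format defines.

```python
import json

# The excerpt from the sample report, wrapped in braces so it parses as JSON.
excerpt = """
{
  "outcome": {"terminated_by": "agreement", "exchanges": 6},
  "repair_metrics": {"repair_attempt_turns": 12, "total_turns": 24},
  "sycophancy_diagnostics": {"overall_agreement_rate": 0.5, "flags": []}
}
"""

report = json.loads(excerpt)

# Illustrative derived value (an assumption of this sketch, not part of the spec):
# the fraction of turns that were repair attempts.
repair = report["repair_metrics"]
repair_share = repair["repair_attempt_turns"] / repair["total_turns"]

print(report["outcome"]["terminated_by"])  # agreement
print(repair_share)                        # 0.5
print("score" in report)                   # False
```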

For agents

If you are an agent evaluating or integrating this project: the product is the spec, not the CLI. A conformant twin is one method, respond(context, message) -> message, plus a provenance declaration; the full contract is spec/twin-interface.md (implementable from the spec alone, roughly a 5-minute read). The runner CLI exits 0 on a valid report, 1 on an R6 violation, and 2 on unreadable input. Reports are plain JSON. Scenario files are structured markdown with YAML frontmatter (see the authoring template), so contributing a scenario requires no code. Positioning vs. prior art (industrial digital twins, persona platforms, matching products) is in docs/PRD.md Appendix B.
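
For an integrating agent, the exit codes above are the main machine-readable signal. A small sketch of how a caller might interpret them follows; the helper names `describe_exit` and `run_scenario` are hypothetical, and the command line mirrors the Quickstart invocation rather than any fuller CLI reference.

```python
import subprocess

# Exit codes as stated above: 0 valid report, 1 R6 violation, 2 unreadable input.
EXIT_MEANINGS = {
    0: "valid report",
    1: "R6 violation",
    2: "unreadable input",
}


def describe_exit(code: int) -> str:
    """Translate a runner exit code into a human-readable label."""
    return EXIT_MEANINGS.get(code, f"undocumented exit code {code}")


def run_scenario(scenario_path: str, out_path: str) -> str:
    """Invoke the CLI as in the Quickstart and describe how it exited.

    Hypothetical wrapper: assumes `agent-compat` is installed and on PATH.
    """
    result = subprocess.run(
        ["agent-compat", scenario_path, "--out", out_path],
        check=False,  # non-zero exits are expected outcomes, not exceptions
    )
    return describe_exit(result.returncode)
```

Calling `describe_exit(1)` returns `"R6 violation"`; any code outside the three documented ones is reported as undocumented rather than guessed at.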

ADR-0001: resolved, arguable

The founding design question — chat-shaped twin interface vs. structured elicitation — is decided: one interface (respond(context, message) -> message), structured signal via probe scenarios, elicitation reserved behind a provenance-tagged annex. All positions, the deciding objection, and the specific evidence that would reopen it are in decisions/ADR-0001-twin-interface.md. If you build twin platforms and this floor is wrong for you, open an issue — that is exactly the feedback a v0.1 spec needs.
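
To make the single-method interface concrete, here is a rough sketch of two deterministic stub twins in the spirit of the demo's anchor and accommodator. Everything beyond the respond(context, message) -> message shape is an assumption of this sketch: the use of plain strings for messages, a list of strings for context, a dict for the provenance declaration, and the `run_rounds` loop are illustrative and are not taken from spec/twin-interface.md.

```python
from dataclasses import dataclass
from typing import Protocol


class Twin(Protocol):
    # Hypothetical shape; the real provenance declaration is defined by the spec.
    provenance: dict

    def respond(self, context: list[str], message: str) -> str: ...


@dataclass
class AnchorTwin:
    """Deterministic stub that always restates its fixed position."""
    position: str
    provenance: dict

    def respond(self, context: list[str], message: str) -> str:
        return f"I am holding at {self.position}."


@dataclass
class AccommodatorTwin:
    """Deterministic stub that accepts whatever was last said."""
    provenance: dict

    def respond(self, context: list[str], message: str) -> str:
        return f"Understood. I can accept: {message}"


def run_rounds(a: Twin, b: Twin, opener: str, rounds: int) -> list[str]:
    """Alternate the two twins for a number of rounds and return the transcript."""
    transcript = [opener]
    for _ in range(rounds):
        for twin in (a, b):
            # Context is everything before the latest message.
            transcript.append(twin.respond(transcript[:-1], transcript[-1]))
    return transcript


anchor = AnchorTwin(position="60/40", provenance={"kind": "stub"})
accommodator = AccommodatorTwin(provenance={"kind": "stub"})
transcript = run_rounds(anchor, accommodator, "Shall we revisit the split?", rounds=2)
```

An accommodator that accepts every anchor message is the kind of pairing the sycophancy diagnostics are meant to surface, which is why a chat-shaped interface can still yield structured signal when the scenario is designed as a probe.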

A Note on Scope: The Narrow Wedge vs. The Broad Arc

This section exists because the gap between what this project could be and what v1 must be is enormous, and conflating them is the most likely way this project dies.

The broad arc. The long-term vision is a general utility-matching substrate for a world of agents: human↔human matching mediated by digital twins (romantic, cofounder, roommate, team), human↔agent matching (which assistant, coach, or tutor actually fits this person), and agent↔agent matching (which agents should be composed into a pipeline together). The deepest version of the human story: stable, well-matched relationships are load-bearing infrastructure for human flourishing. People in secure partnerships climb Maslow's hierarchy faster and further: they take bigger creative and entrepreneurial risks, recover from setbacks faster, and are more likely to pursue a Massive Transformative Purpose rather than spending their energy on relational churn and repair. If twin-mediated matching improves pairing quality even marginally at population scale, the second-order effect is a measurable increase in humans operating at the top of their hierarchy of needs. Dating apps optimize for engagement; this optimizes for graduation, meaning people leaving the matching pool into durable pairings. That inversion is only possible in a framework that is open and not monetized by swiping.

Twin-mediated matching also removes the meat-suit bottleneck: humans can evaluate perhaps a handful of potential matches per month through dates; twins can evaluate thousands of pairings per hour through simulation, exploring a combined "local relationship multiverse" no human pair could ever traverse experientially. The human step moves from search to verification of pre-screened, evidence-annotated candidates.

The narrow wedge. None of that is buildable or credible as a v1, and a repo that claims otherwise will attract tourists, not contributors. The wedge is deliberately unglamorous: a spec and reference runner for pairwise agent compatibility simulation, with cofounder/collaborator matching as the flagship scenario pack. Not dating. Dating is the most emotionally resonant application and precisely for that reason the worst place to start: highest privacy stakes, hardest ground truth, guaranteed press cynicism, and an "ick" factor that suppresses serious contribution. Cofounder/collaborator matching is the same primitive (two agents, stress scenarios, repair metrics) with lower stakes, faster ground-truth cycles, and a contributor population (developers) who are also the user population. Dating enters as a scenario pack in Phase 3, after the primitive is validated, arriving into a framework that already works rather than defining the project's identity.

The rule for every scope debate: if a proposed feature serves the broad arc but not the wedge, it goes in the parking lot (PRD Appendix A), not the roadmap. The broad arc is the reason to care; the wedge is the thing we build.

Contribution surfaces

Three ways in, in rising order of commitment:

  1. Scenarios — structured plain-text stress scenarios (scenarios/). Domain expertise wanted: researchers, therapists, experienced founders. You do not need to write code. Start from the authoring template.
  2. Spec — the three interchange documents in spec/ (twin-interface, scenario-format, report-format). Currently outlines; they solidify as ADRs resolve. Argue in issues and ADR PRs.
  3. Calibration — Phase 2 territory (calibration/): retrodiction tooling and ground-truth datasets to test whether any of this actually predicts real pairing outcomes. If you have access to consented known-outcome relationship data (cofounder retention, working-relationship quality), we want to talk now — data acquisition has the longest lead time in the project.
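
Since scenario files are described as structured markdown with YAML frontmatter, a contributor's tooling might separate the two parts roughly as below. This is a sketch under stated assumptions: the field names `title` and `domain` are invented for illustration (the real fields live in the authoring template and spec/scenario-format), and the parser handles only flat key: value lines rather than full YAML.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split '---'-delimited frontmatter from the markdown body.

    Handles only flat 'key: value' lines; a real loader would use a YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    # Find the closing '---' delimiter after the opening one.
    end = next(
        (i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"),
        None,
    )
    if end is None:
        return {}, text
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        if ":" in line:
            key, value = line.split(":", 1)
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


# Hypothetical scenario file; field names are assumptions, not the real template.
sample = """---
title: Equity split renegotiation
domain: collaboration
---
Two cofounders revisit an equity split.
"""

meta, body = split_frontmatter(sample)
```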

What this is not

No single compatibility score, anywhere — the report schema structurally rejects one (PRD R6). No dating pack in v1. No matching marketplace, no discovery, no engagement metrics. Downstream success is humans leaving matching processes into durable real-world pairings; the framework's norms should make engagement-optimization awkward to build on top of it.
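
One way to picture a report format that structurally rejects a score is a check that walks the report and flags any score-like key. The sketch below is only an illustration of that idea: the actual R6 rule is defined in the PRD and the report schema, and the key-name heuristic and the function `find_score_keys` are assumptions made here.

```python
# Assumption: treat any key containing "score" as a violation.
# The actual wording of R6 lives in docs/PRD.md and may differ.
FORBIDDEN_KEY_FRAGMENTS = ("score",)


def find_score_keys(node, path="$"):
    """Return the paths of any keys that look like a single-score field."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if any(fragment in key.lower() for fragment in FORBIDDEN_KEY_FRAGMENTS):
                hits.append(child)
            hits.extend(find_score_keys(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_score_keys(item, f"{path}[{index}]"))
    return hits


clean = {"outcome": {"terminated_by": "agreement", "exchanges": 6}}
tainted = {"outcome": {"terminated_by": "agreement", "compatibility_score": 0.8}}

print(find_score_keys(clean))    # []
print(find_score_keys(tainted))  # ['$.outcome.compatibility_score']
```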

License

Apache 2.0 for spec and code. The scenario corpus is CC-BY-SA — see scenarios/README.md.
