Keep a sprawling repo telling one story: deterministic codename-leak lint + semantic retrieval over a repo's prose.

Concord

Keep a sprawling repo telling one story.

Concord indexes the prose in a repository — docs, marketing copy, specs, READMEs — and lets you ask it three kinds of question:

Lint — "does any internal codename / retired term / banned phrase appear in a file that ships publicly?" Deterministic, exact-match, recall‑complete on a known list. Runs in CI or a pre-commit hook.
Find — "where else do we say something like this?" Exact and semantic matches in one ranked result, so it catches paraphrases a grep would miss.
Read — "summarise everything we've said about X, and flag where it contradicts itself." Retrieval-first, so only the relevant passages are pulled into context instead of whole files.
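
The Lint question can be pictured as a deterministic scan over files marked public. What follows is a minimal sketch under stated assumptions: the names `Leak` and `find_leaks` are hypothetical, the banned list is passed in directly (Concord's real rule format is not shown here), and whole-word, case-insensitive matching is one possible reading of "exact-match".

```python
import re
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Leak:
    path: str   # file the term appeared in
    line: int   # 1-based line number
    term: str   # the text that matched


def find_leaks(files: dict[str, str], banned: Iterable[str]) -> list[Leak]:
    """Return every whole-word occurrence of a banned term, cited as path:line.

    `files` maps a path to its text; deciding which files count as
    public is left to the caller (an assumption of this sketch).
    """
    # Longest terms first, so "Bluebird Pro" wins over "Bluebird".
    terms = sorted(set(banned), key=len, reverse=True)
    if not terms:
        return []
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(t) for t in terms) + r")(?!\w)",
        re.IGNORECASE,
    )
    leaks = []
    for path in sorted(files):  # stable order, so output is reproducible
        for lineno, text in enumerate(files[path].splitlines(), start=1):
            for match in pattern.finditer(text):
                leaks.append(Leak(path, lineno, match.group(1)))
    return leaks
```

Because the scan is a fixed regex over a fixed list, the same input always yields the same hits, which is what makes it safe to gate a build on.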

Concord is computed, not generated. The lint is regex. The ranking is geometry (cosine + an elbow cutoff). The only place a language model enters is the optional final synthesis of retrieved passages — and even that step is handed only the passages Concord selected, which is where the token savings come from.
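
To make "cosine + an elbow cutoff" concrete, here is a rough sketch in plain Python. It assumes the elbow is taken at the largest drop between neighbouring scores; Concord's actual cutoff rule may differ, and the function names `cosine`, `elbow_cutoff`, and `rank` are illustrative only.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def elbow_cutoff(scores: list[float]) -> int:
    """Given scores sorted high to low, return how many to keep.

    Assumption: the cut falls at the single largest gap between neighbours.
    """
    if len(scores) < 2:
        return len(scores)
    gaps = [scores[i] - scores[i + 1] for i in range(len(scores) - 1)]
    return gaps.index(max(gaps)) + 1


def rank(query: list[float], passages: dict[str, list[float]]) -> list[str]:
    """Rank passage ids by cosine similarity and keep those above the elbow."""
    scored = sorted(
        ((cosine(query, vec), pid) for pid, vec in passages.items()),
        reverse=True,
    )
    keep = elbow_cutoff([score for score, _ in scored])
    return [pid for _, pid in scored[:keep]]
```

With a query of `[1.0, 0.0]` and four passages, two pointing nearly the same way and two nearly orthogonal, the largest gap sits between the second and third score, so only the first two survive.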

Why it exists

Two failure modes plague any repo where strategy, internal notes, and public-facing copy live side by side:

Leaks — an internal codename or a retired product name slips into a published page.
Drift — the same fact (a price, a policy, a product name) is stated three different ways across three files, and nobody notices.

A plain grep catches neither paraphrases nor contradictions. A vector search alone is fuzzy and misses exact strings. Concord runs both signals together.

Token efficiency

Concord earns its keep on the synthesis step: it hands a model only the passages that matter, not the whole repo. Measured on this project's own documentation (a 14,551-passage corpus), answering "find contradictory pricing information" (token counts are a chars/4 estimate):

Approach	Tokens into context	Gives you the conflicting sentences?
Read the whole directory	~1,800,000	Yes — but it won't fit most context windows, and you pay for all of it on every query.
graphify (concept graph)	~1,600	No — returns concept nodes + file pointers, zero verbatim prices. Tells you what relates to pricing, not where the numbers disagree; you still have to open the files.
Concord (passage retrieval)	~190	Yes — the actual price statements, cited to `file:line`.
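
The chars/4 estimate behind these figures is simple enough to reproduce. A small sketch, with hypothetical function names, assuming the estimate is applied per passage and summed:

```python
def estimate_tokens(text: str) -> int:
    """Rough token count: one token per four characters, rounded down."""
    return len(text) // 4


def context_cost(passages: list[str]) -> int:
    """Estimated tokens for a set of retrieved passages."""
    return sum(estimate_tokens(p) for p in passages)
```

Under this rule, roughly 760 characters of retrieved text corresponds to the ~190 tokens in the table (760 / 4 = 190).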

graphify and Concord are complementary, not competitors: graphify maps how concepts connect; Concord retrieves the verbatim prose where a claim lives and where it conflicts. For "show me the contradictory pricing," you need the passages — which is why graphify alone isn't enough.

Honest caveat — completeness queries. These numbers are for targeted questions. For "find all X" sweeps (e.g. "every GDPR commitment"), a small top-k with an aggressive cutoff under-retrieves: it can return four near-identical clauses and miss the scattered rest. That's a recall-vs-tokens trade, and it's exactly where a topic/cluster index helps (see Roadmap). Concord prints what it retrieved so the gap is visible, never hidden.

Updating: only what changed

The index records the commit it was built at (.concord/meta.json) and a content-hash manifest (.concord/manifest.json). concord update re-embeds only the diff:

In a git repo: asks git what changed since the indexed commit (or just HEAD~1..HEAD with --last-commit, for a post-commit hook).
Outside git (--no-git, or a non-git folder): diffs the content-hash manifest, so a real edit re-embeds and a bare touch does not.

Either way, cost scales with the diff, not the corpus.
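
The non-git path can be sketched as a comparison of content hashes. This is an illustration, not Concord's code: SHA-256 and the flat `{path: hash}` manifest layout are assumptions, and `content_hash` and `diff_manifest` are hypothetical names.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hash file contents only, so a changed mtime with the same bytes is a no-op."""
    return hashlib.sha256(data).hexdigest()


def diff_manifest(
    old: dict[str, str], current: dict[str, bytes]
) -> tuple[list[str], list[str], dict[str, str]]:
    """Compare a stored manifest against current file contents.

    Returns (paths to re-embed, paths to drop from the index, new manifest).
    """
    new = {path: content_hash(data) for path, data in current.items()}
    changed = sorted(p for p, digest in new.items() if old.get(p) != digest)
    removed = sorted(p for p in old if p not in new)
    return changed, removed, new
```

Only the paths in the first list need re-embedding, which is why the work tracks the size of the edit and not the size of the corpus.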

In CI — the leak guard + badge

Fail the build if a codename reaches a public file, and stamp a badge on your README:

# .github/workflows/leak-guard.yml
on: [push, pull_request]
jobs:
  leak-guard:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.12" }
      - uses: linnetlabs/concord@v1     # the reusable action
        with: { scope: public }

concord badge .    # -> ![Concord](https://img.shields.io/badge/concord-0%20leaks-brightgreen)
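
For readers curious how such a badge line is assembled, here is a sketch that builds the same Markdown from a leak count using the shields.io static-badge URL pattern. The function name `badge_markdown` and the choice of `red` for a non-zero count are assumptions; only the zero-leak output is taken from the example above.

```python
from urllib.parse import quote


def _escape(segment: str) -> str:
    """Escape one badge segment: shields.io treats '-' as a separator
    and '_' as a space, so literal ones are doubled before URL-encoding."""
    return quote(segment.replace("-", "--").replace("_", "__"), safe="")


def badge_markdown(leaks: int) -> str:
    """Return a Markdown image link for a static shields.io badge."""
    message = f"{leaks} leak" + ("" if leaks == 1 else "s")
    color = "brightgreen" if leaks == 0 else "red"  # "red" is an assumption
    url = f"https://img.shields.io/badge/{_escape('concord')}-{_escape(message)}-{color}"
    return f"![Concord]({url})"
```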

Find drift across history

concord radar .                 # value-conflict candidates (same topic, different number)
concord drift '$49'             # which commits changed this value (git pickaxe); single quotes keep the shell from expanding $4
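
"Git pickaxe" refers to `git log -S<string>`, which lists commits that changed the number of occurrences of a string. A minimal sketch of driving it from Python follows; the helper names are hypothetical and this is not necessarily how `concord drift` invokes git.

```python
import subprocess


def pickaxe_command(value: str) -> list[str]:
    """Build the argv for a pickaxe search.

    Passing an argv list (no shell) means a value like "$49" reaches git
    unchanged, with no shell expansion to worry about.
    """
    return ["git", "log", "--oneline", f"-S{value}"]


def commits_changing(value: str, repo: str = ".") -> list[str]:
    """Return one 'hash subject' line per commit that added or removed `value`."""
    result = subprocess.run(
        pickaxe_command(value),
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,  # raise if git fails, e.g. when `repo` is not a git repository
    )
    return result.stdout.splitlines()
```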

The driver model

Concord's core is a set of deterministic primitives. Who drives the loop is pluggable:

Driver	Surface	Relevance judge
Human	`concordai` (Python CLI), live explorer (`concord ui`)	geometry, or your eyes
Agent	Claude skill / MCP server	the model

Same engine underneath. A human sits in the seat an agent would otherwise occupy.

Install

pip install concord-ai              # lint + exact find (no ML dependencies)
pip install "concord-ai[embeddings]"  # + sentiment.ai embedder for semantic find / read

Embeddings come from sentiment.ai, Concord's sibling package, so Concord inherits a local, auditable, provenance-tracked embedder (e5 on-device by default) rather than calling a hosted API. sentiment.ai is the only embedding backend: Concord never silently swaps in a different model, because scores from a different model would look the same in the output yet would not be comparable with those from the existing index.

Quickstart

concord init   .                           # scaffold rules.yaml + gitignore it and .concord/
concord lint   .                           # fail CI if a banned term reaches a public file
concord index  .                           # build the semantic index (self-ignored)
concord find   "founding-free pricing"     # exact + semantic hits, cited to file:line
concord read   "what have we said about pricing?"   # retrieve the relevant passages
concord radar  . --verify                  # find contradictions; --verify lets an LLM confirm + name the canonical value
concord resolve .                          # walk confirmed contradictions and apply the fix (interactive; --apply = auto)
concord report . --out report.html         # shareable consistency report (lint + radar)
concord drift  '$49'                       # which commits changed a value (git pickaxe); single quotes stop shell expansion
concord topics .                           # annotated topic map (browse; --samples to name them)
concord ui     .                           # premium live explorer in your browser (search · topics · radar)

AI is optional — and it's your key

Everything core is free and deterministic: lint, find, index, topics, radar candidates, report. The optional LLM steps — radar --verify, resolve, and naming topics in the explorer — call your own API key (you pay for usage), and the tool is explicit about it everywhere (a status pill, cost tooltips, CLI notes).

Set any of ANTHROPIC_API_KEY (preferred — the better judge), OPENAI_API_KEY, DEEPSEEK_API_KEY, GROQ_API_KEY, MISTRAL_API_KEY, OPENROUTER_API_KEY, GEMINI_API_KEY. The explorer's ⚙ picks among the keys you actually have.
CONCORD_NO_LLM=1 turns AI off entirely; CONCORD_LLM=<provider> forces one.
No key? Everything except verify / resolve / AI-naming still works.
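
The rules above can be read as a small decision procedure. The sketch below is one interpretation, not Concord's implementation: the lowercase provider names accepted by `CONCORD_LLM`, the fallback order after Anthropic, and the behaviour when a forced provider has no key are all assumptions.

```python
from typing import Mapping, Optional

# Environment variable per provider, in the order the keys are listed above.
# The provider names on the left are hypothetical.
PROVIDER_KEYS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
    "groq": "GROQ_API_KEY",
    "mistral": "MISTRAL_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def pick_provider(env: Mapping[str, str]) -> Optional[str]:
    """Return the provider to use, or None when AI steps should be skipped."""
    if env.get("CONCORD_NO_LLM") == "1":
        return None  # AI switched off entirely
    forced = env.get("CONCORD_LLM")
    if forced:
        name = forced.lower()
        # Assumption: a forced provider without a key disables AI steps.
        return name if env.get(PROVIDER_KEYS.get(name, "")) else None
    for name, variable in PROVIDER_KEYS.items():
        if env.get(variable):
            return name  # first available key wins; Anthropic is checked first
    return None
```

Calling `pick_provider(os.environ)` would then tell a caller whether verify, resolve, and AI naming are available in the current shell.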

Your real ruleset stays private — enforced, not trusted. concord init copies rules.example.yaml to rules.yaml and adds rules.yaml, *.local.yaml, and .concord/ to your repo's .gitignore. The built index writes its own .concord/.gitignore too. A tool that prevents codename leaks must not leak the codenames — so it makes them uncommittable for you.
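
The "uncommittable" step amounts to an idempotent append to `.gitignore`. A minimal sketch of that idea, using the three entries named above; the function name `ensure_ignored` is hypothetical and the real `concord init` may do more.

```python
def ensure_ignored(
    existing: str,
    entries: tuple[str, ...] = ("rules.yaml", "*.local.yaml", ".concord/"),
) -> str:
    """Return .gitignore text that contains every entry, appending only what is missing."""
    lines = existing.splitlines()
    present = {line.strip() for line in lines}
    missing = [entry for entry in entries if entry not in present]
    if not missing:
        return existing  # nothing to add, so running it again changes nothing
    return "\n".join(lines + missing) + "\n"
```

Running it twice gives the same file as running it once, so re-running init would not pile up duplicate entries.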

Status

Early scaffold. lint works today (no ML required). Semantic find / read and the benchmark harness are in progress. See eval/README.md for the benchmark design (seed-efficiency, stopping-strategy, token-efficiency).

MIT licensed. A Linnet Labs project.

Version 0.1.0, released Jun 10, 2026.
