Keep a sprawling repo telling one story: deterministic codename-leak lint + semantic retrieval over a repo's prose.
Concord
Keep a sprawling repo telling one story.
Concord indexes the prose in a repository — docs, marketing copy, specs, READMEs — and lets you ask it three kinds of question:
- Lint — "does any internal codename / retired term / banned phrase appear in a file that ships publicly?" Deterministic, exact-match, recall‑complete on a known list. Runs in CI or a pre-commit hook.
- Find — "where else do we say something like this?" Exact and semantic matches in one ranked result, so it catches paraphrases a grep would miss.
- Read — "summarise everything we've said about X, and flag where it contradicts itself." Retrieval-first, so only the relevant passages are pulled into context instead of whole files.
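The Lint question above reduces to a scan against a known term list. A minimal sketch of that idea follows; it is not Concord's actual implementation. The function names, the word-boundary handling, and the case-insensitive matching are all assumptions made for illustration.

```python
import re


def compile_banned(terms):
    """Build one pattern from a known list of banned terms (hypothetical helper)."""
    # Longest first, so overlapping terms prefer the longer match.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    # Assumption: match whole words only, ignoring case.
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)


def lint_text(text, pattern, path="<memory>"):
    """Return (path, line_number, matched_text) for every banned-term hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((path, lineno, match.group(0)))
    return hits
```

Because the term list is finite and the scan is exhaustive, every occurrence of a listed term is reported, which is the sense in which such a check is recall-complete on a known list.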
Concord is computed, not generated. The lint is regex. The ranking is geometry (cosine + an elbow cutoff). The only place a language model enters is the optional final synthesis of retrieved passages — and even that step is handed only the passages Concord selected, which is where the token savings come from.
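To make "cosine + an elbow cutoff" concrete, here is a small sketch under stated assumptions: vectors are plain lists of floats, and the elbow is taken to be the largest drop between consecutive sorted scores. Concord's real cutoff rule may differ; the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def elbow_cut(scores):
    """Given scores sorted descending, return how many to keep.

    Assumption: the elbow is the position of the largest gap between neighbours.
    """
    if len(scores) < 2:
        return len(scores)
    gaps = [scores[i] - scores[i + 1] for i in range(len(scores) - 1)]
    return gaps.index(max(gaps)) + 1


def rank(query_vec, passages):
    """passages: list of (passage_id, vector). Returns the kept (id, score) pairs."""
    scored = sorted(
        ((pid, cosine(query_vec, vec)) for pid, vec in passages),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return scored[: elbow_cut([score for _, score in scored])]
```

No model call is involved at this stage: the same inputs always produce the same ranked list.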
Why it exists
Two failure modes plague any repo where strategy, internal notes, and public-facing copy live side by side:
- Leaks — an internal codename or a retired product name slips into a published page.
- Drift — the same fact (a price, a policy, a product name) is stated three different ways across three files, and nobody notices.
A plain grep catches neither paraphrases nor contradictions. A vector search alone
is fuzzy and misses exact strings. Concord runs both signals together.
Token efficiency
Concord earns its keep on the synthesis step: it hands a model only the passages that matter, not the whole repo. Measured on this project's own documentation (a 14,551-passage corpus), answering "find contradictory pricing information" (token counts are a chars/4 estimate):
| Approach | Tokens into context | Gives you the conflicting sentences? |
|---|---|---|
| Read the whole directory | ~1,800,000 | Yes — but it won't fit most context windows, and you pay for all of it on every query. |
| graphify (concept graph) | ~1,600 | No — returns concept nodes + file pointers, zero verbatim prices. Tells you what relates to pricing, not where the numbers disagree; you still have to open the files. |
| Concord (passage retrieval) | ~190 | Yes — the actual price statements, cited to file:line. |
graphify and Concord are complementary, not competitors: graphify maps how concepts connect; Concord retrieves the verbatim prose where a claim lives and where it conflicts. For "show me the contradictory pricing," you need the passages — which is why graphify alone isn't enough.
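The chars/4 estimate behind the token counts in the table can be sketched in a few lines. Rounding up is an assumption, and the helper names are hypothetical; a real tokenizer would give somewhat different numbers.

```python
def estimate_tokens(text):
    """Rough token count: about one token per four characters, rounded up."""
    return (len(text) + 3) // 4


def context_cost(passages):
    """Estimated tokens for a set of passages handed to a model."""
    return sum(estimate_tokens(passage) for passage in passages)
```

Comparing `context_cost` over the retrieved passages with the same estimate over every file in the directory gives the kind of ratio the table reports.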
Honest caveat — completeness queries. These numbers are for targeted questions. For "find all X" sweeps (e.g. "every GDPR commitment"), a small top-k with an aggressive cutoff under-retrieves: it can return four near-identical clauses and miss the scattered rest. That's a recall-vs-tokens trade, and it's exactly where a topic/cluster index helps (see Roadmap). Concord prints what it retrieved so the gap is visible, never hidden.
Updating: only what changed
The index records the commit it was built at (.concord/meta.json) and a
content-hash manifest (.concord/manifest.json). concord update re-embeds only the
diff:
- In a git repo: asks git what changed since the indexed commit (or just HEAD~1..HEAD with --last-commit, for a post-commit hook).
- Outside git (--no-git, or a non-git folder): diffs the content-hash manifest, so a real edit re-embeds and a bare touch does not.
Either way, cost scales with the diff, not the corpus.
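The non-git path can be illustrated with a content-hash manifest. This is a sketch only: the manifest layout, the SHA-256 choice, the `*.md` filter, and the function names are assumptions, not the format of .concord/manifest.json.

```python
import hashlib
from pathlib import Path


def build_manifest(root, pattern="*.md"):
    """Map each matching file's relative path to the SHA-256 of its bytes."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob(pattern))
        if path.is_file()
    }


def diff_manifests(old, new):
    """Return (changed_or_added, removed) paths between two manifests."""
    changed = sorted(p for p, digest in new.items() if old.get(p) != digest)
    removed = sorted(p for p in old if p not in new)
    return changed, removed
```

Hashing content rather than comparing modification times is what makes a bare touch a no-op: the bytes are unchanged, so the digest is too, and nothing is re-embedded.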
In CI — the leak guard + badge
Fail the build if a codename reaches a public file, and stamp a badge on your README:
# .github/workflows/leak-guard.yml
on: [push, pull_request]
jobs:
  leak-guard:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.12" }
      - uses: linnetlabs/concord@v1 # the reusable action
        with: { scope: public }
concord badge . # -> 
Find drift across history
concord radar . # value-conflict candidates (same topic, different number)
concord drift '$49' # which commits changed this value (git pickaxe); single quotes stop the shell expanding $4
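"Git pickaxe" refers to git's `-S` option, which lists commits that changed the number of occurrences of a string. A minimal sketch of how a drift lookup could wrap it follows. The wrapper functions are hypothetical and say nothing about how `concord drift` is actually implemented.

```python
import subprocess


def pickaxe_command(value):
    """argv for `git log -S<value>`: commits that added or removed `value`."""
    return ["git", "log", "--oneline", "-S" + value]


def commits_changing(value, repo="."):
    """Run the pickaxe search in `repo` and return one line per matching commit."""
    result = subprocess.run(
        pickaxe_command(value),
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Passing the value as a list element, not through a shell, means a string such as `$49` reaches git unchanged.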
The driver model
Concord's core is a set of deterministic primitives. Who drives the loop is pluggable:
| Driver | Surface | Relevance judge |
|---|---|---|
| Human | concordai (Python CLI), live explorer (concord ui) | geometry, or your eyes |
| Agent | Claude skill / MCP server | the model |
Same engine underneath. A human sits in the seat an agent would otherwise occupy.
Install
pip install concord-ai # lint + exact find (no ML dependencies)
pip install "concord-ai[embeddings]" # + sentiment.ai embedder for semantic find / read
Embeddings come from sentiment.ai — its sibling package — so Concord inherits a local, auditable, provenance-tracked embedder (e5 on-device by default) rather than calling a hosted API. sentiment.ai is the only embedding backend: Concord never silently swaps in a different model, because that would make a result look the same while being incomparable.
Quickstart
concord init . # scaffold rules.yaml + gitignore it and .concord/
concord lint . # fail CI if a banned term reaches a public file
concord index . # build the semantic index (self-ignored)
concord find "founding-free pricing" # exact + semantic hits, cited to file:line
concord read "what have we said about pricing?" # retrieve the relevant passages
concord radar . --verify # find contradictions; --verify lets an LLM confirm + name the canonical value
concord resolve . # walk confirmed contradictions and apply the fix (interactive; --apply = auto)
concord report . --out report.html # shareable consistency report (lint + radar)
concord drift '$49' # which commits changed a value (git pickaxe)
concord topics . # annotated topic map (browse; --samples to name them)
concord ui . # premium live explorer in your browser (search · topics · radar)
AI is optional — and it's your key
Everything core is free and deterministic: lint, find, index, topics, radar candidates, report.
The optional LLM steps — radar --verify, resolve, and naming topics in the explorer — call your own
API key (you pay for usage), and the tool is explicit about it everywhere (a status pill, cost tooltips,
CLI notes).
- Set any of ANTHROPIC_API_KEY (preferred: the better judge), OPENAI_API_KEY, DEEPSEEK_API_KEY, GROQ_API_KEY, MISTRAL_API_KEY, OPENROUTER_API_KEY, GEMINI_API_KEY. The explorer's ⚙ picks among the keys you actually have.
- CONCORD_NO_LLM=1 turns AI off entirely; CONCORD_LLM=<provider> forces one.
- No key? Everything except verify / resolve / AI-naming still works.
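The environment-variable rules above can be read as a small selection function. The sketch below is one plausible interpretation, not Concord's code: the lowercase provider names, the fallback order after Anthropic, and the behaviour when CONCORD_LLM names a provider without a key are all assumptions.

```python
import os

# Assumption: preference order follows the order the variables are listed in.
PROVIDER_KEYS = [
    ("anthropic", "ANTHROPIC_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
    ("deepseek", "DEEPSEEK_API_KEY"),
    ("groq", "GROQ_API_KEY"),
    ("mistral", "MISTRAL_API_KEY"),
    ("openrouter", "OPENROUTER_API_KEY"),
    ("gemini", "GEMINI_API_KEY"),
]


def pick_provider(env=None):
    """Return the provider name to use, or None when the LLM steps are unavailable."""
    env = os.environ if env is None else env
    if env.get("CONCORD_NO_LLM") == "1":
        return None  # AI switched off entirely
    available = [name for name, var in PROVIDER_KEYS if env.get(var)]
    forced = env.get("CONCORD_LLM")
    if forced:
        # Assumption: a forced provider with no key yields no provider at all.
        return forced if forced in available else None
    return available[0] if available else None
```

Returning None models the "no key" case, in which only the deterministic commands remain usable.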
Your real ruleset stays private — enforced, not trusted.
concord init copies rules.example.yaml to rules.yaml and adds rules.yaml, *.local.yaml, and .concord/ to your repo's .gitignore. The built index writes its own .concord/.gitignore too. A tool that prevents codename leaks must not leak the codenames, so it makes them uncommittable for you.
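The .gitignore step can be sketched as an idempotent append. The entry list comes from the paragraph above; the function name and the exact file handling are assumptions, and `concord init` may do this differently.

```python
from pathlib import Path

PRIVATE_ENTRIES = ["rules.yaml", "*.local.yaml", ".concord/"]


def ensure_ignored(repo_root, entries=PRIVATE_ENTRIES):
    """Append any missing entries to <repo_root>/.gitignore; return those added."""
    gitignore = Path(repo_root) / ".gitignore"
    existing = gitignore.read_text().splitlines() if gitignore.exists() else []
    present = {line.strip() for line in existing}
    missing = [entry for entry in entries if entry not in present]
    if missing:
        gitignore.write_text("\n".join(existing + missing) + "\n")
    return missing
```

Running it twice adds nothing the second time, so it is safe to call on every init.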
Status
Early scaffold. lint works today (no ML required). Semantic find / read and the
benchmark harness are in progress. See eval/README.md for the
benchmark design (seed-efficiency, stopping-strategy, token-efficiency).
MIT licensed. A Linnet Labs project.