gcontext — file-based knowledge protocol for AI agents
Project description
gcontext
Version-controlled context modules your AI agent navigates itself.
Your agent's knowledge and its working state as plain markdown in git: modular, reviewable, loaded per task. No embeddings, no memory layer, no new runtime. Works with Claude Code, Cursor, Codex, and anything else that reads files.
Modules don't just describe your stack. They carry the credentials and the know-how for the agent to act on it: query the database, hit the Stripe API, run the deploy.
And work in flight survives the session: start a migration today, get blocked waiting on an email, come back Thursday and say "continue". The agent kept track of where things stand, so you don't have to.
A real session, replayed: one prompt builds a Supabase integration, then a fresh session answers "how many users do we have?" with the live number. Unedited transcript · Do it yourself in three copy-pastes
The problem
AI agents fail for the same reason large codebases fail: implicit state.
Prompts become undocumented architecture. Instructions drift between conversations. Context duplicates across files. Every session starts with "let me remind you how our deploy works."
And the most implicit state of all is the work in flight. The migration you started Monday, the blocker you're waiting on, the decision you made last week: none of it survives the session. The agent forgets, so you keep track, and every conversation starts with a re-briefing.
This isn't a model problem. It's an engineering problem.
Most teams solve it by writing longer prompts, pasting more docs, or hoping the agent remembers. That works until it doesn't, and it stops working fast.
gcontext treats context as infrastructure.
Before and after
Without structured context:
- Giant system prompts nobody maintains
- Copy-pasted docs that go stale
- "Remember, our Stripe webhook is at..." every session
- You're the project manager for your agent: re-briefing it on where every piece of work stands, every session
- Agent hallucinates because it can't find what it needs
- Context bloat kills quality on long tasks
- Tribal knowledge lives in one person's prompt history
With gcontext:
modules-repo/
stripe/ → API keys, webhooks, how to query invoices
postgres/ → schema, migrations, connection details
deploy-pipeline/ → step-by-step release to production
migrate-to-paddle/ → task: where it stands, what's blocked, what's next
Each module is a self-contained unit of context. Load what the task needs, unload what it doesn't. The agent navigates to what's relevant. Nothing else enters the window.
Install
curl -LsSf https://gcontext.ai/gcontext/install.sh | sh
# or
uv tool install gcontext-ai # PyPI name is gcontext-ai; the command is `gcontext`
# or
pip install gcontext-ai
That's the whole footprint: no account, no server, no vector database. Your keys stay in your local .env; modules are just files in your repo.
Quick start
Three copy-pastes: one command, two lines in .env, one prompt. Your agent does the rest.
1. Initialize
gcontext init
The only command you have to run. It creates the workspace your agent operates from:
AGENTS.md # auto-loaded by your agent; points it at context/system.md
CLAUDE.md # one line: @AGENTS.md (Claude Code reads this file)
context/ # what the agent reads: generated llms.txt index + loaded modules
modules-repo/ # source of truth: your modules
It never overwrites anything: re-running it in an existing workspace errors out.
2. Put a key in .env
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SECRET_KEY=sb_secret_...
Module files only ever name the variables. The values stay in the gitignored .env. Prefer read-only credentials where the service offers them.
3. Say the prompt
Open your agent (Claude Code, Cursor, Codex) in the workspace and ask for what you have:
Create a supabase integration module for our Supabase project and load it
into the workspace. The keys are in .env as SUPABASE_URL and SUPABASE_SECRET_KEY.
The agent builds and loads the module itself: info.md with the API know-how, llms.txt as the index, module.yaml declaring the secrets by name only. The exact prompt we test against a real model before every release is in the first-integration walkthrough.
Starting from zero instead? Say Set up gcontext for this project and the agent interviews you (what do you re-explain every session? what are you working on?), then builds your first modules from the answers.
4. Ask a real question in a fresh session
How many users do we have?
A new conversation that has never seen your project follows the index to the module, calls the API with the key from .env, and answers with the live row count. That's what happens in the recorded session, where the answer is then verified against the database independently.
5. Hand it the work in flight
We're migrating auth to OAuth. Track it as a task.
The agent creates a task module (brief.md for the goal, status.md for progress and blockers) and keeps it current as it works. Days later, in a fresh conversation, "continue the oauth migration" is all the briefing it needs.
This quick start is tested. The prompt in step 3 is extracted from examples/first-integration.md and run against a real model before releases, and the full unedited transcript of one run is public: every tool call, the module files the agent wrote, the verified answer. If the quick start stops working, the release does not ship.
Prefer doing things by hand? Every step is also a deterministic command: gcontext new, gcontext load, gcontext ls, gcontext validate. See Commands. The agent uses the same files either way.
Your agent doesn't just know your stack. It operates it.
An integration module is three things: what the service is (info.md), how to navigate it (llms.txt), and which secret it needs (module.yaml, name only). That turns out to be everything an agent needs to make real calls: no MCP server, no tool definitions, no plugin to install.
$ claude → "Refund the duplicate charge for customer@acme.com"
Reads stripe/info.md → knows refunds need the charge id, not the intent
Looks up the customer, finds two charges 41s apart
Refunds ch_3Oa2... ($49.00), refund re_3Oa2... succeeded
MCP gives agents tools; for most of your stack, a markdown file plus an env var does the same job, and you can git diff it. The same module that explains your Stripe setup is the one that lets the agent act on it, and what it learns doing the work gets written back into the module. (The exchange above is illustrative; for a real one, unedited, see the recorded session.)
Walk away mid-task. Come back days later.
The other half of the problem isn't knowledge, it's state. Real work gets interrupted: you wait on an email, a review, another team. Without gcontext, you re-brief the agent from scratch every time, and you keep the real status in your head. With a task module, the agent writes down where things stand as it works.
Monday $ claude → "Start the Stripe-to-Paddle migration"
Creates modules-repo/paddle-migration/ (a task module)
Maps price ids, exports products... blocked: Paddle support
must enable the sandbox. Writes the blocker to status.md.
Thursday $ claude → "Any movement on the migration?"
Reads paddle-migration/status.md
"Blocked on Paddle support since Monday (ticket #4821).
Price mapping is done. Next: webhook rewrite, once
sandbox access lands."
No re-briefing, no scrolling back through old chats, and the status was never in your head to begin with. The task module is the state: what's done, what's blocked, what's next. When the work ships, the task is done with its job: delete it, or keep it as the record of what happened.
How it works: navigation, not retrieval
Modules are independent units of knowledge: an API integration, a deployment procedure, a bounded piece of work. Each contains plain markdown and a navigation index (llms.txt) the agent uses to find what it needs.
agent reads system.md
└─ follows the llms.txt index
├─ stripe/llms.txt → "here's how Stripe works here"
├─ postgres/llms.txt → "here's the database"
└─ deploy/llms.txt → "here's how to ship"
└─ steps.md → the actual procedure
Only the index (one line per module) is always in context. Detail enters the window when the agent follows a link, fresh at the moment the task needs it. Unloaded modules cost zero tokens.
The mechanics are deliberately boring: gcontext load postgres creates a symlink context/postgres → modules-repo/postgres and regenerates the one-line-per-module index in context/llms.txt. unload removes the link. No copies, no database, no daemon: modules-repo/ is the only place content lives, and you can inspect every byte the agent could read by opening a folder.
You won't need unload for a while: with fewer than ~10 modules, keep everything loaded; the resident index costs about one line per module. Load/unload starts paying off when modules outnumber what fits comfortably.
What's actually on disk after init: the seeded example module's llms.txt:
# example
> A sample module demonstrating the context module structure
- [info.md](info.md): What this module contains and how to use it
- [module.yaml](module.yaml): Module configuration
That's the whole trick: an index file per module, one root index over the loaded set, and an AGENTS.md that tells the agent to start there.
The agent walks a knowledge tree; it doesn't search a vector space. You can't force a model to read anything, but you can make the relevant file one link away instead of buried at token 50,000, and you can see exactly what was available to it. You version-control your code and your docs. This is version control for what your agent knows.
Module kinds
Two kinds carry a workspace: what your agent knows, and what it's working on.
| Kind | What it captures | Lifecycle | Example |
|---|---|---|---|
| Integration | How to use an external service, API, or database | Permanent: lives as long as the service does | Stripe, Postgres, GitHub, Slack |
| Task | A bounded piece of work and where it stands | Disposable: done when the work is done | Fix billing bug, migrate auth, ship feature X |
Different shapes for different lifecycles. They coexist and compose. You never have to pick one: the agent chooses a kind when it creates a module; this table is for reading what it made.
What a workspace looks like
Software engineering team
modules-repo/
postgres/ → schema, connection, query patterns
github/ → repo structure, PR conventions, CI
deploy-pipeline/ → release steps, rollback procedures
fix-billing-bug/ → task: reproduce, investigate, fix, verify
The agent reads the database schema, understands CI, follows the deploy playbook, and tracks progress on the billing fix, all from structured context.
Support automation
modules-repo/
zendesk/ → API access, ticket categories, macros
stripe/ → subscription lookup, refund procedures
knowledge-base/ → product docs, known issues, FAQ
escalation/ → when and how to escalate
An agent triaging tickets reads the Zendesk integration, checks Stripe for billing context, references the KB, and follows escalation rules, without a 10,000-token system prompt.
Claude Code / Cursor setup
modules-repo/
codebase/ → architecture, conventions, key paths
cloudflare/ → DNS, workers, deployment targets
monitoring/ → Grafana dashboards, alert rules
ship-v2-auth/ → task: migrate auth with progress tracking
Point your coding agent at the workspace. It navigates to the module it needs per task: your conventions when writing code, your deploy integration when shipping, your monitoring setup when debugging production.
Onboarding your team
A colleague joining an existing workspace is the easiest path into gcontext: there is nothing to set up and nothing to learn:
git clone <your-repo> && cd <your-repo>
cp .env.example .env # if present, fill in your own credentials (gcontext env shows what's missing)
claude # or cursor, codex; the workspace tells the agent the rest
Their agent reads the same modules yours does: the schema notes, the deploy runbook, the gotchas your team already paid to learn. Your new hire's agent knows the codebase before they do.
Nobody hand-maintains this. The agent updates modules as a side effect of doing work (it fixed a deploy quirk, it writes the quirk down), and the humans review the diffs in PRs like any other change. If a module goes stale, git blame tells you when and why.
FAQ
Isn't this just AGENTS.md / CLAUDE.md with extra steps?
A flat instructions file (including Claude Code's @imports, which inline everything at session start) puts the whole thing in the window every session, and quality degrades as it grows. gcontext keeps the always-loaded part tiny (one line per module) and everything else behind links the agent follows on demand. The honest answer to "couldn't I hand-roll this with a docs/ folder?" is: yes, partially; gcontext is that convention made consistent and cheap. The tooling adds what a convention can't enforce: load/unload per task, the regenerated index, module kinds with different lifecycles, secrets declared by name and checked with gcontext env, and gcontext validate to catch broken links and missing files. And because it's a shared convention rather than your house style, the same workspace works identically across Claude Code, Cursor, and Codex.
Is this the llms.txt web standard?
Same filename, different job. The web proposal puts an index on public websites for crawlers. gcontext uses the same index shape inside your private repo, purely for your own agent at inference time. Nothing is published or exposed.
Don't agents just ignore context files anyway?
Instructions buried in a long monolithic prompt do decay; that's an argument against flat files, not against structure. You can't force a model to read anything; what you can do is keep the always-loaded part small and make the relevant file one link away, so it's read fresh at the moment the task touches it instead of sitting 50k tokens behind.
Isn't maintaining a folder of markdown a chore? Wikis die this way.
Wikis die because humans must maintain them on the side. Here the agent maintains the modules as a side effect of doing work (when a session surfaces a gotcha or a changed endpoint, it updates the module then and there), and the human's job shrinks to reviewing diffs. That review step is the point: it's the same control you already have over code the agent writes.
What does this cost in tokens?
The index is a few hundred tokens. Module detail enters the window only when navigated to. Unloaded modules: zero.
Won't better models make this unnecessary?
Better models still won't know your schema, your conventions, or the runbook you wrote last week. That knowledge has to live somewhere. gcontext's position is that it should live in git, where you can diff it, review it, and blame it.
What gcontext is not
- A vector database. No embeddings. The agent navigates a file tree, not a similarity search.
- A memory model. No implicit memory. Context is explicit, human-curated, version-controlled.
- A replacement for RAG. Complementary. gcontext structures the knowledge RAG can retrieve from.
- An agent framework. No runtime. Works with the agent you already use.
- Another orchestration layer. No pipelines, no runtime. Just structured, navigable knowledge.
Why the filesystem
"Why not just use a vector database / memory layer?"
| Filesystem (gcontext) | Vector DB / Memory | |
|---|---|---|
| Version control | git diff, git blame, full history |
Requires custom versioning |
| Inspectability | Open a folder, read the files | Query an API, decode embeddings |
| Determinism | Same files = same available context | Similarity search varies |
| Human readability | It's markdown | It's vectors |
| Composability | Load/unload modules like imports | Rebuild index on every change |
| Tooling | Works with every editor, CI, linter | Needs specialized tooling |
| Portability | Copy the folder | Export, migrate, re-index |
The filesystem is the most universal, inspectable, composable storage layer that exists. Your agent's context should be as maintainable as the code it operates on.
Commands
| Command | What it does |
|---|---|
gcontext init |
Create a new workspace (errors if one already exists) |
gcontext new <kind> <name> [summary] |
Scaffold a module |
gcontext load <name> [...] |
Activate modules in the workspace |
gcontext unload <name> |
Deactivate a module |
gcontext ls |
List all modules and their status |
gcontext env |
Check if required secrets are set |
gcontext validate [name] |
Verify module structure |
Works with
gcontext produces plain markdown with a navigable index. Any agent that reads files can use it. We use it daily with:
- Claude Code (via the generated
CLAUDE.md→AGENTS.md) - Cursor
- Codex
- pi.dev
Secrets
Modules can declare required environment variables. Values go in .env (gitignored). Run gcontext env to check what's missing.
Status
Current release: v0.2.2, early, small, and functional. The CLI surface (init, new, load/unload, ls, env, validate) is complete and covered by tests; the module format may still evolve before 1.0. The CLI is MIT and standalone. An optional hosted version with a web UI and built-in chat (gcontext Cloud) is in the works; the CLI does not depend on it.
Built by Bleak AI | gcontext.ai
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file gcontext_ai-0.2.3.tar.gz.
File metadata
- Download URL: gcontext_ai-0.2.3.tar.gz
- Upload date:
- Size: 38.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.16
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a25cb66f636652191e5d20d75821a34ed3db3c541cdc5802237b6d0891d72820
|
|
| MD5 |
931acddd6149c8df856c1de70c6bbc4a
|
|
| BLAKE2b-256 |
f80d1da9dbbe83e53641549f147c7390fc34db07ddd286385c219645376a3b00
|
File details
Details for the file gcontext_ai-0.2.3-py3-none-any.whl.
File metadata
- Download URL: gcontext_ai-0.2.3-py3-none-any.whl
- Upload date:
- Size: 31.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.16
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c18ec6a48872fc7df6c7224c27769488afb5eceadf0d9aeea84db67f36c8a4a7
|
|
| MD5 |
b36f61c5909b473257427038b99d4a7b
|
|
| BLAKE2b-256 |
3d18858d55e603b7199636028f3e247e7594a34d1e0d2132ef401cdb70294d53
|