
gcontext — file-based knowledge protocol for AI agents


gcontext

Version-controlled context modules your AI agent navigates itself.


Your agent's knowledge and its working state as plain markdown in git: modular, reviewable, loaded per task. No embeddings, no memory layer, no new runtime. Works with Claude Code, Cursor, Codex, and anything else that reads files.

Modules don't just describe your stack. They carry the credentials and the know-how for the agent to act on it: query the database, hit the Stripe API, run the deploy.

And work in flight survives the session: start a migration today, get blocked waiting on an email, come back Thursday and say "continue". The agent kept track of where things stand, so you don't have to.


A real session, replayed: one prompt builds a Supabase integration, then a fresh session answers "how many users do we have?" with the live number. Unedited transcript · Do it yourself in three copy-pastes


The problem

AI agents fail for the same reason large codebases fail: implicit state.

Prompts become undocumented architecture. Instructions drift between conversations. Context duplicates across files. Every session starts with "let me remind you how our deploy works."

And the most implicit state of all is the work in flight. The migration you started Monday, the blocker you're waiting on, the decision you made last week: none of it survives the session. The agent forgets, so you keep track, and every conversation starts with a re-briefing.

This isn't a model problem. It's an engineering problem.

Most teams solve it by writing longer prompts, pasting more docs, or hoping the agent remembers. That works until it doesn't, and it stops working fast.

gcontext treats context as infrastructure.


Before and after

Without structured context:

  • Giant system prompts nobody maintains
  • Copy-pasted docs that go stale
  • "Remember, our Stripe webhook is at..." every session
  • You're the project manager for your agent: re-briefing it on where every piece of work stands, every session
  • Agent hallucinates because it can't find what it needs
  • Context bloat kills quality on long tasks
  • Tribal knowledge lives in one person's prompt history

With gcontext:

modules-repo/
  stripe/            → API keys, webhooks, how to query invoices
  postgres/          → schema, migrations, connection details
  deploy-pipeline/   → step-by-step release to production
  migrate-to-paddle/ → task: where it stands, what's blocked, what's next

Each module is a self-contained unit of context. Load what the task needs, unload what it doesn't. The agent navigates to what's relevant. Nothing else enters the window.


Install

curl -LsSf https://gcontext.ai/gcontext/install.sh | sh
# or
uv tool install gcontext-ai   # PyPI name is gcontext-ai; the command is `gcontext`
# or
pip install gcontext-ai

That's the whole footprint: no account, no server, no vector database. Your keys stay in your local .env; modules are just files in your repo.

Quick start

Three copy-pastes: one command, two lines in .env, one prompt. Your agent does the rest.

1. Initialize

gcontext init

The only command you have to run. It creates the workspace your agent operates from:

AGENTS.md            # auto-loaded by your agent; points it at context/system.md
CLAUDE.md            # one line: @AGENTS.md (Claude Code reads this file)
context/             # what the agent reads: generated llms.txt index + loaded modules
modules-repo/        # source of truth: your modules

It never overwrites anything: re-running it in an existing workspace errors out.

2. Put a key in .env

SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SECRET_KEY=sb_secret_...

Module files only ever name the variables. The values stay in the gitignored .env. Prefer read-only credentials where the service offers them.

3. Say the prompt

Open your agent (Claude Code, Cursor, Codex) in the workspace and ask for what you have:

Create a supabase integration module for our Supabase project and load it
into the workspace. The keys are in .env as SUPABASE_URL and SUPABASE_SECRET_KEY.

The agent builds and loads the module itself: info.md with the API know-how, llms.txt as the index, module.yaml declaring the secrets by name only. The exact prompt we test against a real model before every release is in the first-integration walkthrough.

Starting from zero instead? Say "Set up gcontext for this project" and the agent interviews you (what do you re-explain every session? what are you working on?), then builds your first modules from the answers.

4. Ask a real question in a fresh session

How many users do we have?

A new conversation that has never seen your project follows the index to the module, calls the API with the key from .env, and answers with the live row count. That's what happens in the recorded session, where the answer is then verified against the database independently.

5. Hand it the work in flight

We're migrating auth to OAuth. Track it as a task.

The agent creates a task module (brief.md for the goal, status.md for progress and blockers) and keeps it current as it works. Days later, in a fresh conversation, "continue the oauth migration" is all the briefing it needs.


This quick start is tested. The prompt in step 3 is extracted from examples/first-integration.md and run against a real model before releases, and the full unedited transcript of one run is public: every tool call, the module files the agent wrote, the verified answer. If the quick start stops working, the release does not ship.

Prefer doing things by hand? Every step is also a deterministic command: gcontext new, gcontext load, gcontext ls, gcontext validate. See Commands. The agent uses the same files either way.


Your agent doesn't just know your stack. It operates it.

An integration module is three things: what the service is (info.md), how to navigate it (llms.txt), and which secret it needs (module.yaml, name only). That turns out to be everything an agent needs to make real calls: no MCP server, no tool definitions, no plugin to install.

$ claude → "Refund the duplicate charge for customer@acme.com"

Reads stripe/info.md → knows refunds need the charge id, not the intent
Looks up the customer, finds two charges 41s apart
Refunds ch_3Oa2... ($49.00), refund re_3Oa2... succeeded

MCP gives agents tools; for most of your stack, a markdown file plus an env var does the same job, and you can git diff it. The same module that explains your Stripe setup is the one that lets the agent act on it, and what it learns doing the work gets written back into the module. (The exchange above is illustrative; for a real one, unedited, see the recorded session.)


Walk away mid-task. Come back days later.

The other half of the problem isn't knowledge, it's state. Real work gets interrupted: you wait on an email, a review, another team. Without gcontext, you re-brief the agent from scratch every time, and you keep the real status in your head. With a task module, the agent writes down where things stand as it works.

Monday    $ claude → "Start the Stripe-to-Paddle migration"

          Creates modules-repo/paddle-migration/ (a task module)
          Maps price ids, exports products... blocked: Paddle support
          must enable the sandbox. Writes the blocker to status.md.

Thursday  $ claude → "Any movement on the migration?"

          Reads paddle-migration/status.md
          "Blocked on Paddle support since Monday (ticket #4821).
           Price mapping is done. Next: webhook rewrite, once
           sandbox access lands."

No re-briefing, no scrolling back through old chats, and the status was never in your head to begin with. The task module is the state: what's done, what's blocked, what's next. When the work ships, the task module has done its job: delete it, or keep it as the record of what happened.
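To make "the task module is the state" concrete, here is a minimal sketch of a status file that round-trips between a small Python object and markdown. The section names (Done, Blocked, Next) and the `TaskStatus` class are assumptions for illustration; gcontext does not prescribe this layout, and the agent is free to shape status.md differently.

```python
from dataclasses import dataclass, field


@dataclass
class TaskStatus:
    """Hypothetical shape for status.md: what's done, what's blocked, what's next."""
    done: list[str] = field(default_factory=list)
    blocked: list[str] = field(default_factory=list)
    next_steps: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the three lists as markdown sections with one bullet per item."""
        sections = [("Done", self.done), ("Blocked", self.blocked), ("Next", self.next_steps)]
        out = ["# Status", ""]
        for title, items in sections:
            out.append(f"## {title}")
            out.extend(f"- {item}" for item in items)
            out.append("")
        return "\n".join(out)

    @classmethod
    def from_markdown(cls, text: str) -> "TaskStatus":
        """Parse the layout produced by to_markdown back into a TaskStatus."""
        status = cls()
        buckets = {"Done": status.done, "Blocked": status.blocked, "Next": status.next_steps}
        current = None
        for line in text.splitlines():
            if line.startswith("## "):
                # Unknown section titles are ignored rather than rejected.
                current = buckets.get(line[3:].strip())
            elif line.startswith("- ") and current is not None:
                current.append(line[2:].strip())
        return status


# The Monday session from the example above, written down as state.
monday = TaskStatus(
    done=["Price mapping"],
    blocked=["Paddle support must enable the sandbox (ticket #4821)"],
    next_steps=["Webhook rewrite, once sandbox access lands"],
)
```

Because the file is plain markdown, a fresh session (or a human) can read it directly; the parser only shows that nothing is lost between writing the status and picking it up days later.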


How it works: navigation, not retrieval

Modules are independent units of knowledge: an API integration, a deployment procedure, a bounded piece of work. Each contains plain markdown and a navigation index (llms.txt) the agent uses to find what it needs.

agent reads system.md
  └─ follows the llms.txt index
       ├─ stripe/llms.txt    → "here's how Stripe works here"
       ├─ postgres/llms.txt  → "here's the database"
       └─ deploy/llms.txt    → "here's how to ship"
            └─ steps.md      → the actual procedure

Only the index (one line per module) is always in context. Detail enters the window when the agent follows a link, fresh at the moment the task needs it. Unloaded modules cost zero tokens.

The mechanics are deliberately boring: gcontext load postgres creates a symlink context/postgres → modules-repo/postgres and regenerates the one-line-per-module index in context/llms.txt. gcontext unload removes the link. No copies, no database, no daemon: modules-repo/ is the only place content lives, and you can inspect every byte the agent could read by opening a folder.
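The load/unload mechanics described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not gcontext's actual source: the function names, the relative symlink target, and the exact format of each index line are all hypothetical, and the summary is taken from the first `>` line of the module's own llms.txt.

```python
from pathlib import Path


def first_summary(module_dir: Path) -> str:
    """Return the text of the first '> ...' line in the module's llms.txt, or ''."""
    index = module_dir / "llms.txt"
    if not index.exists():
        return ""
    for line in index.read_text(encoding="utf-8").splitlines():
        if line.startswith(">"):
            return line.lstrip("> ").strip()
    return ""


def regenerate_index(context: Path) -> None:
    """Rewrite context/llms.txt with one line per loaded (symlinked) module."""
    lines = []
    for entry in sorted(context.iterdir()):
        if entry.is_symlink() and entry.is_dir():
            # Assumed line format; the real index may look different.
            lines.append(f"- [{entry.name}]({entry.name}/llms.txt): {first_summary(entry)}")
    (context / "llms.txt").write_text("\n".join(lines) + "\n", encoding="utf-8")


def load(workspace: Path, name: str) -> None:
    """Link context/<name> to modules-repo/<name>, then refresh the index."""
    source = workspace / "modules-repo" / name
    if not source.is_dir():
        raise FileNotFoundError(f"no such module: {name}")
    context = workspace / "context"
    context.mkdir(exist_ok=True)
    link = context / name
    if not link.is_symlink():
        # Relative target, so the link survives moving or cloning the workspace.
        link.symlink_to(Path("..") / "modules-repo" / name, target_is_directory=True)
    regenerate_index(context)


def unload(workspace: Path, name: str) -> None:
    """Remove the link only; the module's content in modules-repo/ is untouched."""
    link = workspace / "context" / name
    if link.is_symlink():
        link.unlink()
    regenerate_index(workspace / "context")
```

Nothing is copied in either direction, which is why editing a file under context/postgres/ and editing it under modules-repo/postgres/ are the same operation.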

You won't need unload for a while: with fewer than ~10 modules, keep everything loaded; the resident index costs about one line per module. Load/unload starts paying off once the loaded set grows past what the agent can comfortably scan in the index.

What's actually on disk after init? Here is the seeded example module's llms.txt:

# example

> A sample module demonstrating the context module structure

- [info.md](info.md): What this module contains and how to use it
- [module.yaml](module.yaml): Module configuration

That's the whole trick: an index file per module, one root index over the loaded set, and an AGENTS.md that tells the agent to start there.
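Because the index is just a list of markdown links, it is easy to check mechanically. The sketch below parses the example llms.txt shown above and reports link targets that do not exist on disk. It is a rough approximation of the kind of check the FAQ attributes to gcontext validate (broken links, missing files), not its implementation; `parse_index`, `broken_links`, and the regular expression are assumptions.

```python
import re
from pathlib import Path

# Matches lines such as "- [info.md](info.md): What this module contains".
LINK = re.compile(r"^- \[(?P<label>[^\]]+)\]\((?P<target>[^)]+)\)(?::\s*(?P<desc>.*))?$")

EXAMPLE = """# example

> A sample module demonstrating the context module structure

- [info.md](info.md): What this module contains and how to use it
- [module.yaml](module.yaml): Module configuration
"""


def parse_index(text: str) -> list[tuple[str, str]]:
    """Return (target, description) for every link line in an llms.txt."""
    entries = []
    for line in text.splitlines():
        match = LINK.match(line.strip())
        if match:
            entries.append((match.group("target"), match.group("desc") or ""))
    return entries


def broken_links(module_dir: Path) -> list[str]:
    """List link targets in module_dir/llms.txt that are missing from the folder."""
    text = (module_dir / "llms.txt").read_text(encoding="utf-8")
    return [target for target, _ in parse_index(text) if not (module_dir / target).exists()]
```

An agent does the same walk informally: read the index, pick the link whose description matches the task, and open only that file.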

The agent walks a knowledge tree; it doesn't search a vector space. You can't force a model to read anything, but you can make the relevant file one link away instead of buried at token 50,000, and you can see exactly what was available to it. You version-control your code and your docs. This is version control for what your agent knows.

Module kinds

Two kinds carry a workspace: what your agent knows, and what it's working on.

| Kind | What it captures | Lifecycle | Example |
|---|---|---|---|
| Integration | How to use an external service, API, or database | Permanent: lives as long as the service does | Stripe, Postgres, GitHub, Slack |
| Task | A bounded piece of work and where it stands | Disposable: done when the work is done | Fix billing bug, migrate auth, ship feature X |

Different shapes for different lifecycles. They coexist and compose. You never have to pick one: the agent chooses a kind when it creates a module; this table is for reading what it made.


What a workspace looks like

Software engineering team

modules-repo/
  postgres/          → schema, connection, query patterns
  github/            → repo structure, PR conventions, CI
  deploy-pipeline/   → release steps, rollback procedures
  fix-billing-bug/   → task: reproduce, investigate, fix, verify

The agent reads the database schema, understands CI, follows the deploy playbook, and tracks progress on the billing fix, all from structured context.

Support automation

modules-repo/
  zendesk/           → API access, ticket categories, macros
  stripe/            → subscription lookup, refund procedures
  knowledge-base/    → product docs, known issues, FAQ
  escalation/        → when and how to escalate

An agent triaging tickets reads the Zendesk integration, checks Stripe for billing context, references the KB, and follows escalation rules, without a 10,000-token system prompt.

Claude Code / Cursor setup

modules-repo/
  codebase/          → architecture, conventions, key paths
  cloudflare/        → DNS, workers, deployment targets
  monitoring/        → Grafana dashboards, alert rules
  ship-v2-auth/      → task: migrate auth with progress tracking

Point your coding agent at the workspace. It navigates to the module it needs per task: your conventions when writing code, your deploy integration when shipping, your monitoring setup when debugging production.


Onboarding your team

A colleague joining an existing workspace is the easiest path into gcontext. There is nothing to set up and nothing to learn:

git clone <your-repo> && cd <your-repo>
cp .env.example .env    # if present, fill in your own credentials (gcontext env shows what's missing)
claude                  # or cursor, codex; the workspace tells the agent the rest

Their agent reads the same modules yours does: the schema notes, the deploy runbook, the gotchas your team already paid to learn. Your new hire's agent knows the codebase before they do.

Nobody hand-maintains this. The agent updates modules as a side effect of doing work (it fixed a deploy quirk, it writes the quirk down), and the humans review the diffs in PRs like any other change. If a module goes stale, git blame tells you when and why.


FAQ

Isn't this just AGENTS.md / CLAUDE.md with extra steps?

A flat instructions file (including Claude Code's @imports, which inline everything at session start) puts the whole thing in the window every session, and quality degrades as it grows. gcontext keeps the always-loaded part tiny (one line per module) and everything else behind links the agent follows on demand. The honest answer to "couldn't I hand-roll this with a docs/ folder?" is: yes, partially; gcontext is that convention made consistent and cheap. The tooling adds what a convention can't enforce: load/unload per task, the regenerated index, module kinds with different lifecycles, secrets declared by name and checked with gcontext env, and gcontext validate to catch broken links and missing files. And because it's a shared convention rather than your house style, the same workspace works identically across Claude Code, Cursor, and Codex.

Is this the llms.txt web standard?

Same filename, different job. The web proposal puts an index on public websites for crawlers. gcontext uses the same index shape inside your private repo, purely for your own agent at inference time. Nothing is published or exposed.

Don't agents just ignore context files anyway?

Instructions buried in a long monolithic prompt do decay; that's an argument against flat files, not against structure. You can't force a model to read anything; what you can do is keep the always-loaded part small and make the relevant file one link away, so it's read fresh at the moment the task touches it instead of sitting 50k tokens behind.

Isn't maintaining a folder of markdown a chore? Wikis die this way.

Wikis die because humans must maintain them on the side. Here the agent maintains the modules as a side effect of doing work (when a session surfaces a gotcha or a changed endpoint, it updates the module then and there), and the human's job shrinks to reviewing diffs. That review step is the point: it's the same control you already have over code the agent writes.

What does this cost in tokens?

The index is a few hundred tokens. Module detail enters the window only when navigated to. Unloaded modules: zero.

Won't better models make this unnecessary?

Better models still won't know your schema, your conventions, or the runbook you wrote last week. That knowledge has to live somewhere. gcontext's position is that it should live in git, where you can diff it, review it, and blame it.

What gcontext is not

  • A vector database. No embeddings. The agent navigates a file tree, not a similarity search.
  • A memory model. No implicit memory. Context is explicit, human-reviewed, version-controlled.
  • A replacement for RAG. Complementary. gcontext structures the knowledge RAG can retrieve from.
  • An agent framework. No runtime. Works with the agent you already use.
  • Another orchestration layer. No pipelines, no runtime. Just structured, navigable knowledge.

Why the filesystem

"Why not just use a vector database / memory layer?"

| | Filesystem (gcontext) | Vector DB / Memory |
|---|---|---|
| Version control | git diff, git blame, full history | Requires custom versioning |
| Inspectability | Open a folder, read the files | Query an API, decode embeddings |
| Determinism | Same files = same available context | Similarity search varies |
| Human readability | It's markdown | It's vectors |
| Composability | Load/unload modules like imports | Rebuild index on every change |
| Tooling | Works with every editor, CI, linter | Needs specialized tooling |
| Portability | Copy the folder | Export, migrate, re-index |

The filesystem is the most universal, inspectable, composable storage layer that exists. Your agent's context should be as maintainable as the code it operates on.


Commands

| Command | What it does |
|---|---|
| gcontext init | Create a new workspace (errors if one already exists) |
| gcontext new <kind> <name> [summary] | Scaffold a module |
| gcontext load <name> [...] | Activate modules in the workspace |
| gcontext unload <name> | Deactivate a module |
| gcontext ls | List all modules and their status |
| gcontext env | Check if required secrets are set |
| gcontext validate [name] | Verify module structure |

Works with

gcontext produces plain markdown with a navigable index. Any agent that reads files can use it. We use it daily with:

  • Claude Code (via the generated CLAUDE.md, which imports AGENTS.md)
  • Cursor
  • Codex
  • pi.dev

Secrets

Modules can declare required environment variables. Values go in .env (gitignored). Run gcontext env to check what's missing.
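The idea behind a secrets check is simple enough to sketch: compare the variable names a module declares against the names that have a value in .env, and report only names, never values. The helpers below are hypothetical and are not how gcontext env is implemented; in particular, how the required names are read out of module.yaml is left out, so they are passed in as a plain list.

```python
from pathlib import Path


def read_env_names(env_path: Path) -> set[str]:
    """Return the names of variables that have a non-empty value in a .env file."""
    names: set[str] = set()
    if not env_path.exists():
        return names
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        if value.strip():
            names.add(key)
    return names


def missing_secrets(required: list[str], env_path: Path) -> list[str]:
    """Names from `required` that are absent or empty in .env, in declared order."""
    present = read_env_names(env_path)
    return [name for name in required if name not in present]
```

For the Supabase module from the quick start, `missing_secrets(["SUPABASE_URL", "SUPABASE_SECRET_KEY"], Path(".env"))` would return an empty list once both lines are filled in.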

Status

Current release: v0.2.3. It is early, small, and functional. The CLI surface (init, new, load/unload, ls, env, validate) is complete and covered by tests; the module format may still evolve before 1.0. The CLI is MIT and standalone. An optional hosted version with a web UI and built-in chat (gcontext Cloud) is in the works; the CLI does not depend on it.


Built by Bleak AI | gcontext.ai
