A local context layer for AI tools: mirror your repositories, index them into a knowledge graph, and serve it over MCP.

Maintainers

sayak-sarkar

Project description

contextlake

All your real context, in one local lake.

Python 3.9+ · License: MIT

A local context layer for your AI tools — your repositories mirrored, indexed into a knowledge graph, and served over MCP, so agents work from real source instead of guessing.

You have access to dozens — maybe hundreds — of repositories scattered across a GitLab group and its subgroups. You want them all on your laptop, in the same shape they have on GitLab, each sitting on the branch where the real work is happening, and you want a single command to keep it that way.

That's the foundation. contextlake enumerates everything you can reach, clones what's missing into a faithful mirror of the namespace tree, pulls what's stale, and parks each repo on its most active branch — concurrently, with retries, and without ever stomping on the feature branch you're in the middle of.
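To make the mirroring idea concrete, here is a minimal sketch of comparing the GitLab project list with what is already on disk. The function name `plan_mirror` and its return shape are hypothetical, invented for illustration; this is not contextlake's own code. It assumes both inputs are repo paths relative to the mirror root.

```python
from pathlib import PurePosixPath


def plan_mirror(remote_paths, local_paths):
    """Compare GitLab project paths with local repo paths (both relative to the mirror root)."""
    remote = {PurePosixPath(p) for p in remote_paths}
    local = {PurePosixPath(p) for p in local_paths}

    # On GitLab but not on disk: these need a clone.
    to_clone = sorted(str(p) for p in remote - local)
    # On disk but not on GitLab: candidates to report as orphans.
    orphans = sorted(str(p) for p in local - remote)
    # A repo counts as nested when one of its parent directories is itself a repo.
    nested = sorted(
        str(p) for p in local if any(parent in local for parent in p.parents)
    )
    return {"clone": to_clone, "orphans": orphans, "nested": nested}
```

Because the local directory layout reuses the GitLab namespace path, a plain set difference is enough to find missing and orphaned repos in this sketch.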

On top of that mirror, an optional knowledge layer indexes everything into a graph and serves it to your AI tools over MCP — so they answer from real source. (Today the source is GitLab; the design is source-agnostic.)

It carries no credentials of its own: authentication rides entirely on your existing glab login and git setup.

pip install .           # from a source checkout (see Installation for uv / pipx)
contextlake status      # see where you stand
contextlake sync        # fetch → clone → update → branches → verify

New here? QUICKSTART.md takes you from install to a fully-wired AI workspace (mirror → knowledge graph → Claude Code / Windsurf) in a few minutes.

What's in the box

The core loop

Discovers everything in a GitLab group and its subgroups via the API.
Clones what's missing, preserving GitLab's exact directory structure.
Updates what's stale with a fast-forward pull, honestly reporting whether anything actually changed.
Rides the active branch — picks each repo's liveliest branch by commit count, recency, or a hybrid of both (your call).
Verifies the mirror against GitLab and flags drift, orphans, and repos-nested-inside-repos.
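The three branch-selection strategies named above (commit count, recency, hybrid) could be sketched roughly as follows. The `BranchStats` fields, the exponential recency decay, and the `half_life_days` parameter are all assumptions made for illustration; the source does not say how contextlake actually scores branches.

```python
from dataclasses import dataclass


@dataclass
class BranchStats:
    name: str
    commits: int            # commits in some look-back window (hypothetical metric)
    days_since_last: float  # age of the newest commit, in days


def pick_active_branch(branches, strategy="hybrid", half_life_days=14.0):
    """Return the name of the highest-scoring branch under the chosen strategy."""

    def score(branch):
        # Recency weight: 1.0 for a commit made now, halving every half_life_days.
        recency = 0.5 ** (branch.days_since_last / half_life_days)
        if strategy == "commits":
            return branch.commits
        if strategy == "recency":
            return recency
        if strategy == "hybrid":
            return branch.commits * recency
        raise ValueError(f"unknown strategy: {strategy!r}")

    return max(branches, key=score).name
```

In this sketch the hybrid score favours a branch that is both busy and recent, so a long-lived default branch with old commits can lose to an actively developed one.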

Because it runs across hundreds of repos

Concurrent by default, with an adaptive worker pool that backs off when the network starts misbehaving and ramps back up when it recovers.
Resilient — exponential backoff with jitter on transient failures, fail-fast on the ones that won't recover (DNS, TLS).
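As an illustration of the retry policy described above, the sketch below retries transient failures with exponential backoff and jitter, and re-raises immediately on failures that will not recover. The exception classes, the `retry` helper, and its default values are hypothetical and are not taken from contextlake.

```python
import random
import time


class TransientError(Exception):
    """A failure worth retrying, e.g. a dropped connection."""


class FatalError(Exception):
    """A failure that will not recover on retry, e.g. a DNS or TLS problem."""


def retry(op, attempts=5, base=0.5, cap=30.0, sleep=time.sleep, rng=random.random):
    """Call op() until it succeeds, backing off exponentially between tries."""
    for attempt in range(attempts):
        try:
            return op()
        except FatalError:
            raise  # fail fast: retrying cannot help
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts
            delay = min(cap, base * 2 ** attempt)  # 0.5, 1, 2, 4, ... capped
            sleep(delay * rng())  # jitter: a random fraction of the delay
```

The jitter spreads retries from many concurrent workers over time, so they do not all hit the server again at the same instant.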

Because it's your working machine

Branch safety: never yanks you off a working branch or clobbers uncommitted changes — skip, or --auto-stash, your choice.
--dry-run everything first if you're the cautious type.
Configurable via INI files (local + global) with sensible precedence, plus per-run CLI overrides.
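The branch-safety rule above can be pictured as a small decision: a clean repo is pulled, a dirty one is skipped unless stashing was requested. This is a sketch under assumptions, not the tool's implementation; `is_dirty` and `plan_update` are hypothetical names, and only the underlying git commands (`git status --porcelain`, `git pull --ff-only`, `git stash push` / `pop`) are real.

```python
import subprocess


def is_dirty(repo_dir):
    """True if `git status --porcelain` reports any change (untracked files included)."""
    result = subprocess.run(
        ["git", "-C", str(repo_dir), "status", "--porcelain"],
        check=True,
        capture_output=True,
        text=True,
    )
    return bool(result.stdout.strip())


def plan_update(dirty, auto_stash=False):
    """Return the git steps to run for one repo; an empty list means skip it."""
    if not dirty:
        return ["pull --ff-only"]
    if auto_stash:
        # Set local changes aside, update, then restore them.
        return ["stash push", "pull --ff-only", "stash pop"]
    return []  # dirty and no auto-stash: leave the repo untouched
```

Keeping the decision separate from the git calls also makes a dry run simple: print the planned steps instead of executing them.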

Installation

The fastest, zero-config path is uv, which fetches the right Python and creates an isolated environment for you, so there's nothing to set up:

uv tool install "contextlake[kb]"        # install the CLI on your PATH
# or run it once, ephemerally, without installing:
uvx --from "contextlake[kb]" contextlake --help

Prefer pipx or pip? Those work too:

pipx install "contextlake[kb]"
# pip install "contextlake[kb]"          # into an active virtualenv

The [kb] extra pulls in the knowledge layer (graph index, embeddings, LLM-wiki, MCP server). Plain contextlake is just the GitLab-mirroring CLI.

Other prerequisites: git, and — for mirroring — an authenticated glab (glab auth login). Once installed, contextlake, python -m contextlake, and python3 contextlake.py are equivalent.

Quickstart — one repo, no setup

You don't need GitLab or any config to try contextlake on a repo you already have. Point it at any local git repo:

contextlake index --source .          # parse this repo into a local knowledge graph
contextlake graph --overview --open   # open the interactive graph in your browser
contextlake serve                     # …or serve it to your AI IDE over MCP

Everything lands in a local store (~/.contextlake/kb) — nothing leaves your machine. Index any path with --source PATH, or every git repo under a directory with --workspace DIR. Keep separate stores by pointing --config my.toml at a file with [kb] / store_dir = "...".

Where contextlake goes beyond single-repo tools is mirroring and cross-referencing a whole GitLab fleet — that's the setup below.

Configure (fleet mode)

To mirror and cross-reference a whole GitLab group, copy the example and set your group + workspace:

cp .contextlake.ini.example ~/.contextlake.ini

[contextlake]
work_dir = ~/work
gitlab_group = your-gitlab-group

The tool carries no credentials of its own — auth rides on glab — so .contextlake.ini holds only non-secret settings and is gitignored by default. The full option reference is in docs/usage.md.
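A rough sketch of how layered INI settings like the ones above could be resolved with Python's standard `configparser`. The `load_settings` helper is hypothetical, and the precedence shown (later files override earlier ones, explicit overrides win over every file) is an assumption; docs/usage.md is the authority on the real order.

```python
import configparser
import os


def load_settings(paths, overrides=None):
    """Merge [contextlake] sections from several INI files, then apply overrides."""
    parser = configparser.ConfigParser(interpolation=None)
    # read() silently skips missing files; values from later files replace earlier ones.
    parser.read([str(p) for p in paths])

    settings = {}
    if parser.has_section("contextlake"):
        settings.update(parser["contextlake"])

    # Per-run overrides (e.g. from CLI flags) win; None means "flag not given".
    for key, value in (overrides or {}).items():
        if value is not None:
            settings[key] = value

    if "work_dir" in settings:
        settings["work_dir"] = os.path.expanduser(settings["work_dir"])  # resolve "~"
    return settings
```

For example, `load_settings([os.path.expanduser("~/.contextlake.ini"), ".contextlake.ini"])` would let a file in the current directory refine the one in your home directory under this assumed ordering.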

Behind a slow / TLS-inspecting corporate proxy (e.g. Zscaler) where glab's API calls time out, set GITLAB_TOKEN (a read_api token) — contextlake then enumerates projects via its own HTTP client, which tolerates the slow DNS where glab's short dial timeout fails.
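For a sense of what token-based enumeration involves, here is a minimal sketch against GitLab's REST API (`GET /api/v4/groups/:id/projects` with `include_subgroups=true`, authenticated with a `PRIVATE-TOKEN` header). The helper names, the 60-second timeout, and the host `gitlab.example.com` are illustrative assumptions; contextlake's own HTTP client may work differently.

```python
import json
import urllib.parse
import urllib.request


def projects_url(base_url, group, page, per_page=100):
    """Build the paginated 'list group projects' URL, subgroups included."""
    group_id = urllib.parse.quote(group, safe="")  # "team/platform" -> "team%2Fplatform"
    query = urllib.parse.urlencode(
        {"include_subgroups": "true", "per_page": per_page, "page": page}
    )
    return f"{base_url.rstrip('/')}/api/v4/groups/{group_id}/projects?{query}"


def http_get_json(url, token, timeout=60):
    """GET a URL with a generous timeout, for slow proxies, and decode the JSON body."""
    request = urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def list_projects(base_url, group, token, fetch=http_get_json):
    """Collect path_with_namespace for every project, page by page."""
    paths, page = [], 1
    while True:
        batch = fetch(projects_url(base_url, group, page), token)
        if not batch:  # an empty page means we are past the last one
            return paths
        paths.extend(project["path_with_namespace"] for project in batch)
        page += 1


# Usage (needs network access and a read_api token in GITLAB_TOKEN):
#   import os
#   list_projects("https://gitlab.example.com", "your-gitlab-group", os.environ["GITLAB_TOKEN"])
```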

Usage

Run commands as contextlake <command> — full per-command docs are in docs/usage.md.

Commands at a glance

Command	What it does
`status`	Show the workspace sync state vs GitLab (read-only)
`fetch`	Cache the GitLab project list
`clone`	Clone repos that exist on GitLab but not locally
`update`	Pull updates for local repos (skips only repos with a dirty working tree)
`branches`	Switch each repo to its most active branch
`verify`	Check the local mirror matches GitLab (drift, orphans, nesting)
`sync`	The full pipeline: fetch → clone → update → branches → verify → audit
`audit`	Repo health & age: empty/README-only repos + creation & last-commit dates (JSON + CSV)
`bootstrap`	Turnkey: sync + index + connect + embed + wiki + steer
`index`	Build the code/dependency graph (`--workspace`, incremental, `--watch`)
`connect`	Link repos to Atlassian / Figma / GitLab sources
`embed`	Build semantic-search vectors (zero-config built-in CPU model, or Ollama / an API)
`lint`	Graph health — stale repos (HEAD moved) and dangling edges; exits non-zero if any
`wiki`	LLM-synthesized, council-verified wiki pages (zero-config built-in model, or Ollama / an API)
`steer`	Write editor steering — `AGENTS.md`, `.mcp.json`, `.windsurfrules`, skills
`serve`	Expose the graph over MCP (`--transport stdio`/`http`)
`query`	Search the index (`--kind`, `--repo`, `--limit`, `--as-of <commit>`)
`graph`	Visualize the graph — offline interactive HTML / DOT / Mermaid / JSON (`--overview`, `--serve`)
`doctor`	Check the knowledge-layer environment (SQLite FTS5, git/glab, store, embeddings)

The first eight are the core sync (detailed below); the rest are the optional knowledge layer. Pass --config to point any command at a sync INI file; knowledge-layer commands also accept --config/--kb-config pointing at your kb.toml.

Global options

These apply to any command:

--dry-run — preview clone/update/branch actions without changing anything.
-v / --verbose, -q / --quiet — control console verbosity.
--log-file PATH — append a full timestamped audit log (rotating).
--config PATH — use a specific config file (highest precedence).
--version — print the version and exit.

Output is colourised on a terminal (status glyphs, a progress bar). Colours are dropped automatically for non-TTY output (pipes, cron, log files); set NO_COLOR to disable them or FORCE_COLOR to keep them when piping.
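The colour rules above amount to a three-step check, sketched below. The function name `use_colour` is hypothetical, and letting NO_COLOR win when both variables are set is an assumption, since the source does not say which takes priority.

```python
import os
import sys


def use_colour(env=None, is_tty=None):
    """Decide whether to emit ANSI colours for the current output stream."""
    env = os.environ if env is None else env
    if is_tty is None:
        is_tty = sys.stdout.isatty()

    if env.get("NO_COLOR"):     # explicit opt-out (assumed to win over FORCE_COLOR)
        return False
    if env.get("FORCE_COLOR"):  # explicit opt-in, even when piping
        return True
    return is_tty               # default: colour only on a real terminal
```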

A read-only status followed by a --dry-run sync is the safest way to preview what a sync would do:

contextlake status
contextlake --dry-run sync

Knowledge layer (optional)

Beyond mirroring, an optional layer (contextlake.kb) turns your repos into a knowledge graph and serves it to AI tools over MCP — so Claude Code, Windsurf, or Kiro can answer "where is X defined?" or "who calls Y?" instead of grepping. It can also link repos to their Atlassian / Figma / GitLab items, add semantic search, write a curated wiki, visualize the graph (contextlake graph → an offline, interactive HTML — fleet overview, a symbol's neighbourhood, or a single repo), and generate per-tool steering files + a skills library. Most of it needs no model; the rest works with a local Ollama or any OpenAI-compatible endpoint.
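To illustrate what answering "where is X defined?" and "who calls Y?" from a graph can look like, here is a toy sketch that indexes one Python source string with the standard `ast` module. It is only an illustration of the idea: `index_calls` is a hypothetical name, it handles plain `name(...)` calls only, and it says nothing about how contextlake's indexer is actually built.

```python
import ast
from collections import defaultdict


def index_calls(source):
    """Return (definitions, callers) for the functions in one Python source string."""
    tree = ast.parse(source)
    defined_at = {}             # function name -> line number of its definition
    callers = defaultdict(set)  # called name -> names of functions that call it

    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            defined_at[node.name] = node.lineno
            # Every direct call inside this function becomes an edge caller -> callee.
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    callers[inner.func.id].add(node.name)

    return defined_at, {name: sorted(who) for name, who in callers.items()}
```

With such an index, both questions become dictionary lookups (`defined_at["load"]`, `callers["load"]`) rather than a text search across files.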

One command sets it all up:

contextlake bootstrap --kb-config ~/.contextlake/kb.toml

→ Full guide: docs/knowledge-layer.md.

Documentation

QUICKSTART.md — install → bootstrap → wire your editor, in minutes
docs/usage.md — every command, configuration, branch safety, scheduling
docs/knowledge-layer.md — the graph, connectors, search, wiki, steering
docs/internals.md — architecture & internals
docs/releasing.md — maintainer runbook: versioning, tagging, publishing to PyPI
BRANDING.md — brand guide (name, palette, logo, mascot)
CHANGELOG.md · ROADMAP.md · CONTRIBUTING.md

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

For issues or questions:

Check this documentation first
Review log files for error messages
Test individual commands to isolate issues
Verify glab authentication: glab auth status
Check your GitLab access permissions in the web interface

Release history

2.11.0 (Jun 28, 2026)
2.10.0 (Jun 27, 2026)
2.9.1 (Jun 26, 2026)
2.9.0 (Jun 26, 2026)
2.8.0 (Jun 26, 2026)
2.7.0 (Jun 25, 2026)
2.6.0 (Jun 25, 2026)
2.5.1 (Jun 25, 2026)
2.5.0 (Jun 25, 2026), this version
2.4.0 (Jun 25, 2026)
2.3.0 (Jun 25, 2026)
2.2.0 (Jun 23, 2026)
2.1.6 (Jun 22, 2026)
2.1.5 (Jun 22, 2026)
2.1.4 (Jun 22, 2026)
2.1.3 (Jun 22, 2026)
2.1.2 (Jun 22, 2026)
2.1.1 (Jun 22, 2026)
2.1.0 (Jun 22, 2026)

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

contextlake-2.5.0.tar.gz (261.2 kB view details)

Uploaded Jun 25, 2026 Source

Built Distribution

contextlake-2.5.0-py3-none-any.whl (265.4 kB view details)

Uploaded Jun 25, 2026 Python 3

File details

Details for the file contextlake-2.5.0.tar.gz.

File metadata

Download URL: contextlake-2.5.0.tar.gz
Upload date: Jun 25, 2026
Size: 261.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for contextlake-2.5.0.tar.gz
Algorithm	Hash digest
SHA256	`7e86a7b5f244bac1677bf63e25fcc3f9c3557e0c82f6295474153f1695dec915`
MD5	`ee6a98fc611a6854c5cc403eae78d3cb`
BLAKE2b-256	`ef4c9610f9b2c3931c4e5d47885deb1f036df93ad765aadf3814e2d7df72ff04`

See more details on using hashes here.
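The SHA256 digest listed above can be checked against a downloaded file with Python's standard `hashlib`. This is a generic sketch; the helper name `sha256_of` is made up for illustration, and the file path in the usage note assumes the sdist sits in the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Compare the result of `sha256_of("contextlake-2.5.0.tar.gz")` with the SHA256 value in the table; any mismatch means the file differs from the one that was published.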

Provenance

The following attestation bundles were made for contextlake-2.5.0.tar.gz:

Publisher: release.yml on sayak-sarkar/contextlake

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: contextlake-2.5.0.tar.gz
- Subject digest: 7e86a7b5f244bac1677bf63e25fcc3f9c3557e0c82f6295474153f1695dec915
- Sigstore transparency entry: 1957031836
- Sigstore integration time: Jun 25, 2026
Source repository:
- Permalink: sayak-sarkar/contextlake@82110843eb81dd2315f07aad14ed18e289c54f17
- Branch / Tag: refs/tags/v2.5.0
- Owner: https://github.com/sayak-sarkar
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@82110843eb81dd2315f07aad14ed18e289c54f17
- Trigger Event: push

File details

Details for the file contextlake-2.5.0-py3-none-any.whl.

File metadata

Download URL: contextlake-2.5.0-py3-none-any.whl
Upload date: Jun 25, 2026
Size: 265.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for contextlake-2.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7fe6d2bf75fb0201ae3ebc6d4efdc62a8f4f665cb41a6452e475be9bde83cca1`
MD5	`5f8223553370478199152ca95e0328bc`
BLAKE2b-256	`7dd68533b0622356e607d5cd8b030a4089b6552caada08c4b4a6784f2fea270a`

See more details on using hashes here.

Provenance

The following attestation bundles were made for contextlake-2.5.0-py3-none-any.whl:

Publisher: release.yml on sayak-sarkar/contextlake

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: contextlake-2.5.0-py3-none-any.whl
- Subject digest: 7fe6d2bf75fb0201ae3ebc6d4efdc62a8f4f665cb41a6452e475be9bde83cca1
- Sigstore transparency entry: 1957031935
- Sigstore integration time: Jun 25, 2026
Source repository:
- Permalink: sayak-sarkar/contextlake@82110843eb81dd2315f07aad14ed18e289c54f17
- Branch / Tag: refs/tags/v2.5.0
- Owner: https://github.com/sayak-sarkar
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@82110843eb81dd2315f07aad14ed18e289c54f17
- Trigger Event: push

contextlake 2.5.0
