
RepoCanon

Generate repo-specific AI context for Codex, Claude Code, Copilot, and Cursor.


Turn any repository into canonical AI-readable project context.

RepoCanon is a Python CLI that analyzes a local codebase and generates project-specific instruction files for AI coding tools from a single internal repo model.

Instead of manually maintaining separate context for different tools, RepoCanon infers your repo’s structure, commands, conventions, and boundaries, then generates outputs such as:

  • AGENTS.md
  • CLAUDE.md
  • Copilot repository instructions
  • Cursor project rules

The goal is simple: make AI coding tools behave like they already understand your repo.

Why RepoCanon

AI coding tools are useful, but they usually guess:

  • where things live
  • how the repo is structured
  • which commands to run
  • what patterns are preferred
  • what boundaries should not be crossed

RepoCanon reduces that guesswork by turning repo-specific knowledge into maintainable instruction files.

What it does

RepoCanon:

  • analyzes a local repository
  • detects languages, frameworks, commands, and topology
  • infers conventions and architectural boundaries
  • builds a normalized project model
  • generates tool-specific AI context files from that model

RepoCanon is deterministic-first. It does not require an LLM to work.

Supported targets

  • Codex via AGENTS.md
  • Claude Code via CLAUDE.md
  • GitHub Copilot via .github/copilot-instructions.md (and optional path-scoped files)
  • Cursor via .cursor/rules/*.mdc

Installation

pip install repocanon

Requires Python 3.11+.

Quickstart

# 1. Analyze the current repo and persist a normalized model.
repocanon analyze .

# 2. Inspect what was inferred and how confident RepoCanon is.
repocanon audit .

# 3. Preview generated outputs without touching the filesystem.
repocanon preview all .

# 4. Write the generated files into the repo.
repocanon generate all .

You can also generate one target at a time:

repocanon generate agents .
repocanon generate claude .
repocanon generate copilot .
repocanon generate cursor .

Example outputs

A real run produces files like:

AGENTS.md
CLAUDE.md
.cursor/rules/project-overview.mdc
.cursor/rules/commands-and-validation.mdc
.cursor/rules/code-style-and-conventions.mdc
.cursor/rules/architecture-boundaries.mdc
.github/copilot-instructions.md
.github/instructions/tests.instructions.md

See docs/samples/ for sample generated files from the bundled fixture repos.

How it works

RepoCanon has three layers:

1. Repo analysis

It scans the local repo and extracts:

  • languages
  • frameworks
  • package managers
  • commands
  • configs
  • directory structure
  • file patterns

2. Convention inference

It infers patterns such as:

  • test layout (centralized vs colocated)
  • frontend/backend split
  • monorepo structure (apps/packages/libs/services)
  • architectural boundaries
  • naming conventions
  • preferred libraries
  • common anti-pattern risks (e.g. editing existing migrations)
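As an illustration of this kind of inference, a toy heuristic for the test-layout question might look like the following. This is a sketch of the idea only, not RepoCanon's actual detector:

```python
from pathlib import PurePosixPath

def infer_test_layout(paths: list[str]) -> str:
    """Toy heuristic: 'centralized' if most test files live under a
    top-level tests/ directory, 'colocated' if they sit beside sources."""
    tests = [PurePosixPath(p) for p in paths
             if PurePosixPath(p).name.startswith("test_")
             or PurePosixPath(p).name.endswith("_test.py")]
    if not tests:
        return "unknown"
    centralized = sum(1 for p in tests if p.parts[0] == "tests")
    return "centralized" if centralized * 2 > len(tests) else "colocated"
```

A real detector would also weigh framework conventions (pytest vs Jest vs Go), but the shape is the same: derive a labeled convention from file paths alone.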

3. Target generation

It maps one normalized project model into tool-specific outputs.

That means the same repo understanding can be reused across multiple AI coding tools.

Design principles

  • deterministic first
  • local-first (no telemetry, no network calls)
  • tool-agnostic core
  • small, readable outputs
  • no generic filler — every section is grounded in repo facts
  • explicit uncertainty when confidence is low
  • human-editable generated files (sections between <!-- repocanon:manual:* --> markers survive regeneration)
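The marker-preserving behavior in the last principle can be sketched roughly like this. The snippet assumes paired markers of the form `<!-- repocanon:manual:NAME --> ... <!-- /repocanon:manual:NAME -->`; RepoCanon's real marker grammar may differ:

```python
import re

# Assumed marker pair; the exact grammar is RepoCanon's, not guaranteed here.
MANUAL = re.compile(
    r"<!-- repocanon:manual:(?P<name>[\w-]+) -->"
    r"(?P<body>.*?)"
    r"<!-- /repocanon:manual:(?P=name) -->",
    re.DOTALL,
)

def preserve_manual_sections(old: str, regenerated: str) -> str:
    """Splice manually edited section bodies from `old` into `regenerated`."""
    edits = {m["name"]: m["body"] for m in MANUAL.finditer(old)}

    def splice(m: re.Match) -> str:
        body = edits.get(m["name"], m["body"])
        return (f"<!-- repocanon:manual:{m['name']} -->"
                f"{body}<!-- /repocanon:manual:{m['name']} -->")

    return MANUAL.sub(splice, regenerated)
```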

Commands

repocanon analyze [PATH]

Analyze the repository and write a normalized model to:

.repocanon/project-model.json

repocanon generate [target] [PATH]

Generate output for one target or all targets.

Supported targets:

  • agents
  • claude
  • copilot
  • cursor
  • all

Useful flags:

  • --dry-run
  • --output-dir
  • --force

repocanon preview [target] [PATH]

Print generated output to the terminal without writing files.

repocanon audit [PATH]

Show inferred conventions, rationale, and confidence levels.

repocanon diff [PATH]

Compare the current repo scan with the saved model and report meaningful changes.
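Conceptually, the diff compares two snapshots of the same model. A minimal sketch over plain dicts, with illustrative field names rather than RepoCanon's real schema:

```python
def diff_model(saved: dict, current: dict) -> dict[str, dict[str, list]]:
    """Report values added to or removed from each list-valued field."""
    report = {}
    for key in sorted(saved.keys() | current.keys()):
        old, new = set(saved.get(key, [])), set(current.get(key, []))
        if old != new:
            report[key] = {"added": sorted(new - old),
                           "removed": sorted(old - new)}
    return report
```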

repocanon init [PATH]

Create a local RepoCanon config file at .repocanon/config.toml.

Configuration

RepoCanon stores project config in:

.repocanon/config.toml

Example:

[project]
name = "my-repo"

[scan]
include = ["src/**", "app/**", "packages/**"]
exclude = ["node_modules/**", ".next/**", "dist/**", "build/**"]

[generate]
targets = ["agents", "claude", "copilot", "cursor"]
safe_overwrite = true

Architecture overview

repocanon/
├── analyzer/    # deterministic repo scanning + inference
├── models/      # Pydantic v2 project model
├── generators/  # one module per AI target
├── output/      # writers, preview, diff
├── report/      # audit + summary tables
└── cli.py       # Typer entry point

The analyzer is a straight pipeline: file inventory → manifest parsing → framework/package-manager detection → command extraction → topology + conventions → final ProjectModel. Generators only consume that model — they never touch the filesystem.
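That pipeline shape can be sketched as a chain of pure functions feeding a simple model. Field and stage bodies here are illustrative stand-ins, not RepoCanon's real implementation:

```python
from dataclasses import dataclass, field

@dataclass
class ProjectModel:
    # Hypothetical fields; the real Pydantic v2 model carries much more.
    languages: list[str] = field(default_factory=list)
    commands: dict[str, str] = field(default_factory=dict)

def detect_languages(files: list[str]) -> list[str]:
    exts = {".py": "Python", ".ts": "TypeScript", ".go": "Go"}
    return sorted({lang for f in files
                   for ext, lang in exts.items() if f.endswith(ext)})

def extract_commands(manifests: dict[str, dict]) -> dict[str, str]:
    # e.g. package.json "scripts" become runnable commands
    scripts = manifests.get("package.json", {}).get("scripts", {})
    return {name: f"npm run {name}" for name in scripts}

def analyze(files: list[str], manifests: dict[str, dict]) -> ProjectModel:
    # inventory -> manifest parsing -> detection -> commands -> model
    return ProjectModel(detect_languages(files), extract_commands(manifests))
```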

Limitations

RepoCanon is inference-based. It can detect a lot, but not everything.

It may be less accurate when:

  • the repo is highly unconventional
  • conventions are implicit rather than visible in files
  • commands live outside standard manifests
  • architecture is unclear from structure alone

When confidence is low, RepoCanon says so rather than inventing detail.

Roadmap

  • more framework detectors (Django, Rails, .NET, Spring, etc.)
  • stronger monorepo inference (Bazel, Pants, Nx graph)
  • better path-scoped output generation
  • safer merge/update behavior for edited generated files
  • optional LLM-assisted summarization (off by default)
  • additional target formats

Why not just write these files manually?

You can. But in practice:

  • they drift out of date
  • they are inconsistent across tools
  • they are often generic
  • they rarely reflect the actual repo structure

RepoCanon keeps those files grounded in the codebase.

How RepoCanon maps one repo model to multiple AI coding tools

RepoCanon is intentionally a many-to-one-to-many pipeline:

repo files ─┐                              ┌─► AGENTS.md            (Codex)
            ├─► analyzer ─► ProjectModel ──┼─► CLAUDE.md            (Claude Code)
manifests  ─┘                              ├─► copilot-instructions (Copilot)
                                           └─► .cursor/rules/*.mdc  (Cursor)

The analyzer collapses everything it sees into a single normalized ProjectModel (Pydantic v2). That model is the only thing target generators read; they never touch the filesystem. This gives RepoCanon two important properties:

  1. One source of truth. Languages, frameworks, commands, conventions, anti-patterns, and architecture boundaries all live in one place. Adding a new target means writing a new generator that consumes the same model — not re-implementing detection.
  2. Idiomatic outputs per tool. Each generator picks the parts of the model that make sense for its target and renders them in that tool's idiom: a verbose AGENTS.md for Codex, a terse CLAUDE.md for Claude Code, a repo-wide instructions file (plus optional path-scoped ones) for Copilot, and a small set of focused .mdc rule files for Cursor.

The same model also powers audit, diff, and preview, so you can verify what RepoCanon inferred before any file is written.
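The one-model, many-generators shape reduces to something like this, where `ContextModel` and the renderers are illustrative stand-ins for the real ProjectModel and generator modules:

```python
from dataclasses import dataclass

@dataclass
class ContextModel:
    name: str
    commands: dict[str, str]

def render_agents(m: ContextModel) -> str:
    """Verbose, Markdown-heavy output in the style of an AGENTS.md."""
    lines = [f"# {m.name}", "", "## Commands"]
    lines += [f"- `{cmd}` runs {what}" for cmd, what in m.commands.items()]
    return "\n".join(lines)

def render_claude(m: ContextModel) -> str:
    """Terser output in the style of a CLAUDE.md."""
    return f"{m.name} commands: " + "; ".join(m.commands.values())

model = ContextModel("my-repo", {"test": "pytest", "lint": "ruff check ."})
```

Adding a Cursor or Copilot renderer is then just another function over the same `model`; detection never has to be repeated per target.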

Contributing

Contributions are welcome. See CONTRIBUTING.md for local setup, tests, and development workflow.

License

MIT
