The ultimate RAG for your monorepo. Query, understand, and edit multi-language codebases with the power of AI and knowledge graphs.

Code-Graph-RAG

A graph-based RAG system that parses multi-language codebases with Tree-sitter, builds knowledge graphs in Memgraph, and enables natural language querying, editing, and optimization.

Install

pip install code-graph-rag

With all Tree-sitter grammars (Python, JS, TS, Rust, Go, Java, Scala, C++, Lua):

pip install 'code-graph-rag[treesitter-full]'

With semantic code search (UniXcoder embeddings):

pip install 'code-graph-rag[semantic]'

Prerequisites

  • Python 3.12+
  • Docker (for Memgraph)
  • cmake (for building pymgclient)
  • ripgrep (rg) (for text search via shell commands)
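
The prerequisites above can be checked from Python before installing. This is a minimal sketch using only the standard library. It assumes the tools are exposed on PATH as docker, cmake, and rg, and the helper name check_prerequisites is hypothetical (it is not part of the package). The cgr doctor command described below is the supported way to verify a setup.

```python
import shutil
import sys

# Executables named in the prerequisites list (assumed to be on PATH under these names).
REQUIRED_TOOLS = ("docker", "cmake", "rg")
MIN_PYTHON = (3, 12)


def check_prerequisites() -> dict[str, bool]:
    """Return a mapping of prerequisite name to whether it appears to be available."""
    status = {"python>=3.12": sys.version_info >= MIN_PYTHON}
    for tool in REQUIRED_TOOLS:
        # shutil.which returns None when the executable is not found on PATH.
        status[tool] = shutil.which(tool) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'ok     ' if ok else 'MISSING'}  {name}")
```

The check only confirms that each executable can be found. It does not verify versions of Docker, cmake, or ripgrep.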

CLI Quick Start

The package installs a cgr command.

Start Memgraph, parse a repo, and query it:

docker compose up -d                       # start Memgraph
cgr start --repo-path ./my-project \
          --update-graph --clean           # parse & launch interactive chat

Index to protobuf for offline use:

cgr index -o ./index-output --repo-path ./my-project

Export knowledge graph to JSON:

cgr export -o graph.json

AI-guided optimization:

cgr optimize python --repo-path ./my-project

Run as an MCP server (for Claude Code):

cgr mcp-server

Check your setup:

cgr doctor

Python SDK

The cgr package provides short imports for programmatic use.

Load and query an exported graph

from cgr import load_graph

graph = load_graph("graph.json")
print(graph.summary())

functions = graph.find_nodes_by_label("Function")
for fn in functions[:5]:
    rels = graph.get_relationships_for_node(fn.node_id)
    print(f"{fn.properties['name']}: {len(rels)} relationships")
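
The same two calls can be combined to rank nodes by how connected they are. The sketch below relies only on find_nodes_by_label, get_relationships_for_node, node_id, and properties as used above. StubNode, StubGraph, and most_connected are hypothetical stand-ins so the example runs without an exported graph. With a real export, pass the object returned by load_graph instead.

```python
from dataclasses import dataclass, field


@dataclass
class StubNode:
    """Hypothetical stand-in exposing the two attributes used in the example above."""
    node_id: int
    properties: dict = field(default_factory=dict)


class StubGraph:
    """Hypothetical stand-in for the object returned by load_graph."""

    def __init__(self, nodes, relationships):
        self._nodes = nodes                  # list of (label, StubNode)
        self._relationships = relationships  # list of (source_id, target_id)

    def find_nodes_by_label(self, label):
        return [node for node_label, node in self._nodes if node_label == label]

    def get_relationships_for_node(self, node_id):
        return [rel for rel in self._relationships if node_id in rel]


def most_connected(graph, label="Function", top=5):
    """Return (name, relationship_count) pairs, most connected first."""
    counts = [
        (node.properties.get("name", "<unnamed>"),
         len(graph.get_relationships_for_node(node.node_id)))
        for node in graph.find_nodes_by_label(label)
    ]
    return sorted(counts, key=lambda pair: pair[1], reverse=True)[:top]


graph = StubGraph(
    nodes=[("Function", StubNode(1, {"name": "parse"})),
           ("Function", StubNode(2, {"name": "emit"})),
           ("Class", StubNode(3, {"name": "Parser"}))],
    relationships=[(1, 2), (3, 1), (3, 2), (1, 3)],
)
print(most_connected(graph))  # [('parse', 3), ('emit', 2)]
```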

Query Memgraph with Cypher

from cgr import MemgraphIngestor

with MemgraphIngestor(host="localhost", port=7687) as db:
    rows = db.fetch_all("MATCH (f:Function) RETURN f.name LIMIT 10")
    for row in rows:
        print(row)
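
When the label or limit comes from user input, it is prudent to validate both before interpolating them into the query text. The helper below is a sketch, not part of the package: names_query is a hypothetical name, and it assumes the queried label has a name property, as Function does in the example above. The returned string can be passed to db.fetch_all.

```python
import re

# Accept only plain identifiers so a label cannot smuggle in extra Cypher.
_LABEL_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def names_query(label: str, limit: int = 10) -> str:
    """Build a 'list names' query for one node label (hypothetical helper)."""
    if not _LABEL_RE.fullmatch(label):
        raise ValueError(f"unsafe label: {label!r}")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
        raise ValueError(f"limit must be a positive integer, got {limit!r}")
    return f"MATCH (n:{label}) RETURN n.name LIMIT {limit}"


print(names_query("Function"))  # MATCH (n:Function) RETURN n.name LIMIT 10
```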

Generate Cypher from natural language

import asyncio
from cgr import CypherGenerator

async def main():
    gen = CypherGenerator()
    cypher = await gen.generate("Find all classes that inherit from BaseModel")
    print(cypher)

asyncio.run(main())
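
Generated Cypher is model output, so it may be worth screening it before running it against the database. The sketch below is an illustrative guard, not a feature of the package: is_read_only is a hypothetical helper that rejects any query containing a common write clause. It is deliberately conservative and will also reject a read query that mentions one of these keywords inside a string literal.

```python
import re

# Common Cypher clauses that modify the graph or its schema.
WRITE_CLAUSES = ("CREATE", "MERGE", "DELETE", "SET", "REMOVE", "DROP")
_WRITE_RE = re.compile(r"\b(" + "|".join(WRITE_CLAUSES) + r")\b", re.IGNORECASE)


def is_read_only(cypher: str) -> bool:
    """Return True when no write clause appears as a whole word in the query."""
    return _WRITE_RE.search(cypher) is None


print(is_read_only("MATCH (c:Class) RETURN c.name"))  # True
print(is_read_only("MATCH (n) DETACH DELETE n"))      # False
```

A caller could check is_read_only(cypher) on the string returned by gen.generate and only then hand it to db.fetch_all.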

Semantic code search

Requires the semantic extra (pip install 'code-graph-rag[semantic]').

from cgr import embed_code

embedding = embed_code("def authenticate(user, password): ...")
print(f"Embedding dimension: {len(embedding)}")
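
Embeddings are typically compared with cosine similarity. The function below is a standard-library sketch that assumes embed_code returns a flat sequence of floats (the example above only shows that len() works on the result). cosine_similarity is a hypothetical helper, not part of the package.

```python
import math
from collections.abc import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

With the semantic extra installed, the two arguments would be the results of embed_code for a query snippet and a candidate snippet.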

Configuration

from cgr import settings

settings.set_orchestrator("openai", "gpt-4o", api_key="sk-...")
settings.set_cypher("google", "gemini-2.5-flash", api_key="your-key")

Environment Variables

Configure via .env or environment variables:

| Variable | Default | Description |
|---|---|---|
| MEMGRAPH_HOST | localhost | Memgraph hostname |
| MEMGRAPH_PORT | 7687 | Memgraph port |
| ORCHESTRATOR_PROVIDER | | Provider: google, openai, ollama |
| ORCHESTRATOR_MODEL | | Model ID (e.g. gpt-4o, gemini-2.5-pro) |
| ORCHESTRATOR_API_KEY | | API key for the provider (not needed for ollama) |
| CYPHER_PROVIDER | | Provider for Cypher generation |
| CYPHER_MODEL | | Model ID for Cypher generation (e.g. codellama, gpt-4o-mini) |
| CYPHER_API_KEY | | API key for Cypher provider (not needed for ollama) |
| TARGET_REPO_PATH | . | Default repository path |
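
To see which values a process would pick up, the variables above can be resolved against the documented defaults. This sketch uses only os.environ. It does not load a .env file and is not how the package itself reads configuration. read_settings is a hypothetical helper.

```python
import os

# Defaults taken from the table above; the remaining variables have no documented default.
DEFAULTS = {
    "MEMGRAPH_HOST": "localhost",
    "MEMGRAPH_PORT": "7687",
    "TARGET_REPO_PATH": ".",
}
NO_DEFAULT = (
    "ORCHESTRATOR_PROVIDER", "ORCHESTRATOR_MODEL", "ORCHESTRATOR_API_KEY",
    "CYPHER_PROVIDER", "CYPHER_MODEL", "CYPHER_API_KEY",
)


def read_settings(env=None) -> dict:
    """Resolve the documented variables from a mapping (os.environ when omitted)."""
    env = os.environ if env is None else env
    resolved = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    for name in NO_DEFAULT:
        resolved[name] = env.get(name)  # None when the variable is unset
    resolved["MEMGRAPH_PORT"] = int(resolved["MEMGRAPH_PORT"])
    return resolved


print(read_settings({"CYPHER_PROVIDER": "ollama"})["MEMGRAPH_PORT"])  # 7687
```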

Documentation

Full documentation, architecture details, and contribution guide: docs.code-graph-rag.com

License

MIT
