
A node parser which can create a hierarchy of all code scopes in a directory.


CodeHierarchyAgentPack

# install
pip install llama-index-packs-code-hierarchy

# download source code
llamaindex-cli download-llamapack CodeHierarchyAgentPack -d ./code_hierarchy_pack

The CodeHierarchyAgentPack splits long code files into more manageable chunks and builds an agent on top of them to navigate the code. It creates a "hierarchy" of sorts: each scope's body is replaced with a short comment telling the LLM to search for the referenced node if it wants to read that body.
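As a rough illustration of that replacement step (not the pack's actual code), the sketch below uses Python's standard ast module to swap each top-level function body for a one-line comment pointing at an ID under which the full function is stored. The skeletonize name, the comment wording and the short random IDs are assumptions. It ignores nested scopes, methods, and one-line functions such as def f(): return 1.

```python
import ast
import uuid
from typing import Dict, Tuple


def skeletonize(source: str) -> Tuple[str, Dict[str, str]]:
    """Replace each top-level function body with a comment pointing at a node ID.

    Hypothetical helper. Returns the skeleton text and a mapping from node ID
    to the full original function text, so a caller can look the body up later.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    bodies: Dict[str, str] = {}
    funcs = [n for n in tree.body if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))]
    # Work bottom-up so earlier line numbers stay valid after each replacement.
    for func in sorted(funcs, key=lambda n: n.lineno, reverse=True):
        first = func.body[0].lineno - 1  # 0-based index of the first body line
        last = func.body[-1].end_lineno  # 1-based last line == 0-based exclusive end
        indent = lines[first][: len(lines[first]) - len(lines[first].lstrip())]
        node_id = uuid.uuid4().hex[:8]
        bodies[node_id] = "\n".join(lines[func.lineno - 1 : last])
        lines[first:last] = [f"{indent}# Code replaced for brevity. See node_id {node_id}"]
    return "\n".join(lines), bodies


source = '''def add(a, b):
    total = a + b
    return total


x = add(1, 2)
'''
skeleton, bodies = skeletonize(source)
print(skeleton)
```

The printed skeleton keeps the def add(a, b): line and the x = add(1, 2) line, with the two body lines collapsed into one indented comment; bodies maps that comment's ID back to the full function text.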

Nodes in this hierarchy are split by scope (function, class, or method) and carry links to their parents and children, so the LLM can traverse the tree.
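A minimal, self-contained sketch of such parent and child links follows. ScopeNode, add_child and path_to_root are hypothetical names, not the pack's API; the sketch only shows how storing both directions lets you walk up from a method to its enclosing class and module, or down to a class's members.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ScopeNode:
    """Hypothetical stand-in for a scope node: an ID, a name, and tree links."""

    node_id: str
    name: str
    parent_id: Optional[str] = None
    child_ids: List[str] = field(default_factory=list)


def add_child(nodes: Dict[str, ScopeNode], parent: ScopeNode, child: ScopeNode) -> None:
    """Register `child` and link it to `parent` in both directions."""
    child.parent_id = parent.node_id
    parent.child_ids.append(child.node_id)
    nodes[child.node_id] = child


def path_to_root(nodes: Dict[str, ScopeNode], node_id: str) -> List[str]:
    """Follow parent links upward and return scope names, outermost first."""
    names: List[str] = []
    current: Optional[str] = node_id
    while current is not None:
        node = nodes[current]
        names.append(node.name)
        current = node.parent_id
    return list(reversed(names))


# Build a tiny three-level hierarchy: module -> class -> method.
nodes = {}
module = ScopeNode("n0", "code_hierarchy")
nodes[module.node_id] = module
parser_cls = ScopeNode("n1", "CodeHierarchyNodeParser")
add_child(nodes, module, parser_cls)
method = ScopeNode("n2", "_get_node_name")
add_child(nodes, parser_cls, method)

print(path_to_root(nodes, "n2"))  # ['code_hierarchy', 'CodeHierarchyNodeParser', '_get_node_name']
print([nodes[c].name for c in parser_cls.child_ids])  # ['_get_node_name']
```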

from pathlib import Path

from llama_index.core import SimpleDirectoryReader
from llama_index.core.text_splitter import CodeSplitter
from llama_index.llms.openai import OpenAI
from llama_index.packs.code_hierarchy import (
    CodeHierarchyAgentPack,
    CodeHierarchyNodeParser,
)

llm = OpenAI(model="gpt-4", temperature=0.2)

documents = SimpleDirectoryReader(
    input_files=[
        Path("../llama_index/packs/code_hierarchy/code_hierarchy.py")
    ],
    # Store each file's path in the document metadata
    file_metadata=lambda x: {"filepath": x},
).load_data()

split_nodes = CodeHierarchyNodeParser(
    language="python",
    # You can further parameterize the CodeSplitter to split the code
    # into "chunks" that match your context window size using the
    # chunk_lines and max_chars parameters; here they are set explicitly
    # to small values instead of using the defaults
    code_splitter=CodeSplitter(
        language="python", max_chars=1000, chunk_lines=10
    ),
).get_nodes_from_documents(documents)

# Build the agent on top of the scope-based nodes
pack = CodeHierarchyAgentPack(split_nodes=split_nodes, llm=llm)

pack.run(
    "How does the get_code_hierarchy_from_nodes function from the code hierarchy node parser work? Provide specific implementation details."
)

A full example can be found here.

Repo Maps

The pack contains a CodeHierarchyKeywordQueryEngine that uses a CodeHierarchyNodeParser to generate a map of a repository's structure and contents. This map helps the LLM understand how a codebase is organized and reference specific files or directories.

For example:

  • code_hierarchy
    • _SignatureCaptureType
    • _SignatureCaptureOptions
    • _ScopeMethod
    • _CommentOptions
    • _ScopeItem
    • _ChunkNodeOutput
    • CodeHierarchyNodeParser
      • class_name
      • __init__
      • _get_node_name
        • recur
      • _get_node_signature
        • find_start
        • find_end
      • _chunk_node
      • get_code_hierarchy_from_nodes
        • get_subdict
        • recur_inclusive_scope
        • dict_to_markdown
      • _parse_nodes
      • _get_indentation
      • _get_comment_text
      • _create_comment_line
      • _get_replacement_text
      • _skeletonize
      • _skeletonize_list
        • recur
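The map above is a nested structure of scope names. The tree lists a dict_to_markdown helper inside get_code_hierarchy_from_nodes; as a rough sketch of how a nested mapping could be turned into such an indented list (the render_map function and the {name: children} dict layout are assumptions, not that helper's actual code):

```python
from typing import Dict


def render_map(tree: Dict[str, dict], depth: int = 0) -> str:
    """Render a nested {scope_name: children} dict as an indented bullet list."""
    lines = []
    for name, children in tree.items():
        # Two spaces of indentation per nesting level, like the map above.
        lines.append("  " * depth + f"- {name}")
        if children:
            lines.append(render_map(children, depth + 1))
    return "\n".join(lines)


example = {
    "code_hierarchy": {
        "CodeHierarchyNodeParser": {
            "_get_node_name": {"recur": {}},
            "_chunk_node": {},
        },
    },
}
print(render_map(example))
```

This prints code_hierarchy at the top level, CodeHierarchyNodeParser one level in, and its two methods (with recur nested under _get_node_name) below that.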

Usage as a Tool with an Agent

You can create a tool for any agent using the nodes from the node parser:

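import asyncio

from llama_index.core.agent.workflow import FunctionAgent
from llama_index.core.tools import QueryEngineTool
from llama_index.llms.openai import OpenAI
from llama_index.packs.code_hierarchy import CodeHierarchyKeywordQueryEngine

# split_nodes is the output of CodeHierarchyNodeParser.get_nodes_from_documents
query_engine = CodeHierarchyKeywordQueryEngine(
    nodes=split_nodes,
)

tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="code_lookup",
    description="Useful for looking up information about the code hierarchy codebase.",
)

agent = FunctionAgent(
    tools=[tool],
    # The query engine supplies instructions on how to use the tool
    system_prompt=query_engine.get_tool_instructions(),
    llm=OpenAI(model="gpt-4.1"),
)


async def main():
    # FunctionAgent.run is asynchronous, so await it for the final response
    response = await agent.run(
        user_msg="How does get_code_hierarchy_from_nodes work?"
    )
    print(response)


asyncio.run(main())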

Adding new languages

To add a new language, you need to edit _DEFAULT_SIGNATURE_IDENTIFIERS in code_hierarchy.py.

The docstrings explain how to do this and its nuances; the approach should work for most languages.
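The exact fields of these structures are documented in code_hierarchy.py. Purely to illustrate the idea, the sketch below uses simplified, hypothetical stand-ins (SignatureCaptureType, SignatureCaptureOptions, SIGNATURE_IDENTIFIERS, extract_signature); their field names and the meaning given to inclusive are assumptions, so consult the real docstrings before editing _DEFAULT_SIGNATURE_IDENTIFIERS. It shows a table keyed by language and tree-sitter node type, and how one entry could be used to cut a scope's signature off before its body.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Any, Dict, List


@dataclass
class SignatureCaptureType:
    """Hypothetical, simplified mirror of _SignatureCaptureType."""

    type: str        # tree-sitter node type that marks where the signature ends
    inclusive: bool  # assumed meaning: whether that node's own text is kept


@dataclass
class SignatureCaptureOptions:
    """Hypothetical, simplified mirror of _SignatureCaptureOptions."""

    end_signature_types: List[SignatureCaptureType] = field(default_factory=list)
    name_identifier: str = "identifier"


# Hypothetical table in the spirit of _DEFAULT_SIGNATURE_IDENTIFIERS:
# language -> scope node type -> how to capture that scope's signature.
SIGNATURE_IDENTIFIERS: Dict[str, Dict[str, SignatureCaptureOptions]] = {
    "python": {
        "function_definition": SignatureCaptureOptions(
            end_signature_types=[SignatureCaptureType("block", inclusive=False)],
        ),
    },
}


def extract_signature(source: bytes, node: Any, options: SignatureCaptureOptions) -> str:
    """Return the text from the start of `node` up to its first end-signature child.

    `node` only needs tree-sitter-style attributes: type, start_byte,
    end_byte and children.
    """
    for child in node.children:
        for capture in options.end_signature_types:
            if child.type == capture.type:
                end = child.end_byte if capture.inclusive else child.start_byte
                return source[node.start_byte:end].decode().strip()
    # No bounding child found: fall back to the node's full text.
    return source[node.start_byte:node.end_byte].decode().strip()


# Fake nodes standing in for a real tree-sitter parse of a small function.
src = b"def add(a, b):\n    return a + b\n"
body_start = src.index(b"return")
block = SimpleNamespace(type="block", start_byte=body_start, end_byte=len(src) - 1, children=[])
func = SimpleNamespace(type="function_definition", start_byte=0, end_byte=len(src) - 1, children=[block])
options = SIGNATURE_IDENTIFIERS["python"]["function_definition"]
print(extract_signature(src, func, options))  # def add(a, b):
```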

Please test your new language by adding a new file to tests/file/code/ and testing all your edge cases.

People often ask, "How do I find the node types I need for a new language?" The best way is to use breakpoints. The code_hierarchy.py file contains a comment reading "TIP: This is a wonderful place to put a debug breakpoint". Put a breakpoint there, input some code in the desired language, and step through it to find the name of the node you want to capture.
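Alongside the breakpoint approach, you can also print the node types of a parsed snippet directly. This is a minimal sketch assuming tree-sitter nodes, which expose .type and .children; dump_node_types is a hypothetical helper, and the fake tree below is a simplified shape loosely modeled on the tree-sitter Python grammar, for illustration only. With a real parser you would pass the root node instead, e.g. parser.parse(source_bytes).root_node.

```python
from types import SimpleNamespace
from typing import Any, List


def dump_node_types(node: Any, depth: int = 0, max_depth: int = 6) -> List[str]:
    """Collect the node types of a syntax tree as indented lines.

    Assumes tree-sitter-style nodes that expose `.type` (a string) and
    `.children` (a list of nodes). Recursion stops below `max_depth`.
    """
    lines = ["  " * depth + node.type]
    if depth < max_depth:
        for child in node.children:
            lines.extend(dump_node_types(child, depth + 1, max_depth))
    return lines


def leaf(node_type: str) -> SimpleNamespace:
    """Build a childless fake node."""
    return SimpleNamespace(type=node_type, children=[])


# Simplified fake tree for "def add(a, b): ..." (not a real parser output).
tree = SimpleNamespace(
    type="module",
    children=[
        SimpleNamespace(
            type="function_definition",
            children=[leaf("def"), leaf("identifier"), leaf("parameters"), leaf(":"), leaf("block")],
        )
    ],
)
print("\n".join(dump_node_types(tree)))
```

The output names function_definition and block, which are the kind of node types you would look for when deciding what to capture.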

The code as it is should handle any language which:

  1. expects you to indent deeper scopes
  2. has a way to comment, either full line or between delimiters
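One plausible reading of why these two requirements matter: replacing a scope body with a comment needs to know how deeply that body is indented and how to write a comment in the language. The tree above lists _CommentOptions, _get_indentation and _create_comment_line; the sketch below only illustrates the idea with hypothetical names (CommentStyle, COMMENT_STYLES, get_indentation, create_comment_line) and is not their actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CommentStyle:
    """Hypothetical description of how a language writes a one-line comment."""

    prefix: str                   # e.g. "#", "//" or an opening delimiter like "<!--"
    suffix: Optional[str] = None  # closing delimiter for delimited comments, e.g. "-->"


COMMENT_STYLES = {
    "python": CommentStyle("#"),
    "typescript": CommentStyle("//"),
    "html": CommentStyle("<!--", "-->"),
}


def get_indentation(line: str) -> str:
    """Return a line's leading whitespace (requirement 1: deeper scopes are indented)."""
    return line[: len(line) - len(line.lstrip())]


def create_comment_line(text: str, indentation: str, style: CommentStyle) -> str:
    """Build one comment line at the given indentation (requirement 2)."""
    comment = f"{style.prefix} {text}"
    if style.suffix is not None:
        comment += f" {style.suffix}"
    return indentation + comment


indent = get_indentation("    <div>")
print(create_comment_line("See node_id abc123", indent, COMMENT_STYLES["html"]))
# prints "    <!-- See node_id abc123 -->"
```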

Future

I'm considering adding all the languages from aider by incorporating .scm files in place of _SignatureCaptureType, _SignatureCaptureOptions, and _DEFAULT_SIGNATURE_IDENTIFIERS.

Contributing

You will need to set OPENAI_API_KEY in your environment to run the notebook or test the pack.

You can run the tests with pytest tests from the root directory of this pack.

Download files


Source Distribution

llama_index_packs_code_hierarchy-0.6.1.tar.gz (17.3 kB)


Built Distribution

llama_index_packs_code_hierarchy-0.6.1-py3-none-any.whl

File details

Details for the file llama_index_packs_code_hierarchy-0.6.1.tar.gz.


File hashes

Hashes for llama_index_packs_code_hierarchy-0.6.1.tar.gz
Algorithm Hash digest
SHA256 1da205fa485a55e4d1dc8b335d6c88ba3dda12eba326add60f13d0a2f2b3f64d
MD5 722dced7cbe2456fe1e08e8241ab8668
BLAKE2b-256 d334af3c263edfeb4ea847c3bf87d2d8c8ed37f62e9ff8ee47b1327bbfaa2025


File details

Details for the file llama_index_packs_code_hierarchy-0.6.1-py3-none-any.whl.


File hashes

Hashes for llama_index_packs_code_hierarchy-0.6.1-py3-none-any.whl
Algorithm Hash digest
SHA256 e80e2bfb4fea12a82e715eaa149b15e98b1974e3927d160b42293bbd17d0936f
MD5 3164b683ae43f1c9e93591a1a18f1282
BLAKE2b-256 a0872e3670b44853a916dc60f5f982b9ec0ff0c5c24636cec45945f431adb324

