Llama-github is an open-source Python library that empowers LLM Chatbots, AI Agents, and Auto-dev Agents to perform retrieval from actively selected public GitHub projects. It uses LLMs to augment queries and generate context for any coding question, streamlining the development of sophisticated AI-driven applications.

Project description

llama-github

Llama-github is a powerful tool that helps you retrieve the most relevant code snippets, issues, and repository information from GitHub based on your queries, transforming them into valuable knowledge context. It empowers LLM Chatbots, AI Agents, and Auto-dev Agents to solve complex coding tasks. Whether you're a developer looking for quick solutions or an engineer implementing advanced Auto Dev AI Agents, llama-github makes it easy and efficient.

If you like this project or believe it has potential, please give it a ⭐️. Your support is our greatest motivation!

Architecture

High Level Architecture

Installation

pip install llama-github

Usage

Here's a simple example of how to use llama-github:

from llama_github import GithubRAG

# Initialize GithubRAG with your credentials
github_rag = GithubRAG(
    github_access_token="your_github_access_token",
    openai_api_key="your_openai_api_key",  # Optional in Simple Mode
    jina_api_key="your_jina_api_key"  # Optional unless you need a high-concurrency production deployment (llama-github will use the s.jina.ai API)
)

# Retrieve context for a coding question (simple_mode defaults to False)
query = "How to create a NumPy array in Python?"
context = github_rag.retrieve_context(
    query,  # In professional mode, a single query takes roughly a minute to generate the final contexts; set the log level to INFO to monitor retrieval progress
    # simple_mode=True
)

print(context)

For more advanced usage and examples, please refer to the documentation.

Key Features

  • 🔍 Intelligent GitHub Retrieval: Harness the power of llama-github to retrieve highly relevant code snippets, issues, and repository information from GitHub based on user queries. Our advanced retrieval techniques ensure you find the most pertinent information quickly and efficiently.

  • ⚡ Repository Pool Caching: Llama-github has an innovative repository pool caching mechanism. By caching repositories (including READMEs, structures, code, and issues) across threads, llama-github significantly accelerates GitHub search retrieval efficiency and minimizes the consumption of GitHub API tokens. Deploy llama-github in multi-threaded production environments with confidence, knowing that it will perform optimally and save you valuable resources.

  • 🧠 LLM-Powered Question Analysis: Leverage state-of-the-art language models to analyze user questions and generate highly effective search strategies and criteria. Llama-github intelligently breaks down complex queries, ensuring that you retrieve the most relevant information from GitHub's vast repository network.

  • 📚 Comprehensive Context Generation: Generate rich, contextually relevant answers by seamlessly combining information retrieved from GitHub with the reasoning capabilities of advanced language models. Llama-github excels at handling even the most complex and lengthy questions, providing comprehensive and insightful responses that include extensive context to support your development needs.

  • 🚀 Asynchronous Processing Excellence: Llama-github is built from the ground up to leverage the full potential of asynchronous programming. With meticulously implemented asynchronous mechanisms woven throughout the codebase, llama-github can handle multiple requests concurrently, significantly boosting overall performance. Experience the difference as llama-github efficiently manages high-volume workloads without compromising on speed or quality.

  • 🔧 Flexible LLM Integration: Easily integrate llama-github with various LLM providers, embedding models, and reranking models to tailor the library's capabilities to your specific requirements. Our extensible architecture allows you to customize and enhance llama-github's functionality, ensuring that it adapts seamlessly to your unique development environment.

  • 🔒 Robust Authentication Options: Llama-github supports both personal access tokens and GitHub App authentication, providing you with the flexibility to integrate it into different development setups. Whether you're an individual developer or working within an organizational context, llama-github has you covered with secure and reliable authentication mechanisms.

  • 🛠️ Logging and Error Handling: We understand the importance of smooth operations and easy troubleshooting. That's why llama-github comes equipped with comprehensive logging and error handling mechanisms. Gain deep insights into the library's behavior, quickly diagnose issues, and maintain a stable and reliable development workflow.
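The repository pool caching idea above can be illustrated with a minimal thread-safe sketch. The names `RepositoryPool` and the fetch callback are hypothetical, not llama-github's actual API; the real pool also caches READMEs, structures, code, and issues, and manages GitHub API tokens:

```python
import threading

class RepositoryPool:
    """Minimal thread-safe cache keyed by repository full name (sketch only)."""

    def __init__(self, fetch_fn):
        self._fetch_fn = fetch_fn      # called only on a cache miss
        self._cache = {}
        self._lock = threading.Lock()

    def get(self, full_name):
        with self._lock:
            if full_name not in self._cache:
                # Only the first thread asking for a repository pays the
                # cost of a GitHub API round-trip; later threads reuse it.
                self._cache[full_name] = self._fetch_fn(full_name)
            return self._cache[full_name]

calls = []
pool = RepositoryPool(lambda name: calls.append(name) or {"name": name})
pool.get("octocat/Hello-World")
pool.get("octocat/Hello-World")  # served from cache; no second fetch
```

Sharing one pool across threads is what lets repeated queries over the same repositories avoid redundant API calls and token consumption.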

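The asynchronous design can be sketched with plain asyncio; `answer` here is a stand-in for llama-github's own coroutines, not part of its API:

```python
import asyncio

async def answer(query: str) -> str:
    # Stand-in for an async retrieval step (e.g. a GitHub API call).
    await asyncio.sleep(0.01)
    return f"context for: {query}"

async def main() -> list:
    queries = ["numpy arrays", "pandas merge", "asyncio locks"]
    # gather() runs all retrievals concurrently instead of serially,
    # which is how an async design absorbs high-volume workloads.
    return await asyncio.gather(*(answer(q) for q in queries))

results = asyncio.run(main())
```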
🤖 Try Our AI-Powered PR Review Assistant: LlamaPReview

If you find llama-github useful, you might also be interested in our AI-powered GitHub PR review assistant, LlamaPReview. It's designed to complement your development workflow and further enhance code quality.

Key Features of LlamaPReview:

  • 🚀 One-click installation, zero configuration required, fully auto-run
  • 💯 Currently free to use - no credit card or payment info needed
  • 🧠 AI-powered, automatic PR reviews with deep code understanding
  • 🌐 Supports multiple programming languages

LlamaPReview utilizes llama-github's advanced context retrieval and LLM-powered analysis to provide intelligent, context-aware code reviews. It's like having a senior developer, armed with the full context of your repository, review every PR automatically!

👉 Install LlamaPReview Now (Free)

By using llama-github for context retrieval and LlamaPReview for code reviews, you can create a powerful, AI-enhanced development environment.

Vision and Roadmap

Vision

Our vision is to become a pivotal module in the future of AI-driven development solutions, seamlessly integrating with GitHub to empower LLMs in automatically resolving complex coding tasks.

Vision Architecture

Roadmap

For a detailed view of our project roadmap, please visit our Project Roadmap.

Acknowledgments

We would like to express our gratitude to the following open-source projects for their support and contributions:

  • LangChain: For providing the foundational framework that empowers the LLM prompting and processing capabilities in llama-github.
  • Jina.ai: For offering s.jina.ai API and open source reranker and embedding models that enhance the accuracy and relevance of the generated contexts in llama-github.

Their contributions have been instrumental in the development of llama-github, and we highly recommend checking out their projects for more innovative solutions.

Contributing

We welcome contributions to llama-github! Please see our contributing guidelines for more information.

License

This project is licensed under the terms of the Apache 2.0 license. See the LICENSE file for more details.

Contact

If you have any questions, suggestions, or feedback, please feel free to reach out to us at Jet Xu's email.


Thank you for choosing llama-github! We hope this library enhances your AI development experience and helps you build powerful applications with ease.

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[0.2.1] - 2024-11-16

Optimized

  • Appropriately handle more file types when calculating file changes in PRs

[0.2.0] - 2024-11-16

Optimized

  • Fixed bugs when generating a repository from the pool using Github_install_id

[0.1.9] - 2024-11-04

Optimized

  • Fixed bugs in get_pr_content

[0.1.8] - 2024-11-03

Optimized

  • Fixed bugs in the get_pr_content file-diff calculation logic

[0.1.7] - 2024-10-31

Optimized

  • Fixed bugs in get_pr_content

[0.1.6] - 2024-10-30

New Features

  • Enhanced PR content analysis with detailed commit information extraction
  • Improved issue linking detection with support for multiple reference formats
    • Full GitHub URLs, #references, and keyword-based references
    • Added validation for issue numbers

Improvements

  • Added detailed commit metadata extraction including stats and file changes
  • Enhanced error handling for commit fetching

[0.1.5] - 2024-10-14

Optimized

  • Updated requirements.txt to a more precise dependency list

[0.1.4] - 2024-10-14

Improved

  • Optimized simple_mode:
    • Removed dependencies on Torch and Transformers libraries
    • Reduced memory footprint
    • Eliminated related imports
    • Enhanced compatibility with AWS Lambda environment

[0.1.3] - 2024-10-14

Added

  • Modified LLMManager class to skip loading embedding and reranker models when simple_mode is enabled
  • Updated retrieve_context method to use instance's simple_mode by default, with option to override

Improved

  • Faster initialization process when simple_mode is enabled, skipping embedding and reranker model loading
  • More flexible usage of simple_mode in retrieve_context, allowing per-call customization

Developer Notes

  • When using simple_mode=True during GithubRAG initialization, be aware that embedding and reranking functionalities will not be available
  • The retrieve_context method now uses a late binding approach for simple_mode parameter
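The late-binding behaviour described in these notes can be sketched as follows. This is an illustrative simplification, not llama-github's actual implementation; only the `simple_mode` and `retrieve_context` names come from the library:

```python
class GithubRAGSketch:
    """Sketch of per-call simple_mode resolution (illustrative only)."""

    def __init__(self, simple_mode: bool = False):
        self.simple_mode = simple_mode

    def retrieve_context(self, query: str, simple_mode=None) -> str:
        # Late binding: fall back to the instance-level setting only
        # when the caller did not pass simple_mode explicitly.
        effective = self.simple_mode if simple_mode is None else simple_mode
        return "simple" if effective else "professional"

rag = GithubRAGSketch(simple_mode=True)
default_mode = rag.retrieve_context("q")                   # uses instance setting
overridden = rag.retrieve_context("q", simple_mode=False)  # per-call override
```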

[0.1.2] - 2024-10-09

Added

  • New get_pr_content method in Repository class for comprehensive PR data retrieval
  • Singleton pattern implementation for efficient PR data caching
  • Support for LLM-assisted PR analysis and Q&A capabilities
  • Automatic caching mechanism to reduce API calls and improve performance
  • Threaded comment and review retrieval functionality

Changed

  • Improved PR data fetching process to include metadata, file changes, and comments

Optimized

  • Reduced API calls through intelligent caching of PR data

Developer Notes

  • First call to get_pr_content fetches data from GitHub API, subsequent calls use cached data
  • Cache automatically refreshes when PR is updated
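The fetch-once-then-cache behaviour described in these notes can be sketched generically. The class, the fetch callback, and the use of an `updated_at` timestamp as the staleness signal are illustrative assumptions, not llama-github's real internals:

```python
class PRCacheSketch:
    """First call fetches; later calls reuse the cache until the
    PR's updated_at timestamp changes (illustrative only)."""

    def __init__(self, fetch_fn):
        self._fetch_fn = fetch_fn
        self._cache = {}  # pr_number -> (updated_at, data)

    def get_pr_content(self, pr_number, updated_at):
        cached = self._cache.get(pr_number)
        if cached is not None and cached[0] == updated_at:
            return cached[1]              # cache hit: no API call
        data = self._fetch_fn(pr_number)  # miss or stale: refetch
        self._cache[pr_number] = (updated_at, data)
        return data

fetches = []
cache = PRCacheSketch(lambda n: fetches.append(n) or {"pr": n})
cache.get_pr_content(42, "2024-10-09T00:00:00Z")  # first call fetches
cache.get_pr_content(42, "2024-10-09T00:00:00Z")  # served from cache
cache.get_pr_content(42, "2024-10-10T00:00:00Z")  # PR updated -> refetch
```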

[0.1.1] - 2024-08-23

Added

  • Implemented answer_with_context method for direct answer generation (closes #6)
  • Added support for Mistral AI LLM provider
  • Enhanced retrieve_context function to include metadata (e.g., URLs) with each context string (closes #2)

Changed

  • Improved reranking with jina-reranker-v2 for better context retrieval
  • Updated return type of retrieve_context to accommodate metadata

Fixed

  • Resolved warning during context retrieval (closes #3)

Improved

  • Enhanced overall context retrieval process
  • Expanded LLM support for more versatile use cases

[0.1.0] - 2024-08-15

Added

  • Initial release of llama-github
  • Basic functionality for retrieving context from GitHub repositories
  • Integration with LLM for processing and generating responses

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llama_github-0.2.1.tar.gz (49.3 kB)

Uploaded Source

Built Distribution

llama_github-0.2.1-py3-none-any.whl (49.6 kB)

Uploaded Python 3

File details

Details for the file llama_github-0.2.1.tar.gz.

File metadata

  • Download URL: llama_github-0.2.1.tar.gz
  • Size: 49.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for llama_github-0.2.1.tar.gz
Algorithm Hash digest
SHA256 3fd29dc773fbf47d4410ae11aaeca6c3edf55e05238ba0e48d0ec58844bc72cc
MD5 f706aff24a581c30994d696838896500
BLAKE2b-256 2781fc32c375ad21833a70572425711fda62b5b421e5dfbeb54ed92cdea4c314


Provenance

The following attestation bundles were made for llama_github-0.2.1.tar.gz:

Publisher: publish.yml on JetXu-LLM/llama-github

File details

Details for the file llama_github-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: llama_github-0.2.1-py3-none-any.whl
  • Size: 49.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for llama_github-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 c511a955560f59a109e4e198d5cdf86fc529ff68fa2d7048b2402383fd4157e6
MD5 53dec295b4a09bc11b6b9f02b312436e
BLAKE2b-256 d04967fcae4d9290cd1759dccf7d33d7d29cc27c8b770b429c19d320db2f4693


Provenance

The following attestation bundles were made for llama_github-0.2.1-py3-none-any.whl:

Publisher: publish.yml on JetXu-LLM/llama-github
