llama-index readers github integration

These details have not been verified by PyPI

Project description

LlamaIndex Readers Integration: Github

pip install llama-index-readers-github

The GitHub readers package provides three separate readers:

Repository Reader
Issues Reader
Collaborators Reader

Authentication

The readers support two authentication methods:

1. Personal Access Token (PAT)

Generate a token under your account settings at https://github.com/settings/tokens

from llama_index.readers.github import GithubClient

# Direct token
client = GithubClient(github_token="ghp_your_token_here")

# Or via environment variable
import os

os.environ["GITHUB_TOKEN"] = "ghp_your_token_here"
client = GithubClient()  # Automatically uses GITHUB_TOKEN
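
When the token comes from the environment, it can help to fail early with a clear message rather than discover a missing token on the first request. The following is a minimal sketch, assuming the token lives in `GITHUB_TOKEN`; the helper name `require_github_token` is hypothetical and not part of the package.

```python
import os
from typing import Mapping, Optional


def require_github_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the GitHub token from the environment or raise a clear error.

    Hypothetical helper for illustration; not part of llama-index-readers-github.
    """
    env = os.environ if env is None else env
    token = env.get("GITHUB_TOKEN", "").strip()
    if not token:
        raise RuntimeError("GITHUB_TOKEN is not set or is empty")
    return token
```

The returned value can then be passed as `github_token=` to `GithubClient`.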

2. GitHub App Authentication

For better security, rate limits, and organization-level access, use GitHub App authentication:

from llama_index.readers.github import GithubClient, GitHubAppAuth

# Load your GitHub App private key
with open("path/to/private-key.pem", "r") as f:
    private_key = f.read()

# Create GitHub App auth handler
app_auth = GitHubAppAuth(
    app_id="123456",  # Your GitHub App ID
    private_key=private_key,  # Private key content (PEM format)
    installation_id="789012",  # Installation ID for the target org/repo
)

# Use with any client
client = GithubClient(github_app_auth=app_auth)

Installation for GitHub App support:

pip install "llama-index-readers-github[github-app]"

Benefits of GitHub App authentication:

Higher rate limits: At least 5,000 requests/hour per installation, counted separately from a user's PAT quota (5,000/hour) and scaling up for larger installations
Fine-grained permissions: Repository-specific access control
Better security: Tokens auto-expire after 1 hour
Organization-level: Can be installed across multiple repositories
Auditability: Actions attributed to the app, not individual users
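
To get a feel for what a 5,000 requests/hour budget means in practice, the arithmetic below converts an hourly limit into a minimum average spacing between requests. This is only an illustrative sketch; `min_interval_seconds` is a hypothetical helper, not part of the package.

```python
def min_interval_seconds(limit_per_hour: int) -> float:
    """Average spacing between requests that stays within an hourly limit."""
    if limit_per_hour <= 0:
        raise ValueError("limit_per_hour must be positive")
    return 3600 / limit_per_hour


print(min_interval_seconds(5000))  # 0.72
```

In other words, a reader that issues more than roughly one request every 0.72 seconds, sustained over an hour, would exhaust a 5,000/hour budget.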

Repository Reader

This reader walks a repository and loads its files as documents. You can filter by directory, file extension, or file path, and supply a callback for custom processing logic.

Basic Usage

import os

from llama_index.readers.github import GithubRepositoryReader, GithubClient

github_token = os.environ["GITHUB_TOKEN"]
github_client = GithubClient(github_token=github_token, verbose=False)

reader = GithubRepositoryReader(
    github_client=github_client,
    owner="run-llama",
    repo="llama_index",
    use_parser=False,
    verbose=True,
    filter_directories=(
        ["docs"],
        GithubRepositoryReader.FilterType.INCLUDE,
    ),
    filter_file_extensions=(
        [
            ".png",
            ".jpg",
            ".jpeg",
            ".gif",
            ".svg",
            ".ico",
            ".json",
            ".ipynb",
        ],
        GithubRepositoryReader.FilterType.EXCLUDE,
    ),
)

documents = reader.load_data(branch="main")
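
Extension filters are easy to get subtly wrong when an entry lacks its leading dot. The sketch below normalizes a list before it is handed to `filter_file_extensions`, assuming the reader compares extensions including the dot (as most entries in the list above suggest). `normalize_extensions` is a hypothetical helper, not part of the package.

```python
from typing import Iterable, List


def normalize_extensions(extensions: Iterable[str]) -> List[str]:
    """Ensure each extension starts with a dot; drop blanks and duplicates."""
    normalized: List[str] = []
    for ext in extensions:
        ext = ext.strip()
        if not ext:
            continue  # ignore empty entries
        if not ext.startswith("."):
            ext = "." + ext
        if ext not in normalized:
            normalized.append(ext)
    return normalized
```

The result keeps the original order, so it can be dropped into the filter tuple unchanged.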

Advanced Filtering Options

Filter Specific File Paths

# Include only specific files
reader = GithubRepositoryReader(
    github_client=github_client,
    owner="run-llama",
    repo="llama_index",
    filter_file_paths=(
        ["README.md", "src/main.py", "docs/guide.md"],
        GithubRepositoryReader.FilterType.INCLUDE,
    ),
)

# Exclude specific files
reader = GithubRepositoryReader(
    github_client=github_client,
    owner="run-llama",
    repo="llama_index",
    filter_file_paths=(
        ["tests/test_file.py", "temp/cache.txt"],
        GithubRepositoryReader.FilterType.EXCLUDE,
    ),
)
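
The include/exclude semantics can be pictured with a small stand-alone function. This is only a mental model, under the assumption that listed paths are compared exactly; it is not the reader's implementation, and `FilterMode` and `keep_path` are hypothetical names.

```python
from enum import Enum
from typing import Sequence


class FilterMode(Enum):
    INCLUDE = "include"
    EXCLUDE = "exclude"


def keep_path(path: str, listed: Sequence[str], mode: FilterMode) -> bool:
    """INCLUDE keeps only listed paths; EXCLUDE keeps everything else."""
    is_listed = path in listed
    return is_listed if mode is FilterMode.INCLUDE else not is_listed
```

With `INCLUDE`, anything not in the list is dropped; with `EXCLUDE`, only the listed paths are dropped.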

Custom File Processing Callback

def process_file_callback(file_path: str, file_size: int) -> tuple[bool, str]:
    """Custom logic to determine if a file should be processed.

    Args:
        file_path: The full path to the file
        file_size: The size of the file in bytes

    Returns:
        Tuple of (should_process: bool, reason: str)
    """
    # Skip large files
    if file_size > 1024 * 1024:  # 1MB
        return False, f"File too large: {file_size} bytes"

    # Skip test files
    if "test" in file_path.lower():
        return False, "Skipping test files"

    # Skip binary files by extension
    binary_extensions = [".exe", ".bin", ".so", ".dylib"]
    if any(file_path.endswith(ext) for ext in binary_extensions):
        return False, "Skipping binary files"

    return True, ""


reader = GithubRepositoryReader(
    github_client=github_client,
    owner="run-llama",
    repo="llama_index",
    process_file_callback=process_file_callback,
    fail_on_error=False,  # Continue processing if callback fails
)
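
Because the callback is a plain function, it can be exercised locally before running it against a real repository. The sketch below uses a trimmed-down callback of the same shape; the name `size_and_test_filter` and the sample paths and sizes are made up for illustration.

```python
from typing import Dict, Tuple


def size_and_test_filter(file_path: str, file_size: int) -> Tuple[bool, str]:
    """Same (should_process, reason) contract as the callback above."""
    if file_size > 1024 * 1024:  # larger than 1 MB
        return False, f"File too large: {file_size} bytes"
    if "test" in file_path.lower():  # plain substring match
        return False, "Skipping test files"
    return True, ""


# Hypothetical sample inputs: (path, size in bytes)
samples = [
    ("docs/guide.md", 2_048),
    ("tests/test_api.py", 512),
    ("data/dump.csv", 5 * 1024 * 1024),
    ("latest/notes.md", 100),
]

decisions: Dict[str, Tuple[bool, str]] = {
    path: size_and_test_filter(path, size) for path, size in samples
}
for path, (keep, reason) in decisions.items():
    print(f"{path}: {'keep' if keep else 'skip'} {reason}".rstrip())
```

Note that a plain substring check also matches paths such as `latest/notes.md`, since "latest" contains "test"; a stricter check on path segments may be preferable.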

Custom Folder for Temporary Files

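This example combines a custom parser for Markdown files with a custom directory for the temporary files the reader creates while parsing.

from llama_index.core.readers.base import BaseReader
from llama_index.core.schema import Document


# Custom parser for specific file types
class CustomMarkdownParser(BaseReader):
    def load_data(self, file_path, extra_info=None):
        # Replace with your own parsing logic; return a list of Documents
        with open(file_path, encoding="utf-8") as f:
            text = f.read()
        return [Document(text=text, metadata=extra_info or {})]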


reader = GithubRepositoryReader(
    github_client=github_client,
    owner="run-llama",
    repo="llama_index",
    use_parser=True,
    custom_parsers={".md": CustomMarkdownParser()},
    custom_folder="/tmp/github_processing",  # Custom temp directory
)

Event System Integration

The reader integrates with LlamaIndex's instrumentation system to provide detailed events during processing:

from llama_index.core.instrumentation import get_dispatcher
from llama_index.core.instrumentation.event_handlers import BaseEventHandler
from llama_index.readers.github.repository.event import (
    GitHubFileProcessedEvent,
    GitHubFileSkippedEvent,
    GitHubFileFailedEvent,
    GitHubRepositoryProcessingStartedEvent,
    GitHubRepositoryProcessingCompletedEvent,
)


class GitHubEventHandler(BaseEventHandler):
    def handle(self, event):
        if isinstance(event, GitHubRepositoryProcessingStartedEvent):
            print(f"Started processing repository: {event.repository_name}")
        elif isinstance(event, GitHubFileProcessedEvent):
            print(
                f"Processed file: {event.file_path} ({event.file_size} bytes)"
            )
        elif isinstance(event, GitHubFileSkippedEvent):
            print(f"Skipped file: {event.file_path} - {event.reason}")
        elif isinstance(event, GitHubFileFailedEvent):
            print(f"Failed to process file: {event.file_path} - {event.error}")
        elif isinstance(event, GitHubRepositoryProcessingCompletedEvent):
            print(
                f"Completed processing. Total documents: {event.total_documents}"
            )


# Register the event handler
dispatcher = get_dispatcher()
handler = GitHubEventHandler()
dispatcher.add_event_handler(handler)

# Use the reader - events will be automatically dispatched
reader = GithubRepositoryReader(
    github_client=github_client,
    owner="run-llama",
    repo="llama_index",
)
documents = reader.load_data(branch="main")
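
For long runs, printing every event can be noisy. One option is to tally events by class name and print a summary at the end. The sketch below is independent of LlamaIndex; `EventTally` is a hypothetical helper whose `record` method could be called from `handle`.

```python
from collections import Counter


class EventTally:
    """Count events by their class name."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record(self, event: object) -> None:
        self.counts[type(event).__name__] += 1

    def summary(self) -> str:
        # Sorted by class name so the output is stable between runs
        return ", ".join(
            f"{name}={count}" for name, count in sorted(self.counts.items())
        )
```

Inside the handler, `tally.record(event)` replaces the chain of `isinstance` checks when only counts are needed.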

Available Events

The following events are dispatched during repository processing:

GitHubRepositoryProcessingStartedEvent: Fired when repository processing begins
- repository_name: Name of the repository (owner/repo)
- branch_or_commit: Branch name or commit SHA being processed
GitHubRepositoryProcessingCompletedEvent: Fired when repository processing completes
- repository_name: Name of the repository
- branch_or_commit: Branch name or commit SHA
- total_documents: Number of documents created
GitHubTotalFilesToProcessEvent: Fired with the total count of files to be processed
- repository_name: Name of the repository
- branch_or_commit: Branch name or commit SHA
- total_files: Total number of files found
GitHubFileProcessingStartedEvent: Fired when individual file processing starts
- file_path: Path to the file being processed
- file_type: File extension
GitHubFileProcessedEvent: Fired when a file is successfully processed
- file_path: Path to the processed file
- file_type: File extension
- file_size: Size of the file in bytes
- document: The created Document object
GitHubFileSkippedEvent: Fired when a file is skipped
- file_path: Path to the skipped file
- file_type: File extension
- reason: Reason why the file was skipped
GitHubFileFailedEvent: Fired when file processing fails
- file_path: Path to the failed file
- file_type: File extension
- error: Error message describing the failure
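
The `total_files` field, combined with the per-file events, is enough to derive a progress figure. The sketch below assumes each file ends in exactly one processed, skipped, or failed event, which the list above suggests but does not guarantee; `ProgressTracker` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class ProgressTracker:
    """Track progress from the counts carried by the events above."""

    total_files: int = 0  # from GitHubTotalFilesToProcessEvent.total_files
    processed: int = 0  # one per GitHubFileProcessedEvent
    skipped: int = 0  # one per GitHubFileSkippedEvent
    failed: int = 0  # one per GitHubFileFailedEvent

    @property
    def finished(self) -> int:
        return self.processed + self.skipped + self.failed

    def fraction_done(self) -> float:
        if self.total_files <= 0:
            return 0.0  # total not known yet
        return min(1.0, self.finished / self.total_files)
```

An event handler would set `total_files` once and increment the matching counter for each per-file event.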

Issues Reader

from llama_index.readers.github import (
    GitHubRepositoryIssuesReader,
    GitHubIssuesClient,
)

github_client = GitHubIssuesClient(github_token=github_token, verbose=True)

reader = GitHubRepositoryIssuesReader(
    github_client=github_client,
    owner="moncho",
    repo="dry",
    verbose=True,
)

documents = reader.load_data(
    state=GitHubRepositoryIssuesReader.IssueState.ALL,
    labelFilters=[("bug", GitHubRepositoryIssuesReader.FilterType.INCLUDE)],
)

Collaborators Reader

from llama_index.readers.github import (
    GitHubRepositoryCollaboratorsReader,
    GitHubCollaboratorsClient,
)

github_client = GitHubCollaboratorsClient(
    github_token=github_token, verbose=True
)

reader = GitHubRepositoryCollaboratorsReader(
    github_client=github_client,
    owner="moncho",
    repo="dry",
    verbose=True,
)

documents = reader.load_data()

GitHub App Setup Guide

To create and configure a GitHub App for authentication:

1. Create a GitHub App

Go to your GitHub account settings → Developer settings → GitHub Apps → New GitHub App
Fill in the required information:
- GitHub App name: Choose a unique name (e.g., "My LlamaIndex Reader")
- Homepage URL: Your application or organization URL
- Webhook: Uncheck "Active" (not needed for this use case)

2. Set Permissions

Under Repository permissions, set:

Contents: Read-only (to read repository files)
Metadata: Read-only (required automatically)
Issues: Read-only (if using Issues reader)
Pull requests: Read-only (issues endpoint includes PRs)

3. Install the App

After creating the app, note your App ID (shown at the top)
Generate a private key:
- Scroll down to "Private keys"
- Click "Generate a private key"
- Save the downloaded .pem file securely
Install the app:
- Click "Install App" in the left sidebar
- Choose the account/organization
- Select specific repositories or all repositories
- Complete installation

4. Get Installation ID

After installation, you'll be redirected to a URL like:

https://github.com/settings/installations/12345678

The number 12345678 is your installation ID. You can also find it via the API:

curl -H "Authorization: Bearer YOUR_JWT_TOKEN" \
     https://api.github.com/app/installations
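
The `YOUR_JWT_TOKEN` in the request above is an app-level JWT signed with the app's private key. GitHub expects an RS256-signed token whose `iss` claim is the App ID and whose `exp` is at most 10 minutes in the future. The sketch below builds those claims and, optionally, signs them with PyJWT; the function names are hypothetical, and the signing step assumes PyJWT with RSA support is installed.

```python
import time
from typing import Dict, Optional, Union


def build_app_jwt_claims(
    app_id: str,
    now: Optional[int] = None,
    lifetime_seconds: int = 540,
) -> Dict[str, Union[int, str]]:
    """Build the claims for a GitHub App JWT (hypothetical helper)."""
    if not 0 < lifetime_seconds <= 600:
        raise ValueError("GitHub accepts at most a 10 minute lifetime")
    now = int(time.time()) if now is None else now
    return {
        "iat": now - 60,  # backdated 60 s to tolerate clock drift
        "exp": now + lifetime_seconds,
        "iss": app_id,
    }


def sign_app_jwt(app_id: str, private_key_pem: str) -> str:
    """Sign the claims with RS256; requires PyJWT (assumed installed)."""
    import jwt  # imported lazily so the claims helper works without PyJWT

    return jwt.encode(
        build_app_jwt_claims(app_id), private_key_pem, algorithm="RS256"
    )
```

The string returned by `sign_app_jwt` is what goes after `Bearer` in the `curl` call.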

5. Use in Code

from llama_index.readers.github import GithubClient, GitHubAppAuth

# Load private key
with open("path/to/your-app-private-key.pem", "r") as f:
    private_key = f.read()

# Create auth handler
app_auth = GitHubAppAuth(
    app_id="YOUR_APP_ID",
    private_key=private_key,
    installation_id="YOUR_INSTALLATION_ID",
)

# Use with any client
client = GithubClient(github_app_auth=app_auth)

Token Management

The GitHubAppAuth class automatically:

Generates JWTs for app authentication
Obtains installation access tokens
Caches tokens (valid for 1 hour)
Refreshes tokens automatically when they expire or are within 5 minutes of expiry
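
The caching behavior described above can be sketched as a small class that refetches a token once it is within a margin of its expiry. This is an illustration of the idea, not the `GitHubAppAuth` implementation; `CachedToken` and its parameters are hypothetical.

```python
import time
from typing import Callable, Optional, Tuple


class CachedToken:
    """Cache a token and refresh it shortly before it expires."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],  # returns (token, expires_at)
        refresh_margin: float = 300.0,  # 5 minutes, as described above
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._margin = refresh_margin
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        # Refetch when there is no token or expiry is within the margin
        if self._token is None or self._clock() >= self._expires_at - self._margin:
            self._token, self._expires_at = self._fetch()
        return self._token

    def invalidate(self) -> None:
        """Drop the cached token so the next get() fetches a new one."""
        self._token = None
```

Injecting the clock keeps the refresh logic testable without waiting for real time to pass.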

You can manually invalidate a token if needed:

app_auth.invalidate_token()  # Forces refresh on next request

Troubleshooting

"Failed to get installation token: 401"

Verify your App ID is correct
Ensure the private key matches your GitHub App
Check that the app is installed for the target repository

"Failed to get installation token: 404"

Verify the installation ID is correct
Ensure the app installation wasn't uninstalled

"Import PyJWT failed"

Install GitHub App support: pip install "llama-index-readers-github[github-app]"
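
To confirm whether PyJWT is visible to the interpreter you are actually running, a quick check with the standard library can help. A minimal sketch; `has_module` is a hypothetical helper, and `jwt` is the import name PyJWT uses.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


if not has_module("jwt"):
    print("PyJWT not found; install the github-app extra")
```

Running this inside the same virtual environment as your application rules out the common case of installing the extra into a different environment.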

Project details

Release history

0.11.2: Mar 13, 2026
0.11.1: Mar 13, 2026
0.11.0: Mar 12, 2026 (this version)
0.10.0: Feb 16, 2026
0.9.0: Oct 27, 2025
0.8.2: Sep 12, 2025
0.8.1: Sep 8, 2025
0.8.0: Jul 31, 2025
0.7.0: Jul 30, 2025
0.6.1: May 6, 2025
0.6.0: Feb 28, 2025
0.5.0: Nov 18, 2024
0.4.0: Nov 12, 2024
0.3.0: Nov 12, 2024
0.2.0: Aug 22, 2024
0.1.9: May 8, 2024
0.1.8: Apr 30, 2024
0.1.7: Feb 22, 2024
0.1.6: Feb 21, 2024
0.1.5: Feb 19, 2024
0.1.4: Feb 18, 2024
0.1.3: Feb 18, 2024
0.1.2: Feb 13, 2024
0.1.1: Feb 12, 2024
0.1.0: Feb 10, 2024
0.0.2: Feb 5, 2024
0.0.1: Feb 4, 2024

Download files

Source Distribution

llama_index_readers_github-0.11.0.tar.gz (25.9 kB view details)

Uploaded Mar 12, 2026 Source

Built Distribution

llama_index_readers_github-0.11.0-py3-none-any.whl (33.9 kB view details)

Uploaded Mar 12, 2026 Python 3

File details

Details for the file llama_index_readers_github-0.11.0.tar.gz.

File metadata

Download URL: llama_index_readers_github-0.11.0.tar.gz
Upload date: Mar 12, 2026
Size: 25.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.9 (publish, CI, Ubuntu 24.04)

File hashes

Hashes for llama_index_readers_github-0.11.0.tar.gz
Algorithm	Hash digest
SHA256	`4c377bef0fcafdc888525c88eba8ba6a31ec2fef442b6757ade9b65b0fe6b2ba`
MD5	`97a0533a36197f86e455368c0d57ac31`
BLAKE2b-256	`3d095883d16c1f079f208eed1acf0835c3f25d2939636a42692dbfb77f12eda5`
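
A downloaded file can be checked against the SHA256 digest listed above with the standard library. A minimal sketch; `sha256_of` and `matches_digest` are hypothetical helper names, and the file path passed in is whatever location you downloaded the archive to.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of(path: Union[str, Path], chunk_size: int = 65536) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: Union[str, Path], expected_hex: str) -> bool:
    """Compare a file's SHA256 with an expected hex digest, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Pass the path of the downloaded `.tar.gz` and the published SHA256 value to `matches_digest`; a `False` result means the file should not be installed.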

File details

Details for the file llama_index_readers_github-0.11.0-py3-none-any.whl.

File metadata

Download URL: llama_index_readers_github-0.11.0-py3-none-any.whl
Upload date: Mar 12, 2026
Size: 33.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.9 (publish, CI, Ubuntu 24.04)

File hashes

Hashes for llama_index_readers_github-0.11.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`499473a8f6f252eaba76942aa0e5acb08c5ba9993e3783be25fba3a42d98b770`
MD5	`836945d294e3c34f21cd2f3f01ecf4bc`
BLAKE2b-256	`b60324695310fd01b2970afcf2a6338526e4787df2b889acd7087ab784800aa3`

llama-index-readers-github 0.11.0
