Extract repository contents into formatted text for LLM context
repocontext
A Python library that fetches GitHub repositories, builds hierarchical file trees, and generates formatted text output containing the directory structure and file contents, along with token counts for checking the result against LLM context limits.
Features
- GitHub Support: Fetch public and private repositories
- Async Operations: Efficient file fetching with concurrency control
- Token Counting: Accurate GPT token counting via tiktoken
- Extensible: Easy to add new providers (GitLab, Azure DevOps, etc.)
- Structured Output: Directory trees with file contents in markdown
Installation
pip install repocontext
Quick Start
from repocontext import fetch
# Simple usage - synchronous
result = fetch("https://github.com/owner/repo")
print(result.markdown)
print(f"Tokens: {result.token_count}")
# With file contents
result = fetch("https://github.com/owner/repo", token="ghp_xxx", fetch_content=True)
print(result.markdown)
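Since the result reports a total token count, a natural follow-up is to check that count against a model's context window before pasting the output into a prompt. The helper below is a minimal sketch of that check. The name `fits_in_context`, the 128,000-token default, and the reserve size are illustrative assumptions and not part of repocontext.

```python
def fits_in_context(token_count: int, context_limit: int = 128_000, reserve: int = 4_000) -> bool:
    """Return True if token_count still leaves `reserve` tokens free.

    context_limit and reserve are assumed defaults; set them for your model.
    """
    if token_count < 0 or context_limit <= 0 or reserve < 0:
        raise ValueError("token_count and reserve must be >= 0, context_limit > 0")
    # The reserve keeps room for the prompt itself and the model's reply.
    return token_count + reserve <= context_limit
```

With a fetched result, this would be called as `fits_in_context(result.token_count)`.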
Usage Examples
Using the Provider Directly
import asyncio
from repocontext import GitHubProvider, Formatter, build_tree
async def main():
    provider = GitHubProvider()
    # Fetch repository
    result = await provider.get_repository(
        "https://github.com/owner/repo",
        token=None,
        fetch_content=True
    )
    print(result.directory_tree)
    print(f"Found {result.file_count} files")
    print(f"Total tokens: {result.token_count}")

asyncio.run(main())
Filtering Files
import asyncio
from repocontext import GitHubProvider
async def main():
    provider = GitHubProvider()
    nodes = await provider.fetch_tree("https://github.com/owner/repo")
    # Get only Python files
    py_files = [n for n in nodes if n.is_file() and n.get_extension() == ".py"]
    # Get files from specific directory
    src_files = [n for n in nodes if n.path.startswith("src/")]
    # Get files larger than 1KB
    large_files = [n for n in nodes if n.is_file() and n.size and n.size > 1024]

asyncio.run(main())
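The list comprehensions above filter on one condition at a time. When include and exclude rules pile up, glob patterns can be easier to maintain. The following is a minimal sketch that works on plain path strings (for example the `path` values of fetched nodes). `filter_paths` is a hypothetical helper, not a repocontext function.

```python
from fnmatch import fnmatchcase


def filter_paths(paths, include=("*",), exclude=()):
    """Keep paths that match any include pattern and no exclude pattern."""
    kept = []
    for path in paths:
        if not any(fnmatchcase(path, pattern) for pattern in include):
            continue
        if any(fnmatchcase(path, pattern) for pattern in exclude):
            continue
        kept.append(path)
    return kept


paths = ["src/main.py", "src/utils/io.py", "tests/test_main.py", "README.md"]
# In fnmatch patterns "*" also matches "/", so "*.py" matches nested files.
selected = filter_paths(paths, include=["*.py"], exclude=["tests/*"])
```

`fnmatchcase` is used rather than `fnmatch` so that matching stays case-sensitive on every platform.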
Building Trees and Formatting
from repocontext import build_tree, Formatter, FileNode, TreeNode
# Build tree from flat nodes
# (nodes, selected_files, and contents are assumed to come from an earlier fetch)
tree = build_tree(
    nodes,
    selected_paths={n.path for n in selected_files},
    excluded_paths=set(),
    expanded_paths={n.path for n in nodes if n.is_directory()},
)
# Format as markdown
markdown = Formatter.format_markdown(tree, contents)
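To make the idea of turning a flat node list into a hierarchy concrete, here is a small standalone sketch that nests slash-separated file paths into dictionaries and renders them as an ASCII tree. It illustrates the general technique only. `build_nested` and `render` are hypothetical names, and the library's own `build_tree` may work differently.

```python
def build_nested(paths):
    """Nest flat 'a/b/c.py' file paths into dicts; files map to None.

    Assumes every entry is a file path and no file shares a directory's name.
    """
    root = {}
    for path in sorted(paths):
        parts = path.split("/")
        cursor = root
        for part in parts[:-1]:
            cursor = cursor.setdefault(part, {})
        cursor[parts[-1]] = None
    return root


def render(tree, prefix=""):
    """Return ASCII tree lines, listing directories before files."""
    lines = []
    entries = sorted(tree.items(), key=lambda item: (item[1] is None, item[0]))
    for index, (name, child) in enumerate(entries):
        last = index == len(entries) - 1
        branch = "└── " if last else "├── "
        suffix = "/" if child is not None else ""
        lines.append(f"{prefix}{branch}{name}{suffix}")
        if child is not None:
            lines.extend(render(child, prefix + ("    " if last else "│   ")))
    return lines


tree_lines = render(build_nested(["src/main.py", "README.md", "src/utils/io.py"]))
```

Joining `tree_lines` with newlines gives a printable directory tree.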
API Reference
Main Function
from repocontext import fetch
result = fetch(
    url="https://github.com/owner/repo",  # Required
    token=None,                            # Optional GitHub token for private repos
    fetch_content=False                    # Set True to include file contents
)
Returns RepositoryResult with:
- url: The repository URL
- branch: The resolved branch name
- files: List of FileNode objects
- directories: List of directory paths
- contents: List of FileContent objects (when fetch_content=True)
- markdown: Full markdown output
- directory_tree: ASCII tree representation
- token_count: Total token count
- line_count: Total line count
- file_count: Number of files
- stats: Statistics dictionary
Providers
GitHubProvider
from repocontext import GitHubProvider
provider = GitHubProvider()
# Set credentials (optional for public repos)
provider.set_credentials("ghp_your_token_here")
# The awaited calls below must run inside an async function;
# url and file_nodes are assumed to be defined earlier.
# Get full repository result
result = await provider.get_repository(
    url="https://github.com/owner/repo",
    token=None,
    fetch_content=False,
    branch=None  # Optional branch override
)
# Fetch tree only
nodes = await provider.fetch_tree(url, branch="main", path="src")
# Fetch multiple files with concurrency
async for content in provider.fetch_multiple(file_nodes):
    print(content.path, len(content.text))
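Fetching many files "with concurrency" usually means capping how many requests are in flight at once. One common way to do that in asyncio is a semaphore, sketched below as an async generator that yields results as they finish. This is not the library's implementation. `fetch_all`, `fake_fetch`, and the default limit of 5 are assumptions made for illustration.

```python
import asyncio


async def fetch_all(items, fetch_one, max_concurrency=5):
    """Yield fetch_one(item) results as they complete, capped at max_concurrency."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(item):
        # At most max_concurrency coroutines get past this point at once.
        async with semaphore:
            return await fetch_one(item)

    tasks = [asyncio.create_task(guarded(item)) for item in items]
    for finished in asyncio.as_completed(tasks):
        yield await finished


async def demo():
    async def fake_fetch(path):
        # Stand-in for a network call; returns the path and its length.
        await asyncio.sleep(0.01)
        return path, len(path)

    return [r async for r in fetch_all(["a.py", "b/c.py"], fake_fetch, max_concurrency=2)]


results = asyncio.run(demo())
```

Results arrive in completion order rather than input order, so callers that need a stable order should sort or key them by path.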
Types
FileNode
from repocontext import FileNode
node = FileNode(path="src/main.py", type="blob", size=1024, sha="abc123")
node.is_file() # True
node.is_directory() # False
node.get_name() # "main.py"
node.get_extension() # ".py"
TreeNode
from repocontext import TreeNode
node = TreeNode(
    name="src",
    path="src",
    type="directory",
    children=[...],
    selected=True
)
node.is_file() # False
node.is_directory() # True
FileContent
from repocontext import FileContent
content = FileContent(
    path="src/main.py",
    text="print('hello')",
    url="https://...",
    line_count=1,
    token_count=3
)
Formatter
from repocontext import Formatter
# Count tokens
tokens = Formatter.count_tokens("hello world")
# Format project tree
tree_str = Formatter.format_project_tree(tree_nodes)
# Format as markdown
markdown = Formatter.format_markdown(tree, contents)
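For readers wondering what "directory tree plus file contents in markdown" might look like, the sketch below assembles such a document from a tree string and a list of (path, text) pairs. The layout and the `to_markdown` helper are assumptions for illustration. The actual output of `Formatter.format_markdown` may be structured differently.

```python
from pathlib import PurePosixPath


def to_markdown(tree_text, files):
    """Combine a tree string and (path, text) pairs into one markdown string."""
    fence = "`" * 3  # built at runtime so this example nests cleanly in docs
    parts = ["# Directory structure", "", fence, tree_text, fence, ""]
    for path, text in files:
        # Use the file extension (without the dot) as the fence language hint.
        lang = PurePosixPath(path).suffix[1:]
        parts += [f"## {path}", "", f"{fence}{lang}", text.rstrip("\n"), fence, ""]
    return "\n".join(parts)


md = to_markdown("└── main.py", [("main.py", "print('hi')\n")])
```

A file whose own text contains triple backticks would need a longer fence, which this sketch does not handle.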
Tree Building
from repocontext import build_tree, extract_directories
# Build hierarchical tree from flat nodes
tree = build_tree(
    nodes,
    selected_paths=set_of_selected_paths,
    excluded_paths=set_of_excluded_paths,
    expanded_paths=set_of_expanded_directories
)
# Extract all directory paths
dirs = extract_directories(nodes)
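Extracting directory paths from a flat file listing amounts to collecting every parent of every path. A minimal standalone sketch of that idea on plain strings follows. `directories_of` is a hypothetical helper, and the input and return types of `extract_directories` itself are not specified here.

```python
from pathlib import PurePosixPath


def directories_of(paths):
    """Return the sorted set of all parent directories of the given file paths."""
    dirs = set()
    for path in paths:
        for parent in PurePosixPath(path).parents:
            # The top-most parent of a relative path is ".", which is skipped.
            if str(parent) != ".":
                dirs.add(str(parent))
    return sorted(dirs)


dirs = directories_of(["src/utils/io.py", "src/main.py", "README.md"])
```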
Exception Handling
from repocontext import (
    fetch,
    InvalidURLError,
    AuthenticationError,
    NotFoundError,
    RateLimitError,
    NetworkError,
)

try:
    result = fetch("https://github.com/owner/repo")
except InvalidURLError as e:
    print(f"Invalid URL: {e.user_message}")
except AuthenticationError as e:
    print(f"Auth failed: {e.user_message}")
except NotFoundError as e:
    print(f"Not found: {e.user_message}")
except RateLimitError as e:
    print(f"Rate limited: {e.user_message}")
except NetworkError as e:
    print(f"Network error: {e.user_message}")
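Rate-limit errors are often transient, so a caller may prefer to retry with a growing delay instead of failing right away. The sketch below shows one way to do that with exponential backoff. `with_retries` is a hypothetical helper, and the `RateLimitError` class defined here is a local stand-in so the example runs without the package installed. In real use you would import the exception from repocontext instead.

```python
import time


class RateLimitError(Exception):
    """Local stand-in for the library's exception (assumption for this sketch)."""


def with_retries(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `call()`; on RateLimitError wait base_delay * 2**n, then retry."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller handle it
            sleep(base_delay * 2 ** attempt)
```

A call such as `with_retries(lambda: fetch("https://github.com/owner/repo"))` would then retry only on rate limiting and let other errors propagate immediately.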
Extending the Package
Adding a New Provider
Create a new provider by extending BaseProvider:
from repocontext import FileContent, FileNode
from repocontext.providers import BaseProvider, register_provider
# ParsedRepoInfo must also be imported; its module is not shown in this README.

@register_provider("gitlab")
class GitLabProvider(BaseProvider):
    API_BASE = "https://gitlab.com/api/v4"

    @property
    def get_type(self) -> str:
        return "gitlab"

    @property
    def get_name(self) -> str:
        return "GitLab"

    def requires_auth(self) -> bool:
        return True

    def validate_url(self, url: str) -> bool:
        return url.startswith("https://gitlab.com/")

    def parse_url(self, url: str) -> ParsedRepoInfo:
        # Parse the URL and return ParsedRepoInfo
        ...

    async def _fetch_tree(self, url: str, **options) -> list[FileNode]:
        # Fetch the repository tree
        ...

    async def _fetch_file_content(self, node: FileNode) -> FileContent:
        # Fetch a single file's content
        ...
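The `parse_url` step is left open above. As a rough idea of what it could involve for GitLab, the sketch below splits a gitlab.com URL into a namespace and project name using only the standard library. `RepoRef` is a hypothetical stand-in, since the real fields of `ParsedRepoInfo` are not documented here, and the function ignores GitLab's `/-/tree/<branch>` style suffixes.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class RepoRef:
    """Hypothetical result type; not the library's ParsedRepoInfo."""
    host: str
    namespace: str
    project: str


def parse_gitlab_url(url: str) -> RepoRef:
    """Split https://gitlab.com/<namespace>/<project> into its parts."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.netloc != "gitlab.com":
        raise ValueError(f"not a gitlab.com HTTPS URL: {url!r}")
    path = parsed.path.strip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    segments = [segment for segment in path.split("/") if segment]
    if len(segments) < 2:
        raise ValueError(f"expected <namespace>/<project> in {url!r}")
    # GitLab namespaces can be nested groups, so keep everything before the project.
    return RepoRef(parsed.netloc, "/".join(segments[:-1]), segments[-1])
```

Rejecting malformed URLs with a clear error at this stage keeps the later tree and content fetches from failing with less helpful messages.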
License
MIT
Download files
File details
Details for the file repo_context_lib-0.2.1.tar.gz.
File metadata
- Download URL: repo_context_lib-0.2.1.tar.gz
- Upload date:
- Size: 857.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2ba22fda4fd8d55cc8eca433571ca0f3a20854b99b90b4710447aec064f714ab |
| MD5 | a49481245a2cd4157e2dce9c8112cdfb |
| BLAKE2b-256 | 9f006150a6415db5617a749b50d20e78c9b31407e45257deee4899452d204b2a |
File details
Details for the file repo_context_lib-0.2.1-py3-none-any.whl.
File metadata
- Download URL: repo_context_lib-0.2.1-py3-none-any.whl
- Upload date:
- Size: 17.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ecf55cf4bd55bd7f6a1ea4b366e80c3a28eca5bccf1bbeb2a2179bac586b78f3 |
| MD5 | b5198ff8211d3bf1484313256f10a586 |
| BLAKE2b-256 | 1324c75677bda2e04528f08f07ee111fd180726b5ac93845414ac63a7893c821 |