Python bindings for infiniloom - Repository context engine for LLMs

Infiniloom Python Bindings

Python bindings for Infiniloom - a repository context engine for Large Language Models.

Installation

pip install infiniloom

Building from Source

git clone https://github.com/Topos-Labs/infiniloom.git
cd infiniloom/bindings/python
pip install maturin
maturin develop  # For development
maturin build --release  # For production wheel

Quick Start

Functional API

import infiniloom

# Pack a repository into Claude-optimized XML
context = infiniloom.pack("/path/to/repo", format="xml", model="claude")
print(context)

# Scan repository and get statistics
stats = infiniloom.scan("/path/to/repo")
print(f"Files: {stats['total_files']}")
print(f"Languages: {stats['languages']}")

# Count tokens for a specific model
tokens = infiniloom.count_tokens("Hello, world!", model="claude")
print(f"Tokens: {tokens}")

Object-Oriented API

from infiniloom import Infiniloom

# Create an Infiniloom instance
loom = Infiniloom("/path/to/repo")

# Get repository statistics
stats = loom.stats()
print(stats)

# Generate repository context
context = loom.pack(format="xml", model="claude", compression="balanced")

# Get repository map with important symbols
repo_map = loom.map(map_budget=2000, max_symbols=50)
for symbol in repo_map['key_symbols']:
    print(f"{symbol['name']} ({symbol['kind']}) in {symbol['file']}")

# Scan for security issues
findings = loom.scan_security()
for finding in findings:
    print(f"{finding['severity']}: {finding['kind']} at {finding['file']}:{finding['line']}")

# List all files
files = loom.files()
for file in files:
    print(f"{file['path']} - {file['language']} ({file['tokens']} tokens)")

API Reference

Functions

pack(path, format="xml", model="claude", compression="balanced", map_budget=2000, max_symbols=50)

Pack a repository into an LLM-optimized format.

Parameters:

  • path (str): Path to the repository
  • format (str): Output format - "xml", "markdown", "json", "yaml", "toon", or "plain"
  • model (str): Target model for token counting. Supports:
    • OpenAI GPT-5.x: "gpt-5.2", "gpt-5.2-pro", "gpt-5.1", "gpt-5.1-mini", "gpt-5.1-codex", "gpt-5", "gpt-5-mini", "gpt-5-nano"
    • OpenAI O-series: "o4-mini", "o3", "o3-mini", "o1", "o1-mini", "o1-preview"
    • OpenAI GPT-4: "gpt-4o", "gpt-4o-mini", "gpt-4", "gpt-3.5-turbo"
    • Anthropic: "claude" (default)
    • Google: "gemini"
    • Meta: "llama", "codellama"
    • Others: "deepseek", "mistral", "qwen", "cohere", "grok"
  • compression (str): Compression level - "none", "minimal", "balanced", "aggressive", "extreme", "focused", "semantic"
  • map_budget (int): Token budget for repository map (default: 2000)
  • max_symbols (int): Maximum symbols to include (default: 50)

Returns: str - Formatted repository context

scan(path, include_hidden=False, respect_gitignore=True)

Scan a repository and return statistics.

Parameters:

  • path (str): Path to the repository
  • include_hidden (bool): Include hidden files (default: False)
  • respect_gitignore (bool): Respect .gitignore files (default: True)

Returns: dict - Repository statistics including:

  • name: Repository name
  • path: Absolute path
  • total_files: Number of files
  • total_lines: Total lines of code
  • total_tokens: Token counts for each model
  • languages: Language breakdown
  • branch: Git branch (if available)
  • commit: Git commit hash (if available)
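
As a small sketch of how the returned dict might be used, the helper below checks whether a repository fits a token budget before packing. It assumes total_tokens is a mapping keyed by model name, which the list above suggests but does not state outright. The helper fits_budget and the sample values are illustrative and not part of the library.

```python
def fits_budget(stats, model, budget):
    """Return True if the repository's token count for `model` is within `budget`.

    Assumption: stats["total_tokens"] maps model names to integer counts.
    """
    counts = stats["total_tokens"]
    if model not in counts:
        raise KeyError(f"no token count for model {model!r}")
    return counts[model] <= budget


# Illustrative stand-in for the dict returned by scan(); values are made up.
sample_stats = {
    "name": "demo",
    "total_files": 12,
    "total_tokens": {"claude": 48_000, "gpt-4o": 45_500},
}

print(fits_budget(sample_stats, "claude", 100_000))
print(fits_budget(sample_stats, "gpt-4o", 32_000))
```

With a real repository, the same function would be called on the result of infiniloom.scan(path).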

count_tokens(text, model="claude")

Count tokens in text for a specific model.

Parameters:

  • text (str): Text to count tokens for
  • model (str): Target model. Supports all models listed above in pack(), including GPT-5.x series

Returns: int - Number of tokens (exact for OpenAI models via tiktoken, calibrated estimates for others)

semantic_compress(text, similarity_threshold=0.7, budget_ratio=0.5)

Compress text using semantic compression while preserving important content.

Parameters:

  • text (str): Text to compress
  • similarity_threshold (float): Threshold for grouping similar chunks (0.0-1.0, default: 0.7)
  • budget_ratio (float): Target size as ratio of original (0.0-1.0, default: 0.5)

Returns: str - Compressed text

import infiniloom

long_text = "... your long text content ..."
compressed = infiniloom.semantic_compress(long_text, budget_ratio=0.3)
print(compressed)

scan_security(path)

Scan repository for security issues.

Parameters:

  • path (str): Path to the repository

Returns: list[dict] - List of security findings with:

  • file: File path
  • line: Line number
  • severity: Severity level ("Critical", "High", "Medium", "Low", "Info")
  • kind: Type of finding (e.g., "aws_access_key", "github_token")
  • pattern: The matched pattern
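
One possible use of these fields is a simple gate that keeps only findings at or above a chosen severity. The sketch below relies only on the documented keys and severity names; findings_at_or_above and the sample findings (including the "generic_secret" kind) are hypothetical.

```python
# Severity levels in the order documented above, most severe first.
SEVERITY_ORDER = ["Critical", "High", "Medium", "Low", "Info"]


def findings_at_or_above(findings, threshold="High"):
    """Return findings whose severity is `threshold` or worse, most severe first."""
    rank = {name: i for i, name in enumerate(SEVERITY_ORDER)}
    limit = rank[threshold]
    selected = [f for f in findings if rank[f["severity"]] <= limit]
    return sorted(selected, key=lambda f: rank[f["severity"]])


# Illustrative findings shaped like the documented return value; not real output.
sample_findings = [
    {"file": "config.py", "line": 3, "severity": "Medium",
     "kind": "generic_secret", "pattern": "..."},
    {"file": "deploy.sh", "line": 10, "severity": "Critical",
     "kind": "aws_access_key", "pattern": "..."},
    {"file": "ci.yml", "line": 7, "severity": "High",
     "kind": "github_token", "pattern": "..."},
]

blocking = findings_at_or_above(sample_findings, "High")
for f in blocking:
    print(f"{f['severity']}: {f['kind']} at {f['file']}:{f['line']}")
```

A CI script could exit with a non-zero status whenever the returned list is non-empty.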

is_git_repo(path)

Check if a path is a git repository.

Parameters:

  • path (str): Path to check

Returns: bool - True if path is a git repository, False otherwise

from infiniloom import is_git_repo

if is_git_repo("/path/to/repo"):
    print("This is a git repository")

Classes

Infiniloom(path)

Object-oriented interface for repository analysis.

Methods:

load(include_hidden=False, respect_gitignore=True)

Load the repository into memory.

stats()

Get repository statistics. Returns the same structure as the scan() function.

pack(format="xml", model="claude", compression="balanced", map_budget=2000)

Pack the repository. Returns formatted string.

map(map_budget=2000, max_symbols=50)

Get repository map with key symbols. Returns dict with:

  • summary: Text summary
  • token_count: Estimated tokens
  • key_symbols: List of important symbols
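
The key_symbols list can be regrouped for display. The sketch below assumes each symbol carries the name, kind, and file keys used in the Quick Start example. The helper symbols_by_file, the sample data, and the kind values shown are made up for illustration.

```python
from collections import defaultdict


def symbols_by_file(repo_map):
    """Group the entries of repo_map["key_symbols"] by their "file" field."""
    grouped = defaultdict(list)
    for symbol in repo_map["key_symbols"]:
        grouped[symbol["file"]].append(f"{symbol['name']} ({symbol['kind']})")
    return dict(grouped)


# Illustrative stand-in for the dict returned by map(); values are made up.
sample_map = {
    "summary": "demo",
    "token_count": 120,
    "key_symbols": [
        {"name": "main", "kind": "function", "file": "src/main.py"},
        {"name": "Config", "kind": "class", "file": "src/config.py"},
        {"name": "run", "kind": "function", "file": "src/main.py"},
    ],
}

for path, names in symbols_by_file(sample_map).items():
    print(path, "->", ", ".join(names))
```
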

scan_security()

Scan for security issues. Returns list of findings.

files()

Get list of all files. Returns list of dicts with file metadata.

GitRepo(path)

Git repository wrapper for accessing git operations like status, diff, log, and blame.

Constructor:

  • path (str): Path to the git repository

Raises: InfiniloomError if path is not a git repository

Methods:

current_branch()

Get the current branch name.

Returns: str - Current branch name (e.g., "main", "feature/xyz")

current_commit()

Get the current commit hash.

Returns: str - Full SHA-1 hash of HEAD commit (40 characters)

status()

Get working tree status (both staged and unstaged changes).

Returns: list[dict] - List of file status objects with:

  • path: File path
  • status: Status type ("Added", "Modified", "Deleted", "Renamed", "Copied", "Unknown")
  • old_path: Old path for renames (optional)

log(count=10)

Get recent commits.

Parameters:

  • count (int): Maximum number of commits to return (default: 10)

Returns: list[dict] - List of commit objects with:

  • hash: Full commit hash
  • short_hash: Short commit hash (7 characters)
  • author: Author name
  • email: Author email
  • date: Commit date (ISO 8601 format)
  • message: Commit message (first line)

file_log(path, count=10)

Get commits that modified a specific file.

Parameters:

  • path (str): File path relative to repo root
  • count (int): Maximum number of commits to return (default: 10)

Returns: list[dict] - List of commits that modified the file
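
Commit dicts from log() and file_log() can be turned into short changelog entries. This is a minimal sketch that uses only the documented keys; changelog_lines and the sample commits are hypothetical.

```python
def changelog_lines(commits):
    """Render commit dicts as '<short_hash> <message> (<author>)' strings."""
    return [f"{c['short_hash']} {c['message']} ({c['author']})" for c in commits]


# Illustrative commits shaped like the documented return value; not real history.
sample_commits = [
    {"hash": "a" * 40, "short_hash": "aaaaaaa", "author": "Ada",
     "email": "ada@example.com", "date": "2025-01-02T10:00:00+00:00",
     "message": "Fix parser"},
    {"hash": "b" * 40, "short_hash": "bbbbbbb", "author": "Bob",
     "email": "bob@example.com", "date": "2025-01-01T09:30:00+00:00",
     "message": "Add tests"},
]

for entry in changelog_lines(sample_commits):
    print(entry)
```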

blame(path)

Get blame information for a file.

Parameters:

  • path (str): File path relative to repo root

Returns: list[dict] - List of blame line objects with:

  • commit: Commit hash that introduced the line
  • author: Author who wrote the line
  • date: Date when line was written
  • line_number: Line number (1-indexed)
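
Blame output lends itself to a quick per-author summary. The sketch below counts lines by the author key and reports each author's share of the file. The helper ownership and the sample lines are illustrative only.

```python
from collections import Counter


def ownership(blame_lines):
    """Return (author, line_count, share) tuples, largest contributor first."""
    counts = Counter(line["author"] for line in blame_lines)
    total = sum(counts.values())
    return [(author, n, n / total) for author, n in counts.most_common()]


# Illustrative blame entries shaped like the documented return value.
sample_blame = [
    {"commit": "a" * 40, "author": "Ada", "date": "2025-01-02", "line_number": 1},
    {"commit": "a" * 40, "author": "Ada", "date": "2025-01-02", "line_number": 2},
    {"commit": "b" * 40, "author": "Bob", "date": "2025-01-01", "line_number": 3},
    {"commit": "a" * 40, "author": "Ada", "date": "2025-01-02", "line_number": 4},
]

for author, lines, share in ownership(sample_blame):
    print(f"{author}: {lines} lines ({share:.0%})")
```
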

ls_files()

Get list of files tracked by git.

Returns: list[str] - List of file paths tracked by git

diff_files(from_ref, to_ref)

Get files changed between two commits.

Parameters:

  • from_ref (str): Starting commit/branch/tag
  • to_ref (str): Ending commit/branch/tag

Returns: list[dict] - List of changed files with:

  • path: File path
  • status: Status ("Added", "Modified", "Deleted", "Renamed", "Copied")
  • additions: Number of lines added
  • deletions: Number of lines deleted
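
The additions and deletions fields make it straightforward to summarize a change set. The following sketch totals both and ranks files by lines touched. The helper churn_summary and the sample entries are hypothetical.

```python
def churn_summary(changed_files):
    """Return (total_added, total_deleted, paths ordered by lines touched)."""
    total_added = sum(f["additions"] for f in changed_files)
    total_deleted = sum(f["deletions"] for f in changed_files)
    by_churn = sorted(
        changed_files,
        key=lambda f: f["additions"] + f["deletions"],
        reverse=True,
    )
    return total_added, total_deleted, [f["path"] for f in by_churn]


# Illustrative entries shaped like the documented return value of diff_files().
sample_changes = [
    {"path": "a.py", "status": "Modified", "additions": 10, "deletions": 2},
    {"path": "b.py", "status": "Added", "additions": 40, "deletions": 0},
    {"path": "c.py", "status": "Deleted", "additions": 0, "deletions": 15},
]

added, deleted, ranked = churn_summary(sample_changes)
print(f"+{added} -{deleted}, most changed first: {ranked}")
```
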

uncommitted_diff(path)

Get diff content for uncommitted changes in a file.

Parameters:

  • path (str): File path relative to repo root

Returns: str - Unified diff content

all_uncommitted_diffs()

Get diff for all uncommitted changes.

Returns: str - Combined unified diff for all changed files

has_changes(path)

Check if a file has uncommitted changes.

Parameters:

  • path (str): File path relative to repo root

Returns: bool - True if file has changes

last_modified_commit(path)

Get the last commit that modified a file.

Parameters:

  • path (str): File path relative to repo root

Returns: dict - Commit information object

file_change_frequency(path, days=30)

Get file change frequency in recent days.

Parameters:

  • path (str): File path relative to repo root
  • days (int): Number of days to look back (default: 30)

Returns: int - Number of commits that modified the file in the period

file_at_ref(path, git_ref)

Get file content at a specific git ref (commit, branch, tag).

Parameters:

  • path (str): File path relative to repo root
  • git_ref (str): Git ref (commit hash, branch name, tag, HEAD~n, etc.)

Returns: str - File content

from infiniloom import GitRepo

repo = GitRepo("/path/to/repo")
old_version = repo.file_at_ref("src/main.py", "HEAD~5")
main_version = repo.file_at_ref("src/main.py", "main")

diff_hunks(from_ref, to_ref, path=None)

Parse diff between two refs into structured hunks with line-level changes. Useful for PR review tools that need to post comments at specific lines.

Parameters:

  • from_ref (str): Starting ref (e.g., "main", "HEAD~5", commit hash)
  • to_ref (str): Ending ref (e.g., "HEAD", "feature-branch")
  • path (str, optional): File path to filter to a single file

Returns: list[dict] - List of diff hunks with:

  • old_start: Starting line in old file
  • old_count: Number of lines in old file
  • new_start: Starting line in new file
  • new_count: Number of lines in new file
  • header: Hunk header
  • lines: List of line dicts with change_type, old_line, new_line, content

from infiniloom import GitRepo

repo = GitRepo("/path/to/repo")
hunks = repo.diff_hunks("main", "HEAD", "src/index.py")
for hunk in hunks:
    print(f"Hunk at old:{hunk['old_start']} new:{hunk['new_start']}")
    for line in hunk['lines']:
        print(f"{line['change_type']}: {line['content']}")

uncommitted_hunks(path=None)

Parse uncommitted changes (working tree vs HEAD) into structured hunks.

Parameters:

  • path (str, optional): File path to filter to a single file

Returns: list[dict] - List of diff hunks for uncommitted changes

staged_hunks(path=None)

Parse staged changes into structured hunks.

Parameters:

  • path (str, optional): File path to filter to a single file

Returns: list[dict] - List of diff hunks for staged changes only
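
For review tooling, the hunk dicts returned by diff_hunks(), uncommitted_hunks(), and staged_hunks() can be reduced to the new-file line numbers that were added. The values that change_type takes are not listed here, so the sketch below treats the label for added lines as a parameter. The default "add", the helper added_lines, and the sample hunk are assumptions for illustration.

```python
def added_lines(hunks, added_type="add"):
    """Collect (new_line, content) pairs for lines marked with `added_type`.

    Assumption: `added_type` matches whatever label the library uses for
    added lines; "add" is a placeholder, not a documented value.
    """
    result = []
    for hunk in hunks:
        for line in hunk["lines"]:
            if line["change_type"] == added_type:
                result.append((line["new_line"], line["content"]))
    return result


# Illustrative hunk shaped like the documented structure; not real output.
sample_hunks = [{
    "old_start": 10, "old_count": 2, "new_start": 10, "new_count": 2,
    "header": "@@ -10,2 +10,2 @@",
    "lines": [
        {"change_type": "context", "old_line": 10, "new_line": 10,
         "content": "def f(x):"},
        {"change_type": "remove", "old_line": 11, "new_line": None,
         "content": "    return x"},
        {"change_type": "add", "old_line": None, "new_line": 11,
         "content": "    return x + 1"},
    ],
}]

for line_number, content in added_lines(sample_hunks):
    print(f"{line_number}: {content}")
```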

Example:

from infiniloom import GitRepo, is_git_repo

# Check if path is a git repo first
if is_git_repo("/path/to/repo"):
    repo = GitRepo("/path/to/repo")

    # Get current state
    print(f"Branch: {repo.current_branch()}")
    print(f"Commit: {repo.current_commit()}")

    # Get recent commits
    for commit in repo.log(count=5):
        print(f"{commit['short_hash']}: {commit['message']}")

    # Get file history
    for commit in repo.file_log("src/main.py", count=3):
        print(f"{commit['date']}: {commit['message']}")

    # Get blame information
    for line in repo.blame("src/main.py")[:10]:
        print(f"Line {line['line_number']}: {line['author']}")

    # Check for uncommitted changes
    if repo.has_changes("src/main.py"):
        diff = repo.uncommitted_diff("src/main.py")
        print(diff)

Async Functions

Infiniloom provides async versions of the main functions for use in async/await contexts. These use a thread pool executor to avoid blocking the event loop.

import asyncio
import infiniloom

async def main():
    # Pack repository asynchronously
    context = await infiniloom.pack_async("/path/to/repo", format="xml", model="claude")

    # Scan repository asynchronously
    stats = await infiniloom.scan_async("/path/to/repo")

    # Count tokens asynchronously
    tokens = await infiniloom.count_tokens_async("Hello, world!", model="claude")

    # Scan security asynchronously
    findings = await infiniloom.scan_security_async("/path/to/repo")

    # Semantic compress asynchronously
    long_text = "... your long text content ..."
    compressed = await infiniloom.semantic_compress_async(long_text, budget_ratio=0.3)

asyncio.run(main())

Available Async Functions

  • pack_async(path, format="xml", model="claude", compression="balanced", ...) - Async pack
  • scan_async(path, include_hidden=False, respect_gitignore=True) - Async scan
  • count_tokens_async(text, model="claude") - Async token counting
  • scan_security_async(path) - Async security scanning
  • semantic_compress_async(text, similarity_threshold=0.7, budget_ratio=0.5) - Async compression
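
For readers unfamiliar with the thread pool approach mentioned above, here is a sketch of the general pattern using only the standard library. It is not Infiniloom's actual implementation. blocking_work is a stand-in for any blocking call, and run_blocking is a hypothetical helper.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_work(name, seconds):
    """Stand-in for a blocking call such as a repository scan."""
    time.sleep(seconds)
    return f"{name} done"


async def run_blocking(executor, func, *args):
    """Run a blocking function on the executor so the event loop stays free."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, func, *args)


async def main():
    with ThreadPoolExecutor(max_workers=2) as executor:
        # Both calls sleep at the same time on separate worker threads.
        return await asyncio.gather(
            run_blocking(executor, blocking_work, "scan", 0.1),
            run_blocking(executor, blocking_work, "pack", 0.1),
        )


results = asyncio.run(main())
print(results)
```

The same asyncio.gather call can be used with the async functions listed above to run several of them concurrently.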

Formats

XML (Claude-optimized)

Best for Claude models. Uses XML structure that Claude understands well.

context = infiniloom.pack("/path/to/repo", format="xml", model="claude")

Markdown (GPT-optimized)

Best for GPT models. Uses Markdown with clear hierarchical structure.

context = infiniloom.pack("/path/to/repo", format="markdown", model="gpt-4o")

JSON

Generic JSON format for programmatic processing.

context = infiniloom.pack("/path/to/repo", format="json")

YAML (Gemini-optimized)

Best for Gemini. Place your query at the end of the prompt, after the packed context.

context = infiniloom.pack("/path/to/repo", format="yaml", model="gemini")

TOON (Token-Efficient)

Most token-efficient format (~40% smaller than JSON). Best for limited context windows.

context = infiniloom.pack("/path/to/repo", format="toon")
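
The recommendations in this section can be captured in a small lookup. The sketch below maps a model name to the suggested format by prefix. recommended_format is a hypothetical helper, and falling back to "json" for other model families is an assumption rather than a documented rule.

```python
def recommended_format(model):
    """Map a model name to the format this section recommends for its family."""
    name = model.lower()
    if name.startswith("claude"):
        return "xml"
    if name.startswith("gpt"):
        return "markdown"
    if name.startswith("gemini"):
        return "yaml"
    # Assumption: use the generic JSON format for any other family.
    return "json"


for model in ["claude", "gpt-4o", "gemini", "llama"]:
    print(model, "->", recommended_format(model))
```

The result can be passed as the format argument of pack(). Use "toon" instead when the context window is the main constraint.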

Compression Levels

  • none: No compression (0% reduction)
  • minimal: Remove empty lines, trim whitespace (15% reduction)
  • balanced: Remove comments, normalize whitespace (35% reduction) - Default
  • aggressive: Remove docstrings, keep signatures only (60% reduction)
  • extreme: Key symbols only (80% reduction)
  • focused: Key symbols with small context (75% reduction)
  • semantic: Heuristic semantic compression (~60-70% reduction)
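
The nominal reductions above can be used to pick the least aggressive level that should fit a token budget. This is a rough estimate only, since actual reduction varies by repository. lightest_level is a hypothetical helper, and "semantic" is left out because it is given as a range.

```python
# Nominal reductions (percent) taken from the list above, mildest first.
REDUCTION_PERCENT = [
    ("none", 0),
    ("minimal", 15),
    ("balanced", 35),
    ("aggressive", 60),
    ("focused", 75),
    ("extreme", 80),
]


def lightest_level(total_tokens, budget):
    """Return (level, estimated_tokens) for the mildest level that fits, else None."""
    for level, percent in REDUCTION_PERCENT:
        estimated = total_tokens * (100 - percent) // 100
        if estimated <= budget:
            return level, estimated
    return None


print(lightest_level(100_000, 70_000))  # ('balanced', 65000)
print(lightest_level(100_000, 10_000))  # None
```

The total_tokens input could come from the statistics returned by scan(), and the chosen level can be passed as the compression argument of pack().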

Integration Examples

With Anthropic Claude

import infiniloom
import anthropic

# Generate context
context = infiniloom.pack(
    "/path/to/repo",
    format="xml",
    model="claude",
    compression="balanced"
)

# Send to Claude
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": f"{context}\n\nExplain the architecture of this codebase."
    }]
)
print(response.content[0].text)

With OpenAI GPT

import infiniloom
import openai

context = infiniloom.pack("/path/to/repo", format="markdown", model="gpt-4o")

client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": f"{context}\n\nWhat are the main components?"
    }]
)
print(response.choices[0].message.content)

With Google Gemini

import infiniloom
import google.generativeai as genai

context = infiniloom.pack("/path/to/repo", format="yaml", model="gemini")

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content(f"{context}\n\nSummarize this codebase")
print(response.text)

Advanced Usage

Custom Token Budget

from infiniloom import Infiniloom

loom = Infiniloom("/large/repo")

# Generate smaller context for models with limited context windows
compact_map = loom.map(map_budget=1000, max_symbols=25)

# Generate larger context for models with large context windows
detailed_map = loom.map(map_budget=5000, max_symbols=200)

Security Scanning

from infiniloom import Infiniloom

loom = Infiniloom("/path/to/repo")
findings = loom.scan_security()

# Filter by severity
critical = [f for f in findings if f['severity'] == 'Critical']
high = [f for f in findings if f['severity'] == 'High']

print(f"Critical: {len(critical)}, High: {len(high)}")

for finding in critical:
    print(f"{finding['file']}:{finding['line']}")
    print(f"  {finding['kind']}: {finding['pattern']}")

File Filtering

from infiniloom import Infiniloom

loom = Infiniloom("/path/to/repo")
files = loom.files()

# Get Python files only
python_files = [f for f in files if f['language'] == 'python']

# Get high-importance files
important_files = [f for f in files if f['importance'] > 0.7]

# Get large files
large_files = [f for f in files if f['tokens'] > 1000]
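
Beyond filtering, the same file dicts can be aggregated. The sketch below totals tokens per language using the language and tokens keys shown above. tokens_by_language and the sample entries are illustrative.

```python
from collections import Counter


def tokens_by_language(files):
    """Return (language, total_tokens) pairs, largest total first."""
    totals = Counter()
    for f in files:
        totals[f["language"]] += f["tokens"]
    return totals.most_common()


# Illustrative entries shaped like the dicts returned by files(); values are made up.
sample_files = [
    {"path": "src/main.py", "language": "python", "tokens": 1200},
    {"path": "src/lib.rs", "language": "rust", "tokens": 900},
    {"path": "src/util.py", "language": "python", "tokens": 300},
]

for language, total in tokens_by_language(sample_files):
    print(f"{language}: {total} tokens")
```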

Performance

Infiniloom is built in Rust for maximum performance:

  • Fast scanning: Parallel file processing with ignore patterns
  • Memory efficient: Streaming processing, optional content loading
  • Native speed: No Python overhead for core operations

Requirements

  • Python 3.8+
  • Rust 1.91+ (for building from source)

License

MIT License - see LICENSE for details.
