Skip to main content

Python bindings for infiniloom - Repository context engine for LLMs

Project description

Infiniloom Python Bindings

Python bindings for Infiniloom - a repository context engine for Large Language Models.

Installation

pip install infiniloom

Building from Source

git clone https://github.com/Topos-Labs/infiniloom.git
cd infiniloom/bindings/python
pip install maturin
maturin develop  # For development
maturin build --release  # For production wheel

Quick Start

Functional API

import infiniloom

# Pack a repository into Claude-optimized XML
context = infiniloom.pack("/path/to/repo", format="xml", model="claude")
print(context)

# Scan repository and get statistics
stats = infiniloom.scan("/path/to/repo")
print(f"Files: {stats['total_files']}")
print(f"Languages: {stats['languages']}")

# Count tokens for a specific model
tokens = infiniloom.count_tokens("Hello, world!", model="claude")
print(f"Tokens: {tokens}")

Object-Oriented API

from infiniloom import Infiniloom

# Create an Infiniloom instance
loom = Infiniloom("/path/to/repo")

# Get repository statistics
stats = loom.stats()
print(stats)

# Generate repository context
context = loom.pack(format="xml", model="claude", compression="balanced")

# Get repository map with important symbols
repo_map = loom.map(map_budget=2000, max_symbols=50)
for symbol in repo_map['key_symbols']:
    print(f"{symbol['name']} ({symbol['kind']}) in {symbol['file']}")

# Scan for security issues
findings = loom.scan_security()
for finding in findings:
    print(f"{finding['severity']}: {finding['message']} at {finding['file']}:{finding['line']}")

# List all files
files = loom.files()
for file in files:
    print(f"{file['path']} - {file['language']} ({file['tokens']} tokens)")

API Reference

Functions

pack(path, format="xml", model="claude", compression="balanced", map_budget=2000, max_symbols=50)

Pack a repository into an LLM-optimized format.

Parameters:

  • path (str): Path to the repository
  • format (str): Output format - "xml", "markdown", "json", "yaml", "toon", or "plain"
  • model (str): Target model for token counting. Supports:
    • OpenAI GPT-5.x: "gpt-5.2", "gpt-5.2-pro", "gpt-5.1", "gpt-5.1-mini", "gpt-5.1-codex", "gpt-5", "gpt-5-mini", "gpt-5-nano"
    • OpenAI O-series: "o4-mini", "o3", "o3-mini", "o1", "o1-mini", "o1-preview"
    • OpenAI GPT-4: "gpt-4o", "gpt-4o-mini", "gpt-4", "gpt-3.5-turbo"
    • Anthropic: "claude" (default)
    • Google: "gemini"
    • Meta: "llama", "codellama"
    • Others: "deepseek", "mistral", "qwen", "cohere", "grok"
  • compression (str): Compression level - "none", "minimal", "balanced", "aggressive", "extreme", "focused", "semantic"
  • map_budget (int): Token budget for repository map (default: 2000)
  • max_symbols (int): Maximum symbols to include (default: 50)

Returns: str - Formatted repository context

scan(path, include_hidden=False, respect_gitignore=True)

Scan a repository and return statistics.

Parameters:

  • path (str): Path to the repository
  • include_hidden (bool): Include hidden files (default: False)
  • respect_gitignore (bool): Respect .gitignore files (default: True)

Returns: dict - Repository statistics including:

  • name: Repository name
  • path: Absolute path
  • total_files: Number of files
  • total_lines: Total lines of code
  • total_tokens: Token counts for each model
  • languages: Language breakdown
  • branch: Git branch (if available)
  • commit: Git commit hash (if available)

count_tokens(text, model="claude")

Count tokens in text for a specific model.

Parameters:

  • text (str): Text to count tokens for
  • model (str): Target model. Supports all models listed above in pack(), including GPT-5.x series

Returns: int - Number of tokens (exact for OpenAI models via tiktoken, calibrated estimates for others)

semantic_compress(text, similarity_threshold=0.7, budget_ratio=0.5)

Compress text using semantic compression while preserving important content.

Parameters:

  • text (str): Text to compress
  • similarity_threshold (float): Threshold for grouping similar chunks (0.0-1.0, default: 0.7)
  • budget_ratio (float): Target size as ratio of original (0.0-1.0, default: 0.5)

Returns: str - Compressed text

import infiniloom

long_text = "... your long text content ..."
compressed = infiniloom.semantic_compress(long_text, budget_ratio=0.3)
print(compressed)

scan_security(path)

Scan repository for security issues.

Parameters:

  • path (str): Path to the repository

Returns: list[dict] - List of security findings with:

  • file: File path
  • line: Line number
  • severity: Severity level ("Critical", "High", "Medium", "Low", "Info")
  • kind: Type of finding (e.g., "aws_access_key", "github_token")
  • pattern: The matched pattern

is_git_repo(path)

Check if a path is a git repository.

Parameters:

  • path (str): Path to check

Returns: bool - True if path is a git repository, False otherwise

from infiniloom import is_git_repo

if is_git_repo("/path/to/repo"):
    print("This is a git repository")

Classes

Infiniloom(path)

Object-oriented interface for repository analysis.

Methods:

load(include_hidden=False, respect_gitignore=True)

Load the repository into memory.

stats()

Get repository statistics. Returns same structure as scan() function.

pack(format="xml", model="claude", compression="balanced", map_budget=2000)

Pack the repository. Returns formatted string.

map(map_budget=2000, max_symbols=50)

Get repository map with key symbols. Returns dict with:

  • summary: Text summary
  • token_count: Estimated tokens
  • key_symbols: List of important symbols
scan_security()

Scan for security issues. Returns list of findings.

files()

Get list of all files. Returns list of dicts with file metadata.

GitRepo(path)

Git repository wrapper for accessing git operations like status, diff, log, and blame.

Constructor:

  • path (str): Path to the git repository

Raises: InfiniloomError if path is not a git repository

Methods:

current_branch()

Get the current branch name.

Returns: str - Current branch name (e.g., "main", "feature/xyz")

current_commit()

Get the current commit hash.

Returns: str - Full SHA-1 hash of HEAD commit (40 characters)

status()

Get working tree status (both staged and unstaged changes).

Returns: list[dict] - List of file status objects with:

  • path: File path
  • status: Status type ("Added", "Modified", "Deleted", "Renamed", "Copied", "Unknown")
  • old_path: Old path for renames (optional)
log(count=10)

Get recent commits.

Parameters:

  • count (int): Maximum number of commits to return (default: 10)

Returns: list[dict] - List of commit objects with:

  • hash: Full commit hash
  • short_hash: Short commit hash (7 characters)
  • author: Author name
  • email: Author email
  • date: Commit date (ISO 8601 format)
  • message: Commit message (first line)
file_log(path, count=10)

Get commits that modified a specific file.

Parameters:

  • path (str): File path relative to repo root
  • count (int): Maximum number of commits to return (default: 10)

Returns: list[dict] - List of commits that modified the file

blame(path)

Get blame information for a file.

Parameters:

  • path (str): File path relative to repo root

Returns: list[dict] - List of blame line objects with:

  • commit: Commit hash that introduced the line
  • author: Author who wrote the line
  • date: Date when line was written
  • line_number: Line number (1-indexed)
ls_files()

Get list of files tracked by git.

Returns: list[str] - Array of file paths tracked by git

diff_files(from_ref, to_ref)

Get files changed between two commits.

Parameters:

  • from_ref (str): Starting commit/branch/tag
  • to_ref (str): Ending commit/branch/tag

Returns: list[dict] - List of changed files with:

  • path: File path
  • status: Status ("Added", "Modified", "Deleted", "Renamed", "Copied")
  • additions: Number of lines added
  • deletions: Number of lines deleted
uncommitted_diff(path)

Get diff content for uncommitted changes in a file.

Parameters:

  • path (str): File path relative to repo root

Returns: str - Unified diff content

all_uncommitted_diffs()

Get diff for all uncommitted changes.

Returns: str - Combined unified diff for all changed files

has_changes(path)

Check if a file has uncommitted changes.

Parameters:

  • path (str): File path relative to repo root

Returns: bool - True if file has changes

last_modified_commit(path)

Get the last commit that modified a file.

Parameters:

  • path (str): File path relative to repo root

Returns: dict - Commit information object

file_change_frequency(path, days=30)

Get file change frequency in recent days.

Parameters:

  • path (str): File path relative to repo root
  • days (int): Number of days to look back (default: 30)

Returns: int - Number of commits that modified the file in the period

Example:

from infiniloom import GitRepo, is_git_repo

# Check if path is a git repo first
if is_git_repo("/path/to/repo"):
    repo = GitRepo("/path/to/repo")

    # Get current state
    print(f"Branch: {repo.current_branch()}")
    print(f"Commit: {repo.current_commit()}")

    # Get recent commits
    for commit in repo.log(count=5):
        print(f"{commit['short_hash']}: {commit['message']}")

    # Get file history
    for commit in repo.file_log("src/main.py", count=3):
        print(f"{commit['date']}: {commit['message']}")

    # Get blame information
    for line in repo.blame("src/main.py")[:10]:
        print(f"Line {line['line_number']}: {line['author']}")

    # Check for uncommitted changes
    if repo.has_changes("src/main.py"):
        diff = repo.uncommitted_diff("src/main.py")
        print(diff)

Formats

XML (Claude-optimized)

Best for Claude models. Uses XML structure that Claude understands well.

context = infiniloom.pack("/path/to/repo", format="xml", model="claude")

Markdown (GPT-optimized)

Best for GPT models. Uses Markdown with clear hierarchical structure.

context = infiniloom.pack("/path/to/repo", format="markdown", model="gpt")

JSON

Generic JSON format for programmatic processing.

context = infiniloom.pack("/path/to/repo", format="json")

YAML (Gemini-optimized)

Best for Gemini. Query should be placed at the end.

context = infiniloom.pack("/path/to/repo", format="yaml", model="gemini")

TOON (Token-Efficient)

Most token-efficient format (~40% smaller than JSON). Best for limited context windows.

context = infiniloom.pack("/path/to/repo", format="toon")

Compression Levels

  • none: No compression (0% reduction)
  • minimal: Remove empty lines, trim whitespace (15% reduction)
  • balanced: Remove comments, normalize whitespace (35% reduction) - Default
  • aggressive: Remove docstrings, keep signatures only (60% reduction)
  • extreme: Key symbols only (80% reduction)
  • focused: Key symbols with small context (75% reduction)
  • semantic: Heuristic semantic compression (~60-70% reduction)

Integration Examples

With Anthropic Claude

import infiniloom
import anthropic

# Generate context
context = infiniloom.pack(
    "/path/to/repo",
    format="xml",
    model="claude",
    compression="balanced"
)

# Send to Claude
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": f"{context}\n\nExplain the architecture of this codebase."
    }]
)
print(response.content[0].text)

With OpenAI GPT

import infiniloom
import openai

context = infiniloom.pack("/path/to/repo", format="markdown", model="gpt")

client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": f"{context}\n\nWhat are the main components?"
    }]
)
print(response.choices[0].message.content)

With Google Gemini

import infiniloom
import google.generativeai as genai

context = infiniloom.pack("/path/to/repo", format="yaml", model="gemini")

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content(f"{context}\n\nSummarize this codebase")
print(response.text)

Advanced Usage

Custom Token Budget

from infiniloom import Infiniloom

loom = Infiniloom("/large/repo")

# Generate smaller context for models with limited context windows
compact_map = loom.map(map_budget=1000, max_symbols=25)

# Generate larger context for models with large context windows
detailed_map = loom.map(map_budget=5000, max_symbols=200)

Security Scanning

from infiniloom import Infiniloom

loom = Infiniloom("/path/to/repo")
findings = loom.scan_security()

# Filter by severity
critical = [f for f in findings if f['severity'] == 'Critical']
high = [f for f in findings if f['severity'] == 'High']

print(f"Critical: {len(critical)}, High: {len(high)}")

for finding in critical:
    print(f"{finding['file']}:{finding['line']}")
    print(f"  {finding['category']}: {finding['message']}")

File Filtering

from infiniloom import Infiniloom

loom = Infiniloom("/path/to/repo")
files = loom.files()

# Get Python files only
python_files = [f for f in files if f['language'] == 'python']

# Get high-importance files
important_files = [f for f in files if f['importance'] > 0.7]

# Get large files
large_files = [f for f in files if f['tokens'] > 1000]

Performance

Infiniloom is built in Rust for maximum performance:

  • Fast scanning: Parallel file processing with ignore patterns
  • Memory efficient: Streaming processing, optional content loading
  • Native speed: No Python overhead for core operations

Requirements

  • Python 3.8+
  • Rust 1.91+ (for building from source)

License

MIT License - see LICENSE for details.

Links

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

infiniloom-0.3.1.tar.gz (255.5 kB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

infiniloom-0.3.1-cp38-abi3-win_amd64.whl (7.7 MB view details)

Uploaded CPython 3.8+Windows x86-64

infiniloom-0.3.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB view details)

Uploaded CPython 3.8+manylinux: glibc 2.17+ x86-64

infiniloom-0.3.1-cp38-abi3-macosx_11_0_arm64.whl (7.9 MB view details)

Uploaded CPython 3.8+macOS 11.0+ ARM64

File details

Details for the file infiniloom-0.3.1.tar.gz.

File metadata

  • Download URL: infiniloom-0.3.1.tar.gz
  • Upload date:
  • Size: 255.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for infiniloom-0.3.1.tar.gz
Algorithm Hash digest
SHA256 e0636240860afd0c60a24ced4b0324111897b8c7f0e846df079d815537aae729
MD5 40211a32830c2af0b94fc035e9f610f6
BLAKE2b-256 9bb1b2376a2eae91c2bebcd5fd5ff5417b47bf10e5f1b0f2724026171e18b962

See more details on using hashes here.

File details

Details for the file infiniloom-0.3.1-cp38-abi3-win_amd64.whl.

File metadata

  • Download URL: infiniloom-0.3.1-cp38-abi3-win_amd64.whl
  • Upload date:
  • Size: 7.7 MB
  • Tags: CPython 3.8+, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for infiniloom-0.3.1-cp38-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 8e103cc44ec550bb5fb79b448d597ec95b4579cbd678c5f47be31782866b473c
MD5 ec2d94b3111c85b7bc8cca932b8ae251
BLAKE2b-256 14b0b06eef41385416a3edc2d7b4d31e410ba7707a5b6787875d92a40c026113

See more details on using hashes here.

File details

Details for the file infiniloom-0.3.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for infiniloom-0.3.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 895636a75f4c447ce72c519b0a8b627276a6b163ce5a84d6dc0f8552c9fb22c9
MD5 3992854f702baa4e1b57fadfd13ae92f
BLAKE2b-256 2bebc807da862acc9c65d39da3041925499befe33b3ceac9d5c26f681cbbc2df

See more details on using hashes here.

File details

Details for the file infiniloom-0.3.1-cp38-abi3-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for infiniloom-0.3.1-cp38-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 59da292980f49371a68aaab27d8372059fd80773776c25ce401fddc18065ed7b
MD5 5621072f68bcf98008090581a4a934a8
BLAKE2b-256 882b09f5e8000988cc623cdae774aa483f7eb4a4d6b5fbbda3681cb1da0c4e5d

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page