Python bindings for infiniloom - Repository context engine for LLMs

Infiniloom Python Bindings

Python bindings for Infiniloom - a repository context engine for Large Language Models.

Installation

pip install infiniloom

Building from Source

git clone https://github.com/Topos-Labs/infiniloom.git
cd infiniloom/bindings/python
pip install maturin
maturin develop  # For development
maturin build --release  # For production wheel

Quick Start

Functional API

import infiniloom

# Pack a repository into Claude-optimized XML
context = infiniloom.pack("/path/to/repo", format="xml", model="claude")
print(context)

# Scan repository and get statistics
stats = infiniloom.scan("/path/to/repo")
print(f"Files: {stats['total_files']}")
print(f"Languages: {stats['languages']}")

# Count tokens for a specific model
tokens = infiniloom.count_tokens("Hello, world!", model="claude")
print(f"Tokens: {tokens}")

Object-Oriented API

from infiniloom import Infiniloom

# Create an Infiniloom instance
loom = Infiniloom("/path/to/repo")

# Get repository statistics
stats = loom.stats()
print(stats)

# Generate repository context
context = loom.pack(format="xml", model="claude", compression="balanced")

# Get repository map with important symbols
repo_map = loom.map(map_budget=2000, max_symbols=50)
for symbol in repo_map['key_symbols']:
    print(f"{symbol['name']} ({symbol['kind']}) in {symbol['file']}")

# Scan for security issues
findings = loom.scan_security()
for finding in findings:
    print(f"{finding['severity']}: {finding['kind']} at {finding['file']}:{finding['line']}")

# List all files
files = loom.files()
for file in files:
    print(f"{file['path']} - {file['language']} ({file['tokens']} tokens)")

API Reference

Functions

pack(path, format="xml", model="claude", compression="balanced", map_budget=2000, max_symbols=50)

Pack a repository into an LLM-optimized format.

Parameters:

  • path (str): Path to the repository
  • format (str): Output format - "xml", "markdown", "json", "yaml", "toon", or "plain"
  • model (str): Target model for token counting. Supports:
    • OpenAI GPT-5.x: "gpt-5.2", "gpt-5.2-pro", "gpt-5.1", "gpt-5.1-mini", "gpt-5.1-codex", "gpt-5", "gpt-5-mini", "gpt-5-nano"
    • OpenAI O-series: "o4-mini", "o3", "o3-mini", "o1", "o1-mini", "o1-preview"
    • OpenAI GPT-4: "gpt-4o", "gpt-4o-mini", "gpt-4", "gpt-3.5-turbo"
    • Anthropic: "claude" (default)
    • Google: "gemini"
    • Meta: "llama", "codellama"
    • Others: "deepseek", "mistral", "qwen", "cohere", "grok"
  • compression (str): Compression level - "none", "minimal", "balanced", "aggressive", "extreme", "focused", "semantic"
  • map_budget (int): Token budget for repository map (default: 2000)
  • max_symbols (int): Maximum symbols to include (default: 50)

Returns: str - Formatted repository context

scan(path, include_hidden=False, respect_gitignore=True)

Scan a repository and return statistics.

Parameters:

  • path (str): Path to the repository
  • include_hidden (bool): Include hidden files (default: False)
  • respect_gitignore (bool): Respect .gitignore files (default: True)

Returns: dict - Repository statistics including:

  • name: Repository name
  • path: Absolute path
  • total_files: Number of files
  • total_lines: Total lines of code
  • total_tokens: Token counts for each model
  • languages: Language breakdown
  • branch: Git branch (if available)
  • commit: Git commit hash (if available)
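As a small illustration of consuming this dictionary, the sketch below formats a one-line summary. It relies only on the keys listed above. The `describe_stats` helper and the sample values are hypothetical, and it assumes `total_tokens` is a mapping from model name to token count, which the field description suggests but does not spell out.

```python
def describe_stats(stats, model="claude"):
    """Format a one-line summary from a scan()-style statistics dict."""
    # Assumption: total_tokens maps model names to token counts.
    tokens = stats.get("total_tokens", {}).get(model, "n/a")
    # branch is only present when git information is available.
    branch = stats.get("branch") or "no branch"
    return (
        f"{stats['name']} [{branch}]: {stats['total_files']} files, "
        f"{stats['total_lines']} lines, {tokens} {model} tokens"
    )


# Hypothetical sample data shaped like the return value of scan().
sample = {
    "name": "demo",
    "path": "/path/to/repo",
    "total_files": 12,
    "total_lines": 3400,
    "total_tokens": {"claude": 41000},
    "branch": "main",
}
print(describe_stats(sample))
```

With a real repository, the same helper could be called as `describe_stats(infiniloom.scan("/path/to/repo"))`.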

count_tokens(text, model="claude")

Count tokens in text for a specific model.

Parameters:

  • text (str): Text to count tokens for
  • model (str): Target model. Supports all models listed above in pack(), including GPT-5.x series

Returns: int - Number of tokens (exact for OpenAI models via tiktoken, calibrated estimates for others)
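One common use of a token count is checking whether a prompt fits a budget before sending it. The sketch below is a hypothetical helper, not part of the library: it takes the counting function as a parameter so it can be tried without infiniloom installed, and `rough_count` is a crude stand-in used only for demonstration.

```python
def fits_budget(text, budget, count):
    """Return True when count(text) does not exceed the token budget."""
    return count(text) <= budget


def rough_count(text):
    # Stand-in counter for demonstration only:
    # one "token" per whitespace-separated word.
    return len(text.split())


print(fits_budget("Hello, world!", budget=4, count=rough_count))
```

In practice the counter would presumably wrap the function documented above, for example `count=lambda t: infiniloom.count_tokens(t, model="claude")`.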

semantic_compress(text, similarity_threshold=0.7, budget_ratio=0.5)

Compress text using semantic compression while preserving important content.

Parameters:

  • text (str): Text to compress
  • similarity_threshold (float): Threshold for grouping similar chunks (0.0-1.0, default: 0.7)
  • budget_ratio (float): Target size as ratio of original (0.0-1.0, default: 0.5)

Returns: str - Compressed text

import infiniloom

long_text = "... your long text content ..."
compressed = infiniloom.semantic_compress(long_text, budget_ratio=0.3)
print(compressed)

scan_security(path)

Scan repository for security issues.

Parameters:

  • path (str): Path to the repository

Returns: list[dict] - List of security findings with:

  • file: File path
  • line: Line number
  • severity: Severity level ("Critical", "High", "Medium", "Low", "Info")
  • kind: Type of finding (e.g., "aws_access_key", "github_token")
  • pattern: The matched pattern
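Because each finding is a plain dict, ordinary Python is enough to triage the results. The sketch below sorts findings from most to least severe using the severity names listed above. `sort_findings` and the sample data are hypothetical, and the `pattern` values are placeholders rather than real matches.

```python
# Severity names as documented above, most severe first.
SEVERITY_ORDER = ["Critical", "High", "Medium", "Low", "Info"]


def sort_findings(findings):
    """Sort findings by severity, then by file path and line number."""
    rank = {name: i for i, name in enumerate(SEVERITY_ORDER)}
    return sorted(
        findings,
        # Unknown severity names sort after all known ones.
        key=lambda f: (rank.get(f["severity"], len(rank)), f["file"], f["line"]),
    )


# Hypothetical findings shaped like the return value of scan_security().
sample = [
    {"file": "b.py", "line": 3, "severity": "Low", "kind": "github_token", "pattern": "<redacted>"},
    {"file": "a.py", "line": 10, "severity": "Critical", "kind": "aws_access_key", "pattern": "<redacted>"},
    {"file": "a.py", "line": 2, "severity": "High", "kind": "github_token", "pattern": "<redacted>"},
    {"file": "c.py", "line": 1, "severity": "Critical", "kind": "aws_access_key", "pattern": "<redacted>"},
]

for finding in sort_findings(sample):
    print(f"{finding['severity']:<8} {finding['file']}:{finding['line']} ({finding['kind']})")
```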

is_git_repo(path)

Check if a path is a git repository.

Parameters:

  • path (str): Path to check

Returns: bool - True if path is a git repository, False otherwise

from infiniloom import is_git_repo

if is_git_repo("/path/to/repo"):
    print("This is a git repository")

Classes

Infiniloom(path)

Object-oriented interface for repository analysis.

Methods:

load(include_hidden=False, respect_gitignore=True)

Load the repository into memory.

stats()

Get repository statistics. Returns the same structure as the scan() function.

pack(format="xml", model="claude", compression="balanced", map_budget=2000)

Pack the repository. Returns formatted string.

map(map_budget=2000, max_symbols=50)

Get repository map with key symbols. Returns dict with:

  • summary: Text summary
  • token_count: Estimated tokens
  • key_symbols: List of important symbols

scan_security()

Scan for security issues. Returns list of findings.

files()

Get list of all files. Returns list of dicts with file metadata.

GitRepo(path)

Git repository wrapper for accessing git operations like status, diff, log, and blame.

Constructor:

  • path (str): Path to the git repository

Raises: InfiniloomError if path is not a git repository

Methods:

current_branch()

Get the current branch name.

Returns: str - Current branch name (e.g., "main", "feature/xyz")

current_commit()

Get the current commit hash.

Returns: str - Full SHA-1 hash of HEAD commit (40 characters)

status()

Get working tree status (both staged and unstaged changes).

Returns: list[dict] - List of file status objects with:

  • path: File path
  • status: Status type ("Added", "Modified", "Deleted", "Renamed", "Copied", "Unknown")
  • old_path: Old path for renames (optional)

log(count=10)

Get recent commits.

Parameters:

  • count (int): Maximum number of commits to return (default: 10)

Returns: list[dict] - List of commit objects with:

  • hash: Full commit hash
  • short_hash: Short commit hash (7 characters)
  • author: Author name
  • email: Author email
  • date: Commit date (ISO 8601 format)
  • message: Commit message (first line)

file_log(path, count=10)

Get commits that modified a specific file.

Parameters:

  • path (str): File path relative to repo root
  • count (int): Maximum number of commits to return (default: 10)

Returns: list[dict] - List of commits that modified the file
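The commit dicts returned by `log()` and `file_log()` are easy to aggregate. As a minimal sketch, the hypothetical helper below counts commits per author. It uses only the `author` key documented for `log()` and assumes `file_log()` returns commits in the same shape. The sample commits are invented for illustration.

```python
from collections import Counter


def commits_per_author(commits):
    """Count commits per author name from a list of commit dicts."""
    return Counter(commit["author"] for commit in commits)


# Hypothetical commits shaped like the entries returned by log().
sample = [
    {"short_hash": "abc1234", "author": "Ada", "date": "2025-01-03T10:00:00Z", "message": "Add parser"},
    {"short_hash": "def5678", "author": "Grace", "date": "2025-01-02T09:30:00Z", "message": "Fix typo"},
    {"short_hash": "0123abc", "author": "Ada", "date": "2025-01-01T08:15:00Z", "message": "Initial commit"},
]

for author, count in commits_per_author(sample).most_common():
    print(f"{author}: {count}")
```

With a real repository, the input could be `repo.file_log("src/main.py", count=50)`.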

blame(path)

Get blame information for a file.

Parameters:

  • path (str): File path relative to repo root

Returns: list[dict] - List of blame line objects with:

  • commit: Commit hash that introduced the line
  • author: Author who wrote the line
  • date: Date when line was written
  • line_number: Line number (1-indexed)

ls_files()

Get list of files tracked by git.

Returns: list[str] - List of file paths tracked by git

diff_files(from_ref, to_ref)

Get files changed between two commits.

Parameters:

  • from_ref (str): Starting commit/branch/tag
  • to_ref (str): Ending commit/branch/tag

Returns: list[dict] - List of changed files with:

  • path: File path
  • status: Status ("Added", "Modified", "Deleted", "Renamed", "Copied")
  • additions: Number of lines added
  • deletions: Number of lines deleted

uncommitted_diff(path)

Get diff content for uncommitted changes in a file.

Parameters:

  • path (str): File path relative to repo root

Returns: str - Unified diff content

all_uncommitted_diffs()

Get diff for all uncommitted changes.

Returns: str - Combined unified diff for all changed files

has_changes(path)

Check if a file has uncommitted changes.

Parameters:

  • path (str): File path relative to repo root

Returns: bool - True if file has changes

last_modified_commit(path)

Get the last commit that modified a file.

Parameters:

  • path (str): File path relative to repo root

Returns: dict - Commit information object

file_change_frequency(path, days=30)

Get file change frequency in recent days.

Parameters:

  • path (str): File path relative to repo root
  • days (int): Number of days to look back (default: 30)

Returns: int - Number of commits that modified the file in the period
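A per-file change count lends itself to a simple "hot files" ranking. The sketch below is a hypothetical helper that takes the frequency lookup as a callable, so it can be exercised with fake data. `hottest_files` and `fake_counts` are illustrative names, not part of the library.

```python
def hottest_files(paths, frequency, top=3):
    """Rank paths by change count, most frequently changed first."""
    counts = [(path, frequency(path)) for path in paths]
    # Drop files with no changes in the period.
    counts = [item for item in counts if item[1] > 0]
    # Sort by descending count; ties are broken alphabetically by path.
    return sorted(counts, key=lambda item: (-item[1], item[0]))[:top]


# Fake change counts standing in for file_change_frequency() results.
fake_counts = {"src/main.py": 7, "README.md": 2, "setup.py": 0}
print(hottest_files(fake_counts, lambda path: fake_counts[path]))
```

Against a real repository, the call would presumably look like `hottest_files(repo.ls_files(), lambda p: repo.file_change_frequency(p, days=30))`.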

Example:

from infiniloom import GitRepo, is_git_repo

# Check if path is a git repo first
if is_git_repo("/path/to/repo"):
    repo = GitRepo("/path/to/repo")

    # Get current state
    print(f"Branch: {repo.current_branch()}")
    print(f"Commit: {repo.current_commit()}")

    # Get recent commits
    for commit in repo.log(count=5):
        print(f"{commit['short_hash']}: {commit['message']}")

    # Get file history
    for commit in repo.file_log("src/main.py", count=3):
        print(f"{commit['date']}: {commit['message']}")

    # Get blame information
    for line in repo.blame("src/main.py")[:10]:
        print(f"Line {line['line_number']}: {line['author']}")

    # Check for uncommitted changes
    if repo.has_changes("src/main.py"):
        diff = repo.uncommitted_diff("src/main.py")
        print(diff)
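The line-by-line output of `blame()` can be verbose for long files. As a sketch, the hypothetical helper below collapses consecutive lines from the same commit into ranges. It uses the `commit`, `author`, and `line_number` keys documented above and assumes the list is ordered by line number, which the source does not state explicitly.

```python
from itertools import groupby


def blame_ranges(blame_lines):
    """Collapse consecutive same-commit lines into (start, end, commit, author)."""
    ranges = []
    # groupby only merges adjacent items, which is what we want here.
    for commit, group in groupby(blame_lines, key=lambda line: line["commit"]):
        lines = list(group)
        ranges.append(
            (lines[0]["line_number"], lines[-1]["line_number"], commit, lines[0]["author"])
        )
    return ranges


# Hypothetical blame entries shaped like the return value of blame().
sample = [
    {"commit": "abc1234", "author": "Ada", "date": "2025-01-01", "line_number": 1},
    {"commit": "abc1234", "author": "Ada", "date": "2025-01-01", "line_number": 2},
    {"commit": "def5678", "author": "Grace", "date": "2025-01-02", "line_number": 3},
    {"commit": "abc1234", "author": "Ada", "date": "2025-01-01", "line_number": 4},
]

for start, end, commit, author in blame_ranges(sample):
    print(f"lines {start}-{end}: {commit} ({author})")
```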

Formats

XML (Claude-optimized)

Best for Claude models. Uses XML structure that Claude understands well.

context = infiniloom.pack("/path/to/repo", format="xml", model="claude")

Markdown (GPT-optimized)

Best for GPT models. Uses Markdown with clear hierarchical structure.

context = infiniloom.pack("/path/to/repo", format="markdown", model="gpt-4o")

JSON

Generic JSON format for programmatic processing.

context = infiniloom.pack("/path/to/repo", format="json")

YAML (Gemini-optimized)

Best for Gemini. Place your query after the packed context, at the end of the prompt.

context = infiniloom.pack("/path/to/repo", format="yaml", model="gemini")

TOON (Token-Efficient)

Most token-efficient format (~40% smaller than JSON). Best for limited context windows.

context = infiniloom.pack("/path/to/repo", format="toon")
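The pairings above can be captured in a small lookup. The sketch below is a hypothetical helper that maps a model name to the format this section recommends for it. Falling back to "xml" for other models is an assumption, chosen only because it is the default of `pack()`.

```python
def recommended_format(model):
    """Return the output format this section recommends for a model name."""
    name = model.lower()
    if name.startswith("claude"):
        return "xml"
    if name.startswith("gpt"):
        return "markdown"
    if name.startswith("gemini"):
        return "yaml"
    # Assumption: use the pack() default when no recommendation is given.
    return "xml"


for model in ("claude", "gpt-4o", "gemini", "mistral"):
    print(f"{model}: {recommended_format(model)}")
```

The result could then be passed along as `infiniloom.pack(path, format=recommended_format(model), model=model)`.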

Compression Levels

  • none: No compression (0% reduction)
  • minimal: Remove empty lines, trim whitespace (15% reduction)
  • balanced: Remove comments, normalize whitespace (35% reduction) - Default
  • aggressive: Remove docstrings, keep signatures only (60% reduction)
  • extreme: Key symbols only (80% reduction)
  • focused: Key symbols with small context (75% reduction)
  • semantic: Heuristic semantic compression (~60-70% reduction)
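The reduction figures above can be used for a rough estimate of which level is needed to fit a token budget. The sketch below is a hypothetical helper that treats the listed percentages as approximations; actual reductions will vary by repository. It leaves out "semantic" because its reduction is given only as a range.

```python
# Approximate reductions (percent) from the list above, lightest first.
REDUCTION_PERCENT = [
    ("none", 0),
    ("minimal", 15),
    ("balanced", 35),
    ("aggressive", 60),
    ("focused", 75),
    ("extreme", 80),
]


def pick_compression(raw_tokens, budget):
    """Return the lightest level whose estimated size fits the budget, or None."""
    for level, percent in REDUCTION_PERCENT:
        # Integer arithmetic avoids floating-point rounding surprises.
        estimated = raw_tokens * (100 - percent) // 100
        if estimated <= budget:
            return level
    return None


print(pick_compression(100_000, 70_000))
```

A return value of `None` signals that even the strongest listed level is not expected to fit, in which case narrowing the set of files may be the better option.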

Integration Examples

With Anthropic Claude

import infiniloom
import anthropic

# Generate context
context = infiniloom.pack(
    "/path/to/repo",
    format="xml",
    model="claude",
    compression="balanced"
)

# Send to Claude
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": f"{context}\n\nExplain the architecture of this codebase."
    }]
)
print(response.content[0].text)

With OpenAI GPT

import infiniloom
import openai

context = infiniloom.pack("/path/to/repo", format="markdown", model="gpt-4o")

client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": f"{context}\n\nWhat are the main components?"
    }]
)
print(response.choices[0].message.content)

With Google Gemini

import infiniloom
import google.generativeai as genai

context = infiniloom.pack("/path/to/repo", format="yaml", model="gemini")

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content(f"{context}\n\nSummarize this codebase")
print(response.text)

Advanced Usage

Custom Token Budget

from infiniloom import Infiniloom

loom = Infiniloom("/large/repo")

# Generate smaller context for models with limited context windows
compact_map = loom.map(map_budget=1000, max_symbols=25)

# Generate larger context for models with large context windows
detailed_map = loom.map(map_budget=5000, max_symbols=200)

Security Scanning

from infiniloom import Infiniloom

loom = Infiniloom("/path/to/repo")
findings = loom.scan_security()

# Filter by severity
critical = [f for f in findings if f['severity'] == 'Critical']
high = [f for f in findings if f['severity'] == 'High']

print(f"Critical: {len(critical)}, High: {len(high)}")

for finding in critical:
    print(f"{finding['file']}:{finding['line']}")
    print(f"  {finding['kind']}: {finding['pattern']}")

File Filtering

from infiniloom import Infiniloom

loom = Infiniloom("/path/to/repo")
files = loom.files()

# Get Python files only
python_files = [f for f in files if f['language'] == 'python']

# Get high-importance files
important_files = [f for f in files if f['importance'] > 0.7]

# Get large files
large_files = [f for f in files if f['tokens'] > 1000]
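The same file list can be aggregated as well as filtered. As a sketch, the hypothetical helper below totals tokens per language using the `language` and `tokens` keys shown in the examples above. The sample entries are invented for illustration.

```python
from collections import defaultdict


def tokens_by_language(files):
    """Sum token counts per language, largest total first."""
    totals = defaultdict(int)
    for entry in files:
        totals[entry["language"]] += entry["tokens"]
    # Dicts preserve insertion order, so the result stays sorted.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical entries shaped like the return value of files().
sample = [
    {"path": "src/app.py", "language": "python", "tokens": 1200},
    {"path": "src/lib.rs", "language": "rust", "tokens": 900},
    {"path": "tests/test_app.py", "language": "python", "tokens": 300},
]

for language, total in tokens_by_language(sample).items():
    print(f"{language}: {total} tokens")
```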

Performance

Infiniloom is built in Rust for maximum performance:

  • Fast scanning: Parallel file processing with ignore patterns
  • Memory efficient: Streaming processing, optional content loading
  • Native speed: No Python overhead for core operations

Requirements

  • Python 3.8+
  • Rust 1.91+ (for building from source)

License

MIT License - see LICENSE for details.
