Python bindings for infiniloom - Repository context engine for LLMs
Infiniloom Python Bindings
Python bindings for Infiniloom - a repository context engine for Large Language Models.
Installation
Installing from PyPI (pip install infiniloom) is planned for Phase 2 and is intentionally not implemented yet.
For now, build from source:
git clone https://github.com/Topos-Labs/infiniloom.git
cd infiniloom/bindings/python
pip install maturin
maturin develop # For development
maturin build --release # For production build
Quick Start
Functional API
import infiniloom
# Pack a repository into Claude-optimized XML
context = infiniloom.pack("/path/to/repo", format="xml", model="claude")
print(context)
# Scan repository and get statistics
stats = infiniloom.scan("/path/to/repo")
print(f"Files: {stats['total_files']}")
print(f"Languages: {stats['languages']}")
# Count tokens for a specific model
tokens = infiniloom.count_tokens("Hello, world!", model="claude")
print(f"Tokens: {tokens}")
Object-Oriented API
from infiniloom import Infiniloom
# Create an Infiniloom instance
loom = Infiniloom("/path/to/repo")
# Get repository statistics
stats = loom.stats()
print(stats)
# Generate repository context
context = loom.pack(format="xml", model="claude", compression="balanced")
# Get repository map with important symbols
repo_map = loom.map(map_budget=2000, max_symbols=50)
for symbol in repo_map['key_symbols']:
    print(f"{symbol['name']} ({symbol['kind']}) in {symbol['file']}")
# Scan for security issues
findings = loom.scan_security()
for finding in findings:
    print(f"{finding['severity']}: {finding['message']} at {finding['file']}:{finding['line']}")
# List all files
files = loom.files()
for file in files:
    print(f"{file['path']} - {file['language']} ({file['tokens']} tokens)")
API Reference
Functions
pack(path, format="xml", model="claude", compression="balanced", map_budget=2000, max_symbols=50)
Pack a repository into an LLM-optimized format.
Parameters:
- path (str): Path to the repository
- format (str): Output format - "xml", "markdown", "json", or "yaml"
- model (str): Target model for token counting. Supports:
  - OpenAI: "gpt-5.2", "gpt-5.1", "gpt-5", "o4-mini", "o3", "o1", "gpt-4o", "gpt-4"
  - Anthropic: "claude" (default)
  - Google: "gemini"
  - Meta: "llama"
  - Others: "deepseek", "mistral", "qwen", "cohere", "grok"
- compression (str): Compression level - "none", "minimal", "balanced", "aggressive", "extreme"
- map_budget (int): Token budget for repository map (default: 2000)
- max_symbols (int): Maximum symbols to include (default: 50)
Returns: str - Formatted repository context
scan(path, include_hidden=False, respect_gitignore=True)
Scan a repository and return statistics.
Parameters:
- path (str): Path to the repository
- include_hidden (bool): Include hidden files (default: False)
- respect_gitignore (bool): Respect .gitignore files (default: True)
Returns: dict - Repository statistics including:
- name: Repository name
- path: Absolute path
- total_files: Number of files
- total_lines: Total lines of code
- total_tokens: Token counts for each model
- languages: Language breakdown
- branch: Git branch (if available)
- commit: Git commit hash (if available)
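As a rough illustration of consuming the returned dict, the sketch below checks whether a repository fits a given token budget. It assumes total_tokens is a mapping from model name to an integer count, which the description above suggests but does not spell out. The helper fits_budget and the sample values are hypothetical and are not part of the library.

```python
from typing import Any, Mapping


def fits_budget(stats: Mapping[str, Any], model: str, budget: int) -> bool:
    """Return True if the token count recorded for `model` is within `budget`.

    Assumes stats["total_tokens"] maps model names to integer counts.
    Raises KeyError if the model has no entry.
    """
    return stats["total_tokens"][model] <= budget


# Hypothetical sample shaped like the dict described above (not real output).
sample_stats = {
    "name": "demo",
    "total_files": 12,
    "total_lines": 3400,
    "total_tokens": {"claude": 41000, "gpt-4o": 39500},
}

print(fits_budget(sample_stats, "claude", 100_000))
print(fits_budget(sample_stats, "gpt-4o", 32_000))
```

In real use, the dict would come from infiniloom.scan(path) rather than from a literal.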
count_tokens(text, model="claude")
Count tokens in text for a specific model.
Parameters:
- text (str): Text to count tokens for
- model (str): Target model - "claude", "gpt", "gpt-4o", "gemini", or "llama"
Returns: int - Number of tokens
semantic_compress(text, similarity_threshold=0.7, budget_ratio=0.5)
Compress text using semantic compression while preserving important content.
Parameters:
- text (str): Text to compress
- similarity_threshold (float): Threshold for grouping similar chunks (0.0-1.0, default: 0.7)
- budget_ratio (float): Target size as ratio of original (0.0-1.0, default: 0.5)
Returns: str - Compressed text
import infiniloom
long_text = "... your long text content ..."
compressed = infiniloom.semantic_compress(long_text, budget_ratio=0.3)
print(compressed)
scan_security(path)
Scan repository for security issues.
Parameters:
- path (str): Path to the repository
Returns: list[dict] - List of security findings with:
- file: File path
- line: Line number
- severity: Severity level
- category: Issue category
- message: Description
- code: Code snippet (optional)
Classes
Infiniloom(path)
Object-oriented interface for repository analysis.
Methods:
load(include_hidden=False, respect_gitignore=True)
Load the repository into memory.
stats()
Get repository statistics. Returns the same structure as the scan() function.
pack(format="xml", model="claude", compression="balanced", map_budget=2000)
Pack the repository. Returns formatted string.
map(map_budget=2000, max_symbols=50)
Get repository map with key symbols. Returns a dict with:
- summary: Text summary
- token_count: Estimated tokens
- key_symbols: List of important symbols
scan_security()
Scan for security issues. Returns list of findings.
files()
Get list of all files. Returns list of dicts with file metadata.
Formats
XML (Claude-optimized)
Best for Claude models. Uses XML structure that Claude understands well.
context = infiniloom.pack("/path/to/repo", format="xml", model="claude")
Markdown (GPT-optimized)
Best for GPT models. Uses Markdown with clear hierarchical structure.
context = infiniloom.pack("/path/to/repo", format="markdown", model="gpt")
JSON
Generic JSON format for programmatic processing.
context = infiniloom.pack("/path/to/repo", format="json")
YAML (Gemini-optimized)
Best for Gemini. Place your query at the end of the prompt, after the packed context.
context = infiniloom.pack("/path/to/repo", format="yaml", model="gemini")
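The "query at the end" advice can be sketched as a small prompt builder. The function build_prompt is a hypothetical helper, not part of infiniloom, and the sample context string is made up for illustration.

```python
def build_prompt(context: str, question: str) -> str:
    """Place the question after the packed context, separated by a blank line."""
    return f"{context.rstrip()}\n\n{question.strip()}"


# Hypothetical stand-in for the string returned by infiniloom.pack(...).
sample_context = "repository: demo\nfiles: 3\n"

prompt = build_prompt(sample_context, "  Summarize this codebase  ")
print(prompt)
```

Keeping the question in one place makes it harder to accidentally prepend it to the context.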
TOON (Token-Efficient)
Most token-efficient format (~40% smaller than JSON). Best for limited context windows.
context = infiniloom.pack("/path/to/repo", format="toon")
Compression Levels
- none: No compression (0% reduction)
- minimal: Remove empty lines, trim whitespace (15% reduction)
- balanced: Remove comments, normalize whitespace (35% reduction) - Default
- aggressive: Remove docstrings, keep signatures only (60% reduction)
- extreme: Key symbols only (80% reduction)
- focused: Key symbols with small context (75% reduction)
- semantic: Heuristic semantic compression (~60-70% reduction)
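The reduction figures above support a quick back-of-the-envelope estimate of which level is needed to fit a token budget. The following is a minimal sketch that treats the listed percentages as exact, which they are unlikely to be in practice. The helper names are hypothetical, and "semantic" is left out because its reduction is given only as a range.

```python
from typing import Optional

# Approximate reductions (percent) copied from the list above, lightest first.
REDUCTION_PERCENT = [
    ("none", 0),
    ("minimal", 15),
    ("balanced", 35),
    ("aggressive", 60),
    ("focused", 75),
    ("extreme", 80),
]


def estimate_tokens(total_tokens: int, level: str) -> int:
    """Estimate the token count after compression at `level`."""
    percent = dict(REDUCTION_PERCENT)[level]
    # Integer ceiling division keeps the estimate on the conservative side.
    return (total_tokens * (100 - percent) + 99) // 100


def lightest_level(total_tokens: int, budget: int) -> Optional[str]:
    """Return the least aggressive level whose estimate fits `budget`, or None."""
    for level, _ in REDUCTION_PERCENT:
        if estimate_tokens(total_tokens, level) <= budget:
            return level
    return None


print(estimate_tokens(10_000, "balanced"))
print(lightest_level(10_000, 7_000))
```

For a 10,000-token repository and a 7,000-token budget, "minimal" leaves an estimated 8,500 tokens, which is too many, while "balanced" leaves 6,500, so the sketch picks "balanced".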
Integration Examples
With Anthropic Claude
import infiniloom
import anthropic
# Generate context
context = infiniloom.pack(
    "/path/to/repo",
    format="xml",
    model="claude",
    compression="balanced"
)
# Send to Claude
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[{
        "role": "user",
        "content": f"{context}\n\nExplain the architecture of this codebase."
    }]
)
print(response.content[0].text)
With OpenAI GPT
import infiniloom
import openai
context = infiniloom.pack("/path/to/repo", format="markdown", model="gpt")
client = openai.OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": f"{context}\n\nWhat are the main components?"
    }]
)
print(response.choices[0].message.content)
With Google Gemini
import infiniloom
import google.generativeai as genai
context = infiniloom.pack("/path/to/repo", format="yaml", model="gemini")
genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content(f"{context}\n\nSummarize this codebase")
print(response.text)
Advanced Usage
Custom Token Budget
from infiniloom import Infiniloom
loom = Infiniloom("/large/repo")
# Generate smaller context for models with limited context windows
compact_map = loom.map(map_budget=1000, max_symbols=25)
# Generate larger context for models with large context windows
detailed_map = loom.map(map_budget=5000, max_symbols=200)
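Once a map has been generated, its key_symbols list can be post-processed with ordinary Python. The sketch below groups symbols by file, assuming each entry has the name, kind, and file keys used in the Quick Start. The helper symbols_by_file and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def symbols_by_file(repo_map: dict) -> Dict[str, List[str]]:
    """Group "name (kind)" labels by the file each symbol lives in."""
    grouped = defaultdict(list)
    for symbol in repo_map["key_symbols"]:
        grouped[symbol["file"]].append(f"{symbol['name']} ({symbol['kind']})")
    return dict(grouped)


# Hypothetical sample shaped like the dict returned by loom.map().
sample_map = {
    "summary": "demo repository",
    "token_count": 120,
    "key_symbols": [
        {"name": "main", "kind": "function", "file": "src/main.py"},
        {"name": "Config", "kind": "class", "file": "src/config.py"},
        {"name": "run", "kind": "function", "file": "src/main.py"},
    ],
}

for path, labels in symbols_by_file(sample_map).items():
    print(f"{path}: {', '.join(labels)}")
```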
Security Scanning
from infiniloom import Infiniloom
loom = Infiniloom("/path/to/repo")
findings = loom.scan_security()
# Filter by severity
critical = [f for f in findings if f['severity'] == 'Critical']
high = [f for f in findings if f['severity'] == 'High']
print(f"Critical: {len(critical)}, High: {len(high)}")
for finding in critical:
    print(f"{finding['file']}:{finding['line']}")
    print(f"  {finding['category']}: {finding['message']}")
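The same findings list can feed a simple gate, for example in a CI script. The following is a sketch under assumptions: only the "Critical" and "High" severity values appear in this README, so the lower levels in SEVERITY_ORDER are guesses, and the helper names and sample findings are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# "Critical" and "High" appear in the example above; "Medium" and "Low" are assumed.
SEVERITY_ORDER = ["Critical", "High", "Medium", "Low"]


def summarize(findings: List[dict]) -> Dict[str, int]:
    """Count findings per severity, most severe first; unknown levels go last."""
    counts = Counter(f["severity"] for f in findings)
    known = [s for s in SEVERITY_ORDER if s in counts]
    unknown = sorted(s for s in counts if s not in SEVERITY_ORDER)
    return {s: counts[s] for s in known + unknown}


def should_fail(findings: List[dict], threshold: str = "High") -> bool:
    """Return True if any finding is at or above `threshold` in SEVERITY_ORDER."""
    blocking = set(SEVERITY_ORDER[: SEVERITY_ORDER.index(threshold) + 1])
    return any(f["severity"] in blocking for f in findings)


# Hypothetical findings shaped like the output of scan_security().
sample_findings = [
    {"file": "app/config.py", "line": 12, "severity": "High",
     "category": "secret", "message": "Hardcoded API key"},
    {"file": "app/db.py", "line": 40, "severity": "Critical",
     "category": "injection", "message": "Unparameterized SQL query"},
    {"file": "scripts/run.sh", "line": 3, "severity": "Low",
     "category": "style", "message": "Unquoted variable"},
    {"file": "app/auth.py", "line": 88, "severity": "High",
     "category": "secret", "message": "Hardcoded password"},
]

print(summarize(sample_findings))
print(should_fail(sample_findings))
```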
File Filtering
from infiniloom import Infiniloom
loom = Infiniloom("/path/to/repo")
files = loom.files()
# Get Python files only
python_files = [f for f in files if f['language'] == 'python']
# Get high-importance files
important_files = [f for f in files if f['importance'] > 0.7]
# Get large files
large_files = [f for f in files if f['tokens'] > 1000]
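Beyond filtering, the file metadata lends itself to simple aggregation. The sketch below totals tokens per language and lists the largest files, assuming each entry carries the path, language, and tokens keys used above. The helper names and sample entries are hypothetical.

```python
from collections import Counter
from typing import List


def tokens_by_language(files: List[dict]) -> Counter:
    """Sum the token counts of all files, keyed by language."""
    totals = Counter()
    for entry in files:
        totals[entry["language"]] += entry["tokens"]
    return totals


def largest_files(files: List[dict], n: int = 3) -> List[str]:
    """Return the paths of the `n` files with the most tokens, largest first."""
    ranked = sorted(files, key=lambda entry: entry["tokens"], reverse=True)
    return [entry["path"] for entry in ranked[:n]]


# Hypothetical entries shaped like the output of loom.files().
sample_files = [
    {"path": "src/main.py", "language": "python", "tokens": 1200, "importance": 0.9},
    {"path": "src/util.py", "language": "python", "tokens": 300, "importance": 0.4},
    {"path": "web/app.ts", "language": "typescript", "tokens": 800, "importance": 0.6},
]

print(tokens_by_language(sample_files).most_common())
print(largest_files(sample_files, n=2))
```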
Performance
Infiniloom is built in Rust for maximum performance:
- Fast scanning: Parallel file processing with ignore patterns
- Memory efficient: Streaming processing, optional content loading
- Native speed: No Python overhead for core operations
Requirements
- Python 3.8+
- Rust 1.91+ (for building from source)
License
MIT License - see LICENSE for details.