Enterprise-ready, high-speed, multi-language code indexer with BSG compression and Time Machine snapshots/diffs — no LLM required.
B.A.T.H.O
Bidirectional AST Traversal & Hypergraph Orchestrator
BATHO indexes your codebase, compresses the result for LLM context windows, and tracks changes over time.
A code intelligence engine that transforms large, raw codebases into a queryable, time-aware structured graph. By parsing source code into an AST and extracting a structured relational hypergraph, Batho acts as a memory layer for your repositories. Use it to generate token-budgeted context that prevents AI agent amnesia, to enforce governance via webhook orchestration, or to track code changes over time with time-based structured graph snapshots.
Quick Start
Get running in 30 seconds:
# Install
uv add batho
# or
pip install batho
# Index your project
batho index --root . --verbose --snapshot
# Generate compressed bsg for LLM injection
batho bsg --root . --mode compressed --budget 12000
# Create snapshot
batho index --root . --snapshot
# Auto-detect and patch changes
batho patch --root . --scan
# Install Git hooks for automated checks
batho hooks install --all
# Start the artifact bridge (REST API + MCP server)
batho bridge serve --root .
batho bridge mcp --transport stdio
# Show all commands
batho --help
Batho scans your codebase, extracts every function, class, import, and relationship, and writes structured output to .ctn/.
Why Batho?
Modern AI tools need structured code understanding — not just raw file contents. Batho bridges that gap.
| What you get | Why it matters |
|---|---|
| 40+ language AST parsing | One tool for polyglot repos — Python, TypeScript, Rust, Go, Java, and more |
| 10x context compression | Fit entire codebases into LLM context windows |
| Time Machine snapshots | Track how your codebase evolves between releases |
| Zero Code Execution | Safe to run in CI, pre-commit, or on untrusted repos |
| Caching | mtime+SHA skips unchanged files — re-indexes in seconds |
| CI/CD Pipeline Hooks | Turnkey GitHub Actions and GitLab CI templates |
| Incremental patching | 10-100x faster updates with complete lineage tracking |
How It Works
- Parse — tree-sitter extracts functions, classes, variables, imports with full signatures
- Graph — Entities and relationships (IMPORTS, CALLS, USES, DEFINES) form a code graph
- Compress — BSG renders the graph in multiple formats: compressed, full, JSON, hierarchical
- Track — Time Machine snapshots let you diff code intelligence over time
Features
Multi-Language AST Extraction
Batho uses tree-sitter for precise, language-aware parsing:
- Functions — name, signature, parameters, return type, docstring
- Classes — name, base classes, methods, attributes
- Interfaces/Traits — method signatures
- Variables — declarations, types, assignments
- Imports — module paths, selective imports
Relationships captured: IMPORTS · CALLS · USES · DEFINES
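To make the extracted fields concrete, here is a minimal sketch that pulls function names, parameters, return types, and docstrings out of Python source. It uses Python's standard-library `ast` module as a stand-in; Batho itself parses with tree-sitter, and the `extract_functions` helper and its record shape are illustrative assumptions, not Batho's API.

```python
import ast

def extract_functions(source: str) -> list[dict]:
    """Return one record per function definition found in `source`."""
    tree = ast.parse(source)  # parse only; the code is never executed
    records = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            records.append({
                "name": node.name,
                "parameters": [arg.arg for arg in node.args.args],
                "return_type": ast.unparse(node.returns) if node.returns else None,
                "docstring": ast.get_docstring(node),
                "start_line": node.lineno,
                "end_line": node.end_lineno,
            })
    return records

SAMPLE = '''
def login(user: str, password: str) -> bool:
    """Check credentials."""
    return authenticate(user, password)
'''

functions = extract_functions(SAMPLE)
print(functions[0]["name"], functions[0]["parameters"])
```

Because the source is only parsed, the call to the undefined `authenticate` in the sample is harmless, which mirrors the parse-only property described elsewhere in this README.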
BSG (Batho Structured Graph) Compression
Transforms full code graphs into compact representations:
# Generate full bsg with signatures
batho bsg --root . --mode full
# Generate hierarchical directory view
batho bsg --root . --mode hierarchical
# Generate compressed bsg for LLM injection
batho bsg --root . --mode compressed --budget 12000
| Mode | Best for | Output File |
|---|---|---|
| Full | Developer reference with signatures + line numbers | bsg_full.json |
| Hierarchical | Directory-tree overviews | bsg_hierarchical.json |
| Compressed | LLM prompt injection (4K–40K tokens) | bsg_compressed.json |
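The `--budget` flag caps the size of the compressed output. One way to picture budgeted rendering is greedy packing: rank the entries, then keep adding them until the budget is spent. The sketch below illustrates that general idea only and is not Batho's actual algorithm; `estimate_tokens` (roughly four characters per token) and `pack_under_budget` are hypothetical helpers.

```python
def estimate_tokens(text: str) -> int:
    """Crude token estimate: about four characters per token (an assumption)."""
    return max(1, len(text) // 4)

def pack_under_budget(lines: list[str], budget: int) -> list[str]:
    """Keep lines in priority order, skipping any that would exceed the budget."""
    kept, used = [], 0
    for line in lines:
        cost = estimate_tokens(line)
        if used + cost > budget:
            continue  # does not fit; a shorter entry later may still fit
        kept.append(line)
        used += cost
    return kept

# Entries are assumed to be pre-sorted by importance (most important first).
entries = [
    "auth.py: login(user, password) -> bool",
    "auth.py: logout(session) -> None",
    "billing.py: create_invoice(customer, items, currency, due_date) -> Invoice",
    "util.py: slug(s) -> str",
]
summary = pack_under_budget(entries, budget=25)
print("\n".join(summary))
```

A real tokenizer would replace the character-based estimate; the packing loop stays the same.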
Batho Time Machine
batho index --root . --snapshot # Create snapshot
batho snapshots --root . # List all snapshots
batho diff-snapshots --root . --snapshot-a SNAP_A --snapshot-b SNAP_B # Compare versions
Versioned snapshots with UUID + timestamp, entity/relationship diffs, and staleness scoring for automated re-indexing.
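An entity-level diff between two snapshots can be pictured as set arithmetic over entity ids. The sketch below is a simplified illustration, not Batho's `diff_snapshots` implementation; the snapshot shape (a dict mapping entity id to a record) and the `diff_entities` helper are assumptions.

```python
def diff_entities(old: dict[str, dict], new: dict[str, dict]) -> dict[str, list[str]]:
    """Classify entity ids as added, removed, or changed between two snapshots."""
    old_ids, new_ids = set(old), set(new)
    return {
        "added": sorted(new_ids - old_ids),
        "removed": sorted(old_ids - new_ids),
        # An entity counts as changed when it exists in both but its record differs.
        "changed": sorted(i for i in old_ids & new_ids if old[i] != new[i]),
    }

snap_a = {
    "e1": {"name": "login", "file": "auth.py", "start_line": 10},
    "e2": {"name": "logout", "file": "auth.py", "start_line": 30},
}
snap_b = {
    "e1": {"name": "login", "file": "auth.py", "start_line": 12},    # moved
    "e3": {"name": "refresh", "file": "auth.py", "start_line": 40},  # new
}

result = diff_entities(snap_a, snap_b)
print(result)
```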
Incremental Patching with Tracking
# Auto-detect and patch changes
batho patch --root . --scan
# List all patch operations
batho patches --root . --format timeline
# Show detailed patch info
batho patch-info --root . --patch-id ID
# Apply patch from diff file
batho apply-patch --root . --base-snapshot ID --diff-file changes.diff
# Cherry-pick patch to different snapshot
batho cherry-pick --root . --patch-id ID --target-snapshot ID
Smart Indexing
- mtime + SHA-256 cache — unchanged files are skipped instantly
- Parallel extraction — auto-scaled threads (CPU × 2, capped at 32)
- Binary detection — magic bytes + entropy analysis
- Ignore support — .gitignore + .bathoignore via pathspec
- Per-file isolation — one bad file never aborts the scan
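The mtime + SHA-256 check in the list above can be sketched as a two-stage test: compare the cheap timestamp first, and hash the file only when the timestamp has moved. This is a minimal illustration under stated assumptions, not Batho's cache code; `needs_reindex`, `sha256_of`, and the in-memory cache layout are hypothetical.

```python
import hashlib
import os
import tempfile

def sha256_of(path: str) -> str:
    """Hash file contents in chunks so large files are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def needs_reindex(path: str, cache: dict[str, dict]) -> bool:
    """Return True if `path` changed since it was last recorded in `cache`."""
    mtime = os.stat(path).st_mtime_ns
    entry = cache.get(path)
    if entry and entry["mtime"] == mtime:
        return False  # cheap check: timestamp unchanged, skip hashing
    sha = sha256_of(path)
    changed = entry is None or entry["sha256"] != sha
    cache[path] = {"mtime": mtime, "sha256": sha}  # refresh the cache entry
    return changed

# Demo on a temporary file.
cache: dict[str, dict] = {}
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "auth.py")
    with open(target, "w") as fh:
        fh.write("def login(): ...\n")
    first = needs_reindex(target, cache)   # never seen before
    second = needs_reindex(target, cache)  # unchanged
    with open(target, "w") as fh:
        fh.write("def login(user): ...\n")
    # Force a clearly newer mtime so the demo does not depend on clock granularity.
    os.utime(target, ns=(0, cache[target]["mtime"] + 1_000_000_000))
    third = needs_reindex(target, cache)   # content changed
print(first, second, third)
```

A file that is merely touched (new mtime, same bytes) is hashed once and then reported as unchanged, which is the case the hash stage exists for.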
Stack Detection
Automatically identifies your tech stack from config files:
| Category | Frameworks / Tools |
|---|---|
| Python | FastAPI, Django, Flask |
| Node.js | React, Vue, Express, NestJS |
| Java | Spring, Maven, Gradle |
| .NET | ASP.NET, Entity Framework |
| Go | Gin, Echo |
| Ruby | Rails, Sinatra |
| Rust | Cargo |
| Mobile | Android, iOS |
| Data/ML | PyTorch, TensorFlow, Pandas |
Supported Languages
| Category | Languages |
|---|---|
| Web / Backend | Python, TypeScript, JavaScript, Go, Java, Ruby, PHP, C#, Scala, Kotlin |
| Systems | Rust, C, C++, Zig, Objective-C |
| Mobile | Swift, Kotlin (Android), Objective-C (iOS) |
| Functional | Haskell, Erlang, OCaml, Elixir, Julia, Agda |
| Scripting | Bash, Perl, Lua, R |
| Other | Dart, Verilog, Hack |
| Markup / Config | JSON, YAML, TOML, HTML, CSS/SCSS/SASS/LESS, Markdown, HCL/Terraform |
Parser availability depends on installed tree_sitter_language_pack grammars. Missing grammars are skipped gracefully.
Installation
pip install batho # pip
uv pip install batho # uv
pip install -e . # development (editable)
Developer Setup (uv)
Use this section when you want to contribute to Batho locally, run tests, and verify the CLI from source.
1. Clone the repository
git clone https://github.com/sageoz/batho.git
cd batho
2. Install project dependencies for development and testing
uv sync --all-groups --all-extras
This creates and syncs the project environment with runtime, test, and dev dependencies.
3. Run tests
# Full suite
uv run pytest
# Optional: focused checks while iterating
uv run pytest tests/core/test_config.py -q
uv run pytest tests/utils/test_logging.py -q
4. Run the CLI directly from local source
This path is best during development because it always uses your current working tree.
uv run python -m batho_cli --help
uv run python -m batho_cli index --root .
5. Reinstall the global batho command from your local source
Use this when you want the plain batho command to reflect your latest local code.
uv tool install --reinstall .
hash -r
batho index --root .
6. Quick troubleshooting
If behavior differs between local and global runs, compare both paths:
uv run python -m batho_cli index --root .
batho index --root .
If they differ, reinstall the tool again:
uv tool install --reinstall .
hash -r
CLI Reference
# Show all commands
batho --help
# Show command-specific help
batho <command> --help
Command Matrix
| Command | Purpose |
|---|---|
| index | Build/update graph + BSG artifacts for a repo |
| stats | Show current index metadata and health summary |
| snapshots | List stored snapshots |
| diff-snapshots | Diff two snapshots |
| patch | Apply incremental updates from scan/diff/files |
| patches | List patch operations |
| patch-info | Show patch operation details |
| patch-chain | Show chain of patches for a snapshot |
| apply-patch | Apply patch by diff file or patch id |
| cherry-pick | Apply a patch to another snapshot |
| sync | Sync pending artifacts to configured cloud endpoint |
| hooks | Git client-side hook management (install/remove/run) |
| invalidate | Clear index file cache |
| cache | AST cache management (stats, invalidate, clear) |
| storage | Persistent artifact registry tools (backfill, verify, cleanup, stats, rebuild-indexes, compact) |
| query | Query persisted entity/relationship indexes |
| bsg | Render BSG outputs (compressed, full, hierarchical) |
Indexing & Snapshots
# Full index
batho index --root /path/to/repo --verbose
# Force full rebuild (disable incremental path)
batho index --root /path/to/repo --full
# Force cache reset before indexing (clears file cache + AST cache)
batho index --root /path/to/repo --force
# Deterministic fresh parse run (bypass AST cache for this invocation)
batho index --root /path/to/repo --force --no-ast-cache --verbose
# Index and create snapshot
batho index --root /path/to/repo --snapshot --snapshot-label "release-candidate"
# Snapshot inspection
batho snapshots --root /path/to/repo
batho diff-snapshots --root /path/to/repo --snapshot-a SNAP_A --snapshot-b SNAP_B
Patch Lifecycle
# Auto-detect file changes and patch
batho patch --root /path/to/repo --scan
# Patch from unified diff
batho patch --root /path/to/repo --diff /path/to/changes.diff
# Patch specific files
batho patch --root /path/to/repo src/a.py src/b.py
# Patch history and details
batho patches --root /path/to/repo --format timeline
batho patch-info --root /path/to/repo --patch-id PATCH_ID --format summary
batho patch-chain --root /path/to/repo --snapshot-id SNAP_ID --full
# Advanced patch operations
batho apply-patch --root /path/to/repo --base-snapshot SNAP_ID --diff-file /path/to/changes.diff
batho cherry-pick --root /path/to/repo --patch-id PATCH_ID --target-snapshot SNAP_ID
BSG Rendering & Querying
# Render BSG formats
batho bsg --root /path/to/repo --mode compressed --budget 12000
batho bsg --root /path/to/repo --mode full
batho bsg --root /path/to/repo --mode hierarchical
# Query persisted graph indexes
batho query --root /path/to/repo --entity-type function --limit 50
batho query --root /path/to/repo --file-path src/api.py
batho query --root /path/to/repo --relationship-type calls --rebuild-index
Cache & Storage Operations
# Index cache cleanup
batho invalidate --root /path/to/repo
# AST cache management
batho cache stats
batho cache invalidate "**/*.py"
batho cache clear
# Persistent storage management
batho storage backfill --root /path/to/repo
batho storage verify --root /path/to/repo --repair
batho storage cleanup --root /path/to/repo # dry-run
batho storage cleanup --root /path/to/repo --apply # execute cleanup
batho storage stats --root /path/to/repo
batho storage rebuild-indexes --root /path/to/repo
batho storage compact --root /path/to/repo # dry-run
batho storage compact --root /path/to/repo --apply # execute compaction
Cloud Sync Operations
# Preview pending artifacts (no upload)
batho sync --root /path/to/repo --dry-run
# Sync pending artifacts to cloud endpoint
export BATHO_CLOUD_SYNC_ENABLED=true
export BATHO_CLOUD_ENDPOINT="https://sync.batho.dev/v1"
export BATHO_CLOUD_API_KEY="batho_live_xxxxx"
batho sync --root /path/to/repo
# Retry only failed artifact uploads
batho sync --root /path/to/repo --retry-failed
# Show local sync status summary
batho sync --root /path/to/repo --status
Bridge (Artifact Registry REST + MCP)
Expose .ctn/ artifacts via HTTP and MCP for dashboard/IDE integrations.
# Start REST API server (default http://127.0.0.1:8766)
batho bridge serve --root /path/to/repo
batho bridge serve --root /path/to/repo --host 0.0.0.0 --port 8766
# Start MCP server (stdio for IDE integration)
batho bridge mcp --root /path/to/repo --transport stdio
# Start MCP server (SSE for remote clients)
batho bridge mcp --root /path/to/repo --transport sse --port 8767
# Check registry status
batho bridge status --root /path/to/repo
# Verify all artifacts are loadable
batho bridge verify --root /path/to/repo
REST endpoints (mounted under /api/v1/bridge/):
- GET /indexes — List all indexes
- GET /indexes/{index_id} — Get specific index metadata
- GET /artifacts?type={artifact_type}&limit={n} — List artifact records
- GET /artifacts/{artifact_type}?index_id={id} — Load artifact JSON content
- GET /artifacts/{artifact_type}/content?path={logical_path} — Load by logical path
- GET /stats — Registry statistics
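A small standard-library client is enough to call these REST endpoints from a script. The sketch below builds URLs from the documented default address and mount path; the `bridge_url` and `fetch_json` helpers are hypothetical, the artifact type value `graph` is an assumed example, and the response shapes are not specified here, so the fetch call is left commented out.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Default host/port and mount path as documented above.
BASE = "http://127.0.0.1:8766/api/v1/bridge"

def bridge_url(path: str, **params: object) -> str:
    """Build a bridge URL, dropping parameters whose value is None."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE}/{path.lstrip('/')}" + (f"?{query}" if query else "")

def fetch_json(url: str, timeout: float = 5.0):
    """GET a URL and decode the JSON body (requires a running bridge server)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)

print(bridge_url("indexes"))
print(bridge_url("artifacts", type="graph", limit=10))  # "graph" is an assumed artifact type
# With `batho bridge serve --root .` running:
# indexes = fetch_json(bridge_url("indexes"))
```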
MCP tools: bridge_list_indexes, bridge_get_index, bridge_list_artifacts,
bridge_get_artifact, bridge_get_artifact_by_path, bridge_search_artifacts,
bridge_get_stats.
Git Hooks Management
YAML-driven Git client-side hook automation with enterprise reliability.
# List configured hooks and templates
batho hooks list --root /path/to/repo
# Check installation status
batho hooks status --hook pre-commit
# Install all enabled hooks (auto-bootstraps .batho/hooks.yaml if missing)
batho hooks install --all
# Install specific hook with force (overwrites unmanaged)
batho hooks install --hook pre-commit --force
# Remove managed hooks
batho hooks remove --all
# Run hook manually (supports custom hooks for CI/CD)
batho hooks run --hook enterprise-nightly --verbose
Configuration in .batho/hooks.yaml:
version: hooks.v1
defaults:
  shell: /bin/sh
  timeout: 60
hooks:
  pre-commit:
    enabled: true
    stages:
      - run: ruff check .
      - run: pytest --co -q
  pre-push:
    enabled: true
    stages:
      - run: pytest -x --tb=short
Enable in batho.yaml:
hooks:
  enabled: true
  include: true
Index Flags
| Flag | Default | Description |
|---|---|---|
| --max-workers | 0 (auto) | Worker threads — 0 uses CPU × 2, capped at 32 |
| --max-file-size-kb | 500 | Skip files larger than this |
| --extensions | all supported | Restrict indexing to selected extensions |
| --full | off | Disable incremental reuse and force full rebuild |
| --force | off | Clear index file cache and AST cache before indexing |
| --no-ast-cache | off | Bypass AST cache for the current indexing run |
| --base-snapshot | auto | Prefer this snapshot for incremental indexing |
| --output-json | none | Optional override path for graph JSON output |
| --metrics-output | from config | Write metrics JSON to explicit path |
| --verbose | off | Print progress to stdout |
| --snapshot | off | Create snapshot after indexing |
| --snapshot-label | none | Attach label to generated snapshot |
Global Logging Flags
| Flag | Default | Description |
|---|---|---|
| --log-level | from config | Override logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL) |
| -q, --quiet | from config | Suppress non-error CLI output and log events below ERROR |
| --log-json | off | Force JSON log output (useful in CI) |
| --log-file | from config | Write logs to the specified file path |
BSG Options
| Flag | Default | Description |
|---|---|---|
| --mode | compressed | Rendering mode: compressed, full, hierarchical |
| --budget | 12000 | Token budget for compressed mode |
Patch Options
| Flag | Default | Description |
|---|---|---|
| --scan | off | Auto-scan for changes |
| --dry-run | off | Preview changes without applying |
| --base-snapshot | auto | Use specific snapshot as base |
| --force-index-patch | off | Force traditional index-based patching |
| --diff | none | Apply patch from unified diff |
| files... | none | Patch explicit changed files |
Output
.ctn/
├── index.json # Index metadata + staleness + persistence model
├── artifact_registry.db # SQLite artifact registry (durable outputs)
├── file_cache.json # Index file cache
├── file_hashes.json # Content-hash tracker for incremental scans
├── metrics.json # Optional metrics output
├── interception_stats.json # Rule interception matrix
├── evolution_ledger.json # Failure synthesis ledger
├── snapshots/ # Time Machine snapshots
│   └── batho_<project>_<sha>_<ts>.json
├── patches/ # Patch operation history
│   ├── index.json
│   └── patch_<operation_id>.json
└── <index_id>/
    ├── graph.json # Entities + relationships
    ├── bsg.json # Structured symbol graph
    ├── bsg_compressed.json # LLM-ready compressed output
    ├── bsg_full.json # Full textual BSG output
    ├── bsg_hierarchical.json # Hierarchical textual BSG output
    └── context/
        ├── overview.md
        └── files.md
Default AST cache database location: .ctn/local/cache/ast_cache.db (configured by bsg.cache.path).
graph.json example
{
"schema_version": "graph.v1",
"entities": [
{"id": "e1", "name": "login", "type": "function", "file": "auth.py", "start_line": 10, "end_line": 25}
],
"relationships": [
{"source_id": "e1", "target_id": "e2", "type": "IMPORTS"}
]
}
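Since graph.json is plain JSON, it can be consumed without importing Batho. The sketch below follows the field names from the example above (`entities`, `relationships`, `source_id`, `target_id`, `type`); the second entity (`hashlib`, type `module`) is made-up sample data added so the edge has a resolvable target, and in real use you would read `.ctn/<index_id>/graph.json` instead of an inline string.

```python
import json
from collections import Counter

GRAPH_JSON = """
{
  "schema_version": "graph.v1",
  "entities": [
    {"id": "e1", "name": "login", "type": "function", "file": "auth.py", "start_line": 10, "end_line": 25},
    {"id": "e2", "name": "hashlib", "type": "module", "file": "auth.py", "start_line": 1, "end_line": 1}
  ],
  "relationships": [
    {"source_id": "e1", "target_id": "e2", "type": "IMPORTS"}
  ]
}
"""

graph = json.loads(GRAPH_JSON)
by_id = {e["id"]: e for e in graph["entities"]}          # id -> entity lookup
counts = Counter(r["type"] for r in graph["relationships"])

for rel in graph["relationships"]:
    src, dst = by_id.get(rel["source_id"]), by_id.get(rel["target_id"])
    if src and dst:  # skip edges whose endpoints are missing from this file
        print(f'{src["name"]} -{rel["type"]}-> {dst["name"]}')
print(dict(counts))
```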
bsg.json example
{
"schema_version": "bsg.v1",
"nodes": [
{
"id": "e1",
"type": "FUNCTION",
"name": "login",
"file": "src/auth.py",
"start_line": 10,
"end_line": 25
}
],
"edges": [],
"indexes": {
"nodes_by_file": {
"src/auth.py": ["e1"]
}
}
}
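The `indexes.nodes_by_file` section lets a consumer go from a file path to its nodes without scanning the whole node list. A minimal sketch, assuming only the fields shown in the example above; `nodes_in_file` is a hypothetical helper, not part of Batho.

```python
import json

BSG_JSON = """
{
  "schema_version": "bsg.v1",
  "nodes": [
    {"id": "e1", "type": "FUNCTION", "name": "login",
     "file": "src/auth.py", "start_line": 10, "end_line": 25}
  ],
  "edges": [],
  "indexes": {"nodes_by_file": {"src/auth.py": ["e1"]}}
}
"""

def nodes_in_file(bsg: dict, path: str) -> list[dict]:
    """Resolve the node ids listed for `path` into full node records."""
    nodes = {n["id"]: n for n in bsg["nodes"]}
    ids = bsg["indexes"]["nodes_by_file"].get(path, [])
    return [nodes[i] for i in ids if i in nodes]

bsg = json.loads(BSG_JSON)
for node in nodes_in_file(bsg, "src/auth.py"):
    print(f'{node["type"]} {node["name"]} (lines {node["start_line"]}-{node["end_line"]})')
```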
Configuration
Batho works out of the box with zero config. For production use, configure with the unified root config file ./batho.yaml (or start from batho.yaml.example) plus optional environment overrides.
Configuration precedence:
- Built-in defaults
- ./batho.yaml
- Environment variables (override file values)
- CLI flags (override for a specific run)
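The precedence order above amounts to a layered merge in which each later layer overwrites the earlier ones. The sketch below shows that idea with a flat dictionary; it is not Batho's config loader, `resolve_config` is a hypothetical helper, and only two of the documented environment variables are handled to keep it short.

```python
DEFAULTS = {"max_file_size_kb": 500, "max_workers": 0, "ctn_dir": ".ctn"}

def resolve_config(file_cfg: dict, env: dict, cli: dict) -> dict:
    """Merge layers so that later ones win: defaults < file < env < CLI."""
    env_cfg = {}
    if "BATHO_MAX_FILE_SIZE_KB" in env:
        env_cfg["max_file_size_kb"] = int(env["BATHO_MAX_FILE_SIZE_KB"])
    if "BATHO_CTN_DIR" in env:
        env_cfg["ctn_dir"] = env["BATHO_CTN_DIR"]
    # A CLI value of None means "flag not passed" and must not override anything.
    cli_cfg = {k: v for k, v in cli.items() if v is not None}
    return {**DEFAULTS, **file_cfg, **env_cfg, **cli_cfg}

config = resolve_config(
    file_cfg={"max_file_size_kb": 1000, "max_workers": 16},  # from ./batho.yaml
    env={"BATHO_MAX_FILE_SIZE_KB": "2000"},                  # overrides the file
    cli={"max_workers": 4, "max_file_size_kb": None},        # overrides for this run
)
print(config)
```

Here the environment raises the file-size limit set in the file, the CLI lowers the worker count for one run, and `ctn_dir` falls through to the built-in default.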
Output behavior:
- User-facing command output is written to stdout.
- Warnings/errors and operational logs are written to stderr.
Core Config Areas
| Area | Keys | What it controls |
|---|---|---|
| logging | level, json_format, quiet, file, format | Process-wide logging and CLI verbosity behavior |
| paths | ctn_dir | Artifact output directory |
| indexer | max_file_size_kb, max_workers, max_indexed_files, ignore_*, metrics_output | Base indexing limits and outputs |
| rules | enabled, builtin_plugins, custom_rules_*, strict_validation | Rule plugins and metadata enrichment |
| bsg.parallel | enabled, max_workers, chunk_size | Parallel file extraction |
| bsg.ignore | enabled, file | .bathoignore integration |
| bsg.cache | enabled, path, max_size_mb, ttl_days | AST cache behavior |
| bsg.incremental | enabled, fallback_to_full, auto_detect_git | Incremental indexing strategy |
| bsg.symbol_resolution | enabled, fuzzy_matching, cache_symbols | Cross-file symbol resolution |
| bsg.serialization | method, compression, batch_size | BSG render strategy |
| bsg.parsing | error_recovery, partial_parsing, max_file_size_mb, skip_comments | Parser behavior |
| bsg.query | enabled, index_on_write, cache_enabled, cache_size, default_limit, query_timeout_ms | Persistent query indexes |
| bsg.storage | enabled, backend, registry_path, content_scope, cloud_sync_ready, mmap_enabled, retention.* | Durable artifact registry and retention |
| hooks | enabled, include | Git client-side hook automation pointer |
Environment Variables (Common)
| Variable | Default | Description |
|---|---|---|
| BATHO_LOG_LEVEL | INFO | DEBUG, INFO, WARNING, ERROR |
| BATHO_LOG_JSON | null | Force JSON logs (true) or leave auto mode (unset) |
| BATHO_LOG_QUIET | false | Suppress non-error output globally |
| BATHO_LOG_FILE | unset | Optional log file path |
| BATHO_CTN_DIR | .ctn | Output directory |
| BATHO_MAX_FILE_SIZE_KB | 500 | Max file size to parse |
| BATHO_MAX_INDEXED_FILES | 200000 | Hard cap on indexed files |
| BATHO_INDEX_WORKERS | 0 | Worker threads (0 = auto) |
| BATHO_METRICS_OUTPUT | .ctn/metrics.json | Metrics output path |
| BATHO_RULES_ENABLED | config value | Enable BSG rule plugin stage |
| BATHO_RULES_CUSTOM_RULES_PATH | unset | YAML file containing custom BSG rules |
| BATHO_RULES_BUILTIN_PLUGINS | bsg_core | Comma-separated built-in plugin names |
| BATHO_RULES_DISABLED_RULES | unset | Comma-separated rule names to disable |
| BATHO_BSG_STORAGE_ENABLED | true | Enable durable artifact registry |
| BATHO_BSG_STORAGE_REGISTRY_PATH | .ctn/artifact_registry.db | Registry database path |
| BATHO_BSG_STORAGE_MMAP_ENABLED | false | Enable mmap reads for large persisted JSON |
| BATHO_BSG_QUERY_INDEX_ON_WRITE | true | Build query index at write time |
| BATHO_BSG_QUERY_CACHE_SIZE | 256 | Query service cache size |
For the complete env override set, see batho/config.py.
Config File
# ./batho.yaml
logging:
  level: DEBUG
  json_format: true
  quiet: false
  file: .ctn/batho.log
  format: "%(message)s"
indexer:
  max_file_size_kb: 1000
  max_workers: 16
  ignore_patterns:
    - "**/vendor/**"
    - "**/dist/**"
flags:
  strict: true
  fail_on_warning: true
rules:
  enabled: true
  builtin_plugins: [bsg_core]
  disabled_rules: []
  custom_rules_path: ./bsg-rules.yaml
  custom_rules_inline:
    - name: payment-cluster
      entity_types: ["function", "method"]
      name_patterns: ["*payment*", "*invoice*"]
      metadata:
        bsg.cluster_hint: billing
  # Validation controls
  strict_validation: false
  fail_on_rule_error: false
bsg:
  parallel:
    enabled: true
    max_workers: 16
    chunk_size: 50
  cache:
    enabled: true
    path: .ctn/local/cache/ast_cache.db
    max_size_mb: 1024
    ttl_days: 30
  query:
    enabled: true
    index_on_write: true
    cache_enabled: true
    cache_size: 256
    default_limit: 200
  storage:
    enabled: true
    backend: sqlite
    registry_path: .ctn/artifact_registry.db
    content_scope: durable
    cloud_sync_ready: true
    mmap_enabled: false
    retention:
      enabled: true
      snapshot_ttl_days: 90
      patch_ttl_days: 90
      metrics_ttl_days: 30
      context_ttl_days: 90
Scenario Playbooks
1) Local Dev (fast feedback)
indexer:
  max_workers: 0
  max_file_size_kb: 500
bsg:
  incremental:
    enabled: true
  cache:
    enabled: true
batho index --root .
batho patch --root . --scan
batho bsg --root . --mode compressed --budget 12000
2) Large Monorepo (throughput)
indexer:
  max_file_size_kb: 2000
bsg:
  parallel:
    enabled: true
    max_workers: 16
  ignore:
    enabled: true
    file: .bathoignore
  storage:
    mmap_enabled: true
batho index --root /repo --snapshot
batho storage stats --root /repo
batho query --root /repo --relationship-type calls --limit 200
3) CI/CD (deterministic + observable)
logging:
  level: INFO
  json_format: true
indexer:
  metrics_output: .ctn/metrics.json
bsg:
  storage:
    enabled: true
batho index --root . --log-json --snapshot
batho stats --root .
batho storage verify --root .
4) Persistent Storage Hygiene (cloud-sync-ready v1)
# register existing artifacts
batho storage backfill --root .
# verify and repair drift
batho storage verify --root . --repair
# inspect registry + graph cache health
batho storage stats --root .
# rebuild query indexes
batho storage rebuild-indexes --root .
# retention dry-run / apply
batho storage cleanup --root .
batho storage cleanup --root . --apply
# deduplicate registry (dry-run first)
batho storage compact --root .
batho storage compact --root . --apply
BSG Rule Plugins
Batho now applies BSG rules through internal plugin modules, not the root rules folder.
- Built-in rules are loaded from packaged plugins (default: bsg_core).
- Custom rules can be defined inline in batho.yaml via rules.custom_rules_inline.
- Custom rules can also be loaded from rules.custom_rules_path YAML files.
- Rule actions currently focus on deterministic metadata enrichment for graph entities (for example bsg.category, bsg.scope_tier, bsg.service_tag).
Custom rules YAML accepts either a top-level list or a rules: list.
rules:
  - name: mark-test-files
    file_patterns: ["tests/**", "**/*_test.py"]
    metadata:
      bsg.category: TEST
  - name: derive-service-tag
    file_patterns: ["services/*/**"]
    actions:
      derive_service_tag: true
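To show what "deterministic metadata enrichment" means in practice, the sketch below applies a file-pattern rule to entity records and merges the rule's metadata into each match. It is an illustration only: it uses the standard-library `fnmatch` for glob matching, which may differ from the matcher Batho actually uses, and `apply_rules` and the entity shape are hypothetical.

```python
from fnmatch import fnmatch

RULES = [
    {"name": "mark-test-files",
     "file_patterns": ["tests/**", "**/*_test.py"],
     "metadata": {"bsg.category": "TEST"}},
]

def apply_rules(entity: dict, rules: list[dict]) -> dict:
    """Return a copy of `entity` with metadata from every matching rule merged in."""
    metadata = dict(entity.get("metadata", {}))
    for rule in rules:
        patterns = rule.get("file_patterns", [])
        if any(fnmatch(entity["file"], pattern) for pattern in patterns):
            metadata.update(rule.get("metadata", {}))
    return {**entity, "metadata": metadata}

test_fn = apply_rules({"name": "test_login", "file": "tests/unit/test_auth.py"}, RULES)
prod_fn = apply_rules({"name": "login", "file": "src/auth.py"}, RULES)
print(test_fn["metadata"], prod_fn["metadata"])
```

The same input always yields the same metadata, so rule output can be diffed between snapshots like any other entity field.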
Using Batho with AI
Batho is built to power AI-assisted development. Here are common patterns:
Feed LLM Context
# Generate compressed bsg for LLM injection
batho bsg --root . --mode compressed --budget 12000
# → Output saved to .ctn/{index_id}/bsg_compressed.json
# → Load and inject into your LLM prompt as codebase context
Or programmatically:
import json
from pathlib import Path
# Load the compressed BSG generated by the CLI
index_id = "<index_id>"  # replace with the index directory name under .ctn/
bsg_path = Path(".ctn") / index_id / "bsg_compressed.json"
with bsg_path.open("r", encoding="utf-8") as f:
    data = json.load(f)
compressed_text = data['compressed_text']
stats = data['stats']
# → Inject 'compressed_text' into your LLM prompt as codebase context
Codebase Q&A
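# Find all functions that call 'authenticate'
# (`graph` is the object returned by CodeGraphIndexer.build_graph, as in the script example below)
for rel in graph.relationships:
    if rel.type.name != "CALLS":
        continue  # ignore IMPORTS / USES / DEFINES edges
    target = graph.get_entity(rel.target_id)
    if target and target.name == "authenticate":
        source = graph.get_entity(rel.source_id)
        print(f"{source.name} → authenticate ({source.file})")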
Use Batho as a Python Library (Custom Scripts)
Batho is not only a CLI. You can import it as a Python library to build custom automation scripts, CI workflows, and internal developer tools.
Public Python API
The batho package exports core APIs directly:
- Indexing and graph:
CodeGraphIndexer,InMemoryGraph,BSGMap - Time Machine:
create_snapshot,list_snapshots,load_snapshot,diff_snapshots - Incremental patching:
FileChange,FileChangeType,FileChangeTracker,incremental_patch - Git-aware change discovery:
get_changed_file_status_since - Query layer:
QueryService
Example: Index + Snapshot from a Script
from pathlib import Path
from batho import BSGMap, CodeGraphIndexer, create_snapshot
root = Path(".").resolve()
ctn_dir = root / ".ctn"
ctn_dir.mkdir(parents=True, exist_ok=True)
indexer = CodeGraphIndexer(cache_path=str(ctn_dir / "file_cache.json"), root=str(root))
graph = indexer.build_graph(root=str(root), snapshot_id="script-run")
bsg = BSGMap.build(graph, root=str(root))
snapshot_id = create_snapshot(ctn_dir, root, graph, bsg, label="nightly-script")
print({"entities": len(graph.entities), "relationships": len(graph.relationships), "snapshot": snapshot_id})
Example: Incremental Patch in Automation
from pathlib import Path
from batho import FileChangeTracker, incremental_patch
root = Path(".").resolve()
ctn_dir = root / ".ctn"
base_snapshot_id = "<existing_snapshot_id>"
tracker = FileChangeTracker(root)
hash_cache_path = ctn_dir / "file_hashes.json"
tracker.load(hash_cache_path)
changes = tracker.scan_for_changes(max_file_size_kb=500)
tracker.save(hash_cache_path)
if changes:
    result = incremental_patch(ctn_dir, base_snapshot_id, changes)
    print(result)
else:
    print("No changes detected")
Example: Query Indexed Data Programmatically
from pathlib import Path
from batho import QueryService
ctn_dir = Path(".ctn")
query = QueryService(ctn_dir)
functions = query.entities_by_type("function", limit=20)
for row in functions:
    print(f"{row['name']} -> {row['file']}")
Impact Analysis (Pre-Refactoring)
# Find every caller of a function before changing it
# (`graph` comes from CodeGraphIndexer.build_graph; `target_id` is the id of the entity you plan to change)
for rel in graph.relationships:
    if rel.target_id == target_id and rel.type.name == "CALLS":
        caller = graph.get_entity(rel.source_id)
        print(f"  Will be affected: {caller.name} in {caller.file}")
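The loop above finds direct callers only. For a refactor you often want everything that reaches the function indirectly as well, which is a breadth-first walk over CALLS edges in reverse. The sketch below works on plain dictionaries shaped like the `relationships` entries in graph.json rather than on Batho's graph objects; `transitive_callers` and the sample ids are hypothetical.

```python
from collections import defaultdict, deque

def transitive_callers(relationships: list[dict], target_id: str) -> set[str]:
    """Return ids of every entity that reaches `target_id` through CALLS edges."""
    callers_of = defaultdict(set)
    for rel in relationships:
        if rel["type"] == "CALLS":
            callers_of[rel["target_id"]].add(rel["source_id"])

    seen: set[str] = set()
    queue = deque([target_id])
    while queue:
        current = queue.popleft()
        for caller in callers_of[current]:
            if caller not in seen:  # guards against cycles such as recursion
                seen.add(caller)
                queue.append(caller)
    return seen

relationships = [
    {"source_id": "login", "target_id": "authenticate", "type": "CALLS"},
    {"source_id": "api_handler", "target_id": "login", "type": "CALLS"},
    {"source_id": "login", "target_id": "hashlib", "type": "IMPORTS"},
    {"source_id": "login", "target_id": "login", "type": "CALLS"},  # recursion
]
affected = sorted(transitive_callers(relationships, "authenticate"))
print(affected)
```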
RAG / Vector Embedding
batho index --root /path/to/repo
batho bsg --root /path/to/repo --mode compressed
# → Embed .ctn/*/bsg_compressed.json chunks into your vector DB
Agentic AI
Autonomous agents can use Batho's structured graph to navigate codebases, resolve imports, and understand call chains — without reading every file.
Integrations
CI/CD (GitHub Actions)
name: Code Index
on: [push, pull_request]
jobs:
  index:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install batho
      - run: batho index --root . --verbose --log-json --snapshot
      - run: batho stats --root .
      - uses: actions/upload-artifact@v4
        with:
          name: batho-output
          path: .ctn/
Pre-commit Hook
# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: batho-index
        name: Batho Code Index
        entry: batho index --root .
        language: system
        pass_filenames: false
        always_run: true
      - id: batho-patch
        name: Batho Incremental Patch
        entry: batho patch --root . --scan
        language: system
        pass_filenames: false
        always_run: true
VS Code Task
{
"version": "2.0.0",
"tasks": [{
"label": "Batho Index",
"type": "shell",
"command": "batho index --root ${workspaceFolder} --verbose --snapshot"
},
{
"label": "Batho Patch",
"type": "shell",
"command": "batho patch --root ${workspaceFolder} --scan"
}]
}
Security & Compliance
| Guarantee | Details |
|---|---|
| Parse-only | Batho never executes your code — safe on untrusted repos |
| Binary detection | Magic bytes + Shannon entropy analysis |
| Ignore rules | Respects .gitignore and .bathoignore |
| Atomic writes | Temp file + rename — no partial outputs on crash |
| Offline by default | Indexing makes no network calls and can run air-gapped; cloud sync and the bridge server are opt-in |
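The atomic-write row in the table above describes a well-known pattern: write to a temporary file in the destination directory, then rename it over the target. A minimal sketch of that pattern using only the standard library; `atomic_write_json` is a hypothetical helper and not Batho's own writer.

```python
import json
import os
import tempfile

def atomic_write_json(path: str, payload: dict) -> None:
    """Write JSON so readers see either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh)
            fh.flush()
            os.fsync(fh.fileno())  # push bytes to disk before the rename
        os.replace(tmp_path, path)  # atomically swap the new file into place
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)  # clean up the partial temp file
        raise

with tempfile.TemporaryDirectory() as out_dir:
    target = os.path.join(out_dir, "index.json")
    atomic_write_json(target, {"schema_version": "graph.v1"})
    with open(target, encoding="utf-8") as fh:
        written = json.load(fh)
    leftovers = [name for name in os.listdir(out_dir) if name.endswith(".tmp")]
print(written, leftovers)
```

If the process dies before `os.replace`, the target file is untouched and only a stray `.tmp` file remains.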
For regulated environments, add SBOM and license checks in CI:
pip install cyclonedx-bom && cyclonedx-py -o sbom.xml
pip install pip-licenses && pip-licenses --allow-only "Apache Software License"
Performance
| Repo Size | Workers (auto) | Typical Time |
|---|---|---|
| < 50 files | 4 | < 2s |
| 50–200 files | 8 | 2–5s |
| 200–1K files | 16 | 5–15s |
| 1K+ files | 32 | varies |
Tips for large monorepos (2M+ LOC):
- Run on fast local SSD
- Use --log-json to reduce console overhead
- Add build artifacts to .bathoignore: node_modules/ vendor/ dist/ build/ __pycache__/
Architecture
batho/
├── batho_cli.py # CLI command entrypoints
└── batho/
    ├── __init__.py # Public Python API exports
    ├── config.py # Configuration and env overrides
    ├── time_machine.py # Snapshots, diffs, incremental patching
    ├── context/
    │   ├── codegraph.py # Graph indexing and extraction pipeline
    │   ├── pipeline.py # Parallel worker orchestration
    │   ├── bsg_map.py # Multi-format BSG renderer
    │   ├── query.py # Query service over persisted artifacts
    │   └── languages/ # Per-language tree-sitter extractors
    └── utils/
        ├── logging.py # Structured logging
        ├── hash.py # SHA-256 helpers
        └── ignore.py # .gitignore / .bathoignore handling
Contributing
Batho is open source and welcomes contributions. Whether it's a bug report, a new language extractor, or a docs improvement — we'd love your help.
- Fork the repo
- Create a feature branch
- Run the test suite: uv run pytest
- Submit a pull request
License
Apache 2.0 — see LICENSE
🎉 Thank You!
Ready to get started? Install Batho and index your first project in 30 seconds.
🚀 Batho v1.0.0 - Code Intelligence for the AI Era
PyPI · Issues · Discussions · Full Documentation