Enterprise-grade multi-language source code indexer and semantic code intelligence engine powered by tree-sitter AST parsing, BSG compression, and relational code graph snapshots.
B.A.T.H.O
Bidirectional AST Traversal & Hypergraph Orchestrator
BATHO indexes your codebase, compresses the result for LLM context windows, and tracks changes over time.
📚 Official Documentation
For complete documentation, guides, and API reference, visit batho.sageoz.org.
🚀 Quick Start
# Install
pip install batho   # or: uv add batho
# Build full repository index
batho build --root .
# Export the latest Batho artifact as a transportable ZIP (default: artifact_<dir>.batho)
batho export --root .
# Incrementally patch changed files (using native content hashes)
batho patch --root .
# Query granular node changes between runs or for a specific file/entity
batho diff --file batho/cli/build.py
# Verify artifact health and repair integrity anomalies
batho fix --deep
✨ Key Features
- 40+ language AST parsing: Python, TypeScript, Rust, Go, Java, and more
- 10x context compression: Fit entire codebases into LLM context windows
- Time Machine snapshots: Track codebase evolution with incremental patching
- BSG Plugin System v2: 38 built-in plugins (28 foundation + 10 interceptors) for security, quality, and optimization
- Single-Pass Extraction: One parse per file; cross-file references emit contextual stubs (EntityType.UNRESOLVED) resolved post-extraction by ScopeManager
- Deterministic IDs: Position-based ID generation for stable entity tracking (no false positives from code movement)
- Graph Optimization: Cyclic dependency detection and orphan node pruning
- Dependency-Aware Resolution: CDEU module resolves stdlib and third-party symbols (pip, npm, cargo, go) via manifest parsing and live introspection
- Symbol Resolution: Cross-file symbol resolution with hierarchical encoding and SymbolRole tagging
- Arrow IPC Artifact Store: Three-blob design (agent/storage/rel views) written to .batho/artifact/ via BathoBundle; zero-copy memory-mapped reads
- BSG Store: Persistent entity/relationship graph in .batho/bsg/current/ via BsgScratchStore; streaming flush + compaction for large repos
- File Changelog: Node-level diff history tracking per run
- Integrity Verification: Multi-stage fix command with auto-repair capabilities
- Zero code execution: Safe to run in CI or on untrusted repos
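To make the Graph Optimization feature more concrete, here is a minimal sketch of cyclic dependency detection and orphan pruning on a plain adjacency map. It is an illustration under assumptions, not Batho's implementation: the names `find_cycle` and `prune_orphans`, the dict-of-lists graph shape, and the definition of an orphan (a node with no incoming and no outgoing edges) are all hypothetical.

```python
from typing import Dict, List, Optional, Set

# Hypothetical graph shape: node -> list of nodes it depends on.
Graph = Dict[str, List[str]]


def find_cycle(graph: Graph) -> Optional[List[str]]:
    """Return one dependency cycle as a node path, or None if acyclic.

    Uses a recursive depth-first search, so very deep graphs would need
    an iterative variant to stay under Python's recursion limit.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color: Dict[str, int] = {node: WHITE for node in graph}
    stack: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        stack.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # Back edge: the cycle is the stack slice from dep to node.
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


def prune_orphans(graph: Graph) -> Graph:
    """Drop nodes that have no outgoing and no incoming edges."""
    referenced: Set[str] = {dep for deps in graph.values() for dep in deps}
    return {n: deps for n, deps in graph.items() if deps or n in referenced}
```

For example, `find_cycle({"a": ["b"], "b": ["c"], "c": ["a"]})` reports the path `["a", "b", "c", "a"]`, and `prune_orphans` removes a node such as `"d": []` that nothing references.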
🛠️ CLI Commands & Usage
Batho provides a clean command-line interface:
1. Build Index (batho build)
Parses all repository source files and builds the initial AST code graph database.
batho build --root .
- --full: Force a full rebuild by deleting the existing database first.
- --max-workers <N>: Maximum parallel workers for parsing (defaults to CPU count).
- --max-file-size-kb <KB>: Skip files exceeding this size (default: 500 KB).
2. Incremental Patch (batho patch)
Scans the filesystem for modified/added files, re-parses them, and updates the index using copy-on-write database transactions.
batho patch --root .
- --max-file-size-kb <KB>: Skip files exceeding this size during hash scans.
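The idea behind hash-based change detection can be sketched in a few lines of Python. This is an assumption-laden illustration, not Batho's code: the source only says "native content hashes", so the choice of SHA-256, the function names `hash_tree` and `changed_files`, and the in-memory manifest format are all hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, Set, Tuple


def hash_tree(root: Path, max_file_size_kb: int = 500) -> Dict[str, str]:
    """Map each file's relative path to the SHA-256 of its contents."""
    hashes: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        if path.stat().st_size > max_file_size_kb * 1024:
            continue  # same idea as --max-file-size-kb: skip oversized files
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        hashes[path.relative_to(root).as_posix()] = digest
    return hashes


def changed_files(
    old: Dict[str, str], new: Dict[str, str]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (added, modified, removed) paths between two hash maps."""
    added = set(new) - set(old)
    removed = set(old) - set(new)
    modified = {p for p in set(old) & set(new) if old[p] != new[p]}
    return added, modified, removed
```

Comparing the map from the previous run with a fresh one yields exactly the files that need re-parsing; unchanged files are never touched, which is what keeps a patch cheaper than a full build.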
3. Export (batho export)
Exports the latest BSG artifact into a transportable ZIP by default (<root>/artifact_<dir>.batho). Use --json to export a JSON view instead (default: <root>/batho_export.json).
batho export --root .
- --root <path>: Repository root directory (default: .).
- --json: Export a JSON view instead of the default transport ZIP artifact.
- --view <view>: JSON view format (default: storage with --json): storage, agent, overview, files, symbols, dependencies, delta, rel.
- --output <path>: Custom file path.
- --filter <glob>: Narrow exported files (e.g. src/**/*.py).
- --category <category>: Filter by code category (source, test, doc, config, infra, all).
- --token-budget <N>: Maximum token budget for agent view.
- --baseline <path>: Baseline export file (required for delta view).
- --rel: Include relationship lists in the export.
- --index-id <ID>: Export a specific index run UUID (default: latest).
- --format <format>: JSON output format (json or pretty; default: json).
4. Database Integrity (batho fix)
Performs multi-stage database verification (db → state → blobs → graph) and executes repair routines.
batho fix --deep
- --deep: Full validation (decompresses and checks zstd payload blobs).
- --dry-run: Diagnoses issues without committing repairs.
- --target <target>: Target a specific checker (db, state, blobs, graph, all).
- --phase <1-4>: Run a specific verification phase.
- --parallel: Run independent checks concurrently.
- --format <text|json|csv>: Report format.
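The phase ordering and the dry-run behaviour described above can be pictured with a small sketch. Only the phase names and their order (db, state, blobs, graph) come from this README; the checker and repair callables, and the functions `run_checks` and `run_fix`, are hypothetical stand-ins and say nothing about how Batho implements its checks.

```python
from typing import Callable, Dict, List, Tuple

PHASES = ["db", "state", "blobs", "graph"]  # order taken from the text above

# Hypothetical interfaces: a checker returns a list of issue descriptions,
# a repair is a callable that fixes whatever its phase's checker found.
Checker = Callable[[], List[str]]
Repair = Callable[[], None]


def run_checks(checkers: Dict[str, Checker], target: str = "all") -> Dict[str, List[str]]:
    """Run the selected checkers in phase order and collect their findings."""
    if target != "all" and target not in PHASES:
        raise ValueError(f"unknown target: {target}")
    selected = PHASES if target == "all" else [target]
    report: Dict[str, List[str]] = {}
    for phase in selected:
        checker = checkers.get(phase)
        report[phase] = checker() if checker else []
    return report


def run_fix(
    checkers: Dict[str, Checker],
    repairs: Dict[str, Repair],
    target: str = "all",
    dry_run: bool = False,
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Diagnose, then repair failing phases unless dry_run is set."""
    report = run_checks(checkers, target)
    repaired: List[str] = []
    for phase, issues in report.items():
        if issues and not dry_run and phase in repairs:
            repairs[phase]()
            repaired.append(phase)
    return report, repaired
```

With `dry_run=True` the report is identical but no repair callable runs, which mirrors the documented intent of --dry-run.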
5. Node History (batho diff)
Queries exact, granular changes across runs, files, or specific symbols.
batho diff --file batho/cli/build.py
- --run <run_uuid>: Changes made in a specific run.
- --entity <entity_id>: Full evolution history of a symbol.
- --file <rel_path>: All node-level changes in a file across runs.
- --since <run_uuid>: Bounded history start (only with --entity).
- --json: Format output as JSON.
6. Storage Maintenance (batho gc)
Manages database runs, prunes old history, and vacuums database pages.
batho gc vacuum
- run <run_uuid>: Delete a specific indexing run.
- runs --older-than <days>: Prune runs older than a threshold.
- status: Display storage metrics.
- vacuum: Reclaim disk space.
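The commands in this section can be chained from a script, for example in CI. The sketch below uses only flags documented above; the step order, the helper name `run_steps`, and the assumption that `batho` exits with a non-zero status on failure are illustrative and not confirmed by this README.

```python
import shlex
import shutil
import subprocess
from typing import List, Sequence

# One possible sequence; adjust to your workflow.
STEPS: List[List[str]] = [
    ["batho", "patch", "--root", "."],
    ["batho", "fix", "--dry-run", "--format", "json"],
    ["batho", "export", "--root", ".", "--json", "--view", "overview"],
]


def run_steps(steps: Sequence[Sequence[str]], execute: bool = True) -> List[str]:
    """Run each command in order and stop at the first failure.

    Returns the shell-quoted commands that were run (or, with
    execute=False, the commands that would be run).
    """
    done: List[str] = []
    for argv in steps:
        if execute:
            if shutil.which(argv[0]) is None:
                raise FileNotFoundError(f"{argv[0]} is not on PATH")
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(list(argv), check=True)
        done.append(shlex.join(argv))
    return done


if __name__ == "__main__":
    # Print the plan without executing anything.
    for line in run_steps(STEPS, execute=False):
        print(line)
```

Passing `execute=False` is a cheap way to review the exact command lines before letting the script touch the index.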
🔄 Migration from v1 to v1.1.0
Breaking Changes:
- Unified Configuration: All config is now in a single batho.yaml file
  - Old: Multiple config files scattered across the codebase
  - New: Single batho.yaml with all settings (indexer, bsg, rules, storage)
  - Action: Delete old config files; run batho build to auto-generate batho.yaml
- Storage format: SQLite .batho artifact replaced by Apache Arrow IPC
  - Old: artifact_<dirname>.batho SQLite database
  - New: .batho/artifact/*.ipc + .batho/bsg/current/*.ipc (Arrow IPC files)
  - Action: Run batho build --root . --full to rebuild from scratch
- Config key rename: paths.db_path removed; replaced by paths.artifact_dir
  - Old: paths.db_path: "{root}"
  - New: paths.artifact_dir: .batho/artifact (env: BATHO_ARTIFACT_DIR)
  - Action: Remove db_path from your batho.yaml; add artifact_dir if a custom path is needed
New Defaults:
- Symbol resolution is enabled by default (can be disabled in config)
- Arrow IPC artifact store at .batho/artifact/ (no SQLite dependency)
- BSG plugins are enabled by default (38 built-in security/quality plugins)
- Parallel processing is enabled by default (up to 16 workers)
Recommended Migration Steps:
# 1. Delete old SQLite artifact (format incompatible)
rm -rf .batho/
# 2. Let Batho regenerate batho.yaml with current defaults
batho build --root .
# 3. Customize batho.yaml as needed (see docs/config.md)
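Before running the steps above, it can help to check a repository for leftovers from the old layout. The following is a hedged sketch: `migration_warnings` is a hypothetical helper, not part of Batho, and the line-based scan for `db_path:` is a simplification that avoids a YAML parser. The 16-byte SQLite header check is standard; it is used here because the current export command also writes `artifact_<dir>.batho` files (as ZIPs), so the file name alone does not identify a legacy database.

```python
from pathlib import Path
from typing import List

SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of every SQLite database


def is_sqlite(path: Path) -> bool:
    """Return True if the file starts with the SQLite header."""
    with path.open("rb") as handle:
        return handle.read(len(SQLITE_MAGIC)) == SQLITE_MAGIC


def migration_warnings(root: Path) -> List[str]:
    """List leftovers from the pre-1.1.0 layout found directly under root."""
    warnings: List[str] = []
    for candidate in sorted(root.glob("artifact_*.batho")):
        if candidate.is_file() and is_sqlite(candidate):
            warnings.append(f"legacy SQLite artifact: {candidate.name}")
    config = root / "batho.yaml"
    if config.is_file():
        lines = config.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            # Commented-out keys start with '#', so they are not flagged.
            if line.strip().startswith("db_path:"):
                warnings.append(
                    f"batho.yaml:{number}: remove db_path (replaced by paths.artifact_dir)"
                )
    return warnings
```

An empty list means neither a legacy SQLite artifact nor a stale `db_path` key was found at the repository root.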
⚙️ Configuration
Batho works out of the box with zero config. For production use, configure with ./batho.yaml:
schema_version: batho-config.v1
logging:
level: ERROR
quiet: false
paths:
artifact_dir: .batho/artifact # Arrow IPC artifact store (override: BATHO_ARTIFACT_DIR)
cache_dir: .batho/cache
bsg_dir: .batho/bsg
indexer:
max_file_size_kb: 500
ignore_patterns: []
flags:
strict: false
audit_log_enabled: true
bsg:
cache:
enabled: true
max_size_mb: 1024
symbol_resolution:
enabled: true
bidirectional:
enabled: true
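The `override: BATHO_ARTIFACT_DIR` comment in the configuration above suggests that the environment variable takes precedence over the file. A minimal sketch of that lookup order follows; the function `resolve_artifact_dir`, the already-parsed `config` mapping, and the exact precedence (environment, then batho.yaml, then the default) are assumptions for illustration rather than Batho's documented behaviour.

```python
import os
from typing import Any, Mapping, Optional

DEFAULT_ARTIFACT_DIR = ".batho/artifact"  # default shown in the config above


def resolve_artifact_dir(
    config: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the artifact directory: environment, then config, then default."""
    env = os.environ if env is None else env
    override = env.get("BATHO_ARTIFACT_DIR")
    if override:
        return override
    paths = config.get("paths") or {}
    return paths.get("artifact_dir") or DEFAULT_ARTIFACT_DIR
```

Passing `env` explicitly keeps the function easy to test; in normal use it falls back to `os.environ`.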
🗺️ Roadmap & Backlog
Backlog (Future Releases)
| Feature | Description | Status |
|---|---|---|
| Fleet Intelligence | Multi-repo discovery, symbol routing, cross-repo impact analysis | Not started |
| MCP Hub | Model Context Protocol server for AI agent integration | Not started |
| Cloud Sync | Remote artifact storage and synchronization | Not started |
| Call-chain Analysis | Analyze function call graphs and dependencies | Not started |
📦 Installation
pip install batho
PyPI: https://pypi.org/project/batho/
🛠️ Developer Setup
# Clone the repository
git clone https://github.com/sageoz/batho.git
cd batho
# Install dependencies
uv sync --all-groups --all-extras
# Run tests
uv run pytest
# Run CLI from source
uv run python -m batho_cli --help
📄 License
Apache License 2.0 - see LICENSE for details.