Enterprise-grade multi-language source code indexer and semantic code intelligence engine powered by tree-sitter AST parsing, BSG compression, and relational code graph snapshots.

B.A.T.H.O

Bidirectional AST Traversal & Hypergraph Orchestrator
BATHO indexes your codebase, compresses the result for LLM context windows, and tracks changes over time.


📚 Official Documentation

For complete documentation, guides, and API reference, visit batho.sageoz.org.

🚀 Quick Start

# Install
pip install batho   # or: uv add batho

# Build full repository index
batho build --root .

# Export the latest Batho artifact as a transportable ZIP (default name: artifact_<dir>.batho)
batho export --root .

# Incrementally patch changed files (using native content hashes)
batho patch --root .

# Query granular node changes between runs or for a specific file/entity
batho diff --file batho/cli/build.py

# Verify artifact health and repair integrity anomalies
batho fix --deep

✨ Key Features

  • 40+ language AST parsing — Python, TypeScript, Rust, Go, Java, and more
  • 10x context compression — Fit entire codebases into LLM context windows
  • Time Machine snapshots — Track codebase evolution with incremental patching
  • BSG Plugin System v2 — 38 built-in plugins (28 foundation + 10 interceptors) for security, quality, and optimization
  • Single-Pass Extraction — One parse per file; cross-file references emit contextual stubs (EntityType.UNRESOLVED) resolved post-extraction by ScopeManager
  • Deterministic IDs — Position-based ID generation for stable entity tracking (no false positives from code movement)
  • Graph Optimization — Cyclic dependency detection and orphan node pruning
  • Dependency-Aware Resolution — CDEU module resolves stdlib and third-party symbols (pip, npm, cargo, go) via manifest parsing and live introspection
  • Symbol Resolution — Cross-file symbol resolution with hierarchical encoding and SymbolRole tagging
  • Arrow IPC Artifact Store — Three-blob design (agent/storage/rel views) written to .batho/artifact/ via BathoBundle; zero-copy memory-mapped reads
  • BSG Store — Persistent entity/relationship graph in .batho/bsg/current/ via BsgScratchStore; streaming flush + compaction for large repos
  • File Changelog — Node-level diff history tracking per run
  • Integrity Verification — Multi-stage fix command with auto-repair capabilities
  • Zero code execution — Safe to run in CI or on untrusted repos
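
To make the graph optimization bullet concrete, here is a minimal sketch of cyclic dependency detection and orphan pruning over a plain adjacency map. It is illustrative only: the names `find_cycle` and `prune_orphans` and the dict-based graph shape are assumptions for this example, not Batho's internal API or data model.

```python
from typing import Dict, List, Optional, Set

# Hypothetical graph shape: node -> list of nodes it depends on.
Graph = Dict[str, List[str]]


def find_cycle(graph: Graph) -> Optional[List[str]]:
    """Return one dependency cycle as a list of nodes, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}
    stack: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        stack.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GREY:
                # Back edge: the cycle is the path from dep to here, closed on dep.
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


def prune_orphans(graph: Graph) -> Graph:
    """Drop nodes that have neither outgoing nor incoming edges."""
    referenced: Set[str] = {dep for deps in graph.values() for dep in deps}
    return {n: deps for n, deps in graph.items() if deps or n in referenced}
```

The depth-first search is recursive for brevity; a very deep dependency chain would call for an explicit stack instead.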

🛠️ CLI Commands & Usage

Batho provides a clean command-line interface:

1. Build Index (batho build)

Parses all repository source files and builds the initial AST code graph database.

batho build --root .
  • --full: Force a full rebuild by deleting the existing database first.
  • --max-workers <N>: Maximum parallel workers for parsing (defaults to CPU count).
  • --max-file-size-kb <KB>: Skip files exceeding this size (default: 500 KB).
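
Before a build it can help to preview which files the size limit would exclude. The sketch below lists files larger than a given threshold. The helper name `oversized_files` is hypothetical, and it assumes 1 KB means 1024 bytes and ignores any ignore patterns, neither of which this README specifies for Batho.

```python
from pathlib import Path
from typing import List


def oversized_files(root: str, max_file_size_kb: int = 500) -> List[Path]:
    """List files under root larger than the limit (assumes 1 KB = 1024 bytes)."""
    limit_bytes = max_file_size_kb * 1024
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.stat().st_size > limit_bytes
    )
```

Calling `oversized_files(".")` returns the paths that exceed the documented 500 KB default under these assumptions.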

2. Incremental Patch (batho patch)

Scans the filesystem for modified/added files, re-parses them, and updates the index using copy-on-write database transactions.

batho patch --root .
  • --max-file-size-kb <KB>: Skip files exceeding this size during hash scans.
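
The idea behind hash-based incremental patching can be sketched as follows: hash every file, compare against the manifest from the previous run, and re-parse only what differs. This is a generic illustration, not Batho's implementation. SHA-256, the manifest shape, the function names, and the reporting of deleted files are all assumptions made for the example.

```python
import hashlib
from pathlib import Path
from typing import Dict, Set, Tuple


def hash_tree(root: str) -> Dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 of its bytes."""
    base = Path(root)
    return {
        path.relative_to(base).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in base.rglob("*")
        if path.is_file()
    }


def changed_files(
    previous: Dict[str, str], current: Dict[str, str]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (added, modified, deleted) paths between two hash manifests."""
    added = set(current) - set(previous)
    deleted = set(previous) - set(current)
    modified = {
        path
        for path in set(current) & set(previous)
        if current[path] != previous[path]
    }
    return added, modified, deleted
```

Only the paths in `added` and `modified` would need re-parsing; unchanged files keep their existing index entries.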

3. Export (batho export)

Exports the latest BSG artifact into a transportable ZIP by default (<root>/artifact_<dir>.batho). Use --json to export a JSON view instead (default: <root>/batho_export.json).

batho export --root .
  • --root <path>: Repository root directory (default: .).
  • --json: Export a JSON view instead of the default transport ZIP artifact.
  • --view <view>: JSON view to export, used with --json (default: storage). Options: storage, agent, overview, files, symbols, dependencies, delta, rel.
  • --output <path>: Custom output file path.
  • --filter <glob>: Narrow exported files (e.g. src/**/*.py).
  • --category <category>: Filter by code category (source, test, doc, config, infra, all).
  • --token-budget <N>: Maximum token budget for agent view.
  • --baseline <path>: Baseline export file (required for delta view).
  • --rel: Include relationship lists in the export.
  • --index-id <ID>: Export a specific index run UUID (default: latest).
  • --format <format>: JSON output format (json or pretty; default: json).
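
When scripting exports, a small wrapper can catch invalid flag combinations before the CLI runs. The sketch below builds an argument list from the flags listed above and enforces the stated rule that the delta view needs a baseline. The function `export_command` is hypothetical, and passing --json together with --view is an assumption based on the flag descriptions.

```python
from typing import List, Optional

# View names copied from the --view flag description.
VIEWS = ("storage", "agent", "overview", "files", "symbols", "dependencies", "delta", "rel")


def export_command(
    root: str = ".",
    view: Optional[str] = None,
    baseline: Optional[str] = None,
    token_budget: Optional[int] = None,
    output: Optional[str] = None,
) -> List[str]:
    """Build an argument list for `batho export` from documented flags only."""
    cmd = ["batho", "export", "--root", root]
    if view is not None:
        if view not in VIEWS:
            raise ValueError(f"unknown view: {view!r}")
        if view == "delta" and baseline is None:
            raise ValueError("the delta view requires --baseline")
        # Assumption: --view is passed together with --json.
        cmd += ["--json", "--view", view]
    if baseline is not None:
        cmd += ["--baseline", baseline]
    if token_budget is not None:
        cmd += ["--token-budget", str(token_budget)]
    if output is not None:
        cmd += ["--output", output]
    return cmd
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` on a machine where Batho is installed.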

4. Database Integrity (batho fix)

Performs multi-stage database verification (db → state → blobs → graph) and executes repair routines.

batho fix --deep
  • --deep: Full validation (decompresses and checks zstd payload blobs).
  • --dry-run: Diagnoses issues without committing repairs.
  • --target <target>: Target a specific checker (db, state, blobs, graph, all).
  • --phase <1-4>: Run a specific verification phase.
  • --parallel: Run independent checks concurrently.
  • --format <text|json|csv>: Report format.

5. Node History (batho diff)

Queries exact, granular changes across runs, files, or specific symbols.

batho diff --file batho/cli/build.py
  • --run <run_uuid>: Changes made in a specific run.
  • --entity <entity_id>: Full evolution history of a symbol.
  • --file <rel_path>: All node-level changes in a file across runs.
  • --since <run_uuid>: Bounded history start (only with --entity).
  • --json: Format output as JSON.

6. Storage Maintenance (batho gc)

Manages database runs, prunes old history, and vacuums database pages.

batho gc vacuum
  • run <run_uuid>: Delete a specific indexing run.
  • runs --older-than <days>: Prune runs older than a threshold.
  • status: Display storage metrics.
  • vacuum: Reclaim disk space.
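
Age-based pruning, as in `runs --older-than <days>`, amounts to a cutoff comparison on run timestamps. The sketch below shows that selection step in isolation. The `runs` mapping of run UUID to creation time is a hypothetical stand-in, since this README does not describe how Batho stores run metadata.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def runs_older_than(runs: Dict[str, datetime], days: int, now: datetime) -> List[str]:
    """Return IDs of runs created more than `days` days before `now`."""
    cutoff = now - timedelta(days=days)
    return sorted(run_id for run_id, created in runs.items() if created < cutoff)
```

Passing `now` explicitly keeps the function deterministic and easy to test.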

🔄 Migration from v1 to v1.1.0

Breaking Changes:

  1. Unified Configuration — All config is now in a single batho.yaml file

    • Old: Multiple config files scattered across the codebase
    • New: Single batho.yaml with all settings (indexer, bsg, rules, storage)
    • Action: Delete old config files; run batho build to auto-generate batho.yaml
  2. Storage format — SQLite .batho artifact replaced by Apache Arrow IPC

    • Old: artifact_<dirname>.batho SQLite database
    • New: .batho/artifact/*.ipc + .batho/bsg/current/*.ipc (Arrow IPC files)
    • Action: Run batho build --root . --full to rebuild from scratch
  3. Config key rename: paths.db_path removed; replaced by paths.artifact_dir

    • Old: paths.db_path: "{root}"
    • New: paths.artifact_dir: .batho/artifact (env: BATHO_ARTIFACT_DIR)
    • Action: Remove db_path from your batho.yaml; add artifact_dir if custom path needed

New Defaults:

  • Symbol resolution is enabled by default (can be disabled in config)
  • Arrow IPC artifact store at .batho/artifact/ (no SQLite dependency)
  • BSG plugins are enabled by default (38 built-in security/quality plugins)
  • Parallel processing is enabled by default (up to 16 workers)

Recommended Migration Steps:

# 1. Delete old index data (the v1 SQLite format is incompatible); also remove any leftover artifact_<dirname>.batho file
rm -rf .batho/

# 2. Let Batho regenerate batho.yaml with current defaults
batho build --root .

# 3. Customize batho.yaml as needed (see docs/config.md)
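
If you prefer to carry an existing configuration forward rather than regenerate it, the key rename from breaking change 3 can be applied to an already-loaded config mapping. This is a sketch under assumptions: `migrate_paths` is a hypothetical helper, and loading or saving batho.yaml is left to whichever YAML library you already use.

```python
from copy import deepcopy
from typing import Any, Dict, Optional


def migrate_paths(
    config: Dict[str, Any], custom_artifact_dir: Optional[str] = None
) -> Dict[str, Any]:
    """Return a copy of config without paths.db_path.

    paths.artifact_dir is only written when a custom location is given,
    mirroring the migration note above.
    """
    migrated = deepcopy(config)
    paths = migrated.setdefault("paths", {})
    paths.pop("db_path", None)  # removed in v1.1.0
    if custom_artifact_dir is not None:
        paths["artifact_dir"] = custom_artifact_dir
    return migrated
```

The input mapping is left untouched, so the old and new configurations can be compared before anything is written back.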

⚙️ Configuration

Batho works out of the box with zero config. For production use, configure with ./batho.yaml:

schema_version: batho-config.v1

logging:
  level: ERROR
  quiet: false

paths:
  artifact_dir: .batho/artifact   # Arrow IPC artifact store (override: BATHO_ARTIFACT_DIR)
  cache_dir: .batho/cache
  bsg_dir: .batho/bsg

indexer:
  max_file_size_kb: 500
  ignore_patterns: []

flags:
  strict: false
  audit_log_enabled: true

bsg:
  cache:
    enabled: true
    max_size_mb: 1024
  symbol_resolution:
    enabled: true
  bidirectional:
    enabled: true
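
The comment on paths.artifact_dir mentions an environment override. One plausible precedence, sketched below, is environment variable first, then the batho.yaml value, then the built-in default. The function `resolve_artifact_dir` is hypothetical and the exact precedence Batho applies is an assumption; only the key name, the variable name BATHO_ARTIFACT_DIR, and the default path come from the configuration above.

```python
import os
from typing import Any, Mapping, Optional


def resolve_artifact_dir(
    config: Mapping[str, Any], env: Optional[Mapping[str, str]] = None
) -> str:
    """Pick the artifact directory: env var, then config value, then default."""
    env = os.environ if env is None else env
    override = env.get("BATHO_ARTIFACT_DIR")
    if override:
        return override
    return config.get("paths", {}).get("artifact_dir", ".batho/artifact")
```

Accepting `env` as a parameter makes the lookup testable without mutating the real process environment.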

🗺️ Roadmap & Backlog

Backlog (Future Releases)

| Feature | Description | Status |
|---|---|---|
| Fleet Intelligence | Multi-repo discovery, symbol routing, cross-repo impact analysis | Not started (0%) |
| MCP Hub | Model Context Protocol server for AI agent integration | Not started |
| Cloud Sync | Remote artifact storage and synchronization | Not started |
| Call-chain Analysis | Analyze function call graphs and dependencies | Not started |

📦 Installation

pip install batho

PyPI: https://pypi.org/project/batho/

🛠️ Developer Setup

# Clone the repository
git clone https://github.com/sageoz/batho.git
cd batho

# Install dependencies
uv sync --all-groups --all-extras

# Run tests
uv run pytest

# Run CLI from source
uv run python -m batho_cli --help

📄 License

Apache License 2.0 - see LICENSE for details.
