
Enterprise-ready, high-speed, multi-language code indexer with BSG compression and Time Machine snapshots/diffs — no LLM required.


Batho

B.A.T.H.O

Bidirectional AST Traversal & Hypergraph Orchestrator
BATHO indexes your codebase, compresses the result for LLM context windows, and tracks changes over time.



Batho is a code intelligence engine that transforms large, raw codebases into a queryable, time-aware structured graph. It parses source code into an AST without executing it and extracts a structured relational hypergraph, which serves as a memory layer for your repositories. Typical uses include generating token-budgeted context to prevent AI agent amnesia, enforcing governance via webhook orchestration, and tracking code changes over time through structured graph snapshots.


Quick Start

Get running in 30 seconds:

# Install
uv add batho 
# or
pip install batho

# Index your project
batho index --root . --verbose --snapshot

# Generate compressed bsg for LLM injection
batho bsg --root . --mode compressed --budget 12000

# Create snapshot
batho index --root . --snapshot  

# Auto-detect and patch changes
batho patch --root . --scan

# Install Git hooks for automated checks
batho hooks install --all

# Start the artifact bridge (REST API + MCP server)
batho bridge serve --root .
batho bridge mcp --transport stdio

# Show all commands
batho --help

Batho scans your codebase, extracts every function, class, import, and relationship, and writes structured output to .ctn/.

Why Batho?

Modern AI tools need structured code understanding — not just raw file contents. Batho bridges that gap.

| What you get | Why it matters |
| --- | --- |
| 40+ language AST parsing | One tool for polyglot repos: Python, TypeScript, Rust, Go, Java, and more |
| 10x context compression | Fit entire codebases into LLM context windows |
| Time Machine snapshots | Track how your codebase evolves between releases |
| Zero Code Execution | Safe to run in CI, pre-commit, or on untrusted repos |
| Caching | mtime+SHA skips unchanged files; re-indexes in seconds |
| CI/CD Pipeline Hooks | Turnkey GitHub Actions and GitLab CI templates |
| Incremental patching | 10-100x faster updates with complete lineage tracking |

How It Works

  1. Parse — tree-sitter extracts functions, classes, variables, imports with full signatures
  2. Graph — Entities and relationships (IMPORTS, CALLS, USES, DEFINES) form a code graph
  3. Compress — BSG renders the graph in multiple formats: compressed, full, JSON, hierarchical
  4. Track — Time Machine snapshots let you diff code intelligence over time
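
The stages above can be pictured with a toy in-memory graph. This is a minimal sketch, not Batho's internal model: the `Entity`, `Relationship`, and `CodeGraph` names and their fields are assumptions made for illustration. Only the four relationship types come from the text.

```python
from dataclasses import dataclass, field

# Relationship types named in the text; everything else here is hypothetical.
REL_TYPES = {"IMPORTS", "CALLS", "USES", "DEFINES"}


@dataclass(frozen=True)
class Entity:
    id: str
    name: str
    type: str  # e.g. "function" or "class"
    file: str


@dataclass(frozen=True)
class Relationship:
    source_id: str
    target_id: str
    type: str


@dataclass
class CodeGraph:
    entities: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)

    def add_entity(self, entity):
        self.entities[entity.id] = entity

    def relate(self, source_id, target_id, rel_type):
        # Reject edge types the graph does not know about.
        if rel_type not in REL_TYPES:
            raise ValueError(f"unknown relationship type: {rel_type}")
        self.relationships.append(Relationship(source_id, target_id, rel_type))

    def callers_of(self, target_id):
        """Return entities that have a CALLS edge pointing at target_id."""
        return [
            self.entities[r.source_id]
            for r in self.relationships
            if r.type == "CALLS" and r.target_id == target_id
        ]


graph = CodeGraph()
graph.add_entity(Entity("e1", "login", "function", "auth.py"))
graph.add_entity(Entity("e2", "authenticate", "function", "auth.py"))
graph.relate("e1", "e2", "CALLS")
caller_names = [e.name for e in graph.callers_of("e2")]
```

Keeping entities in a dict keyed by id makes edge lookups cheap, which is the property the later query and diff steps rely on.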

Features

Multi-Language AST Extraction

Batho uses tree-sitter for precise, language-aware parsing:

  • Functions — name, signature, parameters, return type, docstring
  • Classes — name, base classes, methods, attributes
  • Interfaces/Traits — method signatures
  • Variables — declarations, types, assignments
  • Imports — module paths, selective imports

Relationships captured: IMPORTS · CALLS · USES · DEFINES
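
To make the extracted fields concrete, here is a small sketch that pulls the same kind of function record out of Python source. It uses the standard-library `ast` module as a stand-in for tree-sitter, so it only handles Python and does not reflect Batho's actual extractors. The record keys are assumptions modeled on the field list above.

```python
import ast

SOURCE = '''
import os

class Session:
    def close(self) -> None:
        """Release resources."""

def login(user: str, password: str) -> bool:
    """Check credentials."""
    return bool(user and password)
'''


def extract_functions(source):
    """Collect name, parameters, return type, docstring and line span per function."""
    records = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            records.append({
                "name": node.name,
                "parameters": [arg.arg for arg in node.args.args],
                "return_type": ast.unparse(node.returns) if node.returns else None,
                "docstring": ast.get_docstring(node),
                "start_line": node.lineno,
                "end_line": node.end_lineno,
            })
    # ast.walk is breadth-first, so sort to get source order.
    return sorted(records, key=lambda r: r["start_line"])


functions = extract_functions(SOURCE)
```

Like the parse-only approach described in this README, the sketch reads the syntax tree and never imports or runs the code under inspection.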

BSG (Batho Structured Graph) Compression

Transforms full code graphs into compact representations:

# Generate full bsg with signatures
batho bsg --root . --mode full

# Generate hierarchical directory view
batho bsg --root . --mode hierarchical

# Generate compressed bsg for LLM injection
batho bsg --root . --mode compressed --budget 12000

| Mode | Best for | Output File |
| --- | --- | --- |
| Full | Developer reference with signatures + line numbers | bsg_full.json |
| Hierarchical | Directory-tree overviews | bsg_hierarchical.json |
| Compressed | LLM prompt injection (4K–40K tokens) | bsg_compressed.json |
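
One way to think about a token budget is as a packing problem. The sketch below is an illustration under stated assumptions, not Batho's compression algorithm: `estimate_tokens` uses a rough four-characters-per-token heuristic, and the priority scores and `pack_to_budget` helper are hypothetical.

```python
def estimate_tokens(text):
    """Crude stand-in for a tokenizer: roughly four characters per token."""
    return max(1, len(text) // 4)


def pack_to_budget(entries, budget):
    """Keep the highest-priority entries whose combined estimate fits the budget."""
    kept, used = [], 0
    for _priority, line in sorted(entries, key=lambda e: -e[0]):
        cost = estimate_tokens(line)
        if used + cost > budget:
            continue  # does not fit; a smaller, lower-priority entry still might
        kept.append(line)
        used += cost
    return kept, used


# (priority, rendered signature) pairs; higher priority is packed first.
entries = [
    (3, "def login(user: str, password: str) -> bool"),
    (2, "class Session: close(), refresh()"),
    (1, "def _debug_dump(state: dict, verbose: bool = False) -> None"),
]
kept, used = pack_to_budget(entries, budget=20)
```

With a budget of 20 estimated tokens, the two higher-priority signatures fit and the debug helper is dropped.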

Batho Time Machine

batho index --root . --snapshot                    # Create snapshot
batho snapshots --root .                           # List all snapshots
batho diff-snapshots --root . SNAP_A SNAP_B        # Compare versions

Snapshots are versioned by UUID and timestamp. Diffs report entity and relationship changes, and staleness scoring supports automated re-indexing.
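
An entity diff between two snapshots can be sketched as a set comparison keyed by entity id. The function below is a simplified illustration, not the output format of `batho diff-snapshots`. The entity fields mirror the graph.json example in this README, and `diff_entities` is a hypothetical helper.

```python
def diff_entities(old, new):
    """Compare two entity lists keyed by id; report added, removed and changed ids."""
    old_by_id = {e["id"]: e for e in old}
    new_by_id = {e["id"]: e for e in new}
    added = sorted(new_by_id.keys() - old_by_id.keys())
    removed = sorted(old_by_id.keys() - new_by_id.keys())
    changed = sorted(
        i for i in old_by_id.keys() & new_by_id.keys()
        if old_by_id[i] != new_by_id[i]
    )
    return {"added": added, "removed": removed, "changed": changed}


snapshot_a = [
    {"id": "e1", "name": "login", "file": "auth.py", "start_line": 10, "end_line": 25},
    {"id": "e2", "name": "logout", "file": "auth.py", "start_line": 27, "end_line": 31},
]
snapshot_b = [
    {"id": "e1", "name": "login", "file": "auth.py", "start_line": 10, "end_line": 30},
    {"id": "e3", "name": "refresh", "file": "auth.py", "start_line": 32, "end_line": 40},
]
delta = diff_entities(snapshot_a, snapshot_b)
```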

Incremental Patching with Tracking

# Auto-detect and patch changes
batho patch --root . --scan

# List all patch operations
batho patches --root . --format timeline

# Show detailed patch info
batho patch-info --root . --patch-id ID

# Apply patch from diff file
batho apply-patch --root . --base-snapshot ID --diff-file changes.diff

# Cherry-pick patch to different snapshot
batho cherry-pick --root . --patch-id ID --target-snapshot ID

Smart Indexing

  • mtime + SHA-256 cache — unchanged files are skipped instantly
  • Parallel extraction — auto-scaled threads (CPU × 2, capped at 32)
  • Binary detection — magic bytes + entropy analysis
  • Ignore support: .gitignore + .bathoignore via pathspec
  • Per-file isolation — one bad file never aborts the scan
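
The mtime + SHA-256 idea can be sketched as a two-step check: compare the cheap modification time first, and hash the content only when the time differs. This is an illustrative sketch, not Batho's cache implementation. The cache layout and the `needs_reindex` helper are assumptions.

```python
import hashlib
import os
import tempfile


def sha256_of(path):
    """Hash a file in chunks so large files are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_reindex(path, cache):
    """Return True when the file content changed since the cached record."""
    mtime = os.stat(path).st_mtime_ns
    record = cache.get(path)
    if record and record["mtime"] == mtime:
        return False  # fast path: same mtime, skip hashing
    digest = sha256_of(path)
    unchanged = bool(record) and record["sha256"] == digest
    cache[path] = {"mtime": mtime, "sha256": digest}
    return not unchanged


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mod.py")
    with open(path, "w") as f:
        f.write("x = 1\n")
    cache = {}
    first = needs_reindex(path, cache)    # no record yet
    second = needs_reindex(path, cache)   # same mtime: skipped without hashing
    os.utime(path, ns=(10 * 10**9, 10 * 10**9))  # "touch": new mtime, same bytes
    touched = needs_reindex(path, cache)
    with open(path, "w") as f:
        f.write("x = 2\n")
    os.utime(path, ns=(20 * 10**9, 20 * 10**9))  # force a distinct mtime
    edited = needs_reindex(path, cache)
```

The hash acts as a tiebreaker: a file that was only touched is still skipped, while a real edit is picked up.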

Stack Detection

Automatically identifies your tech stack from config files:

| Category | Frameworks / Tools |
| --- | --- |
| Python | FastAPI, Django, Flask |
| Node.js | React, Vue, Express, NestJS |
| Java | Spring, Maven, Gradle |
| .NET | ASP.NET, Entity Framework |
| Go | Gin, Echo |
| Ruby | Rails, Sinatra |
| Rust | Cargo |
| Mobile | Android, iOS |
| Data/ML | PyTorch, TensorFlow, Pandas |

Supported Languages

| Category | Languages |
| --- | --- |
| Web / Backend | Python, TypeScript, JavaScript, Go, Java, Ruby, PHP, C#, Scala, Kotlin |
| Systems | Rust, C, C++, Zig, Objective-C |
| Mobile | Swift, Kotlin (Android), Objective-C (iOS) |
| Functional | Haskell, Erlang, OCaml, Elixir, Julia, Agda |
| Scripting | Bash, Perl, Lua, R |
| Other | Dart, Verilog, Hack |
| Markup / Config | JSON, YAML, TOML, HTML, CSS/SCSS/SASS/LESS, Markdown, HCL/Terraform |

Parser availability depends on installed tree_sitter_language_pack grammars. Missing grammars are skipped gracefully.


Installation

pip install batho          # pip
uv pip install batho       # uv
pip install -e .           # development (editable)

Developer Setup (uv)

Use this section when you want to contribute to Batho locally, run tests, and verify the CLI from source.

1. Clone the repository

git clone https://github.com/sageoz/batho.git
cd batho

2. Install project dependencies for development and testing

uv sync --all-groups --all-extras

This creates and syncs the project environment with runtime, test, and dev dependencies.

3. Run tests

# Full suite
uv run pytest

# Optional: focused checks while iterating
uv run pytest tests/core/test_config.py -q
uv run pytest tests/utils/test_logging.py -q

4. Run the CLI directly from local source

This path is best during development because it always uses your current working tree.

uv run python -m batho_cli --help
uv run python -m batho_cli index --root .

5. Reinstall the global batho command from your local source

Use this when you want the plain batho command to reflect your latest local code.

uv tool install --reinstall .
hash -r
batho index --root .

6. Quick troubleshooting

If behavior differs between local and global runs, compare both paths:

uv run python -m batho_cli index --root .
batho index --root .

If they differ, reinstall the tool again:

uv tool install --reinstall .
hash -r

CLI Reference

# Show all commands
batho --help

# Show command-specific help
batho <command> --help

Command Matrix

| Command | Purpose |
| --- | --- |
| index | Build/update graph + BSG artifacts for a repo |
| stats | Show current index metadata and health summary |
| snapshots | List stored snapshots |
| diff-snapshots | Diff two snapshots |
| patch | Apply incremental updates from scan/diff/files |
| patches | List patch operations |
| patch-info | Show patch operation details |
| patch-chain | Show chain of patches for a snapshot |
| apply-patch | Apply patch by diff file or patch id |
| cherry-pick | Apply a patch to another snapshot |
| sync | Sync pending artifacts to configured cloud endpoint |
| hooks | Git client-side hook management (install/remove/run) |
| invalidate | Clear index file cache |
| cache | AST cache management (stats, invalidate, clear) |
| storage | Persistent artifact registry tools (backfill, verify, cleanup, stats, rebuild-indexes, compact) |
| query | Query persisted entity/relationship indexes |
| bsg | Render BSG outputs (compressed, full, hierarchical) |

Indexing & Snapshots

# Full index
batho index --root /path/to/repo --verbose

# Force full rebuild (disable incremental path)
batho index --root /path/to/repo --full

# Force cache reset before indexing (clears file cache + AST cache)
batho index --root /path/to/repo --force

# Deterministic fresh parse run (bypass AST cache for this invocation)
batho index --root /path/to/repo --force --no-ast-cache --verbose

# Index and create snapshot
batho index --root /path/to/repo --snapshot --snapshot-label "release-candidate"

# Snapshot inspection
batho snapshots --root /path/to/repo
batho diff-snapshots --root /path/to/repo --snapshot-a SNAP_A --snapshot-b SNAP_B

Patch Lifecycle

# Auto-detect file changes and patch
batho patch --root /path/to/repo --scan

# Patch from unified diff
batho patch --root /path/to/repo --diff /path/to/changes.diff

# Patch specific files
batho patch --root /path/to/repo src/a.py src/b.py

# Patch history and details
batho patches --root /path/to/repo --format timeline
batho patch-info --root /path/to/repo --patch-id PATCH_ID --format summary
batho patch-chain --root /path/to/repo --snapshot-id SNAP_ID --full

# Advanced patch operations
batho apply-patch --root /path/to/repo --base-snapshot SNAP_ID --diff-file /path/to/changes.diff
batho cherry-pick --root /path/to/repo --patch-id PATCH_ID --target-snapshot SNAP_ID

BSG Rendering & Querying

# Render BSG formats
batho bsg --root /path/to/repo --mode compressed --budget 12000
batho bsg --root /path/to/repo --mode full
batho bsg --root /path/to/repo --mode hierarchical

# Query persisted graph indexes
batho query --root /path/to/repo --entity-type function --limit 50
batho query --root /path/to/repo --file-path src/api.py
batho query --root /path/to/repo --relationship-type calls --rebuild-index

Cache & Storage Operations

# Index cache cleanup
batho invalidate --root /path/to/repo

# AST cache management
batho cache stats
batho cache invalidate "**/*.py"
batho cache clear

# Persistent storage management
batho storage backfill --root /path/to/repo
batho storage verify --root /path/to/repo --repair
batho storage cleanup --root /path/to/repo          # dry-run
batho storage cleanup --root /path/to/repo --apply  # execute cleanup
batho storage stats --root /path/to/repo
batho storage rebuild-indexes --root /path/to/repo
batho storage compact --root /path/to/repo            # dry-run
batho storage compact --root /path/to/repo --apply    # execute compaction

Cloud Sync Operations

# Preview pending artifacts (no upload)
batho sync --root /path/to/repo --dry-run

# Sync pending artifacts to cloud endpoint
export BATHO_CLOUD_SYNC_ENABLED=true
export BATHO_CLOUD_ENDPOINT="https://sync.batho.dev/v1"
export BATHO_CLOUD_API_KEY="batho_live_xxxxx"
batho sync --root /path/to/repo

# Retry only failed artifact uploads
batho sync --root /path/to/repo --retry-failed

# Show local sync status summary
batho sync --root /path/to/repo --status

Bridge (Artifact Registry REST + MCP)

Expose .ctn/ artifacts via HTTP and MCP for dashboard/IDE integrations.

# Start REST API server (default http://127.0.0.1:8766)
batho bridge serve --root /path/to/repo
batho bridge serve --root /path/to/repo --host 0.0.0.0 --port 8766

# Start MCP server (stdio for IDE integration)
batho bridge mcp --root /path/to/repo --transport stdio

# Start MCP server (SSE for remote clients)
batho bridge mcp --root /path/to/repo --transport sse --port 8767

# Check registry status
batho bridge status --root /path/to/repo

# Verify all artifacts are loadable
batho bridge verify --root /path/to/repo

REST endpoints (mounted under /api/v1/bridge/):

  • GET /indexes — List all indexes
  • GET /indexes/{index_id} — Get specific index metadata
  • GET /artifacts?type={artifact_type}&limit={n} — List artifact records
  • GET /artifacts/{artifact_type}?index_id={id} — Load artifact JSON content
  • GET /artifacts/{artifact_type}/content?path={logical_path} — Load by logical path
  • GET /stats — Registry statistics
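
A client can build these URLs with the standard library alone. The sketch below assumes the default address shown above (http://127.0.0.1:8766) and the /api/v1/bridge/ mount point. The `bridge_url` helper and the artifact type name `graph` are hypothetical and only illustrate the URL shapes.

```python
from urllib.parse import quote, urlencode

# Default bridge address plus the mount point listed above.
BASE = "http://127.0.0.1:8766/api/v1/bridge"


def bridge_url(endpoint, /, **params):
    """Build a bridge URL, percent-encoding path segments and query values."""
    segments = [quote(part, safe="") for part in endpoint.strip("/").split("/")]
    url = BASE + "/" + "/".join(segments)
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{url}?{query}" if query else url


list_url = bridge_url("artifacts", type="graph", limit=10)
content_url = bridge_url("artifacts/graph/content", path="src/auth.py")
stats_url = bridge_url("stats")
```

With the bridge running, any of these can be fetched with `urllib.request.urlopen(url)` and decoded with `json.load`.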

MCP tools: bridge_list_indexes, bridge_get_index, bridge_list_artifacts, bridge_get_artifact, bridge_get_artifact_by_path, bridge_search_artifacts, bridge_get_stats.

Git Hooks Management

YAML-driven Git client-side hook automation with enterprise reliability.

# List configured hooks and templates
batho hooks list --root /path/to/repo

# Check installation status
batho hooks status --hook pre-commit

# Install all enabled hooks (auto-bootstraps .batho/hooks.yaml if missing)
batho hooks install --all

# Install specific hook with force (overwrites unmanaged)
batho hooks install --hook pre-commit --force

# Remove managed hooks
batho hooks remove --all

# Run hook manually (supports custom hooks for CI/CD)
batho hooks run --hook enterprise-nightly --verbose

Configuration in .batho/hooks.yaml:

version: hooks.v1
defaults:
  shell: /bin/sh
  timeout: 60
hooks:
  pre-commit:
    enabled: true
    stages:
      - run: ruff check .
      - run: pytest --co -q
  pre-push:
    enabled: true
    stages:
      - run: pytest -x --tb=short

Enable in batho.yaml:

hooks:
  enabled: true
  include: true
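
To show how a hook definition like the one above might be executed, here is a rough sketch of a stage runner. It is not Batho's hook engine: running stages in order, stopping at the first failure, and applying `defaults.timeout` per stage are all assumptions, and `run_stages` is a hypothetical helper. It expects a POSIX shell at /bin/sh.

```python
import subprocess


def run_stages(hook, defaults):
    """Run each stage in order; stop at the first non-zero exit code."""
    if not hook.get("enabled", False):
        return []
    results = []
    for stage in hook.get("stages", []):
        proc = subprocess.run(
            stage["run"],
            shell=True,
            executable=defaults.get("shell"),
            timeout=defaults.get("timeout"),  # raises TimeoutExpired when exceeded
            capture_output=True,
            text=True,
        )
        results.append((stage["run"], proc.returncode))
        if proc.returncode != 0:
            break  # assumed fail-fast behaviour
    return results


defaults = {"shell": "/bin/sh", "timeout": 60}
hook = {
    "enabled": True,
    "stages": [{"run": "true"}, {"run": "exit 3"}, {"run": "echo never-reached"}],
}
results = run_stages(hook, defaults)
```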

Index Flags

| Flag | Default | Description |
| --- | --- | --- |
| --max-workers | 0 (auto) | Worker threads; 0 uses CPU × 2, capped at 32 |
| --max-file-size-kb | 500 | Skip files larger than this |
| --extensions | all supported | Restrict indexing to selected extensions |
| --full | off | Disable incremental reuse and force full rebuild |
| --force | off | Clear index file cache and AST cache before indexing |
| --no-ast-cache | off | Bypass AST cache for the current indexing run |
| --base-snapshot | auto | Prefer this snapshot for incremental indexing |
| --output-json | none | Optional override path for graph JSON output |
| --metrics-output | from config | Write metrics JSON to explicit path |
| --verbose | off | Print progress to stdout |
| --snapshot | off | Create snapshot after indexing |
| --snapshot-label | none | Attach label to generated snapshot |
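
The auto setting for --max-workers follows a simple rule (CPU × 2, capped at 32). A minimal sketch of that rule is below. `resolve_workers` is a hypothetical helper, and passing an explicit value through unchanged is an assumption about how non-zero values are treated.

```python
import os


def resolve_workers(max_workers=0, cpu_count=None):
    """Return the worker count: explicit value, or CPU x 2 capped at 32 when 0."""
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    if max_workers > 0:
        return max_workers  # assumed: explicit values are used as given
    return min(cpus * 2, 32)


auto_small = resolve_workers(0, cpu_count=4)
auto_large = resolve_workers(0, cpu_count=64)
explicit = resolve_workers(6, cpu_count=4)
```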

Global Logging Flags

| Flag | Default | Description |
| --- | --- | --- |
| --log-level | from config | Override logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL) |
| -q, --quiet | from config | Suppress non-error CLI output and log events below ERROR |
| --log-json | off | Force JSON log output (useful in CI) |
| --log-file | from config | Write logs to the specified file path |

BSG Options

| Flag | Default | Description |
| --- | --- | --- |
| --mode | compressed | Rendering mode: compressed, full, hierarchical |
| --budget | 12000 | Token budget for compressed mode |

Patch Options

| Flag | Default | Description |
| --- | --- | --- |
| --scan | off | Auto-scan for changes |
| --dry-run | off | Preview changes without applying |
| --base-snapshot | auto | Use specific snapshot as base |
| --force-index-patch | off | Force traditional index-based patching |
| --diff | none | Apply patch from unified diff |
| files... | none | Patch explicit changed files |

Output

.ctn/
├── index.json                   # Index metadata + staleness + persistence model
├── artifact_registry.db         # SQLite artifact registry (durable outputs)
├── file_cache.json              # Index file cache
├── file_hashes.json             # Content-hash tracker for incremental scans
├── metrics.json                 # Optional metrics output
├── interception_stats.json      # Rule interception matrix
├── evolution_ledger.json        # Failure synthesis ledger
├── snapshots/                   # Time Machine snapshots
│   └── batho_<project>_<sha>_<ts>.json
├── patches/                     # Patch operation history
│   ├── index.json
│   └── patch_<operation_id>.json
└── <index_id>/
    ├── graph.json               # Entities + relationships
    ├── bsg.json                 # Structured symbol graph
    ├── bsg_compressed.json      # LLM-ready compressed output
    ├── bsg_full.json            # Full textual BSG output
    ├── bsg_hierarchical.json    # Hierarchical textual BSG output
    └── context/
        ├── overview.md
        └── files.md

Default AST cache database location: .ctn/local/cache/ast_cache.db (configured by bsg.cache.path).

graph.json example
{
  "schema_version": "graph.v1",
  "entities": [
    {"id": "e1", "name": "login", "type": "function", "file": "auth.py", "start_line": 10, "end_line": 25}
  ],
  "relationships": [
    {"source_id": "e1", "target_id": "e2", "type": "IMPORTS"}
  ]
}

bsg.json example
{
  "schema_version": "bsg.v1",
  "nodes": [
    {
      "id": "e1",
      "type": "FUNCTION",
      "name": "login",
      "file": "src/auth.py",
      "start_line": 10,
      "end_line": 25
    }
  ],
  "edges": [],
  "indexes": {
    "nodes_by_file": {
      "src/auth.py": ["e1"]
    }
  }
}
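
The `indexes.nodes_by_file` section in the example is a lookup table derived from `nodes`. The sketch below rebuilds it from a document shaped like the example. It only assumes the fields shown there, and `build_nodes_by_file` is a hypothetical helper, not part of Batho's API.

```python
import json
from collections import defaultdict

BSG_JSON = """
{
  "schema_version": "bsg.v1",
  "nodes": [
    {"id": "e1", "type": "FUNCTION", "name": "login",
     "file": "src/auth.py", "start_line": 10, "end_line": 25}
  ],
  "edges": []
}
"""


def build_nodes_by_file(nodes):
    """Group node ids by the file they were extracted from."""
    index = defaultdict(list)
    for node in nodes:
        index[node["file"]].append(node["id"])
    return dict(index)


bsg = json.loads(BSG_JSON)
nodes_by_file = build_nodes_by_file(bsg["nodes"])
```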

Configuration

Batho works out of the box with zero config. For production use, configure with the unified root config file ./batho.yaml (or start from batho.yaml.example) plus optional environment overrides.

Configuration precedence (lowest to highest; each later source overrides the earlier ones):

  1. Built-in defaults
  2. ./batho.yaml
  3. Environment variables (override file values)
  4. CLI flags (override for a specific run)
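
Layered precedence of this kind is usually implemented as a sequence of merges, with each later layer overriding the earlier ones. The sketch below illustrates the idea and is not Batho's config loader. `deep_merge` and `resolve_config` are hypothetical helpers, and the sample values are made up.

```python
def deep_merge(base, override):
    """Return a copy of base with override applied; nested dicts merge key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_config(defaults, file_values, env_values, cli_values):
    """Apply layers from lowest to highest precedence."""
    config = {}
    for layer in (defaults, file_values, env_values, cli_values):
        config = deep_merge(config, layer)
    return config


config = resolve_config(
    defaults={"indexer": {"max_file_size_kb": 500, "max_workers": 0},
              "logging": {"level": "INFO"}},
    file_values={"indexer": {"max_file_size_kb": 1000}},  # from ./batho.yaml
    env_values={"logging": {"level": "DEBUG"}},           # e.g. BATHO_LOG_LEVEL
    cli_values={"indexer": {"max_workers": 8}},           # e.g. --max-workers 8
)
```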

Output behavior:

  • User-facing command output is written to stdout.
  • Warnings/errors and operational logs are written to stderr.

Core Config Areas

| Area | Keys | What it controls |
| --- | --- | --- |
| logging | level, json_format, quiet, file, format | Process-wide logging and CLI verbosity behavior |
| paths | ctn_dir | Artifact output directory |
| indexer | max_file_size_kb, max_workers, max_indexed_files, ignore_*, metrics_output | Base indexing limits and outputs |
| rules | enabled, builtin_plugins, custom_rules_*, strict_validation | Rule plugins and metadata enrichment |
| bsg.parallel | enabled, max_workers, chunk_size | Parallel file extraction |
| bsg.ignore | enabled, file | .bathoignore integration |
| bsg.cache | enabled, path, max_size_mb, ttl_days | AST cache behavior |
| bsg.incremental | enabled, fallback_to_full, auto_detect_git | Incremental indexing strategy |
| bsg.symbol_resolution | enabled, fuzzy_matching, cache_symbols | Cross-file symbol resolution |
| bsg.serialization | method, compression, batch_size | BSG render strategy |
| bsg.parsing | error_recovery, partial_parsing, max_file_size_mb, skip_comments | Parser behavior |
| bsg.query | enabled, index_on_write, cache_enabled, cache_size, default_limit, query_timeout_ms | Persistent query indexes |
| bsg.storage | enabled, backend, registry_path, content_scope, cloud_sync_ready, mmap_enabled, retention.* | Durable artifact registry and retention |
| hooks | enabled, include | Git client-side hook automation pointer |

Environment Variables (Common)

| Variable | Default | Description |
| --- | --- | --- |
| BATHO_LOG_LEVEL | INFO | DEBUG, INFO, WARNING, ERROR |
| BATHO_LOG_JSON | null | Force JSON logs (true) or leave auto mode (unset) |
| BATHO_LOG_QUIET | false | Suppress non-error output globally |
| BATHO_LOG_FILE | unset | Optional log file path |
| BATHO_CTN_DIR | .ctn | Output directory |
| BATHO_MAX_FILE_SIZE_KB | 500 | Max file size to parse |
| BATHO_MAX_INDEXED_FILES | 200000 | Hard cap on indexed files |
| BATHO_INDEX_WORKERS | 0 | Worker threads (0 = auto) |
| BATHO_METRICS_OUTPUT | .ctn/metrics.json | Metrics output path |
| BATHO_RULES_ENABLED | config value | Enable BSG rule plugin stage |
| BATHO_RULES_CUSTOM_RULES_PATH | unset | YAML file containing custom BSG rules |
| BATHO_RULES_BUILTIN_PLUGINS | bsg_core | Comma-separated built-in plugin names |
| BATHO_RULES_DISABLED_RULES | unset | Comma-separated rule names to disable |
| BATHO_BSG_STORAGE_ENABLED | true | Enable durable artifact registry |
| BATHO_BSG_STORAGE_REGISTRY_PATH | .ctn/artifact_registry.db | Registry database path |
| BATHO_BSG_STORAGE_MMAP_ENABLED | false | Enable mmap reads for large persisted JSON |
| BATHO_BSG_QUERY_INDEX_ON_WRITE | true | Build query index at write time |
| BATHO_BSG_QUERY_CACHE_SIZE | 256 | Query service cache size |

For the complete env override set, see batho/config.py.

Config File

# ./batho.yaml
logging:
  level: DEBUG
  json_format: true
  quiet: false
  file: .ctn/batho.log
  format: "%(message)s"

indexer:
  max_file_size_kb: 1000
  max_workers: 16
  ignore_patterns:
    - "**/vendor/**"
    - "**/dist/**"

flags:
  strict: true
  fail_on_warning: true

rules:
  enabled: true
  builtin_plugins: [bsg_core]
  disabled_rules: []
  custom_rules_path: ./bsg-rules.yaml
  custom_rules_inline:
    - name: payment-cluster
      entity_types: ["function", "method"]
      name_patterns: ["*payment*", "*invoice*"]
      metadata:
        bsg.cluster_hint: billing

  # Validation controls
  strict_validation: false
  fail_on_rule_error: false

bsg:
  parallel:
    enabled: true
    max_workers: 16
    chunk_size: 50
  cache:
    enabled: true
    path: .ctn/local/cache/ast_cache.db
    max_size_mb: 1024
    ttl_days: 30
  query:
    enabled: true
    index_on_write: true
    cache_enabled: true
    cache_size: 256
    default_limit: 200
  storage:
    enabled: true
    backend: sqlite
    registry_path: .ctn/artifact_registry.db
    content_scope: durable
    cloud_sync_ready: true
    mmap_enabled: false
    retention:
      enabled: true
      snapshot_ttl_days: 90
      patch_ttl_days: 90
      metrics_ttl_days: 30
      context_ttl_days: 90

Scenario Playbooks

1) Local Dev (fast feedback)

indexer:
  max_workers: 0
  max_file_size_kb: 500
bsg:
  incremental:
    enabled: true
  cache:
    enabled: true

batho index --root .
batho patch --root . --scan
batho bsg --root . --mode compressed --budget 12000

2) Large Monorepo (throughput)

indexer:
  max_file_size_kb: 2000
bsg:
  parallel:
    enabled: true
    max_workers: 16
  ignore:
    enabled: true
    file: .bathoignore
  storage:
    mmap_enabled: true

batho index --root /repo --snapshot
batho storage stats --root /repo
batho query --root /repo --relationship-type calls --limit 200

3) CI/CD (deterministic + observable)

logging:
  level: INFO
  json_format: true
indexer:
  metrics_output: .ctn/metrics.json
bsg:
  storage:
    enabled: true

batho index --root . --log-json --snapshot
batho stats --root .
batho storage verify --root .

4) Persistent Storage Hygiene (cloud-sync-ready v1)

# register existing artifacts
batho storage backfill --root .

# verify and repair drift
batho storage verify --root . --repair

# inspect registry + graph cache health
batho storage stats --root .

# rebuild query indexes
batho storage rebuild-indexes --root .

# retention dry-run / apply
batho storage cleanup --root .
batho storage cleanup --root . --apply

# deduplicate registry (dry-run first)
batho storage compact --root .
batho storage compact --root . --apply

BSG Rule Plugins

Batho now applies BSG rules through internal plugin modules, not the root rules folder.

  • Built-in rules are loaded from packaged plugins (default: bsg_core).
  • Custom rules can be defined inline in batho.yaml via rules.custom_rules_inline.
  • Custom rules can also be loaded from rules.custom_rules_path YAML files.
  • Rule actions currently focus on deterministic metadata enrichment for graph entities (for example bsg.category, bsg.scope_tier, bsg.service_tag).

Custom rules YAML accepts either a top-level list or a rules: list.

rules:
  - name: mark-test-files
    file_patterns: ["tests/**", "**/*_test.py"]
    metadata:
      bsg.category: TEST

  - name: derive-service-tag
    file_patterns: ["services/*/**"]
    actions:
      derive_service_tag: true
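
To illustrate how a rule with `file_patterns` and `metadata` could enrich an entity, here is a minimal sketch. It is not the plugin implementation: `fnmatch` is a stand-in whose glob semantics may differ from the matcher Batho uses, the `derive_service_tag` action is not modeled, and `load_rules` and `apply_rules` are hypothetical helpers.

```python
from fnmatch import fnmatch


def load_rules(doc):
    """Accept either a top-level list or a mapping with a `rules:` list."""
    return doc if isinstance(doc, list) else doc.get("rules", [])


def apply_rules(entity, rules):
    """Return a copy of the entity with metadata from every matching rule."""
    metadata = dict(entity.get("metadata", {}))
    for rule in rules:
        patterns = rule.get("file_patterns", [])
        if any(fnmatch(entity["file"], pattern) for pattern in patterns):
            metadata.update(rule.get("metadata", {}))
    return {**entity, "metadata": metadata}


# Parsed form of a rules file like the YAML above (YAML parsing omitted).
doc = {
    "rules": [
        {"name": "mark-test-files",
         "file_patterns": ["tests/**", "**/*_test.py"],
         "metadata": {"bsg.category": "TEST"}},
    ]
}
rules = load_rules(doc)
test_entity = apply_rules({"name": "test_load", "file": "tests/core/test_config.py"}, rules)
app_entity = apply_rules({"name": "login", "file": "src/auth.py"}, rules)
```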

Using Batho with AI

Batho is built to power AI-assisted development. Here are common patterns:

Feed LLM Context

# Generate compressed bsg for LLM injection
batho bsg --root . --mode compressed --budget 12000
# → Output saved to .ctn/{index_id}/bsg_compressed.json
# → Load and inject into your LLM prompt as codebase context

Or programmatically:

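import json
from pathlib import Path

# Locate the compressed BSG generated by the CLI. The <index_id> directory
# name varies per index, so glob for it and take the most recently written file.
candidates = sorted(
    Path(".ctn").glob("*/bsg_compressed.json"),
    key=lambda p: p.stat().st_mtime,
)
if not candidates:
    raise FileNotFoundError("No bsg_compressed.json found; run `batho bsg --mode compressed` first")

with candidates[-1].open("r", encoding="utf-8") as f:
    data = json.load(f)

compressed_text = data["compressed_text"]
stats = data["stats"]
# → Inject 'compressed_text' into your LLM prompt as codebase context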

Codebase Q&A

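# Find all functions that call 'authenticate'.
# `graph` is the object returned by CodeGraphIndexer.build_graph().
for rel in graph.relationships:
    if rel.type.name != "CALLS":
        continue  # skip IMPORTS, USES, and DEFINES edges
    target = graph.get_entity(rel.target_id)
    if target and target.name == "authenticate":
        source = graph.get_entity(rel.source_id)
        if source:
            print(f"{source.name} → authenticate  ({source.file})")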

Use Batho as a Python Library (Custom Scripts)

Batho is not only a CLI. You can import it as a Python library to build custom automation scripts, CI workflows, and internal developer tools.

Public Python API

The batho package exports core APIs directly:

  • Indexing and graph: CodeGraphIndexer, InMemoryGraph, BSGMap
  • Time Machine: create_snapshot, list_snapshots, load_snapshot, diff_snapshots
  • Incremental patching: FileChange, FileChangeType, FileChangeTracker, incremental_patch
  • Git-aware change discovery: get_changed_file_status_since
  • Query layer: QueryService

Example: Index + Snapshot from a Script

from pathlib import Path

from batho import BSGMap, CodeGraphIndexer, create_snapshot

root = Path(".").resolve()
ctn_dir = root / ".ctn"
ctn_dir.mkdir(parents=True, exist_ok=True)

indexer = CodeGraphIndexer(cache_path=str(ctn_dir / "file_cache.json"), root=str(root))
graph = indexer.build_graph(root=str(root), snapshot_id="script-run")

bsg = BSGMap.build(graph, root=str(root))
snapshot_id = create_snapshot(ctn_dir, root, graph, bsg, label="nightly-script")

print({"entities": len(graph.entities), "relationships": len(graph.relationships), "snapshot": snapshot_id})

Example: Incremental Patch in Automation

from pathlib import Path

from batho import FileChangeTracker, incremental_patch

root = Path(".").resolve()
ctn_dir = root / ".ctn"
base_snapshot_id = "<existing_snapshot_id>"

tracker = FileChangeTracker(root)
hash_cache_path = ctn_dir / "file_hashes.json"
tracker.load(hash_cache_path)
changes = tracker.scan_for_changes(max_file_size_kb=500)
tracker.save(hash_cache_path)

if changes:
    result = incremental_patch(ctn_dir, base_snapshot_id, changes)
    print(result)
else:
    print("No changes detected")

Example: Query Indexed Data Programmatically

from pathlib import Path

from batho import QueryService

ctn_dir = Path(".ctn")
query = QueryService(ctn_dir)

functions = query.entities_by_type("function", limit=20)
for row in functions:
    print(f"{row['name']} -> {row['file']}")

Impact Analysis (Pre-Refactoring)

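# Find every caller of a function before changing it.
# `graph` comes from CodeGraphIndexer.build_graph(); `target_id` is the id
# of the entity you plan to change.
for rel in graph.relationships:
    if rel.target_id == target_id and rel.type.name == "CALLS":
        caller = graph.get_entity(rel.source_id)
        if caller:
            print(f"  Will be affected: {caller.name} in {caller.file}")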
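
The loop above finds direct callers only. For a wider blast radius, the same idea extends to transitive callers with a breadth-first walk. The sketch below works on plain dicts shaped like the `relationships` entries in graph.json rather than on Batho's graph objects. `transitive_callers` is a hypothetical helper.

```python
from collections import deque


def transitive_callers(relationships, target_id):
    """Return ids of every entity that reaches target_id through CALLS edges."""
    callers_of = {}
    for rel in relationships:
        if rel["type"] == "CALLS":
            callers_of.setdefault(rel["target_id"], []).append(rel["source_id"])

    seen, queue = set(), deque([target_id])
    while queue:
        current = queue.popleft()
        for caller in callers_of.get(current, []):
            # The seen set also guards against cycles in the call graph.
            if caller not in seen and caller != target_id:
                seen.add(caller)
                queue.append(caller)
    return seen


relationships = [
    {"source_id": "e1", "target_id": "e2", "type": "CALLS"},
    {"source_id": "e3", "target_id": "e1", "type": "CALLS"},
    {"source_id": "e4", "target_id": "e2", "type": "IMPORTS"},
    {"source_id": "e2", "target_id": "e3", "type": "CALLS"},  # creates a cycle
]
affected = transitive_callers(relationships, "e2")
```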

RAG / Vector Embedding

batho index --root /path/to/repo
batho bsg --root /path/to/repo --mode compressed
# → Embed .ctn/*/bsg_compressed.json chunks into your vector DB

Agentic AI

Autonomous agents can use Batho's structured graph to navigate codebases, resolve imports, and understand call chains — without reading every file.


Integrations

CI/CD (GitHub Actions)

name: Code Index
on: [push, pull_request]
jobs:
  index:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install batho
      - run: batho index --root . --verbose --log-json --snapshot
      - run: batho stats --root .
      - uses: actions/upload-artifact@v4
        with:
          name: batho-output
          path: .ctn/

Pre-commit Hook

# .pre-commit-config.yaml
- repo: local
  hooks:
    - id: batho-index
      name: Batho Code Index
      entry: batho index --root .
      language: system
      pass_filenames: false
      always_run: true
    - id: batho-patch
      name: Batho Incremental Patch
      entry: batho patch --root . --scan
      language: system
      pass_filenames: false
      always_run: true

VS Code Task

{
  "version": "2.0.0",
  "tasks": [{
    "label": "Batho Index",
    "type": "shell",
    "command": "batho index --root ${workspaceFolder} --verbose --snapshot"
  },
  {
    "label": "Batho Patch",
    "type": "shell",
    "command": "batho patch --root ${workspaceFolder} --scan"
  }]
}

Security & Compliance

| Guarantee | Details |
| --- | --- |
| Parse-only | Batho never executes your code; safe on untrusted repos |
| Binary detection | Magic bytes + Shannon entropy analysis |
| Ignore rules | Respects .gitignore and .bathoignore |
| Atomic writes | Temp file + rename; no partial outputs on crash |
| Fully offline | Zero network calls; runs air-gapped |
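
Binary detection by magic bytes plus Shannon entropy can be sketched in a few lines. This illustrates the general technique, not Batho's detector: the signature list is a small sample, and the 7.0 bits-per-byte threshold and the `looks_binary` helper are assumptions.

```python
import math
from collections import Counter

# A few well-known file signatures (PNG, PDF, ZIP, ELF); not an exhaustive list.
MAGIC_PREFIXES = (b"\x89PNG\r\n\x1a\n", b"%PDF-", b"PK\x03\x04", b"\x7fELF")


def shannon_entropy(data):
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def looks_binary(data, threshold=7.0):
    """Flag data with a known signature or entropy above the assumed threshold."""
    if data.startswith(MAGIC_PREFIXES):
        return True
    return shannon_entropy(data) > threshold


source_sample = b"def login(user, password):\n    return bool(user and password)\n"
png_sample = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
random_like = bytes(range(256)) * 4
```

Source text typically sits well below the threshold, while compressed or encrypted payloads approach 8 bits per byte.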

For regulated environments, add SBOM and license checks in CI:

pip install cyclonedx-bom && cyclonedx-py -o sbom.xml
pip install pip-licenses && pip-licenses --allow-only "Apache Software License"

Performance

| Repo Size | Workers (auto) | Typical Time |
| --- | --- | --- |
| < 50 files | 4 | < 2s |
| 50–200 files | 8 | 2–5s |
| 200–1K files | 16 | 5–15s |
| 1K+ files | 32 | varies |

Tips for large monorepos (2M+ LOC):

  • Run on fast local SSD
  • Use --log-json to reduce console overhead
  • Add build artifacts to .bathoignore:
    node_modules/
    vendor/
    dist/
    build/
    __pycache__/
    

Architecture

batho/
├── batho_cli.py                  # CLI command entrypoints
└── batho/
    ├── __init__.py               # Public Python API exports
    ├── config.py                 # Configuration and env overrides
    ├── time_machine.py           # Snapshots, diffs, incremental patching
    ├── context/
    │   ├── codegraph.py          # Graph indexing and extraction pipeline
    │   ├── pipeline.py           # Parallel worker orchestration
    │   ├── bsg_map.py            # Multi-format BSG renderer
    │   ├── query.py              # Query service over persisted artifacts
    │   └── languages/            # Per-language tree-sitter extractors
    └── utils/
        ├── logging.py            # Structured logging
        ├── hash.py               # SHA-256 helpers
        └── ignore.py             # .gitignore / .bathoignore handling

Contributing

Batho is open source and welcomes contributions. Whether it's a bug report, a new language extractor, or a docs improvement — we'd love your help.

  1. Fork the repo
  2. Create a feature branch
  3. Run the test suite: uv run pytest
  4. Submit a pull request

License

Apache 2.0 — see LICENSE


🎉 Thank You!

Ready to get started? Install Batho and index your first project in 30 seconds.


🚀 Batho v1.0.0 - Code Intelligence for the AI Era
PyPI · Issues · Discussions · Full Documentation

