Deterministic, token-efficient codebase packs and skeleton maps for AI review (Python)
anatomize
Generate deterministic, token-efficient maps and review bundles for Python repositories.
anatomize has two complementary workflows:
- Skeletons: structure-only “code maps” for navigation and architecture understanding.
- Packs: single-file bundles (repomix-style) for external review, with filtering and slicing.
If you want the full guide (modes, slicing, config, determinism guarantees), see docs/GUIDE.md.
Installation
pip install anatomize
Quick Start (CLI)
Generate skeletons
# Scaffold config for the common workflow “src detailed, tests minimal”
anatomize init --preset standard
# Generate all configured outputs from .anatomize.yaml (writes into .anatomy/*)
anatomize generate
# Or run ad-hoc generation for a specific source path
anatomize generate ./src
# Choose resolution level
anatomize generate ./src --level hierarchy --output .anatomy
anatomize generate ./src --level modules --output .anatomy
anatomize generate ./src --level signatures --output .anatomy
# Write multiple formats
anatomize generate ./src --format yaml --format json --format markdown --output .anatomy
Estimate tokens
anatomize estimate ./src --level modules
Validate (and fix) skeleton output
# Validate all configured outputs (from .anatomize.yaml)
anatomize validate
# Rewrite configured outputs to match regenerated content (strict, atomic-ish replacement)
anatomize validate --fix
# Or validate a specific directory against explicit sources
anatomize validate .anatomy/src --source ./src
Pack a repository into an AI-friendly bundle
# If --format is omitted, it is inferred from --output when the extension is known
anatomize pack . --output codebase.jsonl
anatomize pack . --output codebase.md
# Full bundle
anatomize pack . --format markdown --output codebase.md
# Minimal prefix (lower token overhead)
anatomize pack . --prefix minimal --output codebase.md
# Explain selection (why files were included/excluded)
# (writes `codebase.md.selection.json` by default)
anatomize pack . --explain-selection --output codebase.md
# Filter by globs
anatomize pack . --include "src/**" --ignore "**/__pycache__/**" --output src-only.md
# Forward dependency closure (entrypoint + everything it imports)
anatomize pack . --entry src/anatomize/cli.py --deps --output slice.md
# Reverse dependency closure (module + everything that imports it)
anatomize pack . --target src/anatomize/cli.py --reverse-deps --output importers.md
# Reverse + forward (importers plus what they import)
anatomize pack . --target src/anatomize/cli.py --reverse-deps --deps --output importers-and-deps.md
# Token-efficient Python compression (signatures/imports/constants)
anatomize pack . --compress --output compressed.md
# Make markdown robust to embedded ``` fences (default)
anatomize pack . --content-encoding fence-safe --output safe.md
# Maximum robustness (content is base64-encoded UTF-8)
anatomize pack . --content-encoding base64 --output safe.base64.md
# Split output into multiple files (markdown/plain only)
anatomize pack . --split-output 500kb --output codebase.md
# Hard cap output (bytes or tokens)
anatomize pack . --max-output 20_000t --output codebase.md
# Print a per-file content token tree to stdout
anatomize pack . --token-count-tree --output codebase.md
# JSONL (stream-friendly)
anatomize pack . --format jsonl --output codebase.jsonl
# Hybrid mode (summaries + selective fill; token-efficient)
# - defaults to markdown when --format and the output extension are not specified
# - Python files default to summary; non-Python defaults to metadata-only
anatomize pack . --mode hybrid --output hybrid.md
# Hybrid: include full content for a slice and fit within a hard token budget (JSONL only)
anatomize pack . --mode hybrid --format jsonl --max-output 50_000t --fit-to-max-output \
--content "src/pkg/**" --output hybrid.slice.jsonl
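The forward and reverse closures selected by --deps and --reverse-deps can be pictured as reachability over an import graph. The following is a minimal sketch under that assumption: the graph, the module names, and the closure and reverse helpers are hypothetical illustrations, not anatomize internals.

```python
from collections import deque


def closure(start, edges):
    """Return every node reachable from `start` (inclusive) by following `edges`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in sorted(edges.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen)


def reverse(edges):
    """Invert an import graph: module -> set of modules that import it."""
    inverted = {}
    for src, targets in edges.items():
        for dst in targets:
            inverted.setdefault(dst, set()).add(src)
    return inverted


# Hypothetical import graph: module -> modules it imports.
imports = {
    "cli": {"pack", "config"},
    "pack": {"config", "tokens"},
    "config": set(),
    "tokens": set(),
    "bench": {"pack"},
}

# Forward closure: the entry module plus everything it imports, transitively.
forward = closure("pack", imports)
# Reverse closure: the target module plus everything that imports it, transitively.
importers = closure("pack", reverse(imports))
# Reverse + forward: the importers plus everything those importers import.
both = sorted({m for imp in importers for m in closure(imp, imports)})

print(forward)
print(importers)
print(both)
```

Sorting the neighbours and the result keeps the selection order stable across runs, in the spirit of the determinism goals described for the tool.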
Reference-based usage slicing (requires Pyright language server):
anatomize pack . --target src/anatomize/cli.py --uses --slice-backend pyright --output uses.md
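To illustrate why a content encoding matters for markdown packs, here is a sketch of two generic techniques: choosing a fence longer than any backtick run in the content, and base64-encoding UTF-8 text. This shows the general idea only; whether anatomize's fence-safe and base64 encodings work exactly this way is an assumption, and fence_for, wrap, and to_base64 are hypothetical helper names.

```python
import base64
import re

TICKS = "`" * 3  # built programmatically so this example contains no literal fence


def fence_for(content, minimum=3):
    """Pick a backtick fence longer than any backtick run inside `content`."""
    longest = max((len(run) for run in re.findall(r"`+", content)), default=0)
    return "`" * max(minimum, longest + 1)


def wrap(content, lang=""):
    """Wrap `content` in a fence that the content itself cannot close early."""
    fence = fence_for(content)
    body = content if content.endswith("\n") else content + "\n"
    return f"{fence}{lang}\n{body}{fence}\n"


def to_base64(content):
    """Encode text as base64 over its UTF-8 bytes (opaque but fully robust)."""
    return base64.b64encode(content.encode("utf-8")).decode("ascii")


# A file whose content already contains a three-backtick fence.
sample = "Example:\n" + TICKS + "python\nprint('hi')\n" + TICKS + "\n"

wrapped = wrap(sample, "markdown")
encoded = to_base64(sample)
decoded = base64.b64decode(encoded).decode("utf-8")

print(wrapped)
print(decoded == sample)
```

The fence approach keeps content readable at the cost of a slightly longer delimiter; base64 trades readability and size for immunity to any embedded markup.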
Python API
Generate skeletons in code
from anatomize import SkeletonGenerator
from anatomize.formats import OutputFormat, write_skeleton
gen = SkeletonGenerator(sources=["./src"])
skeleton = gen.generate(level="modules")
print("Modules:", skeleton.metadata.total_modules)
print("Classes:", skeleton.metadata.total_classes)
print("Functions:", skeleton.metadata.total_functions)
print("Estimated tokens:", skeleton.metadata.token_estimate)
write_skeleton(skeleton, ".anatomy", formats=[OutputFormat.YAML, OutputFormat.JSON])
Key exported objects
- anatomize.SkeletonGenerator: orchestrates discovery + extraction.
- anatomize.formats.write_skeleton: writes YAML/JSON/Markdown plus schemas and manifest.json.
- anatomize.validation.validate_skeleton_dir: strict validator with optional fix.
Configuration (.anatomize.yaml)
The CLI can auto-discover .anatomize.yaml. Generation commands use config from the current working directory (or explicit --config). pack discovers config relative to the chosen ROOT when --config is not provided.
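The lookup rule above can be sketched as a small function. This is an illustration of the stated behaviour only: resolve_config is a hypothetical name, and the sketch checks a single directory without walking parent directories, which the real CLI may or may not do.

```python
import tempfile
from pathlib import Path

CONFIG_NAME = ".anatomize.yaml"


def resolve_config(command, cwd, root=None, explicit=None):
    """Return the config path a command would consult, or None if none is found."""
    if explicit is not None:
        # An explicit --config always wins.
        return Path(explicit)
    # pack looks relative to ROOT; generation commands look in the working directory.
    base = Path(root) if command == "pack" and root is not None else Path(cwd)
    candidate = base / CONFIG_NAME
    return candidate if candidate.is_file() else None


with tempfile.TemporaryDirectory() as tmp:
    cwd = Path(tmp) / "work"
    root = Path(tmp) / "repo"
    cwd.mkdir()
    root.mkdir()
    (root / CONFIG_NAME).write_text("output: .anatomy\n", encoding="utf-8")

    found_for_pack = resolve_config("pack", cwd, root=root)
    found_for_generate = resolve_config("generate", cwd, root=root)
    pack_parent = found_for_pack.parent.name if found_for_pack else None

print(pack_parent, found_for_generate)
```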
Minimal config:
output: .anatomy
sources:
  - path: src
    output: src
    level: modules
  - path: tests
    output: tests
    level: hierarchy
# Defaults applied to sources that omit fields
level: modules
formats: [yaml, json, markdown]
exclude:
  - __pycache__/
  - "*.pyc"
symlinks: forbid # forbid|files|dirs|all
workers: 0 # 0 = auto
pack:
  format: markdown # markdown|plain|json|xml|jsonl (hybrid supports markdown|plain|jsonl)
  mode: bundle # bundle|hybrid (hybrid is token-efficient summaries + selective fill)
  prefix: standard # standard|minimal
  output: anatomize-pack.md # if the extension is known, it must match `format`
  include: []
  ignore: []
  ignore_files: []
  respect_standard_ignores: true
  symlinks: forbid # forbid|files|dirs|all
  max_file_bytes: 1000000
  workers: 0 # 0 = auto
  token_encoding: cl100k_base
  compress: false
  content_encoding: fence-safe # verbatim|fence-safe|base64 (markdown disallows verbatim)
  line_numbers: false
  no_structure: false
  no_files: false
  max_output: null # e.g. "500kb" or "20_000t"
  split_output: null # e.g. "500kb" or "20_000t"
  fit_to_max_output: false
  # Hybrid representation rules (repeatable patterns). Precedence: meta < summary < content.
  meta: []
  summary: []
  content: []
  summary_config:
    max_depth: 3
    max_keys: 200
    max_items: 200
    max_headings: 200
  python_roots: [] # defaults to ["src"] if present, else ["."]
  slice_backend: imports # imports|pyright
  uses_include_private: false
  pyright_langserver_cmd: "pyright-langserver --stdio"
Exclude patterns use gitignore-like semantics and are applied relative to each configured root.
Tip: anatomize init --preset standard scaffolds .anatomize.yaml with the common pattern “src detailed, tests minimal”.
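The hybrid precedence rule from the config comment (meta < summary < content), combined with the documented defaults (Python files default to summary, non-Python files to metadata-only), can be sketched as follows. The representation helper is hypothetical, and fnmatchcase is only a rough stand-in for the tool's actual glob semantics, which may differ.

```python
from fnmatch import fnmatchcase

# Lowest to highest precedence, per the config comment: meta < summary < content.
PRECEDENCE = ("meta", "summary", "content")


def representation(path, rules):
    """Pick how `path` is represented; the highest-precedence matching rule wins."""
    chosen = None
    for kind in PRECEDENCE:
        if any(fnmatchcase(path, pattern) for pattern in rules.get(kind, ())):
            chosen = kind  # a later (higher-precedence) kind overrides an earlier one
    if chosen is not None:
        return chosen
    # Documented hybrid defaults: Python -> summary, everything else -> metadata only.
    return "summary" if path.endswith(".py") else "meta"


# Hypothetical rule set in the shape of the pack.meta / summary / content lists.
rules = {
    "meta": ["tests/**"],
    "summary": ["src/**"],
    "content": ["src/pkg/**"],
}

paths = ["src/pkg/core.py", "src/other.py", "tests/test_x.py", "README.md", "scripts/run.py"]
result = {path: representation(path, rules) for path in paths}

for path in paths:
    print(path, "->", result[path])
```

Here src/pkg/core.py matches both the summary and content patterns, and content wins because it sits highest in the precedence order.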
Output artifacts
Skeleton output directory
write_skeleton(...) and anatomize generate ... --output DIR create:
- hierarchy.yaml|json|md / modules.* / signatures.* depending on selected formats and level
- schemas/*.json embedded with the package
- manifest.json (SHA-256 per output file and metadata for validation)
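To show how a per-file SHA-256 manifest supports validation, here is a minimal sketch. The actual schema of manifest.json is not described here, so the flat path-to-digest mapping, the sample file contents, and the build_manifest and changed_files helpers are all assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def build_manifest(directory):
    """Map each file's POSIX-style relative path to its SHA-256 hex digest."""
    root = Path(directory)
    entries = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        entries[path.relative_to(root).as_posix()] = digest
    return entries


def changed_files(directory, manifest):
    """Return relative paths whose current digest differs from the recorded one."""
    current = build_manifest(directory)
    names = set(current) | set(manifest)
    return sorted(n for n in names if current.get(n) != manifest.get(n))


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    # Placeholder contents; real skeleton files look different.
    (out / "modules.json").write_text('{"modules": []}\n', encoding="utf-8")
    (out / "hierarchy.json").write_text('{"packages": []}\n', encoding="utf-8")

    manifest = build_manifest(out)
    clean = changed_files(out, manifest)

    # Simulate a hand edit that a validator should detect.
    (out / "modules.json").write_text('{"modules": ["edited"]}\n', encoding="utf-8")
    drifted = changed_files(out, manifest)

print(sorted(manifest))
print(clean, drifted)
```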
When anatomize generate runs from .anatomize.yaml, it writes one skeleton directory per configured source:
- .anatomy/src/...
- .anatomy/tests/...
Pack output file(s)
anatomize pack writes one or more files depending on splitting:
- anatomize-pack.md (or .txt|.json|.xml)
- if split: anatomize-pack.1.md, anatomize-pack.2.md, …
Each pack artifact starts with a lightweight, deterministic overview (and, if enabled, a structure tree) before file blocks/records.
Token reporting:
- Artifact tokens: exact tokens of the written output file(s) (returned by the Python API).
- Content tokens: tokens of file contents only (returned by the Python API; useful for budgeting).
Pack artifacts intentionally do not embed token counts (agents don’t need them; they waste tokens).
Determinism and strictness
- Deterministic ordering (paths and symbols sorted).
- No timestamps in outputs.
- Parse failures are hard failures (no partial output).
- Validation is strict; --fix replaces output with regenerated content.
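The first two guarantees (sorted ordering, no timestamps) are what make output byte-identical across runs. A minimal sketch of that property, assuming a simple path-to-symbols mapping rendered as JSON; the render function and the sample data are hypothetical and do not reflect anatomize's real output format.

```python
import hashlib
import json
import random


def render(modules):
    """Serialize {path: [symbols]} with sorted paths and symbols and no timestamps."""
    ordered = {path: sorted(modules[path]) for path in sorted(modules)}
    return json.dumps(ordered, indent=2, sort_keys=True) + "\n"


modules = {"src/b.py": ["zeta", "alpha"], "src/a.py": ["run", "main"]}

# Same data presented in a different discovery order, with symbols reversed.
items = list(modules.items())
random.Random(0).shuffle(items)
shuffled = {path: list(reversed(symbols)) for path, symbols in items}

digest_a = hashlib.sha256(render(modules).encode("utf-8")).hexdigest()
digest_b = hashlib.sha256(render(shuffled).encode("utf-8")).hexdigest()

# Identical digests mean the rendered bytes do not depend on input order.
print(digest_a == digest_b)
```

Because nothing in the rendered text depends on discovery order or wall-clock time, a digest comparison is enough to tell whether regenerated output matches what is on disk.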
Development
python -m venv .venv
. .venv/bin/activate
python -m pip install -U pip
python -m pip install -e ".[dev]"
python -m ruff check .
python -m mypy -p anatomize
python -m pytest
Optional local benchmark:
.venv/bin/python scripts/bench_pack.py . --compress --workers 0
Tests
Tests are indexed via pytest markers in pyproject.toml and documented in tests/README.md:
- unit: fast, isolated tests
- integration: filesystem-level tests
- e2e: CLI-level tests
License
MIT