A Rust-inspired, Typer-powered CLI for reliably managing project documentation.

croc

License: MIT · Python 3.13+ · uv · Ruff · ty

ids are owners, links are borrows, croc check is the borrow checker.

croc treats a markdown doc tree the way Rust treats memory: ids are owners, links are borrows, croc check is the borrow checker, and croc rename is an atomic refactor. Move a file and every reference keeps working. Introduce a dangling link and the commit is refused.


The problem

A thoughts/ tree grows branching paths, nested directories with self.md files, and files that reference other files by path. When a file moves, every referrer has to be updated. The usual options all fail:

  • By hand: grep for every reference and hope you didn't miss one.
  • With sed: risk rewriting substring matches inside prose or code blocks.
  • With Obsidian/Notion: silent link rot whenever someone edits outside the editor.

None of these prevent the broken intermediate state — the window where main has dangling refs and no one has noticed. croc check closes that window by refusing the commit; croc rename makes the refactor an atomic transaction.

The idea

Replace path-based references with stable ids. A reference like [[id:registry-pattern]] resolves through a derived index of every id in the tree. When a file moves, the id travels with it; every link still works. When an id changes, one command rewrites every referrer atomically.

Tree-as-memory

| Rust concept | croc concept |
|---|---|
| Ownership | Each .md has a unique id in frontmatter |
| Move semantics | mv relocates bytes; the id travels with the file |
| &T (borrow) | [[id:X]], a strong link |
| Weak<T> | [[see:X]], a soft citation that may dangle |
| Lifetimes | Strong links may not outlive their target |
| Newtype pattern | DocId and DocPath are distinct types |
| Borrow checker | croc check refuses trees with broken invariants |
| Validate-then-commit | Rewrites are simulated in memory before any disk write |

Quick start

Requires Python >=3.13 and uv.

# Install
uv sync

# Check the included example
uv run croc check examples/thoughts

# Print the derived id → path index
uv run croc index examples/thoughts

# Scaffold a doc tree from a source directory (plain markdown)
uv run croc crawl src/
# Same, but also adopt into croc shape — ready for `croc check`
uv run croc crawl src/ --adopt

# Adopt croc on a repo with plain markdown (preview first)
uv run croc init --adopt --dry-run path/to/docs/
uv run croc init --adopt            path/to/docs/

# Rename an id; every referrer updates atomically
uv run croc rename old-id new-id --root path/to/docs/

# Move a file; id-based links mean zero references need rewriting
uv run croc move path/to/docs/a.md path/to/docs/subdir/ --root path/to/docs/

Commands

croc check <root>

Runs the borrow checker. Exit codes:

  • 0 — tree is sound
  • 1 — tree has violations (printed to stderr)
  • 2 — tree cannot be loaded (malformed frontmatter, missing root)

croc index <root>

Prints the derived id → path map as JSON. The index is never stored — it's a regenerable view over the tree, so it cannot drift.

croc move <src> <dst> [--root R] [--dry-run]

Relocates a file. Because ids are stable, zero references are rewritten. A pre-check runs first, so a move never lands on top of an already-broken tree. Inside a git repo the move uses git mv; otherwise it falls back to shutil.move.
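A minimal sketch of that fallback, assuming the repo check is done with `git rev-parse --is-inside-work-tree`. `inside_git_repo` and `move_file` are hypothetical names, not croc's API, and falling back when git mv itself fails (for example on an untracked file) is an assumption of this sketch:

```python
import shutil
import subprocess
from pathlib import Path


def inside_git_repo(path: Path) -> bool:
    """True if `path` lies inside a git work tree; False if not, or if git is missing."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--is-inside-work-tree"],
            cwd=path,
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:  # git is not installed
        return False
    return result.returncode == 0 and result.stdout.strip() == "true"


def move_file(src: Path, dst: Path) -> Path:
    """Relocate `src` to `dst` (a file path or an existing directory).

    Hypothetical helper. No references are rewritten: links point at ids,
    not paths, so the id simply travels with the file.
    """
    src = src.resolve()
    target = (dst / src.name if dst.is_dir() else dst).resolve()
    if inside_git_repo(src.parent):
        # Prefer git mv so history follows the file; absolute paths avoid cwd surprises.
        proc = subprocess.run(
            ["git", "mv", str(src), str(target)], cwd=src.parent, capture_output=True
        )
        if proc.returncode == 0:
            return target
        # Assumption: if git mv refuses (e.g. untracked file), fall back below.
    shutil.move(src, target)
    return target
```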

croc rename <old-id> <new-id> [--root R] [--dry-run]

Rewrites every strong and weak reference in the tree, plus the owner's id field. Transactional:

  1. Pre-check the tree is sound.
  2. Plan the rewrite in memory.
  3. Simulate the plan: apply in memory, re-parse, re-check.
  4. Commit atomically per file (write a temp file, then os.replace); if the filesystem fails mid-commit, a snapshot-based rollback restores the files already written.

If any step fails, nothing is written.
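Step 4 is the only part that touches disk. A minimal sketch of it, assuming UTF-8 text files, a plan shaped as {path: new_text}, and that every target already exists (rename only rewrites existing files); `atomic_write` and `commit` are hypothetical names, not croc's internals:

```python
import os
import tempfile
from pathlib import Path


def atomic_write(path: Path, text: str) -> None:
    """Write to a temp file in the same directory, then os.replace it over `path`."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(text)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, path)  # atomic rename on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp):  # clean up the temp file if we never got to replace it
            os.unlink(tmp)
        raise


def commit(plan: dict[Path, str]) -> None:
    """Apply an already-validated plan, rolling back on filesystem failure."""
    # Snapshot first: if any target is unreadable, fail before writing anything.
    snapshot = {path: path.read_text(encoding="utf-8") for path in plan}
    written: list[Path] = []
    try:
        for path, text in plan.items():
            atomic_write(path, text)
            written.append(path)
    except OSError:
        # Restore only the files this commit already rewrote, newest first.
        for path in reversed(written):
            atomic_write(path, snapshot[path])
        raise
```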

croc init [path] [--adopt] [--dry-run]

Creates a .croc.toml marker at path. With --adopt, brings every .md into the managed schema in one of three ways:

  • SCAFFOLD — no frontmatter. Prepend a fresh block.
  • AUGMENT — has frontmatter but missing required fields. Fill in id/title/kind/links while preserving every existing key and its order (foreign fields like type, mirrors, created, ... survive untouched).
  • SKIP — has frontmatter we can't safely modify (unterminated, invalid YAML, malformed existing id). The author fixes by hand.

Proposed ids are hierarchical — slugified relative path, not just the filename — so code-adjacent trees with lots of repeated stems (__init__.md, per-customer folders, etc.) don't collide:

| Path | Proposed id |
|---|---|
| foo.md (root) | foo |
| sub/foo.md | sub-foo |
| pkg/utils/__init__.md | pkg-utils-init |
| regions/east/notes.md | regions-east-notes |
| alerts/self.md | alerts (directory-index convention) |
| self.md (root) | root |

Collisions (rare path-slug ambiguities, or foo.md at root competing with foo/self.md) are reported and the command refuses to write.
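A sketch of how those proposals and collision reports could be computed. `propose_id` and `find_collisions` are hypothetical helpers that reproduce the rules in the proposed-id table; croc's actual slugifier may treat uppercase letters or other punctuation differently:

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath


def propose_id(rel_path: str) -> str:
    """Slugify a root-relative path into a hierarchical id.

    Directory parts plus the file stem, joined with '-'. A self.md takes its
    directory's id, and the root self.md becomes 'root'.
    """
    p = PurePosixPath(rel_path)
    parts = list(p.parent.parts)  # PurePosixPath("foo.md").parent.parts == ()
    if p.stem != "self":
        parts.append(p.stem)
    if not parts:
        return "root"
    # Lowercase, collapse each run of non-[a-z0-9] characters into one '-', trim ends.
    return re.sub(r"[^a-z0-9]+", "-", "-".join(parts).lower()).strip("-")


def find_collisions(rel_paths: list[str]) -> dict[str, list[str]]:
    """Return {id: [paths]} for every proposed id claimed by more than one file."""
    claims: dict[str, list[str]] = defaultdict(list)
    for path in rel_paths:
        claims[propose_id(path)].append(path)
    return {doc_id: paths for doc_id, paths in claims.items() if len(paths) > 1}
```

For instance, sub/foo.md and a root-level sub-foo.md both slug to sub-foo, which is exactly the kind of ambiguity that makes the command refuse to write.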

croc crawl <src> [-o OUT] [--adopt] [--file-types EXT ...] [--force] [--dry-run]

Scaffold a plain-markdown doc tree from a source directory. One .md stub per file, one self.md per directory. Output carries only a mirrors: breadcrumb in frontmatter — no id / kind / links — so the shape is compatible with croc's post-molt state and the adopt/molt cycle round-trips cleanly.

# Plain scaffold — output lands at ./thoughts/<src-name>/ by default
croc crawl src/

# Preview the plan
croc crawl src/ --dry-run

# One-step: scaffold + run `init --adopt` on the result → croc-checkable tree
croc crawl src/ --adopt

# Narrow discovery by extension (default mirrors every file git tracks)
croc crawl src/ --file-types .py --file-types .ts

Default discovery mirrors every file git is actively tracking (git ls-files). Dot-prefixed directories (.git, .venv, ...) and __pycache__ are always pruned. .gitignore is honored automatically when run inside a git repo; for repos without .gitignore discipline, use --file-types to narrow. Pass the global --include-untracked flag to also mirror draft files you haven't git added yet.

Two-step vs one-step. The default two-step flow — croc crawl src/ then croc init --adopt thoughts/src/ — gives you a plain-markdown tree you can edit by hand before adopting. The one-step --adopt variant is for cases where you just want a croc-checkable tree immediately. Both are idempotent; re-running is a no-op unless --force is passed.

Why shape-compatibility with molt matters. crawl emits the same frontmatter shape a file ends up with after molt (nothing but non-croc fields like mirrors:). That means the lifecycle crawl → (adopt → edit → check → molt)* is symmetric around crawl's output: you can stay plain, adopt when you want the checker, molt back for sharing, re-adopt later — and crawl sits cleanly outside the cycle. Try the bundled example:

uv run croc check examples/thoughts-from-code/thoughts

croc molt <root> [--dry-run]

Reverse adoption. Rewrites every [[id:X]] / [[see:X]] body ref back into [text](path.md) plain markdown, strips croc-specific frontmatter fields (id, kind, links), and removes .croc.toml. The tree must pass croc check first.

Before → after:

  • [[id:foo|foo]] → [foo](foo.md)
  • [[id:target#section-x|Section X]] → [Section X](target.md#section-x)
  • [[id:data-glossary]] (bare) → [Data Glossary](data-glossary.md), using the target's title as link text
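A sketch of the body rewrite, assuming the id → path index and each doc's title are already loaded. `molt_body` and `REF` are hypothetical, and leaving unknown ids (such as a dangling weak ref) untouched is an assumption of this sketch; croc's own regex and edge-case handling may differ:

```python
import posixpath
import re

# [[id:X]] / [[see:X]], each with an optional #anchor and |display text.
REF = re.compile(r"\[\[(id|see):([A-Za-z0-9_.-]+)(?:#([^|\]]+))?(?:\|([^\]]+))?\]\]")


def molt_body(body: str, referrer: str, paths: dict[str, str], titles: dict[str, str]) -> str:
    """Rewrite croc refs in `body` back into plain markdown links.

    `paths` maps id -> root-relative path; `titles` maps id -> frontmatter title.
    Links are made relative to the referring file's directory.
    """
    start = posixpath.dirname(referrer) or "."

    def replace(m: re.Match[str]) -> str:
        _strength, doc_id, anchor, text = m.groups()
        if doc_id not in paths:
            return m.group(0)  # unknown target: leave the ref as written
        href = posixpath.relpath(paths[doc_id], start)
        if anchor:
            href += f"#{anchor}"
        # Bare refs fall back to the target's title as link text.
        return f"[{text or titles.get(doc_id, doc_id)}]({href})"

    return REF.sub(replace, body)
```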

Foreign frontmatter (title, type, mirrors, any custom keys) is preserved in original order. The molted tree renders correctly in GitHub, Obsidian, or any generic markdown tool. Re-adopt with croc init --adopt to come back under croc management; the round-trip is semantically equivalent.

croc refs <root> [--unresolved]

Walks the tree and reports every markdown-style path ref ([text](path.md)), showing whether each target resolves to a file under the root. Read-only; works on any markdown tree whether or not it's been adopted. Use as a health check before init --adopt --migrate-refs:

croc refs --unresolved path/to/docs/
# UNRESOLVED runbooks/onboarding.md: -> ghost.md
# 1 unresolved ref(s) across the tree

Exits 1 when any ref is unresolved. Great for CI on partially-migrated trees.
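The resolution rule can be sketched over an in-memory tree. This assumes refs resolve relative to the referring file's directory, that only lowercase .md targets count, and that URLs are ignored; `unresolved_refs` and `MD_REF` are hypothetical names:

```python
import posixpath
import re

# [text](target.md) or [text](target.md#anchor); lowercase .md targets only.
MD_REF = re.compile(r"\[([^\]]*)\]\(([^)\s#]+\.md)(?:#[^)\s]*)?\)")


def unresolved_refs(files: dict[str, str]) -> list[tuple[str, str]]:
    """Return (referrer, target) pairs whose target is not a file under the root.

    `files` maps root-relative path -> body text. A target that normalizes to
    a path outside the root counts as unresolved.
    """
    missing: list[tuple[str, str]] = []
    for referrer, text in sorted(files.items()):
        base = posixpath.dirname(referrer)
        for m in MD_REF.finditer(text):
            target = m.group(2)
            if "://" in target:  # external URL, not a tree ref
                continue
            resolved = posixpath.normpath(posixpath.join(base, target))
            if resolved.startswith("../") or resolved not in files:
                missing.append((referrer, target))
    return missing
```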

Ref migration (on init --adopt, default on)

Adoption rewrites markdown path refs in body text to the croc dialect by default:

Before → after:

  • [foo](foo.md) → [[id:foo|foo]]
  • [Section X](target.md#section-x) → [[id:target#section-x|Section X]]
  • [Data Glossary](../data_glossary.md) → [[id:data-glossary|Data Glossary]]

Link text and anchors are preserved. The frontmatter links field gains a strong entry for every migrated target, so Rule 5 (identity) still holds after migration.

Pass --no-migrate-refs to adopt only the frontmatter shape and leave body content untouched — useful if you want to stage the migration separately.

Re-running on an adopted tree is safe. If a previously-adopted file grows new path-refs later (someone pastes a markdown link, a new doc lands), the next init --adopt reaches that file and migrates the new refs. Clean trees produce zero actions — the command is idempotent.

Unresolvable refs (the target doesn't exist, escapes the tree root, or uses a non-lowercase .md extension) are left in place as raw markdown and surfaced as SKIP-REF notes. Brownfield trees always have some rot; adoption reports it rather than refusing to land.
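A sketch of the migration step for a single file, assuming a precomputed path → id map. `migrate_body` is hypothetical; it returns the rewritten body, the ids that now need strong frontmatter entries, and the targets it skipped (croc's actual SKIP-REF output format isn't reproduced here):

```python
import posixpath
import re

# Match .md case-insensitively so non-lowercase extensions can be reported, not ignored.
MD_REF = re.compile(r"\[([^\]]*)\]\(([^)\s#]+\.(?i:md))(?:#([^)\s]*))?\)")


def migrate_body(body: str, referrer: str, ids: dict[str, str]) -> tuple[str, set[str], list[str]]:
    """Rewrite resolvable markdown path refs into [[id:X#anchor|text]]."""
    base = posixpath.dirname(referrer)
    linked: set[str] = set()
    skipped: list[str] = []

    def replace(m: re.Match[str]) -> str:
        text, target, anchor = m.groups()
        resolved = posixpath.normpath(posixpath.join(base, target))
        # Missing target, escape from the root, or non-lowercase .md: leave it raw.
        doc_id = ids.get(resolved) if target.endswith(".md") else None
        if doc_id is None:
            skipped.append(target)
            return m.group(0)
        linked.add(doc_id)  # caller adds a strong frontmatter link for each of these
        ref = doc_id + (f"#{anchor}" if anchor else "")
        return f"[[id:{ref}|{text}]]" if text else f"[[id:{ref}]]"

    return MD_REF.sub(replace, body), linked, skipped
```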

Why not teach check to recognize path refs directly? Because path refs break on move — which is the exact failure mode croc exists to prevent. The checker's narrow [[id:X]] dialect IS the enforcement; loosening it would defeat the purpose.

--dry-run

Every mutating command (move, rename, init --adopt with or without ref migration, crawl, molt) accepts --dry-run. It runs every validation and prints the plan, but writes nothing.

--include-untracked / --no-include-untracked (global)

Global flag — name mirrors git stash --include-untracked. Controls which files croc considers when walking a tree. Takes effect only inside a git repo.

| Mode | Files walked |
|---|---|
| Default (--no-include-untracked) | Tracked only: what git ls-files returns. Drafts you haven't git added yet are skipped. |
| --include-untracked | Tracked plus untracked-but-not-ignored. Useful while drafting new docs before committing. |
| Outside a git repo | The flag has no effect; every file is walked. |

When the walk is git-backed, gitignored files are never touched, in either mode. This scope applies to check, index, move, rename, init --adopt, crawl, molt, and refs.

# Default: only tracked files considered
croc check thoughts/

# Include in-progress drafts in the check
croc --include-untracked check thoughts/

Note: rename follows the same scope. Refs inside a filtered-out draft are not rewritten, so a draft containing [[id:old]] will still reference old after croc rename old new. Re-run with --include-untracked to update drafts, or let check catch the dangling ref when you eventually add the draft.

Concepts

Frontmatter

Every managed .md file has YAML frontmatter:

---
id: registry-pattern
title: Registry pattern
kind: leaf
links:
  - { to: design-index, strength: strong }
  - { to: obsidian-comparison, strength: weak }
---

The body can reference other docs: [[id:design-index]] or [[see:obsidian-comparison]].
Refs support optional anchors and display text: [[id:design-index#intro|the intro]].

Required fields: id, title, kind, links.

Ref dialect: [[id:X]], [[id:X#anchor]], [[id:X|display text]], [[id:X#anchor|display text]]. Only the id is load-bearing for invariant checking; the anchor and display text are preserved for renderers and consumers.

Id grammar: [A-Za-z0-9_.-]+. UUIDs, slugs, and dotted namespaces are all legal; spaces and slashes are not.

kind: self for directory index files (self.md), leaf for everything else.

Strong vs weak links

A strong link pins its target. If the target is deleted, or its id is changed without croc rename, croc check fails and the commit is refused.

links:
  - { to: adr-0042, strength: strong }

A weak link cites a target without pinning it. If the target is absent, the link is silently tolerated — it's the "see also" tier.

links:
  - { to: obsidian-comparison, strength: weak }

Use strong for load-bearing citations (a runbook referencing the ADR it implements). Use weak for breadcrumbs.

The five rules

croc check enforces:

  1. Ownership — every .md has a unique id.
  2. Schema — frontmatter has title, kind, links.
  3. No dangling ref — every [[id:X]] in body text resolves to a doc.
  4. Lifetime bound — strong links in frontmatter point to docs that exist.
  5. Identity stable — the set of strong links declared in frontmatter equals the set of [[id:X]] in the body.

Weak links are exempt from rules 3 and 4 by design.
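A minimal, pure sketch of the five rules over already-parsed docs, in the spirit of check.py being pure over list[Doc]. The Doc fields, the meta dict shape (mirroring the frontmatter example above), and the violation messages are assumptions, not croc's actual types or output:

```python
import re
from dataclasses import dataclass

# Strong body refs: [[id:X]], [[id:X#anchor]], [[id:X|text]], [[id:X#anchor|text]].
STRONG_REF = re.compile(r"\[\[id:([A-Za-z0-9_.-]+)(?:#[^|\]]+)?(?:\|[^\]]+)?\]\]")
REQUIRED = ("title", "kind", "links")


@dataclass
class Doc:
    path: str   # root-relative path
    meta: dict  # parsed YAML frontmatter
    body: str   # markdown after the frontmatter


def check(docs: list[Doc]) -> list[str]:
    """Return violations of the five rules; an empty list means the tree is sound."""
    violations: list[str] = []

    # Rule 1, ownership: every doc has an id and no id has two owners.
    owners: dict[str, str] = {}
    for doc in docs:
        doc_id = doc.meta.get("id")
        if not doc_id:
            violations.append(f"{doc.path}: rule 1 (ownership): missing id")
        elif doc_id in owners:
            violations.append(f"{doc.path}: rule 1 (ownership): id {doc_id!r} also owned by {owners[doc_id]}")
        else:
            owners[doc_id] = doc.path
    known = set(owners)

    for doc in docs:
        # Rule 2, schema.
        for key in REQUIRED:
            if key not in doc.meta:
                violations.append(f"{doc.path}: rule 2 (schema): missing {key!r}")
        body_ids = set(STRONG_REF.findall(doc.body))  # [[see:X]] never matches: weak refs exempt
        strong = {
            link["to"] for link in doc.meta.get("links") or [] if link.get("strength") == "strong"
        }
        # Rule 3, no dangling ref in the body.
        for target in sorted(body_ids - known):
            violations.append(f"{doc.path}: rule 3 (dangling ref): [[id:{target}]]")
        # Rule 4, lifetime bound: strong frontmatter links must point at existing docs.
        for target in sorted(strong - known):
            violations.append(f"{doc.path}: rule 4 (lifetime): strong link to missing {target!r}")
        # Rule 5, identity: declared strong links == strong refs used in the body.
        if strong != body_ids:
            violations.append(
                f"{doc.path}: rule 5 (identity): frontmatter {sorted(strong)} != body {sorted(body_ids)}"
            )
    return violations
```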

Where croc fits (and doesn't)

Good fits

  • Engineering knowledge bases (ADRs, runbooks, postmortems that cite each other)
  • LLM/agent context stores where agents read and write the tree and need integrity guarantees
  • Compliance and audit trails where "the chain is unbroken" is the artifact
  • Internal dev docs at companies with 50+ engineers, where rot is the rule

Bad fits

  • Personal Zettelkasten — Obsidian's ergonomics (graph view, backlinks pane) beat a linter for daily use
  • Fast-moving drafts and brainstorming — the schema is friction before content exists
  • Teams that don't run CI or pre-commit hooks — the whole value is mechanical enforcement

Using croc in CI

As a pre-commit hook

Add to .pre-commit-config.yaml:

repos:
  - repo: local
    hooks:
      - id: croc-check
        name: croc check
        entry: uv run croc check path/to/docs/
        language: system
        pass_filenames: false
        files: ^path/to/docs/

Or as a plain .git/hooks/pre-commit:

#!/bin/sh
uv run croc check path/to/docs/ || exit 1

In GitHub Actions

- name: croc check
  run: |
    uv sync
    uv run croc check path/to/docs/

For contributors

Layout

croc/
├── croc/
│   ├── __init__.py
│   ├── check.py       # borrow checker; pure over list[Doc]
│   ├── crawl.py       # scaffold plain-markdown trees from source
│   └── ops.py         # transformations: move, rename, init, adopt, molt
├── main.py            # Typer CLI — thin wrapper around ops
├── tests/
│   ├── conftest.py    # shared fixtures (tmp_path trees)
│   ├── test_check.py  # parser + five rules
│   ├── test_cli.py    # Typer CLI surface + exit codes
│   ├── test_crawl.py  # plan/build, filters, adopt/molt cycle
│   └── test_ops.py    # move, rename, init, adopt, molt, dry-run
├── docs/design.md                    # full Rust-inspired rationale
├── examples/thoughts/                # canonical sample tree
├── examples/thoughts-from-code/      # crawl fixture (src + adopted output)
└── pyproject.toml

Design principles

Separation of concerns. check.py is verification (pure, no I/O). ops.py is transformation (parse → check → plan → simulate → commit). Each raises a typed exception — TreeError for parse failures, OpError for precondition failures — so the CLI can map cleanly to exit codes.

Validate-then-commit. No operation writes to disk until its plan has been simulated and re-checked in memory. If validation fails, nothing is half-committed; if the physical commit fails mid-sequence, a snapshot-based rollback restores the already-written files.

Newtype discipline. DocId and DocPath are distinct typing.NewType types over str. The parser enforces the id grammar at the boundary, so runtime values match their declared types.
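A sketch of that boundary check, assuming the id grammar above; `parse_doc_id` and `resolve` are hypothetical helpers. NewType costs nothing at runtime (DocId(x) just returns x), so the guarantee comes from validating once at parse time and letting a static type checker flag a DocPath passed where a DocId is expected:

```python
import re
from typing import NewType

DocId = NewType("DocId", str)      # a validated id, e.g. "registry-pattern"
DocPath = NewType("DocPath", str)  # a root-relative path, e.g. "design/self.md"

ID_GRAMMAR = re.compile(r"[A-Za-z0-9_.-]+")


def parse_doc_id(raw: object) -> DocId:
    """Validate a raw frontmatter value once, at the parser boundary."""
    if not isinstance(raw, str) or ID_GRAMMAR.fullmatch(raw) is None:
        raise ValueError(f"invalid id {raw!r}: must match [A-Za-z0-9_.-]+")
    return DocId(raw)


def resolve(index: dict[DocId, DocPath], doc_id: DocId) -> DocPath:
    # A type checker rejects resolve(index, some_doc_path), even though both are str at runtime.
    return index[doc_id]
```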

--dry-run is universal. Every mutating operation accepts dry_run=True and skips only the final commit step. The simulation machinery is the same either way, so dry-run and real runs exercise identical code paths.

Adding a new command

  1. Add the implementation to croc/ops.py. Raise OpError on precondition failures. Follow load → check → plan → simulate → commit.
  2. Wire the Typer command in main.py. Keep it thin — CLI only formats output and maps exceptions to exit codes.
  3. Add tests in tests/test_ops.py. Use the tmp_path + write_doc / sample_tree fixtures.
  4. Add a --dry-run flag if the command writes.
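The load → check → plan → simulate → commit shape can be sketched over an in-memory tree. Everything here is hypothetical (run_op, Plan, the Tree alias, and a local OpError standing in for croc's); it only illustrates that dry-run cuts in at the commit step and nowhere else:

```python
from collections.abc import Callable
from dataclasses import dataclass

Tree = dict[str, str]  # path -> file text; an in-memory stand-in for the loaded doc tree


class OpError(Exception):
    """Local stand-in for croc's OpError: a precondition or validation failure."""


@dataclass
class Plan:
    writes: Tree  # path -> new text


def run_op(
    tree: Tree,
    make_plan: Callable[[Tree], Plan],
    check: Callable[[Tree], list[str]],
    *,
    dry_run: bool = False,
) -> Plan:
    # load: `tree` is assumed to be already parsed from disk.
    if violations := check(tree):          # check: never build on a broken tree
        raise OpError(f"tree is unsound: {violations}")
    plan = make_plan(tree)                 # plan: compute every write up front
    simulated = {**tree, **plan.writes}    # simulate: apply the plan in memory only
    if violations := check(simulated):     # re-check the simulated result
        raise OpError(f"plan would break the tree: {violations}")
    if not dry_run:                        # commit: the only step dry-run skips
        tree.update(plan.writes)           # (real code would do atomic per-file writes)
    return plan
```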

Running tests

uv sync --group dev
uv run pytest                # ~190 tests, ~0.3s
uv run pytest -v             # verbose
uv run pytest -k rename      # filter by name

The test suite encodes the guarantees as regressions. Notable cases:

  • test_failed_rename_leaves_tree_unchanged — fingerprints the tree, runs four failing renames back-to-back, asserts no file changed. Captures the validate-then-commit property.
  • test_post_adopt_check_passes — adopts a fresh unmanaged tree, then runs check. Proves init --adopt produces a sound tree out of the gate.
  • test_dry_run_writes_nothing (×3) — fingerprint-before / dry-run / fingerprint-after, applied uniformly across move, rename, and adopt.
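The fingerprint those tests rely on could look like this, assuming a content hash per file is enough to detect any write; `fingerprint` is a hypothetical helper, not necessarily what conftest.py defines:

```python
import hashlib
from pathlib import Path


def fingerprint(root: Path) -> dict[str, str]:
    """Map every file under `root` (relative POSIX path) to its SHA-256 digest.

    Comparing two fingerprints catches edits, deletions, and new files alike.
    """
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
```

A dry-run test then records fingerprint(tmp_path), runs the operation with dry-run enabled, and asserts the fingerprint is unchanged.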

Known limitations

  • YAML round-trip formatting. rename re-serializes frontmatter; inline flow style { to: X, strength: Y } may render as {to: X, strength: Y}. Cosmetic — swap yaml.dump for ruamel.yaml if formatting preservation matters.
  • No .crocignore. Trees with vendored READMEs or generated files need init --adopt pointed at a subdirectory.
  • Symlinked subtrees are not traversed. scan_symlinks emits warnings; the user decides whether to follow.

Releasing

Releases are cut through GitHub Releases — not a manual publish command. Publishing a release fires .github/workflows/publish.yml, which runs uv build && uv publish --trusted-publishing always inside the pypi environment via OIDC. There are no stored PyPI tokens to rotate.

The flow:

  1. Land changes on main. Ensure CI is green (style + test workflows).
  2. Bump version in pyproject.toml and __version__ in croc/__init__.py. Keep them in sync. Pre-1.0: minor bump for new features, patch bump for fixes.
  3. Promote the ## Unreleased section in CHANGELOG.md to ## X.Y.Z — YYYY-MM-DD. Add a fresh empty ## Unreleased above it.
  4. Commit as chore(release): vX.Y.Z. Duplicate the CHANGELOG section into the commit body so the commit stands alone.
  5. Push: git push origin main.
  6. Tag: git tag -a vX.Y.Z -m vX.Y.Z && git push origin vX.Y.Z.
  7. Create the GitHub Release from the tag: gh release create vX.Y.Z --notes-from-tag (or use the web UI). Publication triggers the publish workflow.

The make pypi target is a manual fallback only, not the canonical path. It runs uv build && uv publish locally and requires a PyPI token in the environment; it bypasses CI's clean build and the version/tag/changelog discipline above. Avoid it unless Trusted Publishing is down.

Further reading

  • docs/design.md — full design rationale and the Rust-to-croc mapping.

Download files

Source distribution: croc_cli-0.4.0.tar.gz (73.0 kB)

Built distribution: croc_cli-0.4.0-py3-none-any.whl (35.7 kB, Python 3)

File details: croc_cli-0.4.0.tar.gz

  • Size: 73.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing: yes
  • Uploaded via: uv 0.11.7 (uv publish, running in CI on Ubuntu 24.04)

Hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | d91f169914cce09b26428f67ee6d371eec325b1453d25d585e8e9ecdbbf734af |
| MD5 | ac022511566a53684c7ed3841e54f281 |
| BLAKE2b-256 | 76f1cf0bfef8a10082e5b5dcd00f7c391fd5b8c5059b4fad72c1c7440c9a5d6c |

File details: croc_cli-0.4.0-py3-none-any.whl

  • Size: 35.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing: yes
  • Uploaded via: uv 0.11.7 (uv publish, running in CI on Ubuntu 24.04)

Hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | 4e37692a3f7429fa11eaaa7184096d98f8f8a5a25df3a5ddc00a608049df840d |
| MD5 | 25f8be76ed34c9736bd1e4791147d9ab |
| BLAKE2b-256 | 761992d974371f68ff58f146371856503e3e6625444ceafacc5530dd95da5dd8 |