
Token-optimized documentation generator for AI coding agents


L-SDF: Latent-Structured Documentation Format


Cuts AI input token costs compared with reading source in navigation-heavy coding sessions (by roughly 70% in the example below, even with prompt caching). Works with Claude Code, Cursor, and Copilot today.


L-SDF is an agent-first documentation format that maps codebases into compact index files AI agents can scan before opening source code — optimized for token density rather than human readability.

Token Economics & ROI

In a typical coding session, source code is re-sent to the API across many turns. L-SDF replaces that with a compact structural map agents can scan first.

Example from a typical Python repository (21 files, ~110K tokens of source, ~8K tokens of L-SDF indices), measured over a 50-turn session:

| Scenario | Session cost | Savings with L-SDF |
| --- | --- | --- |
| Source code, no caching | $5.81 | 90% |
| Source code, with prompt caching | $2.03 | 73% |
| L-SDF indices + caching | $0.55 | |

Modern agents (Claude Code, Cursor, Codex, Copilot) use prompt caching, so the middle row is the realistic baseline. Even against that baseline, L-SDF can cut input token costs by roughly 70% compared with reading source in navigation-heavy coding sessions.

Assumptions: Claude Sonnet input pricing ($3/M tokens, $0.30/M cached read, $3.75/M cache write); 80% prompt-cache hit rate; with L-SDF, 20% of turns drill into source for ~10K uncached tokens; without L-SDF, agents incur an additional 15% raw-source orientation overhead on top of those drill-downs. Savings apply to input tokens only: output tokens are assumed identical across scenarios and are excluded, and tool-call costs are not modeled. Numbers vary with repo size, agent behavior, and model choice.
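To see how the hit rate and drill-down rate move a session's input cost, here is a simplified cost model built from the prices above. It is a hypothetical sketch, not the model behind the table: `blended_price` and `session_input_cost` are illustrative names, and billing cache misses at the cache-write price is an assumption.

```python
# Input prices from the assumptions above, in USD per million tokens.
PRICE_INPUT = 3.00        # uncached input
PRICE_CACHE_READ = 0.30   # cache hit
PRICE_CACHE_WRITE = 3.75  # cache write


def blended_price(hit_rate: float) -> float:
    """Effective USD per million tokens for context re-sent every turn.

    Simplifying assumption: hits are billed as cache reads, misses as cache writes.
    """
    return hit_rate * PRICE_CACHE_READ + (1 - hit_rate) * PRICE_CACHE_WRITE


def session_input_cost(context_tokens: int, turns: int, hit_rate: float,
                       drill_fraction: float = 0.0, drill_tokens: int = 0) -> float:
    """Rough session input cost: cached context on every turn plus uncached drill-downs."""
    context = context_tokens * turns * blended_price(hit_rate)
    drilldowns = turns * drill_fraction * drill_tokens * PRICE_INPUT
    return (context + drilldowns) / 1_000_000


print(f"blended price at 80% hits: ${blended_price(0.8):.2f}/M tokens")
```

Plugging in your own context sizes, turn counts, and hit rates shows which term dominates. Because the exact per-turn accounting behind the table (for example, how the 15% orientation overhead is applied) is not spelled out, this model is not expected to reproduce the figures above.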

The Philosophy: Agent-First vs. Human-First

Human-first documentation includes prose and formatting that are valuable to human readers but expensive to load repeatedly into AI coding sessions. L-SDF is designed for agents:

  • Sigils as hard anchors: Symbols like @, !, and ~ give agents stable structural anchors, so they do not have to infer structure from prose.
  • Compact context: L-SDF fits a repo-level architecture map into a small fraction of the context window, so the map is available before the agent opens any source file.

The Hello World Example

Here is what L-SDF does to a small Python file. Given examples/helloworld/hello.py:

"""Minimal hello-world CLI example.

Usage:
- Call `Greeter().say_hello(name)` to greet one name and return the message.
- Call `Greeter().greet(names)` to greet a list of non-empty names in order.
- Call `run()` to parse command-line arguments and execute the CLI flow.
"""
import sys

DEFAULT_NAME = "World"

class Greeter:
    def say_hello(self, name: str) -> str:
        if not name:
            raise ValueError("Name must not be empty")
        message = f"Hello, {name}!"
        print(message)
        return message

    def greet(self, names: list[str]) -> list[str]:
        return [self.say_hello(n) for n in names if n.strip()]

def parse(argv: list[str]) -> list[str]:
    names = [a.strip() for a in argv if a.strip()]
    return names if names else [DEFAULT_NAME]

def run() -> None:
    Greeter().greet(parse(sys.argv[1:]))

if __name__ == "__main__":
    run()

Running lsdf gen examples/helloworld produces two index files.

INDEX.lsdf — compact navigation map (what exists):

@hello.py
 ~sys
 @Greeter
  !say_hello
  !greet
 !parse
 !run
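The navigation skeleton above can be approximated with Python's standard `ast` module. This is a minimal sketch, not lsdf-core's generator: the function name `skeleton`, the one-space-per-level indentation, and skipping module-level assignments are assumptions read off this single example.

```python
import ast


def skeleton(filename: str, source: str) -> str:
    """Emit an INDEX.lsdf-style navigation map for one Python file (hypothetical sketch)."""
    lines = [f"@{filename}"]

    def visit(body: list[ast.stmt], depth: int) -> None:
        pad = " " * depth  # one space per nesting level, as in the example above
        for node in body:
            if isinstance(node, ast.Import):
                lines.extend(f"{pad}~{alias.name}" for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                lines.append(f"{pad}~{node.module}")
            elif isinstance(node, ast.ClassDef):
                lines.append(f"{pad}@{node.name}")
                visit(node.body, depth + 1)  # methods nest one level deeper
            elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                lines.append(f"{pad}!{node.name}")
            # Docstrings, assignments, and `if __name__ == ...` blocks are skipped.

    visit(ast.parse(source).body, 1)
    return "\n".join(lines)


HELLO = '''
import sys
DEFAULT_NAME = "World"
class Greeter:
    def say_hello(self, name: str) -> str: ...
    def greet(self, names: list[str]) -> list[str]: ...
def parse(argv: list[str]) -> list[str]: ...
def run() -> None: ...
'''
print(skeleton("hello.py", HELLO))
```

On this stripped-down copy of hello.py, the sketch prints the same seven lines as the INDEX.lsdf shown above.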

INDEX.detail.lsdf — compact contract and call-edge map (how to call it):

@hello.py
 ~sys
 @Greeter
  !say_hello(name:s):s
  !greet(names:[s]):[s] > say_hello
 !parse(argv:[s]):[s]
 !run > Greeter.greet,parse

INDEX.lsdf keeps only the navigation skeleton. INDEX.detail.lsdf adds compact signatures and call edges (the names after >) while still omitting implementation bodies. Module docstrings are not extracted into detail indices, so high-level usage notes can stay in the source file without bloating the agent-facing view. In signatures, self is omitted, () is dropped for zero-argument functions, and standard type aliases replace verbose names: s=str, a=Any, [s]=list[str], q[s]=Sequence[str], l[...]=Literal[...].
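The signature compression can be illustrated with another small `ast` sketch (Python 3.9+). It applies only a subset of the aliases (s, a, and [s]) and leaves out Sequence, Literal, and call edges; `compact_type` and `compact_signature` are hypothetical names, not lsdf-core's API.

```python
import ast
from typing import Optional

ALIASES = {"str": "s", "Any": "a"}  # subset of the alias table described above


def compact_type(node: Optional[ast.expr]) -> Optional[str]:
    """Render an annotation compactly: str -> s, list[str] -> [s]."""
    if node is None:
        return None
    if isinstance(node, ast.Name):
        return ALIASES.get(node.id, node.id)
    if (isinstance(node, ast.Subscript) and isinstance(node.value, ast.Name)
            and node.value.id == "list"):
        return f"[{compact_type(node.slice)}]"
    return ast.unparse(node)  # anything else is left as written


def compact_signature(func: ast.FunctionDef) -> str:
    """Build a detail-index line such as !say_hello(name:s):s (call edges omitted)."""
    params = []
    for arg in func.args.args:
        if arg.arg == "self":
            continue  # self is omitted
        ann = compact_type(arg.annotation)
        params.append(f"{arg.arg}:{ann}" if ann else arg.arg)
    line = f"!{func.name}"
    if params:  # () is dropped for zero-argument functions
        line += "(" + ",".join(params) + ")"
    ret = compact_type(func.returns)
    if ret and ret != "None":  # `-> None` adds nothing for navigation
        line += f":{ret}"
    return line


SOURCE = '''
class Greeter:
    def say_hello(self, name: str) -> str: ...
    def greet(self, names: list[str]) -> list[str]: ...
def parse(argv: list[str]) -> list[str]: ...
def run() -> None: ...
'''
signatures = {
    node.name: compact_signature(node)
    for node in ast.walk(ast.parse(SOURCE))
    if isinstance(node, ast.FunctionDef)
}
for sig in signatures.values():
    print(sig)
```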

| | Source (hello.py) | INDEX.lsdf | INDEX.detail.lsdf |
| --- | --- | --- | --- |
| Tokens | ~221 | ~15 | ~34 |
| Savings | | ~15× fewer | ~6.5× fewer |

This example uses a very small source file, so the detail index has less room to compress. In a more typical repository, L-SDF index files are often about 10-20x smaller than the source they summarize.

An agent navigating the repo reads INDEX.lsdf first. It only opens INDEX.detail.lsdf when it needs signatures or call edges, and opens hello.py only when it needs the implementation body.
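Because INDEX.lsdf nests entries by indentation, a tool on the agent side can resolve a symbol to its file without opening any source. A minimal sketch, assuming one leading space per nesting level as in the example above; `locate` is a hypothetical helper.

```python
def locate(index_text: str, symbol: str) -> list[str]:
    """Return qualified paths (e.g. 'hello.py::Greeter.greet') for entries named `symbol`."""
    matches = []
    stack = []  # names of the enclosing entries, indexed by depth
    for raw in index_text.splitlines():
        body = raw.lstrip(" ")
        if not body:
            continue  # skip blank lines
        depth = len(raw) - len(body)
        sigil, name = body[0], body[1:]
        del stack[depth:]  # drop siblings and deeper entries from the previous line
        stack.append(name)
        if sigil != "~" and name == symbol:  # ignore dependency lines
            file, *rest = stack
            matches.append(f"{file}::{'.'.join(rest)}" if rest else file)
    return matches


INDEX = """@hello.py
 ~sys
 @Greeter
  !say_hello
  !greet
 !parse
 !run"""

print(locate(INDEX, "greet"))
```

With a match in hand, the agent knows which file to open, or whether INDEX.detail.lsdf is enough.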


Quick Start

Status: Draft v1.1 format. The current generator supports only Python repositories; generators for other languages are welcome.

1. Install

A. For Users (Global Access)

To use L-SDF across any project on your system, install it as a global utility. This ensures the lsdf command is available regardless of which specific project environment you have active.

Install pipx first if you do not already have it. The recommended approach is to use your operating system's package manager. For example, on Ubuntu or Debian:

sudo apt install pipx
pipx ensurepath

Then install the L-SDF CLI tool:

pipx install lsdf-core

Verify the installation:

lsdf --help

B. For Contributors (Local Repo / Editable Install)

If you have this repository checked out locally and want changes in your working tree to be reflected immediately in the CLI, install it in editable mode with pipx:

pipx ensurepath
cd ~/github/lsdf-core
# force reinstall even if lsdf-core is already installed
pipx install -e . --force

If you want to modify the L-SDF source code or run the test suite:

conda env create -f environment.yml
conda activate lsdf-dev
pytest tests/
# or, without pytest:
PYTHONPATH=. python3 -m unittest tests.test_core -v

2. Initialize Any Repo

Now, you can navigate to any other project and bootstrap it with L-SDF support:

# 1. Move to your target project
cd ~/github/my-other-project

# 2. Initialize (creates .lsdf/, .lsdfignore, and project.lsdf)
lsdf init

This creates:

  • project.lsdf: A high-level root manifest that records the detected stack, important top-level directories, and major frameworks. For example:

    ^my-other-project:Python
     @docs:documentation
     @scripts:automation
     @src:main-code
     @tests:test-suite
     ~[Pydantic,Pytest]
     !myapp=src.cli:main
    $lsdf:1.1.0
    
  • .lsdf/lsdf_instructions.md: The protocol instructions for AI agents.

    lsdf init automatically appends it to any agent config files it finds (CLAUDE.md, AGENTS.md, .cursorrules, .github/copilot-instructions.md, CONVENTIONS.md). Files that don't exist are skipped; files that already contain the instructions are left untouched. Re-running lsdf init is safe.

    If you add a new agent config file later, re-run lsdf init to append the instructions automatically. For agent tools not in the list above, append manually:

    cat .lsdf/lsdf_instructions.md >> <your-agent-config-file>
    
  • .lsdf/lsdf_spec.md: The compact syntax reference agents can consult without loading the full SPEC.md.

  • .lsdfignore: A list of paths the indexer should skip, such as node_modules or __pycache__, so they don't waste tokens in the generated indices.
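The append-if-missing behavior described for .lsdf/lsdf_instructions.md can be sketched as follows. This is not lsdf-core's implementation: the helper name `append_instructions` and the substring check used to detect instructions that are already present are assumptions.

```python
from pathlib import Path

# Agent config files named above, relative to the repository root.
AGENT_CONFIGS = ["CLAUDE.md", "AGENTS.md", ".cursorrules",
                 ".github/copilot-instructions.md", "CONVENTIONS.md"]


def append_instructions(repo: Path, instructions: str) -> list[str]:
    """Append `instructions` to existing agent configs that lack them; return files changed."""
    changed = []
    for rel in AGENT_CONFIGS:
        path = repo / rel
        if not path.is_file():
            continue  # files that don't exist are skipped
        text = path.read_text(encoding="utf-8")
        if instructions.strip() in text:
            continue  # already present: leave untouched, so re-running is safe
        sep = "" if not text or text.endswith("\n") else "\n"
        path.write_text(text + sep + instructions, encoding="utf-8")
        changed.append(rel)
    return changed
```

The containment check is what makes repeated runs idempotent; the manual `cat ... >>` route shown above has no such guard, so run it only once per file.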

If your project's top-level structure or stack changes later, run lsdf init again to refresh project.lsdf.

To also add a GitHub Actions workflow that auto-regenerates indices on every push, pass --ci:

lsdf init --ci

This adds .github/workflows/update-lsdf.yml. On every push, the workflow installs lsdf-core from PyPI, regenerates the INDEX.lsdf and INDEX.detail.lsdf files, and commits any changes back to the branch. GitHub Actions must have write permission on the repository for this to work. Re-running lsdf init --ci is safe: it will not overwrite an existing workflow.

3. Generate Indices

Scan your source code to generate or update INDEX.lsdf and INDEX.detail.lsdf maps in your source directories.

lsdf gen . --recursive

Run lsdf stats after your first generation to estimate how much you could save in your next AI coding session.


Index Drift and Sync

A stale index is worse than no index. If an agent trusts an out-of-date index, it can generate code against the wrong signatures just as confidently as if they were correct. Drift is the failure mode you have to design against.

L-SDF gives you three layers of defense:

1. Auto-regeneration after each structural edit.

After any structural edit, the AI agent is instructed to run lsdf gen <dir>. You should do the same when making structural edits manually.

2. lsdf sync as an enforcement check. Run it in CI or as a pre-commit hook:

lsdf sync . --check

The command exits with a non-zero status if any index file is out of date relative to source. Add it to your CI's required checks so that stale indices cannot be merged into main.

3. Auto-regeneration on each push via lsdf init --ci.

This gives you the strongest enforcement, but it requires write permissions on the branch and may create noisy history. Use it in repos where index accuracy matters more than a perfectly clean commit log.
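For the pre-commit variant of the lsdf sync check, a Git hook can shell out to the documented command and forward its exit code. A minimal sketch: `check_indices` is an illustrative name, and skipping the check when lsdf is not installed is a design choice you may prefer to reverse.

```python
import shutil
import subprocess
import sys


def check_indices(repo_root: str = ".") -> int:
    """Run `lsdf sync <repo_root> --check` and return its exit code (0 means indices are current)."""
    if shutil.which("lsdf") is None:
        # Fail open so contributors without lsdf can still commit; rely on CI as the hard gate.
        print("lsdf not found on PATH; skipping L-SDF index check", file=sys.stderr)
        return 0
    result = subprocess.run(["lsdf", "sync", repo_root, "--check"])
    if result.returncode != 0:
        print("L-SDF indices are stale; run `lsdf gen . --recursive` and re-stage.",
              file=sys.stderr)
    return result.returncode
```

Saved as .git/hooks/pre-commit (made executable, with a `#!/usr/bin/env python3` line at the top and `sys.exit(check_indices())` at the bottom), this blocks a commit whenever the check fails.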


AI Agent Integration

L-SDF works with your existing AI tools by providing them with a "map" to read before they ever touch your source code.

The Agent Workflow

  1. Read project.lsdf at the root.
  2. Read the nearest INDEX.lsdf to navigate structure (what exists).
  3. If signatures or contracts are needed, read INDEX.detail.lsdf (how to call it, call edges).
  4. Open source files only when implementation bodies are required.
  5. After structural edits, update both index files with lsdf gen <dir>.
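Step 2's "nearest INDEX.lsdf" can be read as: walk up from the file's directory toward the repository root and take the first INDEX.lsdf found. That interpretation of "nearest" is an assumption, and `nearest_index` is a hypothetical helper; passing name="INDEX.detail.lsdf" applies the same rule to step 3.

```python
from pathlib import Path
from typing import Optional


def nearest_index(target: Path, repo_root: Path, name: str = "INDEX.lsdf") -> Optional[Path]:
    """Return the closest `name` file at or above `target`'s directory, stopping at repo_root."""
    root = repo_root.resolve()
    current = target.resolve()
    if current.is_file():
        current = current.parent  # start from the directory containing the file
    while True:
        candidate = current / name
        if candidate.is_file():
            return candidate
        if current == root or current.parent == current:
            return None  # reached the repo root (or filesystem root) without a match
        current = current.parent
```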

Compare Agent Behavior

You can compare agent behavior with and without L-SDF guidance.

Suggested Prompts

List the main entry points, pipeline stages, and external dependencies in src. Do it once using LSDF files first, and once by reading raw source only. Show the files opened and tokens used in both cases.

Find all functions in src that accept a Pydantic model, TypedDict, or dataclass-like schema as input. Do it with and without LSDF guidance. Show the files opened and tokens used in both cases.

If we rename a core function in src, what other functions, routes, or callers would likely need updates? Answer once using LSDF files first, and once using raw source only. Show the files opened and tokens used in both cases.


The L-SDF Spec

In L-SDF, sigils act as single-character semantic tags. Instead of spending tokens on verbose words like class, function, or import, the agent reads a single character that identifies the architectural role of the line.

The L-SDF Sigil Table

| Sigil | Name | Meaning / Purpose | Python Equivalent |
| --- | --- | --- | --- |
| ^ | Root | Project-level stack, global configuration, or environment. | pyproject.toml / env |
| @ | Entity | A structural boundary like a file, class, module, or service. | hello.py / class User: |
| ! | Function | Logic flow, method, function, or executable step. | def login(): |
| ~ | Dependency | External requirements, imports, or libraries. | import requests |
| ? | Schema | Data types, interfaces, variable shapes, or database models. | pydantic.BaseModel |
| $ | Annotation | Important comments, notes, docstrings, or rationale. | # TODO: handle legacy fallback |
| # | Route | API endpoint, webhook, or URL path. | @app.get("/users") |

Note: sigils like #, @, and ! may resemble host-language syntax, but the overlap is only cosmetic: sigils live in dedicated .lsdf files and are interpreted by the L-SDF format, not by the host language parser.

See SPEC.md for the full specification.
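Because each line's role is carried by its first character, reading an .lsdf file takes little more than a lookup table. A minimal sketch, assuming one leading space per nesting level as in the examples above; `parse_line`, `Entry`, and the lowercase role names are illustrative and not defined by SPEC.md.

```python
from typing import NamedTuple, Optional

# Sigil-to-role mapping taken from the table above.
SIGILS = {
    "^": "root", "@": "entity", "!": "function", "~": "dependency",
    "?": "schema", "$": "annotation", "#": "route",
}


class Entry(NamedTuple):
    depth: int    # nesting level (leading spaces)
    role: str     # role implied by the sigil
    payload: str  # everything after the sigil


def parse_line(line: str) -> Optional[Entry]:
    """Split an .lsdf line into nesting depth, sigil role, and the rest of the line."""
    body = line.lstrip(" ")
    if not body:
        return None  # blank line
    depth = len(line) - len(body)
    return Entry(depth, SIGILS.get(body[0], "unknown"), body[1:])


PROJECT = """^my-other-project:Python
 @src:main-code
 ~[Pydantic,Pytest]
 !myapp=src.cli:main
$lsdf:1.1.0"""

for entry in filter(None, map(parse_line, PROJECT.splitlines())):
    print(entry)
```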


CLI Commands

  • lsdf init: Bootstrap a repo for L-SDF.
  • lsdf gen: Generate or update INDEX.lsdf and INDEX.detail.lsdf from source code.
  • lsdf stats: Estimate session cost and savings.
  • lsdf sync: Verify that indices match the current source code.
  • lsdf trans: Translate .lsdf to Markdown.
  • lsdf clean: Remove generated indices and optional bootstrap files.

See docs/CLI.md for more details.


Current Limitations

  • The current generator supports Python repositories.
  • The format is Draft v1.1 and may evolve before a stable 2.0 spec.
  • Generated call edges are structural hints, not a complete static-analysis call graph.

License

MIT


Contributing

L-SDF is an open standard. We welcome new generators for other languages (Go, Rust, TypeScript).



Download files


Source Distribution

lsdf_core-1.1.6.tar.gz (45.1 kB)

Uploaded Source

Built Distribution


lsdf_core-1.1.6-py3-none-any.whl (29.4 kB)

Uploaded Python 3

File details

Details for the file lsdf_core-1.1.6.tar.gz.

File metadata

  • Download URL: lsdf_core-1.1.6.tar.gz
  • Upload date:
  • Size: 45.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for lsdf_core-1.1.6.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 36c32f1c2245b8c711cdd124820899ee85a7ef6e52b3086a55e70aba18024482 |
| MD5 | 09a3429a2c09a27201cb75ffda891ed7 |
| BLAKE2b-256 | f0dfd6372f53d8e0bdb67d1f72d6638c14763406c121c8a46f29868e1f6b38df |


Provenance

The following attestation bundles were made for lsdf_core-1.1.6.tar.gz:

Publisher: publish.yml on ec1980/lsdf-core

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file lsdf_core-1.1.6-py3-none-any.whl.

File metadata

  • Download URL: lsdf_core-1.1.6-py3-none-any.whl
  • Upload date:
  • Size: 29.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for lsdf_core-1.1.6-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | ee29b03109e406f734d698a65692e77bc7207e991105dd8f76de25119537fe49 |
| MD5 | 6dd45d366ff08e853c07c1fc8f4276c5 |
| BLAKE2b-256 | 14fd6b7c65bea62ddb9ba2b1381ed3d19ac2992e404e0941d6da5a41a425c6b8 |


Provenance

The following attestation bundles were made for lsdf_core-1.1.6-py3-none-any.whl:

Publisher: publish.yml on ec1980/lsdf-core

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
