Plain-text source listing generator for AI context

Listing Generator

A tool for building dense contexts from source code: traverses projects, filters and normalizes files, then assembles them into a single clean Markdown document — perfect for ChatGPT/Copilot/Gemini/Claude and other LLM assistants.

In short: you store selection rules in lg-cfg/ (YAML + context templates), and LG renders "ready-to-paste" text or returns a JSON report with token statistics.


Why and Who Is It For

Target audience: developers, team leads, and technical writers who discuss real code with AI agents, run reviews, assign tasks, and capture iteration context, all within the limits of the model's context window.

Why: modern agents work noticeably better when they see exactly the code they need, with minimal noise (no junk from node_modules/, logs, generated files, huge binaries, etc.). Preparing such context by hand is painful. LG automates:

  • selection of relevant files (by filters and extensions),
  • light normalization (e.g., Markdown headings, "trivial" __init__.py files),
  • assembly into a single document with visible file markers,
  • .gitignore awareness,
  • changes mode (only modified files),
  • templates and contexts (section insertions and nested templates),
  • size/token estimation and shares ("who's eating the prompt").

There are many ways to form prompts and attach relevant code snippets: from manual copying to context embedding features in IDEs with integrated AI chats. LG differs by doing this systematically and reproducibly: rules are stored in the repository, not in your head or AI conversation history.

You describe in advance what goes into the prompt and how (through sections and templates). This enforces discipline, lets you "tune" density and avoid overflowing the model window, and makes successful queries reproducible through saved templates.


What a "Healthy" AI Agent Workflow Looks Like

  1. Describe rules in the repository. Create lg-cfg/sections.yaml and, as needed, additional *.sec.yaml files. These describe sections (file sets + filters). Use *.tpl.md and *.ctx.md for templates and contexts.

  2. Build context. Render either a "section" (a virtual context of one file set) or a "context" (a template that can include multiple sections and other templates).

  3. Compress iteratively. Check token statistics (which parts take the "heaviest share"), move secondary content into separate sections, and include it on demand. For "small updates" use --mode changes.

  4. Save successful prompts. Contexts and templates (*.ctx.md and *.tpl.md) are your "well-working" query formats: reproducible, versionable, with variants for different tasks and agents.


Quick Start

Installation and Running

Requires Python ≥ 3.10.

Installation:

# Install from project directory
pip install -e .

Verification:

# Check via module
python -m lg.cli --version

# Or via installed command
listing-generator --version

Environment and cache check:

python -m lg.cli diag
python -m lg.cli diag --rebuild-cache

What Goes in lg-cfg/

Important: the configuration directory is always named lg-cfg/.

Example structure:

lg-cfg/
├─ sections.yaml           # sections file (can be in any directory)
├─ additional.sec.yaml     # additional section set (can have many)
├─ intro.tpl.md            # template (can have many, in any subfolders)
├─ onboarding.ctx.md       # context (can have many, in any subfolders)
└─ sub-fold/
   ├─ sections.yaml        # another sections.yaml (sections get sub-fold/ prefix)
   └─ extra.sec.yaml

Sections

  • sections.yaml — sections file. Can be in lg-cfg/ root and in any subdirectories.
    • In root: sections without prefix (e.g., docs, src)
    • In subdirectories: sections with directory prefix (e.g., adapters/src from lg-cfg/adapters/sections.yaml)
  • *.sec.yaml — additional section sets (fragments).

A section describes:

  • which file extensions to consider,
  • allow/block filters over the tree,
  • policy for empty files, code-fence, and language adapters.

Minimal example:

# Section for project documentation
docs:
  extensions: [".md"]
  markdown:
    # Normalize headings to H2 (outside fenced blocks), remove single H1 at start
    max_heading_level: 2
  filters:
    mode: allow            # default-deny within section
    allow:
      - "/README.md"
      - "/docs/**"

# Core-model submodule sources
core-model-src:
  extensions: [".py", ".md", ".yaml", ".json", ".toml"]
  skip_empty: true
  markdown:
    max_heading_level: 3
  filters:
    mode: allow
    allow:
      - "/core-model/**"
    children:
      core-model:
        mode: block
        block:
          - "**/.pytest_cache/**"
          - "/ROADMAP.md"

# Separate section for roadmap (as text)
core-model-roadmap:
  extensions: [".md"]
  filters:
    mode: allow
    allow:
      - "/core-model/ROADMAP.md"

Filters: How They Work

  • Rules form a tree; each node is either default-allow (mode: block) or default-deny (mode: allow).
  • At each level, block patterns are checked first; then, if the node is in allow mode, the path is strictly checked against its allow list. If mode: allow and the path doesn't match any local allow pattern, it is rejected immediately.
  • block always takes precedence over allow.
  • The project's .gitignore is respected.
  • LG also avoids descending into subtrees that cannot yield any files (early pruning).
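The evaluation order above can be sketched as a small recursive check. This is a minimal illustration, not LG's implementation: the FilterNode class and is_included function are hypothetical, paths are assumed to be relative to the node's directory and start with "/", glob matching is approximated with Python's fnmatchcase (whose "*" also crosses "/"), and .gitignore handling and early pruning are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class FilterNode:
    """One node of a hypothetical filter tree (mirrors the YAML keys above)."""
    mode: str  # "allow" = default-deny, "block" = default-allow
    allow: list[str] = field(default_factory=list)
    block: list[str] = field(default_factory=list)
    children: dict[str, FilterNode] = field(default_factory=dict)


def _matches(rel_path: str, patterns: list[str]) -> bool:
    # fnmatchcase's "*" also matches "/", so "**" acts as "any depth" here
    return any(fnmatchcase(rel_path, p) for p in patterns)


def is_included(node: FilterNode, rel_path: str) -> bool:
    """rel_path is relative to the node's directory and starts with '/'."""
    # 1. block is checked first and always wins
    if _matches(rel_path, node.block):
        return False
    # 2. a default-deny node rejects anything its allow list does not match
    if node.mode == "allow" and not _matches(rel_path, node.allow):
        return False
    # 3. if the first path segment has its own child rules, delegate to them
    head, _, rest = rel_path.lstrip("/").partition("/")
    child = node.children.get(head)
    if child is not None and rest:
        return is_included(child, "/" + rest)
    return True


# The core-model-src filters from the section example above
example = FilterNode(
    mode="allow",
    allow=["/core-model/**"],
    children={
        "core-model": FilterNode(
            mode="block",
            block=["**/.pytest_cache/**", "/ROADMAP.md"],
        )
    },
)
```

With this tree, /core-model/src/app.py is included, while /core-model/ROADMAP.md and anything under .pytest_cache are blocked by the child node, and /README.md is rejected by the root's allow list.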

Contexts and Templates

  • Contexts: *.ctx.md (top-level documents).
  • Templates: *.tpl.md (fragments for insertion).

Example:

# Project Introduction

${tpl:intro}

## Core-model module source code

${core-model-src}

## Additional section

${sub-fold/extra/bar}

## Current task

${task}

Sections from the root lg-cfg/sections.yaml are accessible directly (${docs}). Sections from sections.yaml files in subdirectories get the directory prefix (e.g., ${adapters/src} from lg-cfg/adapters/sections.yaml). Fragments use hierarchical paths: section bar in file sub-fold/extra.sec.yaml is referenced as ${sub-fold/extra/bar}.

Context-dependent references: From templates in subdirectories, you can use short names. Example: from lg-cfg/adapters/overview.ctx.md you can write ${src} and it will resolve to adapters/src.
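The naming rules above can be summarized in a short sketch. The function names canonical_section_id and resolve_reference are hypothetical, and the lookup order for short names (template directory first, then the root namespace) is an assumption inferred from the ${src} example, not a documented guarantee.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def canonical_section_id(cfg_file: str, section: str) -> str:
    """cfg_file is relative to lg-cfg/, e.g. 'sections.yaml' or 'sub-fold/extra.sec.yaml'."""
    path = PurePosixPath(cfg_file)
    parts = list(path.parent.parts)  # directory prefix, empty for the root
    if path.name.endswith(".sec.yaml"):
        # fragments add their file stem as one more path level
        parts.append(path.name[: -len(".sec.yaml")])
    elif path.name != "sections.yaml":
        raise ValueError(f"not a section file: {cfg_file}")
    return "/".join([*parts, section])


def resolve_reference(ref: str, template: str, known_ids: set[str]) -> str:
    """template is relative to lg-cfg/, e.g. 'adapters/overview.ctx.md'."""
    template_dir = PurePosixPath(template).parent.parts
    candidates = []
    if template_dir:
        # assumption: the template's own directory is tried first ...
        candidates.append("/".join([*template_dir, ref]))
    # ... and the root namespace is the fallback
    candidates.append(ref)
    for candidate in candidates:
        if candidate in known_ids:
            return candidate
    raise KeyError(f"unknown section reference: {ref}")
```

For example, section src in lg-cfg/adapters/sections.yaml gets the ID adapters/src, and ${src} written inside adapters/overview.ctx.md resolves to it.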

Special placeholder ${task} inserts text from --task argument:

  • ${task} — simple insertion (empty string if not specified)
  • ${task:prompt:"default text"} — with default value
  • {% if task %}...{% endif %} — conditional block insertion

More details: templates.md.
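The three ${task} forms can be illustrated with plain regular expressions. This is a simplified sketch of the documented behavior, not LG's template engine: the render_task function is hypothetical, conditionals are assumed not to nest, and the default text is assumed not to contain a double quote.

```python
from __future__ import annotations

import re

IF_TASK = re.compile(r"\{% if task %\}(.*?)\{% endif %\}", re.DOTALL)
TASK_WITH_DEFAULT = re.compile(r'\$\{task:prompt:"([^"]*)"\}')
TASK = re.compile(r"\$\{task\}")


def render_task(template: str, task: str | None) -> str:
    task = task or ""
    # keep the body of {% if task %}...{% endif %} only when a task was given
    out = IF_TASK.sub(lambda m: m.group(1) if task else "", template)
    # ${task:prompt:"..."} falls back to its default text
    out = TASK_WITH_DEFAULT.sub(lambda m: task or m.group(1), out)
    # plain ${task} becomes an empty string when no task was given
    return TASK.sub(lambda m: task, out)
```

Replacement callables are used instead of replacement strings so that backslashes in the task text are inserted literally.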


Language Adapters

Listing Generator uses adapters for different languages and formats. They help "optimize" listings: remove junk, normalize headings, filter paragraphs, or even strip function bodies, leaving only signatures. Adapter settings are specified directly in the section YAML, either globally for the section or for specific paths via targets.

Configuration Example

core:
  extensions: [".py", ".md"]
  skip_empty: true

  # Global rules for entire section
  python:
    strip_function_bodies: false

  markdown:
    max_heading_level: 2

  # Local overrides for specific folders and files
  targets:
    - match: "/pkg/**.py"
      python:
        strip_function_bodies: true      # only signatures in this folder

    - match: ["/docs/**.md", "/notes/*.md"]
      markdown:
        drop:
          sections:
            - match: { kind: regex, pattern: "^(License|Changelog|Contributing)$", flags: "i" }

In this example, the core section configures two languages. For Python, stripping function bodies is disabled globally but enabled inside the /pkg/ folder. For Markdown, a common heading level is set, and in /docs/ and /notes/ any sections whose headings match License, Changelog, or Contributing (case-insensitive) are additionally dropped.

The match key accepts either a single glob pattern or a list of them. When several rules match, the more specific one (longer and more concrete) wins; if they are equally specific, the later one in the list wins. This makes it easy to layer local overrides on top of section settings.
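One way to model this precedence is to apply every matching target in order of increasing specificity, so the most specific (or, on a tie, the latest) override is applied last. This is a sketch under assumptions: the specificity score (count of non-wildcard characters, then pattern length) and the per-language shallow merge are illustrative choices, and effective_options is a hypothetical name, not part of LG.

```python
from __future__ import annotations

import copy
from fnmatch import fnmatchcase


def specificity(pattern: str) -> tuple[int, int]:
    # assumption: more literal (non-wildcard) characters = more concrete
    literal = sum(1 for ch in pattern if ch not in "*?[]")
    return (literal, len(pattern))


def effective_options(path: str, section_opts: dict, targets: list[dict]) -> dict:
    matched = []
    for index, target in enumerate(targets):
        patterns = target["match"]
        if isinstance(patterns, str):
            patterns = [patterns]
        hits = [p for p in patterns if fnmatchcase(path, p)]
        if hits:
            # sort key: specificity first, then position in the list
            matched.append(((max(specificity(p) for p in hits), index), target))
    matched.sort(key=lambda item: item[0])
    result = copy.deepcopy(section_opts)
    # least specific first, so the most specific (or latest) override wins
    for _, target in matched:
        for lang, opts in target.items():
            if lang != "match":
                result.setdefault(lang, {}).update(copy.deepcopy(opts))
    return result


# The core section from the configuration example above
SECTION = {"python": {"strip_function_bodies": False}, "markdown": {"max_heading_level": 2}}
TARGETS = [
    {"match": "/pkg/**.py", "python": {"strip_function_bodies": True}},
    {
        "match": ["/docs/**.md", "/notes/*.md"],
        "markdown": {"drop": {"sections": [{"match": {
            "kind": "regex", "pattern": "^(License|Changelog|Contributing)$", "flags": "i"}}]}},
    },
]
```

For /pkg/core/util.py this yields strip_function_bodies: True, while /app/main.py keeps the section-wide False.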

The empty-file policy (skip_empty at the section level and empty_policy in adapters) behaves like a language option: the section sets the general strategy, and an adapter can refine it if needed. Possible values: empty_policy: inherit|include|exclude.
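The interplay of the two settings can be read as a small decision function. This is a hypothetical sketch of the described semantics (keep_empty_file is not an LG function): include and exclude override the section, while inherit defers to skip_empty.

```python
from typing import Literal

EmptyPolicy = Literal["inherit", "include", "exclude"]


def keep_empty_file(section_skip_empty: bool, empty_policy: EmptyPolicy = "inherit") -> bool:
    """Decide whether an empty file stays in the listing."""
    if empty_policy == "include":
        return True
    if empty_policy == "exclude":
        return False
    if empty_policy == "inherit":
        # defer to the section-level skip_empty setting
        return not section_skip_empty
    raise ValueError(f"unknown empty_policy: {empty_policy!r}")
```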


Available Adapters

Markdown

  • Normalize headings (remove lone H1, shift levels).
  • Systematically drop entire sections by headings (with subtree).
  • Remove YAML front matter at the beginning.
  • Insert placeholders in place of removed content (optional).

More details: markdown.md.
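Heading normalization with max_heading_level can be approximated as follows. This sketch assumes that a lone H1 on the first non-blank line is removed and that all remaining headings are then shifted so the top-most one lands at max_heading_level; normalize_headings is a hypothetical helper, and LG's exact rules are described in markdown.md.

```python
from __future__ import annotations

import re

HEADING = re.compile(r"^(#{1,6})(\s.*)$")
FENCE = re.compile(r"^(```|~~~)")


def normalize_headings(text: str, max_heading_level: int) -> str:
    lines: list[str | None] = list(text.splitlines())
    headings: list[tuple[int, int]] = []  # (line index, level)
    in_fence = False
    for i, line in enumerate(lines):
        if FENCE.match(line):
            in_fence = not in_fence  # headings inside fenced blocks are left alone
        elif not in_fence and (m := HEADING.match(line)):
            headings.append((i, len(m.group(1))))

    # Remove a lone H1 if it is the first non-blank line of the document
    h1_lines = [i for i, level in headings if level == 1]
    first_text = next((i for i, line in enumerate(lines) if line.strip()), None)
    if len(h1_lines) == 1 and h1_lines[0] == first_text:
        lines[first_text] = None
        headings = [(i, level) for i, level in headings if i != first_text]

    # Shift the remaining headings so the top level becomes max_heading_level
    if headings:
        shift = max_heading_level - min(level for _, level in headings)
        for i, level in headings:
            m = HEADING.match(lines[i])
            new_level = min(max(level + shift, 1), 6)
            lines[i] = "#" * new_level + m.group(2)
    return "\n".join(line for line in lines if line is not None).lstrip("\n")
```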

Programming Languages

More details: adapters.md.


Token Statistics

To help optimize listings and contexts, LG provides a summary report of token usage.

LG supports several open-source tokenization libraries (tiktoken, tokenizers, sentencepiece) and requires the tokenization parameters to be specified explicitly on every run.

More details: tokenizers.md.
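To make "who's eating the prompt" concrete, here is a sketch of how per-file shares could be computed once token counts are known (with tiktoken, for instance, a count is len(tiktoken.get_encoding("cl100k_base").encode(text))). The formulas are assumptions for illustration: promptShare is taken as the file's percentage of all counted tokens and ctxShare as its percentage of --ctx-limit; the actual fields of the v5 report may be defined differently.

```python
from __future__ import annotations


def token_shares(file_tokens: dict[str, int], ctx_limit: int) -> list[tuple[str, int, float, float]]:
    """Return (path, tokens, prompt_share_%, ctx_share_%) rows, heaviest first."""
    total = sum(file_tokens.values())
    rows = []
    for path, tokens in file_tokens.items():
        prompt_share = 100.0 * tokens / total if total else 0.0  # share of the prompt
        ctx_share = 100.0 * tokens / ctx_limit  # share of the model window
        rows.append((path, tokens, prompt_share, ctx_share))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```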


Adaptive Capabilities

All methods for creating universal templates and section configurations are described in the Adaptive Capabilities section.


CLI Options

General format:

listing-generator <command> <target> [--mode MODESET:MODE] [--tags TAG1,TAG2] [<additional_flags>]

# For render/report, tokenization parameters are required:
listing-generator render|report <target> \
  --lib <tiktoken|tokenizers|sentencepiece> \
  --encoder <encoder_name> \
  --ctx-limit <tokens>

Where <target>:

  • ctx:<name> — uses the file lg-cfg/<name>.ctx.md (subfolders supported).
  • sec:<id> — virtual context of a single section (canonical ID).
  • <name> — looked up first as ctx:<name>, then as sec:<name>.
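The target lookup order can be sketched as follows. resolve_target is a hypothetical helper and the sets of known contexts and sections are passed in explicitly; in LG they would come from scanning lg-cfg/.

```python
from __future__ import annotations


def resolve_target(target: str, contexts: set[str], sections: set[str]) -> tuple[str, str]:
    """Return ("ctx", name) or ("sec", id) for a CLI target string."""
    if target.startswith("ctx:"):
        kind, name = "ctx", target[len("ctx:"):]
    elif target.startswith("sec:"):
        kind, name = "sec", target[len("sec:"):]
    elif target in contexts:  # bare names: contexts are tried first ...
        return ("ctx", target)
    elif target in sections:  # ... then sections
        return ("sec", target)
    else:
        raise KeyError(f"no context or section named {target!r}")
    known = contexts if kind == "ctx" else sections
    if name not in known:
        raise KeyError(f"unknown {kind}: {name!r}")
    return (kind, name)
```

A bare name that exists both as a context and as a section therefore renders the context; the explicit sec: prefix selects the section.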

Commands:

  • render — output final text only (Markdown).
  • reportJSON report (format v5): statistics, files, context block.
  • list contexts|sections|tokenizer-libs|encoders — list available entities (JSON).
  • diag — environment/cache/config diagnostics (JSON); supports --rebuild-cache.

Tokenization parameters:

  • --lib — tokenization library (tiktoken, tokenizers, sentencepiece)
  • --encoder — encoder/model name (e.g.: cl100k_base, gpt2, google/gemma-2-2b)
  • --ctx-limit — context window size in tokens (e.g.: 128000, 200000)

Examples:

# Render context from template with tokenization for GPT-4
listing-generator render ctx:onboarding \
  --lib tiktoken \
  --encoder cl100k_base \
  --ctx-limit 128000 > prompt.md

# Render "section only" (no template)
listing-generator render sec:core-model-src \
  --lib tiktoken \
  --encoder cl100k_base \
  --ctx-limit 128000 > prompt.md

# Same but only changed files in working tree
listing-generator render ctx:onboarding \
  --lib tiktoken \
  --encoder cl100k_base \
  --ctx-limit 128000 \
  --mode vcs:branch-changes > prompt.md

# JSON report with token stats for GPT-4o
listing-generator report ctx:onboarding \
  --lib tiktoken \
  --encoder o200k_base \
  --ctx-limit 200000 > report.json

# Report for Gemini using sentencepiece
listing-generator report ctx:onboarding \
  --lib sentencepiece \
  --encoder google/gemma-2-2b \
  --ctx-limit 1000000 > report.json

# Render context with current task description
listing-generator render ctx:dev \
  --lib tiktoken --encoder cl100k_base --ctx-limit 128000 \
  --task "Implement result caching"

# Multi-line task via stdin
echo -e "Tasks:\n- Fix bug #123\n- Add tests" | \
  listing-generator render ctx:dev --lib tiktoken --encoder cl100k_base --ctx-limit 128000 --task -

# Task from file
listing-generator render ctx:dev \
  --lib tiktoken --encoder cl100k_base --ctx-limit 128000 \
  --task @.current-task.txt

# Diagnostics
listing-generator diag
listing-generator diag --rebuild-cache

# Lists
listing-generator list contexts
listing-generator list sections
listing-generator list tokenizer-libs
listing-generator list encoders --lib tiktoken
listing-generator list encoders --lib tokenizers
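The three --task forms shown in the examples (literal text, "-" for stdin, @path for a file) map onto a simple dispatch. This is a sketch of the behavior the examples imply, not LG's argument parser; read_task_arg is a hypothetical name, and UTF-8 file encoding is an assumption.

```python
from __future__ import annotations

import sys
from pathlib import Path
from typing import TextIO


def read_task_arg(value: str | None, stdin: TextIO | None = None) -> str | None:
    if value is None:
        return None  # no --task: ${task} renders as an empty string
    if value == "-":
        source = stdin if stdin is not None else sys.stdin
        return source.read()  # multi-line task piped via stdin
    if value.startswith("@"):
        return Path(value[1:]).read_text(encoding="utf-8")  # task from a file
    return value  # literal task text
```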

How LG Renders Documents

  • If all files are Markdown/plain text, LG simply concatenates their content.

  • Otherwise:

    • with code-fence (default): files are wrapped in fenced blocks by language, in order of occurrence; inside each block, every file starts with the marker # —— FILE: path ——, followed by its content.
    • without code-fence: linear document with marker before each file.

This makes the prompt readable for humans and convenient for agents: it's clear where each fragment comes from.
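The layout rules above can be sketched in a few lines. This is an approximation under stated assumptions: the extension-to-language map is a hypothetical subset, a new fenced block starts whenever the language changes between consecutive files, pieces are joined with blank lines, and fence escaping (files that themselves contain fences) is ignored. The file marker mirrors the one described above.

```python
from __future__ import annotations

from itertools import groupby
from pathlib import PurePosixPath

FILE_MARKER = "# —— FILE: {path} ——"
LANG_BY_EXT = {".py": "python", ".ts": "typescript", ".yaml": "yaml", ".json": "json", ".toml": "toml"}
PLAIN_EXTS = {".md", ".markdown", ".txt"}


def fence_language(path: str) -> str:
    return LANG_BY_EXT.get(PurePosixPath(path).suffix, "")


def render_listing(files: list[tuple[str, str]], code_fence: bool = True) -> str:
    """files is a list of (path, content) pairs in listing order."""
    # Markdown/plain text only: plain concatenation, no markers
    if all(PurePosixPath(path).suffix in PLAIN_EXTS for path, _ in files):
        return "\n\n".join(text for _, text in files)
    marked = [(path, FILE_MARKER.format(path=path) + "\n" + text) for path, text in files]
    if not code_fence:
        # linear document with a marker before each file
        return "\n\n".join(chunk for _, chunk in marked)
    blocks = []
    # consecutive files of the same language share one fenced block
    for lang, group in groupby(marked, key=lambda item: fence_language(item[0])):
        body = "\n\n".join(chunk for _, chunk in group)
        blocks.append(f"```{lang}\n{body}\n```")
    return "\n\n".join(blocks)
```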


Cache and Performance

LG uses file cache .lg-cache:

  • Processed cache — adapter results + their metadata.
  • Raw/Processed tokens — saved token counts (by model/mode).
  • Rendered tokens — count of final document ("with glue") and "sections-only".

Cache keys take into account the tool version, file fingerprint, adapter config, group composition, etc. Management: listing-generator diag and listing-generator diag --rebuild-cache. The cache can be disabled by setting LG_CACHE=0.
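A content-addressed key over the inputs listed above could look like the following sketch. The cache_key and cache_enabled helpers are hypothetical; the JSON-plus-SHA-256 scheme is an illustrative choice, not LG's actual key format. Only the LG_CACHE=0 switch comes from the text above.

```python
from __future__ import annotations

import hashlib
import json
import os
from typing import Mapping


def cache_key(tool_version: str, file_bytes: bytes, adapter_cfg: dict, group: list[str]) -> str:
    payload = {
        "tool": tool_version,                              # tool version
        "file": hashlib.sha256(file_bytes).hexdigest(),    # file fingerprint
        "cfg": adapter_cfg,                                # adapter config
        "group": group,                                    # group composition (order kept)
    }
    # sort_keys makes the key independent of dict insertion order
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def cache_enabled(env: Mapping[str, str] | None = None) -> bool:
    env = os.environ if env is None else env
    return env.get("LG_CACHE", "1") != "0"
```

Any change to one of the inputs produces a different key, so stale entries are simply never looked up again.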


Practical Tips for "Dense" Contexts

  • Keep sections small and thematic. Better several sections than one "everything about everything".
  • Use strict allow nodes where you need full predictability of content.
  • Use Markdown templates as the prompt "frame": a brief intro, tasks, section placeholders.
  • Changes mode is your best friend for patch iterations and code review via LLM.
  • Watch the shares (promptShare/ctxShare) in the report: they help you distribute the "holding cost" of the prompt.
  • Normalize headings (max_heading_level): it makes long contexts easier to read.
  • Don't pull secrets into the prompt. Configure block rules for artifacts, keys, secrets, and binaries.

IDE/Plugin Integration

In most cases you'll run LG through an IDE integration (VS Code, JetBrains, etc.). Nevertheless, all selection and template logic lives in the repository (lg-cfg/), so:

  • rules are easy to review and evolve (via PRs),
  • successful prompts are trivial to transfer between projects,
  • the same configuration works in the CLI and the IDE.

License

Listing Generator is licensed under the Apache License, Version 2.0.
See the LICENSE file for the full license text.

