Generates LLM context by scraping and summarizing documentation for Python libraries listed in a requirements.txt file.

Project description

LLM-Min: Generate Compact Docs for LLMs

License: MIT

Problem: Large Language Models (LLMs) work best with focused, concise context. Feeding them entire documentation websites is inefficient and often counterproductive.

Solution: llm-min-generator automatically crawls Python library documentation and uses Google Gemini to generate compact, structured summaries (llm-min.txt) optimized for LLM consumption. It also saves the full crawled text (llm-full.txt) for reference.

Stop wasting tokens! Give your LLMs the focused context they need.

Key Features

  • Automated Crawling: Finds and scrapes official Python package docs.
  • LLM-Powered Summarization: Creates concise, structured summaries in the PCS (Packed Code Syntax) format via Google Gemini.
  • Flexible Input: Process packages from requirements.txt, folders, or direct input.
  • Easy Integration: Use via CLI or the Python LLMMinClient.
  • Organized Output: Saves results neatly per package (output_dir/package_name/).
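
To make the requirements.txt input concrete, here is a minimal sketch of how package names could be pulled out of such a file. It is an illustration only, not the parser llm-min itself uses; the function name package_names is hypothetical, and direct-URL requirement lines are not handled.

```python
import re

# Leading distribution name: letters, digits, '.', '_' and '-'.
_NAME = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)")


def package_names(requirements_text: str) -> list[str]:
    """Return the distribution names found in requirements.txt-style text."""
    names: list[str] = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options (-r, -e, ...)
            continue
        match = _NAME.match(line)
        if match and match.group(1) not in names:
            names.append(match.group(1))  # version pins and extras are ignored
    return names


if __name__ == "__main__":
    sample = "requests>=2.31\ntyper[all]==0.12.3  # CLI\n\n-r other.txt\n"
    print(package_names(sample))
```

Each name found this way corresponds to one package whose documentation would be crawled and summarized.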

Quick Start

1. Installation:

Using pip (Recommended for users):

pip install llm-min

For Development/Contribution (Using uv):

# Clone (if you haven't already)
# git clone <repository_url>
# cd llm-min-generator

# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate # or .venv\Scripts\activate on Windows
# Install dependencies and the package in editable mode (using uv)
uv pip install -r requirements.txt
uv pip install -e .

# Install browser binaries for crawling
playwright install

# Optional: Install pre-commit hooks for development
# uv pip install pre-commit
# pre-commit install

2. Configure API Key:

  • Recommended: Copy .env.example to .env and add your GEMINI_API_KEY. The application will automatically load it.
  • Alternatively: You can provide the key directly using the --gemini-api-key CLI flag or pass it as the api_key parameter when initializing LLMMinClient in Python.
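
If you prefer to resolve the key yourself before constructing the client, the precedence described above (explicit value first, then the environment) can be sketched as follows. The helper resolve_gemini_api_key is hypothetical and not part of llm-min; it reads only the process environment and does not parse .env files.

```python
import os
from typing import Mapping, Optional


def resolve_gemini_api_key(explicit: Optional[str] = None,
                           env: Mapping[str, str] = os.environ) -> str:
    """Prefer an explicitly supplied key, then fall back to GEMINI_API_KEY."""
    for candidate in (explicit, env.get("GEMINI_API_KEY")):
        if candidate and candidate.strip():
            return candidate.strip()
    raise ValueError("No Gemini API key: set GEMINI_API_KEY or pass one explicitly.")


if __name__ == "__main__":
    try:
        api_key = resolve_gemini_api_key()
    except ValueError as exc:
        print(f"Configuration problem: {exc}")
    else:
        # With llm-min installed, the key can then be passed as described above:
        #   from llm_min.client import LLMMinClient
        #   client = LLMMinClient(api_key=api_key)
        print("Gemini API key found.")
```

Failing early with a clear message avoids starting a crawl that cannot be summarized because the key is missing.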

3. Generate Docs (CLI Example):

Process packages from a requirements file and save to my_llm_docs:

llm-min-generator -f path/to/your/requirements.txt -o my_llm_docs
  • Use -pkg "requests\ntyper" for direct package input.
  • Use -d /path/to/project to find requirements.txt in a folder.
  • See llm-min-generator --help for more options (crawl depth, chunk size, etc.).
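
When the CLI is driven from a script, the same invocation can be assembled as an argument list. The sketch below uses only the -f, -d and -o flags shown above; the helper build_command is hypothetical, and treating -f and -d as mutually exclusive is an assumption, so check llm-min-generator --help before relying on it.

```python
from typing import List, Optional


def build_command(output_dir: str,
                  requirements_file: Optional[str] = None,
                  project_dir: Optional[str] = None) -> List[str]:
    """Build an llm-min-generator argument list from the flags shown above."""
    # Assumption: exactly one input source is given per run.
    if (requirements_file is None) == (project_dir is None):
        raise ValueError("Give exactly one of requirements_file or project_dir.")
    cmd = ["llm-min-generator"]
    if requirements_file is not None:
        cmd += ["-f", requirements_file]
    else:
        cmd += ["-d", project_dir]
    cmd += ["-o", output_dir]
    return cmd


if __name__ == "__main__":
    cmd = build_command("my_llm_docs", requirements_file="requirements.txt")
    print(" ".join(cmd))
    # To actually run it (requires llm-min installed and an API key configured):
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Passing a list rather than a single shell string sidesteps quoting problems with paths that contain spaces.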

4. Generate Docs (Python Client Example):

from llm_min.client import LLMMinClient

# Assumes GEMINI_API_KEY is in .env or environment
try:
    client = LLMMinClient()

    # Example: Compact existing text content
    long_text = "Your very long documentation text here..."
    subject = "My Custom Library"
    compacted_text = client.compact(content=long_text, subject=subject)

    print(f"--- Compacted {subject} ---")
    print(compacted_text)

    # You can also use client.process_package("package_name")
    # or client.process_requirements("path/to/requirements.txt")
    # See client documentation for details.

except (ValueError, FileNotFoundError) as e:
    print(f"Error initializing client (API Key or PCS Guide missing?): {e}")
except Exception as e:
    print(f"An error occurred: {e}")

Output

For each package, you'll get:

output_dir/
└── package_name/
    ├── llm-full.txt  # Raw crawled content
    └── llm-min.txt   # Compacted PCS content for LLMs
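
Given that layout, collecting the compact summaries to feed into an LLM prompt might look like the sketch below. The function load_compact_docs is hypothetical, and UTF-8 encoding of the output files is an assumption.

```python
from pathlib import Path
from typing import Dict


def load_compact_docs(output_dir: str) -> Dict[str, str]:
    """Map each package directory name to the text of its llm-min.txt."""
    docs: Dict[str, str] = {}
    for package_dir in sorted(Path(output_dir).iterdir()):
        min_file = package_dir / "llm-min.txt"
        if package_dir.is_dir() and min_file.is_file():
            # Assumption: output files are UTF-8 encoded.
            docs[package_dir.name] = min_file.read_text(encoding="utf-8")
    return docs


if __name__ == "__main__":
    import tempfile

    # Build a throwaway directory that mimics the layout shown above.
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "requests"
        pkg.mkdir()
        (pkg / "llm-min.txt").write_text("compact summary", encoding="utf-8")
        (pkg / "llm-full.txt").write_text("full crawled text", encoding="utf-8")
        for name, text in load_compact_docs(tmp).items():
            print(f"{name}: {len(text)} characters of compact context")
```

Directories without an llm-min.txt are skipped, so a partially completed run still yields whatever summaries exist.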

What is PCS (Packed Code Syntax)?

PCS is a highly condensed, machine-centric format designed for representing code structure and essential metadata with maximum information density. It uses single-character codes, minimal delimiters, and positional context to create a compact, single-string representation optimized for LLM context windows.

Think of it as a "minified" version of code documentation, focusing purely on the structural elements and relationships an LLM needs to understand an API or library, discarding natural language explanations. The full specification can be found in docs/pcs-guide.md.

Contributing

Contributions are welcome! See CONTRIBUTING.md (if available). Good areas to work on include documentation discovery, compaction, LLM support, and tests.

License

MIT License.

Download files

Source Distribution

llm_min-0.1.2.tar.gz (39.4 kB view details)

Uploaded Source

Built Distribution

llm_min-0.1.2-py3-none-any.whl (29.2 kB view details)

Uploaded Python 3

File details

Details for the file llm_min-0.1.2.tar.gz.

File metadata

  • Download URL: llm_min-0.1.2.tar.gz
  • Size: 39.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for llm_min-0.1.2.tar.gz
Algorithm    Hash digest
SHA256       cf0db436f07bb27cc5bba7707096339ecb348ad879a12ff89927a920abbddc58
MD5          e43fde6fba6279de67d881296dead38c
BLAKE2b-256  c1cccede59ddfcddde3b5331adc014e45c7b6e813aaca228bd429d0174f6345d

Provenance

The following attestation bundles were made for llm_min-0.1.2.tar.gz:

Publisher: publish.yml on marv1nnnnn/llm-min.txt

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file llm_min-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: llm_min-0.1.2-py3-none-any.whl
  • Size: 29.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for llm_min-0.1.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       fa847d473ab45d1201f0fc8a1994d80185368f941a45b72f5a9a018c1b3ff442
MD5          78a24d444745b1ed3f9c9216c28362f4
BLAKE2b-256  ba82e2bf0e15930ee66e6b26bf50626a6400d58aea1a12091643379f9a288832

Provenance

The following attestation bundles were made for llm_min-0.1.2-py3-none-any.whl:

Publisher: publish.yml on marv1nnnnn/llm-min.txt
