Generates LLM context by scraping and summarizing documentation for Python libraries listed in a requirements.txt file.

LLM-Min: Generate Compact Docs for LLMs

License: MIT

Problem: Large Language Models (LLMs) work best with focused, concise context. Feeding them entire documentation websites is inefficient and often counterproductive.

Solution: llm-min-generator automatically crawls Python library documentation and uses Google Gemini to generate compact, structured summaries (llm-min.txt) optimized for LLM consumption. It also saves the full crawled text (llm-full.txt) for reference.

Stop wasting tokens! Give your LLMs the focused context they need.

Key Features

  • Automated Crawling: Finds and scrapes official Python package docs.
  • LLM-Powered Summarization: Creates concise, structured summaries in the compact PCS format via Google Gemini.
  • Flexible Input: Process packages from requirements.txt, folders, or direct input.
  • Easy Integration: Use via CLI or the Python LLMMinClient.
  • Organized Output: Saves results neatly per package (output_dir/package_name/).

Quick Start

1. Installation:

Using pip (Recommended for users):

pip install llm-min

For Development/Contribution (Using uv):

# Clone (if you haven't already)
# git clone <repository_url>
# cd llm-min-generator

# Install dependencies (using uv)
python -m venv .venv
source .venv/bin/activate # or .venv\Scripts\activate on Windows
uv pip install -r requirements.txt
uv pip install -e .

# Install browser binaries for crawling
playwright install

# Optional: Install pre-commit hooks for development
# uv pip install pre-commit
# pre-commit install

2. Configure API Key:

  • Recommended: Copy .env.example to .env and add your GEMINI_API_KEY. The application will automatically load it.
  • Alternatively: You can provide the key directly using the --gemini-api-key CLI flag or pass it as the api_key parameter when initializing LLMMinClient in Python.
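If the tool reports a missing key, you can check whether GEMINI_API_KEY is visible before running it. This is a minimal stdlib sketch, not how llm-min itself loads .env: the parser handles only simple KEY=value lines, the precedence (environment variable first, then .env) is an assumption, and read_dotenv / find_gemini_key are hypothetical names.

```python
from __future__ import annotations

import os
from pathlib import Path


def read_dotenv(path: Path) -> dict[str, str]:
    """Parse simple KEY=value lines; multiline values and `export` prefixes are not handled."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes from the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def find_gemini_key(env_file: Path = Path(".env")) -> str | None:
    """Hypothetical helper. Assumed precedence: process environment first, then the .env file."""
    return os.environ.get("GEMINI_API_KEY") or read_dotenv(env_file).get("GEMINI_API_KEY")
```

For example, print(bool(find_gemini_key())) prints True when a key is available from either source.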

3. Generate Docs (CLI Example):

Process packages from a requirements file and save to my_llm_docs:

llm-min-generator -f path/to/your/requirements.txt -o my_llm_docs

  • Use -pkg "requests\ntyper" for direct package input.
  • Use -d /path/to/project to find requirements.txt in a folder.
  • See llm-min-generator --help for more options (crawl depth, chunk size, etc.).
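If you would rather drive the CLI from a Python script (for example, in a CI job), one option is to call it through subprocess. This minimal sketch uses only the -f and -o flags shown above and assumes llm-min-generator is on your PATH after installation; build_command and run_generator are hypothetical helper names, not part of llm-min.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(requirements: Path, output_dir: Path) -> list[str]:
    """Hypothetical helper: assemble the documented invocation `-f <requirements> -o <output dir>`."""
    return ["llm-min-generator", "-f", str(requirements), "-o", str(output_dir)]


def run_generator(requirements: Path, output_dir: Path) -> int:
    """Run the CLI if it is installed and return its exit code."""
    if shutil.which("llm-min-generator") is None:
        raise RuntimeError("llm-min-generator not found on PATH; install it with `pip install llm-min`")
    # Passing a list (no shell) avoids quoting problems with paths that contain spaces
    result = subprocess.run(build_command(requirements, output_dir), check=False)
    return result.returncode
```

For example, run_generator(Path("requirements.txt"), Path("my_llm_docs")) returns the CLI's exit code, where 0 conventionally means success.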

4. Generate Docs (Python Client Example):

from llm_min.client import LLMMinClient

# Assumes GEMINI_API_KEY is in .env or environment
try:
    client = LLMMinClient()

    # Example: Compact existing text content
    long_text = "Your very long documentation text here..."
    subject = "My Custom Library"
    compacted_text = client.compact(content=long_text, subject=subject)

    print(f"--- Compacted {subject} ---")
    print(compacted_text)

    # You can also use client.process_package("package_name")
    # or client.process_requirements("path/to/requirements.txt")
    # See client documentation for details.

except (ValueError, FileNotFoundError) as e:
    print(f"Error initializing client (API Key or PCS Guide missing?): {e}")
except Exception as e:
    print(f"An error occurred: {e}")

Output

For each package, you'll get:

output_dir/
└── package_name/
    ├── llm-full.txt  # Raw crawled content
    └── llm-min.txt   # Compacted PCS content for LLMs
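Once generation finishes, the per-package layout makes it easy to gather the compact files into a single context block for a prompt. A minimal sketch, assuming the layout above; load_min_docs, build_context, and the "### <package>" section header are illustrative choices, not part of llm-min.

```python
from __future__ import annotations

from pathlib import Path


def load_min_docs(output_dir: Path, packages: list[str] | None = None) -> dict[str, str]:
    """Hypothetical helper: read output_dir/<package>/llm-min.txt for each package folder."""
    docs: dict[str, str] = {}
    for pkg_dir in sorted(p for p in output_dir.iterdir() if p.is_dir()):
        # Optionally restrict to a subset of packages
        if packages is not None and pkg_dir.name not in packages:
            continue
        min_file = pkg_dir / "llm-min.txt"
        if min_file.is_file():
            docs[pkg_dir.name] = min_file.read_text(encoding="utf-8")
    return docs


def build_context(docs: dict[str, str]) -> str:
    """Join the summaries, labelling each so the model knows which library it describes."""
    sections = [f"### {name}\n{text.strip()}" for name, text in docs.items()]
    return "\n\n".join(sections)
```

Only llm-min.txt is read here; llm-full.txt stays on disk for the occasions when you need the raw text.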

What is PCS (Packed Code Syntax)?

PCS is a highly condensed, machine-centric format designed for representing code structure and essential metadata with maximum information density. It uses single-character codes, minimal delimiters, and positional context to create a compact, single-string representation optimized for LLM context windows.

Think of it as a "minified" version of code documentation, focusing purely on the structural elements and relationships an LLM needs to understand an API or library, discarding natural language explanations. The full specification can be found in docs/pcs-guide.md.
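Since every package folder holds both the raw crawl and the PCS output, comparing the two files gives a rough idea of how much context PCS saves. This sketch assumes the output layout described above and uses character counts as a crude stand-in for tokens (actual token counts depend on the model's tokenizer); compaction_ratio and report are hypothetical names.

```python
from __future__ import annotations

from pathlib import Path


def compaction_ratio(package_dir: Path) -> float:
    """Return len(llm-min.txt) / len(llm-full.txt) in characters, a rough proxy for tokens."""
    full = (package_dir / "llm-full.txt").read_text(encoding="utf-8")
    compact = (package_dir / "llm-min.txt").read_text(encoding="utf-8")
    if not full:
        raise ValueError(f"{package_dir / 'llm-full.txt'} is empty")
    return len(compact) / len(full)


def report(output_dir: Path) -> list[tuple[str, float]]:
    """Compute the ratio for every package folder that contains both files."""
    rows: list[tuple[str, float]] = []
    for pkg_dir in sorted(p for p in output_dir.iterdir() if p.is_dir()):
        if (pkg_dir / "llm-full.txt").is_file() and (pkg_dir / "llm-min.txt").is_file():
            rows.append((pkg_dir.name, compaction_ratio(pkg_dir)))
    return rows
```

A ratio of 0.25, for instance, means llm-min.txt is a quarter the length of llm-full.txt.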

Contributing

Contributions are welcome! See CONTRIBUTING.md (if available). Areas that would benefit most include package discovery, compaction, LLM support, and tests.

License

MIT License.

Download files

Source Distribution

llm_min-0.1.3.tar.gz (39.4 kB)

Built Distribution

llm_min-0.1.3-py3-none-any.whl (29.2 kB)

File details

Details for the file llm_min-0.1.3.tar.gz.

File metadata

  • Download URL: llm_min-0.1.3.tar.gz
  • Size: 39.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for llm_min-0.1.3.tar.gz
Algorithm Hash digest
SHA256 58372af033c38807d3ee0e99931f16f3b9a7ffeb211bb403e5ba8369db4dad14
MD5 dc8a0b98d9504297550ad66f0dcd3d93
BLAKE2b-256 98ebb4d3a5eca82324285bc96bd1da89872c1998671e1da211f36e9f44525c8d
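To confirm that a downloaded file matches the digest published above, hash it locally and compare. A minimal stdlib sketch; sha256_of and verify are illustrative names.

```python
import hashlib
from pathlib import Path

# SHA256 published above for llm_min-0.1.3.tar.gz
EXPECTED_SDIST_SHA256 = "58372af033c38807d3ee0e99931f16f3b9a7ffeb211bb403e5ba8369db4dad14"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large downloads are never loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare against the published digest, ignoring the case of the hex string."""
    return sha256_of(path) == expected.lower()
```

For example, verify(Path("llm_min-0.1.3.tar.gz"), EXPECTED_SDIST_SHA256) should return True for an intact download. pip can also enforce this itself when a requirements file pins --hash=sha256:... and you install with --require-hashes.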

Provenance

The following attestation bundles were made for llm_min-0.1.3.tar.gz:

Publisher: publish.yml on marv1nnnnn/llm-min.txt

File details

Details for the file llm_min-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: llm_min-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 29.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for llm_min-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 49d5f0ee845444891b036f0aa7ce094456200194515fd8966759a90f860cbe5f
MD5 e763a589dce1ffce66215243d85490b0
BLAKE2b-256 4a888fe6e107281a8461c5868c67cd725dfa876cee7ea936c12830ffbf94ff44

Provenance

The following attestation bundles were made for llm_min-0.1.3-py3-none-any.whl:

Publisher: publish.yml on marv1nnnnn/llm-min.txt
