Generates LLM context by scraping and summarizing documentation for Python libraries listed in a requirements.txt file.
LLM-Min: Generate Compact Docs for LLMs
Problem: Large Language Models (LLMs) work best with focused, concise context. Feeding them entire documentation websites is inefficient and often counterproductive.
Solution: llm-min-generator automatically crawls Python library documentation and uses Google Gemini to generate compact, structured summaries (llm-min.txt) optimized for LLM consumption. It also saves the full crawled text (llm-full.txt) for reference.
Stop wasting tokens! Give your LLMs the focused context they need.
Key Features
- Automated Crawling: Finds and scrapes official Python package docs.
- LLM-Powered Summarization: Creates concise, structured summaries in PCS (Packed Code Syntax) format via Google Gemini.
- Flexible Input: Process packages from requirements.txt, folders, or direct input.
- Easy Integration: Use via CLI or the Python LLMMinClient.
- Organized Output: Saves results neatly per package (output_dir/package_name/).
Quick Start
1. Installation:
Using pip (Recommended for users):
pip install llm-min
For Development/Contribution (Using uv):
# Clone (if you haven't already)
# git clone <repository_url>
# cd llm-min-generator
# Install dependencies (using uv)
python -m venv .venv
source .venv/bin/activate # or .venv\Scripts\activate on Windows
uv pip install -r requirements.txt
uv pip install -e .
# Install browser binaries for crawling
playwright install
# Optional: Install pre-commit hooks for development
# uv pip install pre-commit
# pre-commit install
2. Configure API Key:
- Recommended: Copy .env.example to .env and add your GEMINI_API_KEY. The application will automatically load it.
- Alternatively: You can provide the key directly using the --gemini-api-key CLI flag or pass it as the api_key parameter when initializing LLMMinClient in Python.
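The two options above can be combined in your own scripts. The following is a minimal sketch, not part of llm-min: `resolve_api_key` is a hypothetical helper, and the precedence it applies (explicit key first, then the GEMINI_API_KEY environment variable) is an assumption rather than documented behavior of the package.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    explicit_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper: pick an explicit key if given, else GEMINI_API_KEY.

    The precedence order is an assumption made for this sketch.
    """
    # Allow a custom mapping so the function can be exercised without
    # touching the real process environment.
    env = os.environ if env is None else env
    key = (explicit_key or env.get("GEMINI_API_KEY", "")).strip()
    if not key:
        raise ValueError(
            "No Gemini API key found; set GEMINI_API_KEY or pass one explicitly."
        )
    return key
```

The resolved value could then be handed to the client through the api_key parameter mentioned above, for example `LLMMinClient(api_key=resolve_api_key())`.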
3. Generate Docs (CLI Example):
Process packages from a requirements file and save to my_llm_docs:
llm-min-generator -f path/to/your/requirements.txt -o my_llm_docs
- Use -pkg "requests\ntyper" for direct package input.
- Use -d /path/to/project to find requirements.txt in a folder.
- See llm-min-generator --help for more options (crawl depth, chunk size, etc.).
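If you want to drive the CLI from a build script, something along these lines may help. It is a sketch only: `build_command` and `run_generator` are hypothetical wrapper names, and the only flags used are the -f and -o options shown above.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_command(requirements: Path, output_dir: Path) -> List[str]:
    """Assemble the argument list for the -f / -o invocation shown above."""
    return ["llm-min-generator", "-f", str(requirements), "-o", str(output_dir)]


def run_generator(requirements: Path, output_dir: Path) -> int:
    """Run the CLI and return its exit code (hypothetical wrapper)."""
    # Fail early with a clear message if the tool is not installed.
    if shutil.which("llm-min-generator") is None:
        raise FileNotFoundError("llm-min-generator is not on PATH")
    completed = subprocess.run(build_command(requirements, output_dir), check=False)
    return completed.returncode
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.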
4. Generate Docs (Python Client Example):
from llm_min.client import LLMMinClient

# Assumes GEMINI_API_KEY is in .env or environment
try:
    client = LLMMinClient()

    # Example: Compact existing text content
    long_text = "Your very long documentation text here..."
    subject = "My Custom Library"
    compacted_text = client.compact(content=long_text, subject=subject)
    print(f"--- Compacted {subject} ---")
    print(compacted_text)

    # You can also use client.process_package("package_name")
    # or client.process_requirements("path/to/requirements.txt")
    # See client documentation for details.
except (ValueError, FileNotFoundError) as e:
    print(f"Error initializing client (API Key or PCS Guide missing?): {e}")
except Exception as e:
    print(f"An error occurred: {e}")
Output
For each package, you'll get:
output_dir/
└── package_name/
    ├── llm-full.txt   # Raw crawled content
    └── llm-min.txt    # Compacted PCS content for LLMs
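Given that layout, collecting the compact files for use as LLM context could look roughly like this. `load_compact_docs` is a hypothetical helper written for illustration; it relies only on the directory structure shown above and assumes the files are UTF-8 text.

```python
from pathlib import Path
from typing import Dict


def load_compact_docs(output_dir: Path) -> Dict[str, str]:
    """Map each package name to the contents of its llm-min.txt file."""
    docs: Dict[str, str] = {}
    # Each package gets its own sub-directory under output_dir.
    for package_dir in sorted(p for p in output_dir.iterdir() if p.is_dir()):
        min_file = package_dir / "llm-min.txt"
        # Skip directories where generation did not produce a compact file.
        if min_file.is_file():
            docs[package_dir.name] = min_file.read_text(encoding="utf-8")
    return docs
```

The returned dictionary can be joined into a single prompt section or looked up per package as needed.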
What is PCS (Packed Code Syntax)?
PCS is a highly condensed, machine-centric format designed for representing code structure and essential metadata with maximum information density. It uses single-character codes, minimal delimiters, and positional context to create a compact, single-string representation optimized for LLM context windows.
Think of it as a "minified" version of code documentation, focusing purely on the structural elements and relationships an LLM needs to understand an API or library, discarding natural language explanations. The full specification can be found in docs/pcs-guide.md.
Contributing
Contributions are welcome! See CONTRIBUTING.md (if available) or focus on improving discovery, compaction, LLM support, or tests.
License
MIT License.