Generate clean, readable PDFs from raw text or LLM output
langchain-pdf
Generate clean, readable, professional PDFs from raw text or Large Language Model (LLM) output.
langchain-pdf is designed for developers who want deterministic, well-formatted documents instead of messy markdown or broken PDFs.
✨ Why langchain-pdf?
Large Language Models often generate:
- markdown artifacts (**bold**, ---, 1. lists)
- inconsistent spacing
- duplicated headings
- orphan bullets
- blank pages in PDFs
langchain-pdf fixes all of that.
It introduces a proper document pipeline:
LLM Output → Normalize → Parse → Render → PDF
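To make the Normalize step concrete, here is a minimal sketch of the kind of cleanup it implies. It is not langchain-pdf's actual normalizer: the function name `normalize_text` and the specific rules (drop horizontal rules, unwrap bold markers, skip repeated headings and empty bullets, collapse blank lines) are assumptions chosen to mirror the problems listed in this section.

```python
import re


def normalize_text(raw: str) -> str:
    """Illustrative cleanup of LLM output (hypothetical, not the library's code)."""
    cleaned = []
    seen_headings = set()
    for line in raw.splitlines():
        line = line.rstrip()
        # Drop horizontal rules such as '---' or '***'.
        if re.fullmatch(r"\s*([-*_])\1{2,}\s*", line):
            continue
        # Strip bold markers but keep the text inside them.
        line = re.sub(r"(\*\*|__)(.+?)\1", r"\2", line)
        # Skip a heading that has already appeared (case-insensitive).
        heading = re.match(r"\s*#{1,6}\s+(.*)", line)
        if heading:
            key = heading.group(1).strip().lower()
            if key in seen_headings:
                continue
            seen_headings.add(key)
        # Drop orphan bullets: a bullet marker with no content after it.
        if re.fullmatch(r"\s*[-*•]\s*", line):
            continue
        cleaned.append(line)
    # Collapse runs of blank lines into a single blank line.
    text = "\n".join(cleaned)
    return re.sub(r"\n{3,}", "\n\n", text).strip()
```

In this sketch, heading markers (`#`) are deliberately left in place so that a later parsing step can still recognize headings.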
🚀 Features
- 🧠 Robust text normalization (handles messy LLM output)
- 📚 Structured document parsing (headings, paragraphs, bullets)
- 🖨️ Professional PDF rendering
- 🛑 No blank pages or orphan content
- 🔗 LangChain integration (Gemini, OpenAI, and Anthropic supported)
- 💻 CLI support (no Python code required)
- 🧪 Windows-tested (PowerShell friendly)
- 📦 Open-source & extensible
📄 Sample Outputs
Want to see what the generated PDFs look like?
👉 Check out the sample outputs here:
docs/outputs/
📦 Installation
Clone the repository
git clone https://github.com/your-username/langchain-pdf.git
cd langchain-pdf
Create and activate a virtual environment
python -m venv venv
Windows
venv\Scripts\activate
macOS / Linux
source venv/bin/activate
Install dependencies
pip install -r requirements.txt
pip install -e .
Set ONE of the following environment variables:
- OPENAI_API_KEY (OpenAI)
- GOOGLE_API_KEY or GEMINI_API_KEY (Google Gemini)
- ANTHROPIC_API_KEY (Anthropic)
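As a rough illustration of how a script could check which key is available, the sketch below looks up the variable names from this section. The helper `detect_provider`, the provider labels, and the order of preference are assumptions for illustration; langchain-pdf may resolve providers differently.

```python
import os

# Variable names come from this section; the provider labels are illustrative.
PROVIDER_KEYS = {
    "openai": ("OPENAI_API_KEY",),
    "gemini": ("GOOGLE_API_KEY", "GEMINI_API_KEY"),
    "anthropic": ("ANTHROPIC_API_KEY",),
}


def detect_provider(env=None):
    """Return the first provider whose API key is set and non-empty, or None."""
    env = os.environ if env is None else env
    for provider, names in PROVIDER_KEYS.items():
        if any(env.get(name, "").strip() for name in names):
            return provider
    return None
```

Calling `detect_provider()` with no arguments reads the real environment; passing a dict makes the check easy to exercise without touching it.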
🔐 Environment Setup (for AI generation)
Create a .env file in the project root:
GOOGLE_API_KEY=your_gemini_api_key_here
OPENAI_API_KEY=your_openai_api_key_here
ANTHROPIC_API_KEY=your_anthropic_api_key_here
Optional LLM Providers
OpenAI:
pip install langchain-openai
Google Gemini:
pip install langchain-google-genai
Anthropic:
pip install langchain-anthropic
.env is ignored by Git and should never be committed.
🖥️ CLI Usage
1️⃣ Convert a text file to PDF
python -m langchain_pdf.cli input.txt output.pdf
Optional title:
python -m langchain_pdf.cli input.txt output.pdf --title "My Document"
2️⃣ Generate a PDF using LangChain (Gemini)
python -m langchain_pdf.cli \
--topic "Generative AI with LangChain" \
--out reports/course.pdf
This will:
- generate content using Gemini
- normalize messy output
- create a clean PDF automatically
3️⃣ Help
python -m langchain_pdf.cli --help
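The two invocation styles shown in this section (positional input and output paths, or --topic with --out) could be modeled with an argument parser along these lines. This is a hypothetical sketch, not the contents of langchain_pdf/cli.py: `build_parser` and `resolve_mode` are invented names, and only the flags that appear in the commands above are included.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the commands shown in this section."""
    parser = argparse.ArgumentParser(prog="langchain_pdf.cli")
    # File mode: positional paths, optional so that topic mode can omit them.
    parser.add_argument("input", nargs="?", help="text file to convert")
    parser.add_argument("output", nargs="?", help="PDF file to write")
    parser.add_argument("--title", help="optional document title")
    # Topic mode: generate content with an LLM, then write it to --out.
    parser.add_argument("--topic", help="topic to generate content for")
    parser.add_argument("--out", help="output path used with --topic")
    return parser


def resolve_mode(args: argparse.Namespace) -> str:
    """Classify parsed arguments as 'file' or 'topic' mode, or raise ValueError."""
    if args.topic and args.out:
        return "topic"
    if args.input and args.output:
        return "file"
    raise ValueError("provide INPUT OUTPUT, or --topic together with --out")
```

For example, `resolve_mode(build_parser().parse_args(["input.txt", "output.pdf"]))` yields `"file"` in this sketch.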
🧠 How It Works (Architecture)
┌──────────────┐
│ LLM / Text │
└──────┬───────┘
↓
┌──────────────┐
│ Normalizer │ ← removes markdown, noise, duplicates
└──────┬───────┘
↓
┌──────────────┐
│ Parser │ ← converts text → document blocks
└──────┬───────┘
↓
┌──────────────┐
│ Renderer │ ← layout-safe PDF rendering
└──────┬───────┘
↓
┌──────────────┐
│ PDF File │
└──────────────┘
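The Parser stage in the diagram turns text into document blocks (headings, paragraphs, bullets). A minimal sketch of that idea follows. The `Block` dataclass, the `parse_blocks` function, and the assumption that headings start with `#` and bullets with `- `, `* `, or `• ` are all illustrative and not taken from langchain_pdf/parser.py.

```python
from dataclasses import dataclass


@dataclass
class Block:
    kind: str  # "heading", "bullet", or "paragraph"
    text: str


def parse_blocks(text: str) -> list[Block]:
    """Split normalized text into simple document blocks (illustrative only)."""
    blocks: list[Block] = []
    paragraph: list[str] = []

    def flush() -> None:
        # Emit the buffered lines as a single paragraph block.
        if paragraph:
            blocks.append(Block("paragraph", " ".join(paragraph)))
            paragraph.clear()

    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            flush()  # a blank line ends the current paragraph
        elif stripped.startswith("#"):
            flush()
            blocks.append(Block("heading", stripped.lstrip("#").strip()))
        elif stripped[:2] in ("- ", "* ", "• "):
            flush()
            blocks.append(Block("bullet", stripped[2:].strip()))
        else:
            paragraph.append(stripped)
    flush()
    return blocks
```

A renderer can then walk the returned list in order and choose layout per block kind, which is what keeps rendering decisions separate from text cleanup.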
📁 Project Structure
docs/
├── outputs/
│ ├── course_overview_sample.pdf
│ ├── resume_sample.pdf
│ └── README.md
langchain-pdf/
│
├── langchain_pdf/ # Core library
│ ├── assets/
│ │ └── fonts/
│ │     ├── DejaVuSans.ttf
│ │     ├── DejaVuSans-Bold.ttf
│ │     └── LICENSE.txt
│ ├── __init__.py
│ ├── exporter.py
│ ├── normalizer.py
│ ├── parser.py
│ ├── renderer.py
│ ├── templates.py
│ └── cli.py
│
├── examples/ # Usage examples (not packaged)
│ ├── llm_factory.py
│ └── langchain_example.py
│
├── tests/ # Tests (optional)
│
├── README.md
├── requirements.txt
├── pyproject.toml
└── .env.example
🧪 Example Use Cases
- Generate course PDFs from LLMs
- Convert AI-generated reports into readable documents
- Create resumes, study material, or technical notes
- Build SaaS features that export PDFs
- Automate documentation pipelines
🤔 Is this made with AI?
Yes — and engineered by a human.
AI helps generate content.
langchain-pdf ensures that content is structured, readable, and professional.
The value is not generation — it’s control.
🛠️ Extending the Project
Planned / easy extensions:
- Support for local LLMs (Ollama)
- Batch PDF generation
- Themes (fonts, spacing)
- DOCX export
- Stream / stdin input
🤝 Contributing
Contributions are welcome.
If you:
- improve normalization
- add render themes
- support new LLMs
feel free to open a PR.
📜 License
MIT License — free to use, modify, and distribute.
⭐ Final Note
If you are tired of broken PDFs from AI output, langchain-pdf is built for you.
🔤 Fonts & Attribution
This project bundles the Inter font for consistent, readable PDF output.
Inter is licensed under the SIL Open Font License (OFL 1.1).
Font copyright © The Inter Project Authors.
The font license is included in:
langchain_pdf/assets/fonts/LICENSE.txt