A RAG-based cheat sheet generator for books and papers

ReadAnyBook 📚

A RAG-based cheat sheet generator that transforms books and papers into structured, 12-page LaTeX cheat sheets.

Features

Multi-format Document Support: PDF, EPUB, HTML, LaTeX, Markdown
Intelligent Chunking: Math-aware and code-aware text splitting
Hybrid Retrieval: Dense embeddings + BM25 with reciprocal rank fusion
Multi-pass Generation: Separate extraction for concepts, formulas, algorithms, and models
LaTeX Output: Professional cheat sheets compiled to PDF
Multiple LLM Backends: HuggingFace, Ollama, vLLM, OpenAI-compatible APIs
Vector Store Options: ChromaDB, Qdrant, Weaviate
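
The idea behind math-aware chunking can be illustrated with a minimal sketch. It assumes that display math (`$$…$$`, `\[…\]`) and fenced code blocks are the units that must never be cut, and that ordinary prose may be split at blank lines. This is not the package's `chunking.py`; `split_units`, `chunk` and `max_chars` are hypothetical names.

```python
import re

# Hypothetical sketch: spans that must stay whole inside a single chunk.
# Display math ($$...$$ or \[...\]) and fenced code (three backticks).
ATOMIC = re.compile(r"(\$\$.*?\$\$|\\\[.*?\\\]|`{3}.*?`{3})", re.DOTALL)
PARAGRAPH_BREAK = re.compile(r"\n\s*\n")


def _paragraphs(text: str) -> list[str]:
    """Split ordinary prose on blank lines, dropping empty pieces."""
    return [p.strip() for p in PARAGRAPH_BREAK.split(text) if p.strip()]


def split_units(text: str) -> list[str]:
    """Return prose paragraphs and atomic math/code blocks in document order."""
    units: list[str] = []
    pos = 0
    for match in ATOMIC.finditer(text):
        units.extend(_paragraphs(text[pos:match.start()]))
        units.append(match.group(0))  # never split inside math or code
        pos = match.end()
    units.extend(_paragraphs(text[pos:]))
    return units


def chunk(text: str, max_chars: int = 800) -> list[str]:
    """Greedily pack units into chunks of at most max_chars characters.

    A single unit longer than max_chars becomes its own oversized chunk
    instead of being cut in half.
    """
    chunks: list[str] = []
    current = ""
    for unit in split_units(text):
        candidate = f"{current}\n\n{unit}" if current else unit
        if not current or len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = unit
    if current:
        chunks.append(current)
    return chunks
```

The blank line inside a `$$…$$` block would normally look like a paragraph break; matching atomic spans first is what keeps such a formula in one piece.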

Quick Start

Installation

# Basic installation
pip install readanybook

# With CLI support (quotes keep shells such as zsh from expanding the brackets)
pip install "readanybook[cli]"

# With all features
pip install "readanybook[all]"

From Source

git clone https://github.com/readanybook/readanybook.git
cd readanybook
pip install -e ".[dev]"

Usage

Command Line

# Generate a cheat sheet from a PDF
read-any-book build document.pdf -o cheatsheet.pdf

# Use a specific profile
read-any-book build document.pdf --profile math_paper

# Index a document
read-any-book index document.pdf --collection my_collection

# Search indexed documents
read-any-book search "gradient descent" --collection my_collection
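
To run `build` over a whole folder of PDFs, a small Python wrapper around the CLI could look like the sketch below. It only uses the `build`, `-o` and `--profile` options shown above; combining `-o` and `--profile` in one call and the `<name>_cheatsheet.pdf` output naming are assumptions, and `build_commands`/`build_directory` are hypothetical helpers.

```python
import subprocess
from pathlib import Path


def build_commands(pdfs, output_dir, profile=None):
    """Return one `read-any-book build` argument list per input PDF."""
    commands = []
    for pdf in pdfs:
        pdf = Path(pdf)
        # Hypothetical naming scheme: <stem>_cheatsheet.pdf inside output_dir.
        out = Path(output_dir) / f"{pdf.stem}_cheatsheet.pdf"
        cmd = ["read-any-book", "build", str(pdf), "-o", str(out)]
        if profile:
            cmd += ["--profile", profile]
        commands.append(cmd)
    return commands


def build_directory(input_dir, output_dir, profile=None):
    """Build a cheat sheet for every *.pdf in input_dir, stopping at the first failure."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    pdfs = sorted(Path(input_dir).glob("*.pdf"))
    for cmd in build_commands(pdfs, output_dir, profile):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```

For example, `build_directory("papers", "sheets", profile="math_paper")` would run one build per paper. Passing argument lists (not a shell string) avoids quoting problems with spaces in file names.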

Python API

from readanybook import CheatSheetPipeline, Settings

# Initialize pipeline
settings = Settings()
pipeline = CheatSheetPipeline(settings)

# Process document
pipeline.ingest("textbook.pdf")
pipeline.index(collection_name="textbook")

# Generate cheat sheet
content = pipeline.generate_content()
cheat_sheet = pipeline.build(content, "output/cheatsheet.pdf")

print(f"Generated: {cheat_sheet.pdf_path}")

REST API

# Start the API server
uvicorn readanybook.api:app --host 0.0.0.0 --port 8000

# Upload a document
curl -X POST "http://localhost:8000/upload" \
  -F "file=@document.pdf" \
  -F "collection_name=my_docs"

# Generate cheat sheet
curl -X POST "http://localhost:8000/generate" \
  -H "Content-Type: application/json" \
  -d '{"collection_name": "my_docs", "title": "My Cheat Sheet"}'
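
The same `/generate` call can be made from Python using only the standard library. This sketch sends exactly the JSON body from the `curl` example; that the server replies with a JSON body, and the generous timeout, are assumptions, since the response format is not documented here. `build_generate_request` and `generate` are hypothetical names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # matches the uvicorn command above


def build_generate_request(collection_name: str, title: str,
                           base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /generate request with the same JSON body as the curl example."""
    payload = json.dumps({"collection_name": collection_name, "title": title}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(collection_name: str, title: str, base_url: str = BASE_URL,
             timeout: float = 600.0) -> dict:
    """Send the request; assumes (undocumented) that the server answers with JSON."""
    request = build_generate_request(collection_name, title, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

`generate("my_docs", "My Cheat Sheet")` mirrors the second `curl` call. Generation runs an LLM over many chunks, so a long timeout is a safer default than urllib's blocking behavior without one.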

Configuration

Create a config.yaml file or use environment variables:

# Embedding model
embedding:
  model_name: "BAAI/bge-base-en-v1.5"
  device: "cuda"

# Vector store
vectordb:
  store_type: "chroma"
  persist_directory: "./data/chroma"

# LLM settings
llm:
  backend: "ollama"
  model_name: "llama3:8b"

# Retrieval
retrieval:
  mode: "hybrid"
  top_k: 15
  
# LaTeX output
latex:
  columns: 2
  font_size: 10
  paper_size: "a4paper"
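
To show how the `latex` settings could translate into a document, here is a minimal sketch that maps `font_size`, `paper_size` and `columns` onto standard `article` class options and the `multicol` package, and escapes LaTeX special characters in plain-text fields such as the title. The package itself renders Jinja2 templates; this stdlib-only version and the names `latex_escape` and `render_document` are hypothetical.

```python
# Characters with special meaning in LaTeX and their text-mode replacements.
_LATEX_SPECIAL = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def latex_escape(text: str) -> str:
    """Escape plain text (e.g. an LLM-written title) for safe use in LaTeX."""
    return "".join(_LATEX_SPECIAL.get(ch, ch) for ch in text)


def render_document(title: str, body: str, columns: int = 2,
                    font_size: int = 10, paper_size: str = "a4paper") -> str:
    """Wrap already-formatted LaTeX `body` in a minimal multi-column article."""
    if font_size not in (10, 11, 12):
        # The standard article class only offers 10pt, 11pt and 12pt.
        raise ValueError(f"unsupported font_size: {font_size}")
    lines = [
        rf"\documentclass[{font_size}pt,{paper_size}]{{article}}",
        r"\usepackage{multicol}",
        r"\begin{document}",
        rf"\section*{{{latex_escape(title)}}}",
    ]
    if columns > 1:
        lines += [rf"\begin{{multicols}}{{{columns}}}", body, r"\end{multicols}"]
    else:
        lines.append(body)
    lines.append(r"\end{document}")
    return "\n".join(lines) + "\n"
```

The body is deliberately not escaped, because it is expected to contain LaTeX (formulas, lists); only plain-text fields go through `latex_escape`.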

Configuration Profiles

Use built-in profiles for different document types:

# For technical books
read-any-book build book.pdf --profile technical_book

# For math papers
read-any-book build paper.pdf --profile math_paper

# For non-technical books
read-any-book build novel.pdf --profile nontechnical_book
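
One plausible way for a profile to work is as a partial override layered on top of the default configuration. The sketch below assumes that model; `deep_merge` is a hypothetical helper, and the values in `MATH_PAPER` are invented for illustration, not the real contents of the `math_paper` profile.

```python
from copy import deepcopy

# Defaults mirroring the config.yaml example above.
DEFAULTS = {
    "embedding": {"model_name": "BAAI/bge-base-en-v1.5", "device": "cuda"},
    "vectordb": {"store_type": "chroma", "persist_directory": "./data/chroma"},
    "llm": {"backend": "ollama", "model_name": "llama3:8b"},
    "retrieval": {"mode": "hybrid", "top_k": 15},
    "latex": {"columns": 2, "font_size": 10, "paper_size": "a4paper"},
}

# Hypothetical overlay: only the keys a profile wants to change.
MATH_PAPER = {"retrieval": {"top_k": 20}, "latex": {"columns": 3}}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where override's values win, merging nested dicts key by key."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


settings = deep_merge(DEFAULTS, MATH_PAPER)
```

Merging nested sections key by key means a profile can change `latex.columns` without having to restate `font_size` and `paper_size`, and copying keeps `DEFAULTS` untouched for the next profile.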

Architecture

ReadAnyBook follows a modular pipeline architecture with clear separation between ingestion, retrieval, and generation layers.

System Design

┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│  Ingestion  │───▶│  Chunking   │───▶│  Indexing   │───▶│  Retrieval  │───▶│ Generation  │───▶│   LaTeX     │
│  (PDF/EPUB) │    │  (Math-     │    │ (Embeddings)│    │  (Hybrid    │    │  (LLM +     │    │  (Compile   │
│             │    │   aware)    │    │             │    │   Search)   │    │   RAG)      │    │   to PDF)   │
└─────────────┘    └─────────────┘    └──────┬──────┘    └──────┬──────┘    └──────┬──────┘    └─────────────┘
                                             │                  │                  │
                                             ▼                  ▼                  ▼
                                      ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
                                      │ Vector DB   │    │ BM25 Index  │    │ LLM Backend │
                                      │ (ChromaDB)  │    │             │    │  (Ollama/   │
                                      │             │    │             │    │   HF/vLLM)  │
                                      └─────────────┘    └─────────────┘    └─────────────┘
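
The sparse side of the diagram, the BM25 index, scores each chunk with the Okapi BM25 formula. The sketch below is a minimal illustration assuming lower-cased whitespace tokenization, the common defaults `k1=1.5` and `b=0.75`, and a non-negative IDF variant; the package may well use a library implementation with different tokenization, and the `BM25` class here is hypothetical.

```python
import math
from collections import Counter


class BM25:
    """Okapi BM25 over whitespace-tokenized, lower-cased chunks."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75) -> None:
        self.k1, self.b = k1, b
        self.tokens = [doc.lower().split() for doc in docs]
        self.doc_len = [len(t) for t in self.tokens]
        self.avgdl = sum(self.doc_len) / len(self.tokens)
        self.tf = [Counter(t) for t in self.tokens]
        n = len(self.tokens)
        df = Counter(term for t in self.tokens for term in set(t))
        # IDF with +1 inside the log (as in Lucene) so common terms never score negative.
        self.idf = {term: math.log(1 + (n - f + 0.5) / (f + 0.5)) for term, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tf[i], self.doc_len[i]
        total = 0.0
        for term in query.lower().split():
            f = tf.get(term, 0)
            if f == 0:
                continue
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def search(self, query: str, top_k: int = 15) -> list[int]:
        """Indices of the best-scoring chunks, highest first; zero scores dropped."""
        scored = sorted(((self.score(query, i), i) for i in range(len(self.tokens))),
                        key=lambda pair: (-pair[0], pair[1]))
        return [i for s, i in scored[:top_k] if s > 0]
```

With equal term counts, the shorter chunk wins because of the length normalization controlled by `b`.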

Key Components

Component	Description
Ingestion	Multi-format document parsing (PDF, EPUB, HTML, LaTeX, Markdown)
Chunking	Hierarchical, semantic, or fixed-size with math/code awareness
Indexing	BGE/E5 embeddings stored in ChromaDB/Qdrant/Weaviate
Retrieval	Hybrid dense+sparse search with RRF fusion and cross-encoder reranking
Generation	Multi-pass extraction: concepts, formulas, algorithms, models
Output	Jinja2 LaTeX templates compiled to a 12-page PDF
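
Reciprocal rank fusion (RRF) combines the dense and BM25 result lists using ranks alone: each chunk d scores `sum over rankings r of 1 / (k + rank_r(d))`, with k = 60 as the commonly used constant. A minimal sketch, assuming each retriever returns an ordered list of chunk IDs; the cross-encoder reranking step is not shown, and `reciprocal_rank_fusion` is a hypothetical name.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk IDs into one ranking.

    Each ID earns 1 / (k + rank) from every list it appears in (rank starts at 1);
    IDs are returned by descending total, ties broken by ID for determinism.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense = ["a", "b", "c"]   # e.g. IDs from the embedding search
sparse = ["b", "c", "d"]  # e.g. IDs from BM25
fused = reciprocal_rank_fusion([dense, sparse])
```

Here `b` and `c` rise above `a` because both retrievers found them. Since only ranks are used, cosine similarities and BM25 scores never have to be put on a common scale, which is the usual reason to choose RRF for hybrid search.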

Package Structure

readanybook/
├── core/           # Domain logic
│   ├── ingestion.py    # Document parsing
│   ├── chunking.py     # Text splitting
│   ├── indexing.py     # Embedding & indexing
│   ├── retrieval.py    # Hybrid retrieval
│   ├── models.py       # LLM clients
│   ├── prompts.py      # Prompt templates
│   └── pipeline.py     # Main orchestrator
├── generation/     # Content generation
│   ├── concepts.py     # Concept extraction
│   ├── formulas.py     # Formula extraction
│   ├── algorithms.py   # Algorithm synthesis
│   ├── models_theory.py # Model summarization
│   └── latex_builder.py # LaTeX generation
├── evaluation/     # Quality metrics
│   ├── rag_eval.py     # RAG evaluation
│   └── metrics.py      # Content metrics
├── infra/          # Infrastructure
│   ├── settings.py     # Configuration
│   ├── vectordb.py     # Vector stores
│   ├── logging.py      # Logging
│   └── tracing.py      # Observability
├── api/            # REST API
├── cli/            # Command line interface
├── templates/      # LaTeX templates
└── config/         # Default configs

Design Principles

Hexagonal Architecture: Domain services isolated from external adapters
Configuration-Driven: All behavior controlled via Pydantic settings
Pluggable Backends: LLM, vector store, and embedding model abstractions
Observability: Structured logging and tracing throughout
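
The pluggable-backend principle and the multi-pass generation listed under Features fit together naturally: generation code depends only on a small interface, and each backend (Ollama, HuggingFace, vLLM, an OpenAI-compatible API) becomes an adapter behind it. The sketch below assumes a single `complete(prompt) -> str` method and a caller-supplied `retrieve` function; `LLMBackend`, `run_passes`, `RecordingBackend` and the prompt wording are hypothetical, not the package's `models.py` or `prompts.py`.

```python
from typing import Callable, Protocol

# The four extraction passes named in the feature list.
PASSES = ("concepts", "formulas", "algorithms", "models")


class LLMBackend(Protocol):
    """Port: anything with this method can serve as the LLM adapter."""

    def complete(self, prompt: str) -> str: ...


def run_passes(llm: LLMBackend, retrieve: Callable[[str], list[str]]) -> dict[str, str]:
    """Run one retrieval + generation pass per content type."""
    results: dict[str, str] = {}
    for kind in PASSES:
        context = retrieve(kind)  # e.g. the top-k chunks for this pass
        prompt = f"Extract the key {kind} from the excerpts below.\n\n" + "\n---\n".join(context)
        results[kind] = llm.complete(prompt)
    return results


class RecordingBackend:
    """Test adapter that stores prompts instead of calling a real model."""

    def __init__(self) -> None:
        self.prompts: list[str] = []

    def complete(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return f"response {len(self.prompts)}"
```

Swapping Ollama for vLLM would then mean writing another class with a `complete` method; `run_passes` itself would not change, and tests can use an in-memory adapter like `RecordingBackend`.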

📄 Full documentation: See docs/architecture.pdf for the complete software architecture document.

Requirements

Python 3.10+
PyTorch 2.0+
LaTeX distribution (for PDF compilation)
- TeX Live, MiKTeX, or Tectonic

LaTeX Installation

# Ubuntu/Debian
sudo apt install texlive-full

# macOS
brew install --cask mactex

# Or use Tectonic (lightweight)
cargo install tectonic
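
Before building, it can help to confirm that one of the distributions above is actually on `PATH`. This is a hypothetical helper: which engine ReadAnyBook itself invokes, and with which flags, is not documented here, so the preference order and command lines below are assumptions.

```python
import shutil
import subprocess

# Candidate engines in order of preference (an assumption, not the package's logic).
ENGINES = ("tectonic", "latexmk", "pdflatex")


def find_latex_engines() -> list[str]:
    """Return the candidate engines that are available on PATH."""
    return [name for name in ENGINES if shutil.which(name) is not None]


def build_command(engine: str, tex_path: str) -> list[str]:
    """Command line that compiles tex_path to PDF with the given engine."""
    if engine == "tectonic":
        return ["tectonic", tex_path]
    if engine == "latexmk":
        return ["latexmk", "-pdf", "-interaction=nonstopmode", tex_path]
    # pdflatex: nonstopmode so errors do not wait for terminal input.
    return [engine, "-interaction=nonstopmode", tex_path]


def compile_pdf(tex_path: str) -> None:
    engines = find_latex_engines()
    if not engines:
        raise RuntimeError("No LaTeX engine found; install TeX Live, MiKTeX or Tectonic.")
    subprocess.run(build_command(engines[0], tex_path), check=True)
```

Tectonic and latexmk rerun LaTeX as often as needed on their own; a single plain `pdflatex` pass may leave cross-references unresolved.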

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black readanybook tests
isort readanybook tests

# Type check
mypy readanybook

# Lint
ruff check readanybook

Examples

See the examples directory for:

Processing academic papers
Creating ML textbook cheat sheets
Custom template usage
API integration examples

License

MIT License - see LICENSE for details.

Contributing

Contributions welcome! Please read CONTRIBUTING.md first.

Acknowledgments

Built with 🤗 Transformers, ChromaDB, and FastAPI
Inspired by the need for better study materials

Release history

Versions 0.1.0 through 0.1.23 were all released on Jan 11, 2026. This page describes version 0.1.7.

Download files

Source Distribution

readanybook-0.1.7.tar.gz (83.9 kB)

Uploaded Jan 11, 2026 Source

Built Distribution

readanybook-0.1.7-py3-none-any.whl (93.7 kB)

Uploaded Jan 11, 2026 Python 3

File details

Details for the file readanybook-0.1.7.tar.gz.

File metadata

Download URL: readanybook-0.1.7.tar.gz
Upload date: Jan 11, 2026
Size: 83.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for readanybook-0.1.7.tar.gz
Algorithm	Hash digest
SHA256	`18773f92cccd324ab16197d791359a5d9ef5e05dec62bcc3fc9768ae1eeba922`
MD5	`8ad777ab641028334a084ac4e2d4cede`
BLAKE2b-256	`5ee5b5b967e1bb493b96f05cdf65eadf981287c6ea5fc0deeef853b2ed24ef62`

File details

Details for the file readanybook-0.1.7-py3-none-any.whl.

File metadata

Download URL: readanybook-0.1.7-py3-none-any.whl
Upload date: Jan 11, 2026
Size: 93.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for readanybook-0.1.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`567c5ca13fa288cea1997cf9c8a38827b6afab073e3a438da6a7e8fbd0750031`
MD5	`f4324cf22e36c2f9d23e7e5e38e69a32`
BLAKE2b-256	`c9e68548dc27d51859f92cc6087e0142af27dd24e25e9bd6ba322aeabc24d116`
