A smart PDF splitter that uses AI to extract chapters.
Folix ✂️
A smart, AI-powered PDF splitter.
Folix is a CLI tool that splits large PDF textbooks and documents into clean, individual chapter files. Unlike standard splitters that blindly cut pages, Folix uses Mistral AI to parse the Table of Contents, then automatically calculates page offsets and handles complex layouts (like double-column indices) with ease.
🚀 Features
- 📚 Smart Chapter Extraction: Automatically detects chapters using native PDF bookmarks (ToC).
- 🤖 AI-Powered Fallback: If bookmarks are missing, Folix reads the visual Table of Contents page and uses Mistral AI to identify chapters.
- 🧠 Intelligent Offset Calculation: Automatically aligns printed page numbers with the physical PDF structure.
- 👁️ Physical Layout Analysis: Correctly parses multi-column Tables of Contents that confuse standard PDF tools.
- 🔍 Interactive Inspection: Visualizes the document structure so you can choose exactly which hierarchy level (Part, Chapter, Section) to extract.
- 🛠️ Zero-Config CLI: Simple commands for extracting, merging, and inspecting PDFs.
📦 Installation
Option A: Install via PyPI (Recommended)
pip install folix
Option B: Install from Source
git clone https://github.com/yourusername/folix.git
cd folix
pip install .
🔑 Setup (AI Features)
Folix works out‑of‑the‑box for PDFs that include standard bookmarks. For scanned books or files without metadata, you’ll need a free Mistral AI API key to enable automatic chapter detection.
1. Get an API Key
Sign up at https://console.mistral.ai (a free tier is available).
2. Set the Environment Variable
Mac / Linux
export MISTRAL_API_KEY="your_actual_key_here"
Windows (PowerShell)
$env:MISTRAL_API_KEY="your_actual_key_here"
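Environment variables only reach processes started after they are set, so it can be worth confirming that Python actually sees the key (for example, after opening a new terminal). The snippet below is a minimal, hypothetical helper for that check; it is not part of Folix and only reads the MISTRAL_API_KEY variable named above.

```python
import os
from typing import Mapping, Optional


def mistral_key_configured(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if MISTRAL_API_KEY is set to a non-blank value.

    `env` defaults to the current process environment; a plain dict can be
    passed instead, which makes the check easy to test.
    """
    env = os.environ if env is None else env
    return bool(env.get("MISTRAL_API_KEY", "").strip())


if __name__ == "__main__":
    if mistral_key_configured():
        print("MISTRAL_API_KEY is set: the AI fallback can be used.")
    else:
        print("MISTRAL_API_KEY is not set: only bookmark-based extraction is available.")
```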
📖 Usage
1. Extract Chapters
The primary command. Folix first attempts bookmark‑based extraction; if none are found, it automatically falls back to AI detection.
folix extract <file_name>
Options:
- --level 1 → Extract top-level items (e.g. Parts)
- --level 2 → Extract chapters
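To make the effect of --level more concrete, here is a minimal sketch of how bookmark entries at one depth could be turned into page ranges. It assumes the bookmarks have already been read into hypothetical (level, title, start_page) tuples with 1-based physical page numbers, and that an item ends just before the next entry at the same or a shallower level. This illustrates the idea only; Folix's actual logic may differ.

```python
from typing import List, Tuple

# Hypothetical bookmark shape: (level, title, start_page), pages 1-based.
Bookmark = Tuple[int, str, int]


def chapter_ranges(bookmarks: List[Bookmark], level: int,
                   total_pages: int) -> List[Tuple[str, int, int]]:
    """Return (title, first_page, last_page) for every bookmark at `level`.

    An item ends on the page before the next bookmark at the same or a
    shallower (numerically smaller) level, or on the last page of the PDF.
    """
    ranges = []
    for i, (lvl, title, start) in enumerate(bookmarks):
        if lvl != level:
            continue
        end = total_pages
        for next_lvl, _, next_start in bookmarks[i + 1:]:
            if next_lvl <= level:
                # Guard against two items starting on the same page.
                end = max(start, next_start - 1)
                break
        ranges.append((title, start, end))
    return ranges


if __name__ == "__main__":
    toc = [
        (1, "Part I", 1),
        (2, "1. Introduction", 2),
        (2, "2. The Basics", 10),
        (1, "Part II", 20),
        (2, "3. Advanced Topics", 21),
    ]
    print(chapter_ranges(toc, level=2, total_pages=30))
    # [('1. Introduction', 2, 9), ('2. The Basics', 10, 19), ('3. Advanced Topics', 21, 30)]
```

With level=1 the same data yields one range per Part, which mirrors the Parts versus chapters choice described in the options.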
2. Interactive Mode
If you’re unsure how the document is structured, run extraction normally and Folix will guide you.
folix extract <file_name>
Example Output:
📘 Analyzing structure of: complex_book.pdf
--------------------------------------------------------------------------------
Lvl | Count | Samples (First 3 items)
--------------------------------------------------------------------------------
1 | 5 | Part I, Part II, Part III...
2 | 32 | 1. Introduction, 2. The Basics, 3. Advanced Topics...
--------------------------------------------------------------------------------
Select a Level to extract (or 'q' to quit):
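The summary table above groups outline entries by depth and shows a count plus the first few titles at each level. A minimal sketch of that grouping follows, assuming the outline has been flattened into hypothetical (level, title) pairs; the exact column layout Folix prints is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def summarize_levels(entries: List[Tuple[int, str]],
                     samples: int = 3) -> Dict[int, Tuple[int, List[str]]]:
    """Map each level to (number of entries, first `samples` titles)."""
    counts: Dict[int, int] = defaultdict(int)
    firsts: Dict[int, List[str]] = defaultdict(list)
    for level, title in entries:
        counts[level] += 1
        if len(firsts[level]) < samples:
            firsts[level].append(title)
    return {lvl: (counts[lvl], firsts[lvl]) for lvl in sorted(counts)}


def format_summary(summary: Dict[int, Tuple[int, List[str]]],
                   samples: int = 3) -> List[str]:
    """Render one line per level; '...' marks levels with more titles."""
    lines = [f"Lvl | Count | Samples (First {samples} items)"]
    for lvl, (count, titles) in summary.items():
        suffix = "..." if count > len(titles) else ""
        lines.append(f"{lvl} | {count} | {', '.join(titles)}{suffix}")
    return lines


if __name__ == "__main__":
    outline = [(1, "Part I"), (2, "1. Introduction"), (2, "2. The Basics"),
               (1, "Part II"), (2, "3. Advanced Topics"), (2, "4. Case Studies")]
    print("\n".join(format_summary(summarize_levels(outline))))
```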
3. Merge PDFs
Combine multiple PDFs into a single file.
folix merge <pdf_names> --output <output_file_name>
4. Manual Split
Split a page range manually.
folix split input.pdf --start <start_page> --end <end_page> --output <output_file_name>
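Whether --start and --end are 1-based and inclusive is not stated here. Assuming they are (as most PDF viewers number pages), a range has to be converted to the 0-based indices that Python PDF libraries typically use. The hypothetical helper below sketches that conversion with basic validation.

```python
def page_indices(start: int, end: int, total_pages: int) -> range:
    """Convert a 1-based, inclusive page range to 0-based page indices.

    Raises ValueError if the range is empty or outside the document.
    """
    if not 1 <= start <= end <= total_pages:
        raise ValueError(
            f"invalid range {start}-{end} for a {total_pages}-page PDF")
    return range(start - 1, end)


if __name__ == "__main__":
    print(list(page_indices(3, 5, total_pages=10)))  # [2, 3, 4]
```

With pypdf, for example, each index could then be copied with writer.add_page(reader.pages[i]).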
🛠️ How It Works
Folix uses a three‑stage fallback system to ensure accurate chapter extraction:
- Metadata Scan: Detects native PDF bookmarks (Table of Contents).
- AI Analysis: If metadata is missing, Folix locates the visual Contents page, cleans the extracted text to reduce token usage, and sends it to Mistral AI for chapter identification.
- Visual Anchor & Offset Alignment:
  - The AI may report: "Chapter 1 starts on page 1" (the printed page number).
  - Folix scans the PDF to find where "Chapter 1" physically appears (e.g. page 18).
  - The difference between the two (here 18 - 1 = 17) is applied as a global offset to all chapters, ensuring precise cuts.
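The anchor-and-offset step can be illustrated with a short sketch. It assumes the text of each page has already been extracted into a list of strings, and it uses a simple case-insensitive substring search as the anchor test. Both function names are hypothetical; Folix's real matching is likely more robust.

```python
from typing import List, Optional, Tuple


def find_anchor_page(page_texts: List[str], heading: str,
                     skip_pages: int = 0) -> Optional[int]:
    """Return the 1-based physical page where `heading` first appears.

    `skip_pages` skips front matter such as the Contents page, which also
    contains the heading text. Note that a plain substring test would also
    match "Chapter 10", so the first hit after the front matter is used.
    """
    needle = heading.casefold()
    for index in range(skip_pages, len(page_texts)):
        if needle in page_texts[index].casefold():
            return index + 1
    return None


def apply_offset(chapters: List[Tuple[str, int]], printed_anchor: int,
                 physical_anchor: int) -> List[Tuple[str, int]]:
    """Shift every printed start page by one global offset."""
    offset = physical_anchor - printed_anchor
    return [(title, printed + offset) for title, printed in chapters]


if __name__ == "__main__":
    pages = [""] * 40
    pages[2] = "Contents\nChapter 1 Introduction .... 1"
    pages[17] = "CHAPTER 1\nIntroduction"
    physical = find_anchor_page(pages, "Chapter 1", skip_pages=3)
    print(physical)  # 18
    print(apply_offset([("Chapter 1", 1), ("Chapter 2", 15)], 1, physical))
    # [('Chapter 1', 18), ('Chapter 2', 32)]
```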
🤝 Contributing
Contributions are welcome!
1. Fork the repository
2. Create your feature branch:
   git checkout -b feature/amazing-feature
3. Commit your changes:
   git commit -m "Add some amazing feature"
4. Push to the branch:
   git push origin feature/amazing-feature
5. Open a Pull Request
📄 License
Distributed under the MIT License. See LICENSE for details.
Download files
File details
Details for the file folix-1.0.1.tar.gz.
File metadata
- Download URL: folix-1.0.1.tar.gz
- Upload date:
- Size: 9.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2d0bae88b2df5e98db9fe44c4a821ae24b43e1b3f4a5d3984baf1b5c777c9f29 |
| MD5 | afb9d77c0885d51ca5c3d76faa537c90 |
| BLAKE2b-256 | 4d227529d485509cc23e0836a96f69aad6ceb0eb2ed9e5d21fb7c87471cd2fba |
File details
Details for the file folix-1.0.1-py3-none-any.whl.
File metadata
- Download URL: folix-1.0.1-py3-none-any.whl
- Upload date:
- Size: 9.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1b3eb262988290b325dd6b25865860de7bec4fd6bdcf115e8ae131ef6659f569 |
| MD5 | 078b7b9ead20751adabe8974a6bcbecb |
| BLAKE2b-256 | 6494c66e2bc8535544e2539b795f9c73db1ee89c206eef4c9e7cf0e51d3ccd48 |