Foliograph
Pre-compile office documents into compact knowledge graphs for LLM sessions.
Inspired by Graphify for code, Foliograph does the same for office documents: .docx, .pdf, .pptx, .md, and .txt.
Instead of loading entire documents into every session, you build the graph once and navigate by index. Token costs drop by 60-90% on document-heavy projects.
The skill generates the graph. You keep the graph. The only thing you install is SKILL.md.
Get started in 3 steps (claude.ai)
No terminal. No installation. Works entirely in your browser.
Step 1: Download SKILL.md
Open SKILL.md in this repository, click the Raw button, and save the file.
Step 2: Add it to your Claude Project
- Go to claude.ai and open or create a Project
- Click the project name at the top of the left sidebar
- Click Add content (or the + icon next to Files)
- Upload SKILL.md

That is it. The skill is now active for every conversation in this Project.
Step 3: Use it
Upload any .docx, .pdf, .pptx, .md, or .txt file into a conversation and say:
foliograph this
You will get FOLIO_TIPS.md with your document map, key concepts, token savings, and ready-made commands. Ask for a visual dashboard with:
Show me an executive dashboard
What you need:
- A claude.ai account (Free, Pro, or Team)
- A Project (available on all plans)
- The SKILL.md file from this repo
Demo
See Foliograph in action: Watch the Demo Video
The Problem
Every new LLM session on a large document project starts blind. You paste the whole chapter, the whole spec, the whole report, because you don't know what the model will need. By message three you've burned most of your context window on content the model never touched.
Foliograph fixes this structurally:
Without Foliograph:
Session start -> paste Chapter 4 (8,000 tokens) -> ask one question -> done
Next session -> paste Chapter 4 again (8,000 tokens) -> ...
With Foliograph:
Session start -> load FOLIO_GRAPH.md (~400 tokens) -> "load Chapter 4 § The Swarm Model"
-> fetch only that section (~600 tokens) -> done
The graph is built once. Every subsequent session pays only the index cost.
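As a rough check on the comparison above, the per-session saving can be worked out directly. This is a minimal sketch using only the illustrative token counts from the diagram; the function name `session_savings` is made up for this example, and real counts depend on the document and the tokenizer.

```python
# Illustrative numbers taken from the comparison above; real counts vary.
FULL_CHAPTER_TOKENS = 8_000  # pasting Chapter 4 in full
GRAPH_TOKENS = 400           # loading FOLIO_GRAPH.md
SECTION_TOKENS = 600         # fetching a single section on demand


def session_savings(full: int, graph: int, section: int) -> float:
    """Return the fraction of tokens saved in one session (hypothetical helper)."""
    return 1 - (graph + section) / full


savings = session_savings(FULL_CHAPTER_TOKENS, GRAPH_TOKENS, SECTION_TOKENS)
print(f"{savings:.1%} fewer tokens per session")  # prints "87.5% fewer tokens per session"
```

With these numbers the session costs 1,000 tokens instead of 8,000, which sits inside the 60-90% range quoted earlier.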
Quickstart (Python CLI)
pip install foliograph
foliograph build my_project/ --name "My Project"
This produces three files in your working directory:
| File | Purpose |
|---|---|
| FOLIO_GRAPH.md | Structural skeleton of every document: headings, summaries, word counts, figures, tables |
| FOLIO_INDEX.md | Concept-to-location index (168+ entries for a typical book) |
| FOLIO_SESSION.md | Copy-paste session starter prompt for any LLM |
Installation
# Core (no heavy dependencies)
pip install foliograph
# With Python library support for each format
pip install "foliograph[docx]"
pip install "foliograph[pdf]"
pip install "foliograph[pptx]"
pip install "foliograph[all]"
CLI usage
# Single file
foliograph build report.docx --name "Q3 Report"
# Multiple files
foliograph build chapter1.docx chapter2.docx appendix.pdf --name "My Book"
# Entire directory (recursive)
foliograph build ./manuscript/ --output ./graph/ --name "My Book"
# Check for drift against existing graph
foliograph check --graph FOLIO_GRAPH.md
# Fetch a specific section to stdout
foliograph fetch "chapter4.docx § The Swarm Model"
# Token savings stats
foliograph stats FOLIO_GRAPH.md
# Generate HTML savings dashboard
foliograph stats-html FOLIO_GRAPH.md
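If you want to drive the CLI from a script, one option is to assemble the argument list in Python and hand it to `subprocess`. The sketch below is not part of Foliograph: `build_command` is a hypothetical helper, and it uses only the `build` subcommand and the `--name` and `--output` flags shown above.

```python
import shlex
from typing import Optional, Sequence


def build_command(
    sources: Sequence[str], name: str, output: Optional[str] = None
) -> list[str]:
    """Assemble the argument list for `foliograph build` (hypothetical helper)."""
    if not sources:
        raise ValueError("at least one source file or directory is required")
    cmd = ["foliograph", "build", *sources]
    if output is not None:
        cmd += ["--output", output]
    cmd += ["--name", name]
    return cmd


cmd = build_command(["./manuscript/"], name="My Book", output="./graph/")
print(shlex.join(cmd))  # shell-quoted form, handy for logging

# To actually run it (requires foliograph to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell-quoting problems with project names that contain spaces.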
Python API
from foliograph.builder import build
from foliograph.extractor import extract
# Build graph from a list of files or directories
outputs = build(
    sources=["chapter1.docx", "appendix.pdf", "./slides/"],
    output_dir="./graph/",
    project_name="My Project",
)
# Extract a single document
rec = extract("report.docx")
print(rec.title)
print(rec.total_words)
for section in rec.sections:
    print(f" {' ' * section.level}{section.title} ({section.word_count}w)")
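To show what can be done with the section attributes used above (`title`, `level`, `word_count`) without installing anything, here is a small sketch built on a stand-in class. `Section`, `outline`, and `largest` are hypothetical names for this example, not Foliograph's own types, and the heading levels in the sample data are assumed.

```python
from dataclasses import dataclass


@dataclass
class Section:
    """Hypothetical stand-in exposing the attributes used in the loop above."""
    title: str
    level: int
    word_count: int


def outline(sections: list[Section]) -> list[str]:
    """One line per section, indented two spaces per heading level."""
    return [f"{'  ' * s.level}{s.title} ({s.word_count}w)" for s in sections]


def largest(sections: list[Section], n: int = 1) -> list[Section]:
    """The n longest sections, to spot what dominates a document."""
    return sorted(sections, key=lambda s: s.word_count, reverse=True)[:n]


# Sample data: titles and word counts come from the output example in this
# README; the level values are assumed for illustration.
sections = [
    Section("Why Sequential Escalation Fails at Scale", 2, 187),
    Section("How AI Assembles the Swarm", 2, 312),
]

print("\n".join(outline(sections)))
print("Largest:", largest(sections)[0].title)
```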
How to use the graph in a session
- Start every session by pasting the content of FOLIO_SESSION.md
- Ask questions by concept: "What does the book say about Channel Siloing?"
- Load sections on demand: "Load escalation_intelligence.md § The Swarm Model"
- Never reload a section you've already discussed in the session
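Section references in this README follow a `document § heading` pattern. If you want to handle such references in your own tooling, a minimal parser might look like the sketch below. `SectionRef` and `parse_ref` are hypothetical names, and this is not necessarily how Foliograph parses references internally.

```python
from typing import NamedTuple, Optional


class SectionRef(NamedTuple):
    document: str
    heading: Optional[str]  # None when the reference names a whole document


def parse_ref(ref: str) -> SectionRef:
    """Split 'document § heading' into its two parts."""
    document, _, heading = ref.partition("§")
    document = document.strip()
    if not document:
        raise ValueError(f"missing document name in {ref!r}")
    return SectionRef(document, heading.strip() or None)


ref = parse_ref("escalation_intelligence.md § The Swarm Model")
print(ref.document)
print(ref.heading)
```

Keeping a set of the `SectionRef` values already loaded in a session is one simple way to follow the "never reload" rule.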
Supported formats
| Format | Extension | Extraction method |
|---|---|---|
| Word Document | .docx | extract-text / python-docx |
| PDF | .pdf | pdftotext / pdfminer.six |
| PowerPoint | .pptx | extract-text / python-pptx |
| Markdown | .md | Native parser |
| Plain Text | .txt | Native parser |
Output format
FOLIO_GRAPH.md (structure map)
### `chapter4.docx` [DOCX]
**Title:** The Swarm Model
**Words:** 2,847
**Structure:**
- **The Swarm Model**
> Replacing the Hierarchy with Parallel Expert Engagement.
- **Why Sequential Escalation Fails at Scale** (187w)
> The sequential model has a structural bottleneck at every tier boundary.
- **How AI Assembles the Swarm** (312w)
> Swarm assembly uses four criteria evaluated simultaneously.
**Key Terms:** Algorithmic Friction, Agent Churn, Escalation Debt, Feedback Loop
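Because the structure map is plain markdown, section sizes can be read back with a few lines of code, for example to decide which sections are cheap enough to load. The sketch below assumes the `- **Heading** (187w)` line format shown in the excerpt above; the regex and the function name `section_word_counts` are illustrative, not part of Foliograph.

```python
import re

# Matches lines such as "- **How AI Assembles the Swarm** (312w)".
# The word count is optional because top-level headings may omit it.
SECTION_LINE = re.compile(
    r"^\s*- \*\*(?P<title>.+?)\*\*(?: \((?P<words>[\d,]+)w\))?\s*$"
)


def section_word_counts(graph_text: str) -> dict[str, int]:
    """Map each section title that carries a word count to that count."""
    counts: dict[str, int] = {}
    for line in graph_text.splitlines():
        match = SECTION_LINE.match(line)
        if match and match.group("words"):
            counts[match.group("title")] = int(match.group("words").replace(",", ""))
    return counts


sample = """\
- **The Swarm Model**
  > Replacing the Hierarchy with Parallel Expert Engagement.
  - **Why Sequential Escalation Fails at Scale** (187w)
  - **How AI Assembles the Swarm** (312w)
"""

print(section_word_counts(sample))
```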
FOLIO_INDEX.md (concept index)
### S
- **Sentiment Drift** -> `chapter2.docx § Signal 1: Sentiment Drift`
- **Swarm Model** -> `chapter4.docx § The Swarm Model`
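The index entries are regular enough to load into a dictionary for scripted lookups. This sketch assumes one location per concept and the exact `- **Concept** -> `location`` line format shown above; `load_index` is a hypothetical helper, not a Foliograph function.

```python
import re

# Matches lines such as: - **Swarm Model** -> `chapter4.docx § The Swarm Model`
ENTRY = re.compile(r"^- \*\*(?P<concept>.+?)\*\* -> `(?P<location>[^`]+)`$")


def load_index(index_text: str) -> dict[str, str]:
    """Map lower-cased concept names to their 'document § heading' location."""
    index: dict[str, str] = {}
    for line in index_text.splitlines():
        match = ENTRY.match(line.strip())
        if match:
            index[match.group("concept").lower()] = match.group("location")
    return index


sample_index = """\
### S
- **Sentiment Drift** -> `chapter2.docx § Signal 1: Sentiment Drift`
- **Swarm Model** -> `chapter4.docx § The Swarm Model`
"""

index = load_index(sample_index)
print(index["swarm model"])
```

Lower-casing the keys makes lookups forgiving about capitalization; letter headings such as `### S` are skipped because they do not match the entry pattern.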
Real-world example
The examples/sample/ directory contains a worked example showing Foliograph output on a plain markdown document. Open FOLIO_GRAPH.md and FOLIO_INDEX.md to see the structure.
Architecture
foliograph/
├── extractor.py # Per-format extraction -> DocumentRecord
├── builder.py # DocumentRecord[] -> FOLIO_GRAPH.md + FOLIO_INDEX.md
├── relationships.py # Cross-document relationship mapping
├── drift.py # Graph drift detection
├── stats_html.py # Token savings HTML dashboard
└── cli.py # foliograph build / check / fetch / stats
Contributing
Contributions welcome. The most valuable additions are:
- Better named-entity extraction
- .xlsx support (sheet names, column headers, key cell ranges)
- Google Docs / Notion export support
- foliograph update command for incremental rebuilds
Open an issue before starting a large feature. Some of these are already in progress.
git clone https://github.com/prasad-m-k/foliograph
cd foliograph
pip install -e ".[dev]"
pytest tests/
License
MIT. See LICENSE.
Author
Prasad MK. Research: ssrn.com/author=10270516
Acknowledgements
Foliograph is directly inspired by Graphify by Safi Shamsi, which demonstrated the same approach for codebases. The core insight (pay the indexing cost once, then query from the graph every session) belongs to that project. Foliograph extends it to office documents and to claude.ai chat environments where no terminal or IDE is available.