Your consulting portfolio, searchable and AI-ready.
Project description
folio
Your consulting portfolio, searchable and AI-ready.
What It Does
Folio converts PPTX, PPT, and PDF presentations into structured Markdown with YAML frontmatter, slide images, and optional LLM-powered analysis. Every conversion preserves three layers: verbatim text, slide images at configurable DPI, and per-slide classification with evidence grounding.
Folio tracks versions automatically -- re-converting an updated deck increments the version, detects per-slide changes, and preserves history. Open library/ as an Obsidian vault for automatic frontmatter indexing.
Install
pip install folio-love
The CLI command is folio.
For agent-friendly setup (Cursor, Claude Code), see Agentic Setup.
Or install from source:
git clone https://github.com/ohjonathan/folio.love.git
cd folio.love
pip install -e .
Anthropic support is included by default. For OpenAI or Google Gemini, install with extras:
pip install "folio-love[llm]" # from PyPI
pip install -e ".[llm]" # from source
Prerequisites
- Python 3.10+
- LibreOffice or Microsoft PowerPoint (for PPTX/PPT conversion)
- Poppler (for PDF image extraction)
# macOS
brew install --cask libreoffice
brew install poppler
# Ubuntu/Debian
sudo apt install libreoffice poppler-utils
Managed macOS (no LibreOffice)
If your machine blocks LibreOffice, Folio can use Microsoft PowerPoint as the renderer. Set pptx_renderer: powerpoint in folio.yaml, run batch jobs from Terminal.app, and keep a dedicated PowerPoint session with no unrelated presentations open. See Managed Mac workflow for the full workflow.
If neither renderer is available, export the deck to PDF manually and run folio convert deck.pdf.
Quick Start
First conversion
folio convert deck.pptx
✓ deck.pptx
24 slides → library/deck/deck.md
Version: 1 | ID: evidence_20260306_deck
With LLM analysis
export ANTHROPIC_API_KEY=sk-ant-...
folio convert deck.pptx --passes 2
Without a valid API key, analysis is skipped gracefully -- the conversion still completes.
Commands
folio convert
Convert a single deck to Folio markdown.
# Basic
folio convert deck.pptx
# With client and engagement metadata
folio convert deck.pptx --client Acme --engagement "DD Q1 2026"
# Deep analysis (two-pass, selective re-analysis of dense slides)
folio convert deck.pptx --passes 2
# Force fresh analysis, ignore cache
folio convert deck.pptx --no-cache
# Full metadata
folio convert deck.pptx \
--client Acme \
--engagement "DD Q1 2026" \
--subtype research \
--industry "retail,ecommerce" \
--tags "market-sizing,tam" \
--note "Updated risk figures"
Flags
| Flag | Description |
|---|---|
--client |
Client name (used in output path and frontmatter) |
--engagement |
Engagement identifier |
--note, -n |
Version note (e.g. "Updated per client feedback") |
--target, -t |
Override output directory |
--passes, -p |
Analysis depth: 1 = standard, 2 = deep (selective second pass on dense slides) |
--no-cache |
Force re-analysis; fresh results replace cached entries |
--subtype |
Evidence subtype: research, data_extract, external_report, benchmark |
--industry |
Industry tags, comma-separated |
--tags |
Manual tags to merge with auto-generated, comma-separated |
--llm-profile |
Override the configured LLM profile for this run |
folio batch
Batch convert all matching files in a directory.
# Convert all PPTX files in a directory
folio batch ./materials --client Acme
# Convert PDFs instead
folio batch ./pdfs --pattern "*.pdf" --client Acme
# Disable PowerPoint restart automation
folio batch ./materials --no-dedicated-session
Accepts the same flags as convert (--client, --engagement, --passes, --llm-profile, etc.). Default pattern is *.pptx. On macOS with PowerPoint, --dedicated-session (the default) enables periodic restart during long batch runs.
folio status
Show library health -- which decks are current, stale, or missing their source file. When an entity registry exists, status also reports the total entity count.
folio status
folio status Acme # scope to a client
folio status --refresh # re-check source hashes
Stale means the source file changed since the last conversion -- re-run folio convert on it. Missing means the source file can no longer be found at the original path.
folio scan
Scan configured source roots for new, stale, or missing files.
folio scan
folio scan --scope ClientA
Requires sources entries in folio.yaml (see Configuration).
folio refresh
Re-convert stale decks in the library.
folio refresh
folio refresh --scope ClientA/DD_Q1_2026
folio refresh --all # re-convert everything in scope, not just stale
folio promote
Promote a deck's curation level (L0 → L1 → L2 → L3).
folio promote <deck_id> L1
Validates required metadata per level (e.g. L1 requires client and tags). Use folio status to find deck IDs.
folio entities
Manage the entity registry -- people, departments, systems, and processes mentioned across your library.
# List all entities, grouped by type
folio entities
# Filter by type or show only unconfirmed
folio entities --type person
folio entities --unconfirmed
# JSON export
folio entities --json
# Show detail for a specific entity
folio entities show "Alice Chen"
folio entities show "Engineering" --type department
# Import from a CSV org chart
folio entities import org_chart.csv
# Confirm or reject auto-extracted entities
folio entities confirm "Jane Smith"
folio entities reject "Jnae Smith" # typo cleanup
CSV import format -- a name column is required; all others are optional:
| Column | Description |
|---|---|
name |
Person's canonical name (required) |
title |
Job title |
department |
Department name (auto-created if new) |
reports_to |
Manager name (resolved to registry key) |
aliases |
Semicolon-separated alternate names |
client |
Associated client |
The entity registry is stored as entities.json alongside registry.json in the library root. Imported entities are confirmed automatically; entities extracted during future ingest passes will be marked as needing confirmation.
Global flags: --verbose / -v (debug logging), --config / -c (path to folio.yaml)
Output Structure
library/
└── Acme/
└── dd_q1_2026/
└── market_overview/
├── market_overview.md # Full markdown with frontmatter
├── slides/
│ ├── slide-001.png
│ ├── slide-002.png
│ └── ...
├── .analysis_cache.json # LLM response cache
├── .texts_cache.json # Text extraction cache
└── version_history.json # Full version log
Example output (condensed):
---
id: acme_dd_q1_2026_evidence_20260306_market_overview
title: Market Overview
type: evidence
subtype: research
status: active
client: Acme
engagement: DD Q1 2026
version: 2
tags:
- ecommerce
- market-sizing
---
# Market Overview
**Source:** `/materials/market_overview.pptx`
**Version:** 2 | **Converted:** 2026-03-06
---
## Slide 1

### Text (Verbatim)
> Total Addressable Market: $4.2B
> Source: Industry Report 2025
### Analysis
**Slide Type:** data_heavy
**Framework:** TAM/SAM/SOM
**Key Data:** TAM $4.2B, SAM $1.8B, SOM $340M
**Evidence:**
- **TAM figure of $4.2B (high):** "Total Addressable Market: $4.2B" *(title)*
---
Configuration
Folio looks for folio.yaml by walking up from the current directory. All fields are optional.
# folio.yaml
library_root: ./library # Where converted decks are written
sources: # Optional; organize source directories
- name: materials
path: /path/to/source/decks
target_prefix: ""
llm:
profiles:
high_quality_anthropic:
provider: anthropic
model: claude-sonnet-4-20250514
api_key_env: ANTHROPIC_API_KEY
base_url_env: ANTHROPIC_BASE_URL # Optional enterprise gateway
fast_openai:
provider: openai
model: gpt-4o-mini
api_key_env: OPENAI_API_KEY
base_url_env: OPENAI_BASE_URL # Optional enterprise gateway
backup_google:
provider: google
model: gemini-2.5-pro
api_key_env: GEMINI_API_KEY
base_url_env: GEMINI_BASE_URL # Optional enterprise gateway
routing:
default:
primary: high_quality_anthropic
fallbacks: []
convert:
primary: high_quality_anthropic
fallbacks: [backup_google]
conversion:
image_dpi: 150 # Slide image resolution (px/in)
image_format: png
libreoffice_timeout: 60 # Seconds before conversion times out
default_passes: 1 # 1 = standard, 2 = deep
density_threshold: 2.0 # Pass 2 density trigger
pptx_renderer: auto # auto | libreoffice | powerpoint
With no folio.yaml, Folio uses sensible defaults: output goes to ./library, images render at 150 DPI, and analysis runs a single Anthropic-backed pass if ANTHROPIC_API_KEY is set.
| Environment Variable | Purpose |
|---|---|
ANTHROPIC_API_KEY |
Anthropic credentials (included in base install) |
OPENAI_API_KEY |
OpenAI credentials (requires folio-love[llm]) |
GEMINI_API_KEY |
Google Gemini credentials (requires folio-love[llm]) |
ANTHROPIC_BASE_URL |
Optional Anthropic-compatible gateway URL |
OPENAI_BASE_URL |
Optional OpenAI-compatible gateway URL |
GEMINI_BASE_URL |
Optional Gemini-compatible gateway URL |
Enterprise Gateways and Preflight Warnings
If you route Folio through an enterprise AI gateway, keep the gateway URL in an
environment variable and reference it from the profile with base_url_env.
If the env var is unset or blank, Folio silently falls back to the SDK default
endpoint.
Folio now runs a warning-only model preflight once per selected profile per conversion run. This checks whether the configured model appears usable before the first expensive pass. The probe is bounded and uses the same runtime guardrails as normal model calls. A warning does not block conversion; it simply surfaces blocked or unavailable models earlier.
Scanned and Image-Only PDFs
When a deck has no extractable text, Folio marks that text validation was
unavailable instead of treating the deck as if evidence validation failed.
Those decks still surface review flags, but they no longer get the old blanket
0.59 confidence cap just because the source is scanned.
Oversized PDF Page Fallback
Large architecture diagrams and poster-sized PDF pages can exceed Pillow safety limits at the requested DPI. Folio now backs off DPI per page before hitting that limit. If a page still cannot be rendered safely, conversion fails with a specific oversized-image error instead of a generic rendering failure.
OpenAI GPT-5 Compatibility
GPT-5 OpenAI chat models use a slightly different request shape from GPT-4.x
and GPT-4o. Folio handles that automatically by using
max_completion_tokens and omitting temperature for gpt-5* models while
preserving the existing request shape for non-GPT-5 models.
How It Works
Input (.pptx/.ppt/.pdf)
│
├─ Normalize ──→ Convert to PDF via LibreOffice or PowerPoint
│
├─ Images ─────→ Extract slide images, detect blank slides
│
├─ Text ───────→ Extract structured text per slide, reconcile count
│
├─ Analysis ───→ LLM classification + evidence extraction (cached)
│ Pass 2: selective re-analysis of dense slides
│
├─ Tracking ───→ Version detection, per-slide change diffing
│
└─ Assembly ───→ YAML frontmatter + Markdown output (atomic write)
Each stage is independent and testable. LLM analysis results are cached per-slide -- re-conversion only re-analyzes changed slides. Blank slides are detected via image histogram analysis and excluded from deep analysis.
Version Tracking
Re-converting an updated deck increments the version and records which slides were added, modified, or removed.
folio convert deck.pptx --note "Updated risk figures"
✓ deck.pptx
24 slides → library/deck/deck.md
Version: 2 | ID: evidence_20260306_deck
Modified: slides 3, 7, 12
Added: slides 24
Use folio status to find stale decks -- where the source file has changed since the last conversion.
Version history is recorded in both the markdown output and version_history.json:
| Version | Date | Changes | Note |
|---|---|---|---|
| v2 | 2026-03-06 | 3 modified, 1 added | Updated risk figures |
| v1 | 2026-03-01 | Initial (23 slides) | -- |
Development
python3 -m venv .venv
.venv/bin/python -m pip install --upgrade pip
.venv/bin/python -m pip install -e ".[dev]"
.venv/bin/python -m pytest tests/ -v
.venv/bin/python -m pytest --cov=folio
The test suite depends on dev-only packages such as python-pptx and reportlab, so run it from the project virtualenv after installing .[dev] rather than from an arbitrary system Python.
folio/
├── cli.py # Click CLI (convert, batch, status, scan, refresh, promote, entities)
├── config.py # FolioConfig + folio.yaml loading
├── converter.py # Pipeline orchestrator
├── entity_import.py # CSV org chart → entity registry
├── pipeline/
│ ├── normalize.py # PPTX/PPT → PDF
│ ├── images.py # PDF → slide images + blank detection
│ ├── text.py # Structured text extraction + reconciliation
│ └── analysis.py # LLM analysis + caching
├── output/
│ ├── frontmatter.py # YAML frontmatter (v2 schema)
│ └── markdown.py # Markdown assembly
└── tracking/
├── entities.py # Entity registry (people, departments, systems, processes)
├── registry.py # Deck registry + atomic JSON writes
├── sources.py # Source file tracking + staleness
└── versions.py # Version detection + change sets
Roadmap
- Entity resolution at ingest time -- automatically match names mentioned in slides to the entity registry (PR B, planned).
- Search and retrieval (
folio search) -- not yet implemented. Today, converted decks are searchable via Obsidian, grep, or any tool that reads Markdown + YAML frontmatter.
License
Apache 2.0
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file folio_love-0.4.0.tar.gz.
File metadata
- Download URL: folio_love-0.4.0.tar.gz
- Upload date:
- Size: 225.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
79d8985cab7cbf84d30da49f4cfa25d64931385599b2aec4ff94c5e96d55109a
|
|
| MD5 |
d2c557224341ec81ff6edeacc3827ebf
|
|
| BLAKE2b-256 |
e5cdb82c0b669a139161ec7576bd9a74ead990c6836e033b89754a1b50c1e0fc
|
Provenance
The following attestation bundles were made for folio_love-0.4.0.tar.gz:
Publisher:
publish.yml on ohjonathan/folio.love
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
folio_love-0.4.0.tar.gz -
Subject digest:
79d8985cab7cbf84d30da49f4cfa25d64931385599b2aec4ff94c5e96d55109a - Sigstore transparency entry: 1202672558
- Sigstore integration time:
-
Permalink:
ohjonathan/folio.love@d68fd8d055d7cf3ad492021c28fe067780b3348b -
Branch / Tag:
refs/tags/v0.4.0 - Owner: https://github.com/ohjonathan
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@d68fd8d055d7cf3ad492021c28fe067780b3348b -
Trigger Event:
push
-
Statement type:
File details
Details for the file folio_love-0.4.0-py3-none-any.whl.
File metadata
- Download URL: folio_love-0.4.0-py3-none-any.whl
- Upload date:
- Size: 241.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
94361773211c2e358f78e84c59ced480dafd1ae31b30f3e2e0bd011e8685ad6d
|
|
| MD5 |
ccfcc9094986c36f7700e986a803b116
|
|
| BLAKE2b-256 |
15a7705505ac47da475d2e3e8e924316563d00828dd6bac1d37512a004b3b179
|
Provenance
The following attestation bundles were made for folio_love-0.4.0-py3-none-any.whl:
Publisher:
publish.yml on ohjonathan/folio.love
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
folio_love-0.4.0-py3-none-any.whl -
Subject digest:
94361773211c2e358f78e84c59ced480dafd1ae31b30f3e2e0bd011e8685ad6d - Sigstore transparency entry: 1202672565
- Sigstore integration time:
-
Permalink:
ohjonathan/folio.love@d68fd8d055d7cf3ad492021c28fe067780b3348b -
Branch / Tag:
refs/tags/v0.4.0 - Owner: https://github.com/ohjonathan
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@d68fd8d055d7cf3ad492021c28fe067780b3348b -
Trigger Event:
push
-
Statement type: