folio
Your consulting portfolio, searchable and AI-ready.
What It Does
Turn consulting decks into structured, searchable markdown -- with version tracking and optional AI analysis.
Folio converts PPTX, PPT, and PDF presentations into Markdown with YAML frontmatter, slide images, and optional LLM-powered analysis. Every conversion preserves three layers: exact verbatim text, slide images at configurable DPI, and per-slide analysis with evidence grounding.
Folio tracks versions automatically -- re-converting an updated deck increments the version, detects per-slide changes, and preserves history. Open library/ as an Obsidian vault and frontmatter is indexed automatically.
Quick Start
Prerequisites
- Python 3.10+
- LibreOffice or Microsoft PowerPoint (for PPTX/PPT conversion)
- Poppler (for PDF image extraction)
# macOS
brew install --cask libreoffice
brew install poppler
# Ubuntu/Debian
sudo apt install libreoffice poppler-utils
If you're on a managed macOS laptop that blocks LibreOffice, Folio can use
Microsoft PowerPoint as the PPTX/PPT renderer. The current PowerPoint path opens
decks via Launch Services (open -a "Microsoft PowerPoint" ...) and then exports
to PDF. In batch mode, Folio can also restart PowerPoint periodically during
long PPTX runs when --dedicated-session is enabled (the default).
For managed-mac usage:
- Run batch jobs from Terminal.app
- Use a dedicated PowerPoint session with no unrelated presentations open
- See docs/guides/managed_mac_workflow.md for the full workflow and PDF fallback guidance
You can force a specific renderer with pptx_renderer: powerpoint in
folio.yaml. If neither renderer is available, export the deck to PDF in
PowerPoint and run folio convert deck.pdf.
Install
pip install folio-love
The installed CLI command remains folio.
Or install from source:
git clone https://github.com/ohjonathan/folio.love.git
cd folio.love
pip install -e .
Anthropic support is included in the base install. If you want to use OpenAI or Google Gemini, install the optional provider SDKs too:
pip install "folio-love[llm]"
From source:
pip install -e ".[llm]"
First conversion
folio convert deck.pptx
✓ deck.pptx
24 slides → library/deck/deck.md
Version: 1 | ID: evidence_20260306_deck
Enable LLM analysis
export ANTHROPIC_API_KEY=sk-ant-...
folio convert deck.pptx --passes 2
Folio now supports Anthropic, OpenAI, and Google Gemini for slide analysis.
Configure named profiles in folio.yaml, then either use the default convert
route or override it per run with --llm-profile.
llm:
  profiles:
    high_quality_anthropic:
      provider: anthropic
      model: claude-sonnet-4-20250514
      api_key_env: ANTHROPIC_API_KEY
    fast_openai:
      provider: openai
      model: gpt-4o-mini
      api_key_env: OPENAI_API_KEY
    backup_google:
      provider: google
      model: gemini-2.5-pro
      api_key_env: GEMINI_API_KEY
  routing:
    default:
      primary: high_quality_anthropic
      fallbacks: []
    convert:
      primary: high_quality_anthropic
      fallbacks: [backup_google]
# Uses llm.routing.convert
folio convert deck.pptx --passes 2
# Force a specific profile for this run (disables route fallbacks)
folio convert deck.pptx --llm-profile fast_openai
Without a valid provider SDK or API key, analysis is skipped gracefully. The tool still completes conversion and writes provider-aware pending-analysis messages into the markdown output.
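The routing rules above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour (route primary first, then fallbacks; an explicit profile disables fallbacks), not Folio's actual code. The helper name resolve_profiles and the choice to fall back to the default route when a route is not configured are assumptions.

```python
from typing import Optional

# Mirrors the llm section of the folio.yaml example above (model names omitted).
LLM_CONFIG = {
    "profiles": {
        "high_quality_anthropic": {"provider": "anthropic", "api_key_env": "ANTHROPIC_API_KEY"},
        "fast_openai": {"provider": "openai", "api_key_env": "OPENAI_API_KEY"},
        "backup_google": {"provider": "google", "api_key_env": "GEMINI_API_KEY"},
    },
    "routing": {
        "default": {"primary": "high_quality_anthropic", "fallbacks": []},
        "convert": {"primary": "high_quality_anthropic", "fallbacks": ["backup_google"]},
    },
}


def resolve_profiles(llm_config: dict, route: str, override: Optional[str] = None) -> list[str]:
    """Return profile names in the order they would be tried (hypothetical helper)."""
    profiles = llm_config["profiles"]
    if override is not None:
        if override not in profiles:
            raise KeyError(f"unknown LLM profile: {override}")
        # An explicit profile (as with --llm-profile) disables route fallbacks.
        return [override]
    routing = llm_config["routing"]
    # Assumption: a route that is not configured uses the default route.
    entry = routing.get(route, routing["default"])
    order = [entry["primary"], *entry.get("fallbacks", [])]
    missing = [name for name in order if name not in profiles]
    if missing:
        raise KeyError(f"route {route!r} references unknown profiles: {missing}")
    return order


print(resolve_profiles(LLM_CONFIG, "convert"))
print(resolve_profiles(LLM_CONFIG, "convert", override="fast_openai"))
```

With the example configuration, the convert route tries high_quality_anthropic and then backup_google, while an override yields only the named profile.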
Commands
folio convert
Convert a single deck to Folio markdown.
# Basic
folio convert deck.pptx
# With client and engagement metadata
folio convert deck.pptx --client Acme --engagement "DD Q1 2026"
# Deep analysis (two-pass, selective re-analysis of dense slides)
folio convert deck.pptx --passes 2
# Force fresh analysis, ignore cache
folio convert deck.pptx --no-cache
# Full metadata
folio convert deck.pptx \
  --client Acme \
  --engagement "DD Q1 2026" \
  --subtype research \
  --industry "retail,ecommerce" \
  --tags "market-sizing,tam" \
  --note "Updated risk figures"
Flags
| Flag | Description |
|---|---|
| --client | Client name (used in output path and frontmatter) |
| --engagement | Engagement identifier |
| --note, -n | Version note (e.g. "Updated per client feedback") |
| --target, -t | Override output directory |
| --passes, -p | Analysis depth: 1 = standard, 2 = deep (selective second pass on dense slides) |
| --no-cache | Force re-analysis; fresh results replace cached entries |
| --subtype | Evidence subtype: research, data_extract, external_report, benchmark |
| --industry | Industry tags, comma-separated |
| --tags | Manual tags to merge with auto-generated, comma-separated |
| --llm-profile | Override the configured LLM profile for this command |
folio batch
Batch convert all matching files in a directory.
# Automated PPTX conversion
folio batch ./materials --client Acme
# PDF mitigation workflow (not Tier 1)
folio batch ./pdfs --pattern "*.pdf" --client Acme
# Skip restart automation if other presentations are open in PowerPoint
folio batch ./materials --no-dedicated-session
Converting 3 files...
✓ overview.pptx (18 slides, 4.1s)
✓ financials.pptx (32 slides, 7.8s)
✓ appendix.pptx (12 slides, 2.9s)
Automated PPTX: 3 succeeded, 0 failed
Accepts the same flags as convert (--client, --engagement, --passes, --llm-profile, etc.). Default pattern is *.pptx.
batch also supports --dedicated-session/--no-dedicated-session for the
PowerPoint restart workflow on managed macOS. Operator-exported PDF batches are
supported, but they are mitigation-only and do not count toward Tier 1 automated
conversion goals.
folio status
Show library health -- which decks are current, stale, or missing their source file.
folio status
folio status Acme # scope to a client
Library: 5 decks
✓ Current: 3
⚠ Stale: 1
✗ Missing source: 1
Stale:
Acme/dd_q1_2026/financials/financials.md
Missing:
Acme/dd_q1_2026/appendix/appendix.md (source: /materials/appendix.pptx)
Stale means the source file changed since the last conversion -- re-run folio convert on it. Missing means the source file can no longer be found at the original path.
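The three states can be illustrated with a short Python sketch that compares a recorded source hash against the file on disk. The README does not say which hash Folio uses; SHA-256 truncated to 12 hex characters is an assumption chosen to match the length of the source_hash shown in the example frontmatter, and short_hash and classify are hypothetical names.

```python
import hashlib
from pathlib import Path


def short_hash(path: Path, length: int = 12) -> str:
    """Hash the file contents and keep the first `length` hex characters.

    Assumption: SHA-256, truncated. Folio's real hashing scheme may differ.
    """
    return hashlib.sha256(path.read_bytes()).hexdigest()[:length]


def classify(source: Path, recorded_hash: str) -> str:
    """Return 'missing', 'stale', or 'current' for one converted deck."""
    if not source.exists():
        return "missing"  # source no longer at the recorded path
    if short_hash(source, len(recorded_hash)) != recorded_hash:
        return "stale"  # source changed since the last conversion
    return "current"
```

A deck reported as stale by a check like this would be re-converted with folio convert; a missing one needs its source path restored first.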
Global flags: --verbose / -v (debug logging), --config / -c (path to folio.yaml)
Output Structure
library/
└── Acme/
    └── dd_q1_2026/
        └── market_overview/
            ├── market_overview.md       # Full markdown with frontmatter
            ├── slides/
            │   ├── slide-001.png
            │   ├── slide-002.png
            │   └── ...
            ├── .analysis_cache.json     # LLM response cache
            ├── .texts_cache.json        # Text extraction cache
            └── version_history.json     # Full version log
Example output (condensed):
---
id: acme_dd_q1_2026_evidence_20260306_market_overview
title: Market Overview
type: evidence
subtype: research
status: active
source: /materials/market_overview.pptx
source_hash: a1b2c3d4e5f6
version: 2
created: 2026-03-01T10:00:00Z
modified: 2026-03-06T14:30:00Z
client: Acme
engagement: DD Q1 2026
industry:
- ecommerce
- retail
frameworks:
- TAM/SAM/SOM
tags:
- ecommerce
- market-sizing
- retail
_llm_metadata:
convert:
requested_profile: high_quality_anthropic
profile: high_quality_anthropic
provider: anthropic
model: claude-sonnet-4-20250514
fallback_used: false
status: executed
pass2:
status: skipped
reason: pass_disabled
---
# Market Overview
**Source:** `/materials/market_overview.pptx`
**Version:** 2 | **Converted:** 2026-03-06
**Status:** △ Current
---
## Slide 1

### Text (Verbatim)
> Total Addressable Market: $4.2B
> Source: Industry Report 2025
### Analysis
**Slide Type:** data_heavy
**Framework:** TAM/SAM/SOM
**Key Data:** TAM $4.2B, SAM $1.8B, SOM $340M
**Main Insight:** Market sizing shows serviceable segment at 43% of TAM
**Evidence:**
- **TAM figure of $4.2B (high):** "Total Addressable Market: $4.2B" *(title)*
- **SAM represents 43% of TAM (medium, pass 2):** "SAM $1.8B" *(body)* [unverified]
---
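The id values shown in this README appear to follow a client, engagement, type, date, deck-name pattern. The sketch below reproduces the two example IDs under that reading; it is inferred from the examples only, the function names are hypothetical, and which date Folio actually uses (creation or conversion) is not stated here, so the date is passed in explicitly.

```python
import re
from datetime import date
from typing import Optional


def slugify(value: str) -> str:
    """Lowercase and collapse runs of non-alphanumerics into single underscores."""
    return re.sub(r"[^a-z0-9]+", "_", value.lower()).strip("_")


def build_id(deck_name: str, on: date, doc_type: str = "evidence",
             client: Optional[str] = None, engagement: Optional[str] = None) -> str:
    """Join the optional client/engagement slugs with type, date, and deck slug."""
    parts = [slugify(p) for p in (client, engagement) if p]
    parts += [doc_type, on.strftime("%Y%m%d"), slugify(deck_name)]
    return "_".join(parts)


print(build_id("market_overview", date(2026, 3, 6), client="Acme", engagement="DD Q1 2026"))
# acme_dd_q1_2026_evidence_20260306_market_overview
print(build_id("deck", date(2026, 3, 6)))
# evidence_20260306_deck
```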
Configuration
Folio looks for folio.yaml by walking up from the current directory. All
fields are optional, and the example below shows a multi-provider setup rather
than the minimal default config.
# folio.yaml — example multi-provider configuration
library_root: ./library          # Where converted decks are written
sources:                         # Optional; organize source directories
  - name: materials
    path: /path/to/source/decks
    target_prefix: ""
llm:
  profiles:
    high_quality_anthropic:
      provider: anthropic
      model: claude-sonnet-4-20250514
      api_key_env: ANTHROPIC_API_KEY
    fast_openai:
      provider: openai
      model: gpt-4o-mini
      api_key_env: OPENAI_API_KEY
    backup_google:
      provider: google
      model: gemini-2.5-pro
      api_key_env: GEMINI_API_KEY
  routing:
    default:
      primary: high_quality_anthropic
      fallbacks: []
    convert:
      primary: high_quality_anthropic
      fallbacks: [backup_google]
conversion:
  image_dpi: 150                 # Slide image resolution (px/in)
  image_format: png
  libreoffice_timeout: 60        # Seconds before conversion times out
  default_passes: 1              # 1 = standard, 2 = deep
  density_threshold: 2.0         # Pass 2 density trigger
  pptx_renderer: auto            # auto | libreoffice | powerpoint
Legacy shorthand is still supported for Anthropic-only setups:
llm:
  provider: anthropic
  model: claude-sonnet-4-20250514
With no folio.yaml, Folio uses these defaults: output goes to ./library,
images render at 150 DPI, and analysis runs a single Anthropic-backed pass if
ANTHROPIC_API_KEY is present.
| Environment Variable | Purpose |
|---|---|
| ANTHROPIC_API_KEY | Anthropic profile credentials |
| OPENAI_API_KEY | OpenAI profile credentials (pip install "folio-love[llm]" or source pip install -e ".[llm]") |
| GEMINI_API_KEY | Google Gemini profile credentials (pip install "folio-love[llm]" or source pip install -e ".[llm]") |
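The config lookup described at the top of this section (walking up from the current directory until a folio.yaml is found) can be sketched as follows. The function name find_config is hypothetical, and returning None when nothing is found, so the caller can apply the built-in defaults, is an assumption about the behaviour.

```python
from pathlib import Path
from typing import Optional


def find_config(start: Path, filename: str = "folio.yaml") -> Optional[Path]:
    """Return the nearest `filename` at or above `start`, or None if absent."""
    start = start.resolve()
    # Check the starting directory first, then each ancestor up to the root.
    for directory in (start, *start.parents):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    found = find_config(Path.cwd())
    print(found if found is not None else "no folio.yaml found; defaults apply")
```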
How It Works
Input (.pptx/.ppt/.pdf)
  │
  ├─ Normalize ──→ Convert to PDF
  │                  LibreOffice (headless) or PowerPoint on macOS
  │                  PowerPoint path: Launch Services open + AppleScript export
  │                  PDF input: direct copy + warning heuristics
  │
  ├─ Images ─────→ Extract slide images, detect blank slides
  │
  ├─ Text ───────→ Extract structured text per slide, reconcile count
  │
  ├─ Analysis ───→ Route-based LLM classification + evidence extraction (cached)
  │                  Optional transient fallback to backup profiles
  │                  Pass 2: selective re-analysis of dense slides
  │
  ├─ Tracking ───→ Version detection, per-slide change diffing
  │
  └─ Assembly ───→ YAML frontmatter + Markdown output (atomic write)
Each stage is independent and testable. LLM analysis results are cached per-slide -- re-conversion only re-analyzes changed slides. Blank slides are detected via image histogram analysis and excluded from deep analysis.
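A per-slide cache of this kind can be sketched as below. This is an illustration only: keying on a SHA-256 of the slide text is an assumption (the actual key used in .analysis_cache.json is not documented here), and analyze stands in for whatever LLM call the pipeline makes.

```python
import hashlib
from typing import Callable


def slide_key(text: str) -> str:
    """Cache key for one slide. Assumption: a hash of its verbatim text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def analyze_slides(slide_texts: list[str], cache: dict[str, dict],
                   analyze: Callable[[str], dict]) -> list[dict]:
    """Analyze each slide, calling `analyze` only for slides not yet cached."""
    results = []
    for text in slide_texts:
        key = slide_key(text)
        if key not in cache:
            cache[key] = analyze(text)  # the expensive call runs only on a miss
        results.append(cache[key])
    return results
```

On a re-conversion where only one slide's text changed, only that slide produces a cache miss, which matches the behaviour described above.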
Version Tracking
Re-converting an updated deck increments the version and records which slides were added, modified, or removed.
folio convert deck.pptx --note "Updated risk figures"
✓ deck.pptx
24 slides → library/deck/deck.md
Version: 2 | ID: evidence_20260306_deck
Modified: slides 3, 7, 12
Added: slides 24
Use folio status to find stale decks, meaning decks whose source file has changed since the last conversion.
Version history is recorded in both the markdown output and version_history.json:
| Version | Date | Changes | Note |
|---|---|---|---|
| v2 | 2026-03-06 | 3 modified, 1 added | Updated risk figures |
| v1 | 2026-03-01 | Initial (23 slides) | -- |
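A change set like the one in the history table can be computed by comparing per-slide hashes between two versions. The sketch below compares slides by position, which is an assumption: Folio's actual diffing is not described in this README, and a slide inserted mid-deck would show up here as a run of modifications. The name diff_slides is hypothetical.

```python
def diff_slides(old: list[str], new: list[str]) -> dict[str, list[int]]:
    """Compare per-slide hashes by position; slide numbers are 1-based."""
    shared = min(len(old), len(new))
    modified = [i + 1 for i in range(shared) if old[i] != new[i]]
    added = list(range(len(old) + 1, len(new) + 1))    # slides beyond the old length
    removed = list(range(len(new) + 1, len(old) + 1))  # slides beyond the new length
    return {"modified": modified, "added": added, "removed": removed}


# Reproduce the v1 -> v2 example: 23 slides, three edited, one appended.
v1 = [f"hash-{n}" for n in range(1, 24)]
v2 = list(v1)
for n in (3, 7, 12):
    v2[n - 1] += "-edited"
v2.append("hash-24")
print(diff_slides(v1, v2))
# {'modified': [3, 7, 12], 'added': [24], 'removed': []}
```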
Development
pip install -e ".[dev]"
.venv/bin/python -m build
.venv/bin/python -m twine check dist/*
.venv/bin/python -m pytest
.venv/bin/python -m pytest --cov=folio
folio/
├── cli.py              # Click CLI (convert, batch, status)
├── config.py           # FolioConfig + folio.yaml loading
├── converter.py        # Pipeline orchestrator
├── pipeline/
│   ├── normalize.py    # PPTX/PPT → PDF
│   ├── images.py       # PDF → slide images + blank detection
│   ├── text.py         # Structured text extraction + reconciliation
│   └── analysis.py     # LLM analysis + caching
├── output/
│   ├── frontmatter.py  # YAML frontmatter (v2 schema)
│   └── markdown.py     # Markdown assembly
└── tracking/
    ├── sources.py      # Source file tracking + staleness
    └── versions.py     # Version detection + change sets
Validation
Folio has been validated against a 50-deck corpus of real consulting presentations
(Tier 1 gate: 50/50 automated PPTX conversion, zero silent failures).
Validation reports, session logs, and chat logs are preserved on the
docs-internal branch.
Roadmap
Search and retrieval (folio search) is planned but not yet implemented. Today, converted decks are searchable via Obsidian, grep, or any tool that reads Markdown + YAML frontmatter.
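Until folio search exists, a small script can filter the library by frontmatter. The sketch below uses only the standard library and a deliberately minimal parser that handles top-level scalars and simple "- item" lists; it is not a full YAML parser, and a real script would more likely use a YAML library. The names read_frontmatter and find_by_tag are hypothetical.

```python
from pathlib import Path


def read_frontmatter(markdown: str) -> dict:
    """Parse top-level scalars and simple '- item' lists from a frontmatter block."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    data: dict = {}
    current_key = None  # key whose list items are currently being collected
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break
        if stripped.startswith("- "):
            if current_key is not None:
                data[current_key].append(stripped[2:].strip())
            continue
        if line.startswith((" ", "\t")) or ":" not in line:
            current_key = None  # nested mappings are ignored (key keeps an empty list)
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if value:
            data[key] = value
            current_key = None
        else:
            data[key] = []
            current_key = key
    return data


def find_by_tag(library: Path, tag: str) -> list[Path]:
    """Return converted decks under `library` whose frontmatter tags include `tag`."""
    matches = []
    for path in sorted(library.rglob("*.md")):
        tags = read_frontmatter(path.read_text(encoding="utf-8")).get("tags", [])
        if isinstance(tags, list) and tag in tags:
            matches.append(path)
    return matches
```

For example, find_by_tag(Path("library"), "market-sizing") would list every converted deck tagged market-sizing.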
License
Apache 2.0