Your consulting portfolio, searchable and AI-ready.

folio

Python 3.10+ | License: Apache 2.0

What It Does

Folio converts PPTX, PPT, and PDF presentations into structured Markdown with YAML frontmatter, slide images, and optional LLM-powered analysis. Every conversion preserves three layers: verbatim text, slide images at configurable DPI, and per-slide classification with evidence grounding.

Folio tracks versions automatically -- re-converting an updated deck increments the version, detects per-slide changes, and preserves history. Open library/ as an Obsidian vault for automatic frontmatter indexing.

Install

pip install folio-love

The CLI command is folio.

For agent-friendly setup (Cursor, Claude Code), see Agentic Setup.

Or install from source:

git clone https://github.com/ohjonathan/folio.love.git
cd folio.love
pip install -e .

Anthropic support is included by default. For OpenAI or Google Gemini, install with extras:

pip install "folio-love[llm]"        # from PyPI
pip install -e ".[llm]"              # from source

Prerequisites

  • Python 3.10+
  • LibreOffice or Microsoft PowerPoint (for PPTX/PPT conversion)
  • Poppler (for PDF image extraction)

# macOS
brew install --cask libreoffice
brew install poppler

# Ubuntu/Debian
sudo apt install libreoffice poppler-utils

Managed macOS (no LibreOffice)

If your machine blocks LibreOffice, Folio can use Microsoft PowerPoint as the renderer. Set pptx_renderer: powerpoint in folio.yaml, run batch jobs from Terminal.app, and keep a dedicated PowerPoint session with no unrelated presentations open. See Managed Mac workflow for the full workflow.

If neither renderer is available, export the deck to PDF manually and run folio convert deck.pdf.

Quick Start

First conversion

folio convert deck.pptx
✓ deck.pptx
  24 slides → library/deck/deck.md
  Version: 1 | ID: evidence_20260306_deck

With LLM analysis

export ANTHROPIC_API_KEY=sk-ant-...
folio convert deck.pptx --passes 2

Without a valid API key, analysis is skipped gracefully -- the conversion still completes.

Commands

folio convert

Convert a single deck to Folio markdown.

# Basic
folio convert deck.pptx

# With client and engagement metadata
folio convert deck.pptx --client Acme --engagement "DD Q1 2026"

# Deep analysis (two-pass, selective re-analysis of dense slides)
folio convert deck.pptx --passes 2

# Force fresh analysis, ignore cache
folio convert deck.pptx --no-cache

# Full metadata
folio convert deck.pptx \
  --client Acme \
  --engagement "DD Q1 2026" \
  --subtype research \
  --industry "retail,ecommerce" \
  --tags "market-sizing,tam" \
  --note "Updated risk figures"

Flags

| Flag | Description |
| --- | --- |
| --client | Client name (used in output path and frontmatter) |
| --engagement | Engagement identifier |
| --note, -n | Version note (e.g. "Updated per client feedback") |
| --target, -t | Override output directory |
| --passes, -p | Analysis depth: 1 = standard, 2 = deep (selective second pass on dense slides) |
| --no-cache | Force re-analysis; fresh results replace cached entries |
| --subtype | Evidence subtype: research, data_extract, external_report, benchmark |
| --industry | Industry tags, comma-separated |
| --tags | Manual tags to merge with auto-generated, comma-separated |
| --llm-profile | Override the configured LLM profile for this run |

folio batch

Batch convert all matching files in a directory.

# Convert all PPTX files in a directory
folio batch ./materials --client Acme

# Convert PDFs instead
folio batch ./pdfs --pattern "*.pdf" --client Acme

# Disable PowerPoint restart automation
folio batch ./materials --no-dedicated-session

Accepts the same flags as convert (--client, --engagement, --passes, --llm-profile, etc.). Default pattern is *.pptx. On macOS with PowerPoint, --dedicated-session (the default) enables periodic restart during long batch runs.

folio status

Show library health -- which decks are current, stale, or missing their source file.

folio status
folio status Acme        # scope to a client
folio status --refresh   # re-check source hashes

Stale means the source file changed since the last conversion -- re-run folio convert on it. Missing means the source file can no longer be found at the original path.
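The current/stale/missing distinction can be sketched as a hash comparison. This is a minimal illustration, not Folio's implementation: the function names `file_sha256` and `source_state`, the use of SHA-256, and the way the recorded hash is passed in are all assumptions.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def source_state(source_path: str, recorded_hash: str) -> str:
    """Classify a source file as 'current', 'stale', or 'missing'.

    `recorded_hash` stands for the digest saved at the last conversion
    (hypothetical; where Folio stores it is not described here).
    """
    path = Path(source_path)
    if not path.is_file():
        return "missing"
    return "current" if file_sha256(path) == recorded_hash else "stale"
```

Comparing content hashes rather than modification times means a file that was only touched or copied, but not edited, is still reported as current.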

folio scan

Scan configured source roots for new, stale, or missing files.

folio scan
folio scan --scope ClientA

Requires sources entries in folio.yaml (see Configuration).

folio refresh

Re-convert stale decks in the library.

folio refresh
folio refresh --scope ClientA/DD_Q1_2026
folio refresh --all     # re-convert everything in scope, not just stale

folio promote

Promote a deck's curation level (L0 → L1 → L2 → L3).

folio promote <deck_id> L1

Validates required metadata per level (e.g. L1 requires client and tags). Use folio status to find deck IDs.
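As a rough sketch of per-level validation, the check below reports which required frontmatter fields are absent or empty. Only the L1 rule (client and tags) comes from the text above; the names `REQUIRED_FIELDS` and `missing_fields` are hypothetical, and the requirements for other levels are deliberately left out because they are not stated here.

```python
LEVELS = ("L0", "L1", "L2", "L3")

# Only the L1 requirement is documented above; other levels are left
# empty in this sketch rather than guessed.
REQUIRED_FIELDS = {"L1": ("client", "tags")}


def missing_fields(frontmatter: dict, target_level: str) -> list[str]:
    """Return required fields that are absent or empty for target_level."""
    if target_level not in LEVELS:
        raise ValueError(f"unknown curation level: {target_level}")
    required = REQUIRED_FIELDS.get(target_level, ())
    # An empty string or empty list counts as missing.
    return [name for name in required if not frontmatter.get(name)]
```

A promotion would go ahead only when the returned list is empty.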

Global flags: --verbose / -v (debug logging), --config / -c (path to folio.yaml)

Output Structure

library/
└── Acme/
    └── dd_q1_2026/
        └── market_overview/
            ├── market_overview.md        # Full markdown with frontmatter
            ├── slides/
            │   ├── slide-001.png
            │   ├── slide-002.png
            │   └── ...
            ├── .analysis_cache.json      # LLM response cache
            ├── .texts_cache.json         # Text extraction cache
            └── version_history.json      # Full version log

Example output (condensed):

---
id: acme_dd_q1_2026_evidence_20260306_market_overview
title: Market Overview
type: evidence
subtype: research
status: active
client: Acme
engagement: DD Q1 2026
version: 2
tags:
- ecommerce
- market-sizing
---

# Market Overview

**Source:** `/materials/market_overview.pptx`
**Version:** 2 | **Converted:** 2026-03-06

---

## Slide 1

![Slide 1](slides/slide-001.png)

### Text (Verbatim)

> Total Addressable Market: $4.2B
> Source: Industry Report 2025

### Analysis

**Slide Type:** data_heavy
**Framework:** TAM/SAM/SOM
**Key Data:** TAM $4.2B, SAM $1.8B, SOM $340M

**Evidence:**
- **TAM figure of $4.2B (high):** "Total Addressable Market: $4.2B" *(title)*

---

Configuration

Folio looks for folio.yaml by walking up from the current directory. All fields are optional.

# folio.yaml
library_root: ./library              # Where converted decks are written

sources:                             # Optional; organize source directories
  - name: materials
    path: /path/to/source/decks
    target_prefix: ""

llm:
  profiles:
    high_quality_anthropic:
      provider: anthropic
      model: claude-sonnet-4-20250514
      api_key_env: ANTHROPIC_API_KEY
      base_url_env: ANTHROPIC_BASE_URL   # Optional enterprise gateway

    fast_openai:
      provider: openai
      model: gpt-4o-mini
      api_key_env: OPENAI_API_KEY
      base_url_env: OPENAI_BASE_URL      # Optional enterprise gateway

    backup_google:
      provider: google
      model: gemini-2.5-pro
      api_key_env: GEMINI_API_KEY
      base_url_env: GEMINI_BASE_URL      # Optional enterprise gateway

  routing:
    default:
      primary: high_quality_anthropic
      fallbacks: []
    convert:
      primary: high_quality_anthropic
      fallbacks: [backup_google]

conversion:
  image_dpi: 150                     # Slide image resolution (px/in)
  image_format: png
  libreoffice_timeout: 60            # Seconds before conversion times out
  default_passes: 1                  # 1 = standard, 2 = deep
  density_threshold: 2.0             # Pass 2 density trigger
  pptx_renderer: auto                # auto | libreoffice | powerpoint

With no folio.yaml, Folio uses sensible defaults: output goes to ./library, images render at 150 DPI, and analysis runs a single Anthropic-backed pass if ANTHROPIC_API_KEY is set.
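The "walk up from the current directory" lookup can be illustrated with a few lines of pathlib. This is a sketch of the general technique; the function name `find_config` is an assumption and Folio's own loader may differ in detail.

```python
from pathlib import Path
from typing import Optional


def find_config(start: Path, filename: str = "folio.yaml") -> Optional[Path]:
    """Return the nearest `filename` in `start` or any parent directory."""
    start = start.resolve()
    # Check the starting directory first, then each ancestor up to the root.
    for directory in (start, *start.parents):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    # No config found anywhere: the caller would fall back to defaults.
    return None
```

Calling `find_config(Path.cwd())` from anywhere inside a project tree returns the closest config file, which is why commands behave the same from a subdirectory as from the project root.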

| Environment Variable | Purpose |
| --- | --- |
| ANTHROPIC_API_KEY | Anthropic credentials (included in base install) |
| OPENAI_API_KEY | OpenAI credentials (requires folio-love[llm]) |
| GEMINI_API_KEY | Google Gemini credentials (requires folio-love[llm]) |
| ANTHROPIC_BASE_URL | Optional Anthropic-compatible gateway URL |
| OPENAI_BASE_URL | Optional OpenAI-compatible gateway URL |
| GEMINI_BASE_URL | Optional Gemini-compatible gateway URL |

Enterprise Gateways and Preflight Warnings

If you route Folio through an enterprise AI gateway, keep the gateway URL in an environment variable and reference it from the profile with base_url_env. If the env var is unset or blank, Folio silently falls back to the SDK default endpoint.
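The unset-or-blank fallback can be sketched as follows. The function name `resolve_base_url` is hypothetical; the sketch only shows the rule described above, where a missing or whitespace-only variable yields no override.

```python
import os
from typing import Mapping, Optional


def resolve_base_url(
    base_url_env: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the gateway URL named by base_url_env, or None.

    None means "no override": the SDK's default endpoint is used.
    """
    if not base_url_env:
        return None
    value = env.get(base_url_env, "").strip()
    # A blank value is treated the same as an unset variable.
    return value or None
```

Because the fallback is silent, a typo in the variable name sends traffic to the default endpoint instead of the gateway, so it is worth confirming the variable is actually set in the shell that runs Folio.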

Folio now runs a warning-only model preflight once per selected profile per conversion run. This checks whether the configured model appears usable before the first expensive pass. The probe is bounded and uses the same runtime guardrails as normal model calls. A warning does not block conversion; it simply surfaces blocked or unavailable models earlier.

Scanned and Image-Only PDFs

When a deck has no extractable text, Folio records that text validation was unavailable rather than treating the deck as if evidence validation had failed. Those decks still surface review flags, but their confidence is no longer capped at a blanket 0.59 solely because the source is scanned.

Oversized PDF Page Fallback

Large architecture diagrams and poster-sized PDF pages can exceed Pillow safety limits at the requested DPI. Folio now backs off DPI per page before hitting that limit. If a page still cannot be rendered safely, conversion fails with a specific oversized-image error instead of a generic rendering failure.
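The per-page back-off can be illustrated with plain arithmetic on the page size. This is a sketch under stated assumptions, not Folio's code: the names `pixel_count` and `safe_dpi` and the `min_dpi` floor are hypothetical, and the limit used is Pillow's default `Image.MAX_IMAGE_PIXELS` value, which may not be the threshold Folio applies.

```python
import math

# Pillow's default Image.MAX_IMAGE_PIXELS; assumed here as the safety limit.
DEFAULT_PIXEL_LIMIT = 89_478_485


def pixel_count(width_pt: float, height_pt: float, dpi: int) -> int:
    """Pixels needed to render a page given in PDF points (1/72 inch)."""
    width_px = math.ceil(width_pt / 72 * dpi)
    height_px = math.ceil(height_pt / 72 * dpi)
    return width_px * height_px


def safe_dpi(
    width_pt: float,
    height_pt: float,
    requested_dpi: int,
    pixel_limit: int = DEFAULT_PIXEL_LIMIT,
    min_dpi: int = 36,
) -> int:
    """Lower the DPI until the page fits under pixel_limit.

    Raises ValueError if the page is still too large at the floor DPI.
    """
    floor = min(min_dpi, requested_dpi)
    dpi = requested_dpi
    while dpi >= floor and pixel_count(width_pt, height_pt, dpi) > pixel_limit:
        dpi -= 1
    if dpi < floor:
        raise ValueError("oversized page: cannot render within pixel limit")
    return dpi
```

With these numbers, a US Letter page (612 × 792 points) keeps the requested 150 DPI, while a 100 in × 100 in poster (7200 × 7200 points) backs off to 94 DPI.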

OpenAI GPT-5 Compatibility

GPT-5 OpenAI chat models use a slightly different request shape from GPT-4.x and GPT-4o. Folio handles that automatically by using max_completion_tokens and omitting temperature for gpt-5* models while preserving the existing request shape for non-GPT-5 models.
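A minimal sketch of that branching is shown below. The helper name `build_chat_kwargs` is hypothetical, and the non-GPT-5 branch assumes the "existing request shape" means `max_tokens` plus `temperature`, which the text does not spell out.

```python
from typing import Any


def build_chat_kwargs(model: str, max_tokens: int, temperature: float) -> dict[str, Any]:
    """Build keyword arguments for a chat completion request."""
    kwargs: dict[str, Any] = {"model": model}
    if model.startswith("gpt-5"):
        # gpt-5* models: use max_completion_tokens and omit temperature.
        kwargs["max_completion_tokens"] = max_tokens
    else:
        # Assumed shape for GPT-4.x and GPT-4o models.
        kwargs["max_tokens"] = max_tokens
        kwargs["temperature"] = temperature
    return kwargs
```

Keeping the branch in one helper means the rest of the call site stays identical for every OpenAI model.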

How It Works

Input (.pptx/.ppt/.pdf)
  │
  ├─ Normalize ──→ Convert to PDF via LibreOffice or PowerPoint
  │
  ├─ Images ─────→ Extract slide images, detect blank slides
  │
  ├─ Text ───────→ Extract structured text per slide, reconcile count
  │
  ├─ Analysis ───→ LLM classification + evidence extraction (cached)
  │                 Pass 2: selective re-analysis of dense slides
  │
  ├─ Tracking ───→ Version detection, per-slide change diffing
  │
  └─ Assembly ───→ YAML frontmatter + Markdown output (atomic write)

Each stage is independent and testable. LLM analysis results are cached per-slide -- re-conversion only re-analyzes changed slides. Blank slides are detected via image histogram analysis and excluded from deep analysis.
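Per-slide caching of this kind is commonly built on a content hash. The sketch below is one way to do it, not a description of `.analysis_cache.json`: the key derivation, the function names `slide_key` and `analyze_with_cache`, and the in-memory dict are all assumptions.

```python
import hashlib
from typing import Callable


def slide_key(text: str, image_bytes: bytes) -> str:
    """Derive a cache key from a slide's text and rendered image."""
    digest = hashlib.sha256()
    digest.update(text.encode("utf-8"))
    digest.update(b"\0")  # separator so text and image bytes cannot blur
    digest.update(image_bytes)
    return digest.hexdigest()


def analyze_with_cache(
    slides: list[tuple[str, bytes]],
    cache: dict[str, dict],
    analyze: Callable[[str, bytes], dict],
) -> list[dict]:
    """Analyze each slide, calling `analyze` only on cache misses."""
    results = []
    for text, image_bytes in slides:
        key = slide_key(text, image_bytes)
        if key not in cache:
            cache[key] = analyze(text, image_bytes)
        results.append(cache[key])
    return results
```

When a deck is re-converted with one edited slide, only that slide produces a new key, so only one LLM call is made.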

Version Tracking

Re-converting an updated deck increments the version and records which slides were added, modified, or removed.

folio convert deck.pptx --note "Updated risk figures"
✓ deck.pptx
  24 slides → library/deck/deck.md
  Version: 2 | ID: evidence_20260306_deck
  Modified: slides 3, 7, 12
  Added: slides 24
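A change set like the one above can be derived by comparing per-slide hashes between versions. The sketch below uses a naive positional comparison, which is an assumption: `diff_slides` is a hypothetical name, and Folio's actual matching may handle reordered or inserted slides differently.

```python
def diff_slides(old_hashes: list[str], new_hashes: list[str]) -> dict[str, list[int]]:
    """Compare two versions slide by slide (1-based slide numbers).

    Slides at the same position with different hashes are 'modified';
    positions beyond the shorter deck are 'added' or 'removed'.
    """
    shared = min(len(old_hashes), len(new_hashes))
    modified = [i + 1 for i in range(shared) if old_hashes[i] != new_hashes[i]]
    added = list(range(shared + 1, len(new_hashes) + 1))
    removed = list(range(shared + 1, len(old_hashes) + 1))
    return {"modified": modified, "added": added, "removed": removed}
```

A positional diff is simple but reports every later slide as modified when a slide is inserted mid-deck, so a production implementation would likely need a smarter alignment.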

Use folio status to find stale decks -- where the source file has changed since the last conversion.

Version history is recorded in both the markdown output and version_history.json:

| Version | Date | Changes | Note |
| --- | --- | --- | --- |
| v2 | 2026-03-06 | 3 modified, 1 added | Updated risk figures |
| v1 | 2026-03-01 | Initial (23 slides) | -- |

Development

python3 -m venv .venv
.venv/bin/python -m pip install --upgrade pip
.venv/bin/python -m pip install -e ".[dev]"
.venv/bin/python -m pytest tests/ -v
.venv/bin/python -m pytest --cov=folio

The test suite depends on dev-only packages such as python-pptx and reportlab, so run it from the project virtualenv after installing .[dev] rather than from an arbitrary system Python.

folio/
├── cli.py              # Click CLI (convert, batch, status, scan, refresh, promote)
├── config.py           # FolioConfig + folio.yaml loading
├── converter.py        # Pipeline orchestrator
├── pipeline/
│   ├── normalize.py    # PPTX/PPT → PDF
│   ├── images.py       # PDF → slide images + blank detection
│   ├── text.py         # Structured text extraction + reconciliation
│   └── analysis.py     # LLM analysis + caching
├── output/
│   ├── frontmatter.py  # YAML frontmatter (v2 schema)
│   └── markdown.py     # Markdown assembly
└── tracking/
    ├── sources.py      # Source file tracking + staleness
    └── versions.py     # Version detection + change sets

Roadmap

Search and retrieval (folio search) is planned but not yet implemented. Today, converted decks are searchable via Obsidian, grep, or any tool that reads Markdown + YAML frontmatter.
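Until then, a small script can filter the library by frontmatter. The sketch below is not part of Folio: `read_frontmatter` and `find_by_tag` are hypothetical helpers, and the parser is deliberately minimal, handling only flat `key: value` pairs and simple `- item` lists like those in the example output. A real YAML parser would be more robust.

```python
from pathlib import Path


def read_frontmatter(markdown_text: str) -> dict:
    """Parse a flat YAML frontmatter block (scalars and simple lists only)."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    data: dict = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # end of the frontmatter block
        if stripped.startswith("- ") and isinstance(data.get(current_key), list):
            data[current_key].append(stripped[2:].strip())
        elif ":" in line and not line.startswith(" "):
            key, _, value = line.partition(":")
            current_key = key.strip()
            # A key with no inline value is assumed to start a list.
            data[current_key] = value.strip() if value.strip() else []
    return data


def find_by_tag(library_root: str, tag: str) -> list[Path]:
    """Return Markdown files under library_root whose tags include tag."""
    matches = []
    for path in sorted(Path(library_root).rglob("*.md")):
        tags = read_frontmatter(path.read_text(encoding="utf-8")).get("tags")
        if isinstance(tags, list) and tag in tags:
            matches.append(path)
    return matches
```

For example, `find_by_tag("library", "market-sizing")` would list every converted deck carrying that tag. All values are returned as strings, so `version: 2` reads back as `"2"`.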

License

Apache 2.0
