folio

Your consulting portfolio, searchable and AI-ready.

Python 3.10+ | License: Apache 2.0

What It Does

Folio converts PPTX, PPT, and PDF presentations into structured Markdown with YAML frontmatter, slide images, and optional LLM-powered analysis. Every conversion preserves three layers: verbatim text, slide images at configurable DPI, and per-slide classification with evidence grounding.

Folio tracks versions automatically -- re-converting an updated deck increments the version, detects per-slide changes, and preserves history. Open library/ as an Obsidian vault for automatic frontmatter indexing.

Install

pip install folio-love

The CLI command is folio.

For agent-friendly setup (Cursor, Claude Code), see Agentic Setup.

Or install from source:

git clone https://github.com/ohjonathan/folio.love.git
cd folio.love
pip install -e .

Anthropic support is included by default. For OpenAI or Google Gemini, install with extras:

pip install "folio-love[llm]"        # from PyPI
pip install -e ".[llm]"              # from source

Prerequisites

  • Python 3.10+
  • LibreOffice or Microsoft PowerPoint (for PPTX/PPT conversion)
  • Poppler (for PDF image extraction)

# macOS
brew install --cask libreoffice
brew install poppler

# Ubuntu/Debian
sudo apt install libreoffice poppler-utils

Managed macOS (no LibreOffice)

If your machine blocks LibreOffice, Folio can use Microsoft PowerPoint as the renderer. Set pptx_renderer: powerpoint in folio.yaml, run batch jobs from Terminal.app, and keep a dedicated PowerPoint session with no unrelated presentations open. See Managed Mac workflow for the full workflow.

If neither renderer is available, export the deck to PDF manually and run folio convert deck.pdf.

Quick Start

First conversion

folio convert deck.pptx
✓ deck.pptx
  24 slides → library/deck/deck.md
  Version: 1 | ID: evidence_20260306_deck

With LLM analysis

export ANTHROPIC_API_KEY=sk-ant-...
folio convert deck.pptx --passes 2

Without a valid API key, analysis is skipped gracefully -- the conversion still completes.

Commands

folio convert

Convert a single deck to Folio markdown.

# Basic
folio convert deck.pptx

# With client and engagement metadata
folio convert deck.pptx --client Acme --engagement "DD Q1 2026"

# Deep analysis (two-pass, selective re-analysis of dense slides)
folio convert deck.pptx --passes 2

# Force fresh analysis, ignore cache
folio convert deck.pptx --no-cache

# Full metadata
folio convert deck.pptx \
  --client Acme \
  --engagement "DD Q1 2026" \
  --subtype research \
  --industry "retail,ecommerce" \
  --tags "market-sizing,tam" \
  --note "Updated risk figures"

Flags

Flag            Description
--client        Client name (used in output path and frontmatter)
--engagement    Engagement identifier
--note, -n      Version note (e.g. "Updated per client feedback")
--target, -t    Override output directory
--passes, -p    Analysis depth: 1 = standard, 2 = deep (selective second pass on dense slides)
--no-cache      Force re-analysis; fresh results replace cached entries
--subtype       Evidence subtype: research, data_extract, external_report, benchmark
--industry      Industry tags, comma-separated
--tags          Manual tags to merge with auto-generated, comma-separated
--llm-profile   Override the configured LLM profile for this run

folio batch

Batch convert all matching files in a directory.

# Convert all PPTX files in a directory
folio batch ./materials --client Acme

# Convert PDFs instead
folio batch ./pdfs --pattern "*.pdf" --client Acme

# Disable PowerPoint restart automation
folio batch ./materials --no-dedicated-session

Accepts the same flags as convert (--client, --engagement, --passes, --llm-profile, etc.). Default pattern is *.pptx. On macOS with PowerPoint, --dedicated-session (the default) enables periodic restart during long batch runs.

folio status

Show library health -- which decks are current, stale, or missing their source file. When an entity registry exists, status also reports the total entity count.

folio status
folio status Acme        # scope to a client
folio status --refresh   # re-check source hashes

Stale means the source file changed since the last conversion -- re-run folio convert on it. Missing means the source file can no longer be found at the original path.
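
The stale/missing distinction above can be illustrated with a small hash comparison. This is a minimal sketch, not Folio's implementation: the function names, the choice of SHA-256, and the `recorded_hash` argument (the hash stored at the last conversion) are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from typing import Optional


def file_sha256(path: Path) -> str:
    """Hash a file in chunks so large decks are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify_source(path: Path, recorded_hash: Optional[str]) -> str:
    """Return 'missing', 'stale', or 'current' for one source file.

    recorded_hash is hypothetical: the hash saved when the deck was last converted.
    """
    if not path.is_file():
        return "missing"  # source can no longer be found at the original path
    if recorded_hash != file_sha256(path):
        return "stale"    # source changed since the last conversion
    return "current"
```

A deck classified as stale under this scheme is the kind of deck that needs another folio convert run.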

folio scan

Scan configured source roots for new, stale, or missing files.

folio scan
folio scan --scope ClientA

Requires sources entries in folio.yaml (see Configuration).

folio refresh

Re-convert stale decks in the library.

folio refresh
folio refresh --scope ClientA/DD_Q1_2026
folio refresh --all     # re-convert everything in scope, not just stale

folio promote

Promote a deck's curation level (L0 → L1 → L2 → L3).

folio promote <deck_id> L1

Validates required metadata per level (e.g. L1 requires client and tags). Use folio status to find deck IDs.

folio entities

Manage the entity registry -- people, departments, systems, and processes mentioned across your library.

# List all entities, grouped by type
folio entities

# Filter by type or show only unconfirmed
folio entities --type person
folio entities --unconfirmed

# JSON export
folio entities --json

# Show detail for a specific entity
folio entities show "Alice Chen"
folio entities show "Engineering" --type department

# Import from a CSV org chart
folio entities import org_chart.csv

# Confirm or reject auto-extracted entities
folio entities confirm "Jane Smith"
folio entities reject "Jnae Smith"   # typo cleanup

CSV import format -- a name column is required; all others are optional:

Column        Description
name          Person's canonical name (required)
title         Job title
department    Department name (auto-created if new)
reports_to    Manager name (resolved to registry key)
aliases       Semicolon-separated alternate names
client        Associated client

The entity registry is stored as entities.json alongside registry.json in the library root. Imported entities are confirmed automatically; entities extracted during future ingest passes will be marked as needing confirmation.
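
To check a CSV against the column layout above before importing it, a reader along these lines may help. It is a sketch using only the standard library; `read_org_chart` is a hypothetical helper, the sample rows are made up, and skipping rows with an empty name is an assumption rather than documented Folio behavior.

```python
import csv
import io


def read_org_chart(handle):
    """Yield one dict per CSV row, using the documented column names."""
    reader = csv.DictReader(handle)
    if "name" not in (reader.fieldnames or []):
        raise ValueError("CSV must contain a 'name' column")
    for row in reader:
        name = (row.get("name") or "").strip()
        if not name:
            continue  # assumption: rows without a name are skipped
        raw_aliases = row.get("aliases") or ""
        yield {
            "name": name,
            "title": (row.get("title") or "").strip() or None,
            "department": (row.get("department") or "").strip() or None,
            "reports_to": (row.get("reports_to") or "").strip() or None,
            # aliases are semicolon-separated in the documented format
            "aliases": [a.strip() for a in raw_aliases.split(";") if a.strip()],
            "client": (row.get("client") or "").strip() or None,
        }


# Made-up sample data in the documented layout.
sample = io.StringIO(
    "name,title,department,reports_to,aliases,client\n"
    "Alice Chen,VP Engineering,Engineering,,A. Chen;Ali Chen,Acme\n"
    "Bob Lee,,Engineering,Alice Chen,,\n"
)
rows = list(read_org_chart(sample))
```

Running the reader over a file first is a cheap way to catch a missing name column or stray separators before running folio entities import.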

Global flags: --verbose / -v (debug logging), --config / -c (path to folio.yaml)

Output Structure

library/
└── Acme/
    └── dd_q1_2026/
        └── market_overview/
            ├── market_overview.md        # Full markdown with frontmatter
            ├── slides/
            │   ├── slide-001.png
            │   ├── slide-002.png
            │   └── ...
            ├── .analysis_cache.json      # LLM response cache
            ├── .texts_cache.json         # Text extraction cache
            └── version_history.json      # Full version log

Example output (condensed):

---
id: acme_dd_q1_2026_evidence_20260306_market_overview
title: Market Overview
type: evidence
subtype: research
status: active
client: Acme
engagement: DD Q1 2026
version: 2
tags:
- ecommerce
- market-sizing
---

# Market Overview

**Source:** `/materials/market_overview.pptx`
**Version:** 2 | **Converted:** 2026-03-06

---

## Slide 1

![Slide 1](slides/slide-001.png)

### Text (Verbatim)

> Total Addressable Market: $4.2B
> Source: Industry Report 2025

### Analysis

**Slide Type:** data_heavy
**Framework:** TAM/SAM/SOM
**Key Data:** TAM $4.2B, SAM $1.8B, SOM $340M

**Evidence:**
- **TAM figure of $4.2B (high):** "Total Addressable Market: $4.2B" *(title)*

---
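
Because the output is plain Markdown with a YAML frontmatter block, other tools can read it directly. The sketch below splits a converted file into frontmatter and body and parses only flat `key: value` pairs and `- item` lists, which is enough for the condensed example above. It is not a full YAML parser; both function names are hypothetical, and a real script would more likely use a YAML library such as PyYAML's `yaml.safe_load`.

```python
def split_frontmatter(text):
    """Split a converted file into (frontmatter_lines, body_text)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return [], text
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            # The first closing '---' ends the frontmatter; later '---' lines are body rules.
            return lines[1:index], "\n".join(lines[index + 1:])
    return [], text


def parse_simple_frontmatter(lines):
    """Parse flat 'key: value' pairs and '- item' lists. Values stay strings."""
    data = {}
    current_key = None
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("- ") and current_key is not None:
            if not isinstance(data[current_key], list):
                data[current_key] = []
            data[current_key].append(stripped[2:].strip())
            continue
        key, _, value = stripped.partition(":")
        current_key = key.strip()
        data[current_key] = value.strip()
    return data
```

Note that this simplified parser keeps every scalar as a string, so `version` comes back as `"2"` rather than an integer.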

Configuration

Folio looks for folio.yaml by walking up from the current directory. All fields are optional.

# folio.yaml
library_root: ./library              # Where converted decks are written

sources:                             # Optional; organize source directories
  - name: materials
    path: /path/to/source/decks
    target_prefix: ""

llm:
  profiles:
    high_quality_anthropic:
      provider: anthropic
      model: claude-sonnet-4-20250514
      api_key_env: ANTHROPIC_API_KEY
      base_url_env: ANTHROPIC_BASE_URL   # Optional enterprise gateway

    fast_openai:
      provider: openai
      model: gpt-4o-mini
      api_key_env: OPENAI_API_KEY
      base_url_env: OPENAI_BASE_URL      # Optional enterprise gateway

    backup_google:
      provider: google
      model: gemini-2.5-pro
      api_key_env: GEMINI_API_KEY
      base_url_env: GEMINI_BASE_URL      # Optional enterprise gateway

  routing:
    default:
      primary: high_quality_anthropic
      fallbacks: []
    convert:
      primary: high_quality_anthropic
      fallbacks: [backup_google]

conversion:
  image_dpi: 150                     # Slide image resolution (px/in)
  image_format: png
  libreoffice_timeout: 60            # Seconds before conversion times out
  default_passes: 1                  # 1 = standard, 2 = deep
  density_threshold: 2.0             # Pass 2 density trigger
  pptx_renderer: auto                # auto | libreoffice | powerpoint

With no folio.yaml, Folio uses sensible defaults: output goes to ./library, images render at 150 DPI, and analysis runs a single Anthropic-backed pass if ANTHROPIC_API_KEY is set.
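
The "walk up from the current directory" lookup described at the start of this section can be pictured as follows. This is a sketch of the general technique, assuming the nearest folio.yaml wins; `find_config` is a hypothetical name, not a Folio function.

```python
from pathlib import Path
from typing import Optional


def find_config(start: Path, filename: str = "folio.yaml") -> Optional[Path]:
    """Return the nearest config file at or above start, or None if there is none."""
    start = start.resolve()
    for directory in (start, *start.parents):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None  # caller falls back to built-in defaults
```

Calling `find_config(Path.cwd())` from anywhere inside a project tree would return the closest enclosing folio.yaml under these assumptions.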

Environment Variable  Purpose
ANTHROPIC_API_KEY     Anthropic credentials (included in base install)
OPENAI_API_KEY        OpenAI credentials (requires folio-love[llm])
GEMINI_API_KEY        Google Gemini credentials (requires folio-love[llm])
ANTHROPIC_BASE_URL    Optional Anthropic-compatible gateway URL
OPENAI_BASE_URL       Optional OpenAI-compatible gateway URL
GEMINI_BASE_URL       Optional Gemini-compatible gateway URL

Enterprise Gateways and Preflight Warnings

If you route Folio through an enterprise AI gateway, keep the gateway URL in an environment variable and reference it from the profile with base_url_env. If the env var is unset or blank, Folio silently falls back to the SDK default endpoint.
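
The fallback rule for base_url_env can be sketched in a few lines. `resolve_base_url` is a hypothetical helper and the gateway URL in the usage note is a placeholder; returning `None` here stands for "let the SDK use its default endpoint".

```python
import os
from typing import Mapping, Optional


def resolve_base_url(
    profile: Mapping[str, str],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the gateway URL named by base_url_env, or None if unset or blank."""
    var_name = profile.get("base_url_env")
    if not var_name:
        return None  # profile does not reference a gateway variable
    value = env.get(var_name, "").strip()
    return value or None  # blank counts as unset
```

For example, with `ANTHROPIC_BASE_URL=https://gateway.example.com` exported, a profile whose base_url_env is ANTHROPIC_BASE_URL would resolve to that URL; with the variable unset or whitespace-only, the result is `None`.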

Folio now runs a warning-only model preflight once per selected profile per conversion run. This checks whether the configured model appears usable before the first expensive pass. The probe is bounded and uses the same runtime guardrails as normal model calls. A warning does not block conversion; it simply surfaces blocked or unavailable models earlier.

Scanned and Image-Only PDFs

When a deck has no extractable text, Folio records text validation as unavailable rather than treating evidence validation as failed. Those decks still surface review flags, but a scanned source alone no longer triggers the old blanket 0.59 confidence cap.

Oversized PDF Page Fallback

Large architecture diagrams and poster-sized PDF pages can exceed Pillow safety limits at the requested DPI. Folio now backs off DPI per page before hitting that limit. If a page still cannot be rendered safely, conversion fails with a specific oversized-image error instead of a generic rendering failure.
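
A per-page DPI back-off of this kind might look like the sketch below. The 10% step, the `min_dpi` floor, and the function names are illustrative assumptions, and Folio's actual limit and strategy may differ. `MAX_PIXELS` is set to Pillow's default `Image.MAX_IMAGE_PIXELS` value; page sizes are in PDF points (1/72 inch).

```python
import math

# Pillow's default Image.MAX_IMAGE_PIXELS; the limit Folio enforces may differ.
MAX_PIXELS = 89_478_485


def pixel_count(width_pt: float, height_pt: float, dpi: float) -> int:
    """Pixels needed to render a page of the given size (in points) at dpi."""
    return math.ceil(width_pt / 72 * dpi) * math.ceil(height_pt / 72 * dpi)


def safe_dpi(width_pt, height_pt, requested_dpi=150, min_dpi=36, max_pixels=MAX_PIXELS):
    """Step the DPI down by 10% until the page fits under max_pixels.

    min_dpi and the 10% step are hypothetical tuning values.
    Raises ValueError if the page does not fit even at min_dpi.
    """
    dpi = float(requested_dpi)
    while dpi >= min_dpi:
        if pixel_count(width_pt, height_pt, dpi) <= max_pixels:
            return int(dpi)  # truncation only lowers the DPI, so the page still fits
        dpi *= 0.9
    raise ValueError(
        f"page of {width_pt}x{height_pt} pt exceeds {max_pixels} pixels even at {min_dpi} DPI"
    )
```

An ordinary US Letter page (612 x 792 pt) passes at the requested DPI on the first check, so the back-off only affects unusually large pages.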

OpenAI GPT-5 Compatibility

GPT-5 OpenAI chat models use a slightly different request shape from GPT-4.x and GPT-4o. Folio handles that automatically by using max_completion_tokens and omitting temperature for gpt-5* models while preserving the existing request shape for non-GPT-5 models.
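
The difference in request shape comes down to which keyword arguments are sent. A minimal sketch, assuming a hypothetical `build_chat_kwargs` helper and made-up defaults for the token limit and temperature:

```python
def build_chat_kwargs(model, messages, max_tokens=1024, temperature=0.0):
    """Build chat-completion keyword arguments for the given OpenAI model name."""
    kwargs = {"model": model, "messages": messages}
    if model.startswith("gpt-5"):
        # gpt-5* models: use max_completion_tokens and send no temperature
        kwargs["max_completion_tokens"] = max_tokens
    else:
        # Non-GPT-5 models keep the existing request shape
        kwargs["max_tokens"] = max_tokens
        kwargs["temperature"] = temperature
    return kwargs
```

Keeping the branch in one place means the rest of the pipeline can stay unaware of which model family a profile selects.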

How It Works

Input (.pptx/.ppt/.pdf)
  │
  ├─ Normalize ──→ Convert to PDF via LibreOffice or PowerPoint
  │
  ├─ Images ─────→ Extract slide images, detect blank slides
  │
  ├─ Text ───────→ Extract structured text per slide, reconcile count
  │
  ├─ Analysis ───→ LLM classification + evidence extraction (cached)
  │                 Pass 2: selective re-analysis of dense slides
  │
  ├─ Tracking ───→ Version detection, per-slide change diffing
  │
  └─ Assembly ───→ YAML frontmatter + Markdown output (atomic write)

Each stage is independent and testable. LLM analysis results are cached per-slide -- re-conversion only re-analyzes changed slides. Blank slides are detected via image histogram analysis and excluded from deep analysis.
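
Per-slide caching of this kind is commonly keyed on a hash of the slide's content, so unchanged slides hit the cache and only changed ones are re-analyzed. The sketch below shows that idea; the key scheme, the in-memory dict, and the `analyze` callable are assumptions and do not describe the actual format of .analysis_cache.json.

```python
import hashlib


def slide_key(text: str, image_bytes: bytes) -> str:
    """Content hash for one slide; any change to text or image yields a new key."""
    digest = hashlib.sha256()
    digest.update(text.encode("utf-8"))
    digest.update(b"\0")  # separator so text and image bytes cannot run together
    digest.update(image_bytes)
    return digest.hexdigest()


def analyze_with_cache(slides, cache, analyze):
    """Analyze slides, calling analyze() only for content not already cached.

    slides:  list of (text, image_bytes) tuples
    cache:   dict mapping slide_key -> analysis result (mutated in place)
    analyze: hypothetical callable standing in for the LLM call
    """
    results = []
    for text, image_bytes in slides:
        key = slide_key(text, image_bytes)
        if key not in cache:
            cache[key] = analyze(text, image_bytes)
        results.append(cache[key])
    return results
```

Under this scheme, bypassing the cache (as --no-cache does) would amount to starting from an empty dict and overwriting the stored entries.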

Version Tracking

Re-converting an updated deck increments the version and records which slides were added, modified, or removed.

folio convert deck.pptx --note "Updated risk figures"
✓ deck.pptx
  24 slides → library/deck/deck.md
  Version: 2 | ID: evidence_20260306_deck
  Modified: slides 3, 7, 12
  Added: slides 24

Use folio status to find stale decks, meaning decks whose source file has changed since the last conversion.

Version history is recorded in both the markdown output and version_history.json:

Version  Date        Changes              Note
v2       2026-03-06  3 modified, 1 added  Updated risk figures
v1       2026-03-01  Initial (23 slides)  --
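
The added/modified/removed breakdown can be derived by comparing per-slide content hashes between two versions. The sketch below uses the simplest possible approach, a positional comparison by slide number; this is an assumption for illustration, and Folio's actual change detection may match slides differently (for example, to handle reordering).

```python
def diff_slides(old_hashes, new_hashes):
    """Compare two versions slide by slide (1-based slide numbers).

    old_hashes / new_hashes: lists of per-slide content hashes, in slide order.
    """
    shared = min(len(old_hashes), len(new_hashes))
    modified = [i + 1 for i in range(shared) if old_hashes[i] != new_hashes[i]]
    added = list(range(shared + 1, len(new_hashes) + 1))
    removed = list(range(shared + 1, len(old_hashes) + 1))
    return {"modified": modified, "added": added, "removed": removed}


# Reproduce the shape of the example above: 23 slides become 24,
# with slides 3, 7, and 12 edited.
old = [f"hash-{n}" for n in range(1, 24)]
new = list(old)
for slide_number in (3, 7, 12):
    new[slide_number - 1] = f"edited-{slide_number}"
new.append("hash-24")
changes = diff_slides(old, new)
```

A positional diff reports an inserted slide in the middle of a deck as a run of modifications, which is the main limitation of this simplified version.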

Development

python3 -m venv .venv
.venv/bin/python -m pip install --upgrade pip
.venv/bin/python -m pip install -e ".[dev]"
.venv/bin/python -m pytest tests/ -v
.venv/bin/python -m pytest --cov=folio

The test suite depends on dev-only packages such as python-pptx and reportlab, so run it from the project virtualenv after installing .[dev] rather than from an arbitrary system Python.

folio/
├── cli.py              # Click CLI (convert, batch, status, scan, refresh, promote, entities)
├── config.py           # FolioConfig + folio.yaml loading
├── converter.py        # Pipeline orchestrator
├── entity_import.py    # CSV org chart → entity registry
├── pipeline/
│   ├── normalize.py    # PPTX/PPT → PDF
│   ├── images.py       # PDF → slide images + blank detection
│   ├── text.py         # Structured text extraction + reconciliation
│   └── analysis.py     # LLM analysis + caching
├── output/
│   ├── frontmatter.py  # YAML frontmatter (v2 schema)
│   └── markdown.py     # Markdown assembly
└── tracking/
    ├── entities.py     # Entity registry (people, departments, systems, processes)
    ├── registry.py     # Deck registry + atomic JSON writes
    ├── sources.py      # Source file tracking + staleness
    └── versions.py     # Version detection + change sets

Roadmap

  • Entity resolution at ingest time -- automatically match names mentioned in slides to the entity registry (PR B, planned).
  • Search and retrieval (folio search) -- not yet implemented. Today, converted decks are searchable via Obsidian, grep, or any tool that reads Markdown + YAML frontmatter.

License

Apache 2.0
