# minutes

Distill any conversation into structured knowledge.
Local-first CLI for extracting decisions, ideas, questions, action items, concepts, and key terms from any conversation transcript. Works with Claude Code sessions, meeting transcripts, plain text, and markdown. Runs entirely offline with a local LLM—nothing leaves your machine.
## Quick Start

```bash
# Using uv (recommended)
uv tool install take-minutes

# Or with pip
pip install take-minutes

# Run it
minutes process my-session.jsonl
```
See the output in `./output/`:
- Markdown file with structured knowledge
- SQLite index for future searches
- Optionally: semantic embeddings for cross-session discovery
## What It Extracts

| Category | Description | Example |
|---|---|---|
| Decisions | What was decided and why | "Use SQLite instead of PostgreSQL for MVP (reason: no schema migrations needed)" |
| Ideas | Concepts, suggestions, opportunities | "Implement quiet hours to prevent 2am notifications" |
| Questions | Open issues needing resolution | "What's the deployment target—Raspberry Pi or cloud?" |
| Action Items | Tasks assigned with owners | "Write health endpoint monitor (owner: ops-pragmatist, deadline: Phase 1)" |
| Concepts | Key technical or business ideas | "3-tiered autonomy model: tier 1 (audit), tier 2 (approval), tier 3 (confirmation)" |
| Terms | Abbreviations, jargon, domain terms | "EDA = Event-Driven Architecture" |
## Prerequisites

minutes uses a local LLM for extraction. You need an OpenAI-compatible inference endpoint:

- Ollama: ollama.com — simple local LLM runner
- LM Studio: lmstudio.ai — GUI-based local inference
- vLLM: docs.vllm.ai — high-throughput serving engine
- OpenAI API or any OpenAI-compatible provider (set the `GATEWAY_URL` env var)

For best results, use a 4B–7B model (e.g., Qwen 2.5 7B, Llama 3 8B).
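If you don't already have an endpoint, Ollama is the quickest path. A minimal sketch, assuming Ollama's defaults (OpenAI-compatible API on port 11434) and an example model tag; whether `GATEWAY_MODEL` accepts an Ollama tag directly is an assumption, so check `minutes config` after setting it:

```bash
# Pull a model once (the tag is an example; any local chat model works)
ollama pull qwen2.5:7b

# Ollama exposes an OpenAI-compatible API at http://localhost:11434/v1
export GATEWAY_URL=http://localhost:11434/v1
export GATEWAY_MODEL=qwen2.5:7b

minutes process session.jsonl
```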
## Installation

### Core (extraction only)

```bash
# Using uv (recommended)
uv tool install take-minutes

# Or with pip
pip install take-minutes
```

### With semantic search

```bash
# Using uv
uv tool install "take-minutes[search]"

# Or with pip
pip install "take-minutes[search]"
```

### One-line setup (downloads embedding model)

```bash
minutes setup
```
The setup command pre-downloads the embedding model (~420MB) so you aren't surprised by a mid-run download.
## Usage

### Process a single file

```bash
# Extract from Claude Code session
minutes process session.jsonl

# Extract from meeting transcript
minutes process meeting.txt -o ./my-minutes

# Skip deduplication check (force reprocess)
minutes process session.jsonl --no-dedup

# Verbose output for debugging
minutes process session.jsonl -v
```

Output:

- `output/YYYY-MM-DD-HH-MM-SS.md` — structured knowledge in markdown
- `output/minutes.db` — SQLite index with full-text search
- `output/sessions.json` — metadata log for easy inspection
### Batch process Claude Code sessions

Scan `~/.claude/projects/` and extract from all main-thread sessions:

```bash
# Process sessions from last 2 weeks, sorted by date (newest first)
minutes batch

# Filter by project key (substring match)
minutes batch --project persistence

# Change time range (ISO date or relative: 7d, 2w, 1m)
minutes batch --since 2w --sort size

# Dry run: show what would be processed
minutes batch --dry-run --min-size 100KB

# Skip embedding generation
minutes batch --no-embed
```

Output:

- `~/.claude/minutes/{project_key}/` — project-specific minutes
- `~/.claude/minutes/{project_key}/minutes.db` — indexed extractions
### Search across all processed sessions

```bash
# Keyword search
minutes search "budget decision"

# Filter by category (decision, idea, question, action_item, concept, term)
minutes search "authentication" --category decision

# Vector (semantic) search
minutes search "how do we handle failures?" --mode vector

# Hybrid (keyword + vector) search
minutes search "persistence strategy" --mode hybrid --limit 5

# Search specific project
minutes search "deployment" --project persistence
```

Returns ranked results with scores, context, and the source session.
### View configuration

```bash
minutes config
minutes config --env
```

Shows active settings: gateway URL, model, output directory, chunking parameters.
## Supported Formats

- Claude Code JSONL (native) — `~/.claude/projects/*/session.jsonl`
- Plain text / Markdown — conversation transcripts, meeting notes
- Coming soon: ChatGPT export, Codex CLI, Cline, Cursor

Auto-detection: the `process` command detects the format by file extension or content. Override with `--format`:

```bash
minutes process transcript.srt --format text
```
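SRT files carry cue numbers and timestamps that add noise to extraction. If that cruft confuses the model, you can strip it first with standard tools; a rough sketch (the filenames are illustrative):

```bash
# Drop SRT cue numbers (lines that are only digits) and timestamp lines,
# then remove the blank separators, leaving only the spoken text
grep -vE '^[0-9]+$|-->' transcript.srt | sed '/^$/d' > transcript.txt
minutes process transcript.txt
```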
## Configuration

Create a `.env` file in your working directory:

```bash
# LLM backend
GATEWAY_URL=http://localhost:8800/v1
GATEWAY_MODEL=qwen3-4b

# Output
OUTPUT_DIR=./minutes-output/

# Chunking (for long transcripts)
MAX_CHUNK_SIZE=12000   # tokens per chunk
CHUNK_OVERLAP=200      # token overlap between chunks

# Retry logic
MAX_RETRIES=3

# Glossary (optional YAML file for cross-referencing)
GLOSSARY_PATH=./glossary.yaml

# Prompts (optional file paths for custom extraction prompts)
SYSTEM_PROMPT=./prompts/system.txt
EXTRACTION_PROMPT=./prompts/extraction.txt

# Debug
VERBOSE=true
```

View the active config:

```bash
minutes config
```
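The glossary file's format isn't documented in this README; as a hypothetical sketch, a flat term-to-definition mapping is the simplest YAML shape that fits the cross-referencing described above:

```yaml
# glossary.yaml (hypothetical layout; check the project's docs for the real schema)
EDA: Event-Driven Architecture
SRE: Site Reliability Engineering
NIST AI RMF: NIST AI Risk Management Framework
```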
## Architecture

```
Input (JSONL / TXT / MD)
        ↓
Parser (extract messages/text)
        ↓
Chunker (split into LLM-friendly chunks)
        ↓
LLM Extraction (local model extracts structured data)
        ↓
Deduplication (fuzzy match across chunks)
        ↓
SQLite Index (FTS5 full-text search)
        ↓
Embeddings (sentence-transformers, optional)
        ↓
Semantic Search (FAISS + embeddings)
```
## How It Works

1. **Parse**: Read the input file (JSONL, plaintext, markdown) and extract dialogue or transcript text
2. **Chunk**: Split text into overlapping chunks to fit the LLM context window (~12K tokens by default)
3. **Extract**: Send each chunk to the local LLM with a structured schema; collect decisions, ideas, questions, action items, concepts, terms
4. **Deduplicate**: Fuzzy-match extracted items across chunks; keep unique items
5. **Index**: Store in SQLite with FTS5 indexing for keyword search; optionally generate embeddings for semantic/hybrid search

Output is a single markdown file with all extractions, plus a queryable SQLite database.
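The FTS5 index is what powers keyword search. minutes' actual table schema isn't documented here, but the mechanism can be sketched with plain `sqlite3` (table and column names below are invented for the demo):

```bash
sqlite3 :memory: <<'SQL'
-- An FTS5 virtual table: every column is full-text indexed
CREATE VIRTUAL TABLE items USING fts5(category, text);
INSERT INTO items VALUES ('decision', 'Use SQLite instead of PostgreSQL for the MVP');
INSERT INTO items VALUES ('idea', 'Implement quiet hours to prevent 2am notifications');
-- MATCH is case-insensitive with the default tokenizer
SELECT category || ': ' || text FROM items WHERE items MATCH 'sqlite';
SQL
# → decision: Use SQLite instead of PostgreSQL for the MVP
```

A `--category` filter like the one in `minutes search` maps naturally onto a column such as `category` here; hybrid mode then layers vector scores on top of these keyword matches.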
## Example Workflow

Scenario: strategic planning sessions

1. Extract from a session:

   ```bash
   minutes process 2026-02-22-strategy.jsonl -o ./strategy-minutes
   ```

2. Review the markdown:

   ```bash
   cat ./strategy-minutes/2026-02-22-07-32-38.md
   ```

3. Cross-reference with a glossary (if provided):

   ```bash
   # Your GLOSSARY_PATH contains definitions for SRE, NIST AI RMF, EU AI Act
   # Output markdown shows which extracted concepts are in the glossary vs unknown
   ```

4. Batch process all recent sessions:

   ```bash
   minutes batch --since 1m --project strategy --sort date
   ```

5. Search for related decisions across all sessions:

   ```bash
   minutes search "idempotency" --category decision --mode hybrid
   ```
## Requirements

- Python 3.10+
- Local LLM running at `GATEWAY_URL` (default: `http://localhost:8800/v1`)
- Optional: Sentence Transformers for semantic search (`pip install "take-minutes[search]"`)
## Tips

- Cold start: the first extraction takes longer due to model loading. Subsequent runs are faster.
- Large transcripts: long inputs are chunked automatically; adjust `MAX_CHUNK_SIZE` if needed.
- Private data: all processing is local; nothing is sent to external APIs (unless you configure `GATEWAY_URL` to point at an external provider).
- Incremental indexing: reprocessing the same file is skipped unless you use `--no-dedup`.
- Searching: use `--mode hybrid` for best results (it combines keyword and semantic search).
## License

MIT