Document intelligence CLI - Transform documents into structured knowledge graphs

Kurt

AI-powered writing assistant for B2B marketing and technical content

Kurt helps B2B marketers and content teams create accurate, grounded content using AI. It works with Claude Code or Cursor to produce blog posts, product pages, documentation, positioning docs, and more—all backed by your source material and guided by customizable templates.

What Kurt Does

  • 📝 Template-Driven Writing: 22 built-in templates for common B2B content (blog posts, product pages, docs, positioning, campaign briefs, etc.)
  • 🔍 Source-Grounded: Fetches content from your website, docs, or CMS to use as factual grounding
  • 🎯 Content Discovery: Analyzes your content to find topics, technologies, and coverage gaps
  • 🔬 Research Integration: Search Reddit and HackerNews, or query Perplexity, for competitive intelligence
  • 📤 CMS Publishing: Publish directly to Sanity (more CMSes coming soon)

Who It's For

  • B2B Marketers creating product pages, blog posts, and campaign materials
  • Content Teams managing documentation, tutorials, and guides
  • Product Marketers writing positioning docs, launch plans, and messaging frameworks
  • Developer Advocates creating technical content and integration guides

Prerequisites

Before getting started, you'll need:

Required

  • Python 3 - With uv or pip for installation
  • OpenAI API Key - For content indexing and metadata extraction:
    • Set as OPENAI_API_KEY in your .env file
    • Get your API key from: https://platform.openai.com

Optional

  • Firecrawl API Key - For advanced web scraping:
    • Handles JavaScript and dynamic content
    • Best for modern SPAs and interactive sites
    • Get your API key from: https://firecrawl.dev
    • Alternative: Kurt falls back to Trafilatura (free, fast, but no JS rendering)

Quick Start

Install Kurt

  1. Install Kurt CLI:

    # Using uv (recommended)
    uv tool install kurt-core
    
    # Or using pip
    pip install kurt-core
    
  2. Initialize your project:

    cd your-project-directory
    kurt init  # Installs both Claude Code and Cursor support by default
    

    This creates:

    • .kurt/ directory with SQLite database
    • .claude/ directory with Claude Code instructions
    • .cursor/ directory with Cursor rules
    • kurt/ directory with all 22 content templates (shared by both IDEs)
    • .env.example with API key placeholders

    Note: You can also install for a specific IDE only:

    • kurt init --ide claude - Claude Code only
    • kurt init --ide cursor - Cursor only
  3. Configure API keys:

    cp .env.example .env
    # Edit .env and add your API keys
    

    Required:

    • OPENAI_API_KEY - For content indexing and metadata extraction

    Optional:

    • FIRECRAWL_API_KEY - For web scraping with JavaScript support (falls back to Trafilatura)

Use with Your IDE

  1. Start creating content:

    With Claude Code:

    • Open your project in Claude Code
    • Claude automatically loads Kurt's instructions from .claude/
    • Ask Claude: "Create a blog post project about [topic]"
    • Claude will guide you through template selection, source gathering, and writing
    • See .claude/CLAUDE.md for full workflow details

    With Cursor:

    • Open your project in Cursor
    • Cursor automatically loads Kurt's rules from .cursor/rules/
    • Mention @add-profile to create your content profile
    • Mention @add-project to start a new writing project
    • See .cursor/rules/kurt-main.mdc for full workflow details

    Switch between IDEs anytime - both share the same database and templates!

Use Kurt CLI Standalone

For developers or those who want to use Kurt without an AI editor:

# Initialize project
kurt init

# Fetch content from a website
kurt content map url https://example.com          # Discover URLs
kurt content fetch --url-prefix https://example.com/  # Download content

# List and search content
kurt content list
kurt content search "topic keyword"

# Discover topics and gaps
kurt content list-entities topic
kurt content list-entities technology

# Research
kurt integrations research search "market research question"

See CLI Reference below for full command documentation.


Key Features

✨ Content Templates

Kurt includes 22 templates for common B2B content types:

Internal Strategy:

  • Positioning + Messaging
  • ICP Segmentation
  • Persona Segmentation
  • Campaign Brief
  • Launch Plan

Public Marketing Content:

  • Blog Posts (Thought Leadership)
  • Product Pages
  • Solution Pages
  • Homepage
  • Integration Pages

Documentation:

  • Tutorials & Guides
  • API Documentation
  • Technical Documentation

Email & Social:

  • Marketing Emails
  • Drip Email Sequences
  • Product Update Newsletters
  • Social Media Posts

Specialized:

  • Video Scripts
  • Podcast Interview Plans

All templates are customizable and include:

  • Style guidelines (tone, voice, examples)
  • Source requirements (what content to gather)
  • Structure templates (format and organization)
  • Research workflows (how to find information)

See templates in src/kurt/claude_plugin/kurt/templates/

🌐 Content Ingestion

Fetch content from web sources to use as grounding material:

Configuration (edit kurt.config in your project root):

# Scraping engine - choose based on your content source
# Options:
#   - trafilatura (default): Fast, free, static HTML only
#   - firecrawl: Handles JavaScript/SPAs (requires FIRECRAWL_API_KEY in .env)
#   - httpx: Proxy-friendly alternative to trafilatura
INGESTION_FETCH_ENGINE="trafilatura"

# LLM models for content analysis (format: provider/model-name)
# Alternatives: "anthropic/claude-3-haiku", "google/gemini-1.5-flash", "groq/llama-3.1-8b"
INDEXING_LLM_MODEL="openai/gpt-4o-mini"              # For metadata extraction
EMBEDDING_MODEL="openai/text-embedding-3-small"      # For embeddings

API Keys (add to .env file):

OPENAI_API_KEY=your_key_here           # Required for OpenAI models
FIRECRAWL_API_KEY=your_key_here        # Optional, for Firecrawl scraping
ANTHROPIC_API_KEY=your_key_here        # Optional, for Claude models
GOOGLE_API_KEY=your_key_here           # Optional, for Gemini models
Example fetch workflow:

# Map sitemap to discover URLs (fast, no downloads, no LLM calls)
kurt content map url https://docs.example.com

# Fetch specific content: get content + extract metadata with LLM calls
kurt content fetch --url-prefix https://docs.example.com/guides/

# Fetch by URL pattern
kurt content fetch --url-contains /blog/

# Fetch all discovered URLs
kurt content fetch --all

Content is stored as markdown in sources/{domain}/{path}/ with metadata in SQLite.
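The sources/{domain}/{path}/ layout maps each fetched URL onto a filesystem path. A minimal sketch of that mapping using only the standard library (the exact slug and index-page rules Kurt applies may differ; this function is illustrative):

```python
from urllib.parse import urlparse
from pathlib import Path

def source_path(url: str, root: str = "sources") -> Path:
    """Illustrative mapping of a fetched URL to a markdown file path.

    Mirrors the documented sources/{domain}/{path}/ layout; Kurt's
    actual path rules may differ in detail.
    """
    parsed = urlparse(url)
    # Strip the trailing slash so /guides/auth/ and /guides/auth agree;
    # an empty path (the site root) becomes "index"
    path = parsed.path.strip("/") or "index"
    return Path(root) / parsed.netloc / f"{path}.md"

print(source_path("https://docs.example.com/guides/auth/"))
# e.g. sources/docs.example.com/guides/auth.md
```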

🔍 Content Discovery & Gap Analysis

Kurt indexes your content to help you find gaps and plan new content:

# See all topics covered in your content
kurt content list-entities topic

# See all technologies documented
kurt content list-entities technology

# Find all docs about a specific entity
kurt content list --with-entity "Topic:authentication"
kurt content list --with-entity "Technology:Python"

# Find docs with specific relationships
kurt content list --with-relationship integrates_with

# Search for content
kurt content search "API integration"

# Filter by content type
kurt content list --with-content-type tutorial

This powers gap analysis workflows where you can:

  • Compare your content vs competitors' coverage
  • Identify topics with low documentation
  • Find technologies that need more examples
  • Plan tutorial topics based on what's missing
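Under the hood, this kind of gap analysis reduces to counting entity occurrences across indexed documents. A hedged sketch of the idea (the document dicts and threshold here are invented for illustration, loosely following the primary_topics field Kurt extracts during indexing):

```python
from collections import Counter

# Illustrative indexed documents; in Kurt these rows live in SQLite,
# with topics stored as JSON in the primary_topics column.
docs = [
    {"title": "Auth guide", "primary_topics": ["authentication", "security"]},
    {"title": "API tutorial", "primary_topics": ["api", "authentication"]},
    {"title": "Webhooks", "primary_topics": ["api", "webhooks"]},
]

topic_counts = Counter(t for d in docs for t in d["primary_topics"])

# Topics mentioned in only one document are candidates for more coverage
gaps = [topic for topic, n in topic_counts.items() if n < 2]
print(topic_counts.most_common(2))   # most-covered topics
print(gaps)                          # thinly covered topics
```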

🔬 Research Integration

Built-in research capabilities for competitive intelligence and market research:

# Query Perplexity for research
kurt integrations research search "B2B SaaS pricing trends 2024"

Requires API keys (configured in .env). See CLAUDE.md for setup.

📤 Publishing

Publish directly to your CMS:

# Configure Sanity CMS
kurt integrations cms onboard --platform sanity

# Publish content
kurt integrations cms publish --file content.md --content-type blog-post

Currently supports Sanity. More CMSes coming soon.


How It Works

Kurt follows a 3-step content creation process:

1. Project Planning

  • Create a project for your content initiative
  • Select format templates (blog post, product page, etc.)
  • Gather sources (fetch web content, research competitors, collect docs)
  • Optional: Conduct research using integrated tools

2. Writing

  • AI (Claude) drafts content using your templates and sources
  • All claims are grounded in source material (no hallucinations)
  • Content follows your company's style guidelines
  • Outline → Draft → Edit workflow

3. Publishing

  • Review and refine content
  • Publish to CMS or export as markdown
  • Track sources and maintain traceability

All work is organized in /projects/{project-name}/ directories with a plan.md tracking progress.


CLI Reference

Project Setup

# Initialize new Kurt project (installs both Claude Code and Cursor support by default)
kurt init

# Or install for a specific IDE only
kurt init --ide claude   # Claude Code only
kurt init --ide cursor   # Cursor only

# Initialize with custom database path
kurt init --db-path data/my-project.db

# What gets created by default:
# - .kurt/ directory with SQLite database
# - .claude/ directory with Claude Code instructions
# - .cursor/ directory with Cursor rules
# - kurt/ directory with 22 content templates (shared by both)
# - .env.example with API key placeholders
#
# Both IDEs share the same database and templates!

Content Ingestion

Map-Then-Fetch Workflow (recommended):

# 1. Discover URLs from sitemap (fast, creates NOT_FETCHED records)
kurt content map url https://example.com

# 2. Review discovered URLs
kurt content list --status NOT_FETCHED

# 3. Fetch content (batch or selective)
kurt content fetch --url-prefix https://example.com/     # All from domain
kurt content fetch --url-contains /blog/                 # URLs containing pattern
kurt content fetch --all                                 # All NOT_FETCHED docs
kurt content fetch https://example.com/page              # Single URL

# Options
kurt content fetch --url-prefix https://example.com/ --max-concurrent 10  # Parallel downloads
kurt content fetch --url-prefix https://example.com/ --status ERROR       # Retry failed

Direct Fetch:

# Fetch single URL directly (auto-creates document if doesn't exist)
kurt content fetch https://example.com/page

Content Discovery

# List all content
kurt content list
kurt content list --status FETCHED --limit 20

# Get specific document
kurt content get <document-id>

# Search content
kurt content search "keyword"

# Discover topics and technologies
kurt content list-entities topic
kurt content list-entities technology
kurt content list-entities topic --min-docs 5            # Only topics in 5+ docs
kurt content list-entities topic --include "*/docs/*"    # Filter by path

# Filter by metadata
kurt content list --with-content-type tutorial
kurt content list --in-cluster "Tutorials"

# Statistics
kurt content stats

Content Indexing

# Index content to extract metadata (topics, technologies, content types)
kurt content index --all

# Index specific documents
kurt content index --url-prefix https://example.com/

# Re-index (if content changed)
kurt content index --force

Research

# Search using Perplexity
kurt integrations research search "your research question"

# Monitor Reddit discussions
kurt integrations research reddit -s dataengineering --timeframe day
kurt integrations research reddit -s "datascience+machinelearning" --keywords "api,tools"

# Monitor HackerNews
kurt integrations research hackernews --timeframe day
kurt integrations research hackernews --keywords "API,developer tools" --min-score 50

CMS Integration

# Configure CMS
kurt integrations cms onboard --platform sanity

# Publish content
kurt integrations cms publish --file content.md --content-type blog-post

Analytics Integration

# Configure analytics (PostHog)
kurt integrations analytics onboard your-domain.com --platform posthog

# Sync analytics data
kurt integrations analytics sync your-domain.com

# View content with analytics
kurt content list --with-analytics

Advanced Features

Content Clustering:

# Organize documents into topic clusters
kurt content cluster

# List all clusters
kurt content list-clusters

Document Links:

# Show links from/to a document
kurt content links <document-id>

Metadata Sync:

# Update file frontmatter from database
kurt content sync-metadata

Delete Content:

# Delete documents
kurt content delete <document-id>
kurt content delete --url-prefix https://example.com/

Background Workflows

# List background workflows
kurt workflows list

# Check workflow status
kurt workflows status <workflow-id>

# Follow workflow progress
kurt workflows follow <workflow-id>

# Cancel a workflow
kurt workflows cancel <workflow-id>

Administrative Commands

# Check project status
kurt status

# Manage telemetry
kurt admin telemetry status
kurt admin telemetry disable
kurt admin telemetry enable

# Database migrations
kurt admin migrate upgrade
kurt admin migrate downgrade


For Developers

Installation for Development

# Clone repository
git clone https://github.com/yourusername/kurt-core.git
cd kurt-core

# Install with uv (recommended)
uv sync

# Or with pip
pip install -e .

Running Tests

# Install test dependencies
uv sync --extra eval

# Run evaluation scenarios
uv run kurt-eval list
uv run kurt-eval run 01_basic_init
uv run kurt-eval run-all

Kurt-Eval

Test framework for validating Kurt's AI agent behavior using Claude:

# Configure
cp eval/.env.example eval/.env
# Add your ANTHROPIC_API_KEY to eval/.env

# List test scenarios
uv run kurt-eval list

# Run specific scenario
uv run kurt-eval run 01_basic_init

# Run all scenarios
uv run kurt-eval run-all

# View results
cat eval/results/01_basic_init_*.json

Available test scenarios:

  • 01_basic_init - Initialize a Kurt project
  • 02_add_url - Initialize and add content from a URL
  • 03_interactive_project - Multi-turn conversation with user agent
  • 04_with_claude_plugin - Test with Claude plugin integration

See eval/scenarios/ for scenario definitions.

Architecture

Content Storage:

  • Metadata stored in SQLite (Document table)
  • Content stored as markdown files in sources/{domain}/{path}/
  • Metadata extracted with Trafilatura and LLM-based indexing

Database Schema:

CREATE TABLE documents (
    id TEXT PRIMARY KEY,              -- UUID
    title TEXT NOT NULL,
    source_type TEXT,                 -- URL, FILE_UPLOAD, API
    source_url TEXT UNIQUE,
    content_path TEXT,                -- Relative path to markdown file
    ingestion_status TEXT,            -- NOT_FETCHED, FETCHED, ERROR
    content_hash TEXT,                -- Trafilatura fingerprint
    description TEXT,
    author JSON,
    published_date DATETIME,
    categories JSON,
    language TEXT,

    -- Indexed metadata (from LLM)
    content_type TEXT,                -- tutorial, guide, blog, reference, etc.
    primary_topics JSON,              -- List of topics
    tools_technologies JSON,          -- List of tools/technologies
    has_code_examples BOOLEAN,
    has_step_by_step_procedures BOOLEAN,
    has_narrative_structure BOOLEAN,
    indexed_with_hash TEXT,
    indexed_with_git_commit TEXT,

    created_at DATETIME,
    updated_at DATETIME
);
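Because the metadata lives in a plain SQLite file under .kurt/, you can query it directly with the standard library. A minimal sketch using an in-memory database and a subset of the columns above (substitute your actual .kurt/ database path in practice):

```python
import json
import sqlite3

# In-memory database for illustration; point this at your .kurt/ db file
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE documents (
        id TEXT PRIMARY KEY,
        title TEXT NOT NULL,
        content_type TEXT,
        primary_topics JSON
    )"""
)
conn.execute(
    "INSERT INTO documents VALUES (?, ?, ?, ?)",
    ("doc-1", "Auth guide", "tutorial", json.dumps(["authentication"])),
)

# Find every tutorial and decode its JSON topic list
rows = conn.execute(
    "SELECT title, primary_topics FROM documents WHERE content_type = 'tutorial'"
).fetchall()
for title, topics in rows:
    print(title, json.loads(topics))
```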

Batch Fetching:

  • Uses httpx with async/await for parallel downloads
  • Semaphore-based concurrency control (default: 5 concurrent)
  • Graceful error handling (continues on individual failures)
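The pattern described above (a semaphore capping parallel downloads, with per-URL error handling so one failure doesn't abort the batch) can be sketched with the standard library alone. Kurt's real implementation uses httpx; fetch_one here is a stand-in for an HTTP GET:

```python
import asyncio

async def fetch_one(url: str) -> str:
    # Stand-in for an httpx GET; real code would await client.get(url)
    await asyncio.sleep(0.01)
    if "bad" in url:
        raise ValueError(f"failed: {url}")
    return f"content of {url}"

async def fetch_all(urls: list[str], max_concurrent: int = 5) -> dict[str, str]:
    sem = asyncio.Semaphore(max_concurrent)  # cap parallel downloads

    async def bounded(url: str):
        async with sem:
            try:
                return url, await fetch_one(url)
            except Exception as exc:
                # Graceful handling: record the error and keep going
                return url, f"ERROR: {exc}"

    results = await asyncio.gather(*(bounded(u) for u in urls))
    return dict(results)

results = asyncio.run(fetch_all(["https://a.example", "https://bad.example"]))
```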

Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new features
  4. Submit a pull request

Telemetry

Kurt collects anonymous usage analytics to help us improve the tool. We take privacy seriously.

What We Collect

  • Command usage (e.g., kurt content list)
  • Execution metrics (timing, success/failure rates)
  • Environment (OS, Python version, Kurt version)
  • Anonymous machine ID (UUID, not tied to personal info)

What We DON'T Collect

  • Personal information (names, emails)
  • File paths or URLs
  • Command arguments or user data
  • Any sensitive information

How to Opt-Out

# Use the CLI command
kurt admin telemetry disable

# Or set an environment variable (either one works)
export DO_NOT_TRACK=1
export KURT_TELEMETRY_DISABLED=1

# Check status
kurt admin telemetry status

All telemetry is:

  • Anonymous: No personal information collected
  • Transparent: Clearly documented what we collect
  • Optional: Easy to opt-out
  • Non-blocking: Never slows down CLI commands
  • Secure: Uses PostHog cloud (SOC 2 compliant)

License

MIT

