Skip to main content

AI-powered taxonomy generation for your data

Project description

Delve: AI-Powered Taxonomy Generation

Delve is a production-ready SDK and CLI for automatically generating taxonomies from your data using state-of-the-art language models.

📚 Read the full documentation →

Quick Start

Installation

pip install delve-taxonomy

# Set API keys
export ANTHROPIC_API_KEY="your-key-here"
export OPENAI_API_KEY="your-key-here"  # Required for classifier embeddings

CLI

# Basic usage (shows progress spinners)
delve run data.csv --text-column text

# With progress bars and ETA
delve run data.csv --text-column text -v

# Quiet mode (errors only)
delve run data.csv --text-column text -q

# JSON with nested data
delve run data.json --json-path "$.messages[*].content"

Python SDK

from delve import Delve, Verbosity

# Initialize client (silent by default - library best practice)
delve = Delve()

# Or with progress output
delve = Delve(verbosity=Verbosity.NORMAL)

# Run taxonomy generation
result = delve.run_sync("data.csv", text_column="text")

# Access results
print(f"Generated {len(result.taxonomy)} categories")
for category in result.taxonomy:
    print(f"  - {category.name}: {category.description}")

# Access labeled documents
for doc in result.labeled_documents[:5]:
    print(f"  [{doc.category}] {doc.content[:50]}...")

Binary Detection (Single Category)

For fast filtering when you know the category you're looking for:

from delve import Delve

# Find all refund-related documents (~$1-2 for 30K docs, runs in minutes)
result = Delve.find_matches(
    "data.csv",
    category={
        "name": "Refund Request",
        "description": "User asking for refund or money back",
        "keywords": ["refund", "money back", "cancel"],
    },
    text_column="text",
    threshold=0.6,
)

print(f"Found {result.stats['matches']} matches")
for doc in result.matched_documents[:5]:
    print(f"  [{doc.confidence:.2f}] {doc.content[:50]}...")

Features

  • Automated Taxonomy Generation - No manual category creation using Claude 3.5 Sonnet
  • Binary Detection - Fast, cheap single-category filtering with find_matches()
  • Multiple Data Sources - CSV, JSON/JSONL, LangSmith runs, pandas DataFrames
  • Smart Categorization - Iterative refinement with minibatch clustering
  • Flexible Exports - JSON, CSV, and Markdown reports

Requirements

  • Python 3.9+
  • Anthropic API key (for taxonomy generation)
  • OpenAI API key (for classifier embeddings when sample_size > 0)

Documentation

Development

# Install dependencies
uv sync

# Run tests
pytest tests/

# Run linting
ruff check src/

# Format code
ruff format src/

Documentation Development

To work on the documentation locally, you'll need Node.js 20.17+ (for Mintlify):

# If using nvm, the project includes .nvmrc
nvm use

# Install Mintlify CLI (if not already installed)
npm install -g mintlify

# Run the docs server
cd docs
mintlify dev

See the full documentation for more details on contributing and development.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

delve_taxonomy-0.1.12.tar.gz (52.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

delve_taxonomy-0.1.12-py3-none-any.whl (64.9 kB view details)

Uploaded Python 3

File details

Details for the file delve_taxonomy-0.1.12.tar.gz.

File metadata

  • Download URL: delve_taxonomy-0.1.12.tar.gz
  • Upload date:
  • Size: 52.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for delve_taxonomy-0.1.12.tar.gz
Algorithm Hash digest
SHA256 49ccc470d6b4c5a77488b5c22b0a5d0aa79fc6807129c742ce079ab8ef5e6d07
MD5 751258f2072e68eec434e65f65f37436
BLAKE2b-256 fc0a1e613ab61832b11d27964e8245790689c22ca325fa869c1463d603edf558

See more details on using hashes here.

File details

Details for the file delve_taxonomy-0.1.12-py3-none-any.whl.

File metadata

File hashes

Hashes for delve_taxonomy-0.1.12-py3-none-any.whl
Algorithm Hash digest
SHA256 873f50a93a1e26ede94bcd44f535be23563fda36f926a29df2c0376efd2b4751
MD5 99dd036324454c99bd156d171bfb4e83
BLAKE2b-256 a513d7d13474eccfba9beade730bd42e9f2f1af3189972016ad76da0f1c81f3b

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page