AI-powered taxonomy generation for your data
Project description
Delve: AI-Powered Taxonomy Generation
Delve is a production-ready SDK and CLI for automatically generating taxonomies from your data using state-of-the-art language models.
📚 Read the full documentation →
Quick Start
Installation
pip install delve-taxonomy
# Set API keys
export ANTHROPIC_API_KEY="your-key-here"
export OPENAI_API_KEY="your-key-here" # Required for classifier embeddings
CLI
# Basic usage (shows progress spinners)
delve run data.csv --text-column text
# With progress bars and ETA
delve run data.csv --text-column text -v
# Quiet mode (errors only)
delve run data.csv --text-column text -q
# JSON with nested data
delve run data.json --json-path "$.messages[*].content"
Python SDK
from delve import Delve, Verbosity
# Initialize client (silent by default - library best practice)
delve = Delve()
# Or with progress output
delve = Delve(verbosity=Verbosity.NORMAL)
# Run taxonomy generation
result = delve.run_sync("data.csv", text_column="text")
# Access results
print(f"Generated {len(result.taxonomy)} categories")
for category in result.taxonomy:
print(f" - {category.name}: {category.description}")
# Access labeled documents
for doc in result.labeled_documents[:5]:
print(f" [{doc.category}] {doc.content[:50]}...")
Binary Detection (Single Category)
For fast filtering when you know the category you're looking for:
from delve import Delve
# Find all refund-related documents (~$1-2 for 30K docs, runs in minutes)
result = Delve.find_matches(
"data.csv",
category={
"name": "Refund Request",
"description": "User asking for refund or money back",
"keywords": ["refund", "money back", "cancel"],
},
text_column="text",
threshold=0.6,
)
print(f"Found {result.stats['matches']} matches")
for doc in result.matched_documents[:5]:
print(f" [{doc.confidence:.2f}] {doc.content[:50]}...")
Features
- Automated Taxonomy Generation - No manual category creation using Claude 3.5 Sonnet
- Binary Detection - Fast, cheap single-category filtering with
find_matches() - Multiple Data Sources - CSV, JSON/JSONL, LangSmith runs, pandas DataFrames
- Smart Categorization - Iterative refinement with minibatch clustering
- Flexible Exports - JSON, CSV, and Markdown reports
Requirements
- Python 3.9+
- Anthropic API key (for taxonomy generation)
- OpenAI API key (for classifier embeddings when sample_size > 0)
Documentation
Development
# Install dependencies
uv sync
# Run tests
pytest tests/
# Run linting
ruff check src/
# Format code
ruff format src/
Documentation Development
To work on the documentation locally, you'll need Node.js 20.17+ (for Mintlify):
# If using nvm, the project includes .nvmrc
nvm use
# Install Mintlify CLI (if not already installed)
npm install -g mintlify
# Run the docs server
cd docs
mintlify dev
See the full documentation for more details on contributing and development.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file delve_taxonomy-0.1.12.tar.gz.
File metadata
- Download URL: delve_taxonomy-0.1.12.tar.gz
- Upload date:
- Size: 52.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
49ccc470d6b4c5a77488b5c22b0a5d0aa79fc6807129c742ce079ab8ef5e6d07
|
|
| MD5 |
751258f2072e68eec434e65f65f37436
|
|
| BLAKE2b-256 |
fc0a1e613ab61832b11d27964e8245790689c22ca325fa869c1463d603edf558
|
File details
Details for the file delve_taxonomy-0.1.12-py3-none-any.whl.
File metadata
- Download URL: delve_taxonomy-0.1.12-py3-none-any.whl
- Upload date:
- Size: 64.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
873f50a93a1e26ede94bcd44f535be23563fda36f926a29df2c0376efd2b4751
|
|
| MD5 |
99dd036324454c99bd156d171bfb4e83
|
|
| BLAKE2b-256 |
a513d7d13474eccfba9beade730bd42e9f2f1af3189972016ad76da0f1c81f3b
|