Python SDK for the Discovery Engine API
Discovery Engine Python SDK
Find novel, statistically validated patterns in tabular data — feature interactions, subgroup effects, and conditional relationships that correlation analysis and LLMs miss.
Installation
pip install discovery-engine-api
For pandas DataFrame support:
pip install discovery-engine-api[pandas]
Quick Start
from discovery import Engine
engine = Engine(api_key="disco_...")
result = await engine.discover(
    file="data.csv",
    target_column="outcome",
)
for pattern in result.patterns:
    if pattern.p_value < 0.05 and pattern.novelty_type == "novel":
        print(f"{pattern.description} (p={pattern.p_value:.4f})")
print(f"Full report: {result.report_url}")
Get your API key from the Developers page.
Parameters
await engine.discover(
    file: str | Path | pd.DataFrame,  # Dataset to analyze
    target_column: str,  # Column to predict/analyze
    depth_iterations: int = 1,  # 1=fast, higher=deeper search
    visibility: str = "public",  # "public" (free) or "private" (credits)
    title: str | None = None,  # Dataset title
    description: str | None = None,  # Dataset description
    column_descriptions: dict[str, str] | None = None,  # Improves pattern explanations
    excluded_columns: list[str] | None = None,  # Columns to exclude (e.g., IDs)
    timeout: float = 1800,  # Max seconds to wait
)
Tip: Providing column_descriptions significantly improves pattern explanations. If your columns have non-obvious names, always describe them.
Depth and visibility: Public runs always use depth_iterations=1 regardless of settings. To use depth_iterations > 1, set visibility="private". Private runs consume credits based on file size × depth.
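The depth and visibility rule can be sketched as a small local helper. The function name `effective_depth` is hypothetical and not part of the SDK; it only mirrors the rule stated above so you can see which depth a run would use before submitting it.

```python
def effective_depth(depth_iterations: int = 1, visibility: str = "public") -> int:
    """Return the depth a run would use under the rule described above.

    Hypothetical helper, not an SDK function.
    """
    if visibility not in ("public", "private"):
        raise ValueError(f"visibility must be 'public' or 'private', got {visibility!r}")
    if depth_iterations < 1:
        raise ValueError("depth_iterations must be at least 1")
    # Public runs are locked to a single iteration regardless of the request.
    return 1 if visibility == "public" else depth_iterations


print(effective_depth(3, "public"))   # 1
print(effective_depth(3, "private"))  # 3
```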
Examples
Working with Pandas DataFrames
import pandas as pd
from discovery import Engine
df = pd.read_csv("data.csv")
engine = Engine(api_key="disco_...")
result = await engine.discover(
    file=df,
    target_column="outcome",
    column_descriptions={
        "age": "Patient age in years",
        "bmi": "Body mass index",
    },
    excluded_columns=["patient_id", "timestamp"],
)
Inspecting Columns Before Running
If you need to see the dataset's columns before choosing a target column, upload first and inspect:
# Upload once and get the server's parsed column list
upload = await engine.upload_file(file="data.csv", title="My dataset")
print(upload["columns"])   # [{"name": "col1", "type": "continuous", ...}, ...]
print(upload["rowCount"])  # e.g., 5000
# Pass the result to avoid re-uploading
result = await engine.run_async(
    file="data.csv",
    target_column="col1",
    wait=True,
    upload_result=upload,  # skips the upload step
)
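When choosing a target column from the upload response, it can help to group the parsed columns by type. The sketch below assumes only the shape shown in the comment above (a list of dicts with "name" and "type" keys); the sample data and the helper name `columns_by_type` are made up for illustration.

```python
from collections import defaultdict


def columns_by_type(columns: list[dict]) -> dict[str, list[str]]:
    """Group column names by their reported type (hypothetical helper)."""
    grouped = defaultdict(list)
    for col in columns:
        grouped[col["type"]].append(col["name"])
    return dict(grouped)


# Made-up sample shaped like upload["columns"]
sample_columns = [
    {"name": "age", "type": "continuous"},
    {"name": "region", "type": "categorical"},
    {"name": "outcome", "type": "continuous"},
]
print(columns_by_type(sample_columns))
# {'continuous': ['age', 'outcome'], 'categorical': ['region']}
```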
Running in the Background
Runs take 3–15 minutes. If you need to do other work while Discovery Engine runs:
import asyncio
from discovery import Engine
async def main():
    async with Engine(api_key="disco_...") as engine:
        # Submit without waiting
        run = await engine.run_async(
            file="data.csv",
            target_column="outcome",
            wait=False,
        )
        print(f"Submitted run {run.run_id}, continuing...")
        # ... do other work ...
        # Check back later
        result = await engine.wait_for_completion(run.run_id, timeout=1800)
        return result
result = asyncio.run(main())
Synchronous Usage
For scripts and Jupyter notebooks:
from discovery import Engine
engine = Engine(api_key="disco_...")
result = engine.run(
    file="data.csv",
    target_column="outcome",
    wait=True,
)
For Jupyter notebooks, install the jupyter extra for engine.run() compatibility:
pip install discovery-engine-api[jupyter]
Or use await engine.discover(...) / await engine.run_async(...) directly in async notebook cells.
Working with Results
# Filter for significant novel patterns
novel = [p for p in result.patterns
         if p.p_value < 0.05 and p.novelty_type == "novel"]
# Get patterns that increase the target
increasing = [p for p in result.patterns if p.target_change_direction == "max"]
# Inspect conditions
for pattern in result.patterns:
    for cond in pattern.conditions:
        print(f"  {cond['feature']}: {cond}")
# Feature importance
if result.feature_importance:
    top = sorted(result.feature_importance.scores,
                 key=lambda s: abs(s.score), reverse=True)
# Share the interactive report
print(f"Explore: {result.report_url}")
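Filtering and ranking can be combined to surface the strongest findings first. The sketch below is runnable without the SDK: `PatternStub` is a stand-in that carries only a few of the documented Pattern fields, and `strongest_patterns` is a hypothetical helper, not an SDK function. With a real result you would pass `result.patterns` instead of the demo list.

```python
from dataclasses import dataclass


@dataclass
class PatternStub:
    """Stand-in with a subset of the documented Pattern fields."""
    description: str
    p_value: float
    novelty_type: str
    abs_target_change: float


def strongest_patterns(patterns, alpha=0.05, novelty="novel", k=3):
    """Keep significant patterns of one novelty type, largest effect first."""
    keep = [p for p in patterns
            if p.p_value < alpha and p.novelty_type == novelty]
    return sorted(keep, key=lambda p: p.abs_target_change, reverse=True)[:k]


# Made-up demo data
demo = [
    PatternStub("A", 0.01, "novel", 0.8),
    PatternStub("B", 0.20, "novel", 2.5),         # not significant
    PatternStub("C", 0.03, "confirmatory", 1.9),  # not novel
    PatternStub("D", 0.04, "novel", 1.2),
]
for p in strongest_patterns(demo):
    print(p.description, p.abs_target_change)
# D 1.2
# A 0.8
```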
Credits and Pricing
- Public runs: Free. Results are published to the public gallery. Locked to depth_iterations=1.
- Private runs: 1 credit per MB per depth iteration, at $1.00 per credit.
- Formula: credits = max(1, ceil(file_size_mb * depth_iterations))
# Estimate cost before running
estimate = await engine.estimate(
    file_size_mb=10.5,
    num_columns=25,
    depth_iterations=2,
    visibility="private",
)
# estimate["cost"]["credits"] -> 21
# estimate["cost"]["free_alternative"] -> True
# estimate["account"]["sufficient"] -> True/False
Manage credits and plans at disco.leap-labs.com/account.
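The pricing formula above is simple enough to reproduce locally for a quick sanity check. This is a minimal sketch: the helper name `estimate_credits` is hypothetical, and treating public runs as costing zero credits is an assumption based on "Public runs: Free". The value returned by `engine.estimate` remains the authoritative figure.

```python
import math


def estimate_credits(file_size_mb: float, depth_iterations: int = 1,
                     visibility: str = "private") -> int:
    """Local re-implementation of the documented formula (hypothetical helper)."""
    if visibility == "public":
        # Assumption: free public runs consume no credits.
        return 0
    return max(1, math.ceil(file_size_mb * depth_iterations))


credits = estimate_credits(10.5, depth_iterations=2)
print(credits)                    # 21
print(f"${credits * 1.00:.2f}")   # $21.00 at $1.00 per credit
print(estimate_credits(0.2))      # 1 (the formula has a floor of one credit)
```

Because the product is a float, values that are not exactly representable can round up by one credit locally, so treat the result as an estimate.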
File Size Limits
Uploads up to 5 GB. Files are uploaded directly to cloud storage using presigned URLs.
Supported formats: CSV, TSV, Excel (.xlsx), JSON, Parquet, ARFF, Feather.
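A local pre-flight check can catch an unsupported or oversized file before an upload starts. The sketch below makes several assumptions that the text above does not confirm: the extension for each format (only .xlsx is stated), that 5 GB means 5 × 1024³ bytes, and that the format is judged by file extension. The helper name `preflight_problems` is hypothetical.

```python
from pathlib import Path

# Assumption: 1 GB = 1024**3 bytes; the service may count differently.
MAX_UPLOAD_BYTES = 5 * 1024**3
# Assumption: conventional extensions for the listed formats.
SUPPORTED_SUFFIXES = {".csv", ".tsv", ".xlsx", ".json", ".parquet", ".arff", ".feather"}


def preflight_problems(path) -> list[str]:
    """Return a list of problems found locally; empty means the file looks fine."""
    path = Path(path)
    problems = []
    if path.suffix.lower() not in SUPPORTED_SUFFIXES:
        problems.append(f"unsupported extension: {path.suffix or '(none)'}")
    if not path.is_file():
        problems.append("file not found")
    elif path.stat().st_size > MAX_UPLOAD_BYTES:
        problems.append("file exceeds 5 GB")
    return problems


print(preflight_problems("data.csv"))
```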
Return Value
EngineResult
@dataclass
class EngineResult:
    run_id: str
    status: str  # "pending", "processing", "completed", "failed"
    summary: Summary | None  # LLM-generated insights
    patterns: list[Pattern]  # Discovered patterns (the core output)
    columns: list[Column]  # Feature info and statistics
    feature_importance: FeatureImportance | None  # Global importance scores
    correlation_matrix: list[CorrelationEntry]  # Feature correlations
    report_url: str | None  # Shareable link to interactive web report
    task: str | None  # "regression", "binary_classification", "multiclass_classification"
    total_rows: int | None
    error_message: str | None
Pattern
@dataclass
class Pattern:
    id: str
    description: str  # Human-readable description
    conditions: list[dict]  # Conditions defining the pattern
    p_value: float  # FDR-adjusted p-value
    p_value_raw: float | None  # Raw p-value before adjustment
    novelty_type: str  # "novel" or "confirmatory"
    novelty_explanation: str  # Why this is novel or confirmatory
    citations: list[dict]  # Academic citations
    target_change_direction: str  # "max" (increases target) or "min" (decreases)
    abs_target_change: float  # Magnitude of effect
    support_count: int  # Rows matching this pattern
    support_percentage: float  # Percentage of dataset
    target_mean: float | None  # For regression tasks
    target_std: float | None
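To illustrate how a raw p-value and an FDR-adjusted p-value can relate, here is a sketch of the Benjamini-Hochberg adjustment, one widely used FDR procedure. The documentation above does not say which FDR method Discovery Engine applies, so this is an assumption for illustration only and may not reproduce the p_value field from p_value_raw.

```python
def bh_adjust(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(p_values)
    # Indices sorted from smallest to largest raw p-value.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # made-up raw p-values
print([round(p, 4) for p in bh_adjust(raw)])
# [0.02, 0.04, 0.04, 0.02]
```

Adjusted values are never smaller than the raw ones, which is why filtering on the adjusted p_value is the more conservative choice.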
Pattern Conditions
Each condition in pattern.conditions is a dict with a type field:
Continuous condition — a numeric range:
{
    "type": "continuous",
    "feature": "age",
    "min_value": 45.0,
    "max_value": 65.0,
    "min_q": 0.35,  # quantile of min_value
    "max_q": 0.72   # quantile of max_value
}
Categorical condition — a set of values:
{
    "type": "categorical",
    "feature": "region",
    "values": ["north", "east"]
}
Datetime condition — a time range:
{
    "type": "datetime",
    "feature": "date",
    "min_value": 1609459200000,  # epoch ms
    "max_value": 1640995200000,
    "min_datetime": "2021-01-01",  # human-readable
    "max_datetime": "2022-01-01"
}
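The three condition shapes above can be turned into readable text, or applied to a row to check whether it falls inside a pattern. The sketch below rests on assumptions the documentation does not state: that a pattern's conditions combine with AND, that range bounds are inclusive, and that datetime values in your rows are epoch milliseconds. Both helper names are hypothetical.

```python
def describe_condition(cond: dict) -> str:
    """Render one condition dict as a short readable string."""
    kind, feature = cond["type"], cond["feature"]
    if kind == "continuous":
        return f"{cond['min_value']} <= {feature} <= {cond['max_value']}"
    if kind == "categorical":
        return f"{feature} in {cond['values']}"
    if kind == "datetime":
        return f"{cond['min_datetime']} <= {feature} <= {cond['max_datetime']}"
    raise ValueError(f"unknown condition type: {kind!r}")


def row_matches(row: dict, conditions: list[dict]) -> bool:
    """True if the row satisfies every condition (assumed AND semantics)."""
    for cond in conditions:
        value = row.get(cond["feature"])
        if value is None:
            return False
        if cond["type"] == "categorical":
            if value not in cond["values"]:
                return False
        # Continuous and datetime conditions both carry numeric bounds.
        elif not cond["min_value"] <= value <= cond["max_value"]:
            return False
    return True


conditions = [
    {"type": "continuous", "feature": "age", "min_value": 45.0, "max_value": 65.0},
    {"type": "categorical", "feature": "region", "values": ["north", "east"]},
]
print(" AND ".join(describe_condition(c) for c in conditions))
print(row_matches({"age": 50, "region": "north"}, conditions))  # True
print(row_matches({"age": 70, "region": "north"}, conditions))  # False
```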
Summary
@dataclass
class Summary:
    overview: str  # High-level summary of findings
    key_insights: list[str]  # Main takeaways
    novel_patterns: PatternGroup  # Novel pattern IDs and explanation
Column
@dataclass
class Column:
    id: str
    name: str
    display_name: str
    type: str  # "continuous" or "categorical"
    data_type: str  # "int", "float", "string", "boolean", "datetime"
    enabled: bool
    description: str | None
    mean: float | None
    median: float | None
    std: float | None
    min: float | None
    max: float | None
    feature_importance_score: float | None  # Signed importance score
FeatureImportance
Computed using Hierarchical Perturbation (HiPe), an ablation-based method. Scores are signed — positive means the feature increases the prediction, negative means it decreases it.
@dataclass
class FeatureImportance:
    kind: str  # "global"
    baseline: float  # Baseline model output
    scores: list[FeatureImportanceScore]
@dataclass
class FeatureImportanceScore:
    feature: str
    score: float  # Signed importance score
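Because the scores are signed, ranking by absolute value while keeping the sign shows both how strongly and in which direction each feature acts. The sketch below uses `ScoreStub`, a stand-in for FeatureImportanceScore so it runs without the SDK; `rank_features` is a hypothetical helper and the demo scores are made up.

```python
from dataclasses import dataclass


@dataclass
class ScoreStub:
    """Stand-in mirroring the documented FeatureImportanceScore fields."""
    feature: str
    score: float


def rank_features(scores, k=5):
    """Top-k features by |score|, labelled with the direction of their effect."""
    ranked = sorted(scores, key=lambda s: abs(s.score), reverse=True)[:k]
    labelled = []
    for s in ranked:
        if s.score > 0:
            direction = "increases prediction"
        elif s.score < 0:
            direction = "decreases prediction"
        else:
            direction = "no effect"
        labelled.append((s.feature, s.score, direction))
    return labelled


demo_scores = [ScoreStub("age", 0.12), ScoreStub("bmi", -0.40), ScoreStub("region", 0.05)]
for feature, score, direction in rank_features(demo_scores, k=2):
    print(f"{feature}: {score:+.2f} ({direction})")
# bmi: -0.40 (decreases prediction)
# age: +0.12 (increases prediction)
```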
Error Handling
from discovery import (
    Engine,
    AuthenticationError,
    InsufficientCreditsError,
    RateLimitError,
    RunFailedError,
    PaymentRequiredError,
)
try:
    result = await engine.discover(file="data.csv", target_column="target")
except AuthenticationError as e:
    print(e.suggestion)  # "Check your API key at https://disco.leap-labs.com/developers"
except InsufficientCreditsError as e:
    print(f"Need {e.credits_required}, have {e.credits_available}")
    print(e.suggestion)  # "Purchase credits or run publicly for free"
except RateLimitError as e:
    print(f"Retry after {e.retry_after} seconds")
except RunFailedError as e:
    print(f"Run {e.run_id} failed: {e}")
except TimeoutError:
    pass  # Retrieve later with engine.wait_for_completion(run_id)
All errors include a suggestion field with actionable instructions.
MCP Server
Discovery Engine is available as an MCP server with tools for the full discovery lifecycle — estimate, analyze, check status, get results, manage account.
{
    "mcpServers": {
        "discovery-engine": {
            "url": "https://disco.leap-labs.com/mcp",
            "env": { "DISCOVERY_API_KEY": "disco_..." }
        }
    }
}
Links
- PyPI: discovery-engine-api
- API keys: disco.leap-labs.com/developers
- LLM-friendly docs: disco.leap-labs.com/llms-full.txt
- MCP manifest: disco.leap-labs.com/.well-known/mcp.json
- Credits & billing: disco.leap-labs.com/account
- Public reports: disco.leap-labs.com/discover