Python SDK for the Discovery Engine API
Discovery Engine Python API
The Discovery Engine Python API lets you run analyses programmatically instead of through the web dashboard. Rather than uploading datasets and configuring analyses in the UI, you can automate your discovery workflows directly from your own Python code or scripts.
All analyses run through the API are fully integrated with your Discovery Engine account. Results are automatically displayed in the dashboard, where you can view detailed reports, explore patterns, and share findings with your team. Your account management, credit balance, and subscription settings are all handled through the dashboard.
Installation
pip install discovery-engine-api
For pandas DataFrame support:
pip install discovery-engine-api[pandas]
For Jupyter notebook support:
pip install discovery-engine-api[jupyter]
This installs nest-asyncio, which is required to use engine.run() in Jupyter notebooks. Alternatively, you can use await engine.run_async() directly in Jupyter notebooks without installing the jupyter extra.
Configuration
API Keys
Get your API key from the Developers page in your Discovery Engine dashboard.
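To keep the key out of source control, one option is to read it from an environment variable. The sketch below is not part of the SDK: the helper name `load_api_key` and the variable name `DISCOVERY_API_KEY` are assumptions chosen for illustration.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None,
                 var_name: str = "DISCOVERY_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    NOTE: the variable name is a hypothetical convention, not something
    the SDK defines.
    """
    source = os.environ if env is None else env
    key = source.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set {var_name} to your Discovery Engine API key")
    return key
```

The returned value can then be passed as `Engine(api_key=load_api_key())`.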
Quick Start
from discovery import Engine
# Initialize engine
engine = Engine(api_key="your-api-key")
# Run analysis on a dataset and wait for results
result = engine.run(
    file="data.csv",
    target_column="diagnosis",
    mode="fast",
    description="Rare diseases dataset",
    wait=True  # Wait for completion and return full results
)
print(f"Run ID: {result.run_id}")
print(f"Status: {result.status}")
print(f"Found {len(result.patterns)} patterns")
Examples
Working with Pandas DataFrames
import pandas as pd
from discovery import Engine
df = pd.read_csv("data.csv")
# or create DataFrame directly
engine = Engine(api_key="your-api-key")
result = engine.run(
    file=df,  # Pass DataFrame directly
    target_column="outcome",
    column_descriptions={
        "age": "Patient age in years",
        "heart rate": None
    },
    wait=True
)
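A typo in a `column_descriptions` key is easy to miss, and the source does not say whether the SDK validates keys against the dataset. As a hedged sketch, a small local check (the helper `unknown_description_keys` is hypothetical, not an SDK function) can flag keys that match no column:

```python
from typing import Dict, Iterable, List, Optional


def unknown_description_keys(columns: Iterable[str],
                             descriptions: Dict[str, Optional[str]]) -> List[str]:
    """Return description keys that do not match any dataset column."""
    known = set(columns)
    return sorted(key for key in descriptions if key not in known)
```

With a DataFrame, `unknown_description_keys(df.columns, descriptions)` should return an empty list before the run is submitted.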
Async Workflow
import asyncio
from discovery import Engine
async def run_analysis():
    async with Engine(api_key="your-api-key") as engine:
        # Start analysis without waiting
        result = await engine.run_async(
            file="data.csv",
            target_column="target",
            wait=False
        )
        print(f"Started run: {result.run_id}")
        # Later, get results
        result = await engine.get_results(result.run_id)
        # Or wait for completion
        result = await engine.wait_for_completion(result.run_id, timeout=1200)
        return result

result = asyncio.run(run_analysis())
Using in Jupyter Notebooks
In Jupyter notebooks, you have two options:
Option 1: Install the jupyter extra (recommended)
pip install discovery-engine-api[jupyter]
Then use engine.run() as normal:
from discovery import Engine
engine = Engine(api_key="your-api-key")
result = engine.run(file="data.csv", target_column="target", wait=True)
Option 2: Use async directly
from discovery import Engine
engine = Engine(api_key="your-api-key")
result = await engine.run_async(file="data.csv", target_column="target", wait=True)
Configuration Options
The run() and run_async() methods accept the following parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| file | str, Path, or DataFrame | Required | Dataset file path or pandas DataFrame |
| target_column | str | Required | Name of column to predict |
| mode | "fast" / "deep" | "fast" | Analysis depth |
| title | str | None | Optional dataset title |
| description | str | None | Optional dataset description |
| column_descriptions | Dict[str, str] | None | Optional column name -> description mapping |
| visibility | "public" / "private" | "public" | Dataset visibility (private requires credits) |
| auto_report_use_llm_evals | bool | True | Use LLM for pattern descriptions |
| author | str | None | Optional dataset author attribution |
| source_url | str | None | Optional source URL for dataset attribution |
| wait | bool | False | Wait for analysis to complete and return full results |
| wait_timeout | float | None | Maximum seconds to wait for completion (only if wait=True) |
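When the same options are reused across scripts, it can help to validate them locally before making a request. The following is a minimal sketch using only the parameter names and allowed values from the table above; `build_run_kwargs` is a hypothetical helper, not part of the SDK, and the SDK may perform its own validation.

```python
from typing import Any, Dict, Optional

VALID_MODES = {"fast", "deep"}
VALID_VISIBILITY = {"public", "private"}


def build_run_kwargs(file: Any, target_column: str, mode: str = "fast",
                     visibility: str = "public", wait: bool = False,
                     wait_timeout: Optional[float] = None,
                     **extra: Any) -> Dict[str, Any]:
    """Validate a few options locally and return keyword arguments for run()."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if visibility not in VALID_VISIBILITY:
        raise ValueError(f"visibility must be one of {sorted(VALID_VISIBILITY)}")
    if wait_timeout is not None and not wait:
        # The table states wait_timeout is only used when wait=True.
        raise ValueError("wait_timeout only applies when wait=True")
    kwargs = {"file": file, "target_column": target_column, "mode": mode,
              "visibility": visibility, "wait": wait, **extra}
    if wait_timeout is not None:
        kwargs["wait_timeout"] = wait_timeout
    return kwargs
```

The result can be unpacked into either method, for example `engine.run(**build_run_kwargs("data.csv", "diagnosis", mode="deep", wait=True))`.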
Credits and Pricing
- Public datasets: Free (0 credits required)
- Private datasets:
  - Fast mode: 1 credit per MB
  - Deep mode: 3 credits per MB
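The rates above translate into a simple estimate. The sketch below assumes that 1 MB means 1,000,000 bytes and that partial megabytes round up; neither detail is stated here, so treat the result as a rough guide rather than the exact amount the service will charge. `estimate_credits` is a hypothetical helper.

```python
import math

# Credits per MB for private datasets, taken from the list above.
CREDITS_PER_MB = {"fast": 1, "deep": 3}


def estimate_credits(size_bytes: int, mode: str = "fast",
                     visibility: str = "public") -> int:
    """Estimate the credits a run would need.

    ASSUMPTIONS: 1 MB = 1,000,000 bytes and partial megabytes round up.
    The service may measure size or round differently.
    """
    if visibility == "public":
        return 0  # Public datasets are free.
    size_mb = size_bytes / 1_000_000
    return math.ceil(size_mb * CREDITS_PER_MB[mode])
```

Under these assumptions, a 2.5 MB private dataset comes to 3 credits in fast mode and 8 in deep mode.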
If you don't have enough credits for a private run, the SDK will raise an httpx.HTTPStatusError with an error message like:
Insufficient credits. You need X credits but only have Y available.
Solutions:
- Make your dataset public (set visibility="public"), which is completely free
- Visit https://disco.leap-labs.com/account to:
  - Purchase additional credits
  - Upgrade to a subscription plan that includes more credits
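To react to this error in code, the shortfall can be pulled out of the message text. The sketch below matches only the wording quoted above and returns `None` for anything else; the exact wording, and whether the amounts are whole numbers, may differ in practice. `parse_credit_shortfall` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Pattern based on the example message shown above.
_CREDITS_RE = re.compile(
    r"You need (\d+(?:\.\d+)?) credits but only have (\d+(?:\.\d+)?) available"
)


def parse_credit_shortfall(message: str) -> Optional[Tuple[float, float]]:
    """Return (needed, available) if the message matches, else None."""
    match = _CREDITS_RE.search(message)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))
```

Inside an `except httpx.HTTPStatusError as exc:` block, the helper could be applied to `str(exc)` or `exc.response.text`, depending on which one carries the message.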
Return Value
The run() and run_async() methods return an EngineResult object with the following fields:
EngineResult
@dataclass
class EngineResult:
    # Identifiers
    run_id: str  # Unique run identifier
    report_id: Optional[str]  # Report ID (if report created)
    status: str  # "pending", "processing", "completed", "failed"
    # Dataset metadata
    dataset_title: Optional[str]  # Dataset title
    dataset_description: Optional[str]  # Dataset description
    total_rows: Optional[int]  # Number of rows in dataset
    target_column: Optional[str]  # Name of target column
    task: Optional[str]  # "regression", "binary_classification", or "multiclass_classification"
    # LLM-generated summary
    summary: Optional[Summary]  # Summary object with overview, insights, etc.
    # Discovered patterns
    patterns: List[Pattern]  # List of discovered patterns
    # Column/feature information
    columns: List[Column]  # List of columns with statistics and importance
    # Correlation matrix
    correlation_matrix: List[CorrelationEntry]  # Feature correlations
    # Global feature importance
    feature_importance: Optional[FeatureImportance]  # Feature importance scores
    # Job tracking
    job_id: Optional[str]  # Job ID for tracking processing
    job_status: Optional[str]  # Job status
    error_message: Optional[str]  # Error message if analysis failed
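When a run is started with `wait=False`, the `status` and `error_message` fields tell you where it stands. The sketch below assumes that "completed" and "failed" are the only final states and that "pending" and "processing" mean the run is still in progress; that reading follows from the names but is not stated explicitly. Both helpers are hypothetical.

```python
from typing import Optional

# Assumed final states, based on the status values listed above.
TERMINAL_STATUSES = {"completed", "failed"}


def is_finished(status: str) -> bool:
    """Return True once a run is assumed unable to change state."""
    return status in TERMINAL_STATUSES


def describe_run(status: str, error_message: Optional[str] = None) -> str:
    """Produce a one-line, human-readable summary of a run's state."""
    if status == "failed":
        return f"failed: {error_message or 'no error message provided'}"
    if status == "completed":
        return "completed"
    return f"still running ({status})"
```

For a result object, `describe_run(result.status, result.error_message)` gives a line suitable for logging.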
Summary
@dataclass
class Summary:
    overview: str  # High-level explanation of findings
    key_insights: List[str]  # List of main takeaways
    novel_patterns: PatternGroup  # Novel pattern explanations
    surprising_findings: PatternGroup  # Surprising findings
    statistically_significant: PatternGroup  # Statistically significant patterns
    data_insights: Optional[DataInsights]  # Important features, correlations
    selected_pattern_id: Optional[str]  # ID of selected pattern
Pattern
@dataclass
class Pattern:
    id: str  # Pattern identifier
    task: str  # Task type
    target_column: str  # Target column name
    direction: str  # "min" or "max"
    p_value: float  # Statistical p-value
    conditions: List[Dict]  # Pattern conditions (continuous, categorical, datetime)
    lift_value: float  # Lift value (how much the pattern increases/decreases target)
    support_count: int  # Number of rows matching pattern
    support_percentage: float  # Percentage of rows matching pattern
    pattern_type: str  # "validated" or "speculative"
    novelty_type: str  # "novel" or "confirmatory"
    target_score: float  # Target score for this pattern
    description: str  # Human-readable description
    novelty_explanation: str  # Explanation of novelty
    target_class: Optional[str]  # Target class (for classification)
    target_mean: Optional[float]  # Target mean (for regression)
    target_std: Optional[float]  # Target standard deviation
    citations: List[Dict]  # Academic citations
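A common next step is to narrow the pattern list to the most reliable entries. The sketch below uses a stand-in class with a subset of the fields above so that it runs without the SDK; the 0.05 threshold and the ranking by absolute lift are illustrative choices, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PatternStub:
    """Stand-in carrying a subset of the Pattern fields listed above."""
    id: str
    p_value: float
    lift_value: float
    pattern_type: str  # "validated" or "speculative"


def strongest_validated(patterns: List[PatternStub], alpha: float = 0.05,
                        top_n: int = 5) -> List[PatternStub]:
    """Keep validated patterns with p_value < alpha, largest |lift| first."""
    kept = [p for p in patterns
            if p.pattern_type == "validated" and p.p_value < alpha]
    kept.sort(key=lambda p: abs(p.lift_value), reverse=True)
    return kept[:top_n]


# Made-up sample data for illustration only.
sample = [
    PatternStub("a", 0.01, 1.4, "validated"),
    PatternStub("b", 0.20, 3.0, "validated"),
    PatternStub("c", 0.001, -2.1, "validated"),
    PatternStub("d", 0.001, 5.0, "speculative"),
]
top_ids = [p.id for p in strongest_validated(sample)]
```

Because the function only reads `p_value`, `lift_value`, and `pattern_type`, it should work unchanged on `result.patterns`, assuming those attributes behave as documented.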
Column
@dataclass
class Column:
    id: str  # Column identifier
    name: str  # Column name
    display_name: str  # Display name
    type: str  # "continuous" or "categorical"
    data_type: str  # "int", "float", "string", "boolean", "datetime"
    enabled: bool  # Whether column is enabled
    description: Optional[str]  # Column description
    # Statistics
    mean: Optional[float]  # Mean value
    median: Optional[float]  # Median value
    std: Optional[float]  # Standard deviation
    min: Optional[float]  # Minimum value
    max: Optional[float]  # Maximum value
    iqr_min: Optional[float]  # IQR minimum
    iqr_max: Optional[float]  # IQR maximum
    mode: Optional[str]  # Mode value
    approx_unique: Optional[int]  # Approximate unique count
    null_percentage: Optional[float]  # Percentage of null values
    # Feature importance
    feature_importance_score: Optional[float]  # Feature importance score
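Since `feature_importance_score` is optional, any ranking has to skip columns that lack a score. The following sketch uses a stand-in class with only the fields it needs; it sorts by the raw score, which assumes larger values mean more important features (if scores can be negative, sorting by magnitude may be more appropriate).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ColumnStub:
    """Stand-in carrying a subset of the Column fields listed above."""
    name: str
    feature_importance_score: Optional[float] = None


def rank_by_importance(columns: List[ColumnStub]) -> List[str]:
    """Return column names ordered by descending importance score.

    Columns without a score (None) are skipped.
    """
    scored = [c for c in columns if c.feature_importance_score is not None]
    scored.sort(key=lambda c: c.feature_importance_score, reverse=True)
    return [c.name for c in scored]


# Made-up sample data for illustration only.
ranked = rank_by_importance([
    ColumnStub("age", 0.12),
    ColumnStub("patient_id"),          # no score available
    ColumnStub("heart rate", 0.40),
])
```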
FeatureImportance
@dataclass
class FeatureImportance:
    kind: str  # Feature importance type: "global"
    baseline: float  # Baseline model output
    scores: List[FeatureImportanceScore]  # List of feature scores
CorrelationEntry
@dataclass
class CorrelationEntry:
    feature_x: str  # First feature name
    feature_y: str  # Second feature name
    value: float  # Correlation value (-1 to 1)
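A flat list of entries is awkward for lookups by feature pair. As a sketch, the entries can be folded into a nested dictionary that answers in either order. It is not stated whether the list already contains both orderings of each pair, so the helper writes both directions itself. `CorrelationStub` and `correlation_lookup` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CorrelationStub:
    """Stand-in with the same fields as CorrelationEntry above."""
    feature_x: str
    feature_y: str
    value: float


def correlation_lookup(entries: List[CorrelationStub]) -> Dict[str, Dict[str, float]]:
    """Build a nested dict so that table[a][b] == table[b][a]."""
    table: Dict[str, Dict[str, float]] = {}
    for entry in entries:
        table.setdefault(entry.feature_x, {})[entry.feature_y] = entry.value
        table.setdefault(entry.feature_y, {})[entry.feature_x] = entry.value
    return table


# Made-up sample data for illustration only.
table = correlation_lookup([
    CorrelationStub("age", "heart rate", -0.3),
    CorrelationStub("age", "weight", 0.5),
])
```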