Python SDK for the Discovery Engine API
Discovery Engine Python SDK
Python client library for the Discovery Engine API.
Installation
pip install leap-discovery-client
For pandas DataFrame support:
pip install "leap-discovery-client[pandas]"
Quick Start
from discovery import Client
# Initialize the client; it uses the production API by default
client = Client(api_key="your-api-key")
# Analyze a dataset and wait for the results
result = client.analyze(
    file="data.csv",
    target_column="price",
    mode="fast",
    description="House price dataset from Kaggle",
    column_descriptions={
        "age": "Age of the house in years",
        "price": "Sale price in USD",
    },
    visibility="public",
    wait=True,  # Wait for completion and return the full results
)
print(f"Run ID: {result.run_id}")
print(f"Status: {result.status}")
print(f"Found {len(result.patterns)} patterns")
Features
- Simple API: a single `analyze()` method handles the entire workflow
- Complete Results: returns everything shown in the Discovery dashboard
- Pandas Support: upload DataFrames directly, with automatic column inference
- Async Support: use `analyze_async()` for async workflows
- Polling: automatically wait for completion, with a configurable timeout
What You Get Back
The SDK returns an AnalysisResult with everything the Discovery dashboard shows:
Summary (LLM-generated)
result.summary.overview # High-level explanation of findings
result.summary.key_insights # List of main takeaways
result.summary.novel_patterns # Novel pattern explanations
result.summary.surprising_findings
result.summary.statistically_significant
result.summary.data_insights # Important features, correlations
Patterns
for pattern in result.patterns:
    print(f"Pattern {pattern.id}: {pattern.description}")
    print(f"  Direction: {pattern.direction}")
    print(f"  Lift: {pattern.lift_value}")
    print(f"  Support: {pattern.support_count} ({pattern.support_percentage:.1%})")
    print(f"  P-value: {pattern.p_value}")
    print(f"  Type: {pattern.pattern_type} / {pattern.novelty_type}")
    print(f"  Conditions: {pattern.conditions}")
    print(f"  Citations: {len(pattern.citations)}")
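To focus on the most actionable patterns, you can filter and rank `result.patterns` locally. The sketch below uses a hypothetical `PatternStub` dataclass holding a few of the documented `Pattern` fields so it runs without an API key; with a real result you would pass `result.patterns` instead. The helper name `top_novel_patterns` and the 0.05 significance threshold are illustrative choices, not SDK defaults.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PatternStub:
    """Hypothetical stand-in holding a few of the documented Pattern fields."""
    id: str
    p_value: float
    lift_value: float
    pattern_type: str  # "validated" or "speculative"
    novelty_type: str  # "novel" or "confirmatory"


def top_novel_patterns(patterns, alpha: float = 0.05, limit: int = 5) -> List:
    """Return validated, novel patterns with p_value < alpha, highest lift first."""
    selected = [
        p for p in patterns
        if p.pattern_type == "validated"
        and p.novelty_type == "novel"
        and p.p_value < alpha
    ]
    return sorted(selected, key=lambda p: p.lift_value, reverse=True)[:limit]


patterns = [
    PatternStub("a", 0.01, 1.8, "validated", "novel"),
    PatternStub("b", 0.20, 2.5, "validated", "novel"),        # not significant
    PatternStub("c", 0.001, 3.1, "validated", "novel"),
    PatternStub("d", 0.01, 4.0, "speculative", "novel"),      # not validated
    PatternStub("e", 0.02, 2.0, "validated", "confirmatory"), # not novel
]

for p in top_novel_patterns(patterns):
    print(p.id, p.lift_value)
```

Sorting by lift alone ignores `direction`; if your target has "min" patterns, you may want to rank those separately.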
Columns with Feature Importance
for col in result.columns:
    print(f"{col.display_name}")
    print(f"  Type: {col.type} ({col.data_type})")
    print(f"  Stats: mean={col.mean}, std={col.std}, min={col.min}, max={col.max}")
    print(f"  Null %: {col.null_percentage}")
    # Check for None explicitly so that a score of 0.0 is still printed
    if col.feature_importance_score is not None:
        print(f"  Importance: {col.feature_importance_score}")
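A common next step is to rank columns by importance. This sketch uses a hypothetical `ColumnStub` with two of the documented `Column` fields; the helper `rank_by_importance` is illustrative, and the assumption that some columns (such as the target) carry no score is only an example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ColumnStub:
    """Hypothetical stand-in with two of the documented Column fields."""
    display_name: str
    feature_importance_score: Optional[float]


def rank_by_importance(columns: List[ColumnStub]) -> List[Tuple[str, float]]:
    """Return (display_name, score) pairs, highest score first.

    Columns whose score is None are skipped; a score of 0.0 is kept.
    """
    scored = [
        (c.display_name, c.feature_importance_score)
        for c in columns
        if c.feature_importance_score is not None
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


columns = [
    ColumnStub("Age", 0.42),
    ColumnStub("Price", None),  # assumed: e.g. the target column may have no score
    ColumnStub("Rooms", 0.0),
    ColumnStub("Area", 0.77),
]
print(rank_by_importance(columns))
```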
Correlation Matrix
for entry in result.correlation_matrix:
    print(f"{entry.feature_x} <-> {entry.feature_y}: {entry.value:.3f}")
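The correlation matrix is returned as a flat list of entries, so finding the strongest relationships takes a little bookkeeping. The sketch below assumes (not confirmed by the SDK docs) that the list may contain self-correlations and both orderings of each pair; `CorrelationEntryStub` and `strongest_pairs` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class CorrelationEntryStub:
    """Hypothetical stand-in with the fields used in the loop above."""
    feature_x: str
    feature_y: str
    value: float


def strongest_pairs(
    entries: Iterable[CorrelationEntryStub], limit: int = 3
) -> List[Tuple[str, str, float]]:
    """Return the `limit` most strongly correlated distinct feature pairs.

    Self-correlations are dropped and (x, y) / (y, x) duplicates are merged.
    Pairs are ranked by absolute value, so strong negative correlations count too.
    """
    seen: Dict[Tuple[str, str], float] = {}
    for e in entries:
        if e.feature_x == e.feature_y:
            continue
        key = tuple(sorted((e.feature_x, e.feature_y)))
        seen[key] = e.value
    ranked = sorted(seen.items(), key=lambda item: abs(item[1]), reverse=True)
    return [(x, y, v) for (x, y), v in ranked[:limit]]


entries = [
    CorrelationEntryStub("age", "age", 1.0),
    CorrelationEntryStub("age", "price", -0.6),
    CorrelationEntryStub("price", "age", -0.6),
    CorrelationEntryStub("rooms", "price", 0.8),
    CorrelationEntryStub("age", "rooms", 0.1),
]
for x, y, v in strongest_pairs(entries):
    print(f"{x} <-> {y}: {v:.3f}")
```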
Feature Importance
if result.feature_importance:
    print(f"Model type: {result.feature_importance.kind}")
    print(f"Baseline: {result.feature_importance.baseline}")
    for score in result.feature_importance.scores:
        print(f"  {score.feature}: {score.score}")
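Raw importance scores are easier to compare when expressed as shares of the total. This sketch uses a hypothetical `ScoreStub` with the `feature` and `score` fields shown above. Whether scores can be negative, and how they relate to `baseline`, is not documented here, so the helper `importance_shares` uses absolute values as an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ScoreStub:
    """Hypothetical stand-in for one entry of feature_importance.scores."""
    feature: str
    score: float


def importance_shares(scores: List[ScoreStub]) -> Dict[str, float]:
    """Express each score as a share of the total absolute score (shares sum to 1)."""
    total = sum(abs(s.score) for s in scores)
    if total == 0:
        return {s.feature: 0.0 for s in scores}
    return {s.feature: abs(s.score) / total for s in scores}


scores = [ScoreStub("age", 3.0), ScoreStub("rooms", -1.0), ScoreStub("area", 4.0)]
shares = importance_shares(scores)
for feature, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{feature}: {share:.1%}")
```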
Configuration
The client automatically uses the production API endpoint. For testing or custom deployments, you can override the URL via the DISCOVERY_API_URL environment variable:
export DISCOVERY_API_URL="https://custom-api.example.com"
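The override rule amounts to "use the environment variable if it is set, otherwise the built-in default". The sketch below spells that rule out; `resolve_api_url` and `DEFAULT_API_URL` are hypothetical names, the placeholder URL is not the real production endpoint, and it is an assumption that setting the variable from Python before creating the client behaves like the shell `export` above.

```python
import os

# Placeholder only: the SDK's built-in production URL is not documented here.
DEFAULT_API_URL = "https://api.example.com"


def resolve_api_url(default: str = DEFAULT_API_URL) -> str:
    """Prefer DISCOVERY_API_URL when it is set and non-empty, else use the default."""
    return os.environ.get("DISCOVERY_API_URL") or default


# Assumed equivalent of the shell export, done before the client is created
os.environ["DISCOVERY_API_URL"] = "https://custom-api.example.com"
print(resolve_api_url())
```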
Configuration Options
All dashboard options are supported as keyword arguments to `analyze()`:
| Option | Type | Default | Description |
|---|---|---|---|
| file | str, Path, or DataFrame | - | Dataset file or pandas DataFrame |
| target_column | str | - | Name of the column to predict |
| mode | "fast" / "deep" | "fast" | Analysis depth |
| visibility | "public" / "private" | "public" | Dataset visibility |
| task | str | auto | "regression", "binary_classification", or "multiclass_classification" |
| description | str | - | Dataset description |
| column_descriptions | Dict[str, str] | - | Column name -> description mapping |
| timeseries_groups | List[Dict] | - | Timeseries column groups |
| auto_train_num_trials | int | 1 | Number of training trials |
| auto_train_max_epochs | int | 10 | Maximum training epochs |
| auto_report_use_llm_evals | bool | True | Use an LLM to generate descriptions |
| wait | bool | False | Wait for completion |
| wait_timeout | float | None | Maximum seconds to wait |
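Because several options only accept the values listed in the table, a typo such as `mode="quick"` may only surface once the request reaches the API. A minimal local pre-check, using a hypothetical helper `check_analyze_options` and the allowed values copied from the table, could look like this. Omitting `task` is treated as "auto"; the positive-timeout rule is an assumption.

```python
from typing import Any, Dict

# Allowed values copied from the options table; the helper itself is hypothetical.
ALLOWED_VALUES = {
    "mode": {"fast", "deep"},
    "visibility": {"public", "private"},
    "task": {"regression", "binary_classification", "multiclass_classification"},
}


def check_analyze_options(options: Dict[str, Any]) -> Dict[str, Any]:
    """Raise ValueError if an enumerated option has an undocumented value."""
    for name, allowed in ALLOWED_VALUES.items():
        value = options.get(name)
        # None / missing means "use the default" (for task: auto-detection)
        if value is not None and value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")
    timeout = options.get("wait_timeout")
    if timeout is not None and timeout <= 0:
        raise ValueError("wait_timeout must be a positive number of seconds")
    return options


options = check_analyze_options({
    "file": "data.csv",
    "target_column": "price",
    "mode": "deep",
    "task": "regression",
    "wait": True,
    "wait_timeout": 900,
})
# With a real client: result = client.analyze(**options)
```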
Async Usage
import asyncio
import pandas as pd
from discovery import Client
async def main():
    df = pd.read_csv("data.csv")  # DataFrame uploads need the [pandas] extra
    async with Client(api_key="...") as client:
        # Start the analysis without waiting for it to finish
        result = await client.analyze_async(
            file=df,
            target_column="target",
        )
        print(f"Started run: {result.run_id}")
        # Later, fetch the current results
        result = await client.get_results(result.run_id)
        # Or wait until the run completes, with a timeout
        result = await client.wait_for_completion(result.run_id, timeout=600)
asyncio.run(main())
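Waiting for completion boils down to polling the run status until it reaches a terminal state or a deadline passes. The sketch below is not the SDK's implementation: `poll_until_done` is a hypothetical helper, the statuses come from `AnalysisResult.status`, and treating "completed" and "failed" as the only terminal states is an assumption.

```python
import asyncio
from typing import Awaitable, Callable

# Statuses taken from AnalysisResult.status; which ones are terminal is assumed.
TERMINAL_STATUSES = {"completed", "failed"}


async def poll_until_done(
    fetch_status: Callable[[], Awaitable[str]],
    timeout: float = 600.0,
    interval: float = 5.0,
) -> str:
    """Await fetch_status() every `interval` seconds until a terminal status.

    Raises asyncio.TimeoutError if `timeout` seconds pass first.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        status = await fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        remaining = deadline - loop.time()
        if remaining <= 0:
            raise asyncio.TimeoutError(f"run still {status!r} after {timeout} s")
        await asyncio.sleep(min(interval, remaining))


async def demo() -> str:
    # Hypothetical stand-in for a status check against the API
    statuses = iter(["pending", "processing", "completed"])

    async def fake_status() -> str:
        return next(statuses)

    return await poll_until_done(fake_status, timeout=1.0, interval=0.01)


print(asyncio.run(demo()))
```

With a real client, `fetch_status` could be an `async def` that returns `(await client.get_results(run_id)).status`.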
Step-by-Step API
For more control, use the individual methods:
# These calls must run inside an async function, with `client` opened as in the Async Usage example
# 1. Upload the file
file_info = await client.upload_file("data.csv")
# 2. Create the dataset
dataset = await client.create_dataset(
    title="My Dataset",
    description="...",
    total_rows=1000,
)
# 3. Link the file to the dataset
await client.create_file_record(dataset["id"], file_info)
# 4. Define the columns
columns = await client.create_columns(dataset["id"], [
    {"name": "age", "display_name": "Age", "type": "continuous", ...},
    {"name": "price", "display_name": "Price", "type": "continuous", ...},
])
# 5. Start the run, using "price" (columns[1]) as the target
run = await client.create_run(
    dataset["id"],
    target_column_id=columns[1]["id"],
    task="regression",
    mode="fast",
)
# 6. Get the results
result = await client.get_results(run["id"])
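Step 4 needs one definition per column. A sketch that derives the `name`, `display_name`, and `type` keys from a CSV header and its values might look like the code below; `column_specs_from_csv` is a hypothetical helper, the `data_type` key reuses the values documented for `Column.data_type` (whether `create_columns` accepts it is an assumption), datetime detection is omitted, and the fields elided with `...` above are not filled in.

```python
import csv
import io
from typing import Callable, Dict, List


def _parses_as(values: List[str], convert: Callable[[str], object]) -> bool:
    """True if every value can be converted without a ValueError."""
    try:
        for v in values:
            convert(v)
    except ValueError:
        return False
    return True


def _infer_data_type(values: List[str]) -> str:
    """Guess one of the documented data_type values from raw CSV strings."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return "string"
    if all(v.lower() in {"true", "false"} for v in non_empty):
        return "boolean"
    if _parses_as(non_empty, int):
        return "int"
    if _parses_as(non_empty, float):
        return "float"
    return "string"


def column_specs_from_csv(text: str) -> List[Dict[str, str]]:
    """Build column dicts with name/display_name/type (+ assumed data_type)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    specs = []
    for name in reader.fieldnames or []:
        data_type = _infer_data_type([row[name] or "" for row in rows])
        specs.append({
            "name": name,
            "display_name": name.replace("_", " ").title(),
            "type": "continuous" if data_type in {"int", "float"} else "categorical",
            "data_type": data_type,
        })
    return specs


sample = "age,price,garage\n12,250000.5,true\n3,410000,false\n"
for spec in column_specs_from_csv(sample):
    print(spec)
```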
Data Types
AnalysisResult
@dataclass
class AnalysisResult:
    run_id: str
    report_id: Optional[str]
    status: str  # "pending", "processing", "completed", "failed"
    # Dataset metadata
    dataset_title: Optional[str]
    dataset_description: Optional[str]
    total_rows: Optional[int]
    target_column: Optional[str]
    task: Optional[str]
    # Results
    summary: Optional[Summary]
    patterns: List[Pattern]
    columns: List[Column]
    correlation_matrix: List[CorrelationEntry]
    feature_importance: Optional[FeatureImportance]
    # Job tracking
    job_id: Optional[str]
    job_status: Optional[str]
    error_message: Optional[str]
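Since `status` can be "failed" and `error_message` is optional, it is worth checking a result before reading its patterns. This sketch uses a hypothetical `ResultStub` with three of the documented fields and a hypothetical `AnalysisFailed` exception; the SDK's own error types are not documented here.

```python
from dataclasses import dataclass
from typing import Optional


class AnalysisFailed(RuntimeError):
    """Hypothetical exception type for runs that did not complete."""


@dataclass
class ResultStub:
    """Hypothetical stand-in with three of the documented AnalysisResult fields."""
    run_id: str
    status: str  # "pending", "processing", "completed", "failed"
    error_message: Optional[str] = None


def require_completed(result):
    """Return the result if it completed, otherwise raise AnalysisFailed."""
    if result.status == "failed":
        reason = result.error_message or "no error message"
        raise AnalysisFailed(f"run {result.run_id} failed: {reason}")
    if result.status != "completed":
        raise AnalysisFailed(f"run {result.run_id} is still {result.status!r}")
    return result


print(require_completed(ResultStub("run-1", "completed")).run_id)
```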
Pattern
@dataclass
class Pattern:
    id: str
    task: str
    target_column: str
    direction: str  # "min" or "max"
    p_value: float
    conditions: List[Dict]  # Continuous, categorical, or datetime conditions
    lift_value: float
    support_count: int
    support_percentage: float
    pattern_type: str  # "validated" or "speculative"
    novelty_type: str  # "novel" or "confirmatory"
    target_score: float
    description: str
    novelty_explanation: str
    target_class: Optional[str]
    target_mean: Optional[float]
    target_std: Optional[float]
    citations: List[Dict]
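Which target fields are populated presumably depends on the task: `target_mean` and `target_std` for regression, `target_class` for classification. That mapping is an assumption, as is the helper `describe_target` and the `PatternTargetStub` type below, which holds only the target-related `Pattern` fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PatternTargetStub:
    """Hypothetical stand-in with the target-related Pattern fields."""
    task: str
    direction: str  # "min" or "max"
    target_class: Optional[str] = None
    target_mean: Optional[float] = None
    target_std: Optional[float] = None


def describe_target(p: PatternTargetStub) -> str:
    """One-line summary of the target statistics a pattern carries."""
    if p.task == "regression" and p.target_mean is not None:
        spread = f" ± {p.target_std:.2f}" if p.target_std is not None else ""
        return f"mean {p.target_mean:.2f}{spread} ({p.direction})"
    if p.target_class is not None:
        return f"class {p.target_class!r} ({p.direction})"
    return f"target ({p.direction})"


print(describe_target(PatternTargetStub("regression", "max", target_mean=12.5, target_std=1.25)))
print(describe_target(PatternTargetStub("binary_classification", "min", target_class="yes")))
```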
Column
@dataclass
class Column:
    id: str
    name: str
    display_name: str
    type: str  # "continuous" or "categorical"
    data_type: str  # "int", "float", "string", "boolean", "datetime"
    enabled: bool
    description: Optional[str]
    # Statistics
    mean: Optional[float]
    median: Optional[float]
    std: Optional[float]
    min: Optional[float]
    max: Optional[float]
    iqr_min: Optional[float]
    iqr_max: Optional[float]
    mode: Optional[str]
    approx_unique: Optional[int]
    null_percentage: Optional[float]
    # Feature importance
    feature_importance_score: Optional[float]
File details
Details for the file leap_discovery_client-0.1.0.tar.gz.
File metadata
- Download URL: leap_discovery_client-0.1.0.tar.gz
- Upload date:
- Size: 17.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 49676435463885ff003e36de9129400d6d198a3955608f236010c5edfd1a5876 |
| MD5 | 1e0c32925f3e3064b81653a6a7454715 |
| BLAKE2b-256 | 3cd989a9a4935a16e7dd9c0a232217d5bb21791c1439d47918c3ba7d152c2153 |
File details
Details for the file leap_discovery_client-0.1.0-py3-none-any.whl.
File metadata
- Download URL: leap_discovery_client-0.1.0-py3-none-any.whl
- Upload date:
- Size: 12.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b5673df494ed74a7d7c78e6fa70fbf72ca97ba3a1d8f679d012c32fc3dbef440 |
| MD5 | 6339371b6a9258b17fbf00d0f2458c25 |
| BLAKE2b-256 | d435b89b9ea90b9160178d9f575a3bc974ffc102ec2d9ecd30524b70ab7f662d |