Discovery Engine Python SDK

Python client library for the Discovery Engine API.

Installation

pip install leap-discovery-client

For pandas DataFrame support:

pip install "leap-discovery-client[pandas]"

Quick Start

from discovery import Client

# Initialize client - automatically uses the production API
client = Client(api_key="your-api-key")

# Analyze a dataset and wait for results
result = client.analyze(
    file="data.csv",
    target_column="price",
    mode="fast",
    description="House price dataset from Kaggle",
    column_descriptions={
        "age": "Age of the house in years",
        "price": "Sale price in USD"
    },
    visibility="public",
    wait=True  # Wait for completion and return full results
)

print(f"Run ID: {result.run_id}")
print(f"Status: {result.status}")
print(f"Found {len(result.patterns)} patterns")

Features

  • Simple API: Single analyze() method handles the entire workflow
  • Complete Results: Returns everything shown in the Discovery dashboard
  • Pandas Support: Upload DataFrames directly with automatic column inference
  • Async Support: Use analyze_async() for async workflows
  • Polling: Automatically wait for completion with configurable timeout

What You Get Back

The SDK returns an AnalysisResult with everything the Discovery dashboard shows:

Summary (LLM-generated)

result.summary.overview           # High-level explanation of findings
result.summary.key_insights       # List of main takeaways
result.summary.novel_patterns     # Novel pattern explanations
result.summary.surprising_findings
result.summary.statistically_significant
result.summary.data_insights      # Important features, correlations

Patterns

for pattern in result.patterns:
    print(f"Pattern {pattern.id}: {pattern.description}")
    print(f"  Direction: {pattern.direction}")
    print(f"  Lift: {pattern.lift_value}")
    print(f"  Support: {pattern.support_count} ({pattern.support_percentage:.1%})")
    print(f"  P-value: {pattern.p_value}")
    print(f"  Type: {pattern.pattern_type} / {pattern.novelty_type}")
    print(f"  Conditions: {pattern.conditions}")
    print(f"  Citations: {len(pattern.citations)}")
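A common follow-up is to narrow the pattern list to the findings worth reading first. The sketch below relies only on the attribute names shown above (p_value, novelty_type, lift_value). The helper name top_novel_patterns, the 0.05 threshold, and the stand-in objects are illustrative assumptions, not part of the SDK, and sorting by raw lift is a simplification.

```python
from types import SimpleNamespace


def top_novel_patterns(patterns, max_p_value=0.05, limit=5):
    """Return up to `limit` novel patterns with p_value <= max_p_value, highest lift first."""
    selected = [
        p for p in patterns
        if p.novelty_type == "novel" and p.p_value <= max_p_value
    ]
    return sorted(selected, key=lambda p: p.lift_value, reverse=True)[:limit]


# Stand-in objects for demonstration; real code would pass result.patterns.
demo_patterns = [
    SimpleNamespace(id="a", novelty_type="novel", p_value=0.01, lift_value=1.8),
    SimpleNamespace(id="b", novelty_type="confirmatory", p_value=0.001, lift_value=3.0),
    SimpleNamespace(id="c", novelty_type="novel", p_value=0.20, lift_value=2.5),
    SimpleNamespace(id="d", novelty_type="novel", p_value=0.03, lift_value=2.1),
]
print([p.id for p in top_novel_patterns(demo_patterns)])
```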

Columns with Feature Importance

for col in result.columns:
    print(f"{col.display_name}")
    print(f"  Type: {col.type} ({col.data_type})")
    print(f"  Stats: mean={col.mean}, std={col.std}, min={col.min}, max={col.max}")
    print(f"  Null %: {col.null_percentage}")
    if col.feature_importance_score is not None:
        print(f"  Importance: {col.feature_importance_score}")

Correlation Matrix

for entry in result.correlation_matrix:
    print(f"{entry.feature_x} <-> {entry.feature_y}: {entry.value:.3f}")
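Because the matrix arrives as a flat list of entries, looking up one pair means scanning the list. A minimal sketch of turning it into a nested dictionary follows. It assumes only the feature_x, feature_y, and value attributes shown above. The helper name correlation_lookup and the demo values are hypothetical, and mirroring each pair is an assumption in case the API returns only one direction.

```python
from collections import defaultdict
from types import SimpleNamespace


def correlation_lookup(entries):
    """Build a nested dict so that table[x][y] gives the correlation value."""
    table = defaultdict(dict)
    for entry in entries:
        table[entry.feature_x][entry.feature_y] = entry.value
        # Mirror the pair; harmless if both directions are already present.
        table[entry.feature_y][entry.feature_x] = entry.value
    return {feature: dict(row) for feature, row in table.items()}


# Stand-in entries for demonstration; real code would pass result.correlation_matrix.
demo_entries = [
    SimpleNamespace(feature_x="age", feature_y="price", value=-0.42),
    SimpleNamespace(feature_x="age", feature_y="age", value=1.0),
]
lookup = correlation_lookup(demo_entries)
print(lookup["price"]["age"])
```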

Feature Importance

if result.feature_importance:
    print(f"Model type: {result.feature_importance.kind}")
    print(f"Baseline: {result.feature_importance.baseline}")
    for score in result.feature_importance.scores:
        print(f"  {score.feature}: {score.score}")
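To surface the most influential features, the scores can be ranked before printing. This sketch uses only the feature and score attributes shown above. The helper name top_features is hypothetical, and ranking by absolute value assumes that the sign of a score carries direction rather than strength, which the SDK documentation does not confirm.

```python
from types import SimpleNamespace


def top_features(scores, k=3):
    """Return the k scores with the largest magnitude, strongest first."""
    return sorted(scores, key=lambda s: abs(s.score), reverse=True)[:k]


# Stand-in scores for demonstration; real code would pass result.feature_importance.scores.
demo_scores = [
    SimpleNamespace(feature="age", score=-0.31),
    SimpleNamespace(feature="rooms", score=0.12),
    SimpleNamespace(feature="area", score=0.55),
]
for s in top_features(demo_scores, k=2):
    print(f"{s.feature}: {s.score}")
```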

Configuration

The client automatically uses the production API endpoint. For testing or custom deployments, you can override the URL via the DISCOVERY_API_URL environment variable:

export DISCOVERY_API_URL="https://custom-api.example.com"
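How the SDK resolves this variable internally is not documented here. The sketch below shows the usual pattern for an environment override with a fallback, which can be handy when writing tests around the client. The function name resolve_api_url and the default URL are placeholders, since the real production endpoint is not stated in this document.

```python
import os

# Placeholder only: the real production endpoint is not stated in this document.
HYPOTHETICAL_DEFAULT_URL = "https://api.example.com"


def resolve_api_url(environ=None):
    """Return DISCOVERY_API_URL if it is set and non-empty, else the placeholder default."""
    environ = os.environ if environ is None else environ
    url = environ.get("DISCOVERY_API_URL", "").strip()
    # Strip a trailing slash so later path joins do not produce "//".
    return (url or HYPOTHETICAL_DEFAULT_URL).rstrip("/")


print(resolve_api_url({"DISCOVERY_API_URL": "https://custom-api.example.com/"}))
```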

Configuration Options

All dashboard options are supported:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| file | str, Path, or DataFrame | - | Dataset file or pandas DataFrame |
| target_column | str | - | Name of column to predict |
| mode | "fast" / "deep" | "fast" | Analysis depth |
| visibility | "public" / "private" | "public" | Dataset visibility |
| task | str | auto | "regression", "binary_classification", or "multiclass_classification" |
| description | str | - | Dataset description |
| column_descriptions | Dict[str, str] | - | Column name -> description mapping |
| timeseries_groups | List[Dict] | - | Timeseries column groups |
| auto_train_num_trials | int | 1 | Number of training trials |
| auto_train_max_epochs | int | 10 | Maximum training epochs |
| auto_report_use_llm_evals | bool | True | Use LLM for descriptions |
| wait | bool | False | Wait for completion |
| wait_timeout | float | None | Max seconds to wait |
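Several of these options accept only a fixed set of strings, so it can help to catch a typo locally before uploading a dataset. The following is a minimal sketch of such a check, built from the allowed values listed above. The helper build_analyze_kwargs is hypothetical and not part of the SDK, and the SDK may perform its own validation.

```python
# Allowed values copied from the options listed above.
ALLOWED_VALUES = {
    "mode": {"fast", "deep"},
    "visibility": {"public", "private"},
    "task": {"regression", "binary_classification", "multiclass_classification"},
}


def build_analyze_kwargs(target_column, **options):
    """Validate enumerated options and return keyword arguments for analyze()."""
    for name, allowed in ALLOWED_VALUES.items():
        value = options.get(name)
        if value is not None and value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")
    return {"target_column": target_column, **options}


kwargs = build_analyze_kwargs("price", mode="deep", visibility="private", wait=True)
print(kwargs)
```

The resulting dictionary could then be passed along as client.analyze(file="data.csv", **kwargs).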

Async Usage

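import asyncio

import pandas as pd  # requires the pandas extra
from discovery import Client

async def main():
    # Any pandas DataFrame works here; a CSV is loaded for illustration
    df = pd.read_csv("data.csv")

    async with Client(api_key="...") as client: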
        # Start analysis without waiting
        result = await client.analyze_async(
            file=df,
            target_column="target"
        )
        print(f"Started run: {result.run_id}")

        # Later, get results
        result = await client.get_results(result.run_id)
        
        # Or wait for completion
        result = await client.wait_for_completion(result.run_id, timeout=600)

asyncio.run(main())
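For readers who want to see what waiting for completion involves, here is a generic polling loop with a deadline. It is a sketch of the general technique, not the SDK's implementation. The names poll_until_done and make_fake_fetcher are hypothetical, and the fake fetcher stands in for a real status request. The status strings match those listed for AnalysisResult.status.

```python
import asyncio
import time


async def poll_until_done(fetch_status, interval=0.01, timeout=1.0):
    """Await fetch_status() repeatedly until a terminal status or the timeout."""
    deadline = time.monotonic() + timeout
    while True:
        status = await fetch_status()
        if status in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        await asyncio.sleep(interval)


def make_fake_fetcher(statuses):
    """Return a coroutine function that yields the given statuses one call at a time."""
    remaining = iter(statuses)

    async def fetch():
        return next(remaining)

    return fetch


final = asyncio.run(
    poll_until_done(make_fake_fetcher(["pending", "processing", "completed"]))
)
print(final)
```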

Step-by-Step API

For more control, use the individual methods. The calls below are awaited, so run them inside an async function with an open client:

# 1. Upload file
file_info = await client.upload_file("data.csv")

# 2. Create dataset
dataset = await client.create_dataset(
    title="My Dataset",
    description="...",
    total_rows=1000
)

# 3. Link file to dataset
await client.create_file_record(dataset["id"], file_info)

# 4. Define columns
columns = await client.create_columns(dataset["id"], [
    {"name": "age", "display_name": "Age", "type": "continuous", ...},
    {"name": "price", "display_name": "Price", "type": "continuous", ...},
])

# 5. Start run
run = await client.create_run(
    dataset["id"],
    target_column_id=columns[1]["id"],
    task="regression",
    mode="fast"
)

# 6. Get results
result = await client.get_results(run["id"])

Data Types

AnalysisResult

@dataclass
class AnalysisResult:
    run_id: str
    report_id: Optional[str]
    status: str  # "pending", "processing", "completed", "failed"
    
    # Dataset metadata
    dataset_title: Optional[str]
    dataset_description: Optional[str]
    total_rows: Optional[int]
    target_column: Optional[str]
    task: Optional[str]
    
    # Results
    summary: Optional[Summary]
    patterns: List[Pattern]
    columns: List[Column]
    correlation_matrix: List[CorrelationEntry]
    feature_importance: Optional[FeatureImportance]
    
    # Job tracking
    job_id: Optional[str]
    job_status: Optional[str]
    error_message: Optional[str]
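Since status can be any of four values and the result fields are optional, it is worth checking the outcome before reading patterns or summary. The guard below is a minimal sketch using only the run_id, status, and error_message fields defined above. The names AnalysisFailed and require_completed are hypothetical and not part of the SDK.

```python
from types import SimpleNamespace


class AnalysisFailed(RuntimeError):
    """Raised when a run reports status "failed"."""


def require_completed(result):
    """Return result unchanged if the run completed; raise otherwise."""
    if result.status == "failed":
        raise AnalysisFailed(result.error_message or "analysis failed without an error message")
    if result.status != "completed":
        raise RuntimeError(f"run {result.run_id} is not finished (status={result.status!r})")
    return result


# Stand-in object for demonstration; real code would pass the AnalysisResult.
demo_result = SimpleNamespace(run_id="run-1", status="completed", error_message=None)
print(require_completed(demo_result).run_id)
```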

Pattern

@dataclass
class Pattern:
    id: str
    task: str
    target_column: str
    direction: str  # "min" or "max"
    p_value: float
    conditions: List[Dict]  # Continuous, categorical, or datetime conditions
    lift_value: float
    support_count: int
    support_percentage: float
    pattern_type: str  # "validated" or "speculative"
    novelty_type: str  # "novel" or "confirmatory"
    target_score: float
    description: str
    novelty_explanation: str
    target_class: Optional[str]
    target_mean: Optional[float]
    target_std: Optional[float]
    citations: List[Dict]

Column

@dataclass
class Column:
    id: str
    name: str
    display_name: str
    type: str  # "continuous" or "categorical"
    data_type: str  # "int", "float", "string", "boolean", "datetime"
    enabled: bool
    description: Optional[str]
    
    # Statistics
    mean: Optional[float]
    median: Optional[float]
    std: Optional[float]
    min: Optional[float]
    max: Optional[float]
    iqr_min: Optional[float]
    iqr_max: Optional[float]
    mode: Optional[str]
    approx_unique: Optional[int]
    null_percentage: Optional[float]
    
    # Feature importance
    feature_importance_score: Optional[float]
