Python SDK for the Discovery Engine API

Discovery Engine Python API

The Discovery Engine Python API lets you run analyses from Python instead of the web dashboard. Rather than uploading datasets and configuring analyses through the UI, you can automate your discovery workflows directly from your own code or scripts.

All analyses run through the API are fully integrated with your Discovery Engine account. Results are automatically displayed in the dashboard, where you can view detailed reports, explore patterns, and share findings with your team. Your account management, credit balance, and subscription settings are all handled through the dashboard.

Installation

pip install discovery-engine-api

For pandas DataFrame support:

pip install "discovery-engine-api[pandas]"

For Jupyter notebook support:

pip install "discovery-engine-api[jupyter]"

This installs nest-asyncio, which is required to use engine.run() in Jupyter notebooks. Alternatively, you can use await engine.run_async() directly in Jupyter notebooks without installing the jupyter extra.

Configuration

API Keys

Get your API key from the Developers page in your Discovery Engine dashboard.
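
The examples that follow pass the key as a string literal for brevity, which is easy to leak through version control. A minimal sketch of reading it from the environment instead; the variable name DISCOVERY_ENGINE_API_KEY and the helper load_api_key are illustrative assumptions, not something the SDK defines or reads:

```python
import os

# Hypothetical variable name; the SDK itself does not define or read it.
ENV_VAR = "DISCOVERY_ENGINE_API_KEY"


def load_api_key(env=None):
    """Return the API key from the environment, or raise if it is missing."""
    env = os.environ if env is None else env
    key = env.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"Set {ENV_VAR} to the key from the Developers page")
    return key
```

The returned value can then be passed as Engine(api_key=load_api_key()).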

Quick Start

from discovery import Engine

# Initialize engine
engine = Engine(api_key="your-api-key")

# Run analysis on a dataset and wait for results
result = engine.run(
    file="data.csv",
    target_column="diagnosis",
    description="Rare diseases dataset",
    excluded_columns=["patient_id"],  # Exclude ID column from analysis
    wait=True  # Wait for completion and return full results
)

print(f"Run ID: {result.run_id}")
print(f"Status: {result.status}")
print(f"Found {len(result.patterns)} patterns")

Examples

Working with Pandas DataFrames

import pandas as pd
from discovery import Engine

df = pd.read_csv("data.csv")
# or create DataFrame directly

engine = Engine(api_key="your-api-key")
result = engine.run(
    file=df,  # Pass DataFrame directly
    target_column="outcome",
    column_descriptions={
        "age": "Patient age in years",
        "heart rate": None
    },
    excluded_columns=["id", "timestamp"],  # Exclude ID and timestamp columns from analysis
    wait=True
)

Async Workflow

import asyncio
from discovery import Engine

async def run_analysis():
    async with Engine(api_key="your-api-key") as engine:
        # Start analysis without waiting
        result = await engine.run_async(
            file="data.csv",
            target_column="target",
            wait=False
        )
        print(f"Started run: {result.run_id}")

        # Later, get results
        result = await engine.get_results(result.run_id)
        
        # Or wait for completion
        result = await engine.wait_for_completion(result.run_id, timeout=1200)
        return result

result = asyncio.run(run_analysis())
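
With wait=False, something has to keep checking the run until it reaches a terminal status. wait_for_completion does this for you; the sketch below only illustrates the general idea with a caller-supplied coroutine and is not the SDK's implementation. The names poll_until_done and fetch_status are hypothetical, and the terminal statuses are taken from the EngineResult.status values listed under Return Value.

```python
import asyncio
import time

# Terminal values of EngineResult.status, as listed under Return Value.
TERMINAL_STATUSES = {"completed", "failed"}


async def poll_until_done(fetch_status, interval=5.0, timeout=1200.0):
    """Call fetch_status() repeatedly until it returns a terminal status."""
    deadline = time.monotonic() + timeout
    while True:
        status = await fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {status!r} after {timeout} seconds")
        await asyncio.sleep(interval)
```

With the SDK, fetch_status could plausibly be a small coroutine that awaits engine.get_results(run_id) and returns its status field, assuming get_results returns an EngineResult.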

Using in Jupyter Notebooks

In Jupyter notebooks, you have two options:

Option 1: Install the jupyter extra (recommended)

pip install "discovery-engine-api[jupyter]"

Then use engine.run() as normal:

from discovery import Engine

engine = Engine(api_key="your-api-key")
result = engine.run(file="data.csv", target_column="target", wait=True)

Option 2: Use async directly

from discovery import Engine

engine = Engine(api_key="your-api-key")
result = await engine.run_async(file="data.csv", target_column="target", wait=True)

Configuration Options

The run() and run_async() methods accept the following parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `file` | `str`, `Path`, or `DataFrame` | Required | Dataset file path or pandas DataFrame |
| `target_column` | `str` | Required | Name of column to predict |
| `depth_iterations` | `int` | `1` | Analysis depth: the number of iterative feature-removal cycles. Higher values find more subtle patterns but use more credits. The maximum useful value is `num_columns - 2`; values above that are capped server-side. |
| `title` | `str` | `None` | Optional dataset title |
| `description` | `str` | `None` | Optional dataset description |
| `column_descriptions` | `Dict[str, str]` | `None` | Optional column name → description mapping |
| `excluded_columns` | `List[str]` | `None` | Optional list of column names to exclude from analysis (e.g., IDs, timestamps) |
| `visibility` | `"public"` / `"private"` | `"public"` | Dataset visibility. Public runs are free but always use depth 1. Private runs require credits and support higher depth. |
| `auto_report_use_llm_evals` | `bool` | `True` | Use LLM for pattern descriptions and citations |
| `author` | `str` | `None` | Optional dataset author attribution |
| `source_url` | `str` | `None` | Optional source URL for dataset attribution |
| `wait` | `bool` | `False` | Wait for analysis to complete and return full results |
| `wait_timeout` | `float` | `None` | Maximum seconds to wait for completion (only if `wait=True`) |

Note on depth and visibility: Public runs always use depth_iterations=1, regardless of the value you pass. To use depth_iterations > 1, set visibility="private". Private runs consume credits based on file size × depth.
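
The depth rules above can be mirrored client-side to estimate what a run will actually use before submitting it. This is a sketch of the documented rules only; the server remains authoritative, the helper name effective_depth is hypothetical, and flooring the cap at 1 for very narrow datasets is an assumption:

```python
def effective_depth(visibility, depth_iterations, num_columns):
    """Estimate the depth a run will actually use under the documented rules."""
    if visibility not in ("public", "private"):
        raise ValueError(f"unknown visibility: {visibility!r}")
    if visibility == "public":
        return 1  # public runs always use depth 1
    # Documented cap is num_columns - 2; flooring it at 1 is an assumption.
    cap = max(1, num_columns - 2)
    return max(1, min(depth_iterations, cap))


print(effective_depth("private", 20, 10))  # requested 20 on a 10-column dataset
```

Since private runs are charged based on file size × depth, the capped value is also the one to use when estimating credit cost.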

File Size Limits

The SDK supports file uploads up to 1 GB. Files are uploaded directly to cloud storage using presigned URLs, so there is no HTTP body size restriction.

Supported file formats: CSV, Parquet.
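
A quick local check can catch an oversized or unsupported file before any upload starts. The sketch below is not part of the SDK: the helper name upload_problems is hypothetical, "1 GB" is assumed to mean 10**9 bytes, and the accepted extensions are assumed to be .csv and .parquet.

```python
from pathlib import PurePath

MAX_BYTES = 1_000_000_000  # assumes "1 GB" means 10**9 bytes; the exact limit may differ
SUPPORTED_SUFFIXES = {".csv", ".parquet"}  # assumed extensions for CSV and Parquet


def upload_problems(filename, size_bytes):
    """Return a list of reasons the file would likely be rejected (empty if none)."""
    problems = []
    suffix = PurePath(filename).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        problems.append(f"unsupported format: {suffix or '(no extension)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, over the {MAX_BYTES}-byte limit")
    return problems
```

For a file on disk, size_bytes can come from Path(path).stat().st_size.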

Credits and Pricing

If you don't have enough credits for a private run, the SDK will raise a ValueError with a message like:

Insufficient credits. You need X credits but only have Y available.

Solutions:

- Make your dataset public (set visibility="public"), which is completely free.
- Visit https://disco.leap-labs.com/account to:
  - Purchase additional credits
  - Upgrade to a subscription plan that includes more credits
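
If a script needs to react to this error, for example by reporting the shortfall, the numbers can be pulled out of the message text. The sketch below assumes the message follows the example wording shown above, which may not hold exactly; parse_credit_shortfall is a hypothetical helper, not an SDK function.

```python
import re

# Assumed message shape, based on the example wording; the real text may differ.
_CREDITS_RE = re.compile(
    r"need\s+(\d+(?:\.\d+)?)\s+credits\s+but\s+only\s+have\s+(\d+(?:\.\d+)?)"
)


def parse_credit_shortfall(message):
    """Return (needed, available) as floats, or None if the message does not match."""
    match = _CREDITS_RE.search(message)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))
```

Inside an except ValueError as exc: handler around engine.run(...), calling parse_credit_shortfall(str(exc)) and re-raising when it returns None keeps unrelated ValueErrors from being swallowed.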

Return Value

The run() and run_async() methods return an EngineResult object with the following fields:

EngineResult

@dataclass
class EngineResult:
    # Identifiers
    run_id: str                    # Unique run identifier
    report_id: Optional[str]       # Report ID (if report created)
    status: str                    # "pending", "processing", "completed", "failed"
    
    # Dataset metadata
    dataset_title: Optional[str]
    dataset_description: Optional[str]
    total_rows: Optional[int]              # Number of rows in dataset
    target_column: Optional[str]           # Name of target column
    task: Optional[str]                    # "regression", "binary_classification", or "multiclass_classification"
    
    # LLM-generated summary
    summary: Optional[Summary]

    # Discovered patterns
    patterns: List[Pattern]

    # Column/feature information
    columns: List[Column]                  # List of columns with statistics and importance

    # Correlation matrix
    correlation_matrix: List[CorrelationEntry]  # Feature correlations
    
    # Global feature importance
    feature_importance: Optional[FeatureImportance]  # Feature importance scores
    
    # Job tracking
    job_id: Optional[str]
    job_status: Optional[str]
    error_message: Optional[str]
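
Because many fields are Optional and only meaningful once a run has finished, it helps to branch on status before reading results. A minimal sketch using a stand-in object with just the fields it touches; describe_run is a hypothetical helper, not part of the SDK:

```python
from types import SimpleNamespace


def describe_run(result):
    """Return a one-line summary of a run based on its status fields."""
    if result.status == "completed":
        return f"run {result.run_id}: completed with {len(result.patterns)} patterns"
    if result.status == "failed":
        reason = result.error_message or "no error message provided"
        return f"run {result.run_id}: failed ({reason})"
    return f"run {result.run_id}: still {result.status}"


# Stand-in with only the fields used above, not a real EngineResult.
demo = SimpleNamespace(run_id="r1", status="failed", error_message=None, patterns=[])
print(describe_run(demo))
```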

Pattern

@dataclass
class Pattern:
    id: str
    task: str                           # "regression", "binary_classification", "multiclass_classification"
    target_column: str
    target_change_direction: str        # "max" (increases target) or "min" (decreases target)
    p_value: float                      # FDR-adjusted p-value (lower = more significant)
    conditions: List[Dict]              # Conditions defining the pattern (see below)
    abs_target_change: float            # Absolute change in target (always positive, magnitude of effect)
    support_count: int                  # Number of rows matching pattern
    support_percentage: float           # Percentage of dataset matching pattern
    novelty_type: str                   # "novel" or "confirmatory"
    target_score: float                 # Effect size score
    description: str                    # Human-readable description
    novelty_explanation: str            # Why the pattern is novel or confirmatory
    target_class: Optional[str]         # For classification tasks
    target_mean: Optional[float]        # Target mean within pattern (regression)
    target_std: Optional[float]         # Target std within pattern (regression)
    citations: List[Dict]               # Academic citations if available
    p_value_raw: Optional[float]        # Raw p-value before FDR adjustment
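
These fields make it straightforward to shortlist patterns. The sketch below keeps novel patterns whose FDR-adjusted p_value falls under a threshold and orders them by abs_target_change. The helper name, the 0.05 threshold, and the demo values are illustrative choices, and the stand-in objects carry only the fields used here:

```python
from types import SimpleNamespace


def top_novel_patterns(patterns, alpha=0.05, limit=5):
    """Keep novel patterns with p_value below alpha, largest effect first."""
    kept = [
        p for p in patterns
        if p.novelty_type == "novel" and p.p_value < alpha
    ]
    kept.sort(key=lambda p: p.abs_target_change, reverse=True)
    return kept[:limit]


# Stand-in objects with made-up values, not real Pattern instances.
demo = [
    SimpleNamespace(id="a", novelty_type="novel", p_value=0.01, abs_target_change=0.2),
    SimpleNamespace(id="b", novelty_type="confirmatory", p_value=0.001, abs_target_change=0.9),
    SimpleNamespace(id="c", novelty_type="novel", p_value=0.20, abs_target_change=0.5),
    SimpleNamespace(id="d", novelty_type="novel", p_value=0.03, abs_target_change=0.4),
]
print([p.id for p in top_novel_patterns(demo)])
```

With a real result, the same function can be applied to result.patterns.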

Pattern Conditions

Each condition in pattern.conditions is a dict with a type field:

Continuous condition — a numeric range:

{
    "type": "continuous",
    "feature": "age",
    "min_value": 45.0,
    "max_value": 65.0,
    "min_q": 0.35,   # quantile of min_value
    "max_q": 0.72    # quantile of max_value
}

Categorical condition — a set of values:

{
    "type": "categorical",
    "feature": "region",
    "values": ["north", "east"]
}

Datetime condition — a time range:

{
    "type": "datetime",
    "feature": "date",
    "min_value": 1609459200000,   # epoch ms
    "max_value": 1640995200000,
    "min_datetime": "2021-01-01", # human-readable
    "max_datetime": "2022-01-01"
}
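
The three condition shapes can be rendered as readable text with a small dispatcher on the type field. This is a sketch, not SDK code: the function names are hypothetical, treating range bounds as inclusive is an assumption, and so is joining a pattern's conditions with AND.

```python
def describe_condition(cond):
    """Render one pattern condition dict as a short readable string."""
    kind = cond["type"]
    feature = cond["feature"]
    if kind == "continuous":
        return f"{cond['min_value']} <= {feature} <= {cond['max_value']}"
    if kind == "categorical":
        values = ", ".join(str(v) for v in cond["values"])
        return f"{feature} in {{{values}}}"
    if kind == "datetime":
        return f"{cond['min_datetime']} <= {feature} <= {cond['max_datetime']}"
    raise ValueError(f"unknown condition type: {kind!r}")


def describe_pattern_conditions(conditions):
    """Join all conditions of a pattern, assuming they apply together."""
    return " AND ".join(describe_condition(c) for c in conditions)


example = [
    {"type": "continuous", "feature": "age", "min_value": 45.0, "max_value": 65.0},
    {"type": "categorical", "feature": "region", "values": ["north", "east"]},
]
print(describe_pattern_conditions(example))
```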

Summary

@dataclass
class Summary:
    overview: str                       # High-level summary of findings
    key_insights: List[str]             # Main takeaways
    novel_patterns: PatternGroup        # Novel pattern IDs and explanation
    selected_pattern_id: Optional[str]  # Featured pattern ID

Note: The data_insights field from v0.1.x has been removed. Use result.feature_importance and result.correlation_matrix directly instead — these provide the raw computed values without LLM summarization artifacts.

Column

@dataclass
class Column:
    id: str
    name: str
    display_name: str
    type: str                           # "continuous" or "categorical"
    data_type: str                      # "int", "float", "string", "boolean", "datetime"
    enabled: bool
    description: Optional[str]
    
    # Statistics (for numeric columns)
    mean: Optional[float]
    median: Optional[float]
    std: Optional[float]
    min: Optional[float]
    max: Optional[float]
    iqr_min: Optional[float]
    iqr_max: Optional[float]
    mode: Optional[str]                 # Statistical mode (None if all values unique)
    approx_unique: Optional[int]
    null_percentage: Optional[float]
    
    # Feature importance
    feature_importance_score: Optional[float]  # Signed importance score (see FeatureImportance)

FeatureImportance

Feature importance is computed using Hierarchical Perturbation (HiPe), an efficient ablation-based method. Scores are signed to indicate direction:

- Positive: feature increases the prediction / supports predicted class
- Negative: feature decreases the prediction / works against predicted class

@dataclass
class FeatureImportance:
    kind: str                           # "global"
    baseline: float                     # Baseline model output (mean prediction)
    scores: List[FeatureImportanceScore]

@dataclass
class FeatureImportanceScore:
    feature: str                        # Feature/column name
    score: float                        # Signed importance score
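
Since scores are signed, ranking features by influence means sorting on the absolute value while keeping the sign for interpretation. A minimal sketch with stand-in objects shaped like FeatureImportanceScore; rank_features is a hypothetical helper and the numbers are made up for illustration:

```python
from types import SimpleNamespace


def rank_features(scores, top=3):
    """Return (feature, score) pairs ordered by absolute score, largest first."""
    ordered = sorted(scores, key=lambda s: abs(s.score), reverse=True)
    return [(s.feature, s.score) for s in ordered[:top]]


# Stand-in objects with made-up values, not real SDK instances.
demo_scores = [
    SimpleNamespace(feature="age", score=0.12),
    SimpleNamespace(feature="bmi", score=-0.30),
    SimpleNamespace(feature="region", score=0.05),
]
print(rank_features(demo_scores, top=2))
```

With a real result, pass result.feature_importance.scores after checking that feature_importance is not None.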

CorrelationEntry

@dataclass
class CorrelationEntry:
    feature_x: str
    feature_y: str
    value: float                        # Correlation coefficient (-1 to 1)
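
A flat list of entries is easy to scan for the strongest relationships. The sketch below collapses (x, y) and (y, x) into one pair and skips self-correlations, because whether the list includes both orderings or the diagonal is not stated here. strongest_correlations is a hypothetical helper and the demo values are made up:

```python
from types import SimpleNamespace


def strongest_correlations(entries, threshold=0.5):
    """Return unique feature pairs whose |value| meets the threshold, strongest first."""
    seen = {}
    for e in entries:
        if e.feature_x == e.feature_y:
            continue  # skip the diagonal
        pair = tuple(sorted((e.feature_x, e.feature_y)))
        if abs(e.value) >= threshold:
            seen[pair] = e.value  # (x, y) and (y, x) collapse to one key
    return sorted(seen.items(), key=lambda item: abs(item[1]), reverse=True)


# Stand-in objects shaped like CorrelationEntry, with made-up values.
demo_entries = [
    SimpleNamespace(feature_x="age", feature_y="age", value=1.0),
    SimpleNamespace(feature_x="age", feature_y="bmi", value=0.6),
    SimpleNamespace(feature_x="bmi", feature_y="age", value=0.6),
    SimpleNamespace(feature_x="age", feature_y="hr", value=-0.8),
    SimpleNamespace(feature_x="bmi", feature_y="hr", value=0.1),
]
print(strongest_correlations(demo_entries))
```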

discovery-engine-api 0.2.9
