Phantom

Sandboxed data analysis with LLMs (powered by DuckDB).

Phantom is a Python framework for LLM-assisted data analysis in which the LLM never needs to see the actual data. Instead, the model reasons over opaque semantic references (such as @a3f2) and writes SQL, and Phantom executes the queries locally in a sandboxed DuckDB engine.

Quick Start

pip install phantom-ai

import os
import phantom

session = phantom.Session(data_dir="~/data/exoplanets")

chat = phantom.Chat(
    session,
    provider="anthropic",
    api_key=os.environ["ANTHROPIC_API_KEY"],
    model="claude-sonnet-4-6",
    system="You are an astrophysicist.",
)

response = chat.ask(
    "Which habitable-zone exoplanets are within 50 light-years of Earth, "
    "and what kind of stars do they orbit?"
)

How It Works

Given two CSV files and the question "Which habitable-zone exoplanets are within 50 light-years of Earth, and what kind of stars do they orbit?", Phantom produces this tool-call trace:

[0] read_csv("exoplanets.csv")            → @6a97
[1] read_csv("stars.csv")                 → @cc35
[2] query({p: @6a97})                     → @b1a0  -- habitable-zone filter
[3] query({s: @cc35})                     → @f4e2  -- nearby stars (< 50 ly)
[4] query({hz: @b1a0, nb: @f4e2})         → @31d7  -- join + rank by distance
[5] export(@31d7)                         → [{name: "Proxima Cen b", ...}]
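
To make steps [2] to [4] concrete, here is a minimal sketch of the kind of SQL the model might write for them. It is illustrative only: the column names (host, in_habitable_zone, distance_ly, teff_k) and the "Hypothetical" rows are assumptions, and the standard-library sqlite3 module stands in for DuckDB so the snippet runs without extra dependencies. The two real rows reuse values from the answer table in this section.

```python
import sqlite3

# sqlite3 is a stand-in for DuckDB here; the schema below is assumed, not Phantom's.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE exoplanets (name TEXT, host TEXT, in_habitable_zone INTEGER);
    CREATE TABLE stars (name TEXT, distance_ly REAL, teff_k INTEGER);
    INSERT INTO exoplanets VALUES
        ('Proxima Cen b', 'Proxima Cen', 1),
        ('HD 40307 g', 'HD 40307', 1),
        ('Hypothetical b', 'Hypothetical A', 0),   -- not in the habitable zone
        ('Hypothetical c', 'Hypothetical B', 1);   -- host star is too far away
    INSERT INTO stars VALUES
        ('Proxima Cen', 4.2, 3042),
        ('HD 40307', 42.0, 4977),
        ('Hypothetical A', 10.0, 5000),
        ('Hypothetical B', 120.0, 5800);
""")

rows = con.execute("""
    WITH hz AS (  -- step [2]: habitable-zone filter
             SELECT name, host FROM exoplanets WHERE in_habitable_zone = 1),
         nb AS (  -- step [3]: nearby stars (< 50 ly)
             SELECT name, distance_ly, teff_k FROM stars WHERE distance_ly < 50)
    -- step [4]: join the two intermediate results and rank by distance
    SELECT hz.name, nb.distance_ly, nb.name, nb.teff_k
    FROM hz JOIN nb ON hz.host = nb.name
    ORDER BY nb.distance_ly
""").fetchall()

for row in rows:
    print(row)
```

Each CTE plays the role of one intermediate reference in the trace; in Phantom those intermediates are separate tool calls rather than one combined statement.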

The semantic refs (@6a97, @cc35, ...) compose into a lazy execution graph:

@6a97 → @b1a0 ─┐
                ├→ @31d7
@cc35 → @f4e2 ─┘

Shared subgraphs are resolved once and cached. The query engine is DuckDB, so JOINs, window functions, CTEs, and aggregations all work natively.
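
The idea of a lazy graph with cached subgraphs can be sketched in a few lines of plain Python. This is a generic illustration, not Phantom's implementation: Node, Graph, and the toy rows are hypothetical, and only the reference names are borrowed from the trace above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Node:
    ref: str                        # opaque handle, e.g. "@6a97"
    compute: Callable[..., object]  # how to produce this node's value
    inputs: tuple["Node", ...] = () # upstream nodes this one depends on


class Graph:
    def __init__(self):
        self._cache = {}
        self.evaluations = []  # refs in the order they were actually computed

    def resolve(self, node):
        # Nothing runs until resolve() is called; each ref is computed at most once.
        if node.ref not in self._cache:
            args = [self.resolve(dep) for dep in node.inputs]
            self.evaluations.append(node.ref)
            self._cache[node.ref] = node.compute(*args)
        return self._cache[node.ref]


# Toy stand-ins for the two CSV files (hypothetical rows).
planets = Node("@6a97", lambda: [("Proxima Cen b", "Proxima Cen", True),
                                 ("Hypothetical b", "Hypothetical A", False)])
stars = Node("@cc35", lambda: {"Proxima Cen": 4.2, "Hypothetical A": 10.0})

hz = Node("@b1a0", lambda rows: [r for r in rows if r[2]], (planets,))
nearby = Node("@f4e2", lambda dist: {s: d for s, d in dist.items() if d < 50}, (stars,))
joined = Node(
    "@31d7",
    lambda hz_rows, near: sorted(
        ((name, near[host]) for name, host, _ in hz_rows if host in near),
        key=lambda pair: pair[1],
    ),
    (hz, nearby),
)

graph = Graph()
result = graph.resolve(joined)  # walks the whole graph once
graph.resolve(hz)               # already cached, so nothing is recomputed
print(result, graph.evaluations)
```

Defining the nodes does no work; the first resolve call evaluates each dependency exactly once, and later requests for any ref in the graph are served from the cache.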

Claude's answer (abridged):

Planet            Distance  Star         Spectral type (temperature)
Proxima Cen b     4.2 ly    Proxima Cen  M-dwarf (3,042 K)
Ross 128 b        11 ly     Ross 128     M-dwarf (3,192 K)
Teegarden b       12 ly     Teegarden    M-dwarf (2,904 K)
GJ 667 Cc         24 ly     GJ 667 C     M-dwarf (3,350 K)
TRAPPIST-1 e/f/g  40 ly     TRAPPIST-1   M-dwarf (2,566 K)
LHS 1140 b        41 ly     LHS 1140     M-dwarf (3,216 K)
HD 40307 g        42 ly     HD 40307     K-dwarf (4,977 K)

The nearest habitable-zone candidates overwhelmingly orbit M-dwarf stars — small, cool, and the most common type in the galaxy.

LLM Providers

Built-in support for Anthropic, OpenAI, and Google Gemini:

pip install "phantom-ai[anthropic]"
pip install "phantom-ai[openai]"
pip install "phantom-ai[google]"

# Anthropic
chat = phantom.Chat(
    session,
    provider="anthropic",
    api_key=os.environ["ANTHROPIC_API_KEY"],
    model="claude-sonnet-4-6",
)

# OpenAI
chat = phantom.Chat(
    session,
    provider="openai",
    api_key=os.environ["OPENAI_API_KEY"],
    model="gpt-4o",
)

# Google Gemini
chat = phantom.Chat(
    session,
    provider="google",
    api_key=os.environ["GOOGLE_API_KEY"],
    model="gemini-2.0-flash",
)

Phantom also honours each SDK's native env var (ANTHROPIC_API_KEY, OPENAI_API_KEY, GOOGLE_API_KEY) when api_key is omitted — useful for CI.
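
The fallback order described above (explicit api_key first, then the provider's environment variable) is a common pattern that can be sketched as follows. This is an assumption about the general shape only: resolve_api_key and ENV_VARS are hypothetical names, and Phantom may simply delegate the lookup to each SDK.

```python
import os

# Environment variable names as listed in the text above.
ENV_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_API_KEY",
}


def resolve_api_key(provider, api_key=None, environ=None):
    """Return the explicit key if given, else the provider's env var (hypothetical helper)."""
    env = os.environ if environ is None else environ
    if api_key:
        return api_key
    var = ENV_VARS[provider]
    try:
        return env[var]
    except KeyError:
        raise RuntimeError(f"No api_key passed and {var} is not set") from None


# An explicit key wins; otherwise the environment is consulted.
print(resolve_api_key("openai", api_key="explicit", environ={}))
print(resolve_api_key("openai", environ={"OPENAI_API_KEY": "from-env"}))
```

Passing environ explicitly keeps the helper easy to test without touching the real process environment.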

Any OpenAI-compatible API (Groq, Together, Fireworks, Ollama, vLLM, ...) works via base_url:

chat = phantom.Chat(
    session,
    provider=phantom.OpenAIProvider(
        api_key="...",
        base_url="https://api.groq.com/openai/v1",
    ),
    model="llama-3.1-70b-versatile",
)

Custom Operations

Register domain-specific tools alongside the built-ins — the LLM can call them like any other operation:

@session.op
def fetch_lightcurve(target: str) -> dict:
    """Fetch a lightcurve from the MAST archive."""
    # mast_api stands in for your own MAST client (not defined in this snippet).
    return mast_api.query(target)
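
For readers curious how a decorator such as @session.op can expose a plain function to an LLM, the sketch below shows one generic approach: store the function in a registry and derive a tool description from its signature and docstring. ToolRegistry, describe, and call are hypothetical names used for illustration and say nothing about Phantom's internals; the lightcurve function is stubbed so the example runs offline.

```python
import inspect


class ToolRegistry:
    """Hypothetical registry; not Phantom's Session class."""

    def __init__(self):
        self._ops = {}

    def op(self, func):
        # Used as a decorator: remember the function and return it unchanged.
        self._ops[func.__name__] = func
        return func

    def describe(self):
        # Build a simple tool description from each function's signature and docstring.
        specs = []
        for name, func in self._ops.items():
            params = inspect.signature(func).parameters.values()
            specs.append({
                "name": name,
                "description": inspect.getdoc(func) or "",
                "parameters": {
                    p.name: getattr(p.annotation, "__name__", str(p.annotation))
                    for p in params
                },
            })
        return specs

    def call(self, name, **kwargs):
        # Dispatch a tool call requested by the model to the registered function.
        return self._ops[name](**kwargs)


registry = ToolRegistry()


@registry.op
def fetch_lightcurve(target: str) -> dict:
    """Fetch a lightcurve (stubbed here; a real op would query an archive)."""
    return {"target": target, "flux": []}


specs = registry.describe()
result = registry.call("fetch_lightcurve", target="TRAPPIST-1")
print(specs, result)
```

Type hints and a docstring carry most of the information a model needs to call a tool, which is why they are worth writing carefully on any custom operation.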

Download files

Source Distribution

phantom_ai-0.4.2.tar.gz (52.2 kB)

Uploaded Source

Built Distribution

phantom_ai-0.4.2-py3-none-any.whl (55.2 kB)

Uploaded Python 3

File details

Details for the file phantom_ai-0.4.2.tar.gz.

File metadata

  • Download URL: phantom_ai-0.4.2.tar.gz
  • Size: 52.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for phantom_ai-0.4.2.tar.gz
Algorithm    Hash digest
SHA256       89a322be0caed79493bcebd62b94c8d67552d058e3877680cc2cc589538f6600
MD5          9b5cd900cf94b249fa773e5cc4479a63
BLAKE2b-256  09ff05205bf58125cfa00b533eb33aea44af748b538214a7178ecbca67a0377f

Provenance

The following attestation bundles were made for phantom_ai-0.4.2.tar.gz:

Publisher: release.yml on James-Wirth/phantom-ai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file phantom_ai-0.4.2-py3-none-any.whl.

File metadata

  • Download URL: phantom_ai-0.4.2-py3-none-any.whl
  • Size: 55.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for phantom_ai-0.4.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       cdc62f75ccb4653c61a24837b6f9f5f692ddede81032ea390991297ede87fe1d
MD5          e1029d1a9583b350ca8d77a664be698f
BLAKE2b-256  be731eac9796d6042a66a7f0da30c96b3751c69fce17050bb12ca0326a30a0af

Provenance

The following attestation bundles were made for phantom_ai-0.4.2-py3-none-any.whl:

Publisher: release.yml on James-Wirth/phantom-ai
