Sandboxed data analysis with LLMs, powered by DuckDB
Project description
phantom
Sandboxed data analysis with LLMs (powered by DuckDB).
Phantom is a Python framework for LLM-assisted data analysis. The LLM doesn't need to see the actual data. Phantom reasons with opaque semantic references (@a3f2), writes SQL, and executes the queries locally in a sandboxed DuckDB engine.
Quick Start
pip install phantom-ai
import os
import phantom
session = phantom.Session(data_dir="~/data/exoplanets")
chat = phantom.Chat(
session,
provider="anthropic",
api_key=os.environ["ANTHROPIC_API_KEY"],
model="claude-sonnet-4-6",
system="You are an astrophysicist.",
)
response = chat.ask(
"Which habitable-zone exoplanets are within 50 light-years of Earth, "
"and what kind of stars do they orbit?"
)
How It Works
Given two CSV files and the question "Which habitable-zone exoplanets are within 50 light-years of Earth, and what kind of stars do they orbit?", Phantom produces this tool-call trace:
[0] read_csv("exoplanets.csv") → @6a97
[1] read_csv("stars.csv") → @cc35
[2] query({p: @6a97}) → @b1a0 -- habitable-zone filter
[3] query({s: @cc35}) → @f4e2 -- nearby stars (< 50 ly)
[4] query({hz: @b1a0, nb: @f4e2}) → @31d7 -- join + rank by distance
[5] export(@31d7) → [{name: "Proxima Cen b", ...}]
The semantic refs (@6a97, @cc35, ...) compose into a lazy execution graph:
@6a97 → @b1a0 ─┐
├→ @31d7
@cc35 → @f4e2 ─┘
Shared subgraphs are resolved once and cached. The query engine is DuckDB, so JOINs, window functions, CTEs, and aggregations all work natively.
Claude's answer (abridged):
Planet Distance Star Spectral type Proxima Cen b 4.2 ly Proxima Cen M-dwarf (3,042 K) Ross 128 b 11 ly Ross 128 M-dwarf (3,192 K) Teegarden b 12 ly Teegarden M-dwarf (2,904 K) GJ 667 Cc 24 ly GJ 667 C M-dwarf (3,350 K) TRAPPIST-1 e/f/g 40 ly TRAPPIST-1 M-dwarf (2,566 K) LHS 1140 b 41 ly LHS 1140 M-dwarf (3,216 K) HD 40307 g 42 ly HD 40307 K-dwarf (4,977 K) The nearest habitable-zone candidates overwhelmingly orbit M-dwarf stars — small, cool, and the most common type in the galaxy.
LLM Providers
Built-in support for Anthropic, OpenAI, and Google Gemini:
pip install "phantom-ai[anthropic]"
pip install "phantom-ai[openai]"
pip install "phantom-ai[google]"
chat = phantom.Chat(
session,
provider="anthropic",
api_key=os.environ["ANTHROPIC_API_KEY"],
model="claude-sonnet-4-6",
)
chat = phantom.Chat(
session,
provider="openai",
api_key=os.environ["OPENAI_API_KEY"],
model="gpt-4o",
)
chat = phantom.Chat(
session,
provider="google",
api_key=os.environ["GOOGLE_API_KEY"],
model="gemini-2.0-flash",
)
Phantom also honours each SDK's native env var (ANTHROPIC_API_KEY, OPENAI_API_KEY, GOOGLE_API_KEY) when api_key is omitted — useful for CI.
Any OpenAI-compatible API (Groq, Together, Fireworks, Ollama, vLLM, ...) works via base_url:
chat = phantom.Chat(
session,
provider=phantom.OpenAIProvider(
api_key="...",
base_url="https://api.groq.com/openai/v1",
),
model="llama-3.1-70b-versatile",
)
Custom Operations
Register domain-specific tools alongside the built-ins — the LLM can call them like any other operation:
@session.op
def fetch_lightcurve(target: str) -> dict:
"""Fetch a lightcurve from the MAST archive."""
return mast_api.query(target)
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file phantom_ai-0.4.2.tar.gz.
File metadata
- Download URL: phantom_ai-0.4.2.tar.gz
- Upload date:
- Size: 52.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
89a322be0caed79493bcebd62b94c8d67552d058e3877680cc2cc589538f6600
|
|
| MD5 |
9b5cd900cf94b249fa773e5cc4479a63
|
|
| BLAKE2b-256 |
09ff05205bf58125cfa00b533eb33aea44af748b538214a7178ecbca67a0377f
|
Provenance
The following attestation bundles were made for phantom_ai-0.4.2.tar.gz:
Publisher:
release.yml on James-Wirth/phantom-ai
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
phantom_ai-0.4.2.tar.gz -
Subject digest:
89a322be0caed79493bcebd62b94c8d67552d058e3877680cc2cc589538f6600 - Sigstore transparency entry: 1280981337
- Sigstore integration time:
-
Permalink:
James-Wirth/phantom-ai@4347a4e7e730f8d5317bf3255d1ef2273273e4a1 -
Branch / Tag:
refs/tags/v0.4.2 - Owner: https://github.com/James-Wirth
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@4347a4e7e730f8d5317bf3255d1ef2273273e4a1 -
Trigger Event:
push
-
Statement type:
File details
Details for the file phantom_ai-0.4.2-py3-none-any.whl.
File metadata
- Download URL: phantom_ai-0.4.2-py3-none-any.whl
- Upload date:
- Size: 55.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
cdc62f75ccb4653c61a24837b6f9f5f692ddede81032ea390991297ede87fe1d
|
|
| MD5 |
e1029d1a9583b350ca8d77a664be698f
|
|
| BLAKE2b-256 |
be731eac9796d6042a66a7f0da30c96b3751c69fce17050bb12ca0326a30a0af
|
Provenance
The following attestation bundles were made for phantom_ai-0.4.2-py3-none-any.whl:
Publisher:
release.yml on James-Wirth/phantom-ai
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
phantom_ai-0.4.2-py3-none-any.whl -
Subject digest:
cdc62f75ccb4653c61a24837b6f9f5f692ddede81032ea390991297ede87fe1d - Sigstore transparency entry: 1280981338
- Sigstore integration time:
-
Permalink:
James-Wirth/phantom-ai@4347a4e7e730f8d5317bf3255d1ef2273273e4a1 -
Branch / Tag:
refs/tags/v0.4.2 - Owner: https://github.com/James-Wirth
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@4347a4e7e730f8d5317bf3255d1ef2273273e4a1 -
Trigger Event:
push
-
Statement type: