Simple tool to query Dremio with Apache Arrow Flight
dremio_simple_query
The purpose of this library is to easily query a Dremio source using Arrow Flight for analytics.
Use Dremio to help:
- Govern your data
- Join your data across sources (Iceberg, Delta, S3, JSON, CSV, RDBMS, and more)
- Accelerate your queries across data sources
- Reduce your data warehouse workloads
With this library, analysts can pull data from Dremio and start running local analytics with Arrow, Pandas, Polars, and DuckDB. Because results are transferred over Apache Arrow Flight in columnar Arrow format, large datasets can be retrieved efficiently.
New Feature: Connection Profiles
Store connection and authentication settings in ~/.dremio/profiles.yaml and refer to them by name.
```python
from dremio_simple_query.connectv2 import DremioConnection

# Connect using a profile defined in ~/.dremio/profiles.yaml (e.g. "my_cloud" below)
dremio = DremioConnection(profile="my_cloud")
```
Example ~/.dremio/profiles.yaml:
```yaml
profiles:
  # 1. Cloud with PAT
  my_cloud:
    type: cloud
    base_url: https://api.dremio.cloud
    auth:
      type: pat
      token: MY_PAT_TOKEN
  # 2. Software with PAT
  my_software_pat:
    type: software
    base_url: https://dremio.company.com
    auth:
      type: pat
      token: MY_PAT_TOKEN
  # 3. Software with Username/Password
  my_software_basic:
    type: software
    base_url: https://dremio.company.com
    auth:
      type: username_password
      username: my_user
      password: my_password
  # 4. Software with Client Credentials
  my_software_oauth:
    type: software
    base_url: https://dremio.company.com
    auth:
      type: oauth
      client_id: MY_CLIENT_ID
      client_secret: MY_CLIENT_SECRET
```
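Each `auth.type` in the example file comes with its own set of keys. The sketch below checks a profile for those keys before you hand it to the client. `validate_profile` and the two lookup tables are hypothetical helpers written for illustration, not part of `dremio_simple_query`, and the profiles are given as plain dicts (roughly what a YAML loader would return) so the example needs only the standard library.

```python
# Hypothetical helper, not part of dremio_simple_query: it only checks that a
# profile dict has the keys used in the profiles.yaml examples above.
REQUIRED_AUTH_KEYS = {
    "pat": {"token"},
    "username_password": {"username", "password"},
    "oauth": {"client_id", "client_secret"},
}
VALID_TYPES = {"cloud", "software"}


def validate_profile(name: str, profile: dict) -> list:
    """Return a list of problems found in one profile; an empty list means it looks complete."""
    problems = []
    if profile.get("type") not in VALID_TYPES:
        problems.append(f"{name}: type must be one of {sorted(VALID_TYPES)}")
    if not profile.get("base_url"):
        problems.append(f"{name}: missing base_url")
    auth = profile.get("auth") or {}
    auth_type = auth.get("type")
    if auth_type not in REQUIRED_AUTH_KEYS:
        problems.append(f"{name}: unknown auth type {auth_type!r}")
    else:
        # Keys required by this auth type that the profile does not define
        missing = REQUIRED_AUTH_KEYS[auth_type] - set(auth)
        for key in sorted(missing):
            problems.append(f"{name}: auth type {auth_type!r} needs {key!r}")
    return problems


profiles = {
    "my_cloud": {
        "type": "cloud",
        "base_url": "https://api.dremio.cloud",
        "auth": {"type": "pat", "token": "MY_PAT_TOKEN"},
    },
    # Deliberately incomplete: the password is missing
    "my_software_basic": {
        "type": "software",
        "base_url": "https://dremio.company.com",
        "auth": {"type": "username_password", "username": "my_user"},
    },
}

for name, profile in profiles.items():
    for problem in validate_profile(name, profile):
        print(problem)
```

Running it reports only the incomplete `my_software_basic` profile; the complete `my_cloud` profile produces no output.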
🚀 Recommended: Use the V2 Client (dremio_simple_query.connectv2)
The V2 client is the recommended implementation: it supports more authentication options and keeps Dremio Cloud sessions stable, as summarized in the comparison below.
Why use V2?
| Feature | V1 (connect.py) | V2 (connectv2.py) |
|---|---|---|
| Authentication | Token Only | Token OR Username/Password (Auto-Handshake) |
| Dremio Cloud | Basic Support | Robust Support (Session Persistence via Cookies) |
| Project Context | Default Project Only | Multi-Project Support (Route via project_id) |
| Code Quality | Basic | Type Hinted & Docstringed |
V2 Quick Start
```python
from dremio_simple_query.connectv2 import DremioConnection
from os import getenv
from dotenv import load_dotenv

load_dotenv()

# Option 1: Authenticate with PAT (Personal Access Token)
dremio = DremioConnection(
    location=getenv("ARROW_ENDPOINT"),  # e.g., grpc+tls://data.dremio.cloud:443
    token=getenv("DREMIO_TOKEN"),
    project_id=getenv("DREMIO_PROJECT_ID")  # Optional: Specify Project ID context
)

# Option 2: Authenticate with Username/Password (Software Only)
# Performs automatic Arrow Flight Handshake
dremio_auth = DremioConnection(
    location="grpc+tls://dremio.company.com:32010",
    username="my_user",
    password="my_password"
)

# Query Data (Returns FlightStreamReader)
stream = dremio.toArrow("SELECT * FROM star_wars.battles")

# Convert to your favorite format
df_pandas = dremio.toPandas("SELECT * FROM star_wars.battles")
df_polars = dremio.toPolars("SELECT * FROM star_wars.battles")
duck_rel = dremio.toDuckDB("SELECT * FROM star_wars.battles")
```
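The quick start reads its settings from `ARROW_ENDPOINT`, `DREMIO_TOKEN`, and the optional `DREMIO_PROJECT_ID`. The sketch below gathers them into keyword arguments and fails early when a required one is missing. `connection_kwargs_from_env` is a hypothetical helper, not part of the library; it assumes `DremioConnection` accepts the `location`, `token`, and `project_id` keywords exactly as used in the quick start.

```python
import os
from collections.abc import Mapping


# Hypothetical helper (not part of dremio_simple_query): collect the quick-start
# settings from environment variables and fail early if a required one is missing.
def connection_kwargs_from_env(env: Mapping = os.environ) -> dict:
    location = env.get("ARROW_ENDPOINT")
    token = env.get("DREMIO_TOKEN")
    if not location or not token:
        raise ValueError("ARROW_ENDPOINT and DREMIO_TOKEN must both be set")
    kwargs = {"location": location, "token": token}
    project_id = env.get("DREMIO_PROJECT_ID")
    if project_id:  # only pass project_id when one is actually configured
        kwargs["project_id"] = project_id
    return kwargs


# Usage with the real client would look like:
#   dremio = DremioConnection(**connection_kwargs_from_env())
print(connection_kwargs_from_env({
    "ARROW_ENDPOINT": "grpc+tls://data.dremio.cloud:443",
    "DREMIO_TOKEN": "MY_TOKEN",
}))  # {'location': 'grpc+tls://data.dremio.cloud:443', 'token': 'MY_TOKEN'}
```

The design choice here is to leave `project_id` out entirely when it is unset, rather than passing `None` as `getenv` would.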
Legacy: Use the V1 Client (dremio_simple_query.connect)
The original client is maintained for backward compatibility. It is lighter but lacks session persistence features required for stable Dremio Cloud routing.
```python
from dremio_simple_query.connect import DremioConnection

# Authenticate with Token Only
dremio = DremioConnection(token="MY_TOKEN", location="grpc+tls://data.dremio.cloud:443")

# Query
stream = dremio.toArrow("SELECT 1")
```
Detailed Usage Guide
Getting Your URI and Token
| Deployment | Protocol | Endpoint | Result |
|---|---|---|---|
| Dremio Cloud (NA) | grpc+tls:// | data.dremio.cloud:443 | grpc+tls://data.dremio.cloud:443 |
| Dremio Cloud (EU) | grpc+tls:// | data.eu.dremio.cloud:443 | grpc+tls://data.eu.dremio.cloud:443 |
| Dremio Software (SSL) | grpc+tls:// | <ip-address>:32010 | grpc+tls://<ip-address>:32010 |
| Dremio Software (NoSSL) | grpc:// | <ip-address>:32010 | grpc://<ip-address>:32010 |
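The "Result" column is just the protocol prefix joined to the endpoint. The sketch below builds those URIs; `flight_location`, `cloud_location`, and `DREMIO_CLOUD_ENDPOINTS` are hypothetical names, not library functions, and the defaults (port 32010 for Software, TLS on port 443 for Cloud) are taken from the table above.

```python
# Hypothetical helpers (not part of dremio_simple_query) that assemble the
# "Result" column of the table above from its "Protocol" and "Endpoint" parts.
DREMIO_CLOUD_ENDPOINTS = {
    "NA": "data.dremio.cloud:443",
    "EU": "data.eu.dremio.cloud:443",
}


def flight_location(host: str, port: int = 32010, tls: bool = True) -> str:
    """Dremio Software: grpc+tls:// when SSL is enabled, grpc:// otherwise."""
    scheme = "grpc+tls" if tls else "grpc"
    return f"{scheme}://{host}:{port}"


def cloud_location(region: str = "NA") -> str:
    """Dremio Cloud: per the table, always grpc+tls on port 443."""
    return f"grpc+tls://{DREMIO_CLOUD_ENDPOINTS[region]}"


print(cloud_location("EU"))                    # grpc+tls://data.eu.dremio.cloud:443
print(flight_location("10.0.0.5", tls=False))  # grpc://10.0.0.5:32010
```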
Getting your token (V1 helper)
V1 provides a get_token helper that fetches a token from Dremio's REST login endpoint. Use it when you are not relying on the V2 username/password handshake.
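```python
from dremio_simple_query.connect import get_token
# Placeholder credentials for your Dremio Software user
username, password = "my_user", "my_password"
login_endpoint = "http://localhost:9047/apiv2/login"
payload = {"userName": username, "password": password}
token = get_token(uri=login_endpoint, payload=payload)
```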
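For readers curious about the underlying REST call, here is a standard-library sketch of a similar login request. This is not how `get_token` is implemented. It assumes the endpoint accepts a JSON body with `userName` and `password` and replies with JSON containing a `token` field. `build_login_request` and `fetch_token` are hypothetical names, and only `fetch_token` touches the network.

```python
import json
import urllib.request


# Hypothetical helpers: build and send a JSON login request with the stdlib.
def build_login_request(uri: str, username: str, password: str) -> urllib.request.Request:
    body = json.dumps({"userName": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        uri,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_token(request: urllib.request.Request, timeout: float = 10.0) -> str:
    # Sends the request; only call this against a reachable Dremio server.
    # Assumes the JSON response carries the token under the "token" key.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["token"]


req = build_login_request("http://localhost:9047/apiv2/login", "my_user", "my_password")
print(req.get_method(), req.full_url)  # POST http://localhost:9047/apiv2/login
```

Keeping request construction separate from sending makes the payload easy to inspect without a running server.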
Data Conversion Methods (Both V1 & V2)
The .toArrow method returns a pyarrow.flight.FlightStreamReader, which can be converted into standard Arrow objects.
Arrow Table
```python
arrow_table = stream.read_all()
```
Arrow RecordBatchReader
```python
batch_reader = stream.to_reader()
```
toPandas
```python
df = dremio.toPandas("SELECT * FROM arctic.table1;")
```
toPolars
```python
df = dremio.toPolars("SELECT * FROM arctic.table1;")
```
Querying with DuckDB
Using the DuckDB Relation API
```python
duck_rel = dremio.toDuckDB("SELECT * FROM arctic.table1")
# query() exposes the relation under the name "table1" for this SQL statement
result = duck_rel.query("table1", "SELECT * from table1").fetchall()
```
Querying Arrow Objects with DuckDB
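```python
import duckdb

con = duckdb.connect()  # in-memory DuckDB database
stream = dremio.toArrow("SELECT * FROM arctic.table1;")
my_table = stream.read_all()  # materialize the Flight stream as a pyarrow Table
# DuckDB resolves "my_table" to the local Python variable (replacement scan)
# and queries the Arrow table without copying it (zero-copy)
results = con.execute("SELECT * FROM my_table;").fetchall()
print(results)
```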
Download files
File details
Details for the file dremio_simple_query-2.0.0.tar.gz.
File metadata
- Download URL: dremio_simple_query-2.0.0.tar.gz
- Upload date:
- Size: 14.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1fd5acf1bc2d361d0ae1e0a3016085a217534b631f8b084f5e8ec4dd354eca32 |
| MD5 | 76fe6b978588b4d6039c48b0601d1ae6 |
| BLAKE2b-256 | 6e1e7b5ffdb86027c79601591d68576c5f6d5a77bbd67c9051c90f1094060a94 |
File details
Details for the file dremio_simple_query-2.0.0-py3-none-any.whl.
File metadata
- Download URL: dremio_simple_query-2.0.0-py3-none-any.whl
- Upload date:
- Size: 12.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6b5d47cb81f7e9b04367f0603a3526ed2fa4eeb9d902575583a27a6b7346c9c7 |
| MD5 | a7679974c2195e870751d506b1c338ba |
| BLAKE2b-256 | 9181eb5066178194112cecffa9f19c27e0c20425bc9456d7d8f99c2fa52481ce |