Simple tool to query Dremio with Apache Arrow Flight
dremio_simple_query
The purpose of this library is to easily query a Dremio source using Arrow Flight for analytics.
Use Dremio to Help:
- Govern your data
- Join your data across sources (Iceberg, Delta, S3, JSON, CSV, RDBMS, and more)
- Accelerate your queries across data sources
- Reduce your Data Warehouse Workloads
With this library, analysts can pull data from Dremio and start running local analytics with Arrow, Pandas, Polars, and DuckDB. Because it uses Apache Arrow Flight, it can retrieve large datasets efficiently.
Getting Your URI and Token
Environment | Protocol | Endpoint | Result |
---|---|---|---|
Dremio Cloud (NA) | grpc+tls:// | data.dremio.cloud:443 | grpc+tls://data.dremio.cloud:443 |
Dremio Cloud (EU) | grpc+tls:// | data.eu.dremio.cloud:443 | grpc+tls://data.eu.dremio.cloud:443 |
Dremio Software (SSL) | grpc+tls:// | <ip-address>:32010 | grpc+tls://<ip-address>:32010 |
Dremio Software (NoSSL) | grpc:// | <ip-address>:32010 | grpc://<ip-address>:32010 |
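Each URI in the table above is simply the protocol followed by the endpoint. A minimal sketch of that composition follows; `build_flight_uri` is a hypothetical helper written for illustration and is not part of dremio_simple_query.

```python
def build_flight_uri(host: str, port: int = 32010, use_tls: bool = False) -> str:
    """Compose an Arrow Flight URI as protocol + host:port.

    Hypothetical helper for illustration; not part of dremio_simple_query.
    """
    if not host:
        raise ValueError("host must not be empty")
    protocol = "grpc+tls://" if use_tls else "grpc://"
    return f"{protocol}{host}:{port}"


# Dremio Cloud (NA) uses TLS on port 443
cloud_uri = build_flight_uri("data.dremio.cloud", port=443, use_tls=True)
# Dremio Software without SSL, using the Flight port from the table (32010)
local_uri = build_flight_uri("localhost")
```

The resulting strings can be passed as the second argument to DremioConnection.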
Getting your token
- For Dremio Cloud, you can get a token from the web interface or the REST API.
- For Dremio Software, you can get a token from the REST API.
The get_token function is included to help you get the token from the Dremio REST API.
from dremio_simple_query.connect import get_token, DremioConnection
## Dremio credentials (placeholders; replace with your own)
username = "<username>"
password = "<password>"
## URL to Login Endpoint
login_endpoint = "http://localhost:9047/apiv2/login"
## Payload for Login
payload = {
    "userName": username,
    "password": password
}
## Get token from API
token = get_token(uri=login_endpoint, payload=payload)
## URL Dremio Software Flight Endpoint
arrow_endpoint = "grpc://localhost:32010"
## Establish Client
dremio = DremioConnection(token, arrow_endpoint)
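The internals of get_token are not shown in this README. The sketch below only illustrates the kind of request a login call involves, using the Python standard library. It assumes the login endpoint accepts a JSON POST body and returns JSON with a `token` field; `build_login_request` and `extract_token` are hypothetical names, not part of dremio_simple_query.

```python
import json
import urllib.request


def build_login_request(uri: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the login endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        uri,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_token(response_body: str) -> str:
    """Pull the token out of a JSON response (assumes a "token" field)."""
    data = json.loads(response_body)
    token = data.get("token")
    if not token:
        raise ValueError("login response did not contain a token")
    return token


request = build_login_request(
    "http://localhost:9047/apiv2/login",
    {"userName": "<username>", "password": "<password>"},
)
# Sending the request needs a running Dremio instance:
# with urllib.request.urlopen(request) as response:
#     token = extract_token(response.read().decode("utf-8"))
```

In practice, calling get_token as shown above is the simpler route; the sketch is only meant to make the login step less opaque.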
Setting up your connection
from dremio_simple_query.connect import DremioConnection
from os import getenv
from dotenv import load_dotenv
load_dotenv()
## Dremio Personal Access Token
token = getenv("TOKEN")
## Arrow Endpoint (See Dremio Documentation)
uri = getenv("ARROW_ENDPOINT")
## Create Dremio Arrow Connection
dremio = DremioConnection(token, uri)
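If TOKEN or ARROW_ENDPOINT is missing from the environment, getenv returns None and the failure may only surface later at connection time. A minimal sketch of checking both values up front follows; `load_dremio_settings` is a hypothetical helper, and the scheme check assumes the two protocols listed in the URI table.

```python
import os


def load_dremio_settings(env=None) -> tuple:
    """Return (token, uri) from the environment, failing early if either is missing.

    Hypothetical helper for illustration; not part of dremio_simple_query.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("TOKEN", "ARROW_ENDPOINT") if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    uri = env["ARROW_ENDPOINT"]
    if not uri.startswith(("grpc://", "grpc+tls://")):
        raise ValueError(f"unexpected Arrow Flight URI scheme: {uri}")
    return env["TOKEN"], uri


# Example with an explicit mapping instead of the real environment
token, uri = load_dremio_settings(
    {"TOKEN": "example-token", "ARROW_ENDPOINT": "grpc://localhost:32010"}
)
```

Called without arguments after load_dotenv(), the helper reads the real environment, and the returned pair can be passed straight to DremioConnection.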
Query (Get Arrow Back)
To get Arrow data back, run a query like this:
stream = dremio.toArrow("SELECT * FROM arctic.table1;")
The .toArrow method returns a FlightStreamReader object, which can be converted into typical Arrow objects.
Arrow Table
arrow_table = stream.read_all()
Arrow RecordBatchReader
batch_reader = stream.to_reader()
toPandas (Get Pandas Dataframe Back)
df = dremio.toPandas("SELECT * FROM arctic.table1;")
toPolars (Get Polars Dataframe Back)
df = dremio.toPolars("SELECT * FROM arctic.table1;")
Querying with DuckDB
Using the DuckDB Relation API
With the .toDuckDB method, the query results are returned as a DuckDB relation.
duck_rel = dremio.toDuckDB("SELECT * FROM arctic.table1")
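## Query the relation under the alias "table1"
result = duck_rel.query("table1", "SELECT * FROM table1").fetchall()
## Other relation methods, such as duck_rel.filter(...), can be applied the same way
print(result)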
Querying Arrow Objects with DuckDB
from dremio_simple_query.connect import DremioConnection
from os import getenv
from dotenv import load_dotenv
import duckdb
## DuckDB Connection
con = duckdb.connect()
load_dotenv()
## Dremio Personal Access Token
token = getenv("TOKEN")
## Arrow Endpoint (See Dremio Documentation)
uri = getenv("ARROW_ENDPOINT")
## Create Dremio Arrow Connection
dremio = DremioConnection(token, uri)
## Get Data from Dremio
stream = dremio.toArrow("SELECT * FROM arctic.table1;")
## Turn into Arrow Table
my_table = stream.read_all()
## Query with DuckDB (my_table in the SQL resolves to the Python variable of the same name)
results = con.execute("SELECT * FROM my_table;").fetchall()
print(results)