
Simple tool to query Dremio with Apache Arrow Flight


dremio_simple_query

The purpose of this library is to make it easy to query Dremio sources for analytics using Apache Arrow Flight.

LEARN MORE ABOUT DREMIO

Use Dremio to:

  • Govern your data
  • Join your data across sources (Iceberg, Delta, S3, JSON, CSV, RDBMS, and more)
  • Accelerate your queries across data sources
  • Reduce your data warehouse workloads

With this library, your analysts can pull data from Dremio and get straight to work on local analytics with Arrow, Pandas, Polars, and DuckDB. Because results are transferred over Apache Arrow Flight as Arrow record batches, even large datasets can be retrieved efficiently.

Getting Your URI and Token

| Deployment | Protocol | Endpoint | Result |
|---|---|---|---|
| Dremio Cloud (NA) | grpc+tls:// | data.dremio.cloud:443 | grpc+tls://data.dremio.cloud:443 |
| Dremio Cloud (EU) | grpc+tls:// | data.eu.dremio.cloud:443 | grpc+tls://data.eu.dremio.cloud:443 |
| Dremio Software (SSL) | grpc+tls:// | <ip-address>:32010 | grpc+tls://<ip-address>:32010 |
| Dremio Software (NoSSL) | grpc:// | <ip-address>:32010 | grpc://<ip-address>:32010 |
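Each URI in the table is just the protocol followed by the host and port. As a small illustration, the helper below composes a Flight URI from those parts. It is a sketch, not part of dremio_simple_query; the names flight_uri and DREMIO_CLOUD_HOSTS are hypothetical, and the default port of 32010 is taken from the Dremio Software rows of the table.

```python
def flight_uri(host: str, port: int = 32010, tls: bool = True) -> str:
    """Compose a Dremio Arrow Flight URI (hypothetical helper, not part of the library)."""
    # TLS connections use the grpc+tls scheme; plain connections use grpc.
    scheme = "grpc+tls" if tls else "grpc"
    return f"{scheme}://{host}:{port}"


# Dremio Cloud hosts from the table; both are reached over TLS on port 443.
DREMIO_CLOUD_HOSTS = {"NA": "data.dremio.cloud", "EU": "data.eu.dremio.cloud"}

print(flight_uri(DREMIO_CLOUD_HOSTS["EU"], port=443))  # grpc+tls://data.eu.dremio.cloud:443
print(flight_uri("10.0.0.5", tls=False))  # grpc://10.0.0.5:32010
```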

Getting your token

  • Dremio Cloud: get a token from the web interface or the REST API
  • Dremio Software: get a token from the REST API

The library includes a get_token function that retrieves a token from the Dremio REST API.

from dremio_simple_query.connect import get_token, DremioConnection
from os import getenv

## Dremio credentials (the environment variable names here are placeholders)
username = getenv("DREMIO_USERNAME")
password = getenv("DREMIO_PASSWORD")

## URL to Login Endpoint (Dremio Software REST API)
login_endpoint = "http://localhost:9047/apiv2/login"

## Payload for Login
payload = {
    "userName": username,
    "password": password
}

## Get token from API
token = get_token(uri=login_endpoint, payload=payload)

## URL Dremio Software Flight Endpoint
arrow_endpoint = "grpc://localhost:32010"

## Establish Client
dremio = DremioConnection(token, arrow_endpoint)
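get_token is the library's supported way to log in. For readers who want to see roughly what such a login call involves, here is a standard-library sketch; it is not the library's implementation. It assumes the /apiv2/login endpoint accepts the JSON payload shown above and replies with a JSON body containing a "token" field. The functions build_login_request, extract_token, and fetch_token are hypothetical names.

```python
import json
import urllib.request


def build_login_request(login_endpoint: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a Dremio login endpoint."""
    payload = {"userName": username, "password": password}
    return urllib.request.Request(
        login_endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_token(response_body: str) -> str:
    """Return the "token" field from a login response body (assumed to be JSON)."""
    return json.loads(response_body)["token"]


def fetch_token(login_endpoint: str, username: str, password: str) -> str:
    """Send the login request and return the token; needs a reachable Dremio server."""
    request = build_login_request(login_endpoint, username, password)
    with urllib.request.urlopen(request) as response:
        return extract_token(response.read().decode("utf-8"))
```

Splitting request construction and response parsing from the network call keeps both halves testable without a running server.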

Setting up your connection

from dremio_simple_query.connect import DremioConnection
from os import getenv
from dotenv import load_dotenv

load_dotenv()

## Dremio Personal Access Token
token = getenv("TOKEN")

## Arrow Endpoint (See Dremio Documentation)
uri = getenv("ARROW_ENDPOINT")

## Create Dremio Arrow Connection
dremio = DremioConnection(token, uri)
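If TOKEN or ARROW_ENDPOINT is missing, getenv returns None and the failure only surfaces later, when the connection is used. One way to fail early is to check both variables up front. This is a minimal sketch; load_dremio_settings is a hypothetical helper, and the variable names match the example above.

```python
import os
from typing import Mapping, Tuple

REQUIRED_VARS = ("TOKEN", "ARROW_ENDPOINT")


def load_dremio_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (token, uri) from the environment, raising if either is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["TOKEN"], env["ARROW_ENDPOINT"]
```

After load_dotenv(), you could call token, uri = load_dremio_settings() and pass the pair to DremioConnection(token, uri).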

Query (Get Arrow Back)

To get Arrow data back, run a query with the .toArrow method.

stream = dremio.toArrow("SELECT * FROM arctic.table1;")

The .toArrow method returns a FlightStreamReader object, which can be converted into standard Arrow objects.

Arrow Table

arrow_table = stream.read_all()

Arrow RecordBatchReader

batch_reader = stream.to_reader()

toPandas (Get Pandas Dataframe Back)

df = dremio.toPandas("SELECT * FROM arctic.table1;")

toPolars (Get Polars Dataframe Back)

df = dremio.toPolars("SELECT * FROM arctic.table1;")

Querying with DuckDB

Using the DuckDB Relation API

The .toDuckDB method returns the query results as a DuckDB relation.

duck_rel = dremio.toDuckDB("SELECT * FROM arctic.table1")

result = duck_rel.query("table1", "SELECT * from table1").fetchall()

## Filter the relation (the column name "id" is hypothetical)
result2 = duck_rel.filter("id > 100").fetchall()

print(result)

Querying Arrow Objects with DuckDB

from dremio_simple_query.connect import DremioConnection
from os import getenv
from dotenv import load_dotenv
import duckdb

## DuckDB Connection
con = duckdb.connect()

load_dotenv()

## Dremio Personal Access Token
token = getenv("TOKEN")

## Arrow Endpoint (See Dremio Documentation)
uri = getenv("ARROW_ENDPOINT")

## Create Dremio Arrow Connection
dremio = DremioConnection(token, uri)

## Get Data from Dremio
stream = dremio.toArrow("SELECT * FROM arctic.table1;")

## Turn into Arrow Table
my_table = stream.read_all()

## Query with DuckDB (it resolves my_table from the Python variable of the same name)
results = con.execute("SELECT * FROM my_table;").fetchall()

print(results)



Download files


Source Distribution

dremio_simple_query-0.0.4.tar.gz (4.4 kB)

Uploaded Source

Built Distribution

dremio_simple_query-0.0.4-py3-none-any.whl (4.8 kB)

Uploaded Python 3

File details

Details for the file dremio_simple_query-0.0.4.tar.gz.

File metadata

  • Download URL: dremio_simple_query-0.0.4.tar.gz
  • Upload date:
  • Size: 4.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.0

File hashes

Hashes for dremio_simple_query-0.0.4.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 60ea218193b4c58846b6316604c57a7b2cfcb73b18bcfb14fed2f5c94fa119a0 |
| MD5 | e3db5a1f9c3758c5b9d1b8c1eace5db0 |
| BLAKE2b-256 | 4264c790d01bfc9ff49d8a8b0fd98d808d115ca297b5b50e7a0de3705f6160d1 |


File details

Details for the file dremio_simple_query-0.0.4-py3-none-any.whl.

File hashes

Hashes for dremio_simple_query-0.0.4-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | f18b447428508bafd15c430904ef828e298041f5899f45153a7b8c8bdd16dae6 |
| MD5 | 15c1defbca643a405d53b5aa480ca20a |
| BLAKE2b-256 | 3295626540dfbb971d0c8f63eddc1a53e78444d1a48e8af8eb043d5888fd41c3 |

