A Dremio SDK for interacting with one or more Dremio instances
pydremio
Introduction
pydremio is a Python API wrapper for interacting with Dremio.
It allows you to perform operations on datasets and metadata within Dremio via either the HTTP API or Arrow Flight.
Since Arrow Flight offers significantly better performance, it is the recommended method for data operations.
This repository includes the core library, unit tests, and example code to help you get started.
The wrapper is distributed as a Python wheel (.whl), which can be found in the Releases section, and is also published to PyPI.
Installation
You need Python 3.13 or higher.
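If you want to confirm the interpreter version before installing, a quick check like the following can help. It is a minimal sketch using only the standard library; the helper name `python_is_supported` is hypothetical and not part of pydremio.

```python
import sys

# Minimum version stated in this README.
MIN_PYTHON = (3, 13)


def python_is_supported(version_info=None):
    """Return True if the given (or running) interpreter is at least MIN_PYTHON."""
    version_info = sys.version_info if version_info is None else version_info
    # Compare only the major and minor components.
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    print("supported" if python_is_supported() else "Python 3.13 or higher is required")
```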
Option 1: Install via pip
pip install pydremio
Option 2a: Install via pip from GitHub
pip install --upgrade --force-reinstall https://github.com/continental/pydremio/releases/download/v0.3.2/dremio-0.3.2-py3-none-any.whl
If you are behind a corporate firewall and you need a workaround (NOT recommended for use in production!):
pip install --upgrade --force-reinstall \
--trusted-host pypi.org \
--trusted-host files.pythonhosted.org \
--trusted-host github.com \
--trusted-host objects.githubusercontent.com \
--cert False \
https://github.com/continental/pydremio/releases/download/v0.3.2/dremio-0.3.2-py3-none-any.whl
Install a specific version
pip install https://github.com/continental/pydremio/releases/download/<version>/dremio-<version>-py3-none-any.whl
Option 2b: Use requirements.txt
python-dotenv == 1.0.1
https://github.com/continental/pydremio/releases/latest/download/dremio-latest-py3-none-any.whl
Getting Started
Logging in
The simplest way to create a logged-in client instance:
from dremio import Dremio
# Replace the quoted placeholders with your own values.
dremio = Dremio("<hostname>", username="<username>", password="<password>")
Replace the placeholders or, preferably, use environment variables (via a .env file) to avoid storing credentials in code.
Example .env file:
DREMIO_USERNAME="your_username@example.com"
DREMIO_PASSWORD="xyz-your-password-or-pat-xyz"
DREMIO_HOSTNAME="https://your.dremio.host.cloud"
You can then use the convenience method:
from dremio import Dremio
from dotenv import load_dotenv
load_dotenv()
dremio = Dremio.from_env()
By default, pydremio assumes no TLS encryption. If you have set up TLS, use:
from dremio import Dremio
from dotenv import load_dotenv
load_dotenv()
dremio = Dremio.from_env()
dremio.flight_config.tls = True
Alternatively, set it in your .env file:
DREMIO_FLIGHT_TLS=TRUE
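Before calling `Dremio.from_env()`, it can be useful to confirm that the expected variables are actually set. The sketch below uses only the standard library and the variable names shown above. The helper `read_dremio_settings` is hypothetical, and its handling of `DREMIO_FLIGHT_TLS` values is an assumption for illustration, not pydremio's own parsing logic.

```python
import os

# Variable names taken from the example .env file in this README.
REQUIRED_VARS = ("DREMIO_HOSTNAME", "DREMIO_USERNAME", "DREMIO_PASSWORD")


def read_dremio_settings(env=None):
    """Collect the Dremio variables from a mapping (defaults to os.environ).

    Hypothetical helper: it mirrors the variable names used in this README
    and is not part of pydremio.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED_VARS}
    # Assumption: any casing of "true" enables TLS; anything else disables it.
    flag = env.get("DREMIO_FLIGHT_TLS", "")
    settings["DREMIO_FLIGHT_TLS"] = flag.strip().lower() == "true"
    return settings
```

Passing a plain dictionary instead of `os.environ` makes the check easy to try out without touching the real environment.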
More information here: Dremio authentication
Examples
- By default, queries are run with Arrow Flight.
- The reason is that HTTP queries generate a lot of temporary cache. This cache is kept for a long time and is created anew for each query, which may cause high storage costs if you query big tables!
- For small datasets, this may not be a good trade-off in duration. Try run(method='http') instead.
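The trade-off described above can be captured in a small helper that decides which keyword arguments to pass to `run()`. This is a sketch under stated assumptions: the helper `run_options`, the caller-supplied row estimate, and the 10,000-row threshold are all hypothetical and should be tuned for your environment. Only `method='http'` comes from this README.

```python
def run_options(estimated_rows, small_threshold=10_000):
    """Return keyword arguments for a dataset's run() call.

    Assumptions: `estimated_rows` is provided by the caller, and the
    default threshold is an arbitrary illustration value.
    """
    if estimated_rows <= small_threshold:
        # Small result: the HTTP method may finish sooner.
        return {"method": "http"}
    # Otherwise keep the default, which is Arrow Flight.
    return {}
```

With a dataset obtained from `dremio.get_dataset(...)`, this could be used as `ds.run(**run_options(500))`.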
Load a dataset
from dremio import Dremio
dremio = Dremio.from_env()
ds = dremio.get_dataset("path.to.vds")
polars_df = ds.run().to_polars()
pandas_df = ds.run().to_pandas()
Create a folder
from dremio import Dremio, NewFolder
dremio = Dremio.from_env()
folder = dremio.create_folder("path.to.folder")
Create a folder with access control
from dremio import Dremio, NewFolder, AccessControlList, AccessControl
dremio = Dremio.from_env()
folder = dremio.create_folder("path.to.folder")
user_id = dremio.get_user_by_name('<user_name>')
folder.set_access_for_user(user_id, ['SELECT'])
Methods
All models are located in the models/ directory.
Below is an overview of available methods grouped by category.
🔐 Connection
- login(username: str, password: str) -> str
- auth(auth: str = None, token: str = None) -> Dremio
📚 Catalog
Retrieval
- get_catalog_by_id(id: UUID) -> CatalogObject
- get_catalog_by_path(path: list[str]) -> CatalogObject
  - Accepts both list format (["space", "dataset"]) and string format ("space/dataset")
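To illustrate how the two path formats relate, the sketch below converts either form into a list of segments. The helper `to_path_list` is hypothetical and is not pydremio's implementation. It only covers the "/"-separated string form and the list form mentioned here.

```python
def to_path_list(path):
    """Normalize a catalog path to a list of segments.

    Hypothetical helper for illustration only: strings are split on "/",
    and any other iterable of segments is copied into a new list.
    """
    if isinstance(path, str):
        # Drop empty segments caused by leading, trailing, or doubled slashes.
        return [segment for segment in path.split("/") if segment]
    return list(path)
```

Both `to_path_list("space/dataset")` and `to_path_list(["space", "dataset"])` yield the same list of two segments.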
Creation
create_catalog_item(item: NewCatalogObject | dict) -> CatalogObject
Updating
- update_catalog_item(id: UUID | item: NewCatalogObject | dict) -> CatalogObject
- update_catalog_item_by_path(path: list[str], item: NewCatalogObject | dict) -> CatalogObject
Deletion
- delete_catalog_item(id: UUID) -> bool
  - Returns True if successful
Copying
copy_catalog_item_by_path(path: list[str], new_path: list[str]) -> CatalogObject
Refreshing
refresh_catalog(id: UUID) -> CatalogObject
Exploration
- get_catalog_tree(id: str = None, path: str | list[str] = None)
  - ⚠️ Expensive operation, intended for exploration and mapping only
📊 Dataset
- get_dataset(path: list[str] | str | None = None, *, id: UUID | None = None) -> Dataset
- create_dataset(path: list[str] | str, sql: str | SQLRequest, type: Literal['PHYSICAL_DATASET', 'VIRTUAL_DATASET'] = 'VIRTUAL_DATASET') -> Dataset
- delete_dataset(path: list[str] | str) -> bool
- copy_dataset(source_path: list[str] | str, target_path: list[str] | str) -> Dataset
- reference_dataset(source_path: list[str] | str, target_path: list[str] | str) -> Dataset
🗂️ Folder
- get_folder(path: list[str] | str | None = None, *, id: UUID | None = None) -> Folder
- create_folder(path: str | list[str]) -> Folder
- delete_folder(path: str | list[str], recursive: bool = True) -> bool
- copy_folder(source_path: list[str] | str, target_path: list[str] | str, *, assume_privileges: bool = True, relative_references: bool = False) -> Folder
- reference_folder(source_path: list[str] | str, target_path: list[str] | str, *, assume_privileges: bool = True) -> Folder
🤝 Collaboration
Wiki and tags are associated by the ID of the collection item.
The tags object contains an array of tags.
- get_wiki(id: UUID) -> Wiki
- set_wiki(id: UUID, wiki: Wiki) -> Wiki
- get_tags(id: str) -> Tags
- set_tags(id: str, tags: Tags) -> Tags
🧠 SQL
- sql(sql_request: SQLRequest) -> JobId
- start_job_on_dataset(id: UUID) -> JobId
- get_job_info(id: UUID) -> Job
- cancel_job(id: UUID) -> Job
- get_job_results(id: UUID) -> JobResult
- sql_results(sql_request: SQLRequest) -> Job | JobResult
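Because `sql(...)` returns a job ID rather than results, a caller typically waits for the job to finish before fetching results. The following is a generic polling sketch, not pydremio code. The helper `wait_for_job` is hypothetical, and the terminal state names are an assumption based on Dremio's job states. How the state is read from the `Job` object returned by `get_job_info` is not specified in this README, so it is left to a caller-supplied function.

```python
import time

# Assumption: these state names mark a finished job.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELED"}


def wait_for_job(get_state, poll_interval=1.0, timeout=60.0, sleep=time.sleep):
    """Call `get_state()` repeatedly until it returns a terminal state.

    `get_state` is any zero-argument callable returning the current job
    state as a string. Raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in state {state!r} after {timeout} s")
        sleep(poll_interval)
```

In practice, `get_state` could wrap a call to `dremio.get_job_info(job_id)` and return the state field of the resulting `Job`. Once the returned state indicates success, `get_job_results(job_id)` can be called.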
👤 User
- get_users() -> list[User]
- get_user(id: UUID) -> User
- get_user_by_name(name: str) -> User
- create_user(user: User) -> User
- update_user(id: UUID, user: User) -> User
- delete_user(id: UUID, tag: str) -> bool
  - Returns True if deletion was successful
Roadmap
- Publish to PyPI
- CLI support
Contributing
Contributions are welcome! Please open issues or pull requests for features, bugs, or improvements.
License
This project is licensed under the BSD License. See the LICENSE file for details.