A library used to fetch data from deltalake tables locally.
Project description
DeltaAgent - Deltalake Agent
This library can be used to fetch data from deltalake tables without depending on a Spark cluster. It is built on the pandas, adlfs and Office365-REST-Python-Client libraries.
Use cases and benefits
To use the library, first install it with pip:
pip install DeltaAgent
The agent requires the data lake account_name and account_key to set up the connection to a Gen2 Azure Blob Storage account.
from DeltaAgent import DeltaAgent
da = DeltaAgent(account_name="account_name", account_key="account_key")
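If you prefer not to hard-code the account key in a script, one option is to read both values from environment variables before creating the agent. The sketch below is only an illustration: the helper load_credentials and the variable names DELTA_ACCOUNT_NAME and DELTA_ACCOUNT_KEY are hypothetical and are not part of DeltaAgent.

```python
import os


def load_credentials(env=os.environ):
    """Collect connection settings from environment variables.

    The variable names below are hypothetical; pick whatever fits your setup.
    """
    required = ("DELTA_ACCOUNT_NAME", "DELTA_ACCOUNT_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    # The keys match the keyword arguments shown in the DeltaAgent example.
    return {
        "account_name": env["DELTA_ACCOUNT_NAME"],
        "account_key": env["DELTA_ACCOUNT_KEY"],
    }


# Usage, assuming DeltaAgent is installed and both variables are set:
# from DeltaAgent import DeltaAgent
# da = DeltaAgent(**load_credentials())
```

The returned dictionary can be unpacked straight into the constructor, so the key never has to appear in the source file.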
With the connection established, the parse_log_as_df method parses the paths of the valid parquet files and their corresponding partition information from the log files under the table's _delta_log folder. The result is returned as a pandas DataFrame that carries an additional method, fetch_data.
At this stage we can inspect the log and filter on partitions with the usual DataFrame loc method.
df_log = da.parse_log_as_df(container_name='container_name', table_path='deltatable_name')
df_log_filtered = df_log.loc[df_log.partition=='partition_value']
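To give a rough idea of what "valid parquet files" means, here is a minimal sketch of replaying a Delta transaction log by hand. It is not DeltaAgent's implementation. It relies only on the documented Delta log layout, in which each commit file holds one JSON action per line, with "add" and "remove" actions carrying a file path. The function name replay_delta_log and the sample commits are made up for illustration, and checkpoint files are ignored for brevity.

```python
import json


def replay_delta_log(commits):
    """Return the data files that are still active after all commits.

    `commits` is an iterable of strings, ordered by version, each holding
    the newline-delimited JSON content of one commit file.
    """
    active = {}  # file path -> partition values
    for commit in commits:
        for line in commit.splitlines():
            line = line.strip()
            if not line:
                continue
            action = json.loads(line)
            if "add" in action:
                add = action["add"]
                active[add["path"]] = add.get("partitionValues", {})
            elif "remove" in action:
                # A removed file is no longer part of the table.
                active.pop(action["remove"]["path"], None)
    # One row per active file: its path plus one key per partition column.
    return [{"path": path, **parts} for path, parts in active.items()]


# Made-up sample commits: version 0 adds two files, version 1 rewrites one.
commit_0 = "\n".join([
    json.dumps({"add": {"path": "date=2024-01-01/part-0000.parquet",
                        "partitionValues": {"date": "2024-01-01"}}}),
    json.dumps({"add": {"path": "date=2024-01-02/part-0001.parquet",
                        "partitionValues": {"date": "2024-01-02"}}}),
])
commit_1 = "\n".join([
    json.dumps({"remove": {"path": "date=2024-01-01/part-0000.parquet"}}),
    json.dumps({"add": {"path": "date=2024-01-01/part-0002.parquet",
                        "partitionValues": {"date": "2024-01-01"}}}),
])

rows = replay_delta_log([commit_0, commit_1])
# Partition filtering, analogous to the loc-based filter on the log DataFrame.
selected = [row for row in rows if row["date"] == "2024-01-01"]
```

Filtering on the partition values before any parquet file is read is what keeps the later download small: only the files in the selected rows need to be fetched.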
Calling the fetch_data method on the delta log DataFrame (here, the filtered df_log_filtered) fetches the actual data from the deltalake table. This may take some time if the data volume is large.
df_delta = df_log_filtered.fetch_data()
Note that the values for container_name and table_path can also be assigned when setting up the agent connection, as below:
da = DeltaAgent(account_name="account_name", account_key="account_key", container_name='container_name', table_path='deltatable_name')
File details
Details for the file deltaagent-0.0.11.tar.gz.
File metadata
- Download URL: deltaagent-0.0.11.tar.gz
- Upload date:
- Size: 7.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 860e7b40680086d1db48acddcccd48cfdb3293cdba438746d12815977bb78af3 |
| MD5 | c5f9e2c48a8aecfcfb12c79c6de329a5 |
| BLAKE2b-256 | 4a33069609a32cace12d7fe7eaea2360a75e456f9efcf2c0169bd9a1b4a72177 |
File details
Details for the file deltaagent-0.0.11-py3-none-any.whl.
File metadata
- Download URL: deltaagent-0.0.11-py3-none-any.whl
- Upload date:
- Size: 10.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f8ca76d784bbcf1244a1f343113b0a1aad2ea3e23455333a71a344740bea5876 |
| MD5 | 8701262f1616f6d518a9b1b6d7543eb7 |
| BLAKE2b-256 | ae5273d59507ea7ca5dff75235c6f473cc0ad6a5c7dc6583110b14f180c6d988 |