
A library used to fetch data from deltalake tables locally.

DeltaAgent - Deltalake Agent

This library can be used to fetch data from deltalake tables without depending on a Spark cluster. It is built on the pandas, adlfs and Office365-REST-Python-Client libraries.

Use cases and benefits

To use the library, first install it with pip:

pip install DeltaAgent

It requires the data lake account_name and account_key to set up the connection to an Azure Data Lake Storage Gen2 account.

from DeltaAgent import DeltaAgent

da = DeltaAgent(account_name="account_name", account_key="account_key")
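To avoid hard-coding the account key in a script, the two values can be read from the environment before the agent is created. The following is a minimal sketch, not part of DeltaAgent: the helper credentials_from_env and the environment variable names are hypothetical, chosen only for illustration.

```python
import os


def credentials_from_env(
    name_var="AZURE_STORAGE_ACCOUNT_NAME",  # hypothetical variable name
    key_var="AZURE_STORAGE_ACCOUNT_KEY",    # hypothetical variable name
):
    """Return the account_name/account_key keyword arguments from the environment."""
    missing = [var for var in (name_var, key_var) if not os.environ.get(var)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "account_name": os.environ[name_var],
        "account_key": os.environ[key_var],
    }


# Usage, assuming DeltaAgent is installed and both variables are set:
# from DeltaAgent import DeltaAgent
# da = DeltaAgent(**credentials_from_env())
```

The helper only builds the keyword arguments shown above; the connection itself is still made by DeltaAgent.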

Once the agent is connected, the parse_log_as_df method reads the log files under the _delta_log folder and extracts the paths of the valid parquet files together with their partition information. The result is returned as a pandas DataFrame that carries an additional method, fetch_data.

At this stage we can inspect the log and filter on partitions with the usual DataFrame loc method.

df_log = da.parse_log_as_df(container_name='container_name', table_path='deltatable_name')

df_log_filtered = df_log.loc[df_log.partition=='partition_value']
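To illustrate what "valid parquet files" means here, the sketch below replays Delta-style commit actions in plain Python. It is not DeltaAgent's implementation: it assumes each commit is newline-delimited JSON with add and remove actions, as in the Delta Lake transaction log, and it ignores checkpoint files. The function name replay_delta_log and the sample paths are made up for the example.

```python
import json


def replay_delta_log(commits):
    """Replay commits (oldest first) and return one row per live data file.

    Each commit is a string of newline-delimited JSON actions.
    """
    live = {}  # file path -> partition values
    for commit in commits:
        for line in commit.splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                add = action["add"]
                live[add["path"]] = add.get("partitionValues", {})
            elif "remove" in action:
                # A removed file is no longer part of the table.
                live.pop(action["remove"]["path"], None)
    return [{"path": path, **parts} for path, parts in live.items()]


# Two toy commits: the second one replaces the file in partition "a".
commit_0 = "\n".join([
    json.dumps({"add": {"path": "partition=a/part-0000.parquet",
                        "partitionValues": {"partition": "a"}}}),
    json.dumps({"add": {"path": "partition=b/part-0001.parquet",
                        "partitionValues": {"partition": "b"}}}),
])
commit_1 = "\n".join([
    json.dumps({"remove": {"path": "partition=a/part-0000.parquet"}}),
    json.dumps({"add": {"path": "partition=a/part-0002.parquet",
                        "partitionValues": {"partition": "a"}}}),
])

rows = replay_delta_log([commit_0, commit_1])
# Partition filter, analogous to the loc call on the log DataFrame.
selected = [row["path"] for row in rows if row["partition"] == "a"]
```

The rows list can be passed to pandas.DataFrame(rows) to get a table of file paths and partition columns that can be filtered with loc in the same way.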

Calling the fetch_data method on the delta log DataFrame above fetches the actual data from the deltalake table. This may take some time if the data volume is large.

df_delta = df_log_filtered.fetch_data()

Note that the values for container_name and table_path can also be assigned when setting up the agent connection, as below:

da = DeltaAgent(account_name="account_name", account_key="account_key", container_name='container_name', table_path='deltatable_name')
