PandaBlob

Functions to easily transform Azure blobs into pandas DataFrames and vice versa.

Installation

Installing PandaBlob via pip is the preferred method, as it always installs the most recent stable release. If you do not have pip installed, the Python installation guide can walk you through the process.

To install PandaBlob, run this command in your terminal:

# Use pip to install PandaBlob
pip install pandablob

You can also download and install PandaBlob from source by running the commands below.

# Download the package
git clone https://github.com/uijl/pandablob

# Go to the correct folder
cd pandablob

# Install package
pip install -e .

Usage

The code snippet below shows how to use PandaBlob. All you need is a BlobClient and, depending on the operation, a pandas DataFrame or some keyword arguments for pandas.

# Import the Azure SDK and pandablob
import pandablob

from azure.storage.blob import ContainerClient

# Your Azure Credentials
account_url = "https://my_account_url.blob.core.windows.net/"
token = "your_key_string"
container = "your_container"
blobname = "your_blob_name.csv"

container_client = ContainerClient(account_url, container, credential=token)
blob_client = container_client.get_blob_client(blob=blobname)

# Specify your pandas keyword arguments
pandas_kwargs = {"index_col": 0}

# Read the blob as a pandas DataFrame
df = pandablob.blob_to_df(blob_client, pandas_kwargs)

Potential errors

Three errors are common. Two are raised by the Azure blob storage SDK and one by pandablob itself because of its current limitations.

  • ResourceExistsError - Raised if the specified blob already exists in the container. You have two options: pass overwrite=True to df_to_blob, or catch the exception. To catch it in an except statement, import it with from azure.core.exceptions import ResourceExistsError;
  • ResourceNotFoundError - Raised if the specified blob is not found. To catch it in an except statement, import it with from azure.core.exceptions import ResourceNotFoundError;
  • TypeError - Raised by pandablob if you try to upload or download a file type that is not yet supported. Currently only the following extensions are supported: .csv, .json, .txt, .xls and .xlsx.
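
The sketch below shows one way to guard a read against two of the errors listed above. It is an illustration, not part of pandablob: has_supported_extension and read_blob_or_none are hypothetical helper names, the exact (case-sensitive) extension match is an assumption about how the check should behave, and returning None for a missing blob is just one possible policy. The blob name is read from the blob_name attribute of the BlobClient.

```python
from pathlib import PurePosixPath

# Extensions listed as supported in the README text above.
SUPPORTED_EXTENSIONS = {".csv", ".json", ".txt", ".xls", ".xlsx"}


def has_supported_extension(blob_name: str) -> bool:
    """Hypothetical helper: check the blob name against the supported extensions.

    Assumption: the comparison is exact (case-sensitive); pandablob's own
    check may behave differently.
    """
    return PurePosixPath(blob_name).suffix in SUPPORTED_EXTENSIONS


def read_blob_or_none(blob_client, pandas_kwargs=None):
    """Hypothetical helper: return the blob as a DataFrame, or None if it is missing."""
    # Imported inside the function so the extension check above can be used
    # on its own, without the Azure SDK or pandablob installed.
    import pandablob
    from azure.core.exceptions import ResourceNotFoundError

    # Fail early with a clear message instead of waiting for pandablob's TypeError.
    if not has_supported_extension(blob_client.blob_name):
        raise TypeError(f"Unsupported file extension: {blob_client.blob_name}")

    try:
        return pandablob.blob_to_df(blob_client, pandas_kwargs)
    except ResourceNotFoundError:
        # The blob does not exist; the caller decides what to do next.
        return None
```

With the blob_client from the Usage section, df = read_blob_or_none(blob_client, {"index_col": 0}) would give either a DataFrame or None. The same try/except pattern applies to ResourceExistsError when uploading.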

To do list:

Some other stuff that needs to be done:

  • Include other files;
  • Make downloading a .csv file easier;
  • Added MIT license.
