Skip to main content

Azure Blob Storage Backend for Dask

Project description

Dask Azure Blob FileSystem

Azure Blob Storage Backend for Dask

https://travis-ci.org/manish/dask-azureblobfs.svg?branch=master Documentation Status

Features

  • Supports dask when your data files are stored in the cloud.
    • Import DaskAzureBlobFileSystem
    • Use abfs:// as protocol prefix and you are good to do.
  • For authentication, please read more on Usage.
  • Support for key-value storage which is backed by azure storage. Create an instance of AzureBlobMap

Usage

Make the right imports:

from azureblobfs.dask import DaskAzureBlobFileSystem
import dask.dataframe as dd

then put all data files in an azure storage container say clippy, then you can read it:

data = dd.read_csv("abfs://noaa/clippy/weather*.csv")
max_by_state = data.groupby("states").max().compute()

you would need to set your azure account name in environment variable AZURE_BLOB_ACCOUNT_NAME (which in our above example is noaa) and the account key in AZURE_BLOB_ACCOUNT_KEY.

If you don’t want to use account key and instead want to use SAS, set it in the environment variable AZURE_BLOB_SAS_TOKEN along with the connection string in the environment variable AZURE_BLOB_CONNECTION_STRING.

Installation

Just:

pip install dask-azureblobfs

or get the development version if you love to live dangerously:

pip install git+https://github.com/manish/dask-azureblobfs@master#egg=dask-azureblobfs

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

History

0.1.0 (2018-11-18)

  • First release on PyPI.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dask-azureblobfs-0.1.0.tar.gz (4.6 MB view hashes)

Uploaded source

Built Distribution

dask_azureblobfs-0.1.0-py3-none-any.whl (9.3 kB view hashes)

Uploaded py3

Supported by

AWS AWS Cloud computing Datadog Datadog Monitoring Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Huawei Huawei PSF Sponsor Microsoft Microsoft PSF Sponsor NVIDIA NVIDIA PSF Sponsor Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page