Skip to main content

Azure Blob Storage Backend for Dask

Project description

Dask Azure Blob FileSystem

Azure Blob Storage Backend for Dask

https://travis-ci.org/manish/dask-azureblobfs.svg?branch=master Documentation Status

Features

  • Supports dask when your data files are stored in the cloud.
    • Import DaskAzureBlobFileSystem
    • Use abfs:// as protocol prefix and you are good to do.
  • For authentication, please read more on Usage.
  • Support for key-value storage which is backed by azure storage. Create an instance of AzureBlobMap

Usage

Make the right imports:

from azureblobfs.dask import DaskAzureBlobFileSystem
import dask.dataframe as dd

then put all data files in an azure storage container say clippy, then you can read it:

data = dd.read_csv("abfs://noaa/clippy/weather*.csv")
max_by_state = data.groupby("states").max().compute()

you would need to set your azure account name in environment variable AZURE_BLOB_ACCOUNT_NAME (which in our above example is noaa) and the account key in AZURE_BLOB_ACCOUNT_KEY.

If you don’t want to use account key and instead want to use SAS, set it in the environment variable AZURE_BLOB_SAS_TOKEN along with the connection string in the environment variable AZURE_BLOB_CONNECTION_STRING.

Installation

Just:

pip install dask-azureblobfs

or get the development version if you love to live dangerously:

pip install git+https://github.com/manish/dask-azureblobfs@master#egg=dask-azureblobfs

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

History

0.1.0 (2018-11-18)

  • First release on PyPI.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for dask-azureblobfs, version 0.1.0
Filename, size File type Python version Upload date Hashes
Filename, size dask_azureblobfs-0.1.0-py3-none-any.whl (9.3 kB) File type Wheel Python version py3 Upload date Hashes View
Filename, size dask-azureblobfs-0.1.0.tar.gz (4.6 MB) File type Source Python version None Upload date Hashes View

Supported by

Pingdom Pingdom Monitoring Google Google Object Storage and Download Analytics Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN DigiCert DigiCert EV certificate StatusPage StatusPage Status page