Azure Blob Storage Backend for Dask
Dask Azure Blob FileSystem
Features
Supports Dask when your data files are stored in Azure Blob Storage:
Import DaskAzureBlobFileSystem.
Use abfs:// as the protocol prefix and you are good to go.
For authentication, see Usage below.
Support for key-value storage backed by Azure Storage: create an instance of AzureBlobMap.
Usage
Make the required imports:
from azureblobfs.dask import DaskAzureBlobFileSystem
import dask.dataframe as dd
Then put all your data files in an Azure Storage container, say clippy, and read them:
data = dd.read_csv("abfs://noaa/clippy/weather*.csv")
max_by_state = data.groupby("states").max().compute()
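In this example noaa is the storage account name and clippy is the container, so an abfs:// URL appears to follow the layout abfs://&lt;account&gt;/&lt;container&gt;/&lt;blob path or glob&gt;. The helper below is a hypothetical illustration of that layout only; split_abfs_url and AbfsLocation are names made up for this sketch and are not part of dask-azureblobfs.

```python
from typing import NamedTuple


class AbfsLocation(NamedTuple):
    """Hypothetical container for the pieces of an abfs:// URL."""
    account: str
    container: str
    blob_path: str


def split_abfs_url(url: str) -> AbfsLocation:
    """Split abfs://<account>/<container>/<path> into its parts.

    Illustrative sketch only: assumes the layout implied by the README
    example (noaa = account name, clippy = container).
    """
    prefix = "abfs://"
    if not url.startswith(prefix):
        raise ValueError(f"expected a URL starting with {prefix!r}: {url!r}")
    # Split at most twice so nested blob paths stay in one piece.
    parts = url[len(prefix):].split("/", 2)
    if len(parts) < 3 or not all(parts):
        raise ValueError(f"expected abfs://<account>/<container>/<path>: {url!r}")
    return AbfsLocation(*parts)
```

For the URL above, split_abfs_url("abfs://noaa/clippy/weather*.csv") yields account "noaa", container "clippy" and blob path "weather*.csv"; the glob is left for Dask to expand.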
You need to set your Azure account name in the environment variable AZURE_BLOB_ACCOUNT_NAME (noaa in the example above) and the account key in AZURE_BLOB_ACCOUNT_KEY.
If you would rather use a shared access signature (SAS) than the account key, set the token in the environment variable AZURE_BLOB_SAS_TOKEN and the connection string in the environment variable AZURE_BLOB_CONNECTION_STRING.
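Because a missing variable only surfaces once Dask tries to reach the storage account, it can help to check the environment up front. The function below is a hypothetical pre-flight check, not part of the package: it only looks for the variable names this README lists for each mode, and the mode labels "account_key" and "sas" are assumptions made for the sketch.

```python
import os
from typing import List, Mapping, Optional

# Environment variables named in this README for each authentication mode.
REQUIRED_VARS = {
    "account_key": ("AZURE_BLOB_ACCOUNT_NAME", "AZURE_BLOB_ACCOUNT_KEY"),
    "sas": ("AZURE_BLOB_SAS_TOKEN", "AZURE_BLOB_CONNECTION_STRING"),
}


def missing_auth_vars(mode: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the variables for `mode` that are unset or empty.

    Hypothetical helper: `env` defaults to the real process environment,
    but any mapping can be passed in (handy for testing).
    """
    if mode not in REQUIRED_VARS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(REQUIRED_VARS)}")
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS[mode] if not env.get(name)]
```

Calling missing_auth_vars("account_key") before dd.read_csv and raising if the returned list is non-empty gives a clearer error than a failed request.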
Installation
Just:
pip install dask-azureblobfs
or get the development version if you love to live dangerously:
pip install git+https://github.com/manish/dask-azureblobfs@master#egg=dask-azureblobfs
Credits
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
History
0.1.0 (2018-11-18)
First release on PyPI.
Download files
Hashes for dask_azureblobfs-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | d06b8ab74cd3771f0e429008fd04623246a94d025a2256fb0b31abc51317ecaf
MD5 | 2b16c91c41027031a02db4710db1abd6
BLAKE2b-256 | e3713196bd5a61e7292da59e69b9b07dd4ee01e245a9327696177c1bdd0321f3
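To check a downloaded wheel against the SHA256 digest listed above, a minimal sketch using only the standard library might look like this. It assumes you have already downloaded dask_azureblobfs-0.1.0-py3-none-any.whl yourself; the function names are illustrative, not part of any tool.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for dask_azureblobfs-0.1.0-py3-none-any.whl.
EXPECTED_WHEEL_SHA256 = "d06b8ab74cd3771f0e429008fd04623246a94d025a2256fb0b31abc51317ecaf"


def sha256_of_file(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        # Read fixed-size chunks until f.read returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_WHEEL_SHA256):
    """True if the file's SHA256 equals `expected` (case-insensitive)."""
    return sha256_of_file(path) == expected.lower()
```

For an intact download, matches_expected("dask_azureblobfs-0.1.0-py3-none-any.whl") should return True; reading in chunks keeps memory use flat regardless of file size.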