Azure Blob Storage Backend for Dask
Dask Azure Blob FileSystem
Features
- Supports Dask when your data files are stored in Azure Blob Storage.
- Import DaskAzureBlobFileSystem.
- Use abfs:// as the protocol prefix and you are good to go.
- For authentication, see Usage.
- Supports key-value storage backed by Azure Storage: create an instance of AzureBlobMap.
Usage
Make the right imports:
from azureblobfs.dask import DaskAzureBlobFileSystem
import dask.dataframe as dd
Then put all your data files in an Azure storage container, say clippy, and read them:
data = dd.read_csv("abfs://noaa/clippy/weather*.csv")
max_by_state = data.groupby("states").max().compute()
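To make the path layout concrete, the sketch below splits an abfs:// URL into its parts using only the standard library. It assumes, based on the example above, that the first component is the storage account name and the second is the container. `split_abfs_url` is a hypothetical helper for illustration and is not part of dask-azureblobfs.

```python
from urllib.parse import urlsplit


def split_abfs_url(url):
    """Split an abfs:// URL into (account, container, blob_pattern).

    Hypothetical helper for illustration only; it assumes the layout
    abfs://<account>/<container>/<blob path or glob>.
    """
    parts = urlsplit(url)
    if parts.scheme != "abfs":
        raise ValueError(f"expected an abfs:// URL, got {url!r}")
    # The path starts with "/", so strip it before splitting off the container.
    container, _, blob_pattern = parts.path.lstrip("/").partition("/")
    if not parts.netloc or not container:
        raise ValueError(f"URL must name an account and a container: {url!r}")
    return parts.netloc, container, blob_pattern


account, container, pattern = split_abfs_url("abfs://noaa/clippy/weather*.csv")
print(account, container, pattern)
```

Under that assumption, the example path reads as account noaa, container clippy, and the glob weather*.csv within the container.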
Set your Azure account name in the environment variable AZURE_BLOB_ACCOUNT_NAME (noaa in the example above) and the account key in AZURE_BLOB_ACCOUNT_KEY.
If you would rather use a shared access signature (SAS) than an account key, set the token in the environment variable AZURE_BLOB_SAS_TOKEN and the connection string in the environment variable AZURE_BLOB_CONNECTION_STRING.
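Because both authentication options rely on environment variables, it can help to check them before starting a Dask computation. The following is a minimal sketch of such a pre-flight check. The variable names come from the text above, while `detect_auth_mode` and its precedence (account key checked first) are assumptions made for illustration, not behaviour documented by the package.

```python
import os


def detect_auth_mode(environ=None):
    """Return "account_key" or "sas" depending on which variables are set.

    Hypothetical pre-flight check; the precedence used here (account key
    first, then SAS) is an assumption, not the package's documented logic.
    """
    env = os.environ if environ is None else environ
    if env.get("AZURE_BLOB_ACCOUNT_NAME") and env.get("AZURE_BLOB_ACCOUNT_KEY"):
        return "account_key"
    if env.get("AZURE_BLOB_SAS_TOKEN") and env.get("AZURE_BLOB_CONNECTION_STRING"):
        return "sas"
    raise RuntimeError(
        "Set AZURE_BLOB_ACCOUNT_NAME and AZURE_BLOB_ACCOUNT_KEY, or "
        "AZURE_BLOB_SAS_TOKEN and AZURE_BLOB_CONNECTION_STRING."
    )


# Placeholder values only; real credentials belong in the environment.
example_env = {
    "AZURE_BLOB_ACCOUNT_NAME": "noaa",
    "AZURE_BLOB_ACCOUNT_KEY": "<account key>",
}
print(detect_auth_mode(example_env))
```

Calling `detect_auth_mode()` with no argument inspects the real process environment, so it can be run just before `dd.read_csv` to fail early with a readable message.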
Installation
Just:
pip install dask-azureblobfs
or get the development version if you love to live dangerously:
pip install git+https://github.com/manish/dask-azureblobfs@master#egg=dask-azureblobfs
Credits
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
History
0.1.0 (2018-11-18)
- First release on PyPI.
Built Distribution
Hashes for dask_azureblobfs-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | d06b8ab74cd3771f0e429008fd04623246a94d025a2256fb0b31abc51317ecaf
MD5 | 2b16c91c41027031a02db4710db1abd6
BLAKE2-256 | e3713196bd5a61e7292da59e69b9b07dd4ee01e245a9327696177c1bdd0321f3