Azure Blob Storage Backend for Dask

Dask Azure Blob FileSystem

Features

  • Lets Dask read data files stored in Azure Blob Storage.

    • Import DaskAzureBlobFileSystem.

    • Use abfs:// as the protocol prefix and you are good to go.

  • For authentication, see Usage.

  • Supports key-value storage backed by Azure Storage: create an instance of AzureBlobMap.

Usage

Make the right imports:

from azureblobfs.dask import DaskAzureBlobFileSystem
import dask.dataframe as dd

Then put your data files in an Azure storage container, say clippy, and read them:

data = dd.read_csv("abfs://noaa/clippy/weather*.csv")
max_by_state = data.groupby("states").max().compute()

You need to set your Azure account name in the environment variable AZURE_BLOB_ACCOUNT_NAME (noaa in the example above) and the account key in AZURE_BLOB_ACCOUNT_KEY.
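
To make the URL in the example concrete, here is a small sketch that splits it into its parts. It assumes the layout abfs://<account>/<container>/<blob pattern> implied by the example (noaa as the account, clippy as the container). split_abfs_url is a hypothetical helper written for illustration, not part of dask-azureblobfs; it uses only the standard library.

```python
from urllib.parse import urlsplit


def split_abfs_url(url):
    """Split an abfs:// URL into (account, container, blob_pattern).

    Hypothetical helper for illustration; assumes the layout
    abfs://<account>/<container>/<blob pattern>.
    """
    parts = urlsplit(url)
    if parts.scheme != "abfs":
        raise ValueError(f"expected an abfs:// URL, got {url!r}")
    # The path looks like "/clippy/weather*.csv": the first segment is
    # taken as the container, the remainder as the blob name or glob.
    container, _, blob_pattern = parts.path.lstrip("/").partition("/")
    return parts.netloc, container, blob_pattern


print(split_abfs_url("abfs://noaa/clippy/weather*.csv"))
# ('noaa', 'clippy', 'weather*.csv')
```

If the first element does not match the value you put in AZURE_BLOB_ACCOUNT_NAME, that mismatch is a likely place to start when a read fails.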

If you would rather authenticate with a shared access signature (SAS) than with the account key, set the token in the environment variable AZURE_BLOB_SAS_TOKEN and the connection string in AZURE_BLOB_CONNECTION_STRING.
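
Before starting a long Dask job, it can help to check that the expected variables are actually set. The sketch below uses the variable names from this section; grouping them into two modes, and the helper configured_auth_modes itself, are assumptions made for illustration and are not part of dask-azureblobfs.

```python
import os

# Variable names come from the Usage text; the grouping into two
# modes is an assumption based on how that text describes them.
AUTH_MODES = {
    "account key": ("AZURE_BLOB_ACCOUNT_NAME", "AZURE_BLOB_ACCOUNT_KEY"),
    "SAS": ("AZURE_BLOB_SAS_TOKEN", "AZURE_BLOB_CONNECTION_STRING"),
}


def configured_auth_modes(environ=None):
    """Return the modes whose variables are all set and non-empty."""
    env = os.environ if environ is None else environ
    return [
        mode
        for mode, names in AUTH_MODES.items()
        if all(env.get(name) for name in names)
    ]


# Example with a plain dict standing in for the real environment.
example_env = {
    "AZURE_BLOB_ACCOUNT_NAME": "noaa",
    "AZURE_BLOB_ACCOUNT_KEY": "<your-account-key>",
}
print(configured_auth_modes(example_env))
# ['account key']
```

Calling configured_auth_modes() with no argument inspects the real process environment; an empty list means neither set of variables is complete.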

Installation

Install the released version from PyPI:

pip install dask-azureblobfs

or get the development version if you love to live dangerously:

pip install git+https://github.com/manish/dask-azureblobfs@master#egg=dask-azureblobfs

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

History

0.1.0 (2018-11-18)

  • First release on PyPI.
