
Access Azure Blobs and Data Lake Storage (ADLS) Gen2 with fsspec and dask

Project description

Filesystem interface to Azure Blob and Data Lake Storage (Gen2)


Quickstart

This package can be installed using:

pip install adlfs

or

conda install -c conda-forge adlfs

The az:// and abfs:// protocols are included in fsspec's known_implementations registry.

To connect to an Azure Blob Storage or Azure Data Lake Storage (ADLS) Gen2 filesystem, use the abfs or az protocol:

import dask.dataframe as dd

storage_options={'account_name': ACCOUNT_NAME, 'account_key': ACCOUNT_KEY}

ddf = dd.read_csv('abfs://{CONTAINER}/{FOLDER}/*.csv', storage_options=storage_options)
ddf = dd.read_parquet('az://{CONTAINER}/folder.parquet', storage_options=storage_options)

Accepted protocol / uri formats include:
'PROTOCOL://container/path-part/file'
'PROTOCOL://container@account.blob.core.windows.net/path-part/file'
'PROTOCOL://container@account.dfs.core.windows.net/path-part/file'
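As a rough sketch of how these URI forms decompose, the following illustrative parser (not adlfs's internal routine) pulls the account, container, and path out of each accepted format:

```python
from urllib.parse import urlsplit

def parse_abfs_url(url):
    # Illustrative only: splits the accepted adlfs URI forms into
    # (account, container, path). When no account host is present,
    # the account must come from storage_options instead.
    parts = urlsplit(url)
    netloc = parts.netloc
    if "@" in netloc:
        container, host = netloc.split("@", 1)
        account = host.split(".", 1)[0]
    else:
        container, account = netloc, None
    return account, container, parts.path.lstrip("/")
```

For example, `parse_abfs_url('abfs://container@account.dfs.core.windows.net/path-part/file')` yields the account `account`, container `container`, and path `path-part/file`.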

Alternatively, if AZURE_STORAGE_ACCOUNT_NAME and an AZURE_STORAGE_<CREDENTIAL> are set as environment variables, storage_options will be read from them.

To read from a public storage blob, you are still required to specify 'account_name'. For example, you can access the NYC Taxi & Limousine Commission open dataset as:

storage_options = {'account_name': 'azureopendatastorage'}
ddf = dd.read_parquet('az://nyctlc/green/puYear=2019/puMonth=*/*.parquet', storage_options=storage_options)

Details

The package includes pythonic filesystem implementations for both Azure Blob Storage and Azure Data Lake Storage Gen2 (ADLS), facilitating their use with Dask. It builds on the fsspec/filesystem_spec base class and the Azure Python SDKs.

Operations against both Azure Blobs and ADLS Gen2 are implemented with the Azure Blob Storage Python SDK.

Setting credentials

If no credentials or configuration are provided, DefaultAzureCredential will be used for authentication. To use alternative credentials, storage_options can be populated with a variety of keyword arguments:

  • connection_string
  • account_name
  • account_key
  • sas_token
  • tenant_id, client_id, and client_secret are combined for an Azure ServicePrincipal e.g. storage_options={'account_name': ACCOUNT_NAME, 'tenant_id': TENANT_ID, 'client_id': CLIENT_ID, 'client_secret': CLIENT_SECRET}
  • anon: bool, optional. Set to True to use anonymous authentication. If not set, the AZURE_STORAGE_ANON environment variable will be checked before defaulting to False where credentials are discovered on the system.
  • location_mode: valid values are "primary" or "secondary" and apply to RA-GRS accounts
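For illustration, here are some common storage_options combinations built from these keywords; every value below is a placeholder, not a working credential:

```python
# Illustrative storage_options dictionaries; all values are placeholders.
key_options = {"account_name": "myaccount", "account_key": "<key>"}
sas_options = {"account_name": "myaccount", "sas_token": "<sas-token>"}
principal_options = {  # Azure ServicePrincipal: all three fields together
    "account_name": "myaccount",
    "tenant_id": "<tenant-id>",
    "client_id": "<client-id>",
    "client_secret": "<client-secret>",
}
anon_options = {"account_name": "myaccount", "anon": True}
```

Any of these dictionaries can be passed as `storage_options=` to Dask readers, or unpacked as keyword arguments when constructing AzureBlobFileSystem directly.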

For a full list of arguments, see the API reference for AzureBlobFileSystem.

The following environment variables can also be set and will be picked up for authentication:

  • "AZURE_STORAGE_CONNECTION_STRING"
  • "AZURE_STORAGE_ACCOUNT_NAME"
  • "AZURE_STORAGE_ACCOUNT_KEY"
  • "AZURE_STORAGE_SAS_TOKEN"
  • "AZURE_STORAGE_TENANT_ID"
  • "AZURE_STORAGE_CLIENT_ID"
  • "AZURE_STORAGE_CLIENT_SECRET"
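As a sketch of this mapping (illustrative only, not adlfs's internal lookup), each AZURE_STORAGE_* variable corresponds to the storage_options keyword of the same name:

```python
import os

# Mapping of environment variable -> storage_options keyword (illustrative).
_ENV_KEYS = {
    "AZURE_STORAGE_CONNECTION_STRING": "connection_string",
    "AZURE_STORAGE_ACCOUNT_NAME": "account_name",
    "AZURE_STORAGE_ACCOUNT_KEY": "account_key",
    "AZURE_STORAGE_SAS_TOKEN": "sas_token",
    "AZURE_STORAGE_TENANT_ID": "tenant_id",
    "AZURE_STORAGE_CLIENT_ID": "client_id",
    "AZURE_STORAGE_CLIENT_SECRET": "client_secret",
}

def options_from_env(environ=os.environ):
    # Collect whichever AZURE_STORAGE_* variables are set into a
    # storage_options-style dictionary.
    return {kw: environ[env] for env, kw in _ENV_KEYS.items() if env in environ}
```

For example, with AZURE_STORAGE_ACCOUNT_NAME and AZURE_STORAGE_ACCOUNT_KEY exported, `options_from_env()` would return a dictionary equivalent to passing `account_name` and `account_key` explicitly.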

The filesystem can be instantiated for different use cases based on a variety of storage_options combinations. The following list describes some common use cases for AzureBlobFileSystem, i.e. the abfs or az protocols. Note that all cases require the account_name argument to be provided:

  1. Automatic credential resolution using Azure's DefaultAzureCredential: storage_options={'account_name': ACCOUNT_NAME} will use DefaultAzureCredential to obtain valid credentials for the account ACCOUNT_NAME. DefaultAzureCredential attempts a sequence of authentication mechanisms in a fixed order, documented in the Azure Identity client library.
  2. Anonymous connection to public container: storage_options={'account_name': ACCOUNT_NAME, 'anon': True} will assume the ACCOUNT_NAME points to a public container, and attempt to use an anonymous login. Note, the default value for anon is False.
  3. Azure ServicePrincipal: tenant_id, client_id, and client_secret are all used as credentials for an Azure ServicePrincipal: e.g. storage_options={'account_name': ACCOUNT_NAME, 'tenant_id': TENANT_ID, 'client_id': CLIENT_ID, 'client_secret': CLIENT_SECRET}.

Append Blob

The AzureBlobFileSystem accepts all of the Async BlobServiceClient arguments.

By default, write operations create BlockBlobs in Azure, which cannot be appended to once written. An AppendBlob can be created instead by passing mode="ab" when creating and operating on blobs. Currently, AppendBlobs are not available if hierarchical namespaces are enabled.
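A minimal sketch of the append pattern (the filesystem object, container, and blob path below are placeholders; on AzureBlobFileSystem, mode="ab" creates an AppendBlob on first write):

```python
def append_record(fs, path: str, data: bytes) -> None:
    # Open in append mode: on AzureBlobFileSystem this creates an
    # AppendBlob if the blob does not yet exist, then appends to it
    # on each subsequent call.
    with fs.open(path, mode="ab") as f:
        f.write(data)

# Hypothetical usage with adlfs (account and path are placeholders):
#   fs = adlfs.AzureBlobFileSystem(account_name="myaccount", account_key="<key>")
#   append_record(fs, "container/logs/events.log", b"event\n")
```

The same open-append pattern works against any fsspec-compatible filesystem that supports mode="ab".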

Older versions

ADLS Gen1 has officially been retired. Hence the adl:// protocol, which was designed to connect to ADLS Gen1, is obsolete.

Project details



Download files


Source Distribution

adlfs-2026.4.0.tar.gz (54.7 kB, source)

Built Distribution


adlfs-2026.4.0-py3-none-any.whl (46.2 kB, Python 3)

File details

Details for the file adlfs-2026.4.0.tar.gz.

File metadata

  • Download URL: adlfs-2026.4.0.tar.gz
  • Size: 54.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for adlfs-2026.4.0.tar.gz:
  • SHA256: 84c6f0fc28403629ef6d6d90f0d1a35a3302179f65ce4686c939a42ad0496d8d
  • MD5: 427629b201cec5a69a284ad41f7f2ead
  • BLAKE2b-256: ca51c766cea8a00f84f224aa672ea0f20e4f24091eb90ce56104accd003c7405


File details

Details for the file adlfs-2026.4.0-py3-none-any.whl.

File metadata

  • Download URL: adlfs-2026.4.0-py3-none-any.whl
  • Size: 46.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for adlfs-2026.4.0-py3-none-any.whl:
  • SHA256: a12a420583ae2d86d1b02902db147b7da5cf3e6eef40ee53024756a781d5d84f
  • MD5: 2d43edcdc4aea9ec35ca632e71e26f10
  • BLAKE2b-256: 71456e6061498d2cd7fecfcaa6e17c2057bfcac3578c656b9646c343c7021c6e

