azure-datalake-store

Azure Data Lake Store Filesystem Client Library for Python

These details have not been verified by PyPI

Project links

Homepage

Project description

Microsoft Azure Data Lake Store Filesystem Library for Python

https://travis-ci.org/Azure/azure-data-lake-store-python.svg?branch=dev

https://coveralls.io/repos/github/Azure/azure-data-lake-store-python/badge.svg?branch=master

This project is the Python filesystem library for Azure Data Lake Store.

INSTALLATION

To install from source instead of pip (for local testing and development):

> pip install -r dev_requirements.txt
> python setup.py develop

Usage: Sample Code

To play with the code, here is a starting point:

from azure.datalake.store import core, lib, multithread
token = lib.auth(tenant_id, username, password)
adl = core.AzureDLFileSystem(token, store_name=store_name)

# typical operations
adl.ls('')
adl.ls('tmp/', detail=True)
adl.ls('tmp/', detail=True, invalidate_cache=True)
adl.cat('littlefile')
adl.head('gdelt20150827.csv')

# file-like object
with adl.open('gdelt20150827.csv', blocksize=2**20) as f:
    print(f.readline())
    print(f.readline())
    print(f.readline())
    # could have passed f to any function requiring a file object:
    # pandas.read_csv(f)

with adl.open('anewfile', 'wb') as f:
    # data is written on flush/close, or when buffer is bigger than
    # blocksize
    f.write(b'important data')

adl.du('anewfile')

# recursively download the whole directory tree with 5 threads and
# 16MB chunks
multithread.ADLDownloader(adl, "", 'my_temp_dir', 5, 2**24)

Progress can be tracked using a callback function in the form track(current, total) When passed, this will keep track of transferred bytes and be called on each complete chunk.

Here’s an example using the Azure CLI progress controller as the progress_callback:

from cli.core.application import APPLICATION

def _update_progress(current, total):
    hook = APPLICATION.get_progress_controller(det=True)
    hook.add(message='Alive', value=current, total_val=total)
    if total == current:
        hook.end()

...
ADLUploader(client, destination_path, source_path, thread_count, overwrite=overwrite,
        chunksize=chunk_size,
        buffersize=buffer_size,
        blocksize=block_size,
        progress_callback=_update_progress)

This will output a progress bar to the stdout:

Alive[#########################                                       ]  40.0881%

Finished[#############################################################]  100.0000%

Usage: Command Line Sample

To interact with the API at a higher-level, you can use the provided command-line interface in “samples/cli.py”. You will need to set the appropriate environment variables

azure_username
azure_password
azure_data_lake_store_name
azure_subscription_id
azure_resource_group_name
azure_service_principal
azure_service_principal_secret

to connect to the Azure Data Lake Store. Optionally, you may need to define azure_tenant_id or azure_data_lake_store_url_suffix.

Below is a simple sample, with more details beyond.

python samples\cli.py ls -l

Execute the program without arguments to access documentation.

To start the CLI in interactive mode, run “python samples/cli.py” and then type “help” to see all available commands (similiar to Unix utilities):

> python samples/cli.py
azure> help

Documented commands (type help <topic>):
========================================
cat    chmod  close  du      get   help  ls     mv   quit  rmdir  touch
chgrp  chown  df     exists  head  info  mkdir  put  rm    tail

azure>

While still in interactive mode, you can run “ls -l” to list the entries in the home directory (“help ls” will show the command’s usage details). If you’re not familiar with the Unix/Linux “ls” command, the columns represent 1) permissions, 2) file owner, 3) file group, 4) file size, 5-7) file’s modification time, and 8) file name.

> python samples/cli.py
azure> ls -l
drwxrwx--- 0123abcd 0123abcd         0 Aug 02 12:44 azure1
-rwxrwx--- 0123abcd 0123abcd   1048576 Jul 25 18:33 abc.csv
-r-xr-xr-x 0123abcd 0123abcd        36 Jul 22 18:32 xyz.csv
drwxrwx--- 0123abcd 0123abcd         0 Aug 03 13:46 tmp
azure> ls -l --human-readable
drwxrwx--- 0123abcd 0123abcd   0B Aug 02 12:44 azure1
-rwxrwx--- 0123abcd 0123abcd   1M Jul 25 18:33 abc.csv
-r-xr-xr-x 0123abcd 0123abcd  36B Jul 22 18:32 xyz.csv
drwxrwx--- 0123abcd 0123abcd   0B Aug 03 13:46 tmp
azure>

To download a remote file, run “get remote-file [local-file]”. The second argument, “local-file”, is optional. If not provided, the local file will be named after the remote file minus the directory path.

> python samples/cli.py
azure> ls -l
drwxrwx--- 0123abcd 0123abcd         0 Aug 02 12:44 azure1
-rwxrwx--- 0123abcd 0123abcd   1048576 Jul 25 18:33 abc.csv
-r-xr-xr-x 0123abcd 0123abcd        36 Jul 22 18:32 xyz.csv
drwxrwx--- 0123abcd 0123abcd         0 Aug 03 13:46 tmp
azure> get xyz.csv
2016-08-04 18:57:48,603 - ADLFS - DEBUG - Creating empty file xyz.csv
2016-08-04 18:57:48,604 - ADLFS - DEBUG - Fetch: xyz.csv, 0-36
2016-08-04 18:57:49,726 - ADLFS - DEBUG - Downloaded to xyz.csv, byte offset 0
2016-08-04 18:57:49,734 - ADLFS - DEBUG - File downloaded (xyz.csv -> xyz.csv)
azure>

It is also possible to run in command-line mode, allowing any available command to be executed separately without remaining in the interpreter.

For example, listing the entries in the home directory:

> python samples/cli.py ls -l
drwxrwx--- 0123abcd 0123abcd         0 Aug 02 12:44 azure1
-rwxrwx--- 0123abcd 0123abcd   1048576 Jul 25 18:33 abc.csv
-r-xr-xr-x 0123abcd 0123abcd        36 Jul 22 18:32 xyz.csv
drwxrwx--- 0123abcd 0123abcd         0 Aug 03 13:46 tmp
>

Also, downloading a remote file:

> python samples/cli.py get xyz.csv
2016-08-04 18:57:48,603 - ADLFS - DEBUG - Creating empty file xyz.csv
2016-08-04 18:57:48,604 - ADLFS - DEBUG - Fetch: xyz.csv, 0-36
2016-08-04 18:57:49,726 - ADLFS - DEBUG - Downloaded to xyz.csv, byte offset 0
2016-08-04 18:57:49,734 - ADLFS - DEBUG - File downloaded (xyz.csv -> xyz.csv)
>

Tests

For detailed documentation about our test framework, please visit the tests folder.

Need Help?

Be sure to check out the Microsoft Azure Developer Forums on Stack Overflow if you have trouble with the provided code. Most questions are tagged azure and python.

Contribute Code or Provide Feedback

If you would like to become an active contributor to this project please follow the instructions provided in Microsoft Azure Projects Contribution Guidelines. Furthermore, check out GUIDANCE.md for specific information related to this project.

If you encounter any bugs with the library please file an issue in the Issues section of the project.

Code of Conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Release History

1.0.1 (2025-05-16)

Remove concat operation from multi-part upload. Upload of large fies will now be done in single chunk.

1.0.0-alpha0 (2024-07-12)

Use generic azure token credential for auth instead of custom lib.auth
Remove older Python support

0.0.53 (2023-04-11)

Add MSAL support. Remove ADAL support
Suppress deprecation warning when detecting pyopenssl existence.

0.0.52 (2020-11-25)

Changed logging verbosity when closing a stream
Filter out default acl for files when using recursive acl operations

0.0.51 (2020-10-15)

Add more logging for zero byte reads to investigate root cause.

0.0.50 (2020-09-10)

Fix bug with retrying for ADAL exception parsing.

0.0.49 (2020-08-05)

Fix bug with NoRetryPolicy
Remove Python 3.4,5 in travis configuration.
Fix logging for unicode

0.0.48 (2019-10-18)

Buffer writes to prevent out of memory problems
Add Python 3.7 in travis configuration

0.0.47 (2019-08-14)

Remove logging of bearer token
Documentation related changes(Add readme.md and correct some formatting)

0.0.46 (2019-06-25)

Expose per request timeout. Default to 60.
Concat will not retry by default.
Bug fixes.

0.0.45 (2019-05-10)

Update open and close ADLFile semantics
Refactor code and improve performance for opening a file

0.0.44 (2019-03-05)

Add continuation token to LISTSTATUS api call
Update api-version to 2018-09-01

0.0.43 (2019-03-01)

Fix bug in downloader when glob returns a single file

0.0.42 (2019-02-26)

Update docstrings
Remove logging setlevel to DEBUG for recursive acl operations

0.0.41 (2019-01-31)

Remove GetContentSummary api call
Move check_token() under retry block
Expose timeout parameter for AdlDownloader and AdlUploader
Raise an exception instead of silently break for zero length reads

0.0.40 (2019-01-08)

Fix zero length read
Remove dependence on custom wheel and conform to PEP 420

0.0.39 (2018-11-14)

Fix for Chunked Decoding exception thrown while reading response.content

0.0.38 (2018-11-12)

Added support for recursive acl functions
Fixed bug due to missing filesessionid in get_chunk

0.0.37 (2018-11-02)

Reverted some changes introduced in 0.0.35 that didn’t work with other tokens

0.0.36 (2018-10-31)

Fixed typo in refresh_token call

0.0.35 (2018-10-29)

Added retry for getting tokens
Added requests>=2.20 because of CVE 2018-18074
Fixed test parameters and updated test recordings

0.0.34 (2018-10-15)

Fixed concat issue with plus(or other symbols) in filename
Added readinto method
Changed api-version to 2018-05-01 for all.

0.0.32 (2018-10-04)

Fixed test bug
Fixed empty folder upload bug
Fixed ADL Downloader block size bug

0.0.31 (2018-09-10)

Added support for batched ls

0.0.30 (2018-08-28)

Fixed .travis.yml order to add azure-nspg dependency

0.0.29 (2018-08-22)

Fixed HISTORY.rst and Pypi documentation

0.0.28 (2018-08-20)

Added recovery from DatalakeBadOffsetException

0.0.27 (2018-08-08)

Fixed bug in single file check
Added Python2 exception compatibility

0.0.26 (2018-08-03)

Fixed bug due to not importing errno
Fixed bug in os.makedirs race condition
Updated Readme with correct environment variables and fixed some links

0.0.25 (2018-07-26)

Fixed downloading of empty directories and download of directory structure with only a single file

0.0.24 (2018-07-16)

Retry policy implemented for all operations, default being Exponential Retry Policy

0.0.23 (2018-07-11)

Fixed the incorrect download location in case of UNC local paths

0.0.22 (2018-06-02)

Encoding filepaths in URI

0.0.21 (2018-06-01)

Remove unused msrest dependency

0.0.20 (2018-05-25)

Compatibility of the sdist with wheel 0.31.0

0.0.19 (2018-03-14)

Fixed upload issue where destination filename was wrong while upload of directory with single file #208

0.0.18 (2018-02-05)

Fixed read issue where whole file was cached while doing positional reads #198
Fixed readline as well for the same

0.0.17 (2017-09-21)

Fixed README.rst indentation error
Changed management endpoint

0.0.16 (2017-09-11)

Fixed Multi chunk transfer hangs as merging chunks fails #187
Added syncflag and leaseid in create, append calls.
Added filesessionid in create, append and open calls.

0.0.15 (2017-07-26)

Enable Data Lake Store progress controller callback #174
Fix File state incorrectly marked as “errored” if contains chunks is “pending” state #182
Fix Race condition due to transfer future done_callback #177

0.0.14 (2017-07-10)

Fix an issue where common prefixes in paths for upload and download were collapsed into only unique paths.

0.0.13 (2017-06-28)

Add support for automatic refreshing of service principal credentials

0.0.12 (2017-06-20)

Fix a regression with ls returning the top level folder if it has no contents. It now properly returns an empty array if a folder has no children.

0.0.11 (2017-06-02)

Update to name incomplete file downloads with a .inprogress suffix. This suffix is removed when the download completes successfully.

0.0.10 (2017-05-24)

Allow users to explicitly use or invalidate the internal, local cache of the filesystem that is built up from previous ls calls. It is now set to always call the service instead of the cache by default.
Update to properly create the wheel package during build to ensure all pip packages are available.
Update folder upload/download to properly throw early in the event that the destination files exist and overwrite was not specified. NOTE: target folder existence (or sub folder existence) does not automatically cause failure. Only leaf node existence will result in failure.
Fix a bug that caused file not found errors when attempting to get information about the root folder.

0.0.9 (2017-05-09)

Enforce basic SSL utilization to ensure performance due to GitHub issue 625 <https://github.com/pyca/pyopenssl/issues/625>

0.0.8 (2017-04-26)

Fix server-side throttling retry support. This is not a guarantee that if the server is throttling the upload (or download) it will eventually succeed, but there is now a back-off retry in place to make it more likely.

0.0.7 (2017-04-19)

Update the build process to more efficiently handle multi-part namespaces for pip.

0.0.6 (2017-03-15)

Fix an issue with path caching that should drastically improve performance for download

0.0.5 (2017-03-01)

Fix for downloader to ensure there is access to the source path before creating destination files
Fix for credential objects to inherit from msrest.authentication for more universal authentication support
Add support for the following:
- set_expiry: allows for setting expiration on files
- ACL management:
  - set_acl: allows for the full replacement of an ACL on a file or folder
  - set_acl_entries: allows for “patching” an existing ACL on a file or folder
  - get_acl_status: retrieves the ACL information for a file or folder
  - remove_acl_entries: removes the specified entries from an ACL on a file or folder
  - remove_acl: removes all non-default ACL entries from a file or folder
  - remove_default_acl: removes all default ACL entries from a folder
Remove unsupported and unused “TRUNCATE” operation.
Added API-Version support with a default of the latest api version (2016-11-01)

0.0.4 (2017-02-07)

Fix for folder upload to properly delete folders with contents when overwrite specified.
Fix to set verbose output to False/Off by default. This removes progress tracking output by default but drastically improves performance.

0.0.3 (2017-02-02)

Fix to setup.py to include the HISTORY.rst file. No other changes

0.0.2 (2017-01-30)

Addresses an issue with lib.auth() not properly defaulting to 2FA
Fixes an issue with Overwrite for ADLUploader sometimes not being honored.
Fixes an issue with empty files not properly being uploaded and resulting in a hang in progress tracking.
Addition of a samples directory showcasing examples of how to use the client and upload and download logic.
General cleanup of documentation and comments.
This is still based on API version 2016-11-01

0.0.1 (2016-11-21)

Initial preview release. Based on API version 2016-11-01.
Includes initial ADLS filesystem functionality and extended upload and download support.

Project details

These details have not been verified by PyPI

Project links

Homepage

Release history Release notifications | RSS feed

This version

1.0.1

Jun 3, 2025

1.0.0a0 pre-release

Aug 1, 2024

0.0.53

May 10, 2023

0.0.52

Apr 2, 2021

0.0.51

Oct 15, 2020

0.0.50

Sep 14, 2020

0.0.49

Aug 6, 2020

0.0.48

Oct 24, 2019

0.0.47

Aug 15, 2019

0.0.46

Jun 27, 2019

0.0.45

May 10, 2019

0.0.44

Mar 6, 2019

0.0.43

Mar 2, 2019

0.0.42

Feb 28, 2019

0.0.41

Feb 4, 2019

0.0.40

Jan 9, 2019

0.0.39

Nov 15, 2018

0.0.38

Nov 13, 2018

0.0.37

Nov 5, 2018

0.0.36

Nov 1, 2018

0.0.35

Oct 31, 2018

0.0.34

Oct 16, 2018

0.0.32

Oct 4, 2018

0.0.31

Sep 11, 2018

0.0.30

Aug 29, 2018

0.0.29

Aug 22, 2018

0.0.28

Aug 22, 2018

0.0.27

Aug 9, 2018

0.0.26

Aug 6, 2018

0.0.25

Jul 28, 2018

0.0.24

Jul 16, 2018

0.0.23

Jul 16, 2018

0.0.22

Jun 5, 2018

0.0.19

Mar 15, 2018

0.0.18

Feb 12, 2018

0.0.17

Sep 22, 2017

0.0.16

Sep 11, 2017

0.0.15

Jul 27, 2017

0.0.14

Jul 11, 2017

0.0.13

Jul 6, 2017

0.0.12

Jun 20, 2017

0.0.11

Jun 7, 2017

0.0.10

May 31, 2017

0.0.9

May 11, 2017

0.0.8

May 1, 2017

0.0.7

Apr 20, 2017

0.0.6

Mar 22, 2017

0.0.5

Mar 3, 2017

0.0.4

Feb 8, 2017

0.0.3

Feb 2, 2017

0.0.2

Feb 1, 2017

0.0.1

Nov 21, 2016

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

azure_datalake_store-1.0.1.tar.gz (69.5 kB view details)

Uploaded Jun 11, 2025 Source

Built Distribution

azure_datalake_store-1.0.1-py2.py3-none-any.whl (53.1 kB view details)

Uploaded Jun 3, 2025 Python 2Python 3

File details

Details for the file azure_datalake_store-1.0.1.tar.gz.

File metadata

Download URL: azure_datalake_store-1.0.1.tar.gz
Upload date: Jun 11, 2025
Size: 69.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for azure_datalake_store-1.0.1.tar.gz
Algorithm	Hash digest
SHA256	`5364d4445aab154a1c7cb10215629c3ce46ce5c7aaaf16071890c03fae53a035`
MD5	`1ff487f5bb81cbf837e6c965e8c6e687`
BLAKE2b-256	`745841042543710a3a0be3bd1b7851c790a3087cdbf4c8eb14efcd7a0a910ea7`

See more details on using hashes here.

File details

Details for the file azure_datalake_store-1.0.1-py2.py3-none-any.whl.

File metadata

Download URL: azure_datalake_store-1.0.1-py2.py3-none-any.whl
Upload date: Jun 3, 2025
Size: 53.1 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: RestSharp/106.13.0.0

File hashes

Hashes for azure_datalake_store-1.0.1-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`3772a2a247aaf9f5fa4b0f2cc0a0225072960cc245cfc8130588babb2b9fe705`
MD5	`d2788330e42f6504983be31ea59e0f61`
BLAKE2b-256	`75bd9cc9f114dbf90717dac49f1f7365156a9a005ef7016df2d4eb28d6442b90`

See more details on using hashes here.

azure-datalake-store 1.0.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Microsoft Azure Data Lake Store Filesystem Library for Python

INSTALLATION

Usage: Sample Code

Usage: Command Line Sample

Tests

Need Help?

Contribute Code or Provide Feedback

Code of Conduct

Release History

1.0.1 (2025-05-16)

1.0.0-alpha0 (2024-07-12)

0.0.53 (2023-04-11)

0.0.52 (2020-11-25)

0.0.51 (2020-10-15)

0.0.50 (2020-09-10)

0.0.49 (2020-08-05)

0.0.48 (2019-10-18)

0.0.47 (2019-08-14)

0.0.46 (2019-06-25)

0.0.45 (2019-05-10)

0.0.44 (2019-03-05)

0.0.43 (2019-03-01)

0.0.42 (2019-02-26)

0.0.41 (2019-01-31)

0.0.40 (2019-01-08)

0.0.39 (2018-11-14)

0.0.38 (2018-11-12)

0.0.37 (2018-11-02)

0.0.36 (2018-10-31)

0.0.35 (2018-10-29)

0.0.34 (2018-10-15)

0.0.32 (2018-10-04)

0.0.31 (2018-09-10)

0.0.30 (2018-08-28)

0.0.29 (2018-08-22)

0.0.28 (2018-08-20)

0.0.27 (2018-08-08)

0.0.26 (2018-08-03)

0.0.25 (2018-07-26)

0.0.24 (2018-07-16)

0.0.23 (2018-07-11)

0.0.22 (2018-06-02)

0.0.21 (2018-06-01)

0.0.20 (2018-05-25)

0.0.19 (2018-03-14)

0.0.18 (2018-02-05)

0.0.17 (2017-09-21)

0.0.16 (2017-09-11)

0.0.15 (2017-07-26)

0.0.14 (2017-07-10)

0.0.13 (2017-06-28)

0.0.12 (2017-06-20)

0.0.11 (2017-06-02)

0.0.10 (2017-05-24)

0.0.9 (2017-05-09)

0.0.8 (2017-04-26)

0.0.7 (2017-04-19)

0.0.6 (2017-03-15)

0.0.5 (2017-03-01)

0.0.4 (2017-02-07)

0.0.3 (2017-02-02)

0.0.2 (2017-01-30)

0.0.1 (2016-11-21)

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed