
Download and upload files in batches from Azure Blob Storage Containers

Project description


Azure Batch Load

High-level Python wrapper around the Azure CLI to download or upload files in batches from or to Azure Blob Storage containers. This project aims to provide functionality missing from the Azure Storage Python SDK, which offers no way to download or upload batches of files to or from containers. The SDK only supports transferring files one by one, which takes a lot of time.

Besides batch loads, since version 0.0.5 it is possible to set method to single, which uses the Azure Python SDK to process files one by one.

Installation

pip install azurebatchload

See PyPi for package index.

Note: for batch uploads (method="batch") the Azure CLI has to be installed and configured. Check whether the Azure CLI is installed from the terminal:

az --version

Requirements

The Azure Storage connection string has to be set as the environment variable AZURE_STORAGE_CONNECTION_STRING, or the separate environment variables AZURE_STORAGE_KEY and AZURE_STORAGE_NAME have to be set; the latter two will be used to create the connection string.
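
For example, a minimal sketch of setting the connection string from within Python before using the package (the environment variable name is taken from the text above; the value shown is a placeholder, not a real connection string):

import os

# Placeholder connection string; replace with the value for your storage account.
os.environ["AZURE_STORAGE_CONNECTION_STRING"] = (
    "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;"
    "EndpointSuffix=core.windows.net"
)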

Usage

Download

1. Using the standard environment variables

Azure-batch-load automatically checks for the environment variables AZURE_STORAGE_CONNECTION_STRING, AZURE_STORAGE_KEY and AZURE_STORAGE_ACCOUNT. So if the connection string, or the storage key plus storage account, are set as environment variables, we can leave the arguments connection_string, account_key and account_name empty:

from azurebatchload import Download

Download(
   destination='../pdfs',
   source='blobcontainername',
   extension='.pdf'
).download()
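
If the environment variables are not set, the paragraph above suggests the same details can be passed as arguments instead. A hedged sketch, assuming connection_string accepts a full connection string (the value shown is a placeholder):

from azurebatchload import Download

Download(
   destination='../pdfs',
   source='blobcontainername',
   extension='.pdf',
   # Placeholder connection string used instead of the environment variable.
   connection_string='DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>'
).download()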

2. Using method="single"

We can skip the Azure CLI and use only the Python SDK by setting method="single":

from azurebatchload import Download

Download(
   destination='../pdfs',
   source='blobcontainername',
   extension='.pdf',
   method='single'
).download()

3. Download a specific folder from a container

We can download a specific folder by setting the folder argument. This works for both the single and batch methods.

from azurebatchload import Download

Download(
   destination='../pdfs',
   source='blobcontainername',
   folder='uploads/invoices/',
   extension='.pdf',
   method='single'
).download()

4. Download a given list of files

We can pass a list of files to download with the list_files argument. Note that this only works with method='single'.

from azurebatchload import Download

Download(
   destination='../pdfs',
   source='blobcontainername',
   folder='uploads/invoices/',
   list_files=["invoice1.pdf", "invoice2.pdf"],
   method='single'
).download()

Upload

1. Using the standard environment variables

from azurebatchload import Upload

Upload(
   destination='blobcontainername',
   source='../pdf',
   extension='*.pdf'
).upload()

2. Using method="single", which does not require the Azure CLI.

from azurebatchload import Upload

Upload(
   destination='blobcontainername',
   source='../pdf',
   extension='*.pdf',
   method="single"
).upload()

3. Upload a given list of files with the list_files argument.

from azurebatchload import Upload

Upload(
   destination='blobcontainername',
   source='../pdf',
   list_files=["invoice1.pdf", "invoice2.pdf"],
   method="single"
).upload()

List blobs

With the Utils.list_blobs method we can do advanced listing of blobs in a container or in a specific folder within a container. There are several arguments we can use to define the scope of the information:

  • name_starts_with: filter files by a prefix, or select a specific folder: name_starts_with=folder1/subfolder/lastfolder/
  • dataframe: return the results as a pandas dataframe instead of a list.
  • extended_info: return extended information (size, creation date, last-modified date) instead of just the blob names.

1. List a whole container with just the filenames as a list.

from azurebatchload import Utils

list_blobs = Utils(container='containername').list_blobs()

2. List a whole container with just the filenames as a dataframe.

from azurebatchload import Utils

df_blobs = Utils(
   container='containername',
   dataframe=True
).list_blobs()

3. List a folder in a container.

from azurebatchload import Utils

list_blobs = Utils(
   container='containername',
   name_starts_with="foldername/"
).list_blobs()

4. Get extended information about a folder.

from azurebatchload import Utils

dict_blobs = Utils(
   container='containername',
   name_starts_with="foldername/",
   extended_info=True
).list_blobs()

5. Get extended information about a folder, returned as a pandas dataframe.

from azurebatchload import Utils

df_blobs = Utils(
   container='containername',
   name_starts_with="foldername/",
   extended_info=True,
   dataframe=True
).list_blobs()



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

azurebatchload-0.6.3.tar.gz (13.1 kB)

Uploaded Source

Built Distribution

azurebatchload-0.6.3-py3-none-any.whl (13.4 kB)

Uploaded Python 3

File details

Details for the file azurebatchload-0.6.3.tar.gz.

File metadata

  • Download URL: azurebatchload-0.6.3.tar.gz
  • Upload date:
  • Size: 13.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.3

File hashes

Hashes for azurebatchload-0.6.3.tar.gz
Algorithm Hash digest
SHA256 34ca26921bd6d1ad9694ef9028e698e10bad2f1b1833e1bca409d465bd1d12fb
MD5 e889867418c07f42f93116e0d5d33e05
BLAKE2b-256 19b7bf424b1c842a121adf3e5d39a85e45f5605c2ad7ea375c506ce199fbeeee

See more details on using hashes here.

File details

Details for the file azurebatchload-0.6.3-py3-none-any.whl.

File metadata

File hashes

Hashes for azurebatchload-0.6.3-py3-none-any.whl
Algorithm Hash digest
SHA256 824984ae14eef6f4e5cc0d2e82b99bc8e4a72bb46fad708516b7d834d1c300d7
MD5 c5b2892b6a722aa0d7219c72c15e8bd3
BLAKE2b-256 e1bff53cf491f1317eb8430a29a041cbc1021c2fd6cd34b4c0d336a36810eaab

See more details on using hashes here.
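
As an illustration of how these digests can be used, the following sketch verifies a downloaded distribution against the sdist SHA256 listed above (the local filename and path are assumptions):

import hashlib

# Path to the downloaded distribution; adjust to wherever the file was saved.
path = "azurebatchload-0.6.3.tar.gz"

# Expected SHA256 digest, copied from the hash table above.
expected = "34ca26921bd6d1ad9694ef9028e698e10bad2f1b1833e1bca409d465bd1d12fb"

with open(path, "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

print("OK" if digest == expected else "Hash mismatch")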
