
A native Python interface wrapping AzCopy for bulk data transfer to and from Azure Blob Storage.


Azpype

NOTE: This is still a very early-stage project. Public interfaces and large parts of the implementation are still subject to change.

Azpype is primarily intended to be an easy-to-use, lightweight, native Python interface to the already excellent AzCopy command line tool.

The secondary aim is to extend AzCopy with some additional scaffolding and functionality, such as:

Python enhanced logging

-- INFO HERE --

Config driven defaults

-- INFO HERE --

Out-of-the-box and custom Validation Checks

-- INFO HERE --


Installation

Currently supports Windows and macOS (Apple Silicon and Intel).

📢 Important: For convenience, and so that azpype behaves like a native Python library, installing azpype also downloads the platform-appropriate precompiled azcopy binary (v10.18.1) and stores it under ~/.azpype/. This is bundled as part of the package distribution rather than handled by a separate installation script.

Install via pip

pip install azpype

Usage

Setup for Authentication

Currently, azpype uses application service principal based auth. Ensure that either the environment or the process makes the following environment variables available:

  • AZCOPY_TENANT_ID
  • AZCOPY_SPA_APPLICATION_ID
  • AZCOPY_SPA_CLIENT_SECRET
  • AZCOPY_AUTO_LOGIN_TYPE

Setting an environment variable in Python:

import os

# These are dummy values, of course.
os.environ["AZCOPY_TENANT_ID"] = "12d3fba3-efac-1234-a1b2-3f4cafbcb123"
os.environ["AZCOPY_SPA_APPLICATION_ID"] = "e1234c36-bc1e-4f23-ace7-cb088c04c123"
os.environ["AZCOPY_SPA_CLIENT_SECRET"] = "cAl1Q~2mdABUUSCD2KEZzaF150P0jXAqKs2ANdMS"
# This must be set so that interactive login is not needed.
os.environ["AZCOPY_AUTO_LOGIN_TYPE"] = "SPN"  # SPN = service principal

Setting environment variables in Python via a .env file:

# pip install python-dotenv  (if needed)
import os
from dotenv import load_dotenv

# This assumes you have a .env file in your working directory with an entry like:
# AZCOPY_TENANT_ID="12d3fba3-efac-1234-a1b2-3f4cafbcb123"
load_dotenv('.env')

tenant_id = os.getenv('AZCOPY_TENANT_ID')
# etc.

Or set the environment variable via the shell (macOS and Linux):

export AZCOPY_TENANT_ID="12d3fba3-efac-1234-a1b2-3f4cafbcb123"

Or set the environment variable via the shell (Windows; setx takes effect in new sessions, not the current one):

setx AZCOPY_TENANT_ID "12d3fba3-efac-1234-a1b2-3f4cafbcb123"
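
Whichever approach is used, a quick pre-flight check can confirm that all four variables are visible to the Python process before a transfer starts. The sketch below is not part of azpype; `missing_auth_vars` is a hypothetical helper name, and the check only tests for presence, not for valid credentials.

```python
import os

# The four variables listed in this section.
REQUIRED_AUTH_VARS = (
    "AZCOPY_TENANT_ID",
    "AZCOPY_SPA_APPLICATION_ID",
    "AZCOPY_SPA_CLIENT_SECRET",
    "AZCOPY_AUTO_LOGIN_TYPE",
)


def missing_auth_vars(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper, not an azpype API.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AUTH_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_auth_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All service principal variables are set.")
```

Calling it before `Copy(...).execute()` gives an early, readable error instead of a failed login inside the subprocess.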

Configuration

When azpype is installed with pip, a directory called ~/.azpype is created. It contains a configuration file called copy_config.yaml, which holds default key-value pairs that are passed as options/arguments to the Copy command. For example, the YAML could contain values like this:

# Overwrite the conflicting files and blobs at the destination if this flag is set to true.
# Possible values include 'true', 'false', 'prompt', and 'ifSourceNewer'.
# Default: 'true'
overwrite: 'ifSourceNewer'

# Create an MD5 hash of each file, and save the hash as the Content-MD5 property of the destination blob or file.
# Only available when uploading.
# Default: None
put-md5: NULL

This would translate to passing --put-md5 and --overwrite 'ifSourceNewer' to the azcopy CLI. These options are passed to azpype as kwargs, which are then parsed to construct the final command.
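
To make the mapping from config values to command-line flags concrete, here is a minimal sketch of one way such a translation could work. It is illustrative only and is not azpype's actual parser: `build_flags` is a hypothetical name, and the handling of None (bare flag, following the put-md5 example above) and of booleans is an assumption.

```python
def build_flags(options):
    """Turn a dict of option names and values into a list of CLI arguments.

    Assumptions, not confirmed azpype behaviour:
      - None emits a bare flag (as in the put-md5 example above)
      - booleans are passed as --name=true / --name=false
      - anything else is passed as --name=value
    """
    args = []
    for name, value in options.items():
        # Allow Python-style names such as put_md5 as well as put-md5.
        flag = "--" + name.replace("_", "-")
        if value is None:
            args.append(flag)
        elif isinstance(value, bool):
            args.append(f"{flag}={str(value).lower()}")
        else:
            args.append(f"{flag}={value}")
    return args


# Values as they would be read from the YAML example above (NULL -> None).
config = {"overwrite": "ifSourceNewer", "put-md5": None}
print(build_flags(config))  # ['--overwrite=ifSourceNewer', '--put-md5']
```

The `--name=value` form is used here so that each option stays a single argument; it is equivalent to passing the flag and its value separately.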

Copy

Copy is perhaps the most important interface and the primary workhorse command.

Basic Usage

from azpype.commands.copy import Copy

# Syntax:
# Copy('file-system-source', 'blob-storage-destination', **kwargs).execute()

azure_storage_account = "my_storage_account"
blob_container = "my_container"
optional_container_path = ""

destination = f"https://{azure_storage_account}.blob.core.windows.net/{blob_container}/{optional_container_path}"

source = "./test_payload"

Copy(source, destination).execute()
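
When the container path comes from user input, it can help to build the destination URL in one place. The following is a small sketch under stated assumptions: `blob_url` is a hypothetical helper, not part of azpype, and the account and container names are placeholders.

```python
from urllib.parse import quote


def blob_url(account, container, path=""):
    """Build a Blob Storage URL for a container and optional path.

    Hypothetical helper, not an azpype API.
    """
    base = f"https://{account}.blob.core.windows.net/{container}"
    # Drop leading/trailing slashes so the URL never contains '//'.
    path = path.strip("/")
    if not path:
        return base
    # Percent-encode each path segment but keep the '/' separators.
    return base + "/" + quote(path, safe="/")


print(blob_url("mystorageaccount", "mycontainer", "raw/2024 data"))
# https://mystorageaccount.blob.core.windows.net/mycontainer/raw/2024%20data
```

Unlike the f-string above, this version omits the trailing slash when no container path is given.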

📝 Housekeeping TODOs

  • 📘 Add back in unittests for other modules
  • 📚 Update readme with a better-articulated outline of the 'why'
  • 📖 Add Usage section
  • 📖 Add instructions on how to create the application service principal, grant it permissions and create the client secret.
  • ⏱️ Update readme with timed examples of Azpype/AzCopy along with azure-blob-storage synchronous and async
  • 📘 Add example notebooks

Authentication

Currently, Azpype only supports authenticating via an Application Service Principal, set via the following AzCopy environment variables:

  • AZCOPY_TENANT_ID
  • AZCOPY_SPA_APPLICATION_ID
  • AZCOPY_SPA_CLIENT_SECRET
  • AZCOPY_AUTO_LOGIN_TYPE

These can be injected or overridden at runtime in the Python process via

import os
os.environ["AZCOPY_TENANT_ID"] = "<TenantID>"
# ...

Please follow good practices when handling these environment variables and client credentials.

Going forward, Azpype aims to use a default precedence order for authentication: MSI first, then SPA, then SAS. Ideally it would use, or follow the pattern of, DefaultAzureCredential().


🚧 In-Development: FS Monitor

I'd love to get some feedback on this feature. To keep azpype as simple as possible, my thought is to create an 'agent' mode that takes advantage of the watchdog package. Agent mode would allow Azpype to be deployed as a long-running background process that triggers actions based on file system events. For instance, poll every 5 minutes and run Copy() when a new file is detected. User code can then do the appropriate stage clearing, archiving, etc.
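
The polling idea can be sketched with the standard library alone. This is not an azpype feature and does not use watchdog; `new_files` is a hypothetical helper that compares directory listings between polls.

```python
import os


def new_files(directory, seen):
    """Return file paths in `directory` not yet in `seen`, and record them.

    Hypothetical helper: a plain polling check, not the watchdog package
    and not an azpype API. Only the top level of `directory` is scanned.
    """
    current = {
        entry.path for entry in os.scandir(directory) if entry.is_file()
    }
    fresh = sorted(current - seen)
    seen.update(fresh)
    return fresh


# Usage sketch (runs forever, so it is left commented out):
# import time
# from azpype.commands.copy import Copy
# seen = set()
# while True:
#     if new_files("./staging", seen):
#         Copy("./staging", destination).execute()
#     time.sleep(300)  # poll every 5 minutes
```

An event-driven version built on watchdog would avoid the fixed polling delay, at the cost of an extra dependency.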

🚧 Status: Not yet in development


🧪 Benchmark Grid Search

AzCopy currently provides a useful benchmarking utility that helps determine the optimal concurrency for a given network, machine (assuming the default setting of auto-tuning to the number of cores), number of files, and size per file.

The Benchmark Grid Search feature will leverage this to run a small grid search over combinations of file count and file size, outputting plots/data that reflect the expected range of performance for AzCopy in that execution environment.
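
As a rough illustration of the grid itself, the sketch below enumerates every combination of file count and file size and formats each as arguments for `azcopy bench`. The function name `bench_grid` is hypothetical, the feature does not exist in azpype yet, and the `--file-count` and `--size-per-file` flags should be checked against your AzCopy version.

```python
from itertools import product


def bench_grid(file_counts, sizes_per_file):
    """Yield one argument list per (file count, file size) combination.

    Hypothetical helper; it only builds arguments and runs nothing.
    """
    for count, size in product(file_counts, sizes_per_file):
        yield ["--file-count", str(count), "--size-per-file", size]


for args in bench_grid([10, 100], ["1M", "100M"]):
    print(" ".join(args))
```

Each argument list would then be appended to an `azcopy bench <destination>` invocation and the reported throughput collected for plotting.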

🚧 Status: Not yet in development

