A native Python interface wrapping AzCopy for bulk data transfer to and from Azure Blob Storage.

Azpype 🚀 [beta]

A Python wrapper for AzCopy that feels native and gets out of your way.

Why Azpype?

Performance: AzCopy, written in Go, significantly outperforms Python's Azure SDK for bulk transfers. Go's goroutines provide true parallelism for file I/O and network operations, while Python's GIL limits concurrency. For large-scale transfers, AzCopy can be 5-10x faster.

Python Integration: But switching between Python and bash scripts breaks your workflow. Azpype solves this by wrapping AzCopy in a native Python interface. Now you can:

  • Write pure Python scripts with data processing before and after transfers
  • Capture and parse output programmatically
  • Handle errors with try/except blocks
  • Integrate with your existing Python data pipeline

Additional Benefits:

  • Zero-configuration setup - Bundles the right AzCopy binary for your platform
  • Smart defaults - YAML config for common settings, override with kwargs when needed
  • Rich logging - Structured logs with loguru, daily rotation, and visual command output
  • Built-in validation - Checks auth, network, and paths before executing
  • Job management - List, resume, and recover failed transfers programmatically

Installation

pip install azpype

That's it. Azpype automatically:

  • Downloads the appropriate AzCopy binary (v10.18.1) for your platform
  • Creates a config directory at ~/.azpype/
  • Sets up a default configuration file
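
Binary selection boils down to mapping the host OS and architecture to an AzCopy build. The sketch below illustrates the idea; the mapping table and function names are hypothetical, not Azpype's internals:

```python
import platform

# Hypothetical (OS, architecture) -> AzCopy build mapping, for illustration.
# Azpype's real internals may differ; it bundles/downloads its own binaries.
AZCOPY_BUILDS = {
    ("Linux", "x86_64"): "azcopy_linux_amd64",
    ("Linux", "aarch64"): "azcopy_linux_arm64",
    ("Darwin", "x86_64"): "azcopy_darwin_amd64",
    ("Darwin", "arm64"): "azcopy_darwin_arm64",
    ("Windows", "AMD64"): "azcopy_windows_amd64",
}

def pick_azcopy_build():
    """Return the build name for the current host, or raise if unsupported."""
    key = (platform.system(), platform.machine())
    try:
        return AZCOPY_BUILDS[key]
    except KeyError:
        raise RuntimeError(f"No bundled AzCopy binary for platform {key}")
```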

Quick Start

Basic Copy Operation

from azpype.commands.copy import Copy

# Upload a local directory to Azure Blob Storage
Copy(
    source="./data",
    destination="https://myaccount.blob.core.windows.net/mycontainer/"
).execute()

# Download from Azure to local
Copy(
    source="https://myaccount.blob.core.windows.net/mycontainer/data/",
    destination="./downloads"
).execute()

Working with Return Values

The execute() method returns an AzCopyStdoutParser object with parsed attributes - no manual string parsing needed!

# Execute returns a parsed object with useful attributes
result = Copy(
    source="./data",
    destination="https://myaccount.blob.core.windows.net/mycontainer/"
).execute()

# Access structured data directly
print(f"Job ID: {result.job_id}")
print(f"Files transferred: {result.number_of_file_transfers_completed}")
print(f"Files skipped: {result.number_of_file_transfers_skipped}")
print(f"Bytes transferred: {result.total_bytes_transferred}")
print(f"Elapsed time: {result.elapsed_time} minutes")
print(f"Final status: {result.final_job_status}")

# Use exit code for flow control
if result.exit_code == 0:
    print("Transfer successful!")
else:
    print(f"Transfer failed: {result.stdout}")

Available Attributes

The parser automatically extracts these attributes from AzCopy output:

| Attribute | Type | Description |
|-----------|------|-------------|
| exit_code | int | Command exit code (0 = success) |
| job_id | str | Unique job identifier for resuming |
| elapsed_time | float | Transfer duration in minutes |
| final_job_status | str | Status like "Completed", "CompletedWithSkipped", "Failed" |
| number_of_file_transfers | int | Total files attempted |
| number_of_file_transfers_completed | int | Successfully transferred files |
| number_of_file_transfers_skipped | int | Files skipped (already exist, etc.) |
| number_of_file_transfers_failed | int | Failed file transfers |
| total_bytes_transferred | int | Total data transferred in bytes |
| total_number_of_transfers | int | Total transfer operations |
| stdout | str | Raw command output if needed |
| raw_stdout | str | Unprocessed output with ANSI codes |
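
These names mirror the key-value lines in AzCopy's end-of-job summary. As a rough illustration of how such output can be turned into attributes (the sample text and parser below are illustrative, not Azpype's actual implementation):

```python
import re

# Illustrative sample modeled on AzCopy's end-of-job summary lines.
SAMPLE_SUMMARY = """\
Elapsed Time (Minutes): 1.2
Number of File Transfers: 10
Number of File Transfers Completed: 9
Number of File Transfers Failed: 1
Total Number of Bytes Transferred: 1048576
Final Job Status: CompletedWithErrors
"""

def parse_summary(text):
    """Turn 'Key: value' lines into a dict of snake_case keys with typed values."""
    parsed = {}
    for key, value in re.findall(r"^([A-Za-z ()]+):\s*(\S+)\s*$", text, re.M):
        attr = key.strip().lower().replace(" ", "_").replace("(", "").replace(")", "")
        if value.isdigit():
            parsed[attr] = int(value)
        elif "." in value and value.replace(".", "", 1).isdigit():
            parsed[attr] = float(value)
        else:
            parsed[attr] = value
    return parsed

summary = parse_summary(SAMPLE_SUMMARY)
```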

Real-World Example: Pipeline Integration

def smart_sync_with_monitoring(local_path, remote_path):
    """
    Sync data and monitor transfer metrics
    """
    result = Copy(
        source=local_path,
        destination=remote_path,
        overwrite="ifSourceNewer",
        recursive=True
    ).execute()
    
    # Make decisions based on parsed results
    if result.exit_code != 0:
        raise RuntimeError(f"Transfer failed: {result.final_job_status}")
    
    if result.number_of_file_transfers_failed > 0:
        print(f"Warning: {result.number_of_file_transfers_failed} files failed")
        # Could trigger retry logic here
    
    if result.number_of_file_transfers_skipped == result.number_of_file_transfers:
        print("All files already up-to-date")
        return "NO_CHANGES"
    
    # Report transfer metrics
    gb_transferred = result.total_bytes_transferred / (1024**3)
    transfer_rate = gb_transferred / (result.elapsed_time / 60)  # GB/hour
    
    print(f"Transferred {gb_transferred:.2f} GB at {transfer_rate:.2f} GB/hour")
    print(f"Completed: {result.number_of_file_transfers_completed} files")
    
    return result.job_id  # Return for potential resume operations
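
The retry hook noted in the comment above can build on AzCopy's resumable jobs (see Job Management below). A generic sketch, with Copy(...).execute() and Jobs().resume(...) plugged in as the two callables in real use:

```python
def transfer_with_retries(run, resume, max_retries=3):
    """Run a transfer, resuming the same AzCopy job on failure.

    `run` starts the transfer and returns (exit_code, job_id); `resume`
    takes a job_id and returns an exit code. In real use these would be
    thin wrappers around Copy(...).execute() and Jobs().resume(...).
    """
    exit_code, job_id = run()
    attempts = 0
    while exit_code != 0 and job_id and attempts < max_retries:
        attempts += 1
        exit_code = resume(job_id)
    return exit_code, attempts
```

Because AzCopy jobs are resumable by ID, retries pick up where the failed transfer left off instead of re-sending completed files.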

Authentication

Service Principal (Recommended)

Set these environment variables:

import os

os.environ["AZCOPY_TENANT_ID"] = "your-tenant-id"
os.environ["AZCOPY_SPA_APPLICATION_ID"] = "your-app-id"  
os.environ["AZCOPY_SPA_CLIENT_SECRET"] = "your-secret"
os.environ["AZCOPY_AUTO_LOGIN_TYPE"] = "SPN"

Or use a .env file:

# .env
AZCOPY_TENANT_ID=your-tenant-id
AZCOPY_SPA_APPLICATION_ID=your-app-id
AZCOPY_SPA_CLIENT_SECRET=your-secret
AZCOPY_AUTO_LOGIN_TYPE=SPN

Then load it in your script:

from dotenv import load_dotenv
load_dotenv()

from azpype.commands.copy import Copy
Copy(source, destination).execute()

SAS Token

Pass the token directly (without the leading ?):

Copy(
    source="./data",
    destination="https://myaccount.blob.core.windows.net/mycontainer/",
    sas_token="sv=2021-12-02&ss=b&srt=sco&sp=rwdlacyx..."
).execute()

Configuration System

Azpype uses a two-level configuration system:

1. YAML Config File (Defaults)

Located at ~/.azpype/copy_config.yaml:

# Overwrite strategy at destination
overwrite: 'ifSourceNewer'  # Options: 'true', 'false', 'prompt', 'ifSourceNewer'

# Recursive copy for directories
recursive: true

# Create MD5 hashes during upload
put-md5: true

# Number of parallel transfers
concurrency: 16

2. Runtime Overrides (kwargs)

Override any config value at runtime:

Copy(
    source="./data",
    destination="https://...",
    overwrite="true",           # Override YAML setting
    concurrency=32,              # Increase parallelism
    dry_run=True,               # Test without copying
    exclude_pattern="*.tmp"     # Add exclusion pattern
).execute()
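
Conceptually, the merge is just kwargs layered over the YAML dict, with underscored Python names translated to AzCopy's dashed flag names. A minimal sketch (illustrative; not Azpype's actual merge code):

```python
def build_options(yaml_defaults, **overrides):
    """Layer runtime kwargs over YAML defaults; kwargs win on conflict.

    Underscored Python names are translated to AzCopy's dashed flag
    names (e.g. put_md5 -> put-md5). Purely illustrative.
    """
    runtime = {key.replace("_", "-"): value for key, value in overrides.items()}
    return {**yaml_defaults, **runtime}

defaults = {"overwrite": "ifSourceNewer", "recursive": True, "put-md5": True}
opts = build_options(defaults, overwrite="true", concurrency=32)
```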

Common Usage Patterns

Upload with Patterns

# Upload only Python files
Copy(
    source="./project",
    destination="https://myaccount.blob.core.windows.net/code/",
    include_pattern="*.py",
    recursive=True
).execute()

# Exclude temporary files
Copy(
    source="./data",
    destination="https://myaccount.blob.core.windows.net/backup/",
    exclude_pattern="*.tmp;*.log;*.cache",
    recursive=True
).execute()

Sync with Overwrite Control

# Only upload newer files
Copy(
    source="./local-data",
    destination="https://myaccount.blob.core.windows.net/data/",
    overwrite="ifSourceNewer",
    recursive=True
).execute()

# Never overwrite existing files
Copy(
    source="./archive",
    destination="https://myaccount.blob.core.windows.net/archive/",
    overwrite="false"
).execute()

Dry Run Testing

# See what would be copied without actually transferring
Copy(
    source="./large-dataset",
    destination="https://myaccount.blob.core.windows.net/datasets/",
    dry_run=True
).execute()

Job Management

Resume failed or cancelled transfers:

from azpype.commands.jobs import Jobs

jobs = Jobs()

# List all jobs
exit_code, output = jobs.list()

# Resume a specific job
jobs.resume(job_id="abc123-def456")

# Find and resume the last failed job
job_id = jobs.last_failed()
if job_id:
    jobs.resume(job_id=job_id)

# Auto-recover (find and resume last failed)
jobs.recover_last_failed()

Logging

Azpype provides rich logging with automatic rotation:

  • Location: ~/.azpype/azpype_YYYY-MM-DD.log
  • Rotation: Daily, with 7-day retention and gzip compression
  • Console output: Color-coded with progress indicators
  • Command details: Full command, exit codes, and stdout/stderr captured

Example log output:

2025-08-15 19:09:29 | INFO | COPY | Starting copy operation
2025-08-15 19:09:29 | INFO | COPY | ========== COMMAND EXECUTION ==========
2025-08-15 19:09:29 | INFO | COPY | Command: azcopy copy ./data https://...
2025-08-15 19:09:29 | INFO | COPY | Exit Code: 0
2025-08-15 19:09:29 | INFO | COPY | STDOUT:
2025-08-15 19:09:29 | INFO | COPY |   Job abc123 has started
2025-08-15 19:09:29 | INFO | COPY |   100.0%, 10 Done, 0 Failed, 0 Pending
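
Internally Azpype uses loguru for this. If you want a similar daily-rotating log in your own pipeline code, the standard library can approximate it (an approximation only: stdlib rotation does not gzip old files without a custom rotator, and this is not Azpype's actual setup):

```python
import logging
import logging.handlers
import os
import tempfile

log_dir = tempfile.mkdtemp()  # stand-in for ~/.azpype/

# Daily rotation with a 7-day retention window, mirroring Azpype's policy.
handler = logging.handlers.TimedRotatingFileHandler(
    os.path.join(log_dir, "azpype.log"),
    when="midnight",
    backupCount=7,
)
handler.setFormatter(logging.Formatter("%(asctime)s | %(levelname)s | COPY | %(message)s"))

logger = logging.getLogger("azpype_demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.info("Starting copy operation")
handler.flush()
```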

Available Options

Common options for the Copy command:

| Option | Type | Description |
|--------|------|-------------|
| overwrite | str | How to handle existing files: 'true', 'false', 'prompt', 'ifSourceNewer' |
| recursive | bool | Include subdirectories |
| include_pattern | str | Include only matching files (wildcards supported) |
| exclude_pattern | str | Exclude matching files (wildcards supported) |
| dry_run | bool | Preview what would be copied without transferring |
| concurrency | int | Number of parallel transfers |
| block_size_mb | float | Block size for large files (in MiB) |
| put_md5 | bool | Create MD5 hashes during upload |
| check_length | bool | Verify file sizes after transfer |
| as_subdir | bool | Place folder sources as subdirectories |

License

MIT
