
azfs


AzFS provides convenient Python read/write functions for Azure Storage Accounts.

azfs can

  • list files in blob (also with wildcard *),
  • check if a file exists,
  • read csv as pd.DataFrame and json as dict from blob,
  • write pd.DataFrame as csv and dict as json to blob,
  • and raise lots of exceptions! (Thank you for your cooperation)

install

$ pip install azfs

usage

create the client

import azfs
from azure.identity import DefaultAzureCredential

# credential is not required if your environment is on AAD
azc = azfs.AzFileClient()

# credential is required if your environment is not on AAD
credential = "[your storage account credential]"
# or
credential = DefaultAzureCredential()
azc = azfs.AzFileClient(credential=credential)

types of authorization

Currently, only Azure Active Directory (AAD) token credentials are supported.

types of storage account kind

The table below shows which storage services azfs provides read/write functions for, by storage account kind.

account kind   Blob   Data Lake   Queue   File   Table
StorageV2      O      O           O       X      X
StorageV1      O      O           O       X      X
BlobStorage    O      -           -       -      -

  • O: basic functions provided
  • X: not provided
  • -: storage type unavailable

download data

azfs can get csv or json data from blob storage.

import azfs
import pandas as pd

azc = azfs.AzFileClient()
csv_path = "https://[storage-account].../*.csv"
json_path = "https://[storage-account].../*.json"
data_path = "https://[storage-account].../*.another_format"

# read csv as pd.DataFrame
df = azc.read_csv(csv_path, index_col=0)
# or
with azc:
    df = pd.read_csv_az(csv_path, header=None)

# read json
data = azc.read_json(json_path)

# also get raw data directly
data = azc.get(data_path)
# or (`download` is an alias for `get`)
data = azc.download(data_path)
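Conceptually, read_csv behaves like a get of the raw bytes followed by pandas parsing. A local sketch of that round-trip (the bytes are hard-coded here in place of a real download):

```python
import io

import pandas as pd

# pretend these bytes came back from azc.get(csv_path)
csv_bytes = b"id,value\n1,10\n2,20\n"

# azc.read_csv(csv_path, index_col=0) is roughly equivalent to feeding
# the downloaded bytes straight into pd.read_csv
df = pd.read_csv(io.BytesIO(csv_bytes), index_col=0)
print(df.loc[1, "value"])  # 10
```

Any extra keyword arguments (index_col, header, ...) are passed through to pandas as shown in the examples above.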

upload data

import azfs
import pandas as pd

azc = azfs.AzFileClient()
csv_path = "https://[storage-account].../*.csv"
json_path = "https://[storage-account].../*.json"
data_path = "https://[storage-account].../*.another_format"


df = pd.DataFrame()
data = {"example": "data"}

# write csv
azc.write_csv(path=csv_path, df=df)
# or
with azc:
    df.to_csv_az(path=csv_path, index=False)

# write dict as json
azc.write_json(path=json_path, data=data, indent=4)

# also put data directly
import json
azc.put(path=json_path, data=json.dumps(data, indent=4))
# or (`upload` is an alias for `put`)
azc.upload(path=json_path, data=json.dumps(data, indent=4))
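As the example above suggests, write_json effectively stores the dict serialized with json.dumps, which is why write_json and put + json.dumps are interchangeable here. A local round-trip sketch (no Azure access involved):

```python
import json

data = {"example": "data"}

# what azc.write_json(path=json_path, data=data, indent=4) effectively stores
payload = json.dumps(data, indent=4)

# reading it back (as azc.read_json would) recovers the original dict
assert json.loads(payload) == data
```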

enumerating (ls, glob) or checking if a file exists

import azfs

azc = azfs.AzFileClient()

# get the file-name list of the blob
file_name_list = azc.ls("https://[storage-account].../{container_name}")
# or, if `attach_prefix` is set to True, get the full-path list of the blob
file_full_path_list = azc.ls("https://[storage-account].../{container_name}", attach_prefix=True)

# find specific files with `*`
file_full_path_list = azc.glob("https://[storage-account].../{container_name}/*.csv")
# also search deeper directories
file_full_path_list = azc.glob("https://[storage-account].../{container_name}/*/*/*.csv")
# or match directories starting with `a`
file_full_path_list = azc.glob("https://[storage-account].../{container_name}/a*/*.csv")

# check if file exists
is_exists = azc.exists("https://[storage-account].../*.csv")
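The patterns above suggest segment-wise matching: `*` matches within one `/`-separated path level, so a deeper hierarchy needs `*/*/*.csv` rather than a single `*`. A hypothetical local matcher sketching those semantics (blob_glob and the sample names are illustrative, not part of azfs, whose actual matching may differ):

```python
import re

def blob_glob(pattern: str, names: list) -> list:
    # translate `*` so it matches anything except `/`,
    # forcing the directory depth of pattern and name to agree
    regex = re.compile("^" + re.escape(pattern).replace(r"\*", "[^/]*") + "$")
    return [n for n in names if regex.match(n)]

names = ["data/a.csv", "data/sub/b.csv", "data/ab/c.csv", "data/d.json"]
print(blob_glob("data/*.csv", names))     # ['data/a.csv']
print(blob_glob("data/*/*.csv", names))   # ['data/sub/b.csv', 'data/ab/c.csv']
print(blob_glob("data/a*/*.csv", names))  # ['data/ab/c.csv']
```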

remove, copy files, etc...

import azfs

azc = azfs.AzFileClient()

# copy file from `src_path` to `dst_path`
src_path = "https://[storage-account].../from/*.csv"
dst_path = "https://[storage-account].../to/*.csv"
is_copied = azc.cp(src_path=src_path, dst_path=dst_path, overwrite=True)

# remove the file
is_removed = azc.rm(path=src_path)

# get file meta info
data = azc.info(path=src_path)
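Conceptually, cp amounts to a download followed by an upload under the new path, and rm to deleting the blob. A minimal in-memory sketch of those semantics (the dict store stands in for a container; cp and rm here are illustrative helpers, not azfs APIs):

```python
# in-memory stand-in for a blob container: path -> bytes
store = {"from/data.csv": b"a,b\n1,2\n"}

def cp(src_path: str, dst_path: str, overwrite: bool = False) -> bool:
    # refuse to clobber an existing blob unless overwrite=True
    if dst_path in store and not overwrite:
        return False
    store[dst_path] = store[src_path]
    return True

def rm(path: str) -> bool:
    return store.pop(path, None) is not None

print(cp("from/data.csv", "to/data.csv", overwrite=True))  # True
print(rm("from/data.csv"))                                 # True
print(sorted(store))                                       # ['to/data.csv']
```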

dependencies

pandas >= "1.0.0"
azure-identity >= "1.3.1"
azure-storage-blob >= "12.3.0"
azure-storage-file-datalake >= "12.0.0"
azure-storage-queue >= "12.1.1"
