

Project description

Aries.storage: A Unified Storage Interface

Read and write files on Google Cloud Storage and Amazon S3 as if they were on your local computer

The Aries storage sub-package provides a unified interface for accessing files and folders on local and cloud storage systems. The StorageFile class transforms a file on cloud storage (e.g. a Google Cloud Storage bucket) into a file-like object (stream), so files on cloud storage can be read and written just like files on a local disk. In addition, the package includes high-level APIs for copying, moving and deleting files and folders.

Motivation

As cloud platforms become part of daily life, file storage means more than just the local hard drive. However, there is no standard interface for reading and writing files on the cloud: the methods depend on the APIs of each provider, and cloud I/O differs enough from local I/O that the two usually have to be handled separately in code. This package solves the problem by providing a unified way to access local and cloud storage, with an I/O interface designed to match the way files are accessed on a local computer. As a result, the modifications needed for existing code to support cloud storage can be reduced significantly.

Implementation

Data access is provided through three classes: Aries.storage.StorageFile, Aries.storage.StorageFolder and Aries.storage.StoragePrefix. Each of them wraps an underlying "raw" (or "raw_io") class containing the platform-dependent implementation. A Uniform Resource Identifier (URI), e.g. file:///var/text.txt or gs://bucket_name/text.txt, is used to locate a file or folder. StorageFile and StorageFolder determine the underlying "raw" class automatically from the scheme of the URI.

Currently, the following schemes are implemented:

  • Local computer (file://)
  • Google Cloud Storage (gs://)
  • Amazon S3 Storage (s3://)
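The scheme-based dispatch can be sketched with the standard library's urllib.parse (the helper name below is illustrative, not part of Aries):

```python
from urllib.parse import urlparse

def storage_scheme(uri: str) -> str:
    """Return the URI scheme used to pick a platform-dependent "raw" class.

    An empty scheme (a plain path) is treated as a local file,
    matching StorageFile's behavior for paths like /var/text.txt.
    """
    scheme = urlparse(uri).scheme
    return scheme or "file"
```

For example, storage_scheme("gs://bucket_name/text.txt") returns "gs", while storage_scheme("/var/text.txt") falls back to "file".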

The StorageFile Class

A StorageFile object can be initialized by

from Aries.storage import StorageFile

# uri: the Uniform Resource Identifier of a file
# A local file path can also be used as a uri.
uri = "/path/to/file.txt"
f = StorageFile(uri)

StorageFile() automatically determines the storage type from the scheme in the URI. For a local file, the URI can also be a plain path like /var/text.txt without the scheme.

With a StorageFile, you can:

  • Get the file size: StorageFile("path/to/file").size
  • Get the md5 hex: StorageFile("path/to/file").md5_hex
  • Get the last update time: StorageFile("path/to/file").updated_time
  • Check whether the file exists: StorageFile("path/to/file").exist()
  • Create an empty file: StorageFile("path/to/file").create()
  • Copy the file to another location: StorageFile("path/to/file").copy("gs://path/to/destination")
  • Move the file to another location: StorageFile("path/to/file").move("gs://path/to/destination")
  • Read the file (as bytes) into memory: StorageFile("path/to/file").read()
  • Delete the file: StorageFile("path/to/file").delete()
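For a local file, several of these operations map directly onto the Python standard library; a rough sketch of the local equivalents (the temporary file here is only for illustration):

```python
import hashlib
import os
import tempfile

# Create a throwaway local file to inspect.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as fh:
    fh.write("hello")
    path = fh.name

size = os.path.getsize(path)      # counterpart of StorageFile(path).size
with open(path, "rb") as fh:
    md5_hex = hashlib.md5(fh.read()).hexdigest()  # counterpart of .md5_hex
exists = os.path.exists(path)     # counterpart of .exist()

os.remove(path)                   # counterpart of .delete()
```

The point of StorageFile is that the same one-line calls work unchanged when the path is a gs:// or s3:// URI.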

StorageFile is a file-like object implementing the I/O stream interfaces of BufferedIOBase and TextIOBase. The static StorageFile.init(uri, mode) method is designed to replace the built-in open() function.

However, initializing the StorageFile does NOT open the file. The StorageFile object provides open() and close() methods for opening and closing the file for read/write. The open() method returns the StorageFile instance itself.
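The pattern of a lazily opened object whose open() returns the instance itself can be illustrated with a minimal stand-in class (a hypothetical sketch, not Aries' actual implementation):

```python
import io

class LazyStream:
    """Hypothetical stand-in for the StorageFile open()/close() pattern."""

    def __init__(self, uri):
        self.uri = uri        # initialization does NOT open the file
        self._buffer = None

    def open(self, mode="r"):
        # A real implementation would open the backing storage here.
        self._buffer = io.StringIO()
        return self           # the instance itself is the file-like object

    def close(self):
        if self._buffer is not None:
            self._buffer.close()
            self._buffer = None

    def write(self, text):
        return self._buffer.write(text)

    # Context manager support, mirroring StorageFile.init(uri, mode).
    def __enter__(self):
        return self.open()

    def __exit__(self, *exc_info):
        self.close()
```

Because open() returns the instance, f = LazyStream(uri).open("w") yields an object that is both the handle to close later and the stream to write to.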

Here is an example of using StorageFile with pandas:

from Aries.storage import StorageFile
import pandas as pd
df = pd.DataFrame([1, 3, 5])

uri = "gs://bucket_name/path/to/file.txt"
# Using StorageFile in pandas
f = StorageFile(uri).open('w')
# f will be a file-like object
df.to_csv(f)
f.close()

The StorageFile.init() static method provides a shortcut for initializing and opening a file in one call; it returns a StorageFile instance and is designed to replace the built-in Python open() function. StorageFile also supports the context manager protocol for opening and closing the file:

from Aries.storage import StorageFile
import pandas as pd
df = pd.DataFrame([1, 3, 5])

uri = "gs://bucket_name/path/to/file.txt"
# Using StorageFile in pandas
with StorageFile.init(uri, 'w') as f:
    # f will be a file-like object
    df.to_csv(f)

Once the file is opened, it can be used as a file-like object, with data accessed through methods like read() and write(). Note, however, that for cloud storage a StorageFile may not have a fileno (file descriptor); in that case it cannot be passed to code that requires one.
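Code that needs a file descriptor can probe for one defensively. The helper below is illustrative; an in-memory stream stands in for a cloud-backed StorageFile that lacks a descriptor:

```python
import io
import tempfile

def has_fileno(stream) -> bool:
    """Return True if the stream exposes a real OS file descriptor."""
    try:
        stream.fileno()   # raises io.UnsupportedOperation (an OSError) if absent
        return True
    except (AttributeError, OSError):
        return False

in_memory = io.BytesIO(b"data")        # no descriptor, like many cloud streams
print(has_fileno(in_memory))           # False

with tempfile.TemporaryFile() as fh:   # a real local file
    print(has_fileno(fh))              # True
```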

The init() and open() methods support the same arguments as the Python built-in open() function. However, at this time only the mode argument is used when opening cloud storage files.

High-Level APIs

The StorageFile class also supports high-level operations, including:

  • copy(), for copying the file to another location, e.g. StorageFile('/path/to/file.txt').copy('gs://bucket_name/path/to/file.txt')
  • move(), for moving the file, e.g. StorageFile('/path/to/file.txt').move('s3://bucket_name/path/to/file.txt')
  • delete(), for deleting the file, e.g. StorageFile('/path/to/file.txt').delete()

The copy() and move() methods also support cross-platform operations. For example:

# Move a file from the local computer to Google Cloud Storage.
StorageFile('/path/to/file.txt').move('gs://bucket_name/path/to/file.txt')

The StorageFolder Class

The StorageFolder class provides the same high level APIs as the StorageFile class, as well as shortcuts for listing the files in a folder.
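As a local-filesystem analogue of what these listing shortcuts provide, pathlib can enumerate a folder's files and subfolders; StorageFolder extends the same idea to gs:// and s3:// URIs (the directory below is created just for the demo):

```python
import pathlib
import tempfile

# Build a small throwaway directory tree.
root = pathlib.Path(tempfile.mkdtemp())
(root / "a.txt").write_text("A")
(root / "b.txt").write_text("B")
(root / "sub").mkdir()

# What StorageFolder-style listing shortcuts correspond to locally.
file_names = sorted(p.name for p in root.iterdir() if p.is_file())
folder_names = sorted(p.name for p in root.iterdir() if p.is_dir())
```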

Download files

Download the file for your platform.

Source Distribution

Aries-storage-0.1.308.tar.gz (23.4 kB)

Uploaded Source

Built Distribution

Aries_storage-0.1.308-py3-none-any.whl (27.4 kB)

Uploaded Python 3

File details

Details for the file Aries-storage-0.1.308.tar.gz.

File metadata

  • Download URL: Aries-storage-0.1.308.tar.gz
  • Size: 23.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.5

File hashes

Hashes for Aries-storage-0.1.308.tar.gz

  • SHA256: e3167927405cb78e7fa63e483a7abefb74b9a9f270fdd75563e3e244db55b7a6
  • MD5: adb5953ca880907be62a721f82049f37
  • BLAKE2b-256: 5c8a3775131e4de0c4532dd3d293e2dddecc6ba8c0a8915a7aefb9b8e4f4a059


File details

Details for the file Aries_storage-0.1.308-py3-none-any.whl.

File metadata

  • Download URL: Aries_storage-0.1.308-py3-none-any.whl
  • Size: 27.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.5

File hashes

Hashes for Aries_storage-0.1.308-py3-none-any.whl

  • SHA256: 2c780e17e80203c01d5a31d5719915d8ebe5c61659a74d68ccaac54785bae9fb
  • MD5: 9ba5f5d9c9784d8b6ec8256e5a6630a4
  • BLAKE2b-256: 497a93149726b82d19625b9c65cafea9f55dd09c5b63016c154ac564bdbfc007

