
Project description

Aries.storage: A Unified Storage Interface

Read and write files on Google Cloud Storage and Amazon S3 as if they were on the local computer.

The Aries storage sub-package provides a unified interface for accessing files and folders on local and cloud storage systems. The StorageFile class transforms a file on cloud storage (e.g. a Google Cloud Storage bucket) into a file-like object (stream), so files on cloud storage can be read and written just like files on a local disk. In addition, this package includes high-level APIs for copying, moving, and deleting files and folders.

Motivation

As cloud platforms become part of daily life, file storage means more than the hard drive in a local computer. However, there is no standard interface for reading and writing files on the cloud: the methods depend on the APIs of each provider, and they differ considerably from reading and writing local files, so code has to treat the two cases differently. This package solves the problem by providing a unified way to access local and cloud storage, with an I/O interface designed to match the way files are accessed on a local computer. With this package, the modifications needed for existing code to support cloud storage can be reduced significantly.

Implementation

Data access is provided through three classes: Aries.storage.StorageFile, Aries.storage.StorageFolder and Aries.storage.StoragePrefix. Each of them wraps an underlying "raw" (raw_io) class, which contains the platform-dependent implementation. A Uniform Resource Identifier (URI), e.g. file:///var/text.txt or gs://bucket_name/text.txt, is used to locate a file or folder. StorageFile and StorageFolder determine the underlying "raw" class automatically from the scheme of the URI.
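The scheme-based dispatch described above can be sketched with the standard library. The class names and mapping below are illustrative stand-ins, not Aries internals:

```python
from urllib.parse import urlparse

# Hypothetical "raw" backends; the actual Aries classes differ.
class LocalFile:
    def __init__(self, uri):
        self.uri = uri

class GSFile:
    def __init__(self, uri):
        self.uri = uri

# Map URI scheme to the platform-dependent implementation.
# An empty scheme (a plain path) falls back to local storage.
RAW_CLASSES = {"file": LocalFile, "": LocalFile, "gs": GSFile}

def raw_for(uri):
    """Pick the raw class from the URI scheme."""
    scheme = urlparse(uri).scheme
    cls = RAW_CLASSES.get(scheme)
    if cls is None:
        raise ValueError("Unsupported scheme: %r" % scheme)
    return cls(uri)

print(type(raw_for("/var/text.txt")).__name__)         # LocalFile
print(type(raw_for("gs://bucket/text.txt")).__name__)  # GSFile
```

This is why the same client code works across backends: only the raw class behind the URI changes.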

Currently, the following schemes are implemented:

  • Local computer (file://)
  • Google Cloud Storage (gs://)
  • Amazon S3 Storage (s3://)

The StorageFile Class

A StorageFile object can be initialized by

from Aries.storage import StorageFile

# uri: the Uniform Resource Identifier for a file
# local file path can also be used as uri.
uri = "/path/to/file.txt"
f = StorageFile(uri)

StorageFile() automatically determines the storage type from the scheme in the URI. For a local file, the URI can also be a plain path such as /var/text.txt, without the scheme.

With a StorageFile, you can:

  • Get the file size: StorageFile("path/to/file").size
  • Get the md5 hex: StorageFile("path/to/file").md5_hex
  • Get the last update time: StorageFile("path/to/file").updated_time
  • Check whether the file exists: StorageFile("path/to/file").exist()
  • Create an empty file: StorageFile("path/to/file").create()
  • Copy the file to another location: StorageFile("path/to/file").copy("gs://path/to/destination")
  • Move the file to another location: StorageFile("path/to/file").move("gs://path/to/destination")
  • Read the file (as bytes) into memory: StorageFile("path/to/file").read()
  • Delete the file: StorageFile("path/to/file").delete()
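For a local file, the metadata properties above reduce to simple filesystem calls. The following stdlib sketch (not Aries code; the function name is made up) shows what a local backend might compute for size, md5_hex, and updated_time:

```python
import hashlib
import os
import tempfile

def file_metadata(path):
    """Compute size, md5 hex digest, and last-modified time for a local file."""
    with open(path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    return {
        "size": os.path.getsize(path),
        "md5_hex": digest,
        "updated_time": os.path.getmtime(path),
        "exists": os.path.exists(path),
    }

# Example: write a small file and inspect its metadata.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hello")
meta = file_metadata(tmp.name)
print(meta["size"])  # 5
os.remove(tmp.name)
```

A cloud backend would obtain the same fields from object metadata (e.g. a blob's size and stored hash) instead of filesystem calls.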

StorageFile is a file-like object implementing the I/O stream interfaces of BufferedIOBase and TextIOBase. The static StorageFile.init(uri, mode) method is designed to replace the built-in open() function.

However, initializing the StorageFile does NOT open the file. The StorageFile object provides open() and close() methods for opening and closing the file for read/write. The open() method returns the StorageFile instance itself.
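The open-returns-self pattern can be illustrated with a minimal stand-in class. This is a sketch of the behavior described above, not the Aries implementation; the backing buffer stands in for the underlying storage:

```python
import io

class StreamWrapper:
    """Minimal file-like wrapper whose open() returns the instance itself."""

    def __init__(self, buffer):
        self._buffer = buffer  # stand-in for the underlying storage

    def open(self, mode="r"):
        self.mode = mode
        return self  # open() returns the wrapper itself, as StorageFile does

    def write(self, data):
        return self._buffer.write(data)

    def read(self):
        self._buffer.seek(0)
        return self._buffer.read()

    def close(self):
        pass  # a real implementation would flush to storage here

    # Context manager support: open on enter, close on exit.
    def __enter__(self):
        return self.open()

    def __exit__(self, *exc):
        self.close()

f = StreamWrapper(io.StringIO()).open("w")
f.write("hello")
f.close()
print(f.read())  # hello
```

Because open() returns the instance, initialization, opening, and writing can be chained in one expression.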

Here is an example of using StorageFile with pandas:

from Aries.storage import StorageFile
import pandas as pd
df = pd.DataFrame([1, 3, 5])

uri = "gs://bucket_name/path/to/file.txt"
# Open the file for writing; open() returns the StorageFile itself.
f = StorageFile(uri).open('w')
# f is a file-like object
df.to_csv(f)
f.close()

The StorageFile.init() static method is a shortcut for initializing and opening the file, designed to replace Python's built-in open() function; it returns a StorageFile instance. StorageFile also supports the context manager protocol for opening and closing the file:

from Aries.storage import StorageFile
import pandas as pd
df = pd.DataFrame([1, 3, 5])

uri = "gs://bucket_name/path/to/file.txt"
# Using StorageFile in pandas
with StorageFile.init(uri, 'w') as f:
    # f will be a file-like object
    df.to_csv(f)

Once the file is opened, it can be used as a file-like object, with data accessed through methods like read() and write(). However, for cloud storage, the StorageFile might not have a fileno (file descriptor), in which case it cannot be used where a fileno is required.

The init() and open() methods support the same arguments as the Python built-in open() function. However, at this time, only the mode argument is used when opening cloud storage files.

High-Level APIs

The StorageFile class also supports high-level operations, including:

  • copy(), for copying the file to another location, e.g. StorageFile('/path/to/file.txt').copy('gs://bucket_name/path/to/file.txt')
  • move(), for moving the file, e.g. StorageFile('/path/to/file.txt').move('s3://bucket_name/path/to/file.txt')
  • delete(), for deleting the file, e.g. StorageFile('/path/to/file.txt').delete().

The copy() and move() methods also support cross-platform operations. For example:

# Move a file from local computer to Google cloud storage.
StorageFile('/path/to/file.txt').move('gs://bucket_name/path/to/file.txt')
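Cross-platform copies are possible because every backend exposes the same stream interface: in principle a copy reduces to reading from the source stream and writing to the destination stream, regardless of where each lives. A hedged stdlib sketch of that idea (the function is illustrative, not the Aries implementation):

```python
import io

def copy_stream(src, dst, chunk_size=64 * 1024):
    """Copy a file-like src to a file-like dst in chunks.

    Works for any pair of binary streams, which is what makes
    the operation platform-agnostic.
    """
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)

src = io.BytesIO(b"payload")  # stands in for a local file stream
dst = io.BytesIO()            # stands in for a cloud object stream
copy_stream(src, dst)
print(dst.getvalue())  # b'payload'
```

Chunked reads keep memory bounded, which matters when the source object is large.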

The StorageFolder Class

The StorageFolder class provides the same high level APIs as the StorageFile class, as well as shortcuts for listing the files in a folder.
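This page does not name the listing shortcuts. For a local folder they would amount to something like the following stdlib sketch (the function and return values here are hypothetical, not the StorageFolder API):

```python
import os
import tempfile

def list_folder(path):
    """Split a local folder's entries into file paths and sub-folder paths."""
    entries = [os.path.join(path, name) for name in sorted(os.listdir(path))]
    files = [p for p in entries if os.path.isfile(p)]
    folders = [p for p in entries if os.path.isdir(p)]
    return files, folders

# Example against a throwaway directory containing one file and one sub-folder.
root = tempfile.mkdtemp()
open(os.path.join(root, "a.txt"), "w").close()
os.mkdir(os.path.join(root, "sub"))
files, folders = list_folder(root)
print(len(files), len(folders))  # 1 1
```

A cloud backend would instead list objects under the folder's prefix, since buckets have no real directories.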


Download files


Source Distribution

Aries-storage-0.1.304.tar.gz (23.4 kB)

Uploaded Source

Built Distribution

Aries_storage-0.1.304-py3-none-any.whl (27.3 kB)

Uploaded Python 3

File details

Details for the file Aries-storage-0.1.304.tar.gz.

File metadata

  • Download URL: Aries-storage-0.1.304.tar.gz
  • Upload date:
  • Size: 23.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.8.3

File hashes

Hashes for Aries-storage-0.1.304.tar.gz:

  • SHA256: c30ef51ad054b4bc5a3aeeb4c8da0b942cd4c61b574421528232883558f2854a
  • MD5: de3be7d2435b01e469e87e5fdbaf45dd
  • BLAKE2b-256: 4ba0df2b1cc82eebee51bb4449c678d800df7b7bf537633a7600610ed9633867


File details

Details for the file Aries_storage-0.1.304-py3-none-any.whl.

File metadata

  • Download URL: Aries_storage-0.1.304-py3-none-any.whl
  • Upload date:
  • Size: 27.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.8.3

File hashes

Hashes for Aries_storage-0.1.304-py3-none-any.whl:

  • SHA256: 8214b2c4f976e8993a2b645fd1eccb45547475732d05643cc6eacb6d19839ced
  • MD5: 07849c03d90eea7f19c1bc4c95f6ba90
  • BLAKE2b-256: be0461a8b7eb9f48e4bed2d4bd72c03437108ca560f68b27aecba3c1ad6e8c37

