
Reusable storage access for DataRobot

Project description

Usage

The library defines a common storage interface in the datarobot_storage.base.Storage class, along with several child classes that implement this interface for particular storage services:

  • datarobot_storage.azure.AzureBlobStorage
  • datarobot_storage.google.GoogleCloudStorage
  • datarobot_storage.amazon.S3Storage

Factory method

The factory methods datarobot_storage.get_storage() and datarobot_storage.get_async_storage() try to automatically detect the storage type and other settings in the current execution environment and construct a Storage object of the correct kind. The optional config_dict argument specifies a mapping that is used instead of os.environ to read those settings.

The optional storage_type argument allows you to request a storage of a specific type; at the moment S3, AWS and Azure are supported. When the storage_type argument is not specified, the value under the FILE_STORAGE_TYPE key in the config_dict mapping is used.
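As an illustrative sketch only (not the library's actual implementation), the detection step can be thought of as a lookup keyed on FILE_STORAGE_TYPE. The exact set of recognized key values here is an assumption; only "s3" is confirmed by the FILE_STORAGE_TYPE=s3 value used in the command-line example in this document:

```python
# Hypothetical sketch of the storage-type detection performed by
# get_storage(): read FILE_STORAGE_TYPE from the supplied mapping
# and map it to the matching Storage subclass.
def pick_storage_class(config_dict):
    storage_type = config_dict.get("FILE_STORAGE_TYPE", "").lower()
    # The "azure" and "google" key names are assumptions for this sketch.
    return {
        "s3": "datarobot_storage.amazon.S3Storage",
        "azure": "datarobot_storage.azure.AzureBlobStorage",
        "google": "datarobot_storage.google.GoogleCloudStorage",
    }.get(storage_type)

print(pick_storage_class({"FILE_STORAGE_TYPE": "s3"}))
# -> datarobot_storage.amazon.S3Storage
```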

Access storage native client

Each class implements a client property that returns the native storage client (e.g. a botocore client). This is the same client that the datarobot_storage classes use to interact with the storage service; it is pre-configured and ready for use in your custom scenarios.

Examples

Get storage instance and configure it using os.environ values:

>>> from datarobot_storage import get_storage
>>> get_storage()  # doctest: +SKIP
<datarobot_storage.amazon.S3Storage object at 0x100360290>

Use an alternative source of configuration, e.g. config.engine.EngConfig in the DataRobot monolith:

>>> from config.engine import EngConfig  # doctest: +SKIP
>>> get_storage(config_dict=EngConfig)   # doctest: +SKIP
<datarobot_storage.amazon.S3Storage object at 0x108742680>

Request storage with awaitable methods:

import asyncio

from datarobot_storage import get_async_storage

async def main():
    storage = get_async_storage()
    await storage.list('remote/path/to/file')

asyncio.run(main())

It is also possible to pick a particular storage implementation manually and instantiate it yourself:

import os

from datarobot_storage.amazon import S3Storage
from datarobot_storage.amazon import S3Configuration

# Instantiate configuration from the environment
config = S3Configuration.from_dict(os.environ)

# Permanently disable multipart uploads and downloads
config.multipart_upload_enabled = False
config.multipart_download_enabled = False

# Instantiate storage from custom configuration
storage = S3Storage(storage_config=config)
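For illustration, the from_dict pattern above can be sketched with a plain dataclass. SketchConfig and its fields are invented for this example; the real S3Configuration has more settings and logic, and only the S3_BUCKET key mirrors an environment variable shown elsewhere in this document:

```python
from dataclasses import dataclass

# Simplified, hypothetical sketch of a from_dict-style configuration
# class; not the actual S3Configuration implementation.
@dataclass
class SketchConfig:
    bucket: str = ""
    multipart_upload_enabled: bool = True
    multipart_download_enabled: bool = True

    @classmethod
    def from_dict(cls, mapping):
        # Read only the keys we know about, keeping defaults otherwise
        return cls(bucket=mapping.get("S3_BUCKET", ""))

config = SketchConfig.from_dict({"S3_BUCKET": "my-bucket"})
config.multipart_upload_enabled = False
print(config.bucket, config.multipart_upload_enabled)
```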

The same can be done in an asyncio-compatible way using the make_awaitable helper:

import os

from datarobot_storage.amazon import S3Storage
from datarobot_storage.amazon import S3Configuration
from datarobot_storage.helpers import make_awaitable

config = S3Configuration.from_dict(os.environ)
storage = make_awaitable(S3Storage(storage_config=config))
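As a rough, hypothetical sketch of what such a helper could do (the actual make_awaitable implementation may differ), a wrapper can forward each method call of a synchronous storage object to a worker thread so that callers can await it:

```python
import asyncio

# Hypothetical make_awaitable-style wrapper: every callable attribute of
# the wrapped object is exposed as a coroutine function that runs the
# original method in a worker thread via asyncio.to_thread.
class AwaitableWrapper:
    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        attr = getattr(self._wrapped, name)
        if not callable(attr):
            return attr

        async def method(*args, **kwargs):
            return await asyncio.to_thread(attr, *args, **kwargs)

        return method

# A stand-in for a synchronous Storage implementation, used here only
# to demonstrate the wrapper.
class FakeStorage:
    def list(self, prefix):
        return [prefix + "/a", prefix + "/b"]

async def main():
    storage = AwaitableWrapper(FakeStorage())
    return await storage.list("remote/path")

print(asyncio.run(main()))
# -> ['remote/path/a', 'remote/path/b']
```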

For more examples, please see the functional test suite: tests/functional/test_storage.py

Command-line Usage

The command-line interface provides interactive access to storage objects from within a DataRobot service environment (usually a container running a service, with storage access configured through environment variables). Please check the embedded documentation for a detailed reference:

$ python -m datarobot_storage --help
usage: datarobot_storage [-h] [-d] [-v] ACTION ...

DataRobot common storage utility

options:
  -h, --help     show this help message and exit
  -v, --verbose  Print additional info

Available actions:
  ACTION
    list         List remote objects
    delete       Delete single remote object
    get          Download remote object locally
    put          Upload local file to storage

An example of using it to list the objects in the AWS S3 bucket shrink-clusters-data-rd under the datarobot/mbtest prefix:

$ export AWS_PROFILE=datarobotrd
$ export FILE_STORAGE_TYPE=s3
$ export S3_BUCKET=shrink-clusters-data-rd
$ python -m datarobot_storage list /datarobot/mbtest | head
654238d71975fed77960e223
654498b81975fed7796edec3
654498b81975fed7796edecc
6544b2111975fed7796f7c3e
65450f181975fed77971b01d
654520681975fed77971ff02
6545238e1975fed779720edb
654582961975fed779744c4c
6548b7ff1975fed7797f420a
654aab4a1975fed77989e0a0

Developers

Should you have any questions regarding the development, testing, or release process, please see CONTRIBUTING.md

Project details

Built Distribution

datarobot_storage-0.0.1-py3-none-any.whl (51.0 kB, uploaded for Python 3)

No source distribution files are available for this release.

Hashes for datarobot_storage-0.0.1-py3-none-any.whl:

Algorithm   Hash digest
SHA256      bbcba6b3c642c78ecd35e9aab70089ee6c1ae22b74236abfa91dfde4d317cd54
MD5         37dc61acf7f533c600620fc5d511b148
BLAKE2b-256 2950d8692cfda7e82f5b5f3e860fafbb5af8542d77ae187cb0e76b020bfed113
