
Reusable storage access for DataRobot

Project description

Usage

The library defines a common storage interface in the datarobot_storage.base.Storage class, along with several subclasses implementing this interface for particular storage services:

  • datarobot_storage.azure.AzureBlobStorage
  • datarobot_storage.google.GoogleCloudStorage
  • datarobot_storage.amazon.S3Storage

Factory method

The factory methods datarobot_storage.get_storage() and datarobot_storage.get_async_storage() try to automatically detect the storage type and other settings in the current execution environment and construct a Storage object of the correct kind. The optional config_dict argument specifies a mapping to read those settings from instead of os.environ.

The optional storage_type argument lets you request a storage of a specific type; at the moment S3, GCS, and Azure are supported. When storage_type is not specified, the value stored under the FILE_STORAGE_TYPE key of the config_dict mapping is used.
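The selection rule above can be sketched in a few lines. This is an illustrative sketch of the documented behaviour only, not the library's actual implementation, and pick_storage_type is a hypothetical name:

```python
import os


def pick_storage_type(storage_type=None, config_dict=None):
    """Sketch of how get_storage() is described to resolve the storage type."""
    # An explicit storage_type argument wins over any configuration source.
    if storage_type is not None:
        return storage_type
    # Otherwise read FILE_STORAGE_TYPE from config_dict, falling back to
    # os.environ when no mapping was supplied.
    settings = config_dict if config_dict is not None else os.environ
    return settings["FILE_STORAGE_TYPE"]


print(pick_storage_type(config_dict={"FILE_STORAGE_TYPE": "s3"}))  # s3
print(pick_storage_type(storage_type="azure"))                     # azure
```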

Access storage native client

Each class implements a client property that returns the native storage client (e.g. a botocore client). This is the same client that the datarobot_storage classes use to interact with the storage service; it is pre-configured and ready for use in your own custom scenarios.
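As a generic sketch of this pattern (the class names below are stand-ins, not the library's code), the wrapper builds its native client once and keeps handing back the same configured instance:

```python
class FakeNativeClient:
    """Stand-in for a storage SDK client (e.g. a botocore client)."""

    def list_objects(self, prefix):
        return [prefix + "/example"]


class SketchStorage:
    """Sketch of the pattern: a wrapper exposing its pre-configured client."""

    def __init__(self):
        self._client = None

    @property
    def client(self):
        # Build the native client lazily, cache it, and reuse it, so callers
        # get exactly the client the wrapper itself uses.
        if self._client is None:
            self._client = FakeNativeClient()
        return self._client


storage = SketchStorage()
assert storage.client is storage.client  # always the same configured client
print(storage.client.list_objects("remote/path"))  # ['remote/path/example']
```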

Examples

Get storage instance and configure it using os.environ values:

>>> from datarobot_storage import get_storage
>>> get_storage()  # doctest: +SKIP
<datarobot_storage.amazon.S3Storage object at 0x100360290>

Use an alternative source of configuration, e.g. config.engine.EngConfig in the DataRobot monolith:

>>> from config.engine import EngConfig  # doctest: +SKIP
>>> get_storage(config_dict=EngConfig)   # doctest: +SKIP
<datarobot_storage.amazon.S3Storage object at 0x108742680>

Request storage with awaitable methods:

import asyncio

from datarobot_storage import get_async_storage

async def main():
    storage = get_async_storage()
    await storage.list('remote/path/to/file')

asyncio.run(main())

It is also possible to pick a particular storage implementation manually and instantiate it yourself:

import os

from datarobot_storage.amazon import S3Storage
from datarobot_storage.amazon import S3Configuration

# Instantiate configuration from the environment
config = S3Configuration.from_dict(os.environ)

# Permanently disable multipart uploads and downloads
config.multipart_upload_enabled = False
config.multipart_download_enabled = False

# Instantiate storage from custom configuration
storage = S3Storage(storage_config=config)

The same can be done in an asyncio-compatible way using the make_awaitable helper:

import os

from datarobot_storage.amazon import S3Storage
from datarobot_storage.amazon import S3Configuration
from datarobot_storage.helpers import make_awaitable

config = S3Configuration.from_dict(os.environ)
storage = make_awaitable(S3Storage(storage_config=config))
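Conceptually, such a helper wraps an object's blocking methods so that each call runs in a worker thread. The following is a simplified sketch of that idea; sketch_make_awaitable and BlockingStorage are illustrative names, not the library's implementation:

```python
import asyncio
import functools


def sketch_make_awaitable(obj):
    """Return a proxy whose method calls run obj's methods in a thread."""

    class _AsyncProxy:
        def __getattr__(self, name):
            attr = getattr(obj, name)
            if not callable(attr):
                return attr

            @functools.wraps(attr)
            async def call(*args, **kwargs):
                # Off-load the blocking call to the default thread pool.
                return await asyncio.to_thread(attr, *args, **kwargs)

            return call

    return _AsyncProxy()


class BlockingStorage:
    def list(self, prefix):
        return [prefix + "/a", prefix + "/b"]


async def demo():
    storage = sketch_make_awaitable(BlockingStorage())
    return await storage.list("remote/path")


print(asyncio.run(demo()))  # ['remote/path/a', 'remote/path/b']
```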

For more examples, please see the functional test suite: tests/functional/test_storage.py

Command-line Usage

The command-line interface provides interactive access to storage objects from within a DataRobot service environment (usually a container running the service, with storage access configured through environment variables). Please check the embedded documentation for a detailed reference:

$ python -m datarobot_storage --help
usage: datarobot_storage [-h] [-d] [-v] ACTION ...

DataRobot common storage utility

options:
  -h, --help     show this help message and exit
  -v, --verbose  Print additional info

Available actions:
  ACTION
    list         List remote objects
    delete       Delete single remote object
    get          Download remote object locally
    put          Upload local file to storage

An example of using it to list objects in the AWS S3 bucket shrink-clusters-data-rd under the datarobot/mbtest prefix:

$ export AWS_PROFILE=datarobotrd
$ export FILE_STORAGE_TYPE=s3
$ export S3_BUCKET=shrink-clusters-data-rd
$ python -m datarobot_storage list /datarobot/mbtest | head
654238d71975fed77960e223
654498b81975fed7796edec3
654498b81975fed7796edecc
6544b2111975fed7796f7c3e
65450f181975fed77971b01d
654520681975fed77971ff02
6545238e1975fed779720edb
654582961975fed779744c4c
6548b7ff1975fed7797f420a
654aab4a1975fed77989e0a0

Developers

Should you have any questions regarding the development, testing, or release process, please see CONTRIBUTING.md.
