Accelerated Discovery Reusable Components

Storage Access Reusable Component

This is the implementation of the Storage Access Reusable Component. It serves as a wrapper around Dapr and is intended to replace all other components' I/O operations.

1. Supported operations

Below is a list of the operations this component supports.

1.1. Upload

Uploads data from a file to an object in a bucket.

Arguments

  • src: Name of the file to upload.
  • dest: Object name in the bucket.
  • binding: The name of the binding to perform the operation.
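
For illustration, an upload through the Python module (see section 4.2) could look like the following sketch; the file and object names here are made up:

from adstorage import upload

# Push the local file ./results.csv to the object "results/results.csv"
# in the bucket behind the default binding.
upload_resp = upload("./results.csv", "results/results.csv")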

1.2. Download

Downloads an object's data to a file.

Arguments

  • src: Object name in the bucket.
  • dest: Name of the file to download the object to.
  • binding: The name of the binding to perform the operation.
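
Similarly, a download through the Python module might look like this sketch (names again illustrative):

from adstorage import download

# Fetch the object "results/results.csv" into a local file.
download_resp = download("results/results.csv", "/tmp/results.csv")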

2. Dapr configurations

  • address: Dapr Runtime gRPC endpoint address.
  • timeout: Value in seconds to wait for the sidecar to come up.
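
Both settings map to the optional keyword arguments of the Python API shown in section 4.2.2. A minimal sketch, assuming a sidecar listening on localhost:50001 (the address is an example; adjust it to your setup):

from adstorage import download

# Point the component at a specific Dapr sidecar and wait up to
# 60 seconds for it to come up.
download_resp = download(
    "test.txt", "/tmp/downloaded.txt",
    address="localhost:50001",
    timeout=60,
)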

3. Verbose mode

If you want to run the script in verbose mode, append --verbose or -v to the command.

4. Usage

4.1 Pipeline native

Follow the step-by-step method below to add this component to your pipeline, or refer to the full example in workflow/components/storage/dummy_pipeline.py.

  1. Load the component.yaml file using load_component_from_file:
io_op = kfp.components.load_component_from_file("path/to/component.yaml")

Alternatively, you can load it from GitHub:

file_url = "https://raw.github.ibm.com/Accelerated-Discovery/Discovery-Platform/main/workflow/components/storage/component.yaml"
io_op = kfp.components.load_component_from_url(file_url)
  2. In your pipeline, call the component with the parameters that fit your needs:
dummy_task_1 = io_op(
    action="download",
    src="test.txt",
    dest="/mnt/downloaded.txt",
)
  3. Optional: Use volumes to keep files consistent between pods:
vop = kfp.dsl.VolumeOp(
    name="volume_creation",
    resource_name="mypvc",
    size="1Mi",
    modes=kfp.dsl.VOLUME_MODE_RWO,
)

dummy_task_1 = io_op(
    action="download",
    src="test.txt",
    dest="/mnt/downloaded.txt",
).add_pvolumes({"/mnt": vop.volume})

dummy_task_2 = io_op(
    action="upload",
    src="/data/downloaded.txt",
    dest="{{workflow.namespace}}/{{workflow.name}}/{{workflow.uid}}/downloaded.txt",
).add_pvolumes({"/data": dummy_task_1.pvolume})
  4. Compile your pipeline as usual, for example:
dsl-compile-tekton \
    --py <your pipeline file>.py \
    --output <your output name>.yaml
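
For orientation, the four steps combine into a minimal pipeline roughly like the sketch below; the pipeline name is made up, and workflow/components/storage/dummy_pipeline.py remains the authoritative example:

import kfp

# Step 1: load the component definition.
io_op = kfp.components.load_component_from_file("path/to/component.yaml")

# Steps 2-3: wire the component into a pipeline.
@kfp.dsl.pipeline(name="storage-io-demo")
def storage_pipeline():
    # Optional shared volume so files persist between pods.
    vop = kfp.dsl.VolumeOp(
        name="volume_creation",
        resource_name="mypvc",
        size="1Mi",
        modes=kfp.dsl.VOLUME_MODE_RWO,
    )
    # Call the component, mounting the volume at /mnt.
    io_op(
        action="download",
        src="test.txt",
        dest="/mnt/downloaded.txt",
    ).add_pvolumes({"/mnt": vop.volume})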

4.2 Python module

You can also invoke the manager using native Python, which doesn't require a Docker image to run. However, the package must be present in your Python environment.

4.2.1 Setup

pip install ad-storage-component

4.2.2 Usage

from adstorage import download, upload

download_resp = download(
    src, dest,
    # binding_name="s3-state",  # Or any other binding
    # address=None, # endpoint:port
    # timeout=300,  # in seconds
)

upload_resp = upload(
    src, dest,
    # binding_name="s3-state",  # Or any other binding
    # address=None, # endpoint:port
    # timeout=300,  # in seconds
)
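
As an end-to-end illustration, the sketch below downloads an object, transforms the local copy, and uploads the result; all file and object names are made up:

from adstorage import download, upload

# Fetch the object "input.txt" into a local file.
download("input.txt", "/tmp/input.txt")

# Some local processing (illustrative).
with open("/tmp/input.txt") as f:
    text = f.read().upper()
with open("/tmp/output.txt", "w") as f:
    f.write(text)

# Push the processed file back as the object "output.txt".
upload("/tmp/output.txt", "output.txt")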

4.3 CLI

$ adsc -h

usage: adsc [-h] --src PATH --dest PATH [--binding NAME] [--address URL] [--timeout SEC] [--verbose] [--version] {download,upload}

Storage Access reusable component.

positional arguments:
  {download,upload}   action to be performed on data.

optional arguments:
  -h, --help          show this help message and exit
  --verbose, -v       run the script in debug mode.
  --version           show program's version number and exit

action arguments:
  --src, -r PATH      path of file to perform action on.
  --dest, -d PATH     object's desired full path in the destination.
  --binding, -b NAME  the name of the binding as defined in the components.

dapr arguments:
  --address, -a URL   Dapr Runtime gRPC endpoint address.
  --timeout, -t SEC   value in seconds we should wait for sidecar to come up.

Note: You can replace adsc with python adstorage/main.py ... if you don't have the package installed in your Python environment.

Examples

  1. To download an object from S3, run:
adsc download \
    --src test.txt \
    --dest tmp/downloaded.txt \
    --verbose
  2. To upload an object to S3, run:
adsc upload \
    --src tmp/downloaded.txt \
    --dest local/uploaded.txt \
    --verbose

5. Publishing

Every change to the Python script requires publishing a new Docker image or pushing a new PyPI package.

5.1 Publish on all ends

To publish both the Docker image and the PyPI package, run the following command:

Note: Please make sure to read each target's documentation below first.

make

5.2 Docker

5.2.1 Local registry

With kind, I'm using a local registry accessible on port 5001; running the following command will build and push the image to that local registry:

make docker-publish

5.2.2 Remote registry

To publish a new image to a remote registry, you need to set the registry path variable:

REPO="registry-1.docker.io/distribution" make docker-publish

5.3 PyPI registry

If you have the right (write) permissions and a correctly configured $HOME/.pypirc file, run the following command to publish the package:

make pypi-publish

Increment the version

To increment the version, go to adstorage/version.py and increment the version there. Both the setup.py and the CLI will read the new version correctly.
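
A version bump is then a one-line change; adstorage/version.py presumably holds a single version string along these lines (the variable name is an assumption, not confirmed by this README):

# adstorage/version.py (illustrative; bump for every release)
__version__ = "0.1.5"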

Note: We will run the pypi-install target to confirm the package is installable before publishing it to our PyPI registry.
