IO drivers for vineyard

vineyard-io: IO drivers for vineyard

vineyard-io is a collection of IO drivers for vineyard. It currently supports:

  • Local filesystem

  • AWS S3

  • Aliyun OSS

  • Hadoop filesystem

The vineyard-io package leverages filesystem-spec (fsspec) to support other storage sinks and sources in a unified fashion. Other adaptors that work with fsspec can be plugged in as well.

IO Adaptors

Vineyard has a set of prebuilt IO adaptors that serve as common routines for various IO operations and can take the place of boilerplate code in computation tasks.

Vineyard is capable of reading data from and writing data to multiple file systems. Behind the scenes, it leverages fsspec to delegate the workload to the various file system implementations.

Specifically, parameters can be passed to the file system through the storage_options parameter. storage_options is a dict of additional keyword arguments for the file system. For instance, we could combine path = hdfs:///path/to/file with storage_options = {"host": "localhost", "port": 9600} to read from HDFS.

Note that you must encode the storage_options by base64 before passing it to the scripts.
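The encoding step can be sketched as follows. This is a minimal sketch that assumes the scripts expect the dict serialized as JSON and then base64-encoded as text; the JSON step and the helper names encode_storage_options and decode_storage_options are assumptions for illustration, so check them against the script you are calling.

```python
import base64
import json


def encode_storage_options(storage_options):
    """Serialize a dict to JSON, then base64-encode it (assumed wire format)."""
    raw = json.dumps(storage_options).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_storage_options(encoded):
    """Inverse of encode_storage_options, handy for checking a value by hand."""
    return json.loads(base64.b64decode(encoded).decode("utf-8"))


encoded = encode_storage_options({"host": "localhost", "port": 9600})
print(encoded)
```

The resulting string is what would be passed in the <storage_options> position of a script's command line.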

Alternatively, we can encode such information into the path, such as: hdfs://<ip>:<port>/path/to/file.
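To illustrate how the two forms relate, the sketch below splits a URL-style path into the host and port that would otherwise go into storage_options. It uses only the Python standard library; the helper name split_location and the address 127.0.0.1:9600 are hypothetical and are not part of vineyard-io.

```python
from urllib.parse import urlsplit


def split_location(path):
    """Return (storage_options, path_without_host) for a URL-style path."""
    parts = urlsplit(path)
    options = {}
    if parts.hostname:
        options["host"] = parts.hostname
    if parts.port is not None:
        options["port"] = parts.port
    # Rebuild the path with an empty authority, e.g. hdfs:///path/to/file.
    stripped = f"{parts.scheme}://{parts.path}"
    return options, stripped


options, stripped = split_location("hdfs://127.0.0.1:9600/path/to/file")
print(options, stripped)
```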

To read from multiple files you can pass a glob string or a list of paths, with the caveat that they must all have the same protocol.
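The same-protocol caveat can be checked before invoking a script. The sketch below is one possible pre-flight check, not something vineyard-io provides; the helper names are hypothetical, and it assumes that a path without a scheme refers to the local file system.

```python
def protocol_of(path):
    """Return the URL scheme of a path, treating scheme-less paths as local files."""
    scheme, sep, _ = path.partition("://")
    return scheme if sep else "file"


def check_same_protocol(paths):
    """Raise ValueError unless every path uses one protocol; return that protocol."""
    protocols = {protocol_of(p) for p in paths}
    if len(protocols) != 1:
        raise ValueError(f"paths must share one protocol, got: {sorted(protocols)}")
    return protocols.pop()


print(check_same_protocol(["s3://bucket/a.csv", "s3://bucket/b.csv"]))
```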

The functionality of each adaptor is described as follows:

  • read_bytes

    Usage: vineyard_read_bytes <ipc_socket> <path> <storage_options> <read_options> <proc_num> <proc_index>

    Read a file on a local file system, OSS, HDFS, S3, etc. into a ByteStream.

  • write_bytes

    Usage: vineyard_write_bytes <ipc_socket> <path> <stream_id> <storage_options> <write_options> <proc_num> <proc_index>

    Write a ByteStream to a file on local file system, OSS, HDFS, S3, etc.

  • read_orc

    Usage: vineyard_read_orc <ipc_socket> <path/directory> <storage_options> <read_options> <proc_num> <proc_index>

    Read an ORC file on a local file system, OSS, HDFS, S3, etc. into a DataframeStream.

  • write_orc

    Usage: vineyard_write_orc <ipc_socket> <path/directory> <storage_options> <read_options> <proc_num> <proc_index>

    Write a DataframeStream to an ORC file on a local file system, OSS, HDFS, S3, etc.

  • read_vineyard_dataframe

    Usage: vineyard_read_vineyard_dataframe <ipc_socket> <vineyard_address> <storage_options> <read_options> <proc_num> <proc_index>

    Read a DataFrame in vineyard as a DataframeStream.

  • write_vineyard_dataframe

    Usage: vineyard_write_vineyard_dataframe <ipc_socket> <stream_id> <proc_num> <proc_index>

    Write a DataframeStream to a DataFrame in vineyard.

  • serializer

    Usage: vineyard_serializer <ipc_socket> <object_id>

    Serialize a vineyard object (non-global or global) as a ByteStream or a set of ByteStreams (StreamCollection).

  • deserializer

    Usage: vineyard_deserializer <ipc_socket> <object_id>

    Deserialize a ByteStream or a set of ByteStreams (StreamCollection) as a vineyard object.

  • read_bytes_collection

    Usage: vineyard_read_bytes_collection <ipc_socket> <prefix> <storage_options> <proc_num> <proc_index>

    Read a directory (on a local file system, OSS, HDFS, S3, etc.) as a ByteStream or a set of ByteStreams (StreamCollection).

  • write_bytes_collection

    Usage: vineyard_write_bytes_collection <ipc_socket> <stream_id> <proc_num> <proc_index>

    Write a ByteStream or a set of ByteStreams (StreamCollection) to a directory (on a local file system, OSS, HDFS, S3, etc.).

  • parse_bytes_to_dataframe

    Usage: vineyard_parse_bytes_to_dataframe <ipc_socket> <stream_id> <proc_num> <proc_index>

    Parse a ByteStream (in CSV format) as a DataframeStream.

  • parse_dataframe_to_bytes

    Usage: vineyard_parse_dataframe_to_bytes <ipc_socket> <stream_id> <proc_num> <proc_index>

    Serialize a DataframeStream to a ByteStream (in CSV format).

  • dump_dataframe

    Usage: vineyard_dump_dataframe <ipc_socket> <stream_id>

    Dump the content of a DataframeStream, for debugging purposes.
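The usage lines above can be turned into command lines programmatically. The sketch below builds one vineyard_read_bytes argument list per worker and only prints them. Several details are assumptions rather than documented behavior: that <proc_num> is the total number of workers and <proc_index> is a zero-based worker index, that <read_options> is encoded the same way as <storage_options> (JSON, then base64), and the socket path /var/run/vineyard.sock, which is a placeholder.

```python
import base64
import json


def encode_options(options):
    """JSON-serialize and base64-encode an options dict (assumed format)."""
    raw = json.dumps(options).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def read_bytes_commands(ipc_socket, path, storage_options, read_options, proc_num):
    """Build one vineyard_read_bytes argv per worker, following the usage line."""
    return [
        [
            "vineyard_read_bytes",
            ipc_socket,
            path,
            encode_options(storage_options),
            encode_options(read_options),
            str(proc_num),
            str(proc_index),  # assumed to be zero-based
        ]
        for proc_index in range(proc_num)
    ]


commands = read_bytes_commands(
    "/var/run/vineyard.sock",  # placeholder socket path
    "hdfs:///path/to/file",
    {"host": "localhost", "port": 9600},
    {},
    proc_num=2,
)
for argv in commands:
    print(" ".join(argv))
```

Each argument list could then be handed to subprocess.run on a machine where the vineyard-io scripts are installed and a vineyard server is listening on the given socket.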
