IO drivers for vineyard
vineyard-io: IO drivers for vineyard
vineyard-io is a collection of IO drivers for vineyard. Currently it supports:
- Local filesystem
- AWS S3
- Aliyun OSS
- Hadoop filesystem
The vineyard-io package leverages filesystem-spec (fsspec) to support other storage sinks and sources in a unified fashion. Other adaptors that work with fsspec can be plugged in as well.
IO Adaptors
Vineyard has a set of prebuilt IO adaptors that serve as common routines for various IO operations and can take the place of boilerplate code in computation tasks.
Vineyard can read data from and write data to multiple file systems.
Behind the scenes, it leverages fsspec to delegate the workload to various file system implementations.
Specifically, we can specify parameters to be passed to the file system through the storage_options parameter.
storage_options is a dict that passes additional keyword arguments to the file system.
For instance, we could combine path = hdfs:///path/to/file with storage_options = {"host": "localhost", "port": 9600}
to read from HDFS.
Note that you must encode storage_options with base64 before passing it to the scripts.
Alternatively, we can encode such information into the path,
such as: hdfs://<ip>:<port>/path/to/file.
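The base64 requirement for storage_options can be sketched as follows. This is a minimal illustration, not the library's own helper: it assumes the dict is serialized to JSON before being base64-encoded, and the function names encode_storage_options and decode_storage_options are hypothetical.

```python
import base64
import json


def encode_storage_options(options: dict) -> str:
    """Serialize a dict to JSON, then base64-encode it for use as a CLI argument.

    Assumption: the adaptor scripts expect base64-encoded JSON.
    """
    raw = json.dumps(options).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_storage_options(encoded: str) -> dict:
    """Inverse of encode_storage_options, useful for checking the round trip."""
    return json.loads(base64.b64decode(encoded).decode("utf-8"))


storage_options = {"host": "localhost", "port": 9600}
encoded = encode_storage_options(storage_options)
print(encoded)
```

The resulting string contains only ASCII characters, so it can be passed as a single command-line argument without shell quoting issues around braces and quotes.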
To read from multiple files you can pass a glob string or a list of paths, with the caveat that they must all have the same protocol.
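The same-protocol caveat can be checked before any adaptor is invoked. The sketch below is a hypothetical pre-flight helper (protocol_of and check_same_protocol are not part of vineyard-io); it assumes that a path without a "scheme://" prefix refers to the local file system.

```python
def protocol_of(path: str) -> str:
    """Return the protocol prefix of a path, or "file" when none is given."""
    if "://" in path:
        return path.split("://", 1)[0]
    return "file"  # assumption: bare paths are local


def check_same_protocol(paths: list) -> str:
    """Return the shared protocol, or raise ValueError if the paths disagree."""
    if not paths:
        raise ValueError("at least one path is required")
    protocols = {protocol_of(p) for p in paths}
    if len(protocols) > 1:
        raise ValueError(f"paths mix several protocols: {sorted(protocols)}")
    return protocols.pop()


shared = check_same_protocol(["hdfs:///data/a.csv", "hdfs:///data/b.csv"])
print(shared)
```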
The functionality of each adaptor is described as follows:
- read_bytes
  - Usage: vineyard_read_bytes <ipc_socket> <path> <storage_options> <read_options> <proc_num> <proc_index>
  - Read a file on local file systems, OSS, HDFS, S3, etc. to a ByteStream.
- write_bytes
  - Usage: vineyard_write_bytes <ipc_socket> <path> <stream_id> <storage_options> <write_options> <proc_num> <proc_index>
  - Write a ByteStream to a file on local file system, OSS, HDFS, S3, etc.
- read_orc
  - Usage: vineyard_read_orc <ipc_socket> <path/directory> <storage_options> <read_options> <proc_num> <proc_index>
  - Read an ORC file on local file systems, OSS, HDFS, S3, etc. to a DataframeStream.
- write_orc
  - Usage: vineyard_read_orc <ipc_socket> <path/directory> <storage_options> <read_options> <proc_num> <proc_index>
  - Write a DataframeStream to an ORC file on local file system, OSS, HDFS, S3, etc.
- read_vineyard_dataframe
  - Usage: vineyard_read_vineyard_dataframe <ipc_socket> <vineyard_address> <storage_options> <read_options> <proc_num> <proc_index>
  - Read a DataFrame in vineyard as a DataframeStream.
- write_vineyard_dataframe
  - Usage: vineyard_write_vineyard_dataframe <ipc_socket> <stream_id> <proc_num> <proc_index>
  - Write a DataframeStream to a DataFrame in vineyard.
- serializer
  - Usage: vineyard_serializer <ipc_socket> <object_id>
  - Serialize a vineyard object (non-global or global) as a ByteStream or a set of ByteStream (StreamCollection).
- deserializer
  - Usage: vineyard_deserializer <ipc_socket> <object_id>
  - Deserialize a ByteStream or a set of ByteStream (StreamCollection) as a vineyard object.
- read_bytes_collection
  - Usage: vineyard_read_bytes_collection <ipc_socket> <prefix> <storage_options> <proc_num> <proc_index>
  - Read a directory (on local filesystem, OSS, HDFS, S3, etc.) as a ByteStream or a set of ByteStream (StreamCollection).
- write_bytes_collection
  - Usage: vineyard_write_vineyard_dataframe <ipc_socket> <stream_id> <proc_num> <proc_index>
  - Write a ByteStream or a set of ByteStream (StreamCollection) to a directory (on local filesystem, OSS, HDFS, S3, etc.).
- parse_bytes_to_dataframe
  - Usage: vineyard_parse_bytes_to_dataframe.py <ipc_socket> <stream_id> <proc_num> <proc_index>
  - Parse a ByteStream (in CSV format) as a DataframeStream.
- parse_dataframe_to_bytes
  - Usage: vineyard_parse_dataframe_to_bytes <ipc_socket> <stream_id> <proc_num> <proc_index>
  - Serialize a DataframeStream to a ByteStream (in CSV format).
- dump_dataframe
  - Usage: vineyard_dump_dataframe <ipc_socket> <stream_id>
  - Dump the content of a DataframeStream, for debugging usage.
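As an illustration of the usage lines above, the following sketch assembles the argument list for vineyard_read_bytes in the documented positional order. Several things here are assumptions rather than documented behavior: the helper names are hypothetical, the socket path is a placeholder, and read_options is assumed to be encoded the same way as storage_options (base64 over JSON). The command is only constructed, not executed.

```python
import base64
import json


def b64_json(options: dict) -> str:
    """Encode a dict as base64 over JSON (assumed wire format for options)."""
    return base64.b64encode(json.dumps(options).encode("utf-8")).decode("ascii")


def build_read_bytes_argv(ipc_socket, path, storage_options, read_options,
                          proc_num, proc_index):
    """Build argv in the order given by the read_bytes usage line."""
    return [
        "vineyard_read_bytes",
        ipc_socket,
        path,
        b64_json(storage_options),
        b64_json(read_options),  # assumption: same encoding as storage_options
        str(proc_num),
        str(proc_index),
    ]


argv = build_read_bytes_argv(
    "/var/run/vineyard.sock",  # placeholder socket path
    "hdfs:///path/to/file",
    {"host": "localhost", "port": 9600},
    {},
    proc_num=1,
    proc_index=0,
)
print(argv)
```

A list like this could be handed to subprocess.run(argv, check=True) on a machine where vineyard-io is installed and a vineyard server is listening on the given socket.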
File details
Details for the file vineyard_io-0.24.2-py3-none-any.whl.
File metadata
- Download URL: vineyard_io-0.24.2-py3-none-any.whl
- Upload date:
- Size: 82.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c22846cd06dca918f5980bc2cdc53292066f355d83770dd25418797f864bc3d1 |
| MD5 | 1d5113733e271e767b137a2909df875f |
| BLAKE2b-256 | 0b8b9c2b6e240224046005d56f5cd0f179b39cff82b54afd881545fcdd689d36 |