
Vineyard integration with machine learning frameworks


vineyard-ml: Accelerating Data Science Pipelines

Vineyard is tightly integrated with the data preprocessing pipelines of widely adopted machine learning frameworks such as PyTorch, TensorFlow, and MXNet. Shared objects in vineyard, such as vineyard::Tensor, vineyard::DataFrame, and vineyard::Table, can be used directly as inputs to training and inference tasks in these frameworks.
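
As a minimal sketch of that workflow (the object and variable names here are illustrative, not part of the vineyard API), a producer process can put a pandas DataFrame into vineyard, and any process connected to the same vineyard instance can resolve it back by its object id:

import os

import numpy as np
import pandas as pd

import vineyard

# connect to vineyard, see also: https://v6d.io/notes/getting-started.html
client = vineyard.connect(os.environ['VINEYARD_IPC_SOCKET'])

# producer: put a dataframe into vineyard and remember its object id
object_id = client.put(pd.DataFrame({'x': np.arange(10), 'y': np.random.rand(10)}))

# consumer (possibly a different process connected to the same socket):
# resolve the shared object back into a pandas DataFrame
df = client.get(object_id)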

Examples

Datasets

The following example shows how a DataFrame in vineyard can be used as the input of a Dataset for PyTorch:

import os

import numpy as np
import pandas as pd

import torch
import vineyard

# connect to vineyard, see also: https://v6d.io/notes/getting-started.html
client = vineyard.connect(os.environ['VINEYARD_IPC_SOCKET'])

# generate a dummy dataframe in vineyard
df = pd.DataFrame({
    # multi-dimensional array as a column
    'data': vineyard.data.dataframe.NDArrayArray(np.random.rand(1000, 10)),
    'label': np.random.rand(1000)
})
object_id = client.put(df)

# take it as a torch dataset
from vineyard.contrib.ml.torch import torch_context
with torch_context():
    # ds is a `torch.utils.data.TensorDataset`
    ds = client.get(object_id)

# or, you can use datapipes from torchdata
from vineyard.contrib.ml.torch import datapipe
pipe = datapipe(ds)

# use the datapipes in your training loop
for data, label in pipe:
    # do something
    pass
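
Since the resulting ds is a plain torch.utils.data.TensorDataset, it can also be fed to a standard DataLoader; the batch size and loop body below are placeholders:

from torch.utils.data import DataLoader

loader = DataLoader(ds, batch_size=32, shuffle=True)
for data, label in loader:
    # replace with your forward/backward pass
    pass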

PyTorch Modules

The following example shows how to use vineyard to share PyTorch modules between processes:

import os

import torch
import torch.nn.functional as F
from torch import nn

import vineyard

# connect to vineyard, see also: https://v6d.io/notes/getting-started.html
client = vineyard.connect(os.environ['VINEYARD_IPC_SOCKET'])

# define a dummy model
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 20, 5)
        self.conv2 = nn.Conv2d(20, 20, 5)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        return F.relu(self.conv2(x))

model = Model()

# put the model into vineyard
from vineyard.contrib.ml.torch import torch_context
with torch_context():
    object_id = client.put(model)

# get the module state dict from vineyard and load it into a new model
model = Model()
with torch_context():
    state_dict = client.get(object_id)
model.load_state_dict(state_dict, assign=True)
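
To share the module across processes, the consumer also needs the object id. One option is vineyard's named objects; the sketch below assumes the put_name/get_name client APIs and an arbitrary name of our own choosing ('my-model'):

# producer process: persist the object and publish it under a well-known name
client.persist(object_id)
client.put_name(object_id, 'my-model')

# consumer process: resolve the name, then rebuild the model from the state dict
object_id = client.get_name('my-model')
with torch_context():
    state_dict = client.get(object_id)
model = Model()
model.load_state_dict(state_dict, assign=True)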

By default, compression is enabled for the vineyard client. Since compression may not be efficient for torch modules, you can disable it as follows:

from vineyard.contrib.ml.torch import torch_context
# add the client parameter to the torch_context to disable the compression
with torch_context(client):
    object_id = client.put(model)

# add the client parameter to the torch_context to disable the compression
with torch_context(client):
    state_dict = client.get(object_id)

In addition, if you want to spread the torch modules across all vineyard workers to take advantage of the aggregate network bandwidth of all workers, you can enable the spread option as follows:

from vineyard.contrib.ml.torch import torch_context
with torch_context(client, spread=True):
    object_id = client.put(model)

with torch_context(client):
    state_dict = client.get(object_id)

Reference and Implementation

For more details about vineyard itself, please refer to the Vineyard project.


