A tensor disk offloader without data copying.

TensorNVME

A Python library that provides APIs to move PyTorch tensors between CPU memory and NVMe storage.

Dependencies

Install

This package is supported only on Linux. liburing and libaio can be installed automatically. liburing requires Linux kernel 5.10 or later and is not installed on older kernels.
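To see in advance whether liburing is likely to be installed on a given machine, the kernel version can be compared against the 5.10 minimum stated above. The following is a minimal sketch, not the installer's actual logic; the helper names `parse_kernel_version` and `kernel_supports_uring` are hypothetical.

```python
import platform
import re


def parse_kernel_version(release: str) -> tuple:
    """Extract (major, minor) from a release string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_supports_uring(release: str, minimum: tuple = (5, 10)) -> bool:
    """Return True if the release meets the minimum kernel version from this README."""
    return parse_kernel_version(release) >= minimum


if __name__ == "__main__":
    release = platform.release()
    print(platform.system(), release)
    # The 5.10 threshold only makes sense for a Linux kernel release string.
    if platform.system() == "Linux":
        print("liburing expected:", kernel_supports_uring(release))
```

Comparing `(major, minor)` tuples avoids the string-comparison trap where "5.4" would sort after "5.10".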

The installer searches for libaio and liburing in /usr/lib, /usr/lib64 and $LD_LIBRARY_PATH. If they are not found, the backends are installed in ~/.tensornvme, and ~/.bashrc is modified to set $LD_LIBRARY_PATH accordingly. Run source ~/.bashrc after installation. If you use another shell, make sure $LD_LIBRARY_PATH is set correctly.
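The search described above can be approximated with a short script, which may help when diagnosing why a backend was or was not found. This is a sketch under assumptions: the helpers `candidate_dirs` and `find_library` are hypothetical, the search is assumed to be non-recursive, and the shared objects are assumed to be named `lib<name>.so` with an optional version suffix. The real installer may behave differently.

```python
import os
from pathlib import Path


def candidate_dirs(env=None):
    """Directories named in the README: /usr/lib, /usr/lib64, then $LD_LIBRARY_PATH entries."""
    env = os.environ if env is None else env
    dirs = [Path("/usr/lib"), Path("/usr/lib64")]
    for entry in env.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        if entry:  # skip empty entries such as a trailing separator
            dirs.append(Path(entry))
    return dirs


def find_library(name, dirs):
    """Return the first directory containing lib<name>.so* (assumed naming), else None."""
    for directory in dirs:
        if directory.is_dir() and any(directory.glob(f"lib{name}.so*")):
            return directory
    return None


if __name__ == "__main__":
    for lib in ("aio", "uring"):
        print(lib, "->", find_library(lib, candidate_dirs()))
```

A result of `None` for a library suggests the installer would fall back to installing that backend in ~/.tensornvme.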

You must install PyTorch and CMake before installing tensornvme. After upgrading PyTorch, reinstall tensornvme.

From source

git clone https://github.com/hpcaitech/TensorNVMe.git && cd TensorNVMe

First, install requirements:

pip install -r requirements.txt

To install tensornvme with liburing and libaio:

pip install -v --no-cache-dir .

To install tensornvme with only liburing:

DISABLE_AIO=1 pip install -v --no-cache-dir .

To install tensornvme with only libaio:

DISABLE_URING=1 pip install -v --no-cache-dir .

To install libaio or liburing system-wide:

WITH_ROOT=1 sudo pip install -v --no-cache-dir .

The libraries are then installed in /usr, and ~/.bashrc is not modified. This requires root access.
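The environment variables used in the installation commands above combine as summarized by the sketch below. It is only an illustration: `install_options` is a hypothetical helper, and treating a variable as enabled only when it equals "1" is an assumption based on the commands shown. The README does not say what happens when both backends are disabled.

```python
import os


def install_options(env=None):
    """Summarize the build-time switches from the README (hypothetical helper)."""
    env = os.environ if env is None else env

    def is_set(name):
        # Assumption: a switch is enabled when set to "1", as in the commands above.
        return env.get(name) == "1"

    backends = []
    if not is_set("DISABLE_URING"):
        backends.append("liburing")
    if not is_set("DISABLE_AIO"):
        backends.append("libaio")
    return {
        "backends": backends,               # backends that would be built
        "system_wide": is_set("WITH_ROOT"),  # install into /usr instead of ~/.tensornvme
    }


if __name__ == "__main__":
    print(install_options())
```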

From PIP

pip install packaging
pip install tensornvme

The same environment variables are accepted as when installing from source.

Use docker

git clone https://github.com/hpcaitech/TensorNVMe.git && cd TensorNVMe/docker && docker build -t tensornvme .

CLI

A CLI is provided to check that the backends work:

tensornvme check
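The check can also be run from a Python script, for example as part of an environment setup step. The sketch below simply shells out to the command shown above; `run_backend_check` is a hypothetical helper, and the meaning of the exit code is not documented here, so it is returned as-is rather than interpreted.

```python
import shutil
import subprocess


def run_backend_check(cli="tensornvme"):
    """Run `<cli> check` and return its exit code, or None if the CLI is not on PATH."""
    if shutil.which(cli) is None:
        return None
    result = subprocess.run([cli, "check"])
    return result.returncode


if __name__ == "__main__":
    code = run_backend_check()
    print("CLI not found" if code is None else f"exit code: {code}")
```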

Usage

TensorNVMe provides both synchronous and asynchronous I/O APIs.

Only CPU and contiguous tensors can be offloaded.
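A tensor that lives on a GPU or is a non-contiguous view therefore has to be converted before it is offloaded. The helpers below are a minimal sketch with hypothetical names (`is_offloadable`, `prepare_for_offload`). They are duck-typed: they work with any object exposing `device.type`, `is_contiguous()`, `cpu()` and `contiguous()`, as `torch.Tensor` does.

```python
def is_offloadable(tensor) -> bool:
    """Return True if `tensor` is on the CPU and contiguous in memory."""
    return tensor.device.type == "cpu" and tensor.is_contiguous()


def prepare_for_offload(tensor):
    """Return a CPU, contiguous tensor; the input is returned unchanged if it already qualifies."""
    if is_offloadable(tensor):
        return tensor
    # cpu() moves the data to host memory; contiguous() copies it into a dense layout.
    return tensor.cpu().contiguous()
```

For instance, a transposed view such as `torch.rand(4, 4).t()` is not contiguous, so `is_offloadable` would return `False` for it. Because `prepare_for_offload` may return a new tensor, keep a reference to the returned object: that is the one to pass to the read call later.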

Synchronous API:

import torch
from tensornvme import DiskOffloader

x = torch.rand(2, 2)
y = torch.rand(4, 4, 4)
offloader = DiskOffloader('./offload')
offloader.sync_write(x)
# x is saved to a file on disk (in ./offload folder) and the memory of x is freed
offloader.sync_read(x)
# x is restored
offloader.sync_writev([x, y])
# x and y are offloaded
offloader.sync_readv([x, y])
# x and y are restored.
# sync_writev() and sync_readv() are order sensitive
# E.g. sync_writev([x, y]) and sync_writev([y, x]) are different
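Since every write has to be paired with a later read, the synchronous calls above fit naturally into a context manager. The following is a sketch, not part of the library: `offloaded` is a hypothetical helper that only assumes the offloader exposes `sync_write()` and `sync_read()` as in the example above.

```python
from contextlib import contextmanager


@contextmanager
def offloaded(offloader, tensor):
    """Keep `tensor` on disk for the duration of the `with` block."""
    offloader.sync_write(tensor)  # data is written to disk and its memory is freed
    try:
        yield
    finally:
        offloader.sync_read(tensor)  # restore the data even if the block raises
```

With a `DiskOffloader` instance named `offloader`, `with offloaded(offloader, x): ...` would run the block while `x` is on disk and restore it afterwards.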

Asynchronous API:

import torch
from tensornvme import DiskOffloader

x = torch.rand(2, 2)
y = torch.rand(4, 4, 4)
offloader = DiskOffloader('./offload')
offloader.async_write(x)
# x is being offloaded in the background
offloader.sync_write_events()
# x is offloaded and the memory of x is freed
offloader.async_read(x)
# x is being restored in the background
offloader.sync_read_events()
# x is restored
offloader.async_writev([x, y])
# x and y are being offloaded in the background
offloader.synchronize()
# synchronize() will synchronize both write and read events.
offloader.async_readv([x, y])
offloader.synchronize()
# x and y are restored.
# async_writev() and async_readv() are also order sensitive

You can use the asynchronous API to overlap computation with data movement.

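import torch
from tensornvme import DiskOffloader

offloader = DiskOffloader('./offload')
tensors = []

# Create ten tensors and offload each one to disk.
for _ in range(10):
    tensor = torch.rand(2, 2)
    tensors.append(tensor)
    offloader.sync_write(tensor)

# Load the first tensor so the loop starts with data in memory.
offloader.sync_read(tensors[0])

for i, tensor in enumerate(tensors):
    # Wait until the pending read (if any) has finished.
    offloader.sync_read_events()
    if i + 1 < len(tensors):
        # Prefetch the next tensor while computing on the current one.
        offloader.async_read(tensors[i+1])
    # Compute with the tensor.
    tensor.mul_(2.0)
    # Wait for the previous write to finish, then offload the updated tensor.
    offloader.sync_write_events()
    offloader.async_write(tensor)
# Wait for all outstanding reads and writes.
offloader.synchronize()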
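The read side of this pattern can be factored into a generator so that the prefetching logic is written once. This is a sketch rather than a library feature: `prefetching` is a hypothetical helper that assumes only that the offloader exposes `async_read()` and `sync_read_events()` as used above, and that every tensor in the list has already been offloaded.

```python
def prefetching(offloader, tensors):
    """Yield offloaded tensors one by one, reading the next while the caller works."""
    if not tensors:
        return
    offloader.async_read(tensors[0])  # start loading the first tensor
    for i, tensor in enumerate(tensors):
        offloader.sync_read_events()  # wait until `tensor` is back in memory
        if i + 1 < len(tensors):
            # Start the next read now so it overlaps with the caller's computation.
            offloader.async_read(tensors[i + 1])
        yield tensor
```

A loop such as `for tensor in prefetching(offloader, tensors): ...` can then compute on each tensor and write it back in its body, as in the example above. At most one read is in flight at a time, which bounds the extra memory to one tensor beyond the one being processed.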

How to test

We have C++ test scripts for the AsyncIO and SpaceManager classes. Make sure you have installed liburing and libaio and set the environment variables correctly before testing. To run the tests:

mkdir build
cd build
cmake ..
make
./test_asyncio
./test_space_mgr

We also have Python unit tests. Make sure you have installed pytest. To run them:

pytest ./tests
