Fast interface between PyTorch and Seagate CORTX
Project description
cortx_pytorch
This package lets you encode and upload a PyTorch computer vision dataset (one that yields (image, label) pairs) to CORTX.
1: Install
pip install cortx_pytorch
2: Convert and upload your dataset
from cortx_pytorch import upload_cv_dataset, make_client
from torchvision import datasets

if __name__ == '__main__':
    # Define the connection settings for our client.
    # URL, ACCESS_KEY and SECRET_KEY are placeholders: set them to your
    # own CORTX endpoint and credentials before running.
    client = make_client(URL, ACCESS_KEY, SECRET_KEY)

    bucket = 'testbucket'  # Bucket where to read/write our ML dataset
    folder = 'imagenet-val'  # Folder where this particular dataset will be

    # We use a PyTorch dataset as a source to prime the content of CORTX.
    # Once we have encoded and uploaded it, we don't need it anymore.
    # Here we use a locally available ImageNet dataset.
    ds = datasets.ImageFolder('/scratch/datasets/imagenet-pytorch/val')

    # Pack and upload any computer vision dataset to CORTX.
    #
    # This only needs to be done once!
    # Images are grouped into objects of size at most `maxsize` and at most
    # `maxcount` images. We use `workers` processes to prepare the data
    # in parallel.
    upload_cv_dataset(ds, client=client, bucket=bucket,
                      base_folder=folder, maxsize=1e8,
                      maxcount=100000, workers=30)
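The grouping rule described in the comments above (objects capped by both a total size and an image count) can be illustrated with a small standalone sketch. This is not the package's actual implementation: `group_into_shards` is a hypothetical helper, the greedy strategy is an assumption, and the sizes are made-up numbers rather than real encoded image sizes.

```python
from typing import List, Sequence


def group_into_shards(item_sizes: Sequence[int], maxsize: float,
                      maxcount: int) -> List[List[int]]:
    """Greedily group item indices into shards (hypothetical helper).

    A shard is closed as soon as adding the next item would push its total
    size above `maxsize`, or when it already holds `maxcount` items.
    An item larger than `maxsize` still gets a shard of its own.
    """
    shards: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for index, size in enumerate(item_sizes):
        # Start a new shard if this item would break either limit.
        if current and (current_size + size > maxsize
                        or len(current) >= maxcount):
            shards.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        shards.append(current)
    return shards


if __name__ == "__main__":
    # Illustrative sizes only; the first shard closes on the size limit,
    # the second on the count limit.
    sizes = [40, 40, 40, 10, 10, 10, 10]
    print(group_into_shards(sizes, maxsize=100, maxcount=3))
    # → [[0, 1], [2, 3, 4], [5, 6]]
```

Under this assumed scheme, a larger `maxsize` and `maxcount` produce fewer, bigger objects, while smaller values produce more objects that can be prepared and read independently.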
3: Use the dataset like any PyTorch dataset
import torch as ch
from tqdm import tqdm
from cortx_pytorch import RemoteDataset, make_client
from torchvision import transforms

preproc = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

if __name__ == '__main__':
    # Define the connection settings for our client.
    # URL, ACCESS_KEY and SECRET_KEY are placeholders: set them to your
    # own CORTX endpoint and credentials before running.
    client = make_client(URL, ACCESS_KEY, SECRET_KEY)

    bucket = 'testbucket'  # Bucket where to read/write our ML dataset
    folder = 'imagenet-val'  # Folder where this particular dataset will be

    # Now that we have created and uploaded the dataset to CORTX, we can
    # use it in PyTorch
    dataset = (RemoteDataset(client, bucket, folder)
        .decode("pil")  # Decode the data as PIL images
        .to_tuple("jpg;png", "cls")  # Extract images and labels from the dataset
        .map_tuple(preproc, lambda x: x)  # Apply data augmentations
        .batched(64)  # Make batches of 64 images
    )

    # We create a regular PyTorch data loader, as we would for regular datasets.
    # batch_size=None because the dataset already yields batches.
    dataloader = ch.utils.data.DataLoader(dataset, num_workers=3, batch_size=None)

    for image, label in tqdm((x for x in dataloader), total=100000 / 60):
        # Train / evaluate ML models on this batch of data
        pass
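Because the pipeline above batches samples itself with `.batched(64)`, the data loader is created with `batch_size=None`, which turns off PyTorch's own automatic batching. The sketch below shows, in plain Python, what grouping a stream of samples into fixed-size batches looks like and how many batches to expect, which is the kind of number a progress bar `total` needs. The `batched` and `expected_batches` functions are hypothetical helpers written for illustration, not part of `cortx_pytorch`, and unlike a real pipeline they return plain lists rather than stacked tensors.

```python
import math
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(samples: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of up to `batch_size` samples (hypothetical helper)."""
    iterator = iter(samples)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def expected_batches(num_samples: int, batch_size: int) -> int:
    """Number of batches produced when the last, partial batch is kept."""
    return math.ceil(num_samples / batch_size)


if __name__ == "__main__":
    # 130 stand-in samples grouped by 64: two full batches and one partial.
    print([len(b) for b in batched(range(130), 64)])  # → [64, 64, 2]
    print(expected_batches(130, 64))  # → 3
```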
Download files
Source Distribution
cortx_pytorch-1.0.1.tar.gz (3.3 kB)
File details
Details for the file cortx_pytorch-1.0.1.tar.gz.
File metadata
- Download URL: cortx_pytorch-1.0.1.tar.gz
- Upload date:
- Size: 3.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.7.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 25e157d357c50caab16da11e41fb91e35f316345dd54299c086ffb8b76d5013a
MD5 | 090177cfcc96812fe1f7f4ea154b4b8d
BLAKE2b-256 | c5b5d30ff15c739e75273feaf3fbcf69f595171c7c0fb6dcd662cea9c356c548