
Fast interface between PyTorch and Seagate CORTX

Project description

cortx_pytorch

Fast interface between PyTorch and Seagate CORTX

This package lets you encode and upload a PyTorch computer vision dataset (yielding (image, label) pairs) to CORTX.
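Any map-style dataset that yields (image, label) pairs can serve as a source. The class below is a hypothetical minimal stand-in, not part of this package's API, just to illustrate the contract; in practice you would use something like torchvision's ImageFolder:

```python
# Hypothetical minimal sketch of the (image, label) contract expected
# from a source dataset. A real source would typically be torchvision's
# ImageFolder; this stand-in only shows the shape of each item.
class ToyDataset:
    def __init__(self, items):
        # items: list of (image, label) pairs
        self.items = items

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        image, label = self.items[idx]
        return image, label

ds = ToyDataset([("img0", 0), ("img1", 1)])
```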

1: Install

pip install cortx_pytorch

2: Convert and upload your dataset

from cortx_pytorch import upload_cv_dataset, make_client
from torchvision import datasets

if __name__ == '__main__':
    # Define the connection settings for our client
    client = make_client(URL, ACCESS_KEY, SECRET_KEY)


    bucket = 'testbucket'  # Bucket where to read/write our ML dataset
    folder = 'imagenet-val'  # Folder where this particular dataset will be

    # We use a pytorch dataset as a source to prime the content of CORTX
    # Once we have encoded and uploaded it we don't need it anymore
    # Here we use a locally available Imagenet dataset
    ds = datasets.ImageFolder('/scratch/datasets/imagenet-pytorch/val')

    # Pack and upload any computer vision dataset to CORTX
    #
    # This only needs to be done once!
    # Images are grouped into objects of at most `maxsize` bytes and at
    # most `maxcount` images. We use `workers` processes to prepare the
    # data in parallel
    upload_cv_dataset(ds, client=client, bucket=bucket,
                      base_folder=folder, maxsize=1e8,
                      maxcount=100000, workers=30)
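The grouping policy described in the comments above can be sketched as a greedy packing loop: a new object is started as soon as adding another image would exceed `maxsize` bytes or `maxcount` items. This is only an illustration of the policy under those assumptions, not cortx_pytorch's actual implementation:

```python
# Hedged sketch of the grouping policy: pack item indices into shards
# greedily, starting a new shard once adding another item would exceed
# `maxsize` total bytes or `maxcount` items per shard.
def pack_shards(sizes, maxsize, maxcount):
    shards, current, current_bytes = [], [], 0
    for i, size in enumerate(sizes):
        if current and (current_bytes + size > maxsize
                        or len(current) >= maxcount):
            shards.append(current)
            current, current_bytes = [], 0
        current.append(i)
        current_bytes += size
    if current:
        shards.append(current)
    return shards

# Four 40-byte items with a 100-byte cap split into two shards of two.
print(pack_shards([40, 40, 40, 40], maxsize=100, maxcount=3))
```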

3: Use the dataset like any PyTorch dataset

import torch as ch
from tqdm import tqdm

from cortx_pytorch import RemoteDataset, make_client
from torchvision import transforms

preproc = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
        
if __name__ == '__main__':

    # Define the connection settings for our client
    client = make_client(URL, ACCESS_KEY, SECRET_KEY)

    bucket = 'testbucket'  # Bucket where to read/write our ML dataset
    folder = 'imagenet-val'  # Folder where this particular dataset will be
    
    # Now that we have created and uploaded the dataset on CORTX we can
    # use it in PyTorch
    dataset = (RemoteDataset(client, bucket, folder)
        .decode("pil") # Decode the data as PIL images
        .to_tuple("jpg;png", "cls") # Extract images and labels from the dataset
        .map_tuple(preproc, lambda x: x) # Apply data augmentations
        .batched(64)  # Make batches of 64 images
    )
    # We create a regular PyTorch data loader, as we would for any dataset
    dataloader = ch.utils.data.DataLoader(dataset, num_workers=3, batch_size=None)
    for image, label in tqdm(dataloader, total=100000 // 64):
        # Train / evaluate ML models on this batch of data
        pass
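As a sketch of what the loop body above might do, here is a top-1 accuracy computation. It is shown with plain Python lists so it stands alone; in the real loop `image` and `label` are torch tensors, and you would take the argmax of `model(image)` along the class dimension instead:

```python
# Hedged sketch of a per-batch evaluation step: top-1 accuracy given
# per-example class scores ("logits") and integer labels.
def batch_accuracy(logits, labels):
    correct = 0
    for row, label in zip(logits, labels):
        # Index of the highest score is the predicted class
        pred = max(range(len(row)), key=row.__getitem__)
        correct += int(pred == label)
    return correct / len(labels)
```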

Project details


Download files


Source Distribution

cortx_pytorch-1.0.1.tar.gz (3.3 kB)

Uploaded Source

File details

Details for the file cortx_pytorch-1.0.1.tar.gz.

File metadata

  • Download URL: cortx_pytorch-1.0.1.tar.gz
  • Upload date:
  • Size: 3.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.7.10

File hashes

Hashes for cortx_pytorch-1.0.1.tar.gz
  • SHA256: 25e157d357c50caab16da11e41fb91e35f316345dd54299c086ffb8b76d5013a
  • MD5: 090177cfcc96812fe1f7f4ea154b4b8d
  • BLAKE2b-256: c5b5d30ff15c739e75273feaf3fbcf69f595171c7c0fb6dcd662cea9c356c548

