A package that returns files stored in Google Drive and Google Cloud Storage as a torch.utils.data.Dataset that can be used with PyTorch DataLoaders.

Project description

GCP-IO

Generate PyTorch datasets from files stored in Google Cloud Storage and Google Drive.

Installation

Run the following to install this package:

pip install gcpio

Environment Variables Required

LOG_FILE_PATH: Path of the file where logs are written
LOG_LEVEL: Logging level (for example INFO, DEBUG, or ERROR)
TOKEN_PATH_GDRIVE: Path to the generated token.json file
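
The variables above can also be set from Python before the package is used. The following is a minimal sketch, not part of gcpio: the helper name `missing_env_vars` and the example paths are hypothetical, and it assumes only that the three variables listed above are read from the process environment.

```python
import os

# Names taken from the list above.
REQUIRED_VARS = ("LOG_FILE_PATH", "LOG_LEVEL", "TOKEN_PATH_GDRIVE")


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Hypothetical example values; setdefault keeps anything already exported.
os.environ.setdefault("LOG_FILE_PATH", "./gcpio.log")
os.environ.setdefault("LOG_LEVEL", "INFO")
os.environ.setdefault("TOKEN_PATH_GDRIVE", "./token.json")

missing = missing_env_vars()
if missing:
    raise RuntimeError(f"Missing environment variables: {missing}")
```

Checking up front gives a clear error message instead of a failure later, when the package first tries to open the log file or the token.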

Example usage:

Google Drive:

from gcpio.gdrive import Gdrive

# Get metadata of the files present in a Google Drive folder.
# Returns a dict with the keys 'files' and 'len', where 'files' is a list
# of objects from the Drive folder.
GDRIVE = Gdrive()
files = GDRIVE.get_files_metadata(
    folder_id="XXXXX",
    page_size=500,
    file_type="image/png",
    replace_query=None,
)
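
As a rough illustration of working with the returned metadata, the sketch below summarizes a result of the documented shape (a dict with 'files' and 'len'). The helper `summarize_metadata` is hypothetical and not part of gcpio. It also assumes each file object is a dict that may carry a `mimeType` field, as Drive file resources usually do. The sample data is a stand-in, not real API output.

```python
from collections import Counter


def summarize_metadata(result):
    """Summarize a metadata result shaped like {'files': [...], 'len': N}."""
    files = result.get("files", [])
    # Assumption: each entry is a dict; "mimeType" is treated as optional.
    by_type = Counter(entry.get("mimeType", "unknown") for entry in files)
    return {
        "reported": result.get("len"),
        "listed": len(files),
        "by_type": dict(by_type),
    }


# Stand-in data with the documented keys (not real Drive output).
example = {
    "files": [
        {"id": "a1", "name": "0.png", "mimeType": "image/png"},
        {"id": "b2", "name": "1.png", "mimeType": "image/png"},
    ],
    "len": 2,
}

summary = summarize_metadata(example)
print(summary)
```

Comparing the reported 'len' with the number of listed entries is a cheap sanity check before building a dataset from the folder.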

# Create the torch dataset from files.
# folder_id is the Drive folder ID (for example "XXXXX" as above);
# skip_labels is supplied by the caller.
GDRIVE = Gdrive()
dataset = GDRIVE.create_dataset(
    data_folder_id=folder_id,
    labels_folder_id=folder_id,
    page_size=1000,
    data_file_type="image/png",
    labels_file_type="text/csv",
    skip_labels=skip_labels,
)

# Load this dataset into a torch DataLoader.
import torch
from torch.utils.data import BatchSampler, DataLoader, SequentialSampler
from torchvision.utils import make_grid, save_image

batch_size = 20  # example value

dataloader = DataLoader(
    dataset,
    batch_size,
    sampler=BatchSampler(
        SequentialSampler(dataset), batch_size=batch_size, drop_last=True
    ),
)

# Get a sample batch.
samples = next(iter(dataloader))
images = []
for i in range(len(samples)):
    # Ignore the labels and collect only the images for verification.
    img, _ = samples[i]
    images.append(img)
images = torch.cat(images)

# Arrange the images in a grid and save it to disk.
grid = make_grid(images, nrow=20)
save_image(grid, "./examples/generated_images.png")
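
To make the sampler setup easier to follow, here is a plain-Python sketch of the index batches that a sequential sampler wrapped in a batch sampler with `drop_last=True` produces. It is an illustration only, not torch's implementation, and the function name `sequential_batches` is hypothetical.

```python
def sequential_batches(n_items, batch_size, drop_last=True):
    """Yield lists of consecutive indices, batch_size at a time."""
    batch = []
    for index in range(n_items):
        batch.append(index)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # A trailing partial batch is only emitted when drop_last is False.
    if batch and not drop_last:
        yield batch


batches = list(sequential_batches(10, 4, drop_last=True))
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

With `drop_last=True`, the last two of the ten items never appear in a batch, so every batch has the same size.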

Output:

[Image: output]

Download files

Source distribution: gcp-io-0.0.0.2.tar.gz (8.2 kB)
Built distribution: gcp_io-0.0.0.2-py3-none-any.whl (7.2 kB, Python 3)
