A package to return files stored in Google drive and Google Cloud Storage as a torch.utils.Dataset that can be used with the torch dataloaders.
Project description
GCP-IO
Generate PyTorch datasets from files stored in Google cloud storage and Google drive.
Installation
Run the following to install this package:
pip install gcpio
Environment Variables Required
LOG_FILE_PATH: Path to store the logs in a log file
LOG_LEVEL: LOG LEVEL(Example: INFO, DEBUG, ERROR etc)
TOKEN_PATH_GDRIVE: path to the generated token.json file
Example usage:
Google Drive:
from gcpio.gdrive import Gdrive
# Get meta data of files present in a Google Drive folder
GDRIVE = Gdrive()
files=GDRIVE.get_files_metadata(folder_id="XXXXX",page_size=500,file_type="image/png",replace_query=None) # returns a dict with keys['files','len'] where files is a list of objects from the Drive folder
# Create the torch dataset from files
GDRIVE = Gdrive()
dataset = g.create_dataset(
data_folder_id=folder_id,
labels_folder_id=folder_id,
page_size=1000,
data_file_type="image/png",
labels_file_type="text/csv",
skip_labels=skip_labels,
)
# load this dataset into torch dataloader.
dataloader = DataLoader(
dataset,
batch_size,
sampler=BatchSampler(
SequentialSampler(dataset), batch_size=batch_size, drop_last=True
),
)
#get a sample batch
samples = next(iter(dataloader))
images = []
for i in range(len(samples)):
# ignoring labels and collecting images only for verification
img, _ = samples[i]
images.append(img)
images = torch.cat(images)
grid = make_grid(images, nrow=20)
save_image(grid, f"./examples/generated_images.png")
outputs
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
gcp-io-0.0.0.2.tar.gz
(8.2 kB
view hashes)