
A Virtualitics S3 Utility Library with Local File System Mirror.

Project description

Virt-S3 🪣

A Virtualitics utility package to handle file I/O with Object Storage Systems like AWS S3 and Minio.

With versatility in mind, virt-s3 was designed as a relatively lightweight package that can be used either independently or in conjunction with the larger Virtualitics AI Platform. The virt_s3 module includes two primary submodules, s3 and fs, each implementing the virt_s3 API for its target system: S3/S3-like object stores or the local file system.
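For example, the backend in use is reflected in the parameter object returned by get_default_params(). The following is a minimal sketch; it assumes S3Params and LocalFSParams are exported at the package top level, as the API table below suggests.

import virt_s3

# get_default_params() inspects environment variables (see Getting Started)
# and returns either an S3Params or a LocalFSParams instance; every top-level
# API call then dispatches to the s3 or fs submodule accordingly
params = virt_s3.get_default_params()

if isinstance(params, virt_s3.S3Params):
    print("targeting an S3/S3-compatible object store")
else:
    print("mirroring to the local file system")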

We hope that you can use it, break it, and even help us improve it!

Table of Contents

  1. Prerequisites
  2. Example Usage
  3. Getting Started
  4. Code Documentation

Prerequisites

  • Requires python>=3.11
  • Local file system features currently support only POSIX-style / pathing (Linux, macOS, etc.)
    • Support for Windows \ pathing [Coming Soon]

Example Usage

Writing a File

import virt_s3
import pandas as pd
from io import BytesIO

# get default params (LocalFS or S3 determined by env variables)
# can also explicitly create instance of LocalFSParams or S3Params
params = virt_s3.get_default_params()

# test data -- write CSV to an in-memory buffer
df = pd.DataFrame([{'a': 1, 'b': 2}])
buffer = BytesIO()
df.to_csv(buffer, index=False)

# use context manager to manage session scope
with virt_s3.SessionManager(params=params) as session:
    # create bucket
    virt_s3.create_bucket('test-bucket', params=params, client=session)

    # upload data; upload_data returns the key the object was stored under
    path = "fixture/data/data.csv"
    saved_key = virt_s3.upload_data(buffer.getbuffer(), path, params=params, client=session)

Reading a File

import virt_s3
import pandas as pd

# get default params (LocalFS or S3 determined by env variables)
# can also explicitly create instance of LocalFSParams or S3Params
params = virt_s3.get_default_params()

# key returned by upload_data in the writing example above
saved_key = "fixture/data/data.csv"

# use context manager to manage session scope
with virt_s3.SessionManager(params=params) as session:
    # download data as an in-memory buffer and load it
    data = virt_s3.get_file(saved_key, bytes_io=True, params=params, client=session)
    df = pd.read_csv(data)
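Beyond reading and writing, the same session can be used for discovery. The snippet below is a sketch: file_exists and get_valid_file_paths are listed in the API table under Code Documentation, but the argument pattern here (positional path plus params=/client= keywords) is assumed from the examples above rather than a verified signature.

import virt_s3

params = virt_s3.get_default_params()

with virt_s3.SessionManager(params=params) as session:
    key = "fixture/data/data.csv"
    # confirm the key written in the first example is present
    if virt_s3.file_exists(key, params=params, client=session):
        # enumerate every valid file path under the fixture/ prefix
        for path in virt_s3.get_valid_file_paths("fixture/", params=params, client=session):
            print(path)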

Getting Started

  1. Create a fresh virtual environment with python >= 3.11

  2. Install the necessary dependencies

$ pip install poetry                                          # dependency manager used by this repo
$ poetry install                                              # install core dependencies
$ pip show virt-s3                                            # verify the installation
$ poetry install -E s3 -E dataframe -E image -E test -E docs  # install optional extras
  • Note: the last command above installs the optional extra dependencies needed for the following
    • s3 = installs dependencies required to interact with object stores like Minio/S3 (primarily relying on boto3)

    • dataframe = installs dependencies required for using numpy, pandas, and pyarrow dataframe/parquet operations

    • image = installs dependencies required to utilize image operations (e.g. get file as an image)

    • test = installs pytest related dependencies for testing

    • docs = installs sphinx documentation generation dependencies

    • e.g. if you want to use virt_s3 but can't install pandas or pyarrow in your restricted environment, you can simply install virt_s3 without the dataframe extra. You won't be able to use virt_s3.extras.CSVFileValidator, virt_s3.extras.ParquetFileValidator, read_parquet_file_df, or write_parquet_file_df, but these are not core functions of the library (hence extras). A usage sketch for the parquet helpers follows this list.
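With the dataframe extra installed, the parquet convenience functions become available. A minimal sketch, assuming their argument pattern mirrors upload_data/get_file above (the exact signatures are not shown on this page):

import virt_s3
import pandas as pd

params = virt_s3.get_default_params()
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

with virt_s3.SessionManager(params=params) as session:
    # both calls require the dataframe extra (pandas + pyarrow)
    virt_s3.write_parquet_file_df(df, "fixture/data/data.parquet", params=params, client=session)
    df2 = virt_s3.read_parquet_file_df("fixture/data/data.parquet", params=params, client=session)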

  3. Make sure the following environment variables are set
#########################################
# Required Custom Environment Variables #
#########################################
LOCAL_FS_USER=<your username>
# use the local fs mirror or s3/minio: 1 = True, 0 = False
LOCAL_FS=0
LOCAL_FS_ROOT_DIR=</path/to/your/data/dir/>

########################################################
# Required Virtualitics Platform Environment Variables #
########################################################
# e.g. http://mock-s3:9000 or http://localhost:9000
S3_URL=<your s3/minio url>
# e.g. test-bucket
S3_DEFAULT_BUCKET=<your bucket name>
AWS_SECRET_ACCESS_KEY=<your aws secret access key>
AWS_ACCESS_KEY_ID=<your aws access key id>
# e.g. us-east-1
AWS_REGION=<your aws region>
  • Note: S3_URL can be set to a localhost URL (e.g. http://localhost:9000) when not running inside a Docker container
  4. Run the example usage above
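As a quick sanity check that the environment variables are being picked up, the following sketch toggles the local file system mirror. The variable names come from the table above; the exact precedence logic inside get_default_params() is an assumption.

import os
import virt_s3

# LOCAL_FS=1 selects the local file system mirror, 0 selects S3/Minio
os.environ["LOCAL_FS"] = "1"
os.environ["LOCAL_FS_USER"] = "demo-user"
os.environ["LOCAL_FS_ROOT_DIR"] = "/tmp/virt-s3-data/"

params = virt_s3.get_default_params()
print(type(params).__name__)  # expected: LocalFSParams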

Code Documentation

| API | Description |
| --- | --- |
| PredictConnectionStoreParams | Dataclass for Predict Connection Store parameters |
| S3Params | Dataclass for S3 boto3 connection parameters |
| SessionManager | General session context manager for virt_s3 |
| TransferConfig | boto3.s3.TransferConfig used to configure higher-throughput upload/download functions |
| LocalFSParams | Dataclass for using the local file system for all S3 calls |
| ImageFormatType | Enum class for image format types |
| get_default_params() | Function to get default parameters to use for all functions (default behavior is based on ENV variables) |
| get_session_client() | Function to get a session client based on the passed-in S3Params or LocalFSParams |
| create_bucket() | Function to create a bucket to read and write from |
| get_file_chunked() | Function to get a file using a chunking loop; useful when retrieving very large files |
| get_file() | Function to retrieve the specified file as an in-memory object |
| get_image() | Function to get an image from either S3 or the local file system |
| get_files_generator() | Generator function to quickly loop through reading a list of keys or file paths |
| get_files_batch() | Function to get a list of file paths or key paths in batch |
| list_dirs() | Function to list valid 'folders' within a 'bucket' |
| get_valid_file_paths() | Function to get a list of valid file paths or keys within a particular directory of a bucket |
| file_exists() | Function to check whether a key or file path exists in a bucket |
| upload_data() | Function to upload data to S3 or the local file system |
| delete_file() | Function to delete a file from S3 or the local file system |
| delete_files_by_dir() | Function to delete all files, subdirectories, etc. in a given folder |
| archive_zip_as_buffer() | Function to create a zip archive from a dictionary of expected archive file paths and data bytes |
| extract_compressed_file() | Function to extract zip file contents into a bucket |
| format_bytes() | Function to take a number of bytes and return a formatted string for B, KB, MB, GB |
| read_parquet_file_df() | Convenience function to read a parquet file as a pandas DataFrame |
| write_parquet_file_df() | Convenience function to write a pandas DataFrame to a parquet file |
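To illustrate how a few of these utilities compose, here is a sketch that bundles in-memory payloads into a zip archive, uploads it, and reports its size. The argument patterns are assumed to follow the earlier examples rather than verified signatures.

import virt_s3

params = virt_s3.get_default_params()

with virt_s3.SessionManager(params=params) as session:
    # build a zip archive in memory from {archive path: data bytes};
    # assumes archive_zip_as_buffer returns an in-memory buffer of bytes
    archive = virt_s3.archive_zip_as_buffer({
        "readme.txt": b"hello",
        "data/raw.bin": b"\x00\x01\x02",
    })

    key = virt_s3.upload_data(archive, "fixture/archives/bundle.zip",
                              params=params, client=session)

    # fetch it back and print a human-readable size via format_bytes
    data = virt_s3.get_file(key, bytes_io=True, params=params, client=session)
    print(virt_s3.format_bytes(data.getbuffer().nbytes))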


Download files

Download the file for your platform.

Source Distribution

virt_s3-0.1.0.tar.gz (40.9 kB)

Uploaded Source

Built Distribution


virt_s3-0.1.0-py3-none-any.whl (48.5 kB)

Uploaded Python 3

File details

Details for the file virt_s3-0.1.0.tar.gz.

File metadata

  • Download URL: virt_s3-0.1.0.tar.gz
  • Upload date:
  • Size: 40.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.1 CPython/3.11.6 Windows/10

File hashes

Hashes for virt_s3-0.1.0.tar.gz:

  • SHA256: 918dacfb4d05de1c677a85998a8a0b72e53084d91f6615095c69414b5c2a26bb
  • MD5: fe03030829616a74ace9c994f7e97f47
  • BLAKE2b-256: fe3086ca6da8c93504d6a2d3ae5f03e5003460e3dc8fc95bdc817b3da03211e0


File details

Details for the file virt_s3-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: virt_s3-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 48.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.1 CPython/3.11.6 Windows/10

File hashes

Hashes for virt_s3-0.1.0-py3-none-any.whl:

  • SHA256: 7009363c3ae37509709a8b6a093943667f3943eb877bc97b1e8840675766754c
  • MD5: 635a20ffed8d9a5a23dd7c07f0f925ea
  • BLAKE2b-256: dc3b89c2aec605384807620fd0d64a5029e2b9c78b5e50e2ef0ff14d6b9a66bd

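If you want pip to verify these digests at install time, the standard hash-pinning mechanism in a requirements file works with the SHA256 values above (listing both the sdist and wheel hashes lets pip accept whichever artifact it selects):

# requirements.txt
virt_s3==0.1.0 \
    --hash=sha256:918dacfb4d05de1c677a85998a8a0b72e53084d91f6615095c69414b5c2a26bb \
    --hash=sha256:7009363c3ae37509709a8b6a093943667f3943eb877bc97b1e8840675766754c

$ pip install -r requirements.txt

Note that once any requirement carries a hash, pip enables hash-checking mode, so every requirement (including dependencies) must be pinned and hashed.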
