unibox

Code style: black | License: MIT

unibox provides a unified interface for common file operations.

Quick Start

# pip install unibox
import unibox as ub

Some common use cases of unibox include:


loading various file types in the same way:

  • supports json, txt, images, parquet, csv, feather, ...
  • uses fast, well-suited libraries (such as the orjson package for json) to speed up loading
some_dict = ub.loads("some_file.json")    # json → dict
some_list = ub.loads("some_file.txt")     # txt  → list[str]
some_img  = ub.loads("some_image.jpg")    # webp/jpg/png/etc. → PIL.Image
some_df   = ub.loads("some_data.parquet") # parquet/csv/feather → pd.DataFrame
# .... for more: see uni_loader.py#L40
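
The idea of choosing a loader from the file extension can be sketched with the standard library alone. This is only an illustration and not unibox's actual implementation: `load_any` and the `_LOADERS` table are hypothetical names, and only `.json` and `.txt` are handled here.

```python
import json
from pathlib import Path


def _load_json(path):
    # json → dict (or list), parsed with the stdlib json module
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)


def _load_txt(path):
    # txt → list of lines, without trailing newline characters
    return path.read_text(encoding="utf-8").splitlines()


# Hypothetical dispatch table: lowercase file suffix → loader function.
_LOADERS = {".json": _load_json, ".txt": _load_txt}


def load_any(path):
    """Pick a loader based on the file suffix (illustrative only)."""
    path = Path(path)
    loader = _LOADERS.get(path.suffix.lower())
    if loader is None:
        raise ValueError(f"unsupported file type: {path.suffix!r}")
    return loader(path)
```

Adding a format in this sketch means registering one more suffix in the table, which keeps the calling code unchanged.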

saving various python data structures in the same way:

  • similar to ub.loads, but for saving files
# works much like ub.loads above
ub.saves(some_dict, "some_file.json")
ub.saves(some_df, "some_df.parquet")
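
For the saving direction, the output format can likewise be derived from the target filename. The sketch below is a stdlib-only illustration under that assumption; `save_any` is a hypothetical helper, not part of unibox, and it covers only `.json` and `.txt`.

```python
import json
from pathlib import Path


def save_any(data, path):
    """Choose an output format from the target suffix (illustrative only)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create missing folders
    suffix = path.suffix.lower()
    if suffix == ".json":
        text = json.dumps(data, ensure_ascii=False, indent=2)
    elif suffix == ".txt":
        # expects an iterable of strings; writes one item per line
        text = "\n".join(data)
    else:
        raise ValueError(f"unsupported file type: {suffix!r}")
    path.write_text(text, encoding="utf-8")
    return path
```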

listing s3 or local directories in the same way:

  • optional params and their defaults: relative_unix=True, debug_print=True
  • s3 listing is optimized for speed compared to boto3
files_under_dir = ub.traverses("/home/ubuntu/data")  # list local files

# requires aws credentials to be set up beforehand (e.g. with `aws configure`)
files_under_s3  = ub.traverses("s3://dataset-pixiv/resized_1572864") # list s3 files
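
For the local case, a directory listing with extension filters can be sketched with `os.walk`. The parameter names `include`, `exclude`, and `relative_unix` are borrowed from the `unibox.traverses` signature listed in the Features section; how unibox interprets them internally is an assumption here, and `list_files` is a hypothetical name.

```python
import os


def list_files(root, include=None, exclude=None, relative_unix=True):
    """Walk `root` and return matching file paths (illustrative only)."""
    include = tuple(e.lower() for e in include) if include else None
    exclude = tuple(e.lower() for e in exclude) if exclude else ()
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            lower = name.lower()
            if include is not None and not lower.endswith(include):
                continue  # not one of the requested extensions
            if exclude and lower.endswith(exclude):
                continue  # explicitly excluded extension
            full = os.path.join(dirpath, name)
            if relative_unix:
                # assumed meaning: path relative to root, with forward slashes
                full = os.path.relpath(full, root).replace(os.sep, "/")
            found.append(full)
    return sorted(found)
```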

simplified logger class for easier debugging:

  • a logger with common functionality pre-configured
  • includes caller frame info, emoji warnings, datetime, and more
logger = ub.UniLogger()
logger.info("....") 
logger.warn("....")
logger.error("....")
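
A logger that prints the datetime and caller location can be approximated with the standard `logging` module. This is a rough sketch of the idea rather than how `UniLogger` is built; `make_logger` and the format string are assumptions chosen for illustration.

```python
import logging
import sys


def make_logger(name="demo", stream=None):
    """Build a logger that prints time, level, caller location and message."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep output out of the root logger
    if not logger.handlers:  # avoid stacking duplicate handlers on re-use
        handler = logging.StreamHandler(stream or sys.stderr)
        handler.setFormatter(logging.Formatter(
            "%(asctime)s [%(levelname)s] %(funcName)s: %(message)s "
            "(%(filename)s:%(lineno)d)",
            datefmt="%Y-%m-%d %H:%M:%S",
        ))
        logger.addHandler(handler)
    return logger
```

The `funcName`, `filename`, and `lineno` fields are filled in by `logging` from the calling frame, so no manual stack inspection is needed in this version.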

resizing millions of images efficiently:

  • (pre-configuration omitted here for simplicity)
  • can also resize by minimum or maximum side length
# Initialize resizer (root_dir and dst_dir must be defined beforehand)
resizer = ub.UniResizer(root_dir, dst_dir,
    target_pixels=int(1024 * 1024 * 1.5),
)

# Resize the images
images_to_resize = resizer.get_resize_jobs()
resizer.execute_resize_jobs(images_to_resize)
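
The arithmetic behind a pixel budget such as `target_pixels` can be worked out independently of any image library. The sketch below assumes the goal is to keep the aspect ratio while bringing width × height under the budget, and never to upscale; both function names are hypothetical and this is not taken from `UniResizer`.

```python
import math


def fit_to_pixel_budget(width, height, target_pixels):
    """Scale (width, height) down so width * height fits target_pixels."""
    if width * height <= target_pixels:
        return width, height  # already within budget; never upscale
    # Scaling both sides by sqrt(ratio) scales the area by ratio.
    scale = math.sqrt(target_pixels / (width * height))
    # Truncate so the rounded result stays within the budget.
    return max(1, int(width * scale)), max(1, int(height * scale))


def fit_to_max_side(width, height, max_side):
    """Scale down so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```

The resulting size could then be passed to whichever image backend does the actual resampling.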

Install

install from PyPI:

pip install unibox

build from source:

git clone https://github.com/trojblue/unibox
cd unibox

# pip install poetry
poetry install
poetry build
pip install dist/unibox-<version number>-py3-none-any.whl

[OLD DOC] Features

The package is developed against Python 3.10, but targets 3.8+ for compatibility.

CLI:

  • unibox resize <dir>: resizes a directory of images using either pillow or libvips
    • customizable size / quality / encoding (png / webp / jpeg)
  • unibox copy <dir>: an awscli-like tool for copying files with a given suffix to a new dir, keeping the same directory structure
    • bypasses Windows Explorer, so it's much faster
  • unibox move <dir>: like copy, but moves the files instead
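
The copy behaviour described above (select files by suffix, mirror the directory layout at the destination) can be sketched with `pathlib` and `shutil`. This is an illustration of the technique only; `copy_with_suffix` and its default suffixes are hypothetical and do not reflect the CLI's real options.

```python
import shutil
from pathlib import Path


def copy_with_suffix(src_dir, dst_dir, suffixes=(".jpg", ".png")):
    """Copy files with the given suffixes, mirroring the folder layout."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    wanted = {s.lower() for s in suffixes}
    copied = []
    for src in sorted(src_dir.rglob("*")):
        if src.is_file() and src.suffix.lower() in wanted:
            # same relative path, re-rooted under the destination
            dst = dst_dir / src.relative_to(src_dir)
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # shutil.move would give a "move" variant
            copied.append(dst)
    return copied
```

The sketch assumes `dst_dir` lies outside `src_dir`; otherwise freshly copied files could be picked up again during the walk.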

utils:

  • UniLogger: unified logger class (logger = unibox.UniLogger(), then use logger.info(...))
  • UniLoader: unified data loader class (unibox.loads(<filename>))
  • UniSaver: unified data saver class (unibox.saves(<data>, <filename>))
  • UniTraverser: unified directory traverser class, with callbacks in multiple stages
  • UniResizer: unified image resizer class, with callbacks in multiple stages

callables:

  • unibox.traverses(dir, include, exclude, relative_unix): traverse a directory using specified exclude / include extensions, and return a list of files
  • unibox.loads(filepath): load arbitrary data from a file into suitable formats, with automatic detection of file type
    • supported formats: see UniLoader class implementation
  • unibox.saves(data, filepath): save arbitrary data to a file, with automatic detection of file type
