unibox
unibox provides a unified interface for common file operations.
Quick Start
# pip install unibox
import unibox as ub
some common use cases of unibox include:
loading various file types in the same way:
- supports json, txt, images, parquet, csv, feather, and more
- uses appropriate best practices (such as the orjson package for json) for speedups
some_dict = ub.loads("some_file.json") # json → dict
some_list = ub.loads("some_file.txt") # txt → list[str]
some_img = ub.loads("some_image.jpg") # webp/jpg/png/etc. → PIL.Image
some_df = ub.loads("some_data.parquet") # parquet/csv/feather → pd.DataFrame
# .... for more: see uni_loader.py#L40
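The idea of choosing a loader from the file extension can be pictured with the minimal sketch below. It is not unibox's implementation: load_any and the _LOADERS table are hypothetical names, and only json and txt are covered, using the standard library.

```python
import json
from pathlib import Path


def _load_json(path):
    # Parse a JSON file into Python objects (dict, list, ...).
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def _load_txt(path):
    # Read a text file into a list of lines, without trailing newlines.
    with open(path, "r", encoding="utf-8") as f:
        return f.read().splitlines()


# Hypothetical lookup table: file suffix -> loader function.
_LOADERS = {".json": _load_json, ".txt": _load_txt}


def load_any(path):
    """Pick a loader from the file suffix (illustrative, not unibox's code)."""
    suffix = Path(path).suffix.lower()
    try:
        loader = _LOADERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {suffix!r}") from None
    return loader(path)
```

Adding a format in this sketch means registering one more suffix in the table, which keeps the calling code identical for every file type.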
saving various python data structures in the same way:
- similar to ub.loads, but for saving files
ub.saves(some_dict, "some_file.json") # same idea as loading, in reverse
ub.saves(some_df, "some_df.parquet")
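Saving can follow the same pattern, with the output format picked from the target file's suffix. The sketch below is an assumption about the general approach, not unibox's code: save_any is a hypothetical helper that handles only json and txt.

```python
import json
from pathlib import Path


def save_any(data, path):
    """Write data in a format chosen from the file suffix (illustrative only)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".json":
        # Serialize dicts/lists as UTF-8 JSON.
        with open(path, "w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False, indent=2)
    elif suffix == ".txt":
        # Expect a list of strings; write one item per line.
        if isinstance(data, str):
            raise TypeError("expected a list of strings, got a single str")
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(data))
    else:
        raise ValueError(f"unsupported file type: {suffix!r}")
```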
list s3 or local directories in the same way:
- default optional params: relative_unix=True, debug_print=True
- optimized s3 ls speed compared to boto3
files_under_dir = ub.traverses("/home/ubuntu/data") # list local files
# requires AWS credentials set up beforehand with `aws configure`
files_under_s3 = ub.traverses("s3://dataset-pixiv/resized_1572864") # list s3 files
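For the local case, a recursive listing with extension filters might look like the sketch below. list_files is a hypothetical stand-in, not unibox's traverser, and it assumes that relative_unix means "paths relative to the root, with forward slashes", which the parameter name suggests but the text does not confirm. S3 listing is not covered here.

```python
import os


def list_files(root, include=None, exclude=None, relative_unix=True):
    """Recursively list files under root, filtered by extension (illustrative only)."""
    include = tuple(e.lower() for e in include) if include else None
    exclude = tuple(e.lower() for e in exclude) if exclude else ()
    results = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            lowered = name.lower()
            # Keep only names ending with an included extension, if any were given.
            if include is not None and not lowered.endswith(include):
                continue
            # Drop names ending with an excluded extension.
            if exclude and lowered.endswith(exclude):
                continue
            full = os.path.join(dirpath, name)
            if relative_unix:
                # Path relative to root, with forward slashes on every platform.
                full = os.path.relpath(full, root).replace(os.sep, "/")
            results.append(full)
    return sorted(results)
```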
simplified logger class for easier debugging:
- a logger with functionalities pre-configured
- includes caller frame info, emoji warnings, datetime, and more
import unibox as ub
logger = ub.UniLogger()
def some_function():
    logger.info("some info")
    # logger.warning("....")
    # logger.error("....")
some_function()
# 2024-05-08 17:57:23,149 [INFO] UniLogger: some_function: some info
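A log line with that shape (timestamp, level, logger name, calling function, message) can be produced with the standard logging module alone. The sketch below shows one way to do it; make_logger is a hypothetical helper, and this is not a description of how UniLogger is implemented. Emoji warnings are left out.

```python
import logging


def make_logger(name="UniLogger", stream=None):
    """Build a logger whose records include the calling function's name."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records from also reaching the root logger
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler(stream)  # stream=None means stderr
        handler.setFormatter(logging.Formatter(
            "%(asctime)s [%(levelname)s] %(name)s: %(funcName)s: %(message)s"
        ))
        logger.addHandler(handler)
    return logger
```

The %(funcName)s field is filled in by logging from the caller's stack frame, so calling logger.info("some info") inside some_function yields the function name without any extra arguments.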
resize millions of images efficiently:
- (pre-configuration omitted here for simplicity; saves as 98% quality WebP by default)
- can also resize by the minimum or maximum side length
# root_dir: directory containing the images to be resized
# dst_dir: directory where the resized images are written
target_pixels = int(1024 * 1024 * 1.5)
resizer = ub.UniResizer(root_dir, dst_dir, target_pixels)
# get resize jobs
images_to_resize = resizer.get_resize_jobs()
# execute resize jobs
resizer.execute_resize_jobs(images_to_resize)
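Resizing to a target pixel count comes down to scaling both sides by the square root of the area ratio. The sketch below works through that arithmetic; fit_to_pixel_budget is a hypothetical helper, and skipping images that are already small enough is an assumption rather than documented UniResizer behavior.

```python
import math


def fit_to_pixel_budget(width, height, target_pixels):
    """Return (new_width, new_height) with about target_pixels pixels, keeping aspect ratio."""
    current = width * height
    if current <= target_pixels:
        # Assumption: images already within the budget are left unchanged.
        return width, height
    # Area scales with the square of the side length, hence the square root.
    scale = math.sqrt(target_pixels / current)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

With target_pixels = int(1024 * 1024 * 1.5), a 3000 x 2000 image maps to 1536 x 1024, which is exactly 1572864 pixels.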
Install
install from PyPI:
pip install unibox
build from source:
git clone https://github.com/trojblue/unibox
cd unibox
# pip install poetry
poetry install
poetry build
pip install dist/unibox-<version number>.whl
[OLD DOC] Features
The package is designed to run on Python 3.10, but targets 3.8+ for compatibility:
CLI:
- unibox resize <dir>: resizes a directory of images using either pillow or libvips
  - customizable size / quality / encoding (png / webp / jpeg)
- unibox copy <dir>: an awscli-like tool for copying files with certain suffix to a new dir, keeping the same directory structure
  - bypasses windows explorer so it's much faster
- unibox move <dir>: like copy, but moves instead
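Copying only files with certain suffixes while keeping the directory structure can be sketched with pathlib and shutil as follows. copy_by_suffix is a hypothetical function written for illustration; it does not reflect the actual options or behavior of the unibox copy command.

```python
import shutil
from pathlib import Path


def copy_by_suffix(src_dir, dst_dir, suffixes):
    """Copy files with the given suffixes to dst_dir, preserving relative paths."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    wanted = {s.lower() for s in suffixes}
    copied = []
    # sorted() materializes the listing before any file is written.
    for path in sorted(src_dir.rglob("*")):
        if path.is_file() and path.suffix.lower() in wanted:
            target = dst_dir / path.relative_to(src_dir)
            # Recreate the source's subdirectory layout under dst_dir.
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also preserves file metadata
            copied.append(target)
    return copied
```

A move variant would only need shutil.move in place of shutil.copy2.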
utils:
- UniLogger: unified logger class (logger = unibox.UniLogger(), then use logger.info(...))
- UniLoader: unified data loader class (unibox.loads(<filename>))
- UniSaver: unified data saver class (unibox.saves(<data>, <filename>))
- UniTraverser: unified directory traverser class, with callbacks in multiple stages
- UniResizer: unified image resizer class, with callbacks in multiple stages
callables:
- unibox.traverses(dir, include, exclude, relative_unix): traverse a directory using specified exclude / include extensions, and return a list of files
- unibox.loads(filepath): load arbitrary data from a file into suitable formats, with automatic detection of file type
  - supported formats: see UniLoader class implementation
- unibox.saves(data, filepath): saves arbitrary data to a file, with automatic detection of file type