Unibox provides a unified interface for common file operations.
unibox
Quick Start
# pip install unibox
import unibox as ub
Some common use cases of unibox include:
loading various file types in the same way:
- supports json, txt, images, parquet, csv, feather, ...
- uses appropriate best practices (such as the orjson package for JSON) for speedups
some_dict = ub.loads("some_file.json") # json → dict
some_list = ub.loads("some_file.txt") # txt → list[str]
some_img = ub.loads("some_image.jpg") # webp/jpg/png/etc. → PIL.Image
some_df = ub.loads("some_data.parquet") # parquet/csv/feather → pd.DataFrame
# .... for more: see uni_loader.py#L40
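To illustrate the idea of loading by file type, here is a minimal sketch of suffix-based dispatch using only the standard library. It is not unibox's implementation: `load_any` and the `_LOADERS` registry are hypothetical names, and only json and txt are covered.

```python
import json
from pathlib import Path


def _load_json(path: Path):
    # json -> dict / list, parsed with the standard library
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)


def _load_txt(path: Path):
    # txt -> list[str], one entry per line without trailing newlines
    return path.read_text(encoding="utf-8").splitlines()


# Hypothetical registry: maps a file suffix to the function that reads it
_LOADERS = {
    ".json": _load_json,
    ".txt": _load_txt,
}


def load_any(path):
    """Pick a loader from the file suffix (case-insensitive) and call it."""
    path = Path(path)
    suffix = path.suffix.lower()
    try:
        loader = _LOADERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {suffix!r}") from None
    return loader(path)
```

Supporting another format in this sketch means adding one entry to the registry, which keeps the calling code unchanged.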
saving various Python data structures in the same way:
- similar to ub.loads, but for saving files
ub.saves(some_dict, "some_file.json") # same idea as above
ub.saves(some_df, "some_df.parquet")
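The saving side can be pictured the same way: the target suffix decides the output format. The sketch below is an assumption-laden illustration with a hypothetical `save_any` helper, standard library only, and is not how unibox itself is written.

```python
import json
from pathlib import Path


def save_any(data, path):
    """Write `data` in the format implied by the file suffix (hypothetical helper)."""
    path = Path(path)
    suffix = path.suffix.lower()
    # create missing parent directories so callers can pass a fresh path
    path.parent.mkdir(parents=True, exist_ok=True)
    if suffix == ".json":
        with path.open("w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False)
    elif suffix == ".txt":
        # expects an iterable of strings; writes one string per line
        path.write_text("".join(f"{line}\n" for line in data), encoding="utf-8")
    else:
        raise ValueError(f"unsupported file type: {suffix!r}")
    return path
```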
list S3 or local directories in the same way:
- default optional params: relative_unix=True, debug_print=True
- S3 listing (s3 ls) optimized for speed compared to boto3
files_under_dir = ub.traverses("/home/ubuntu/data") # list local files
# requires AWS credentials set up beforehand via `aws configure`
files_under_s3 = ub.traverses("s3://dataset-pixiv/resized_1572864") # list s3 files
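For the local case, a listing with extension filters can be sketched with `os.walk`. This is only an illustration: `list_files` is a hypothetical helper, and it assumes that `relative_unix` means "paths relative to the root, with forward slashes", which the README does not spell out. S3 is not covered.

```python
import os
from pathlib import Path


def list_files(root, include=None, exclude=None, relative_unix=True):
    """Recursively list files under `root`, filtered by extension.

    include / exclude are iterables of suffixes such as [".jpg", ".png"];
    matching is case-insensitive.
    """
    root = Path(root)
    include = {e.lower() for e in include} if include else None
    exclude = {e.lower() for e in exclude} if exclude else set()
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            ext = full.suffix.lower()
            if ext in exclude:
                continue
            if include is not None and ext not in include:
                continue
            if relative_unix:
                # assumed meaning: relative to root, "/" as separator
                found.append(full.relative_to(root).as_posix())
            else:
                found.append(str(full))
    return sorted(found)
```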
simplified logger class for easier debugging:
- a logger with functionalities pre-configured
- includes caller frame info, emoji warnings, datetime, and more
import unibox as ub
logger = ub.UniLogger()
def some_function():
    logger.info("some info")
    # logger.warning("....")
    # logger.error("....")

some_function()
# 2024-05-08 17:57:23,149 [INFO] UniLogger: some_function: some info
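For comparison, a log line in the same shape as the sample output can be produced with the standard `logging` module by putting `%(funcName)s` into the format string. This is a sketch, not UniLogger's actual code: `make_logger` is a hypothetical helper, and the logger name "UniLogger" is chosen only to mirror the sample line. Emoji warnings are not reproduced.

```python
import io
import logging


def make_logger(name="UniLogger", stream=None):
    """Build a stdlib logger whose lines include the calling function's name."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)  # stream=None means sys.stderr
    handler.setFormatter(logging.Formatter(
        "%(asctime)s [%(levelname)s] %(name)s: %(funcName)s: %(message)s"
    ))
    logger.addHandler(handler)
    return logger


# Log into an in-memory buffer so the result can be inspected
buffer = io.StringIO()
logger = make_logger(stream=buffer)


def some_function():
    logger.info("some info")


some_function()
print(buffer.getvalue(), end="")
```

The timestamp comes from `logging`'s default `asctime` format, which has the same `YYYY-MM-DD HH:MM:SS,mmm` layout as the sample line.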
resize millions of images efficiently:
- (pre-configuration omitted here for simplicity; saves to 98% quality WEBP by default)
- can also resize by the minimum or maximum side length
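# root_dir: where the images to be resized are (placeholder path)
root_dir = "/path/to/images"
# dst_dir: presumably where the resized images are written (placeholder path)
dst_dir = "/path/to/resized"
target_pixels = int(1024 * 1024 * 1.5)
resizer = ub.UniResizer(root_dir, dst_dir, target_pixels)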
# get resize jobs
images_to_resize = resizer.get_resize_jobs()
# execute resize jobs
resizer.execute_resize_jobs(images_to_resize)
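The README does not say how `target_pixels` is applied. One plausible reading is an area budget: scale both sides by the same factor so that width × height stays at or below the target, and leave smaller images alone. The sketch below works through that arithmetic under this assumption; `fit_to_pixel_budget` is a hypothetical helper, not part of unibox.

```python
import math


def fit_to_pixel_budget(width, height, target_pixels):
    """Return (new_width, new_height) with area <= target_pixels.

    Assumed behavior: aspect ratio is preserved and images already within
    the budget are returned unchanged (no upscaling).
    """
    if width <= 0 or height <= 0 or target_pixels <= 0:
        raise ValueError("width, height and target_pixels must be positive")
    if width * height <= target_pixels:
        return width, height
    # area scales with the square of the side factor, hence the square root
    scale = math.sqrt(target_pixels / (width * height))
    # truncate so the product cannot exceed the budget; never go below 1 px
    return max(1, int(width * scale)), max(1, int(height * scale))


target_pixels = int(1024 * 1024 * 1.5)  # 1,572,864 pixels
print(fit_to_pixel_budget(4000, 3000, target_pixels))  # (1448, 1086)
```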
view and label images within jupyter notebook:
import unibox as ub
uris = ["https://cdn.donmai.us/180x180/8e/ea/8eea944690c0c0b27e303420cb1e65bd.jpg"] * 9
labels = ['Image 1', 'Image 2', 'Image 3'] * 3
# label data interactively
ub.label_gallery(uris, labels)
# or: view images only
# ub.gallery(uris, labels)
Install
install from PyPI:
pip install unibox
build from source:
git clone https://github.com/trojblue/unibox
# pip install poetry
poetry install
poetry build
pip install dist/unibox-<version number>.whl
[OLD DOC] Features
The package is designed to run on Python 3.10, but targets 3.8+ for compatibility:
CLI:
- unibox resize <dir>: resizes a directory of images using either pillow or libvips
  - customizable size / quality / encoding (png / webp / jpeg)
- unibox copy <dir>: an awscli-like tool for copying files with a certain suffix to a new dir, keeping the same directory structure
  - bypasses Windows Explorer, so it's much faster
- unibox move <dir>: like copy, but moves instead
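The copy and move behavior described above (select files by suffix, keep the directory layout) can be sketched with `pathlib` and `shutil`. This is an illustration under assumptions, not the CLI's implementation: `copy_with_suffix` and its parameters are hypothetical.

```python
import shutil
from pathlib import Path


def copy_with_suffix(src_dir, dst_dir, suffixes, move=False):
    """Copy (or move) files with the given suffixes, mirroring the layout of src_dir."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    wanted = {s.lower() for s in suffixes}
    transferred = []
    # sorted() materializes the listing before any file is written or moved
    for src in sorted(src_dir.rglob("*")):
        if not src.is_file() or src.suffix.lower() not in wanted:
            continue
        # same relative path under the destination root
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        if move:
            shutil.move(str(src), str(dst))
        else:
            shutil.copy2(src, dst)  # copy2 also preserves file metadata
        transferred.append(dst)
    return transferred
```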
utils:
- UniLogger: unified logger class (logger = unibox.UniLogger(), then use logger.info(...))
- UniLoader: unified data loader class (unibox.loads(<filename>))
- UniSaver: unified data saver class (unibox.saves(<data>, <filename>))
- UniTraverser: unified directory traverser class, with callbacks at multiple stages
- UniResizer: unified image resizer class, with callbacks at multiple stages
callables:
- unibox.traverses(dir, include, exclude, relative_unix): traverses a directory using the specified include / exclude extensions, and returns a list of files
- unibox.loads(filepath): loads arbitrary data from a file into a suitable format, with automatic detection of file type
  - supported formats: see the UniLoader class implementation
- unibox.saves(data, filepath): saves arbitrary data to a file, with automatic detection of file type