DataVisor: a tool for producing binary aggregated data packs for machine learning training
📊 Visor: Efficient Dataset Management
Visor is a Python library for managing and processing large image datasets. It stores image data together with its associated metadata and OCR information.
🚀 Features
- ✅ Efficient storage of image data, metadata, and OCR information
- ✅ Customizable image resizing options
- ✅ Fast reading and writing of large datasets
- ✅ Built-in data validation and error handling
- ✅ Easy-to-use API for dataset manipulation
- ✅ Support for compressed storage (optional)
🔧 Installation
Visor is currently used directly from source. I plan to turn it into a pip package after more testing; for now, treat the code as half-baked.
🚀 Quick Start
Here's a simple example to get you started with Visor:
from visor import Config, VisorWriter, VisorReader, ImageHandler

# Create a configuration
config = Config(
    data_type='image',
    max_entries_per_file=1000
)

# Write data
# Note: image_paths (a list of image file paths) and ocr_data (the OCR
# information for the image) are placeholders you must define yourself.
writer = VisorWriter(config, output_dir='./output')
handler = ImageHandler(config)
for image_path in image_paths:
    # Read the raw image bytes from disk
    with open(image_path, 'rb') as f:
        image_data = f.read()
    entry = handler.process(image_data, image_path, ocr_data)
    writer.write_entries([entry])
writer.finalize()

# Read data
reader = VisorReader('./output/visor_metadata.json')
for entry in reader:
    # Process each entry
    pass
🔍 Usage
Writing Data
To write data to a Visor file:
- Create a `Config` object with your desired settings.
- Initialize a `VisorWriter` with the config and output directory.
- Use an `ImageHandler` to process your image data.
- Write entries using the `write_entries` method.
- Call `finalize()` to complete the writing process.
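The steps above can be sketched as a small helper that collects processed entries into batches before handing them to the writer. This is a minimal sketch, not part of Visor: `write_in_batches` and `batch_size` are hypothetical names, and the helper assumes only the `process`, `write_entries`, and `finalize` methods listed in this README. Whether batching is faster than writing one entry at a time depends on Visor's internals and is not confirmed here.

```python
from typing import Any, Iterable, List, Tuple


def write_in_batches(
    writer: Any,
    handler: Any,
    items: Iterable[Tuple[bytes, str, Any]],
    batch_size: int = 100,
) -> int:
    """Process (image_bytes, original_name, ocr_data) items and write them in batches.

    `writer` is expected to expose write_entries() and finalize();
    `handler` is expected to expose process(). Returns the number of entries written.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    batch: List[Any] = []
    written = 0
    for image_bytes, original_name, ocr_data in items:
        batch.append(handler.process(image_bytes, original_name, ocr_data))
        if len(batch) >= batch_size:
            writer.write_entries(batch)
            written += len(batch)
            batch = []

    if batch:  # flush the final, possibly short, batch
        writer.write_entries(batch)
        written += len(batch)

    writer.finalize()
    return written
```

With a real `VisorWriter` and `ImageHandler`, `items` would be built by reading each image file in binary mode, as in the Quick Start.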
Reading Data
To read data from a Visor file:
- Create a `VisorReader` with the path to the metadata file.
- Iterate over the reader to access entries.
- Use methods like `get_entry()` or `get_metadata()` for random access.
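As a sketch of the two access patterns above, the helpers below take the first few entries by iteration and fetch specific entries by index. `first_entries` and `fetch_pairs` are hypothetical helpers, not Visor functions; they rely only on the reader being iterable and exposing `get_entry(index)` and `get_metadata(index)` as listed in the API reference. The shape of the returned entries and metadata is not documented here, so the code treats them as opaque values.

```python
from itertools import islice
from typing import Any, Iterable, List, Tuple


def first_entries(reader: Iterable[Any], n: int) -> List[Any]:
    """Return up to n entries using sequential iteration."""
    return list(islice(reader, n))


def fetch_pairs(reader: Any, indices: Iterable[int]) -> List[Tuple[Any, Any]]:
    """Return (entry, metadata) pairs for the given indices using random access."""
    return [(reader.get_entry(i), reader.get_metadata(i)) for i in indices]
```

Sequential iteration suits a full pass over the dataset, while index-based access suits spot checks or sampling.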
⚙️ Configuration
The `Config` class allows you to customize Visor's behavior:
- `data_type`: Type of data ('image' or 'text')
- `image_dimensions`: Target dimensions for resizing (optional)
- `max_entries_per_file`: Maximum number of entries per Visor file
- `compression`: Enable/disable data compression
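To see how `max_entries_per_file` affects the output, the arithmetic below estimates how many Visor files a dataset would produce. It assumes the writer fills each file to the limit before starting the next, which this README does not confirm; `expected_file_count` is a hypothetical helper, not part of Visor.

```python
def expected_file_count(total_entries: int, max_entries_per_file: int) -> int:
    """Estimate the number of Visor files, assuming each file is filled before the next starts."""
    if max_entries_per_file < 1:
        raise ValueError("max_entries_per_file must be at least 1")
    if total_entries < 0:
        raise ValueError("total_entries must not be negative")
    # Integer ceiling division: round up so a partial file still counts
    return (total_entries + max_entries_per_file - 1) // max_entries_per_file


# 2500 entries at 1000 per file: two full files plus one holding 500 entries
print(expected_file_count(2500, 1000))  # 3
```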
📚 API Reference
VisorWriter
- `write_entries(entries)`: Write a list of entries to Visor files
- `finalize()`: Complete the writing process and generate metadata
VisorReader
- `__iter__()`: Iterate over all entries
- `get_entry(index)`: Get a specific entry by index
- `get_metadata(index)`: Get metadata for a specific entry
- `print_summary()`: Display a summary of the dataset
ImageHandler
- `process(data, original_name, ocr_data)`: Process image data and create an Entry
🤝 Contributing
We welcome contributions to Visor! Please see our Contributing Guide for more details.
📄 License
Visor is released under the MIT License. See the LICENSE file for more details.
📧 For any questions or support, please open an issue.
File details
Details for the file datavisor-0.3.0.tar.gz.
File metadata
- Download URL: datavisor-0.3.0.tar.gz
- Upload date:
- Size: 16.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 98d489e5bbeaa87e0b20c02e3e3b3b4c52317dcc90d945340789aca52e351b84
MD5 | d55579dede7c916cb1f896964687948d
BLAKE2b-256 | ca726815c0d6d2348da09807ca0d3aff4ef2fc3ff176fbd144598b0e8d8458de
File details
Details for the file datavisor-0.3.0-py3-none-any.whl.
File metadata
- Download URL: datavisor-0.3.0-py3-none-any.whl
- Upload date:
- Size: 17.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | f20861ed8bb6bace514045a5d26579ae09ac1f6064cf1a6f341d44f27fcf8122
MD5 | 17d4926f4ed60e47f178c410284d04d4
BLAKE2b-256 | 4fb963b49aea93b7e3f75deddf4076c6ac6355b08187ae315fcc54c9f5ee0053
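The digests above can be checked against a downloaded file with Python's standard `hashlib` module. This is a generic sketch, not something Visor provides: `file_digest_hex` is a hypothetical helper, the `"blake2b-256"` label is a name chosen here for illustration, and the path is wherever you saved the download.

```python
import hashlib


def file_digest_hex(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    if algorithm == "blake2b-256":
        # BLAKE2b with a 32-byte (256-bit) digest
        hasher = hashlib.blake2b(digest_size=32)
    else:
        # Any algorithm name hashlib knows, e.g. "sha256" or "md5"
        hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

If a download is intact, calling `file_digest_hex` on it with the default algorithm should return the SHA256 value listed for that file.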