DataVisor: a tool for producing binary aggregated data packs for machine learning training

📊 Visor: Efficient Dataset Management

Visor is a Python library for managing and processing large image datasets. It stores image data together with its associated metadata and OCR information.

🚀 Features

  • ✅ Efficient storage of image data, metadata, and OCR information
  • ✅ Customizable image resizing options
  • ✅ Fast reading and writing of large datasets
  • ✅ Built-in data validation and error handling
  • ✅ Easy-to-use API for dataset manipulation
  • ✅ Support for compressed storage (optional)

🔧 Installation

Visor is currently used directly from source code. I plan to turn it into a pip package after more testing; for now, treat the code as experimental and incomplete.

🚀 Quick Start

Here's a simple example to get you started with Visor:

from visor import Config, VisorWriter, VisorReader, ImageHandler

# Create a configuration
config = Config(
    data_type='image',
    max_entries_per_file=1000
)

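# Write data
writer = VisorWriter(config, output_dir='./output')
handler = ImageHandler(config)

# image_paths and ocr_data are not provided by Visor: supply your own
# iterable of image file paths and the OCR information for each image.
for image_path in image_paths:
    # Read the raw image bytes from disk
    with open(image_path, 'rb') as f:
        image_data = f.read()
    # Build an entry from the bytes, the original file name, and the OCR data
    entry = handler.process(image_data, image_path, ocr_data)
    writer.write_entries([entry])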

writer.finalize()

# Read data
reader = VisorReader('./output/visor_metadata.json')
for entry in reader:
    # Process each entry
    pass

🔍 Usage

Writing Data

To write data to a Visor file:

  1. Create a Config object with your desired settings.
  2. Initialize a VisorWriter with the config and output directory.
  3. Use an ImageHandler to process your image data.
  4. Write entries using the write_entries method.
  5. Call finalize() to complete the writing process.
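
The steps above can be sketched as a small helper that collects entries into batches before each write_entries call. This is a minimal sketch under stated assumptions, not Visor's own code: write_images, ocr_lookup, and batch_size are hypothetical names introduced for illustration, and the only Visor calls relied on are handler.process(data, original_name, ocr_data), writer.write_entries(entries), and writer.finalize() as listed in this README. It also assumes that passing None is acceptable when an image has no OCR data, which this README does not confirm.

```python
from pathlib import Path
from typing import Any, Iterable, Mapping


def write_images(
    writer: Any,
    handler: Any,
    image_paths: Iterable[str],
    ocr_lookup: Mapping[str, Any],
    batch_size: int = 100,
) -> int:
    """Process images and write them in batches; return the number written.

    `writer` and `handler` are expected to behave like a VisorWriter and an
    ImageHandler. `ocr_lookup` (hypothetical) maps an image path to its OCR
    data; paths without an entry are processed with None (an assumption).
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    batch = []
    written = 0
    for image_path in image_paths:
        image_data = Path(image_path).read_bytes()
        entry = handler.process(image_data, image_path, ocr_lookup.get(image_path))
        batch.append(entry)
        if len(batch) >= batch_size:
            writer.write_entries(batch)
            written += len(batch)
            batch = []

    # Flush the final partial batch, then let the writer complete its output.
    if batch:
        writer.write_entries(batch)
        written += len(batch)
    writer.finalize()
    return written
```

Whether batching is faster than writing one entry per call has not been measured here; the helper mainly keeps the five steps in one place.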

Reading Data

To read data from a Visor file:

  1. Create a VisorReader with the path to the metadata file.
  2. Iterate over the reader to access entries.
  3. Use methods like get_entry() or get_metadata() for random access.
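
For a quick look at a dataset, it can help to read only the first few entries. The sketch below relies only on the reader being iterable, as described above; peek is a hypothetical helper name, not part of Visor.

```python
from itertools import islice
from typing import Any, Iterable, List


def peek(reader: Iterable[Any], n: int = 5) -> List[Any]:
    """Return up to the first n entries, stopping iteration after n items."""
    return list(islice(reader, n))
```

With a real dataset this would be called as peek(VisorReader('./output/visor_metadata.json'), 3). Whether the remaining entries stay unread on disk depends on how VisorReader implements iteration, which this README does not specify.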

⚙️ Configuration

The Config class allows you to customize Visor's behavior:

  • data_type: Type of data ('image' or 'text')
  • image_dimensions: Target dimensions for resizing (optional)
  • max_entries_per_file: Maximum number of entries per Visor file
  • compression: Enable/disable data compression
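
To get a feel for max_entries_per_file, the sketch below estimates how many Visor files a dataset would produce. It assumes the writer fills each file up to the limit before starting the next one, which this README does not confirm; estimate_file_count is a hypothetical helper, not part of Visor.

```python
def estimate_file_count(total_entries: int, max_entries_per_file: int) -> int:
    """Estimate the number of output files, assuming each file is filled to the limit."""
    if total_entries < 0:
        raise ValueError("total_entries must not be negative")
    if max_entries_per_file < 1:
        raise ValueError("max_entries_per_file must be at least 1")
    # Integer ceiling division: a partially filled last file still counts as one file.
    return (total_entries + max_entries_per_file - 1) // max_entries_per_file
```

Under that assumption, 2500 entries with max_entries_per_file=1000 would be split across 3 files.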

📚 API Reference

VisorWriter

  • write_entries(entries): Write a list of entries to Visor files
  • finalize(): Complete the writing process and generate metadata

VisorReader

  • __iter__(): Iterate over all entries
  • get_entry(index): Get a specific entry by index
  • get_metadata(index): Get metadata for a specific entry
  • print_summary(): Display a summary of the dataset
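
The random-access methods can be combined to fetch an entry together with its metadata. This is a minimal sketch that uses only get_entry(index) and get_metadata(index) as listed above; entries_with_metadata is a hypothetical helper, and the return values are treated as opaque because this README does not describe their types.

```python
from typing import Any, Iterable, Iterator, Tuple


def entries_with_metadata(
    reader: Any, indices: Iterable[int]
) -> Iterator[Tuple[int, Any, Any]]:
    """Yield (index, entry, metadata) for each requested index."""
    for index in indices:
        yield index, reader.get_entry(index), reader.get_metadata(index)
```

Because it is a generator, each entry is fetched only when the caller asks for the next item.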

ImageHandler

  • process(data, original_name, ocr_data): Process image data and create an Entry

🤝 Contributing

We welcome contributions to Visor! Please see our Contributing Guide for more details.

📄 License

Visor is released under the MIT License. See the LICENSE file for more details.


📧 For any questions or support, please open an issue

Download files

Source Distribution

datavisor-0.3.0.tar.gz (16.5 kB)

Uploaded Source

Built Distribution

datavisor-0.3.0-py3-none-any.whl (17.6 kB)

Uploaded Python 3

File details

Details for the file datavisor-0.3.0.tar.gz.

File metadata

  • Download URL: datavisor-0.3.0.tar.gz
  • Size: 16.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.10

File hashes

Hashes for datavisor-0.3.0.tar.gz
Algorithm Hash digest
SHA256 98d489e5bbeaa87e0b20c02e3e3b3b4c52317dcc90d945340789aca52e351b84
MD5 d55579dede7c916cb1f896964687948d
BLAKE2b-256 ca726815c0d6d2348da09807ca0d3aff4ef2fc3ff176fbd144598b0e8d8458de
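
A downloaded file can be checked against the SHA256 digest listed above. The sketch below uses only Python's standard hashlib module; sha256_of_file and matches_sha256 are hypothetical helper names, and the file path is whatever location you saved the download to.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, matches_sha256('datavisor-0.3.0.tar.gz', '<digest from the list above>') should return True for an intact download.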

File details

Details for the file datavisor-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: datavisor-0.3.0-py3-none-any.whl
  • Size: 17.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.10

File hashes

Hashes for datavisor-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f20861ed8bb6bace514045a5d26579ae09ac1f6064cf1a6f341d44f27fcf8122
MD5 17d4926f4ed60e47f178c410284d04d4
BLAKE2b-256 4fb963b49aea93b7e3f75deddf4076c6ac6355b08187ae315fcc54c9f5ee0053
