
A Visionary Toolkit Made With Peace & Love


Overview

Introduction


vkit is a toolkit designed for CV (Computer Vision) developers, especially targeting document image analysis and optical character recognition workloads:

  • Supporting rich data augmentation strategies:
    • Common photometric distortion strategies such as various colorspace manipulation methods and image noise related techniques
    • ⭐ Common geometric distortion strategies such as various affine transformations and non-linear transformations (e.g. similarity MLS, camera-model based 3D surface curving, folding effect, etc.)
    • ⭐ Transforming labeled data together with the image during geometric distortion. For example, when an image is rotated, vkit rotates the corresponding positional labels (e.g. image masks, polygons) at the same time, without manual intervention.
  • Supporting comprehensive data type encapsulation and the corresponding visualization:
    • Image type (encapsulation based on PIL, supporting reading/writing various image file types)
    • Labeled data type: mask, score map, box, polygon and so on
  • Industrial-grade code quality:
    • Auto-completion and type hint friendly, making it practical for production use
    • Mature packaging and dependency management
    • Automated code style enforcement (based on flake8) and static type checking (based on pyright)

Remark: ⭐ marks highlights (features that similar projects either lack or do not support elegantly)
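
The idea of transforming labels together with the image can be illustrated without vkit itself. The following is a minimal sketch in plain Python, not vkit's API: the names `rotation_affine`, `transform_points` and `warp_mask` are hypothetical helpers introduced only for this illustration. It builds one affine matrix for a rotation and applies that same matrix to polygon vertices and to a mask, so the labels stay aligned.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Affine = Tuple[Tuple[float, float, float], Tuple[float, float, float]]


def rotation_affine(angle_deg: float, center: Point) -> Affine:
    """Build a 2x3 affine matrix that rotates points around `center`."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = center
    # p' = R (p - c) + c  ==>  translation part is c - R c.
    return (
        (cos_t, -sin_t, cx - cos_t * cx + sin_t * cy),
        (sin_t, cos_t, cy - sin_t * cx - cos_t * cy),
    )


def transform_points(affine: Affine, points: List[Point]) -> List[Point]:
    """Apply the affine matrix to polygon vertices given as (x, y) pairs."""
    (a, b, tx), (c, d, ty) = affine
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


def warp_mask(affine: Affine, mask: List[List[int]]) -> List[List[int]]:
    """Warp a mask with the same matrix (inverse mapping, nearest neighbour)."""
    (a, b, tx), (c, d, ty) = affine
    det = a * d - b * c
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Map each output pixel back to the source pixel it came from.
            dx, dy = x - tx, y - ty
            src_x = round((d * dx - b * dy) / det)
            src_y = round((-c * dx + a * dy) / det)
            if 0 <= src_x < width and 0 <= src_y < height:
                out[y][x] = mask[src_y][src_x]
    return out


# One matrix drives both the polygon and the mask.
affine = rotation_affine(90.0, center=(2.0, 2.0))
polygon = [(3.0, 2.0), (4.0, 2.0), (4.0, 3.0)]
rotated_polygon = transform_points(affine, polygon)

mask = [[0] * 5 for _ in range(5)]
mask[2][3] = 1  # row y=2, column x=3
rotated_mask = warp_mask(affine, mask)
```

Deriving every label transformation from a single matrix is what keeps the image and its annotations consistent. Non-linear distortions need a per-pixel mapping instead of one matrix, but the principle of sharing the mapping is the same.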

Demo!

camera_cubic_curve:

home_page_camera_cubic_curve.gif

rotate:

home_page_camera_cubic_curve.gif

Objectives

The author, a CV/NLP engineer, hopes this project will make work easier for developers in the aforementioned disciplines:

  • To free developers from tedious data governance tasks, so that more time can be spent on high-value work such as data governance strategy, model design and fine-tuning
  • To consolidate common data augmentation techniques in support of document image analysis and recognition research and its industrial practice. The author wishes to make the "secret sauce", i.e. industrial-grade data augmentation methods, available to the public
  • To construct open-source industrial document image analysis and recognition solutions powered by vkit:
    • Distortion correction
    • Super-resolution
    • OCR
    • Layout Analysis
    • ...

Installation

CPython version requirement: 3.8 or above

To install the stable release:

pip install vkit

To install the nightly version (tracking the latest commit on the main branch):

pip install vkit-nightly

(click here to visit the nightly documentation)
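
After installing, it can be useful to confirm that the interpreter meets the stated requirement and to see which distribution was actually installed. The sketch below uses only the standard library; the helper names `python_is_supported` and `installed_version` are hypothetical, and the distribution names are taken from the two `pip install` commands above.

```python
import sys
from importlib import metadata
from typing import Optional, Sequence


def python_is_supported(version_info: Sequence[int] = sys.version_info) -> bool:
    """Check the stated requirement: CPython 3.8 or above."""
    return tuple(version_info[:2]) >= (3, 8)


def installed_version(
    dist_names: Sequence[str] = ("vkit", "vkit-nightly"),
) -> Optional[str]:
    """Return 'name version' for the first installed distribution, else None."""
    for name in dist_names:
        try:
            return f"{name} {metadata.version(name)}"
        except metadata.PackageNotFoundError:
            continue
    return None


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("Installed:", installed_version() or "neither vkit nor vkit-nightly found")
```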

Recent release plans

  • 22.2.0
    • Improve the design of element classes
    • Improve element visualization
    • Support dataset pipeline for OCR text detection
    • Support CPython 3.10
  • 22.3.0
    • Support dataset pipeline for OCR text recognition
  • 22.3.1
    • Improve documentation
    • Release resources for pipelines

Recent stable releases

  • 22.1.0
    • Use the CalVer versioning convention
    • Complete CI testing pipeline
    • Redesign project structure
    • Support font rendering
    • Add more data augmentation methods
    • Support data augmentation policy
  • 0.1.2
    • Remove strict dependency versioning
  • 0.1.1
    • User manual (English version)
    • GitHub Page for serving user manual
  • 0.1.0
    • Support CPython 3.9
    • Support CPython 3.8
    • Image type encapsulation
    • Labeled data type encapsulation
    • Common photometric distortion strategies
    • Common geometric distortion strategies
    • User manual

Communication

Your understanding is appreciated if responses on these forums are slow: the author has a full-time job and cannot devote all of his time to this project.


Built Distribution

vkit_nightly-22.703.519-py3-none-any.whl (138.3 kB view details)

Uploaded Python 3

File details

Details for the file vkit_nightly-22.703.519-py3-none-any.whl.

File metadata

File hashes

Hashes for vkit_nightly-22.703.519-py3-none-any.whl

Algorithm     Hash digest
SHA256        6aae0935d0b15178fda1bcdc488b8818b717d44d2e4d4f195fe03d672fd1fba5
MD5           050d52f5825f75c60cc9d0a0fa318de5
BLAKE2b-256   cc51078cfe1418162cf1818dd00f5eeae03ce4d56d81eea7ac818d7f51353ef4
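
A downloaded wheel can be checked against the SHA256 digest listed above. This is a minimal sketch using the standard `hashlib` module; the helper names and the local file path are assumptions for illustration, and only the digest value comes from the listing.

```python
import hashlib
from pathlib import Path

# SHA256 digest copied from the listing above.
EXPECTED_SHA256 = "6aae0935d0b15178fda1bcdc488b8818b717d44d2e4d4f195fe03d672fd1fba5"


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 with a published hex digest, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Hypothetical local path: adjust to wherever the wheel was downloaded.
    wheel = Path("vkit_nightly-22.703.519-py3-none-any.whl")
    if wheel.exists():
        print("hash OK" if matches_published_hash(wheel, EXPECTED_SHA256) else "hash MISMATCH")
```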
