
docling-cvat-tools

CVAT annotation tools for Docling document processing and evaluation.

This package provides tools for parsing, validating, converting, and visualizing CVAT (Computer Vision Annotation Tool) annotations in Docling document processing and evaluation workflows.

Features

  • CVAT XML Parsing: Parse and validate CVAT XML annotation files
  • Document Conversion: Convert CVAT annotations to DoclingDocument format
  • Validation: Validate CVAT annotations for correctness and completeness
  • Visualization: Generate HTML visualizations of annotated documents
  • CLI Tools: Command-line utilities for common CVAT workflows
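For orientation, here is a standard-library sketch that reads bounding boxes from a CVAT XML export. It assumes the "CVAT for images 1.1" layout (`<image>` elements with a `name` attribute, each containing `<box>` elements with `label`, `xtl`, `ytl`, `xbr`, `ybr`). The `Box` dataclass, `read_boxes` function, and the `text`/`picture` labels are hypothetical illustrations and are not part of docling-cvat-tools, whose own parser is `parse_cvat_file` in `docling_cvat_tools.cvat_tools.parser`.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """One rectangle annotation (hypothetical helper type, not the package's data model)."""

    image: str
    label: str
    xtl: float
    ytl: float
    xbr: float
    ybr: float


def read_boxes(xml_text: str) -> list[Box]:
    """Collect every <box> under every <image> in a CVAT for images 1.1 export (assumed layout)."""
    root = ET.fromstring(xml_text)
    boxes: list[Box] = []
    for image in root.iter("image"):
        for box in image.iter("box"):
            boxes.append(
                Box(
                    image=image.get("name", ""),
                    label=box.get("label", ""),
                    xtl=float(box.get("xtl", "0")),
                    ytl=float(box.get("ytl", "0")),
                    xbr=float(box.get("xbr", "0")),
                    ybr=float(box.get("ybr", "0")),
                )
            )
    return boxes


# Made-up sample export with one page image and two boxes.
SAMPLE = """<annotations>
  <version>1.1</version>
  <image id="0" name="page_000001.png" width="1000" height="1400">
    <box label="text" xtl="10" ytl="20" xbr="300" ybr="60" occluded="0"/>
    <box label="picture" xtl="50" ytl="100" xbr="500" ybr="400" occluded="0"/>
  </image>
</annotations>"""

if __name__ == "__main__":
    for b in read_boxes(SAMPLE):
        print(b.image, b.label, (b.xtl, b.ytl, b.xbr, b.ybr))
```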

Installation

pip install docling-cvat-tools

Or install as an optional dependency of docling-eval:

pip install "docling-eval[campaign-tools]"

Requirements

  • Python >=3.10,<4.0
  • docling-core (document types)
  • docling (for document processing)
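A quick environment check can be sketched with the standard library: it tests an interpreter version against the stated `>=3.10,<4.0` range and looks up the installed distribution version through `importlib.metadata`. The function names are illustrative and not part of the package.

```python
from __future__ import annotations

import sys
from importlib.metadata import PackageNotFoundError, version


def meets_python_requirement(info: tuple[int, ...] | None = None) -> bool:
    """Check a (major, minor, ...) tuple against >=3.10,<4.0; defaults to the running interpreter."""
    if info is None:
        info = tuple(sys.version_info[:2])
    return (3, 10) <= info < (4, 0)


def installed_version(dist_name: str = "docling-cvat-tools") -> str | None:
    """Return the installed distribution version, or None when it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python OK:", meets_python_requirement())
    print("docling-cvat-tools:", installed_version() or "not installed")
```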

Usage

CLI Tools

Validate CVAT annotations

docling-cvat-validator path/to/annotations.xml
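To validate several exports in one go, the CLI can be driven from Python. This sketch assumes the validator takes one XML path per call, exactly as in the command above, and that a non-zero exit status signals a failed validation; that exit-code convention is an assumption, not something documented here. `build_validate_cmd` and `validate_many` are hypothetical helper names.

```python
import shutil
import subprocess
from pathlib import Path

VALIDATOR = "docling-cvat-validator"


def build_validate_cmd(xml_path: Path) -> list[str]:
    """Build the argv for one validation run, mirroring the documented usage."""
    return [VALIDATOR, str(xml_path)]


def validate_many(xml_paths: list[Path]) -> dict[Path, int]:
    """Run the validator once per file and collect exit codes.

    Assumption: a non-zero exit code means the file failed validation.
    """
    if shutil.which(VALIDATOR) is None:
        raise RuntimeError(f"{VALIDATOR} not found on PATH; is docling-cvat-tools installed?")
    results: dict[Path, int] = {}
    for path in xml_paths:
        completed = subprocess.run(build_validate_cmd(path), check=False)
        results[path] = completed.returncode
    return results
```

For example, `validate_many(sorted(Path("exports").glob("*.xml")))` would check every XML file in a hypothetical `exports/` directory.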

Convert CVAT to DoclingDocument

docling-cvat-to-docling --input_path path/to/cvat_folder --output-dir output/
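When several CVAT folders need converting, a small wrapper can reuse the command above. The flags are copied verbatim from it; writing each folder's output to its own subdirectory of the output root is an illustrative choice, and `build_convert_cmd` / `run_conversions` are hypothetical names.

```python
import subprocess
from pathlib import Path

CONVERTER = "docling-cvat-to-docling"


def build_convert_cmd(cvat_folder: Path, output_root: Path) -> list[str]:
    """Reuse the documented flags; give each input folder its own output subdirectory."""
    return [
        CONVERTER,
        "--input_path",
        str(cvat_folder),
        "--output-dir",
        str(output_root / cvat_folder.name),
    ]


def run_conversions(cvat_folders: list[Path], output_root: Path) -> None:
    """Convert folders one after another; check=True stops at the first non-zero exit."""
    for folder in cvat_folders:
        subprocess.run(build_convert_cmd(folder, output_root), check=True)
```

For example, `run_conversions(sorted(p for p in Path("batches").iterdir() if p.is_dir()), Path("output"))` would convert every subfolder of a hypothetical `batches/` directory.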

Python API

from pathlib import Path

from docling_cvat_tools.cvat_tools.parser import parse_cvat_file
from docling_cvat_tools.cvat_tools.cvat_to_docling import convert_cvat_to_docling
from docling_cvat_tools.cvat_tools.validator import validate_cvat_sample

# Parse a CVAT XML annotation file
parsed = parse_cvat_file(Path("annotations.xml"))

# Validate the annotations for a single page image
validation_result = validate_cvat_sample(
    xml_path=Path("annotations.xml"),
    image_filename="page_000001.png",
)

# Convert CVAT annotations (together with the source PDF) to DoclingDocument format
results = convert_cvat_to_docling(
    xml_path=Path("annotations.xml"),
    input_path=Path("document.pdf"),
    image_identifier="page_000001.png",
    output_dir=Path("output"),
)
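Because `validate_cvat_sample` works on one `image_filename` at a time, validating a whole export means looping over the page images it contains. The sketch below lists image names with `xml.etree.ElementTree` (assuming the CVAT for images 1.1 `<image name="...">` layout) and calls a validator once per page. The validator is passed in as a parameter, so the helpers `list_image_names` and `validate_every_page` (hypothetical names) do not depend on the package's internals beyond the keyword arguments shown in the example above.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Callable


def list_image_names(xml_path: Path) -> list[str]:
    """Return the name attribute of every <image> element (CVAT for images 1.1 layout assumed)."""
    root = ET.parse(xml_path).getroot()
    return [img.get("name", "") for img in root.iter("image")]


def validate_every_page(xml_path: Path, validate: Callable[..., object]) -> dict[str, object]:
    """Call validate(xml_path=..., image_filename=...) once per image and key results by name."""
    return {
        name: validate(xml_path=xml_path, image_filename=name)
        for name in list_image_names(xml_path)
    }
```

With the package installed, the real function can be passed in: `validate_every_page(Path("annotations.xml"), validate_cvat_sample)`.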

Integration with docling-eval

This package is designed to work with docling-eval. When installed as an optional dependency (the campaign-tools extra shown under Installation), it enables CVAT-specific features in the evaluation framework:

  • CVAT dataset builders (CvatDatasetBuilder, CvatPreannotationBuilder)
  • CVAT evaluation pipelines
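Optional dependencies like this are commonly gated at runtime. The sketch below shows that generic pattern using `importlib.util.find_spec`; it is an illustration of the technique, not docling-eval's actual code, and `cvat_tools_available` / `require_cvat_tools` are hypothetical names.

```python
import importlib.util


def cvat_tools_available() -> bool:
    """True when the optional docling_cvat_tools package can be found on the import path."""
    return importlib.util.find_spec("docling_cvat_tools") is not None


def require_cvat_tools() -> None:
    """Raise a helpful ImportError before any CVAT-specific feature is used."""
    if not cvat_tools_available():
        raise ImportError(
            "CVAT features need docling-cvat-tools; install with: "
            'pip install "docling-eval[campaign-tools]"'
        )
```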

Package Structure

  • docling_cvat_tools.cvat_tools: Core CVAT parsing, conversion, and validation
  • docling_cvat_tools.datamodels: CVAT-specific data models
  • docling_cvat_tools.visualisation: HTML visualization utilities
  • docling_cvat_tools.cli: Command-line interface tools
  • docling_cvat_tools.utils: Utility functions

Development

# Install in development mode
uv sync

# Run tests
uv run pytest

License

MIT



Download files


Source Distribution

docling_cvat_tools-0.0.1.tar.gz (80.0 kB)

Uploaded Source

Built Distribution


docling_cvat_tools-0.0.1-py3-none-any.whl (81.6 kB)

Uploaded Python 3

File details

Details for the file docling_cvat_tools-0.0.1.tar.gz.

File metadata

  • Download URL: docling_cvat_tools-0.0.1.tar.gz
  • Upload date:
  • Size: 80.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for docling_cvat_tools-0.0.1.tar.gz
Algorithm     Hash digest
SHA256        9d9fab119d8db24d46bd73a10fcd5eebfaa604f0d67e9c6d37762904695bc864
MD5           7b70c2575da03a0bad9703c48be94f2f
BLAKE2b-256   b0418a2e84ae30b83d134aad88cb59be63e1f587f524b406cb9efdd669e67e40
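To check a downloaded file against a published digest, a small standard-library sketch follows; the function names are illustrative. pip can also enforce hashes itself through its hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).

```python
import hashlib
from pathlib import Path

# Published SHA256 for the source distribution, copied from the table above.
SDIST_SHA256 = "9d9fab119d8db24d46bd73a10fcd5eebfaa604f0d67e9c6d37762904695bc864"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in fixed-size chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare the file's SHA256 with an expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

For example, `verify(Path("docling_cvat_tools-0.0.1.tar.gz"), SDIST_SHA256)` should return `True` for an intact download.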


Provenance

The following attestation bundles were made for docling_cvat_tools-0.0.1.tar.gz:

Publisher: pypi.yml on docling-project/docling-cvat-tools

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file docling_cvat_tools-0.0.1-py3-none-any.whl.

File metadata

File hashes

Hashes for docling_cvat_tools-0.0.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        bc966f2854e26a2dd64ccd7977f8ded8f2651bf8436dcea222120942bc3108c6
MD5           3e02a1a9b1a34dce60d7f6c7218942a1
BLAKE2b-256   0a67600cd20ddcbb53fa2867a45d9854005817cdd925760261dc5ae4fa4a7cae


Provenance

The following attestation bundles were made for docling_cvat_tools-0.0.1-py3-none-any.whl:

Publisher: pypi.yml on docling-project/docling-cvat-tools

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
