
Export validated JSON to standard formats.


Dorsal

Stop writing custom parsers. Export AI-extracted JSON to industry-standard formats instantly.


Dorsal Adapters translates validated JSON records into various industry-standard formats.

Supported Formats

Dorsal Adapters currently supports two-way conversion (exporting and parsing) for the following schemas and formats:

Document Extraction (open/document-extraction)

Convert complex spatial bounding boxes, text blocks, and multi-polygons into layout-aware formats:

  • md: RAG-Optimized Markdown — Injects semantic headings, hallucination warnings, and visual placeholders directly into the text stream for LLM consumption.
  • html: Semantic HTML (.html) — Renders a responsive, visually inferred 2D DOM layout from raw spatial coordinates.
  • hocr: hOCR (.hocr.html) — An industry-standard OCR output format embedding layout, confidence scores, and style information in standard HTML.
  • tsv: Tab-Separated Values — Perfect for spreadsheet ingestion and tabular data analysis.
  • txt: Plain Text — Flattens the document layout into clean, stitched paragraphs.
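To make the hOCR target concrete, here is a minimal sketch of how a single recognised word with a bounding box and a confidence score could be written as an hOCR element. The `ocrx_word` class and the `bbox` / `x_wconf` title properties come from the hOCR format itself. The function name, its arguments, and the assumption that confidence arrives as a 0 to 1 float are hypothetical and are not taken from the Dorsal schema or the adapter's implementation.

```python
from html import escape


def word_to_hocr(text: str, bbox: tuple, confidence: float) -> str:
    """Render one word as an hOCR span (illustrative only, not the library's code).

    bbox is assumed to be (x0, y0, x1, y1) in pixels; confidence is assumed
    to be a float between 0 and 1.
    """
    x0, y0, x1, y1 = (int(round(v)) for v in bbox)
    wconf = int(round(confidence * 100))  # hOCR expresses x_wconf as 0-100
    title = f"bbox {x0} {y0} {x1} {y1}; x_wconf {wconf}"
    # Escape the text so characters such as & and < stay valid HTML
    return f"<span class='ocrx_word' title='{title}'>{escape(text)}</span>"


print(word_to_hocr("R&D", (10, 20, 110, 40), 0.97))
```

A full hOCR document nests such spans inside line, paragraph, and page elements; the sketch covers only the innermost level.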

Audio Transcription (open/audio-transcription)

Convert rich transcription data (including speaker diarization, non-verbal events, and timestamps) into standard media formats:

  • srt: SubRip Text (.srt) — The most widely used plaintext subtitle format.
  • vtt: WebVTT (.vtt) — The W3C standard web subtitle format for HTML5 video players.
  • md: RAG-Optimized Markdown — Merges speaker tags, non-verbal events (e.g., [laughter]), and low-confidence warnings into clean markdown.
  • tsv: Tab-Separated Values — Organizes segments, start/end times, and speakers into a neat table.
  • txt: Plain Text — A continuous, readable transcript.
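The two subtitle formats differ mainly in their framing: SRT numbers each cue and uses a comma before the milliseconds, while WebVTT starts with a `WEBVTT` header and uses a period. The following standalone sketch illustrates that difference using the `start_time`, `end_time`, and `text` segment fields shown in the usage example. The helper names are hypothetical, and this is not the adapter's actual implementation, which also handles speakers and non-verbal events.

```python
def format_timestamp(seconds: float, decimal_sep: str) -> str:
    """Render seconds as HH:MM:SS<sep>mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_sep}{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """SRT: numbered cues, comma before the milliseconds."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start_time"], ",")
        end = format_timestamp(seg["end_time"], ",")
        cues.append(f"{index}\n{start} --> {end}\n{seg['text']}\n")
    return "\n".join(cues)


def segments_to_vtt(segments: list[dict]) -> str:
    """WebVTT: mandatory header, period before the milliseconds."""
    cues = ["WEBVTT\n"]
    for seg in segments:
        start = format_timestamp(seg["start_time"], ".")
        end = format_timestamp(seg["end_time"], ".")
        cues.append(f"{start} --> {end}\n{seg['text']}\n")
    return "\n".join(cues)


segments = [{"start_time": 0.5, "end_time": 4.75, "text": "Welcome back!"}]
print(segments_to_srt(segments))
print(segments_to_vtt(segments))
```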

Installation

Dorsal Adapters is available on PyPI as dorsalhub-adapters:

pip install dorsalhub-adapters

Usage

Adapters are Python classes with methods for exporting to and parsing from the supported file formats:

  • export(record) / export_file(record, fp): Converts a JSON record into a standard format.
  • parse(content) / parse_file(fp): Best-effort conversion from a standard format back into a Dorsal JSON Record.
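Parsing is described as best-effort because a target format usually carries less information than the JSON record it came from. SRT, for instance, stores only cue numbers, timings, and text, so record-level fields such as a language code cannot be recovered from the file alone. The standalone sketch below shows the idea with a small regex-based reader. `parse_srt` is a hypothetical helper written for illustration; it is not the library's parser and may not match the exact record the real adapter returns.

```python
import re

# One SRT cue: index line, timing line, then text up to a blank line or end of input
CUE_PATTERN = re.compile(
    r"(\d+)\s*\n"
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})\s*\n"
    r"(.*?)(?=\n\s*\n|\Z)",
    re.DOTALL,
)


def _to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(content: str) -> dict:
    """Recover segments from SRT text (illustrative only)."""
    segments = []
    for match in CUE_PATTERN.finditer(content.strip()):
        groups = match.groups()
        segments.append({
            "start_time": _to_seconds(*groups[1:5]),
            "end_time": _to_seconds(*groups[5:9]),
            "text": groups[9].strip(),
        })
    # Only what the file contains can be returned: timings and text
    return {"segments": segments}


srt_text = (
    "1\n00:00:00,500 --> 00:00:04,750\nWelcome back!\n\n"
    "2\n00:00:05,000 --> 00:00:06,000\nThanks.\n"
)
print(parse_srt(srt_text))
```

If you need a lossless round trip, keep the original JSON record alongside the exported file rather than relying on parsing alone.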

Example: Audio to Subtitles (SRT)

In this example, a valid open/audio-transcription record is exported as SRT subtitle text, then parsed back into a Dorsal record.

from dorsal_adapters.registry import get_adapter

# 1. The raw JSON record from your model
transcription = {
    "track_id": 1,
    "language": "eng",
    "segments": [
        {
            "start_time": 0.5,
            "end_time": 4.75,
            "text": "Welcome back! Today, my guest is the renowned chef, Jean-Pierre."
        }
    ]
}

# 2. Retrieve the adapter for the schema and target format
adapter = get_adapter("audio-transcription", "srt")

# 3. Export to the target format (.srt)
srt_string = adapter.export(transcription)
print(srt_string)

# 4. Parse the formatted string back into a Dorsal record
parsed_record = adapter.parse(srt_string)

Tip: You can programmatically check what formats are supported for a given schema using list_formats:

from dorsal_adapters.registry import list_formats
print(list_formats("document-extraction"))

Contributing

We welcome contributions! If you have written a translation script for an Open Validation Schema, please open a PR.

License

Dorsal Adapters is open source and provided under the Apache 2.0 license.
