Export validated JSON to standard formats.
Stop writing custom parsers. Export AI-extracted JSON to industry-standard formats instantly.
Dorsal Adapters translates validated JSON records into various industry-standard formats.
Supported Formats
Dorsal Adapters currently supports two-way conversion (exporting and parsing) for the following domains and formats:
Document Extraction (open/document-extraction)
Convert complex spatial bounding boxes, text blocks, and multi-polygons into layout-aware formats:
- md: RAG-Optimized Markdown. Injects semantic headings, hallucination warnings, and visual placeholders directly into the text stream for LLM consumption.
- html: Semantic HTML (.html). Renders a responsive, visually inferred 2D DOM layout from raw spatial coordinates.
- hocr: hOCR (.hocr.html). An industry-standard OCR output format embedding layout, confidence scores, and style information in standard HTML.
- tsv: Tab-Separated Values. Perfect for spreadsheet ingestion and tabular data analysis.
- txt: Plain Text. Flattens the document layout into clean, stitched paragraphs.
Audio Transcription (open/audio-transcription)
Convert rich transcription data (including speaker diarization, non-verbal events, and timestamps) into standard media formats:
- srt: SubRip Text (.srt). The most widely used plaintext subtitle format.
- vtt: WebVTT (.vtt). The W3C standard web subtitle format for HTML5 video players.
- md: RAG-Optimized Markdown. Merges speaker tags, non-verbal events (e.g., [laughter]), and low-confidence warnings into clean markdown.
- tsv: Tab-Separated Values. Organizes segments, start/end times, and speakers into a neat table.
- txt: Plain Text. A continuous, readable transcript.
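The visible difference between SRT and WebVTT cue timings is the decimal mark: SRT writes `00:00:04,750` and WebVTT writes `00:00:04.750`. The sketch below shows how a segment's `start_time` and `end_time` (in seconds) could be rendered in each style. The helper names `format_timestamp` and `cue_timing` are hypothetical and are not part of Dorsal Adapters; the library's own formatting may differ.

```python
def format_timestamp(seconds: float, decimal_mark: str = ",") -> str:
    """Render seconds as HH:MM:SS<mark>mmm (SRT uses ',', WebVTT uses '.')."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_mark}{millis:03d}"


def cue_timing(start: float, end: float, fmt: str = "srt") -> str:
    """Build the timing line of one cue for 'srt' or 'vtt'."""
    mark = "," if fmt == "srt" else "."
    return f"{format_timestamp(start, mark)} --> {format_timestamp(end, mark)}"


print(cue_timing(0.5, 4.75))         # SRT style
print(cue_timing(0.5, 4.75, "vtt"))  # WebVTT style
```

Working in integer milliseconds before splitting into hours, minutes, and seconds avoids rounding a value such as 59.9996 into an invalid `60` seconds field.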
Installation
Dorsal Adapters is available on PyPI as dorsalhub-adapters:
pip install dorsalhub-adapters
Usage
Adapters are Python classes with methods for exporting to and parsing from the supported file formats:
- export(record) / export_file(record, fp): Converts a JSON record into a standard format.
- parse(content) / parse_file(fp): Best-effort conversion from a standard format back into a Dorsal JSON record.
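To make the four-method shape concrete, here is a toy plain-text adapter written from scratch. `ToyTextAdapter` is a hypothetical stand-in, not code from the library, and it assumes that `fp` is a filesystem path; the real adapters may accept other types and behave differently.

```python
from pathlib import Path


class ToyTextAdapter:
    """Hypothetical stand-in with the four-method shape; not the library's code."""

    def export(self, record: dict) -> str:
        # One line of text per segment, in order.
        return "\n".join(seg["text"] for seg in record["segments"])

    def export_file(self, record: dict, fp: str) -> None:
        # Assumption: fp is a filesystem path.
        Path(fp).write_text(self.export(record), encoding="utf-8")

    def parse(self, content: str) -> dict:
        # Best effort: plain text stores no timings, so only the text comes back.
        lines = [line for line in content.splitlines() if line.strip()]
        return {"segments": [{"text": line} for line in lines]}

    def parse_file(self, fp: str) -> dict:
        return self.parse(Path(fp).read_text(encoding="utf-8"))
```

The file variants simply wrap the string variants, which keeps the format logic in one place. The toy `parse` also shows why parsing is described as best-effort: a target format can only give back what it stores.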
Example: Audio to Subtitles (SRT)
In this example, a valid open/audio-transcription record is exported as SRT subtitle text and then parsed back into a record.
from dorsal_adapters.registry import get_adapter

# 1. The raw JSON record from your model
transcription = {
    "track_id": 1,
    "language": "eng",
    "segments": [
        {
            "start_time": 0.5,
            "end_time": 4.75,
            "text": "Welcome back! Today, my guest is the renowned chef, Jean-Pierre."
        }
    ]
}

# 2. Retrieve the adapter for the schema and target format
adapter = get_adapter("audio-transcription", "srt")

# 3. Export to the target format (.srt)
srt_string = adapter.export(transcription)
print(srt_string)

# 4. Parse the formatted string back into a Dorsal record
parsed_record = adapter.parse(srt_string)
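An SRT cue holds only an index, a timing line, and text, so fields such as `track_id` and `language` in the record above have no slot in the exported file. The following stand-alone sketch shows which fields a round trip through SRT can recover. `parse_srt_segments` is a hypothetical helper written for illustration; it is not the adapter's implementation, and the shape of the record returned by `adapter.parse` may differ.

```python
import re

# Matches an SRT timing line such as "00:00:00,500 --> 00:00:04,750".
CUE_TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt_segments(srt: str) -> list[dict]:
    """Recover start_time, end_time, and text from each SRT cue."""
    segments = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = CUE_TIMING.search(line)
            if match:
                groups = match.groups()
                segments.append({
                    "start_time": to_seconds(*groups[:4]),
                    "end_time": to_seconds(*groups[4:]),
                    # Everything after the timing line is the cue text.
                    "text": "\n".join(lines[i + 1:]).strip(),
                })
                break
    return segments
```

Comparing the parsed result with the original record field by field is a simple way to see what a given format preserves before relying on the round trip.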
Tip: You can programmatically check what formats are supported for a given schema using list_formats:
from dorsal_adapters.registry import list_formats
print(list_formats("document-extraction"))
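Combining the two registry functions, one record can be exported to every available format in a single loop. In the sketch below, `export_all` is a hypothetical helper that takes the two functions as arguments, so it runs without the package installed. It assumes that `list_formats` returns an iterable of format-name strings accepted by `get_adapter`, which the text above does not spell out.

```python
from typing import Any, Callable, Iterable


def export_all(
    record: dict,
    schema: str,
    list_formats: Callable[[str], Iterable[str]],
    get_adapter: Callable[[str, str], Any],
) -> dict:
    """Export one record to every format reported for the schema."""
    outputs = {}
    for fmt in list_formats(schema):
        # Assumption: each listed name is a valid target for get_adapter.
        adapter = get_adapter(schema, fmt)
        outputs[fmt] = adapter.export(record)
    return outputs
```

With the package installed, the two callables would be the `list_formats` and `get_adapter` functions imported from `dorsal_adapters.registry`, for example `export_all(transcription, "audio-transcription", list_formats, get_adapter)`.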
Contributing
We welcome contributions! If you have written a translation script for an Open Validation Schema, please open a PR.
License
Dorsal Adapters is open source and provided under the Apache 2.0 license.
File details
Details for the file dorsalhub_adapters-0.2.0.tar.gz.
File metadata
- Download URL: dorsalhub_adapters-0.2.0.tar.gz
- Upload date:
- Size: 37.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.5.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c91e911181cbb5c07321c3178004710f42aec04d5f4918e70fa4985449853eac |
| MD5 | bb11612cf1c91e0ddf62ac773a04c67c |
| BLAKE2b-256 | 1de89a54670da1034d4d80fffc7d7d68ac1e6cb066b9f3a1913f3e0fe96a9a3c |
File details
Details for the file dorsalhub_adapters-0.2.0-py3-none-any.whl.
File metadata
- Download URL: dorsalhub_adapters-0.2.0-py3-none-any.whl
- Upload date:
- Size: 39.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.5.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 279ab20265b49a190ed708ba6e1e55b9088318d6304f79dcc215fa18c2acd927 |
| MD5 | 1403acbd94ed7b21e6c909216727678c |
| BLAKE2b-256 | 6e6580ac85176c7b22e71803bae14c89cc9b551c5032f3dd669a6926b1b454b1 |