Aliyun Bailian powered audio/video to DoclingDocument transcriber
docling-av-transcriber
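A standalone, Docling-compatible library that converts audio and video into structured documents. It reuses the Docling document model while switching the underlying speech recognition to Aliyun Bailian ASR (Paraformer, SenseVoice, and others).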
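Features
- Supports audio formats such as WAV, MP3, and FLAC, and video formats such as MP4, AVI, and MOV (the audio track is extracted first)
- AliyunBailianAsrClient wraps the Bailian ASR API behind a single interface and uses paraformer-v1 by default
- Outputs a DoclingDocument that plugs directly into the Docling ecosystem
- Manages the API key through the ALIYUN_BAILIAN_API_KEY environment variable
- Structured package layout that makes it easy to extend or swap models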
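Because the key is read from ALIYUN_BAILIAN_API_KEY, it can help to fail early with a clear message when the variable is missing. The following is a minimal sketch of such a check; load_api_key is a hypothetical helper written for illustration and is not part of the library's documented API.

```python
import os

ENV_VAR = "ALIYUN_BAILIAN_API_KEY"


def load_api_key(environ=None) -> str:
    """Return the Bailian API key, or raise a clear error if it is not set.

    Hypothetical helper: `environ` defaults to os.environ and can be
    replaced with a plain dict for testing.
    """
    env = os.environ if environ is None else environ
    key = env.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{ENV_VAR} is not set; export it before transcribing")
    return key
```

Calling load_api_key() once at startup surfaces a missing key before any media is processed.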
Quick start
pip install -e . # or a regular pip install
export ALIYUN_BAILIAN_API_KEY=sk-xxxx
python examples/basic_usage.py sample.mp4 --language zh
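Debug mode
To make problems easier to diagnose, a debug script with enhanced logging is provided:
python examples/debug_usage.py sample.mp4 --language zh --debug
Debug logs are written to both the console and the debug_transcription.log file, and include detailed processing steps and error messages.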
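For readers who want similar output from their own scripts, here is a rough sketch of logging to both the console and a file using the standard logging module. It assumes the library logs under a logger named docling_av_transcriber, which this README does not confirm; configure_debug_logging is a hypothetical helper, not the code used by examples/debug_usage.py.

```python
import logging


def configure_debug_logging(log_file="debug_transcription.log"):
    """Send DEBUG-level records to the console and to `log_file`."""
    # Assumption: the library uses this logger name (not confirmed by the README).
    logger = logging.getLogger("docling_av_transcriber")
    if logger.handlers:
        # Already configured; avoid attaching duplicate handlers.
        return logger
    logger.setLevel(logging.DEBUG)
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    handlers = (
        logging.StreamHandler(),  # console (stderr)
        logging.FileHandler(log_file, encoding="utf-8"),
    )
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```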
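Modules
- docling_av_transcriber.api: top-level API (transcribe_file / transcribe_bytes)
- docling_av_transcriber.models: ASR client abstraction and the Bailian implementation
- docling_av_transcriber.media: input validation and audio track extraction
- docling_av_transcriber.pipelines: lightweight pipeline and DoclingDocument construction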
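To illustrate the kind of input validation the media module is described as handling, the sketch below classifies a file as audio or video by its extension. The suffix sets contain only the formats this README names explicitly, and classify_media is a hypothetical function; the library's actual validation may differ.

```python
from pathlib import Path

# Only the formats named in this README; the real library may accept more.
AUDIO_SUFFIXES = {".wav", ".mp3", ".flac"}
VIDEO_SUFFIXES = {".mp4", ".avi", ".mov"}


def classify_media(path) -> str:
    """Return "audio" or "video" based on the file extension (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix in AUDIO_SUFFIXES:
        return "audio"
    if suffix in VIDEO_SUFFIXES:
        return "video"
    raise ValueError(f"unsupported media type: {suffix or '(no extension)'}")
```

A video result would then go through audio track extraction before recognition, while an audio result could be passed on directly.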
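Getting the extracted audio track and the Docling document
In RAG and retrieval scenarios, you often need both the structured text and the extracted WAV audio track, for example to upload the audio to OSS or Supabase.
As of v0.x, you can call the new *_with_artifacts APIs directly: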
from docling_av_transcriber import transcribe_file_with_artifacts
result = transcribe_file_with_artifacts("sample.mp4", language="zh")
doc = result.document # DoclingDocument, ready to store or embed
wav_path = result.audio_path # temporary 16 kHz mono WAV file
# After uploading wav_path to your own storage, you can clean it up manually
# wav_path.unlink(missing_ok=True)
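For an in-memory byte stream, call transcribe_bytes_with_artifacts(data, filename="input.mp3"); it first writes the data to a temporary file and then transcodes it to 16 kHz mono WAV, the same as any other input.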
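The README does not say which tool performs the transcoding. As one plausible illustration, the sketch below builds an ffmpeg command that drops any video stream and writes 16 kHz mono 16-bit PCM WAV. The use of ffmpeg and both function names are assumptions made for this example, not the library's documented behavior.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst, sample_rate=16000):
    """Build an ffmpeg argument list for mono WAV output (assumed approach)."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", str(src),          # input audio or video file
        "-vn",                   # ignore any video stream
        "-ac", "1",              # downmix to a single channel
        "-ar", str(sample_rate), # resample to the target rate
        "-c:a", "pcm_s16le",     # 16-bit little-endian PCM
        str(dst),
    ]


def extract_wav(src, dst):
    """Run ffmpeg (must be on PATH) and return the output path."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True, capture_output=True)
    return Path(dst)
```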
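Note that audio_path comes from tempfile.mkstemp, so its lifetime is managed by the caller: delete the file once you have uploaded or copied it, so that files do not pile up in /tmp.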
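One way to make that cleanup hard to forget is a small context manager that deletes the file on exit, even if the upload raises. This is a minimal sketch: cleanup_on_exit is a hypothetical helper, and the mkstemp call below only stands in for the audio_path returned by the library.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def cleanup_on_exit(path):
    """Yield `path` as a Path and delete the file when the block exits."""
    p = Path(path)
    try:
        yield p
    finally:
        p.unlink(missing_ok=True)  # requires Python 3.8+


# Stand-in for result.audio_path: an empty temporary .wav file.
fd, tmp_name = tempfile.mkstemp(suffix=".wav")
os.close(fd)

with cleanup_on_exit(tmp_name) as wav_path:
    # The upload or copy to your own storage would happen here.
    size = wav_path.stat().st_size
```

Wrapping the upload in the with block means the temporary file is removed on both the success and the failure path.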
Testing
pip install -e .[dev]
pytest -q
Release
python -m build
twine check dist/*
twine upload dist/*
See CHANGELOG.md and the example scripts for more details.