Aliyun Bailian powered audio/video to DoclingDocument transcriber

Project description

docling-av-transcriber

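A standalone, Docling-compatible library that converts audio and video into structured documents. It reuses the Docling document model and swaps the underlying speech recognition for Aliyun Bailian ASR (Paraformer, SenseVoice, and others).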

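Features

  • Supports audio such as WAV/MP3/FLAC, and video such as MP4/AVI/MOV (the audio track is extracted first)
  • AliyunBailianAsrClient wraps the Bailian ASR interface behind a single client and defaults to paraformer-v1
  • Outputs a DoclingDocument that plugs directly into the Docling ecosystem
  • The API key is managed through the ALIYUN_BAILIAN_API_KEY environment variable
  • Structured package layout that makes it easy to extend or swap models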
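
As a minimal sketch of the environment-variable convention above, the snippet below reads ALIYUN_BAILIAN_API_KEY and fails early when it is missing. The helper name load_api_key is hypothetical and not part of the package; the library may resolve the key differently.

```python
import os

# Variable name taken from the feature list above.
ENV_VAR = "ALIYUN_BAILIAN_API_KEY"


def load_api_key(env=None):
    """Return the Bailian API key, or raise if it is missing or blank.

    Hypothetical helper for illustration; not part of docling-av-transcriber.
    """
    env = os.environ if env is None else env
    key = env.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{ENV_VAR} is not set; export it before transcribing")
    return key
```

Checking the key before any media processing starts avoids extracting an audio track only to have the ASR call rejected afterwards.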

Quick start

pip install -e .  # or a regular pip install
export ALIYUN_BAILIAN_API_KEY=sk-xxxx
python examples/basic_usage.py sample.mp4 --language zh

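Debug mode

To make problems easier to diagnose, a debug script with more verbose logging is provided:

python examples/debug_usage.py sample.mp4 --language zh --debug

Debug logs are written to both the console and the debug_transcription.log file, and include detailed processing steps and error information.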
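
The console-plus-file logging described above can be approximated with the standard logging module. This is a sketch under assumptions: the logger name "docling_av_transcriber" and the helper configure_debug_logging are illustrative, and the actual debug script may configure logging differently.

```python
import logging


def configure_debug_logging(log_path="debug_transcription.log", debug=True):
    """Send log records to both the console and a log file.

    Hypothetical helper; the logger name below is an assumption.
    """
    logger = logging.getLogger("docling_av_transcriber")
    logger.setLevel(logging.DEBUG if debug else logging.INFO)

    # Remove and close handlers from earlier calls to avoid duplicate lines.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    handlers = (
        logging.StreamHandler(),                          # console (stderr)
        logging.FileHandler(log_path, encoding="utf-8"),  # log file
    )
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Calling configure_debug_logging() once at startup and then logger.debug(...) at each processing step yields the same records in the terminal and in the file.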

Modules

  • docling_av_transcriber.api: top-level API (transcribe_file/transcribe_bytes)
  • docling_av_transcriber.models: ASR client abstraction and the Bailian implementation
  • docling_av_transcriber.media: input validation and audio track extraction
  • docling_av_transcriber.pipelines: lightweight pipeline and DoclingDocument construction
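
To illustrate what the media step (input validation and audio track extraction) might involve, here is a sketch that classifies a file by extension and builds a command line for converting it to 16 kHz mono WAV, the target format this README mentions. Using ffmpeg is an assumption, as are the function names and the extension sets; the real docling_av_transcriber.media module may work differently.

```python
from pathlib import Path

# Illustrative subsets; the README lists these formats "and others".
AUDIO_EXTS = {".wav", ".mp3", ".flac"}
VIDEO_EXTS = {".mp4", ".avi", ".mov"}


def classify_media(path):
    """Return "audio" or "video" based on the file extension."""
    ext = Path(path).suffix.lower()
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in VIDEO_EXTS:
        return "video"
    raise ValueError(f"unsupported media type: {ext or path}")


def build_ffmpeg_command(src, dst):
    """Build an ffmpeg argv that writes a 16 kHz mono PCM WAV file."""
    return [
        "ffmpeg",
        "-y",                 # overwrite the output file if it exists
        "-i", str(src),       # input audio or video file
        "-vn",                # drop any video stream
        "-ac", "1",           # one audio channel (mono)
        "-ar", "16000",       # 16 kHz sample rate
        "-c:a", "pcm_s16le",  # 16-bit PCM, the usual WAV encoding
        str(dst),
    ]
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where ffmpeg is installed.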

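Getting the extracted audio track and the Docling document

In RAG / retrieval scenarios, you usually need both the structured text and the extracted WAV audio track, for example to upload the audio to OSS or Supabase.
Since v0.x you can call the new *_with_artifacts APIs directly:

from docling_av_transcriber import transcribe_file_with_artifacts

result = transcribe_file_with_artifacts("sample.mp4", language="zh")
doc = result.document          # DoclingDocument, ready to store or vectorize
wav_path = result.audio_path   # temporary 16 kHz mono WAV file

# After uploading wav_path to your own storage, clean it up manually
# wav_path.unlink(missing_ok=True)

For in-memory byte streams, call transcribe_bytes_with_artifacts(data, filename="input.mp3"); it first writes the data to a temporary file and then transcodes it to 16 kHz mono WAV.
Note that audio_path comes from tempfile.mkstemp and its lifetime is managed by the caller: delete the file once it has been uploaded or copied, so that temporary files do not pile up in /tmp.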
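
Because the caller owns the temporary WAV file, one way to make the cleanup hard to forget is a small context manager. This is a sketch, not part of the package: cleanup_after is a hypothetical name, and the upload step is left to the caller.

```python
import contextlib
import os
import tempfile
from pathlib import Path


@contextlib.contextmanager
def cleanup_after(path):
    """Yield `path` as a Path and delete the file on exit, even on error.

    Hypothetical helper for illustration.
    """
    path = Path(path)
    try:
        yield path
    finally:
        path.unlink(missing_ok=True)  # requires Python 3.8+


# Intended use with the API shown above (upload_to_storage is hypothetical):
#
#     result = transcribe_file_with_artifacts("sample.mp4", language="zh")
#     with cleanup_after(result.audio_path) as wav_path:
#         upload_to_storage(wav_path)
```

The finally clause ensures the file is removed even when the upload raises, which is the case most likely to leave stray files behind.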

Testing

pip install -e ".[dev]"
pytest -q

Release

  1. python -m build
  2. twine check dist/*
  3. twine upload dist/*

See CHANGELOG.md and the example scripts for more details.

Download files

Source Distribution

docling_av_transcriber-0.1.1.tar.gz (7.0 kB view details)

Uploaded Source

Built Distribution

docling_av_transcriber-0.1.1-py3-none-any.whl (6.3 kB view details)

Uploaded Python 3

File details

Details for the file docling_av_transcriber-0.1.1.tar.gz.

File metadata

  • Download URL: docling_av_transcriber-0.1.1.tar.gz
  • Size: 7.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.8

File hashes

Hashes for docling_av_transcriber-0.1.1.tar.gz
Algorithm Hash digest
SHA256 a93ad9721a18b2e14f09f6701aca9b0022486b4920a459fb1f1802e5f7bdbf20
MD5 570ac0fcbae929775beb23ae78094576
BLAKE2b-256 2e2e480a0d6974603f1300b800b4805960f6a60292bbbb2ca6c87707d7b82c80

File details

Details for the file docling_av_transcriber-0.1.1-py3-none-any.whl.

File hashes

Hashes for docling_av_transcriber-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 bf69ef1f9dd548c42dd2bd0d2adf7ef1759451b0317e434cf18e09af71741eeb
MD5 e9eb74338f5bc21b52a3d1b00d5db8a6
BLAKE2b-256 c9fcba08e82e869feddd2f40eecf8f9445f35b42a98e91b454f3593e69ac6606
