DataoceanAI Open-source Large Speech Model

Dolphin

Paper | GitHub | Hugging Face | ModelScope | OpenI | WiseModel

Dolphin is a multilingual, multitask ASR model developed jointly by Dataocean AI and Tsinghua University. It supports 40 Eastern languages across East Asia, South Asia, Southeast Asia, and the Middle East, as well as 22 Chinese dialects. It is trained on over 210,000 hours of data, drawn from both Dataocean AI's proprietary datasets and open-source datasets. The model can perform speech recognition, voice activity detection (VAD), segmentation, and language identification (LID).

🔥 News

[2026-05-09] Dolphin-CN-Dialect small/base released, including base, base.streaming, small, small.prompt, small.streaming.

Approach

[Figure: Multitask data format]

Dolphin largely follows the design of Whisper and OWSM. It adopts a joint CTC-Attention architecture, with an encoder based on E-Branchformer and a decoder based on the standard Transformer. Several key modifications are introduced for its specific focus on ASR: Dolphin does not support translation tasks, and it eliminates the use of previous text and its related tokens.

A significant enhancement in Dolphin is a two-level language token system that better handles linguistic and regional diversity, especially in the Dataocean AI datasets. The first token specifies the language (e.g., <zh>, <ja>), while the second token indicates the region (e.g., <CN>, <JP>). See the paper for details.
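As a rough illustration of the two-level scheme, the sketch below assembles the language and region tokens for a given pair. The function name `language_region_tokens` is hypothetical and is not part of the Dolphin package; only the `<zh>`/`<CN>` token shapes come from the description above.

```python
def language_region_tokens(lang_sym, region_sym):
    """Return the two-level prefix: language token first, then region token.

    Hypothetical helper for illustration; not part of the Dolphin API.
    """
    # The examples above show lowercase language codes and uppercase regions.
    return [f"<{lang_sym.lower()}>", f"<{region_sym.upper()}>"]


print(language_region_tokens("zh", "CN"))  # ['<zh>', '<CN>']
print(language_region_tokens("ja", "JP"))  # ['<ja>', '<JP>']
```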

Setup

Dolphin requires FFmpeg to convert audio files to WAV format. If FFmpeg is not installed on your system, please install it first:

# Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# macOS
brew install ffmpeg

# Windows
choco install ffmpeg
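
To confirm that FFmpeg is reachable before running Dolphin, a small check like the following may help. It only uses the standard library; the helper name `tool_on_path` is made up for this sketch and is not something Dolphin provides.

```python
import shutil


def tool_on_path(name="ffmpeg"):
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


if not tool_on_path():
    print("FFmpeg not found on PATH; install it before running Dolphin.")
```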

You can install the latest version of Dolphin using the following command:

pip install -U dataoceanai-dolphin

Alternatively, install it from source:

pip install git+https://github.com/SpeechOceanTech/Dolphin.git

Available Models and Languages

Models

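Dolphin comes in 4 sizes (base, small, medium, large), of which base and small are publicly available now, together with the Chinese-dialect (.cn) variants listed below. See the paper for details.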

Model	Parameters	Publicly Available
base	0.1 B	✅
small	0.4 B	✅
medium	0.9 B
large	1.7 B
base.cn	0.1 B	✅
base.cn.streaming	0.1 B	✅
small.cn	0.4 B	✅
small.cn.streaming	0.4 B	✅
small.cn.prompt	0.4 B	✅

Languages

Dolphin supports 40 Eastern languages and 22 Chinese dialects. For a complete list of supported languages, see languages.md.

Supported Devices

Device Type	Support Status
CUDA	✅ Supported
MPS (Apple)	✅ Supported
Ascend NPU (Huawei)	✅ Supported
CPU	✅ Supported

To run Dolphin on an Ascend NPU, install the matching torch_npu package and set the ASCEND_RT_VISIBLE_DEVICES environment variable. The tested configuration is CANN==8.0.1, torch==2.2.0, torch_npu==2.2.0; with this setup, inference has been verified to run correctly on the Ascend NPU.
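
A device could be chosen at runtime along the following lines. This is a sketch under assumptions: `pick_device` is a hypothetical helper, the `"npu"` device string follows torch_npu's convention and has not been confirmed against Dolphin's `load_model`, and the default of exposing only NPU `0` is arbitrary.

```python
import importlib.util
import os


def pick_device(visible_npus="0"):
    """Return 'npu', 'cuda', 'mps', or 'cpu', depending on what is usable.

    Hypothetical helper; the 'npu' string is an assumption about what
    Dolphin accepts, based on torch_npu's device naming.
    """
    try:
        import torch
    except ImportError:
        return "cpu"  # Without PyTorch no accelerator can be probed.

    if importlib.util.find_spec("torch_npu") is not None:
        # Limit the visible Ascend devices unless the user already did so.
        os.environ.setdefault("ASCEND_RT_VISIBLE_DEVICES", visible_npus)
        import torch_npu  # noqa: F401  (registers the torch.npu backend)

        if torch.npu.is_available():
            return "npu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(pick_device())
```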

Usage

Command-line usage

# Use the default model (small)
dolphin audio.wav

# Download the model and select it by name
dolphin audio.wav --model small.cn

# Specify language and region
dolphin audio.wav --model small.cn --lang_sym "zh" --region_sym "CN"

# Supply a hotword file, using the encoder-biased method
dolphin audio.wav --model small.cn --hotword_list_path hotwords.txt --use_deep_biasing true

# Use the prompt-based model with a hotword file
dolphin audio.wav --model small.cn.prompt --hotword_list_path hotwords.txt --use_prompt_hotword true --use_two_stage_filter true
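
When the CLI is driven from a script, the argument list can be assembled programmatically. The sketch below uses only the `--model`, `--lang_sym`, and `--region_sym` flags shown above; `build_dolphin_command` is a hypothetical helper, not part of the package.

```python
import shlex


def build_dolphin_command(audio, model=None, lang_sym=None, region_sym=None):
    """Build an argument list for the `dolphin` CLI (hypothetical helper)."""
    cmd = ["dolphin", audio]
    if model is not None:
        cmd += ["--model", model]
    if lang_sym is not None:
        cmd += ["--lang_sym", lang_sym]
    if region_sym is not None:
        cmd += ["--region_sym", region_sym]
    return cmd


cmd = build_dolphin_command("audio.wav", model="small.cn", lang_sym="zh", region_sym="CN")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing a list rather than a single shell string avoids quoting problems with file names that contain spaces.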

Python usage

import dolphin
from dolphin import transcribe

# Load the Chinese-dialect small model on the GPU
model_name = 'small.cn'
model = dolphin.load_model(model_name, device="cuda")

# Transcribe with automatic language identification
result = transcribe(model, 'audio.wav')
print(result.text)

# Specify language
result = transcribe(model, 'audio.wav', lang_sym="zh")
print(result.text)

# Specify language, region, and encoder-biased hotwords
result = transcribe(model, 'audio.wav', lang_sym="zh", region_sym="CN", hotwords=['诺香丹青牌科研胶囊'], use_deep_biasing=True, use_two_stage_filter=True)
print(result.text)

# Prompt-based hotwords (requires the prompt model)
model_name = 'small.cn.prompt'
model = dolphin.load_model(model_name, device="cuda")

result = transcribe(model, 'audio.wav', hotwords=['诺香丹青牌科研胶囊'], use_prompt_hotword=True, use_two_stage_filter=True, decoding_method='attention')
print(result.text)
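
For several files, the same call can be wrapped in a loop. The sketch below is an assumption-laden illustration: `transcribe_many` is a hypothetical helper, and `fake_transcribe` is a stand-in so the example runs without model weights. In real use, `dolphin.transcribe` and a loaded model would be passed instead.

```python
from types import SimpleNamespace


def transcribe_many(transcribe_fn, model, paths, **options):
    """Call transcribe_fn(model, path, **options) per path; map path -> text."""
    return {path: transcribe_fn(model, path, **options).text for path in paths}


# Stand-in for dolphin.transcribe: returns an object with a .text attribute.
def fake_transcribe(model, path, **options):
    return SimpleNamespace(text=f"[{options.get('lang_sym', 'auto')}] {path}")


print(transcribe_many(fake_transcribe, None, ["a.wav", "b.wav"], lang_sym="zh"))
```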

Acknowledgements

Thanks to the following excellent open-source works:

License

Dolphin's code and model weights are released under the Apache 2.0 License.

Release history

Version	Release date
20260513	May 13, 2026 (this version)
20260511	May 11, 2026
20260508	May 8, 2026
20250716	Jul 16, 2025
20250519	May 19, 2025
20250515	May 15, 2025
20250507	May 7, 2025
20250409	Apr 9, 2025
20250327	Mar 27, 2025
20250317	Mar 18, 2025

Download files

Source Distribution

dataoceanai_dolphin-20260511.tar.gz (650.3 kB view details)

Uploaded May 11, 2026 Source

Built Distribution

dataoceanai_dolphin-20260511-py3-none-any.whl (658.8 kB view details)

Uploaded May 11, 2026 Python 3

File details

Details for the file dataoceanai_dolphin-20260511.tar.gz.

File metadata

Download URL: dataoceanai_dolphin-20260511.tar.gz
Upload date: May 11, 2026
Size: 650.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.18

File hashes

Hashes for dataoceanai_dolphin-20260511.tar.gz
Algorithm	Hash digest
SHA256	`faa665929ede969b7e99b4e15ad2412795ff28929386b8f83984239c0dd1d715`
MD5	`f7e3fde811edc07645df2085b5e4bec2`
BLAKE2b-256	`c7c519d016f2fcc26c3bf47e7567a515f0719f26b005b00120eab020530b218b`
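
A downloaded archive can be checked against the SHA256 digest above with the standard library. This is a minimal sketch; the helper names `sha256_of` and `matches_published_hash` are made up for illustration, and the file is assumed to sit in the current directory.

```python
import hashlib

# SHA256 digest published above for dataoceanai_dolphin-20260511.tar.gz
EXPECTED_SHA256 = "faa665929ede969b7e99b4e15ad2412795ff28929386b8f83984239c0dd1d715"


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected value."""
    return sha256_of(path) == expected.lower()


# Example: matches_published_hash("dataoceanai_dolphin-20260511.tar.gz")
```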

File details

Details for the file dataoceanai_dolphin-20260511-py3-none-any.whl.

File metadata

Download URL: dataoceanai_dolphin-20260511-py3-none-any.whl
Upload date: May 11, 2026
Size: 658.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.18

File hashes

Hashes for dataoceanai_dolphin-20260511-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2d98fdd616faba4d6cb82c83f2d81aa78a57dce697b98a1c4d65e39ea0b7873a`
MD5	`c0c6c0e55f359e62f8548aae2273c001`
BLAKE2b-256	`1bd519f58f60b353328d8d57e591350c6de828a4a2f9730164f03226e6ff5b06`
