
DataoceanAI Open-source Large Speech Model

Project description

Dolphin


Dolphin is a multilingual, multitask ASR model developed jointly by Dataocean AI and Tsinghua University. It supports 40 Eastern languages spanning East Asia, South Asia, Southeast Asia, and the Middle East, as well as 22 Chinese dialects. It is trained on over 210,000 hours of data, drawn from both Dataocean AI's proprietary datasets and open-source datasets. The model can perform speech recognition, voice activity detection (VAD), segmentation, and language identification (LID).

Approach

[Figure: Multitask data format]

Dolphin largely follows the design of Whisper and OWSM. It adopts a joint CTC-Attention architecture, with an encoder based on E-Branchformer and a decoder based on the standard Transformer. Several key modifications reflect its specific focus on ASR: Dolphin does not support translation tasks, and it drops the previous-text context and its related tokens.
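
In joint CTC-Attention training, the CTC loss on the encoder output and the cross-entropy loss of the attention decoder are commonly combined as a weighted sum. The sketch below illustrates that interpolation only. The function name and the default `ctc_weight` of 0.3 are assumptions for illustration; the weight Dolphin actually uses is not stated here.

```python
def joint_ctc_attention_loss(ctc_loss: float, att_loss: float,
                             ctc_weight: float = 0.3) -> float:
    """Interpolate the two losses: w * L_ctc + (1 - w) * L_att.

    ctc_weight is a hypothetical default, not a value taken from Dolphin.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must be in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * att_loss
```

Setting `ctc_weight` to 0 reduces this to a pure attention loss, and setting it to 1 reduces it to a pure CTC loss.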

A significant enhancement in Dolphin is a two-level language token system that better handles linguistic and regional diversity, especially in the Dataocean AI dataset. The first token specifies the language (e.g., <zh>, <ja>), while the second token indicates the region (e.g., <CN>, <JP>). See the paper for details.
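
As a rough illustration of the two-level scheme, the sketch below builds the language and region tokens from a pair of symbols. The helper `language_tokens` is hypothetical and not part of the Dolphin package; the lower-case language and upper-case region convention is inferred from the examples above.

```python
from typing import List


def language_tokens(lang_sym: str, region_sym: str) -> List[str]:
    """Return [language token, region token], e.g. ["<zh>", "<CN>"].

    Hypothetical helper: casing follows the examples in the text
    (language in lower case, region in upper case).
    """
    return [f"<{lang_sym.lower()}>", f"<{region_sym.upper()}>"]


if __name__ == "__main__":
    print("".join(language_tokens("zh", "CN")))
    print("".join(language_tokens("ja", "JP")))
```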

Setup

Dolphin requires FFmpeg to convert audio files to WAV format. If FFmpeg is not installed on your system, install it first:

# Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# macOS
brew install ffmpeg

# Windows
choco install ffmpeg
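
To confirm that FFmpeg is reachable before running Dolphin, a quick check from Python could look like the following. This is a minimal sketch using only the standard library; the function name `ffmpeg_available` is an illustrative choice.

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("FFmpeg found")
    else:
        print("FFmpeg not found; install it with one of the commands above")
```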

You can install the latest version of Dolphin using the following command:

pip install -U dataoceanai-dolphin

Alternatively, install it from source:

pip install git+https://github.com/SpeechOceanTech/Dolphin.git 

Available Models and Languages

Models

There are 4 models in Dolphin, and 2 of them are available now. See the paper for details.

Model Parameters Publicly Available
base 0.1 B
small 0.4 B
medium 0.9 B
large 1.7 B
base.fangyan 0.1 B
base.fangyan.streaming 0.1 B
small.fangyan 0.4 B
small.fangyan.streaming 0.4 B
small.fangyan.prompt 0.4 B

Languages

Dolphin supports 40 Eastern languages and 22 Chinese dialects. For a complete list of supported languages, see languages.md.

Supported Devices

Device Type Support Status
CUDA ✅ Supported
MPS (Apple) ✅ Supported
Ascend NPU (Huawei) ✅ Supported
CPU ✅ Supported

To run Dolphin on an Ascend NPU, install the matching torch_npu package and set the ASCEND_RT_VISIBLE_DEVICES environment variable. The tested configuration is CANN==8.0.1, torch==2.2.0, torch_npu==2.2.0; with this setup, inference has been verified to run correctly on the Ascend NPU.
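
The sketch below shows one way to set the Ascend environment variable and pick a backend at runtime. Both helper names are hypothetical. It assumes that torch_npu, when installed, exposes `torch.npu.is_available()`, and that the returned strings are accepted as the device argument of `dolphin.load_model`; only "cuda" is shown in this document, so treat the others as unverified.

```python
import os


def configure_ascend(visible_devices: str = "0") -> None:
    """Set ASCEND_RT_VISIBLE_DEVICES; call this before importing torch_npu.

    The default "0" (first NPU only) is an illustrative assumption.
    """
    os.environ["ASCEND_RT_VISIBLE_DEVICES"] = visible_devices


def pick_device() -> str:
    """Return the first available backend among cuda, npu, mps, else cpu."""
    try:
        import torch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    try:
        import torch_npu  # noqa: F401  # registers torch.npu when installed
        if torch.npu.is_available():
            return "npu"
    except (ImportError, AttributeError):
        pass
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    configure_ascend("0")
    print(pick_device())
```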

Usage

Command-line usage

dolphin audio.wav

# Download model and specify the model path
dolphin audio.wav --model small --model_dir /data/models/dolphin/

# Specify language and region
dolphin audio.wav --model small --model_dir /data/models/dolphin/ --lang_sym "zh" --region_sym "CN"

# Pad speech to 30 seconds
dolphin audio.wav --model small --model_dir /data/models/dolphin/ --lang_sym "zh" --region_sym "CN" --padding_speech true
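
When the command line is driven from a script, it can help to assemble the arguments programmatically. The following sketch builds the argument list from the flags shown above; `build_dolphin_command` is a hypothetical helper, not part of the package, and it only uses flags that appear in these examples.

```python
from typing import List, Optional


def build_dolphin_command(audio_path: str,
                          model: Optional[str] = None,
                          model_dir: Optional[str] = None,
                          lang_sym: Optional[str] = None,
                          region_sym: Optional[str] = None,
                          padding_speech: bool = False) -> List[str]:
    """Assemble a `dolphin` CLI invocation as an argument list."""
    cmd = ["dolphin", audio_path]
    if model:
        cmd += ["--model", model]
    if model_dir:
        cmd += ["--model_dir", model_dir]
    if lang_sym:
        cmd += ["--lang_sym", lang_sym]
    if region_sym:
        cmd += ["--region_sym", region_sym]
    if padding_speech:
        cmd += ["--padding_speech", "true"]
    return cmd


if __name__ == "__main__":
    # Pass the list to subprocess.run(cmd, check=True) to execute it.
    print(build_dolphin_command("audio.wav", model="small",
                                model_dir="/data/models/dolphin/",
                                lang_sym="zh", region_sym="CN"))
```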

Python usage

import dolphin

waveform = dolphin.load_audio("audio.wav")
model = dolphin.load_model("small", "/data/models/dolphin", "cuda")
result = model(waveform)

# Specify language
result = model(waveform, lang_sym="zh")

# Specify language and region
result = model(waveform, lang_sym="zh", region_sym="CN")
print(result.text)
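
To transcribe several files with one loaded model, the calls above can be wrapped in a loop. The sketch below takes the model and the audio loader as arguments so that it does not depend on anything beyond what is shown here: a callable model, `lang_sym` and `region_sym` keyword arguments, and a `text` attribute on the result. The helper name `transcribe_files` is hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Optional


def transcribe_files(paths: Iterable[str],
                     model: Callable[..., Any],
                     load_audio: Callable[[str], Any],
                     lang_sym: Optional[str] = None,
                     region_sym: Optional[str] = None) -> Dict[str, str]:
    """Run the model on each file and return {path: recognized text}."""
    kwargs = {}
    if lang_sym is not None:
        kwargs["lang_sym"] = lang_sym
    if region_sym is not None:
        kwargs["region_sym"] = region_sym

    texts = {}
    for path in paths:
        waveform = load_audio(path)          # e.g. dolphin.load_audio
        result = model(waveform, **kwargs)   # e.g. model from dolphin.load_model
        texts[path] = result.text
    return texts


# Example wiring with Dolphin (requires the package and a downloaded model):
#   import dolphin
#   model = dolphin.load_model("small", "/data/models/dolphin", "cuda")
#   texts = transcribe_files(["a.wav", "b.wav"], model, dolphin.load_audio,
#                            lang_sym="zh", region_sym="CN")
```

Loading the model once and reusing it avoids repeating the load for every file.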

License

Dolphin's code and model weights are released under the Apache 2.0 License.

