
Python toolkit for the Qwen3-ASR API—parallel high‑throughput calls, robust long‑audio transcription, multi‑sample‑rate support.


Qwen3-ASR-Toolkit


An advanced, high-performance Python command-line toolkit for the Qwen-ASR API (formerly Qwen3-ASR-Flash). It overcomes the API's 3-minute audio length limit by intelligently splitting long audio/video files and processing the chunks in parallel, enabling rapid transcription of hours-long content.

🚀 Key Features

  • Break the 3-Minute Limit: Seamlessly transcribe audio and video files of any length by bypassing the official API's duration constraint.
  • Smart Audio Splitting: Utilizes Voice Activity Detection (VAD) to split audio into meaningful chunks at natural silent pauses. This ensures that words and sentences are not awkwardly cut off.
  • High-Speed Parallel Processing: Leverages multi-threading to send audio chunks to the Qwen-ASR API concurrently, dramatically reducing the total transcription time for long files.
  • Intelligent Post-Processing: Automatically detects and removes common ASR hallucinations and repetitive artifacts for cleaner, more accurate transcripts.
  • Automatic Audio Resampling: Automatically converts audio from any sample rate and channel count to the 16kHz mono format required by the Qwen-ASR API. You can use any audio file without worrying about pre-processing.
  • Universal Media Support: Supports virtually any audio and video format (e.g., .mp4, .mov, .mkv, .mp3, .wav, .m4a) thanks to its reliance on FFmpeg.
  • Simple & Easy to Use: A straightforward command-line interface allows you to get started with just a single command.
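The automatic resampling described above corresponds to a standard FFmpeg invocation. A minimal sketch of the idea (the helper name is hypothetical and the exact flags the toolkit passes are an assumption; `-ar 16000 -ac 1` is FFmpeg's standard way to produce 16 kHz mono):

```python
import subprocess

def build_resample_cmd(src, dst):
    """Build an FFmpeg command that converts any input to 16 kHz mono WAV."""
    return [
        "ffmpeg", "-y",   # overwrite the output file without asking
        "-i", src,        # input: any format FFmpeg understands
        "-ar", "16000",   # resample to 16 kHz
        "-ac", "1",       # downmix to a single (mono) channel
        "-vn",            # drop any video stream
        dst,
    ]

# To actually run it (requires FFmpeg on PATH):
# subprocess.run(build_resample_cmd("talk.mp4", "talk_16k.wav"), check=True)
```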

⚙️ How It Works

This tool follows a robust pipeline to deliver fast and accurate transcriptions for long-form media:

  1. Media Loading: The script first loads your media file, whether it's a local file or a remote URL.
  2. VAD-based Chunking: It analyzes the audio stream using Voice Activity Detection (VAD) to identify silent segments.
  3. Intelligent Splitting: The audio is then split into smaller chunks based on the detected silences. Each chunk is kept under the 3-minute API limit, preventing mid-sentence cuts.
  4. Parallel API Calls: A thread pool is initiated to upload and process these chunks concurrently using the DashScope Qwen-ASR API.
  5. Result Aggregation & Cleaning: The transcribed text segments from all chunks are collected, re-ordered, and then post-processed to remove detected repetitions and hallucinations.
  6. Output Generation: The final, cleaned transcription is printed to the console and saved to a text file.
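Steps 3 to 5 of this pipeline can be sketched in a few lines of Python. Everything below is illustrative, not the toolkit's actual code: `split_at_silences` assumes VAD has already produced a sorted list of silence midpoints, and `transcribe_chunk` is a stand-in for the real DashScope API call.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CHUNK_SEC = 180.0  # stay under the API's 3-minute limit

def split_at_silences(silences, total_sec):
    """Choose cut points from VAD-detected silence midpoints (in seconds,
    sorted) so that each chunk stays under MAX_CHUNK_SEC where possible."""
    bounds, prev = [0.0], 0.0
    for s in list(silences) + [total_sec]:
        if s - bounds[-1] >= MAX_CHUNK_SEC and prev > bounds[-1]:
            bounds.append(prev)  # cut at the last silence that still fit
        prev = s
    bounds.append(total_sec)
    return list(zip(bounds, bounds[1:]))

def transcribe_chunk(chunk):
    """Stand-in for the real DashScope Qwen-ASR call on one audio chunk."""
    start, end = chunk
    return f"[{start:.0f}-{end:.0f}s transcript]"

def transcribe_all(chunks, workers=4):
    # map() yields results in input order, so segments concatenate correctly
    # even though the chunks are transcribed concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return " ".join(pool.map(transcribe_chunk, chunks))
```

Note that `ThreadPoolExecutor.map` preserves input order, which is what makes step 5's re-ordering trivial in this sketch.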

🏁 Getting Started

Follow these steps to set up and run the project on your local machine.

Prerequisites

  • Python 3.8 or higher.
  • FFmpeg: The script requires FFmpeg to be installed on your system to handle media files.
    • Ubuntu/Debian: sudo apt update && sudo apt install ffmpeg
    • macOS: brew install ffmpeg
    • Windows: Download from the official FFmpeg website and add it to your system's PATH.
  • DashScope API Key: You need an API key from Alibaba Cloud's DashScope.
    • You can obtain one from the DashScope Console. If you are calling Tongyi Qwen API services for the first time, follow the DashScope documentation to create your own API key.

    • For better security and convenience, it is highly recommended to set your API key as an environment variable named DASHSCOPE_API_KEY. The script will automatically use it, and you won't need to pass the --api-key argument in the command.

      On Linux/macOS:

      export DASHSCOPE_API_KEY="your_api_key_here"
      

      (To make this permanent, add the line to your ~/.bashrc, ~/.zshrc, or ~/.profile file.)

      On Windows (Command Prompt):

      set DASHSCOPE_API_KEY="your_api_key_here"
      

      On Windows (PowerShell):

      $env:DASHSCOPE_API_KEY="your_api_key_here"
      

      (For a permanent setting on Windows, search for "Edit the system environment variables" in the Start Menu and add DASHSCOPE_API_KEY to your user variables.)
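The precedence described above (an explicit `--api-key` argument wins, the environment variable is the fallback) is easy to picture in Python. A hypothetical helper, not the toolkit's actual code:

```python
import os

def resolve_api_key(cli_value=None):
    """Prefer an explicitly passed key; otherwise fall back to the
    DASHSCOPE_API_KEY environment variable."""
    key = cli_value or os.environ.get("DASHSCOPE_API_KEY")
    if not key:
        raise SystemExit("No API key: pass --api-key or set DASHSCOPE_API_KEY")
    return key
```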

Installation

We recommend installing the tool directly from PyPI for the simplest setup.

Option 1: Install from PyPI (Recommended)

Simply run the following command in your terminal. This will install the package and make the qwen3-asr command available system-wide.

pip install qwen3-asr-toolkit

Option 2: Install from Source

If you want to install the latest development version or contribute to the project, you can install from the source code.

  1. Clone the repository:

    git clone https://github.com/QwenLM/Qwen3-ASR-Toolkit.git
    cd Qwen3-ASR-Toolkit
    
  2. Install the package:

    pip install .
    

📖 Usage

Once installed, you can use the qwen3-asr command directly from your terminal. By default, the tool will print progress information.

Command

qwen3-asr -i <input_file_or_url> [-key <api_key>] [-j <num_threads>] [-c <context>] [-t <tmp_dir>] [-s]

Arguments

  • --input-file (-i): Path to the local media file or a remote URL (http/https) to transcribe. Required.
  • --context (-c): Text context to guide the ASR model, improving recognition of specific terms. Optional; default: "".
  • --dashscope-api-key (-key): Your DashScope API key. Optional if the DASHSCOPE_API_KEY environment variable is set.
  • --num-threads (-j): Number of concurrent threads to use for API calls. Optional; default: 4.
  • --tmp-dir (-t): Directory for storing temporary chunk files. Optional; default: ~/qwen3-asr-cache.
  • --silence (-s): Silence mode; suppresses detailed progress and chunking information on the terminal. Optional.

Output

The full transcription result will be printed to the terminal (unless in --silence mode) and also saved in a .txt file in the same directory as the input file. For example, if you process my_video.mp4, the output will be saved to my_video.txt.
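The naming rule is simply "same directory, same stem, .txt suffix", which a one-liner with `pathlib` reproduces (illustrative helper, not the tool's actual code):

```python
from pathlib import Path

def output_path(input_file):
    """Derive the transcript path: same directory and stem, .txt suffix."""
    return Path(input_file).with_suffix(".txt")
```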


✨ Examples

Here are a few examples of how to use the tool.

1. Basic Transcription of a Local File

Transcribe a video file using the default 4 threads. This command assumes you have set the DASHSCOPE_API_KEY environment variable.

qwen3-asr -i "/path/to/my/long_lecture.mp4"

2. Transcribe a Remote Audio File

Directly process an audio file from a URL.

qwen3-asr -i "https://somewebsite.com/audios/podcast_episode.mp3"

3. Increase Concurrency and Pass API Key

Transcribe a long audio file using 8 parallel threads and pass the API key directly via the command line.

qwen3-asr -i "/path/to/my/podcast_episode_01.wav" -j 8 -key "your_api_key_here"

4. Provide Context to Improve Accuracy

If your audio contains specific jargon, names, or acronyms, use the -c flag to provide context, which helps the model recognize them correctly.

qwen3-asr -i "/path/to/my/tech_talk.mp4" -c "Qwen-ASR, DashScope, FFmpeg, VAD"

5. Run in Silence Mode

Use the -s or --silence flag to prevent progress details from being printed to the terminal. The final transcript will still be saved to a .txt file.

qwen3-asr -i "/path/to/my/meeting_recording.m4a" -s
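Because the tool is a plain CLI, it is also easy to script. A hedged sketch of batch-processing a folder of recordings by shelling out to qwen3-asr (assumes the command is on PATH and DASHSCOPE_API_KEY is set; the helper and extension list are this example's own):

```python
import subprocess
from pathlib import Path

MEDIA_EXTS = {".mp4", ".mov", ".mkv", ".mp3", ".wav", ".m4a"}

def batch_commands(folder, threads=4):
    """Build one qwen3-asr command per media file found in `folder`."""
    return [
        ["qwen3-asr", "-i", str(p), "-j", str(threads), "-s"]
        for p in sorted(Path(folder).iterdir())
        if p.suffix.lower() in MEDIA_EXTS
    ]

# To actually run the batch:
# for cmd in batch_commands("recordings/"):
#     subprocess.run(cmd, check=True)
```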

🤝 Contributing

Contributions are welcome! If you have suggestions for improvements, please feel free to fork the repo, create a feature branch, and open a pull request. You can also open an issue with the "enhancement" tag.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
