A CLI tool to synchronize Anki notes with high-quality Azure TTS audio.
ankiazvox
ankiazvox is a professional-grade CLI tool that synchronizes Anki notes with high-quality Azure Neural TTS audio. By leveraging cloud-based Neural voices, it automates text extraction, sanitization, and card updates via AnkiConnect, transforming text-only decks into immersive audio-visual learning tools.
✨ New in v0.6.0
- Concurrency / Performance: Parallel synthesis with --workers/-w to speed up large sync jobs.
- Overwrite & Debug: --overwrite replaces existing audio; --debug prints extra diagnostics for troubleshooting.
📌 v0.5.0 Release Highlights
- azv init: An interactive onboarding setup that walks you through connecting your Azure account and setting your preferred default voice.
- Field Mapping: Efficiency-focused syncing that processes multiple fields simultaneously via the --fields flag (e.g., Word:Audio;Sent:SentAudio).
- Prosody Control: Fine-tune the listening experience with --rate and --pitch flags, allowing you to slow down complex phrases or adjust tone for clarity.
- SSML Support: Enhanced processing that preserves natural phrasing by converting <br> to pauses and providing full support for raw SSML input fields.
🚀 Installation
1. Prerequisites
- Anki Desktop: Must be running with the AnkiConnect add-on installed and configured.
- Azure Speech Service: An active subscription key and region from the Azure portal (Azure offers a generous free tier for speech services).
2. Setup & Configuration
Install the package and run the initializer to create your azv_config.yaml file:
pip install ankiazvox
azv init
🛠 Usage
1. Synchronize Audio (sync)
The sync command generates audio for notes that match a specific Anki search query. It handles the batch processing of voice synthesis and media management automatically.
Basic Single-Field Sync:
Sync text from "Front" and save audio tag to "Audio"
azv sync -q "deck:English::Vocabulary" -s "Front" -t "Audio"
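For orientation, the exchange with AnkiConnect behind a sync can be sketched as follows. This is a minimal illustration and not ankiazvox's actual code: the helper names (`build_request`, `invoke`, `find_notes_request`, `update_field_request`) are hypothetical, and the URL is AnkiConnect's default local address. The `findNotes` and `updateNoteFields` actions and the version 6 request shape come from the AnkiConnect API.

```python
import json
import urllib.request

ANKI_CONNECT_URL = "http://127.0.0.1:8765"  # AnkiConnect's default address


def build_request(action, **params):
    """Build an AnkiConnect request body (API version 6)."""
    return {"action": action, "version": 6, "params": params}


def invoke(action, **params):
    """Send one request to a running AnkiConnect and return its result."""
    body = json.dumps(build_request(action, **params)).encode("utf-8")
    req = urllib.request.Request(ANKI_CONNECT_URL, data=body)
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    if reply.get("error") is not None:
        raise RuntimeError(reply["error"])
    return reply["result"]


def find_notes_request(query):
    """Request body that resolves an Anki search query to note IDs."""
    return build_request("findNotes", query=query)


def update_field_request(note_id, field, sound_file):
    """Request body that writes a [sound:...] tag into one field (hypothetical helper)."""
    note = {"id": note_id, "fields": {field: f"[sound:{sound_file}]"}}
    return build_request("updateNoteFields", note=note)
```

With Anki running, `invoke("findNotes", query="deck:English::Vocabulary")` would return the matching note IDs; the two `*_request` helpers only build the payloads, so they can be inspected without a live Anki instance.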
Advanced Multi-Field Sync with Prosody:
Process Word and Sentence fields at 85% speed with a slight pitch increase
azv sync -q "deck:JP::Grammar" -f "Word:WordAud;Sent:SentAud" --rate 0.85 --pitch +5%
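The mapping string passed to --fields follows the source1:target1;source2:target2 shape. A minimal sketch of how such a string could be parsed is shown below; `parse_field_mapping` is a hypothetical helper, and its handling of stray whitespace and trailing semicolons is an assumption, not documented ankiazvox behavior.

```python
def parse_field_mapping(spec):
    """Parse 'Src1:Dst1;Src2:Dst2' into a list of (source, target) pairs."""
    pairs = []
    for chunk in spec.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate a trailing ';' (assumed, not documented)
        source, sep, target = chunk.partition(":")
        source, target = source.strip(), target.strip()
        if not sep or not source or not target:
            raise ValueError(f"expected 'source:target', got {chunk!r}")
        pairs.append((source, target))
    return pairs


if __name__ == "__main__":
    print(parse_field_mapping("Word:WordAud;Sent:SentAud"))
```

Each pair then names one source field to read and one target field to receive the audio tag.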
| Option | Short | Description |
|---|---|---|
| --config | | Path to a config file (yaml or .env). The tool also auto-detects azv_config.yml or .env if present |
| --query | -q | Anki search query (standard Anki search syntax) |
| --fields | -f | Key-value mapping: source1:target1;source2:target2 |
| --source | -s | Name of the field containing source text |
| --target | -t | Name of the field to store the [sound:...] tag |
| --rate | | Synthesis speed (1.0 is normal; 0.8 is 80% speed) |
| --pitch | | Pitch adjustment (e.g., +10% or -5%) |
| --voice | -v | Override the default neural voice for this session |
| --overwrite | | Replace existing audio in the target field if present |
| --ssml-source | | Treat the source field as raw SSML when it begins with <speak> |
| --workers | -w | Number of concurrent synthesis workers (default: 1) |
| --debug | | Enable debug logging for troubleshooting |
| --yes | -y | Skip the confirmation prompt and proceed immediately |
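To illustrate what a worker count means in practice, here is a rough sketch of parallel synthesis using a thread pool. It assumes synthesis is network-bound, so threads are a reasonable fit; whether ankiazvox uses threads internally is not stated, and `synthesize_all` and the `synthesize` callable are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def synthesize_all(texts, synthesize, workers=1):
    """Call synthesize(text) for every text on up to `workers` threads.

    Results come back in the same order as `texts`.
    """
    with ThreadPoolExecutor(max_workers=max(1, workers)) as pool:
        return list(pool.map(synthesize, texts))


if __name__ == "__main__":
    # Stand-in for a real TTS call: just report the text length.
    fake_tts = lambda text: f"{len(text)} chars synthesized"
    print(synthesize_all(["hello", "good morning"], fake_tts, workers=2))
```

With workers=1 the calls run one at a time, which matches the documented default; higher values mainly help when each call spends its time waiting on the speech service.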
2. Sample & List Voices
Before running a large sync, it is recommended to sample voices to find the best fit for your language material.
Preview a voice at a slower speed to check clarity
azv sample --voice en-US-AndrewNeural --text "The quick brown fox" --rate 0.8 --play
List all Japanese neural voices to find a specific dialect or tone
azv list-voices --locale ja-JP
📝 Formatting & SSML
- HTML Sanitization: The tool cleans up Anki's internal HTML (like <div> and <span>) to ensure the TTS engine only reads the text.
- Smart Pauses: It preserves line breaks by converting <br> tags into 400ms SSML pauses, which helps in separating sentences or definitions.
- Raw SSML: For advanced users, if a field's content starts with the <speak> tag, ankiazvox treats it as raw SSML. This allows you to manually insert custom breaks, emphasis, or phoneme corrections directly into your Anki notes.
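The three rules above can be approximated in a few lines. The sketch below is an illustration under stated assumptions, not the tool's implementation: the function names are hypothetical, tags are stripped with simple regular expressions rather than a full HTML parser, and the placeholder character is an arbitrary choice. The 400ms pause and the <speak> check come from the description above.

```python
import html
import re

BREAK_TAG = '<break time="400ms"/>'  # SSML pause used in place of <br>
_PLACEHOLDER = "\uE000"  # private-use character, unlikely to occur in notes


def is_raw_ssml(field_text):
    """True when the field content starts with a <speak> element."""
    return field_text.lstrip().startswith("<speak")


def html_to_ssml_body(field_html):
    """Strip Anki HTML, keep <br> as a pause, and escape text for XML."""
    # Mark line breaks first so they survive tag stripping.
    text = re.sub(r"<br\s*/?>", _PLACEHOLDER, field_html, flags=re.IGNORECASE)
    # Drop every remaining tag such as <div> or <span>.
    text = re.sub(r"<[^>]+>", "", text)
    # Decode HTML entities, then re-escape &, < and > for SSML.
    text = html.escape(html.unescape(text), quote=False)
    # Collapse runs of whitespace left behind by the markup.
    text = re.sub(r"\s+", " ", text).strip()
    return text.replace(_PLACEHOLDER, BREAK_TAG)


if __name__ == "__main__":
    print(html_to_ssml_body("<div>Hello<br>world</div>"))
```

A caller would check `is_raw_ssml` first and pass such fields through unchanged, sanitizing only ordinary note text.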
Additional notes:
- Language detection from voice names: When wrapping text into SSML, the tool extracts the language code from typical voice names (e.g., en-US-AndrewNeural) so the TTS engine receives the correct xml:lang attribute.
- Cross-platform playback: azv sample --play uses the system player (afplay on macOS, ffplay elsewhere) when available.
- Temporary files cleaned: Temporary synthesis files are removed after sync to avoid cluttering your project folder.
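As a rough illustration of the language-detection note, the locale can be read off the front of a typical voice name and placed in the SSML wrapper. The helper names, the regular expression, and the en-US fallback are assumptions made for this sketch; voice names with extra segments may need different handling.

```python
import re
from xml.sax.saxutils import quoteattr

# Matches the leading locale of names like "en-US-AndrewNeural".
_LOCALE = re.compile(r"^([a-z]{2,3}-[A-Z]{2})-")


def locale_from_voice(voice, default="en-US"):
    """Return the locale prefix of a voice name, or `default` (assumed fallback)."""
    match = _LOCALE.match(voice)
    return match.group(1) if match else default


def wrap_ssml(body, voice):
    """Wrap an already-escaped SSML body in <speak> and <voice> elements."""
    lang = locale_from_voice(voice)
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{body}</voice></speak>"
    )


if __name__ == "__main__":
    print(wrap_ssml("The quick brown fox", "en-US-AndrewNeural"))
```

The body is expected to be escaped already, so this wrapper only adds the outer elements and attributes.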
🤝 Contributing
Contributions are welcome! Whether it's a bug fix, a new feature, or an improvement to the documentation, feel free to open an issue or submit a Pull Request on GitHub.
📄 License
This project is open-source and released under the MIT License.