
A CLI tool to synchronize Anki notes with high-quality Azure TTS audio.


ankiazvox

ankiazvox is a CLI tool that synchronizes Anki notes with Azure Neural TTS audio. Working through AnkiConnect, it extracts text from your notes, sanitizes it, synthesizes speech with cloud-based neural voices, and writes the resulting audio back to the cards, turning text-only decks into decks with audio.

✨ New in v0.6.0

  • Concurrency / Performance: Parallel synthesis with --workers/-w to speed up large sync jobs.
  • Overwrite & Debug: --overwrite replaces existing audio; --debug prints extra diagnostics for troubleshooting.
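
The effect of --workers can be pictured as a thread pool that fans synthesis requests out in parallel. The sketch below is illustrative only and is not ankiazvox's actual implementation: synthesize is a hypothetical stand-in for the network-bound Azure call, and synthesize_all is an assumed helper name.

```python
from concurrent.futures import ThreadPoolExecutor


def synthesize(text: str) -> bytes:
    """Hypothetical stand-in for a network-bound TTS request."""
    return f"audio:{text}".encode("utf-8")


def synthesize_all(texts: list[str], workers: int = 1) -> list[bytes]:
    """Run synthesize() over texts with up to `workers` threads.

    pool.map preserves input order, so results line up with texts.
    """
    with ThreadPoolExecutor(max_workers=max(1, workers)) as pool:
        return list(pool.map(synthesize, texts))


if __name__ == "__main__":
    clips = synthesize_all(["apple", "banana", "cherry"], workers=3)
    print(len(clips))
```

Threads suit this kind of job because each request spends most of its time waiting on the network rather than using the CPU.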

📌 v0.5.0 Release Highlights

  • azv init: An interactive onboarding setup that walks you through connecting your Azure account and setting your preferred default voice.
  • Field Mapping: Sync several source/target field pairs in a single run with the --fields flag (e.g., Word:Audio;Sent:SentAudio).
  • Prosody Control: Adjust speaking speed and tone with the --rate and --pitch flags, for example to slow down complex phrases.
  • SSML Support: <br> tags are converted to pauses so that natural phrasing is preserved, and fields containing raw SSML are supported.
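
As a rough picture of how rate and pitch values could end up in SSML, the sketch below wraps text in a prosody element. It assumes the flags map directly onto the rate and pitch attributes, which this README does not confirm; wrap_prosody is a hypothetical helper, not part of ankiazvox.

```python
from xml.sax.saxutils import escape, quoteattr


def wrap_prosody(text: str, rate: float = 1.0, pitch: str = "+0%") -> str:
    """Wrap plain text in an SSML <prosody> element (hypothetical helper).

    rate is a multiplier of the default speed (0.85 = 85% speed);
    pitch is a signed percentage such as "+5%".
    """
    attrs = f"rate={quoteattr(str(rate))} pitch={quoteattr(pitch)}"
    # escape() protects characters such as & and < inside the spoken text.
    return f"<prosody {attrs}>{escape(text)}</prosody>"


if __name__ == "__main__":
    print(wrap_prosody("Tom & Jerry", rate=0.85, pitch="+5%"))
```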

🚀 Installation

1. Prerequisites

  • Anki Desktop: Must be running with the AnkiConnect add-on installed and configured.
  • Azure Speech Service: An active subscription key and region from the Azure portal (Azure offers a free tier for Speech services).

2. Setup & Configuration

Install the package and run the initializer to create your azv_config.yaml file:

pip install ankiazvox  
azv init

🛠 Usage

1. Synchronize Audio (sync)

The sync command generates audio for notes that match a specific Anki search query. It handles the batch processing of voice synthesis and media management automatically.

Basic Single-Field Sync:

Sync text from "Front" and save audio tag to "Audio"

azv sync -q "deck:English::Vocabulary" -s "Front" -t "Audio"
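
For orientation, a sync of this kind comes down to a few AnkiConnect requests: find the matching notes, then write a [sound:...] tag into the target field. The sketch below only builds the JSON request bodies (AnkiConnect listens on http://127.0.0.1:8765 by default). How ankiazvox issues these calls internally is an assumption; anki_request is a hypothetical helper, and the note id and file name are placeholders.

```python
import json


def anki_request(action: str, **params) -> dict:
    """Build an AnkiConnect request body (API version 6)."""
    return {"action": action, "version": 6, "params": params}


# Find the notes matching the search query.
find = anki_request("findNotes", query="deck:English::Vocabulary")

# Write a [sound:...] tag into the target field of one note.
# The note id and file name below are placeholders.
update = anki_request(
    "updateNoteFields",
    note={"id": 1234567890, "fields": {"Audio": "[sound:example.mp3]"}},
)

print(json.dumps(find))
```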

Advanced Multi-Field Sync with Prosody:

Process Word and Sentence fields at 85% speed with a slight pitch increase

azv sync -q "deck:JP::Grammar" -f "Word:WordAud;Sent:SentAud" --rate 0.85 --pitch +5%

Option          Short   Description
--config                Path to a config file (yaml or .env). The tool also auto-detects azv_config.yml or .env if present
--query         -q      Anki search query (standard Anki search syntax)
--fields        -f      Key-value mapping: source1:target1;source2:target2
--source        -s      Name of the field containing source text
--target        -t      Name of the field to store the [sound:...] tag
--rate                  Synthesis speed (1.0 is normal; 0.8 is 80% speed)
--pitch                 Pitch adjustment (e.g., +10% or -5%)
--voice         -v      Override the default neural voice for this session
--overwrite             Replace existing audio in the target field if present
--ssml-source           Treat the source field as raw SSML when it begins with <speak>
--workers       -w      Number of concurrent synthesis workers (default: 1)
--debug                 Enable debug logging for troubleshooting
--yes           -y      Skip the confirmation prompt and proceed immediately
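
The --fields value follows the source1:target1;source2:target2 format. A minimal sketch of parsing that format is shown below; parse_field_mapping is a hypothetical helper written for illustration, and the tool's own parsing and error handling may differ.

```python
def parse_field_mapping(spec: str) -> dict[str, str]:
    """Parse 'source1:target1;source2:target2' into {source: target}."""
    mapping: dict[str, str] = {}
    for pair in spec.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing ';'
        source, sep, target = pair.partition(":")
        source, target = source.strip(), target.strip()
        if not sep or not source or not target:
            raise ValueError(f"expected 'source:target', got {pair!r}")
        mapping[source] = target
    return mapping


if __name__ == "__main__":
    print(parse_field_mapping("Word:WordAud;Sent:SentAud"))
```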

2. Sample & List Voices

Before running a large sync, it is recommended to sample voices to find the best fit for your language material.

Preview a voice at a slower speed to check clarity

azv sample --voice en-US-AndrewNeural --text "The quick brown fox" --rate 0.8 --play

List all Japanese neural voices to find a specific dialect or tone

azv list-voices --locale ja-JP
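
Filtering by locale can be pictured as a simple match on each voice's Locale field. The sketch below assumes a voice list shaped like the Azure voices-list response, where each entry carries ShortName and Locale keys. The sample entries are illustrative, and voices_for_locale is a hypothetical helper rather than the tool's own code.

```python
def voices_for_locale(voices: list[dict], locale: str) -> list[str]:
    """Return the sorted short names of voices matching a locale."""
    wanted = locale.lower()
    return sorted(
        v["ShortName"] for v in voices if v.get("Locale", "").lower() == wanted
    )


# Illustrative entries only; a real list comes from the Azure Speech service.
sample = [
    {"ShortName": "ja-JP-NanamiNeural", "Locale": "ja-JP"},
    {"ShortName": "en-US-AndrewNeural", "Locale": "en-US"},
    {"ShortName": "ja-JP-KeitaNeural", "Locale": "ja-JP"},
]

if __name__ == "__main__":
    print(voices_for_locale(sample, "ja-JP"))
```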

📝 Formatting & SSML

  • HTML Sanitization: The tool cleans up Anki's internal HTML (like <div> and <span>) to ensure the TTS engine only reads the text.
  • Smart Pauses: It preserves line breaks by converting <br> tags into 400ms SSML pauses, which helps in separating sentences or definitions.
  • Raw SSML: For advanced users, if a field's content starts with the <speak> tag, ankiazvox treats it as raw SSML. This allows you to manually insert custom breaks, emphasis, or phoneme corrections directly into your Anki notes.
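
The three behaviours above can be sketched in a few lines of Python. This is an approximation under stated assumptions, not the tool's actual code: to_ssml_body is a hypothetical helper, the regex-based tag stripping is a simplification, and only the 400ms pause length and the <speak> check come from the description above.

```python
import html
import re

BREAK = '<break time="400ms"/>'


def to_ssml_body(field_html: str) -> str:
    """Turn Anki field HTML into SSML-ready text (illustrative only)."""
    stripped = field_html.strip()
    if stripped.startswith("<speak"):
        return stripped  # raw SSML: pass through untouched
    # Mark <br> variants with a placeholder so they survive tag stripping.
    marked = re.sub(r"<br\s*/?>", "\x00", stripped, flags=re.IGNORECASE)
    text = re.sub(r"<[^>]+>", "", marked)  # drop remaining tags (<div>, <span>, ...)
    text = html.unescape(text)  # decode entities such as &amp;
    text = html.escape(text, quote=False)  # re-escape so the result is valid XML text
    return text.replace("\x00", BREAK).strip()


if __name__ == "__main__":
    print(to_ssml_body("<div>cat<br>dog &amp; bird</div>"))
```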

Additional notes:

  • Language detection from voice names: When wrapping text into SSML the tool extracts the language code from typical voice names (e.g., en-US-AndrewNeural) so the TTS engine receives the correct xml:lang attribute.
  • Cross-platform playback: azv sample --play uses the system player (afplay on macOS, ffplay elsewhere) when available.
  • Temporary files cleaned: Temporary synthesis files are removed after sync to avoid cluttering your project folder.
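
The language-detection note can be illustrated with a small regular expression. The sketch below is an assumption about how such extraction might work, not the tool's actual code: lang_from_voice and wrap_speak are hypothetical helpers, the en-US fallback is invented for the example, and the pattern only covers simple names of the form xx-XX-NameNeural.

```python
import re

# Matches a leading language-region prefix such as "en-US-" or "ja-JP-".
VOICE_LANG = re.compile(r"^([a-z]{2,3}-[A-Z]{2})-")


def lang_from_voice(voice: str, default: str = "en-US") -> str:
    """Extract the language code from a voice name, else fall back to default."""
    match = VOICE_LANG.match(voice)
    return match.group(1) if match else default


def wrap_speak(body: str, voice: str) -> str:
    """Wrap an SSML body in <speak> and <voice> elements with xml:lang set."""
    lang = lang_from_voice(voice)
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang="{lang}"><voice name="{voice}">{body}</voice></speak>'
    )


if __name__ == "__main__":
    print(wrap_speak("The quick brown fox", "en-US-AndrewNeural"))
```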

🤝 Contributing

Contributions are welcome! Whether it's a bug fix, a new feature, or an improvement to the documentation, feel free to open an issue or submit a Pull Request on GitHub.

📄 License

This project is open-source and released under the MIT License.
