Automated Japanese vocabulary mining from anime subtitles with Anki integration
Anki Miner
Turn native Japanese content into Anki vocabulary cards.
Please leave a ⭐ star if Anki Miner helped you - it helps others find it :).
Mining Demo
Example cards
Installation
Requirements
- ffmpeg, needed only when installing via pip/pipx, the .deb package, or from source.
- Anki with the AnkiConnect add-on. In Anki: Tools → Add-ons → Get Add-ons, paste code 2055492159, then restart Anki.
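Both requirements can be checked before the first run. The sketch below is not part of Anki Miner: it assumes AnkiConnect is listening on its default address (`http://127.0.0.1:8765`) and uses AnkiConnect's `version` action. The function names are hypothetical.

```python
import json
import shutil
import urllib.request

# AnkiConnect's default address; assumed unchanged in your Anki setup.
ANKICONNECT_URL = "http://127.0.0.1:8765"


def has_executable(name: str) -> bool:
    """Return True if `name` resolves to an executable on PATH."""
    return shutil.which(name) is not None


def version_request_body() -> bytes:
    """Encode the JSON body for AnkiConnect's `version` action."""
    return json.dumps({"action": "version", "version": 6}).encode("utf-8")


def ankiconnect_version(url: str = ANKICONNECT_URL, timeout: float = 2.0):
    """Return AnkiConnect's reported API version, or None if unreachable."""
    request = urllib.request.Request(url, data=version_request_body())
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.load(response).get("result")
    except (OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply.
        return None
```

Calling `has_executable("ffmpeg")` and `ankiconnect_version()` with Anki running should tell you whether the "ffmpeg not found" or "Cannot connect to Anki" errors are likely to appear.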
Download
Grab the installer for your platform from the latest release:
| Platform | Installer | Portable |
|---|---|---|
| Windows | AnkiMiner-*-Setup.exe | AnkiMiner-Windows-x86_64.zip |
| Linux (Debian/Ubuntu) | anki-miner_*_amd64.deb | AnkiMiner-*-Linux-x86_64.AppImage |
| Linux (other) | — | AnkiMiner-Linux-x86_64.tar.gz |
| macOS (Apple Silicon) | — | AnkiMiner-macOS-arm64.tar.gz |
Install from PyPI (Python 3.11+)
pipx install anki-miner # or: pip install anki-miner
Install from source
git clone https://github.com/0xzerolight/anki_miner.git
cd anki_miner
pip install -e .
For full development setup, see CONTRIBUTING.md.
Tabs
- Episode Mining: mine one video/subtitle pair with word curation.
- Batch Mining: mine a folder of episode/subtitle pairs sequentially. Files are paired by episode number, so each folder or queue item should hold a single show (use the Multi-Anime Queue to mine several series at once).
- Deck Builder: point at a folder of episode/subtitle pairs and mine the full series into one named deck. Words are ranked by frequency; choose how many to include (all, top N, or a coverage target) and preview the selection before cards are created.
- YouTube: paste one or more URLs, then mine the queue.
- Audiobook: queue local audiobook/subtitle pairs and mine them audio-only; embedded cover art stands in for screenshots.
- Analytics: history, series difficulty rankings, milestones, undo.
- Settings: Anki, Media, Dictionary, Filtering, YouTube, Themes. Saved to ~/.anki_miner/gui_config.json.
Other Features
- Extensive filtering options (i+1 filter, frequency limits, word blacklist, subtitle regex filtering, wordset filtering, and more).
- Offline Yomitan dictionary import (definitions, pitch accent, frequency data) with priority ordering.
- Definition styling presets (like Yomitan) or custom CSS.
- Subtitle timing preview with adjustable offset.
- Animated screenshots (see example card gifs).
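The i+1 filter listed above is commonly understood as keeping only sentences that contain exactly one unknown word. A minimal sketch of that idea follows, assuming sentences are already tokenized into dictionary forms and that known words are held in a plain set. Both are assumptions, and the function names are hypothetical; Anki Miner's internal data model is not described here.

```python
def unknown_words(tokens: list[str], known_words: set[str]) -> list[str]:
    """Return the distinct tokens not in known_words, in first-seen order."""
    seen: set[str] = set()
    result: list[str] = []
    for token in tokens:
        if token not in known_words and token not in seen:
            seen.add(token)
            result.append(token)
    return result


def is_i_plus_one(tokens: list[str], known_words: set[str]) -> bool:
    """True when the sentence has exactly one distinct unknown word."""
    return len(unknown_words(tokens, known_words)) == 1
```

A sentence with zero unknown words teaches nothing new, and one with several is harder to learn from, so only the single-unknown case passes.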
Built-in themes (29)
- Ayu — Light, Mirage, Dark
- Catppuccin — Latte (light); Frappé, Macchiato, Mocha (dark)
- Dracula — Dracula, Alucard
- Everforest — Light, Dark
- GitHub — Light; Dark, Dark Dimmed
- Gruvbox — Light Medium, Dark Medium
- Kanagawa — Lotus (light), Wave (dark)
- Rosé Pine — Dawn (light); Main, Moon (dark)
- Solarized — Light, Dark
- Standalone — Light, Dark, Sakura, Nord, One Dark, Tokyo Night
Theme licenses: LICENSE-THEMES.md. Want another theme added? Suggest it in a GitHub issue.
How It Works
- Read the subtitles and split Japanese into individual words.
- Filter to content words you don't already know.
- Grab a screenshot and audio clip from the video for each line.
- Look up definitions in your configured offline dictionaries, optionally falling back to Jisho online if enabled (slower, rate-limited).
- Send the finished cards to Anki.
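The last step goes through AnkiConnect. As a rough illustration, the sketch below builds the JSON body for AnkiConnect's `addNote` action. The helper name is hypothetical, and the deck, note type, and field names in the usage note are placeholders: in Anki Miner those come from your settings, and the real cards also carry media that this sketch leaves out.

```python
import json


def build_add_note_request(deck: str, model: str, fields: dict[str, str],
                           tags: tuple[str, ...] = ()) -> dict:
    """Build an AnkiConnect `addNote` request (API version 6)."""
    return {
        "action": "addNote",
        "version": 6,
        "params": {
            "note": {
                "deckName": deck,
                "modelName": model,
                "fields": dict(fields),
                "tags": list(tags),
                # Ask AnkiConnect to reject notes that already exist.
                "options": {"allowDuplicate": False},
            }
        },
    }


def encode_request(request: dict) -> bytes:
    """Serialize the request as UTF-8 JSON, ready to POST to AnkiConnect."""
    return json.dumps(request, ensure_ascii=False).encode("utf-8")
```

For example, `build_add_note_request("Anime", "Basic", {"Front": "猫", "Back": "cat"})` describes one note for Anki's stock Basic note type; the encoded bytes would be POSTed to AnkiConnect's local HTTP endpoint.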
Recommended Resources
| Type | Resource | What you get | Download | Add via |
|---|---|---|---|---|
| Dictionary | Jitendex | JMdict successor; structured formatting, examples, tags | Yomitan zip | Add Dictionary… |
| Dictionary | JMdict | Plain glosses; smaller, faster to index | Yomitan zip | Add Dictionary… |
| Dictionary | Bee's Character Dictionary | Character names from your AniList/VNDB lists, with roles and descriptions | Generated on site | Add Dictionary… |
| Pitch | Kanjium | ~124k patterns; drop-in TSV, no import step | TSV | Dictionary → Pitch Accent File |
| Pitch | アクセント辞典v2 | Richer NHK notation | Drive | Dictionary → Pitch Accent File |
| Frequency | JPDB v2.2 Kana | All-round default for media | Yomitan zip | Filtering → Frequency List File |
| Frequency | BCCWJ SUW+LUW | Balanced corpus; pairs well with news/novels | Yomitan zip | Filtering → Frequency List File |
Dictionaries are indexed once into ~/.anki_miner/dicts/ (drag to reorder the chain).
The pitch and frequency pickers accept either a raw CSV/TSV or a Yomitan zip; on Save, the file is converted automatically to ~/.anki_miner/pitch_accent.csv or frequency.csv.
Bee's Character Dictionary builds a custom Yomitan dictionary from your AniList/VNDB media lists, so character names in the shows you mine resolve to real definitions; re-generate and re-import when your lists change.
Proper-noun filtering uses bundled name wordsets derived from JMnedict (JMdict/EDICT project, EDRDG, CC BY-SA 4.0).
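The priority-ordered dictionary chain with an optional online fallback can be pictured as a first-match lookup. The sketch below is an illustration only: it assumes each dictionary is an in-memory mapping and that the fallback is a plain callable standing in for the Jisho lookup. The real index lives on disk under ~/.anki_miner/dicts/, and its format is not described here.

```python
from typing import Callable, Mapping, Optional, Sequence


def lookup(
    word: str,
    chain: Sequence[Mapping[str, str]],
    fallback: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Return the definition from the first dictionary in the chain that has one."""
    for dictionary in chain:
        definition = dictionary.get(word)
        if definition:
            return definition
    # Only consult the (slower) fallback when every offline dictionary missed.
    return fallback(word) if fallback is not None else None
```

Reordering the chain changes which definition wins when several dictionaries contain the same word, which is the effect of dragging dictionaries in the settings.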
Updates
Anki Miner checks GitHub for new releases on startup (toggle in Settings). When an update is available, a banner offers a one-click download of the asset that matches your install: .deb for Debian/Ubuntu, .AppImage for AppImage, the Inno installer on Windows, the macOS arm64 archive, or the release page for pip/source installs. "Skip this version" suppresses the prompt for that release; the next release prompts again.
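Choosing the asset that matches an install can be sketched with filename patterns. The patterns below are taken from the Download table; the install-type labels (`"windows"`, `"deb"`, and so on) and the function name are hypothetical, and the updater's actual logic may differ.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Install type (hypothetical labels) -> release asset pattern from the Download table.
ASSET_PATTERNS = {
    "windows": "AnkiMiner-*-Setup.exe",
    "deb": "anki-miner_*_amd64.deb",
    "appimage": "AnkiMiner-*-Linux-x86_64.AppImage",
    "macos": "AnkiMiner-macOS-arm64.tar.gz",
}


def pick_asset(install_type: str, asset_names: list[str]) -> Optional[str]:
    """Return the matching asset name, or None to send the user to the release page."""
    pattern = ASSET_PATTERNS.get(install_type)
    if pattern is None:
        return None  # e.g. pip or source installs have no one-click asset
    for name in asset_names:
        if fnmatchcase(name, pattern):
            return name
    return None
```

Returning `None` for unknown install types mirrors the behaviour described above, where pip and source installs are pointed at the release page instead of a direct download.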
Troubleshooting
| Issue | Solution |
|---|---|
| "Cannot connect to Anki" | Start Anki and ensure AnkiConnect is installed. |
| "Deck not found" | The deck is created automatically when mining starts; if you meant a different deck, update the name in Settings. |
| "Note type not found" | Configure your note type's field names in Settings → Anki. |
| "ffmpeg not found" | Install ffmpeg and add it to PATH. |
| No definitions found | Add a Yomitan dictionary in Settings → Add Dictionary… (recommended), or enable the Jisho fallback (slower, rate-limited). |
| Audio is wrong language | The tool tries Japanese audio tracks first, then falls back to the default. |
| Subtitles out of sync | Use the subtitle offset control in the GUI (range ±300 seconds). |
| AV1 video won't preview | In-app preview is disabled for AV1 to avoid decoder error spam. Mining still works normally — only the preview is skipped. |
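For the out-of-sync case, a subtitle offset amounts to shifting every cue's start and end time by the same amount. The sketch below assumes SRT-style timestamps (`HH:MM:SS,mmm`) and clamps the offset to the ±300 second range given in the table. The function names are hypothetical and this is not Anki Miner's code.

```python
import re

MAX_OFFSET_SECONDS = 300.0  # range stated for the GUI control
TIMESTAMP_RE = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def to_millis(timestamp: str) -> int:
    """Convert an SRT timestamp to milliseconds."""
    match = TIMESTAMP_RE.fullmatch(timestamp)
    if match is None:
        raise ValueError(f"not an SRT timestamp: {timestamp!r}")
    hours, minutes, seconds, millis = map(int, match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def to_timestamp(millis: int) -> str:
    """Convert milliseconds back to an SRT timestamp, never going below zero."""
    seconds, millis = divmod(max(millis, 0), 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def shift(timestamp: str, offset_seconds: float) -> str:
    """Shift a timestamp by an offset clamped to the allowed range."""
    offset = max(-MAX_OFFSET_SECONDS, min(MAX_OFFSET_SECONDS, offset_seconds))
    return to_timestamp(to_millis(timestamp) + round(offset * 1000))
```

A positive offset delays the subtitles and a negative one advances them; results that would fall before the start of the video are pinned to zero.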
Contributing
Contributions are welcome — bug fixes, dictionary integrations, GUI polish, doc improvements, all sizes.
- New here? Start with CONTRIBUTING.md.
- Architecture overview: ARCHITECTURE.md.
- Testing strategy: TESTING.md.
- Code of Conduct: CODE_OF_CONDUCT.md.
- Security: SECURITY.md.
Bug reports and feature requests → Issues. General questions and discussion → Discussions.
Special Thanks
Sincere thanks to people who made exceptional contributions to the project:
★ StyraxBenzoin - Brilliant feature suggestions, new release testing, community building
See CONTRIBUTORS.md for everyone who has made any kind of contribution to the project.
License
GNU General Public License v3.0. See LICENSE.
File details
Details for the file anki_miner-2.6.3.tar.gz.
File metadata
- Download URL: anki_miner-2.6.3.tar.gz
- Upload date:
- Size: 2.4 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cc5b27cda14e29a156d2b74a18db13ea375e71b57910768de2999c3819debfab |
| MD5 | 5de4d64b6fb5b36388db78d25b903847 |
| BLAKE2b-256 | a875bf123b1fedbb594313f97a4351e5af951eb2564d461415c2b8051a3fdd32 |
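A downloaded file can be checked against the published SHA256 digest with Python's standard `hashlib` module. The sketch below is generic; the function names are hypothetical and the file path is whatever you saved the download as.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published hex digest, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For instance, `matches_published("anki_miner-2.6.3.tar.gz", "cc5b27cda14e29a156d2b74a18db13ea375e71b57910768de2999c3819debfab")` should return `True` for an intact copy of the source distribution.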
Provenance
The following attestation bundles were made for anki_miner-2.6.3.tar.gz:
Publisher: publish.yml on 0xzerolight/anki_miner
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: anki_miner-2.6.3.tar.gz
- Subject digest: cc5b27cda14e29a156d2b74a18db13ea375e71b57910768de2999c3819debfab
- Sigstore transparency entry: 1818451615
- Sigstore integration time:
- Permalink: 0xzerolight/anki_miner@ebdfef4bc2c8485ed1a45d2fefec9fca51758a21
- Branch / Tag: refs/tags/v2.6.3
- Owner: https://github.com/0xzerolight
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@ebdfef4bc2c8485ed1a45d2fefec9fca51758a21
- Trigger Event: push
File details
Details for the file anki_miner-2.6.3-py3-none-any.whl.
File metadata
- Download URL: anki_miner-2.6.3-py3-none-any.whl
- Upload date:
- Size: 2.5 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 36a20612ad1063021112846b43e6b3673f156e688b7051ad7b369f7aeca7a98b |
| MD5 | 4c8293048de62b637a9212da136da684 |
| BLAKE2b-256 | e508da2b6550d8004f7676ba057efb57440481d1c43fd9a5acc30437ed3f48b3 |
Provenance
The following attestation bundles were made for anki_miner-2.6.3-py3-none-any.whl:
Publisher: publish.yml on 0xzerolight/anki_miner
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: anki_miner-2.6.3-py3-none-any.whl
- Subject digest: 36a20612ad1063021112846b43e6b3673f156e688b7051ad7b369f7aeca7a98b
- Sigstore transparency entry: 1818451639
- Sigstore integration time:
- Permalink: 0xzerolight/anki_miner@ebdfef4bc2c8485ed1a45d2fefec9fca51758a21
- Branch / Tag: refs/tags/v2.6.3
- Owner: https://github.com/0xzerolight
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@ebdfef4bc2c8485ed1a45d2fefec9fca51758a21
- Trigger Event: push