Automated Japanese vocabulary mining from anime subtitles with Anki integration
Anki Miner
Batch-mines Japanese vocabulary from anime and YouTube into Anki cards. Given a season folder or a YouTube URL, it produces cards containing screenshots, sentence audio, furigana, pitch accent, and frequency data.
Suited to batch processing after viewing, rather than real-time lookup during playback (the asbplayer and Yomitan workflow).
Showcase
Example cards
Generated from video and subtitle files. Each card contains a screenshot, sentence audio, furigana, and definition.
How It Works
- Parse subtitles: tokenize Japanese text with MeCab morphological analysis.
- Filter words: keep content words (nouns, verbs, adjectives, adverbs); drop words already in your Anki collection or on your blacklist.
- Extract media: capture screenshots and audio clips from the video at each subtitle's timestamp via ffmpeg.
- Fetch definitions: look up definitions through your configured dictionary chain (Yomitan-format dictionaries, with Jisho as optional online fallback).
- Create cards: batch upload to Anki via AnkiConnect.
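The filtering step above can be sketched in a few lines. This is a minimal illustration, not Anki Miner's actual code: the `Token` class, the part-of-speech labels, and `filter_words` are hypothetical names, and a real MeCab dictionary reports its own tag set.

```python
from dataclasses import dataclass

# Hypothetical part-of-speech labels; a real MeCab dictionary uses its own tags.
CONTENT_POS = {"noun", "verb", "adjective", "adverb"}


@dataclass(frozen=True)
class Token:
    lemma: str  # dictionary form of the word
    pos: str    # simplified part-of-speech label


def filter_words(tokens, known, blacklist):
    """Return unique content-word lemmas that are neither known nor blacklisted.

    Order of first appearance is preserved.
    """
    seen = set()
    result = []
    for tok in tokens:
        if tok.pos not in CONTENT_POS:
            continue  # particles, auxiliaries, punctuation, ...
        if tok.lemma in known or tok.lemma in blacklist or tok.lemma in seen:
            continue
        seen.add(tok.lemma)
        result.append(tok.lemma)
    return result


tokens = [
    Token("猫", "noun"),
    Token("は", "particle"),
    Token("走る", "verb"),
    Token("猫", "noun"),
    Token("速い", "adjective"),
]
print(filter_words(tokens, known={"速い"}, blacklist={"走る"}))  # ['猫']
```

Deduplicating on the dictionary form means a word that appears in several subtitle lines yields one candidate card.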
Features
- Lapis-compatible cards with furigana, pitch accent, and word frequency fields.
- YouTube support: paste a URL, mine the video.
- Queue a folder of episode/subtitle pairs for sequential processing.
- Pluggable dictionary chain: load any Yomitan-format dictionaries, reorder freely, with Jisho online as optional fallback.
- Preview and curate the word list before any cards are created.
- Parallel ffmpeg extraction for screenshots and sentence audio. Configurable audio codec (MP3 or Opus) and bitrate in Settings → Media for storage-conscious collections.
- Analytics dashboard with history, undo, and series difficulty rankings.
- Four themes (Light, Dark, Sakura, Tokyo Night) plus custom JSON themes.
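As a rough illustration of the parallel ffmpeg extraction mentioned above, the sketch below builds one screenshot command and one audio-clip command per subtitle line and runs them on a thread pool. The function names, file names, and default bitrate are assumptions made for the example; the exact flags Anki Miner passes to ffmpeg are not documented here.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# ffmpeg encoder names for the two codecs the Settings page offers.
ENCODERS = {"mp3": "libmp3lame", "opus": "libopus"}


def screenshot_cmd(video, at_seconds, out_path):
    """Grab a single frame at the given timestamp."""
    return ["ffmpeg", "-y", "-ss", f"{at_seconds:.3f}", "-i", video,
            "-frames:v", "1", out_path]


def audio_cmd(video, start, end, out_path, codec="mp3", bitrate="64k"):
    """Cut the audio between start and end (seconds), dropping the video stream."""
    return ["ffmpeg", "-y", "-ss", f"{start:.3f}", "-i", video,
            "-t", f"{end - start:.3f}", "-vn",
            "-c:a", ENCODERS[codec], "-b:a", bitrate, out_path]


def run_all(commands, workers=4):
    """Run the ffmpeg commands concurrently and return their exit codes."""
    def run(cmd):
        return subprocess.run(cmd, capture_output=True).returncode
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, commands))


# Hypothetical file names and timestamps.
commands = [
    screenshot_cmd("ep01.mkv", 12.5, "shot_0001.jpg"),
    audio_cmd("ep01.mkv", 10.0, 12.25, "clip_0001.opus", codec="opus"),
]
# run_all(commands)  # requires ffmpeg on PATH and the video file to exist
```

Threads are sufficient here because each worker only waits on an external ffmpeg process.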
Installation
Requirements
- ffmpeg on PATH.
  - macOS: brew install ffmpeg
  - Ubuntu/Debian: sudo apt install ffmpeg
  - Windows: download from ffmpeg.org and add to PATH.
- Anki with the AnkiConnect add-on. In Anki: Tools → Add-ons → Get Add-ons, paste code 2055492159, restart.
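To confirm that AnkiConnect is reachable after installing the add-on, a small script along these lines may help. It assumes AnkiConnect's default address (`http://127.0.0.1:8765`) and its JSON request format with API version 6; `build_request` and `invoke` are helper names chosen for this example and are not part of Anki Miner.

```python
import json
import urllib.request

ANKI_CONNECT_URL = "http://127.0.0.1:8765"  # AnkiConnect's default address


def build_request(action, **params):
    """Build the JSON body AnkiConnect expects for one action."""
    return {"action": action, "version": 6, "params": params}


def invoke(action, **params):
    """POST one action to AnkiConnect and return its result, raising on errors."""
    payload = json.dumps(build_request(action, **params)).encode("utf-8")
    request = urllib.request.Request(
        ANKI_CONNECT_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        reply = json.load(response)
    if reply.get("error") is not None:
        raise RuntimeError(reply["error"])
    return reply["result"]


# With Anki running, this returns the AnkiConnect API version:
# invoke("version")
```

If Anki is not running, `invoke` raises a connection error (a subclass of `OSError`), which corresponds to the "Cannot connect to Anki" case in Troubleshooting.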
Download
Grab the installer for your platform from the latest release:
| Platform | Installer | Portable |
|---|---|---|
| Windows | AnkiMiner-*-Setup.exe | AnkiMiner-Windows-x86_64.zip |
| Linux (Debian/Ubuntu) | anki-miner_*_amd64.deb | AnkiMiner-*-Linux-x86_64.AppImage |
| Linux (other) | — | AnkiMiner-Linux-x86_64.tar.gz |
| macOS (Apple Silicon) | — | AnkiMiner-macOS-arm64.tar.gz |
No Python required. Installers and portable archives bundle all dependencies.
Install from PyPI (Python 3.10+)
pipx install anki-miner # or: pip install anki-miner
Install from source
git clone https://github.com/0xzerolight/anki_miner.git
cd anki_miner
pip install .
Quick Start
After installing, launch Anki Miner from your Start Menu, Applications folder, or app menu. If you installed from PyPI or source, run anki_miner_gui from a terminal. A desktop shortcut is created on first launch; to recreate it later, use Tools → Create Desktop Shortcut... inside the app.
Anki must be running with AnkiConnect installed before mining starts.
Tabs:
- Single Episode: mine one video/subtitle pair with file selectors and progress tracking.
- Batch Processing: queue multiple series for sequential processing.
- YouTube: paste a URL, fetch metadata, then mine.
- Analytics: history, series difficulty, milestones.
- Settings: Anki connection, media extraction, dictionary, word filtering. Saved to ~/.anki_miner/gui_config.json.
Recommended Setup
Lapis Note Type
Anki Miner uses the Lapis note type fields by default. For custom note types, rename the fields in Settings/Anki.
- Download the latest .apkg from Lapis releases.
- In Anki: File → Import and select the .apkg.
Default field mapping:
| Anki Miner Field | Note Field | Content |
|---|---|---|
| word | Expression | Dictionary form of the word |
| sentence | Sentence | Original subtitle line |
| definition | MainDefinition | English definitions |
| picture | Picture | Screenshot from the video |
| audio | SentenceAudio | Audio clip of the sentence |
| expression_furigana | ExpressionFurigana | Word with furigana reading |
| sentence_furigana | SentenceFurigana | Sentence with furigana reading |
| pitch_position | (unmapped) | Pitch accent position number |
| pitch_category | (unmapped) | Pitch accent category |
| frequency | (unmapped) | Word frequency rank |
Fields marked (unmapped) have no default Lapis mapping. Map them in Settings if your note type has equivalents. Any note type with the required fields works.
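To make the mapping concrete, the sketch below turns a mined card into a note payload of the shape AnkiConnect's `addNote` action accepts, skipping unmapped fields. The mapping values come from the table above; the `build_note` helper, the deck name "Mining", and the model name "Lapis" are assumptions for illustration.

```python
# Anki Miner field -> note field, taken from the default mapping table.
# None marks fields that have no default Lapis mapping.
DEFAULT_FIELD_MAP = {
    "word": "Expression",
    "sentence": "Sentence",
    "definition": "MainDefinition",
    "picture": "Picture",
    "audio": "SentenceAudio",
    "expression_furigana": "ExpressionFurigana",
    "sentence_furigana": "SentenceFurigana",
    "pitch_position": None,
    "pitch_category": None,
    "frequency": None,
}


def build_note(card, deck, model="Lapis", field_map=DEFAULT_FIELD_MAP):
    """Build an addNote-style payload, dropping unmapped or missing fields."""
    fields = {
        note_field: card[key]
        for key, note_field in field_map.items()
        if note_field is not None and key in card
    }
    return {"deckName": deck, "modelName": model, "fields": fields}


card = {"word": "猫", "sentence": "猫が走る。", "frequency": "1200"}
note = build_note(card, deck="Mining")
print(note["fields"])  # {'Expression': '猫', 'Sentence': '猫が走る。'}
```

To carry frequency or pitch data onto a custom note type, replace the corresponding `None` with that note type's field name.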
Dictionaries
Anki Miner looks up definitions through a provider chain you configure. Each lookup tries the providers in order; the first hit wins. Mix any number of offline Yomitan-format dictionaries with the Jisho online fallback, in any order.
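The first-hit-wins behaviour can be pictured as a loop over an ordered list of providers. This is a minimal sketch under assumed names: each provider is modelled as a function from a word to a definition or `None`, and the in-memory dictionaries stand in for real indexed Yomitan dictionaries.

```python
from typing import Callable, Optional, Sequence, Tuple

# A provider takes a word and returns a definition, or None on a miss.
Provider = Callable[[str], Optional[str]]


def lookup(word: str, chain: Sequence[Tuple[str, Provider]]) -> Optional[Tuple[str, str]]:
    """Try each provider in order and return (provider name, definition) for the first hit."""
    for name, provider in chain:
        definition = provider(word)
        if definition:
            return name, definition
    return None


# Hypothetical stand-ins for two installed dictionaries.
jitendex = {"猫": "cat"}
jmdict = {"猫": "cat (feline)", "犬": "dog"}

chain = [("Jitendex", jitendex.get), ("JMdict", jmdict.get)]

print(lookup("猫", chain))  # ('Jitendex', 'cat')
print(lookup("犬", chain))  # ('JMdict', 'dog')
print(lookup("鳥", chain))  # None
```

Reordering the chain changes which dictionary supplies the definition when several contain the same word.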
Add a dictionary in Settings → Add Dictionary… by pointing at a Yomitan .zip archive. Drag entries to reorder the chain. Installed dictionaries are indexed once into ~/.anki_miner/dicts/<dict_id>/index.sqlite and loaded on startup. Structured-content entries are rendered to HTML on import, so card definitions preserve the source dictionary's formatting (definition lists, examples, tags).
Recommended Japanese → English dictionaries — both are JMdict-derived; pick whichever fits your cards (or load both and order them as you like):
- Jitendex — modern JMdict successor with structured-content formatting, example sentences, and richer tags. Best for visually rich cards. Grab the Yomitan archive from the Jitendex releases page.
- JMdict — the original community JMdict project. Plain-text glosses, smaller index, faster to add. Yomitan builds are available from the Yomitan dictionary list or you can rebuild from the EDRDG source.
Install via Settings → Add Dictionary… in either case.
Without any local dictionary, lookups fall back to the Jisho API (slower, online, rate-limited).
Upgrading from a pre-multi-dictionary release? A legacy ~/.anki_miner/JMdict_e file is auto-migrated to the new SQLite index on first launch. The legacy XML can be deleted after migration.
YouTube Mining
Paste a URL, click Fetch Info to probe metadata (title, duration, subtitle availability), then click Mine. The fetch downloads the video and its Japanese subtitle track into a per-run temporary directory, then passes both files to the same pipeline used for file-based mining.
Auto-captions are accepted only when native Japanese. Tracks that YouTube generates by machine-translating from English are rejected, since mining them yields unusable results. Native auto-captions remain lower quality than manual subtitles because they lack sentence boundaries.
Gotchas:
- Bot-detection prompts: if YouTube asks "Sign in to confirm you're not a bot", open Settings → Cookies → Browser and pick Firefox or Chrome. yt-dlp pulls cookies from that browser's profile on every fetch.
- Age-restricted videos: same fix.
- Max duration: defaults to 120 minutes. The probe aborts before downloading if the video is longer. Adjust in Settings.
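The cookie and duration settings above roughly correspond to the following sketch. It assumes the probe goes through yt-dlp's Python API, where `cookiesfrombrowser`, `writesubtitles`, and `subtitleslangs` are real option keys and the info dict reports `duration` in seconds. The helper names are hypothetical, and Anki Miner's real option set may differ.

```python
def build_probe_options(browser=None):
    """Options for a metadata-only yt-dlp probe, optionally with browser cookies."""
    options = {
        "skip_download": True,     # probe only, no media download
        "writesubtitles": True,
        "subtitleslangs": ["ja"],
    }
    if browser:
        # yt-dlp reads cookies from this browser's profile, e.g. "firefox" or "chrome".
        options["cookiesfrombrowser"] = (browser,)
    return options


def probe_problem(info, max_minutes=120):
    """Return a reason to abort before downloading, or None if the video is acceptable."""
    duration = info.get("duration")  # seconds, as reported in yt-dlp's info dict
    if duration is not None and duration > max_minutes * 60:
        return f"video is {duration / 60:.0f} min, over the {max_minutes} min limit"
    return None


# With yt-dlp installed, the probe itself would look roughly like:
#   import yt_dlp
#   with yt_dlp.YoutubeDL(build_probe_options("firefox")) as ydl:
#       info = ydl.extract_info(url, download=False)
print(probe_problem({"duration": 3 * 3600}))
print(probe_problem({"duration": 24 * 60}))
```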
Updates
Anki Miner checks GitHub for new releases on startup (toggle in Settings). When an update is available, a banner offers a one-click download of the asset that matches your install: .deb for Debian/Ubuntu, .AppImage for AppImage, the Inno installer on Windows, the macOS arm64 archive, or the release page for pip/source installs. "Skip this version" suppresses the prompt for that release; the next release prompts again.
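Choosing the matching asset can be thought of as a pattern match against the release's file names. The sketch below uses the file name patterns from the Download table; the install-kind keys, the `pick_asset` helper, and the versioned file names in the example are hypothetical.

```python
from fnmatch import fnmatchcase

# Install kind -> asset file name pattern (patterns taken from the Download table).
ASSET_PATTERNS = {
    "deb": "anki-miner_*_amd64.deb",
    "appimage": "AnkiMiner-*-Linux-x86_64.AppImage",
    "windows": "AnkiMiner-*-Setup.exe",
    "macos": "AnkiMiner-macOS-arm64.tar.gz",
}


def pick_asset(install_kind, asset_names):
    """Return the release asset matching this install, or None to fall back to the release page."""
    pattern = ASSET_PATTERNS.get(install_kind)
    if pattern is None:
        return None  # pip/source installs are pointed at the release page instead
    for name in asset_names:
        if fnmatchcase(name, pattern):
            return name
    return None


# Hypothetical asset list for one release.
assets = [
    "AnkiMiner-2.4.1-Setup.exe",
    "anki-miner_2.4.1_amd64.deb",
    "AnkiMiner-2.4.1-Linux-x86_64.AppImage",
    "AnkiMiner-macOS-arm64.tar.gz",
]
print(pick_asset("deb", assets))
print(pick_asset("pip", assets))
```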
Troubleshooting
| Issue | Solution |
|---|---|
| "Cannot connect to Anki" | Start Anki and ensure AnkiConnect is installed. |
| "Deck not found" | Create the deck in Anki or update the deck name in Settings. |
| "Note type not found" | Import Lapis (see above) or configure your own in Settings. |
| "ffmpeg not found" | Install ffmpeg and add it to PATH. |
| No definitions found | Add a Yomitan dictionary in Settings → Add Dictionary…, or enable the Jisho fallback. |
| Audio is wrong language | The tool tries Japanese audio tracks first, then falls back to the default. |
| Subtitles out of sync | Use the subtitle offset control in the GUI. |
Contributing
Contributions are welcome — bug fixes, dictionary integrations, GUI polish, doc improvements, all sizes.
- New here? Start with CONTRIBUTING.md.
- Architecture overview: ARCHITECTURE.md.
- Testing strategy: docs/TESTING.md.
- Code of Conduct: CODE_OF_CONDUCT.md.
- Security: SECURITY.md.
Bug reports and feature requests → Issues. General questions and discussion → Discussions.
License
GNU General Public License v3.0. See LICENSE.