
Automated Japanese vocabulary mining from anime subtitles with Anki integration

Anki Miner

License: GPL v3 · Python 3.10+

Batch-mines Japanese vocabulary from anime and YouTube into Anki cards. Given a season folder or a YouTube URL, it produces cards containing screenshots, sentence audio, furigana, pitch accent, and frequency data.

Anki Miner is built for batch processing after viewing, not for real-time lookup during playback (the asbplayer and Yomitan workflow).

Showcase

[Image: Anki Miner showcase]

Example cards

[Images: example cards from Cowboy Bebop, Frieren, and Steins;Gate]

Generated from video and subtitle files. Each card contains a screenshot, sentence audio, furigana, and definition.

How It Works

  1. Parse subtitles: tokenize Japanese text with MeCab morphological analysis.
  2. Filter words: keep content words (nouns, verbs, adjectives, adverbs); drop words already in your Anki collection or on your blacklist.
  3. Extract media: capture screenshots and audio clips from the video at each subtitle's timestamp via ffmpeg.
  4. Fetch definitions: look up English definitions from JMdict (offline) or the Jisho API.
  5. Create cards: batch upload to Anki via AnkiConnect.
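
As a rough illustration of step 2, the sketch below filters already-tokenized words by part of speech and drops known or blacklisted ones. The `Token` type and the `filter_words` function are hypothetical names for this example, not Anki Miner's actual API. The part-of-speech labels are the top-level tags that MeCab emits with the IPADIC dictionary; other dictionaries may use different labels.

```python
from dataclasses import dataclass

# Top-level part-of-speech tags from MeCab with IPADIC:
# noun, verb, adjective, adverb.
CONTENT_POS = {"名詞", "動詞", "形容詞", "副詞"}


@dataclass(frozen=True)
class Token:
    """Hypothetical container for one tokenizer result."""
    lemma: str  # dictionary form of the word
    pos: str    # top-level part-of-speech tag


def filter_words(tokens, known, blacklist):
    """Return unique content-word lemmas that are neither known nor blacklisted."""
    seen = set()
    kept = []
    for tok in tokens:
        if tok.pos not in CONTENT_POS:
            continue  # particles, auxiliaries, symbols, and so on
        if tok.lemma in known or tok.lemma in blacklist or tok.lemma in seen:
            continue
        seen.add(tok.lemma)
        kept.append(tok.lemma)
    return kept


tokens = [
    Token("猫", "名詞"),
    Token("が", "助詞"),
    Token("魚", "名詞"),
    Token("食べる", "動詞"),
    Token("猫", "名詞"),
]
print(filter_words(tokens, known={"魚"}, blacklist=set()))  # ['猫', '食べる']
```

Here `known` stands in for the words already present in the Anki collection, which the real tool would presumably query through AnkiConnect.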

Features

  • Lapis-compatible cards with furigana, pitch accent, and word frequency fields.
  • YouTube support: paste a URL, mine the video.
  • Queue a folder of episode/subtitle pairs for sequential processing.
  • Offline JMdict dictionary with Jisho API fallback.
  • Preview and curate the word list before any cards are created.
  • Parallel ffmpeg extraction for screenshots and sentence audio.
  • Analytics dashboard with history, undo, and series difficulty rankings.
  • Four themes (Light, Dark, Sakura, Tokyo Night) plus custom JSON themes.
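
To make the parallel extraction feature concrete, here is a minimal sketch of how screenshot and audio commands could be built and run concurrently. It is not taken from Anki Miner's source: the function names, the worker count, and the file names in the usage note are assumptions. The ffmpeg flags themselves (`-ss`, `-t`, `-frames:v`, `-vn`, `-y`) are standard options.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def screenshot_cmd(video, at_seconds, out_path):
    """Build an ffmpeg command that grabs a single frame at a timestamp."""
    return ["ffmpeg", "-y", "-ss", f"{at_seconds:.3f}", "-i", video,
            "-frames:v", "1", out_path]


def audio_cmd(video, start, end, out_path):
    """Build an ffmpeg command that cuts the audio between start and end."""
    return ["ffmpeg", "-y", "-ss", f"{start:.3f}", "-t", f"{end - start:.3f}",
            "-i", video, "-vn", out_path]


def extract_all(commands, workers=4):
    """Run the commands concurrently and return one exit code per command."""
    def run(cmd):
        return subprocess.run(cmd, capture_output=True).returncode

    # Threads suffice here: each worker only waits on an external process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, commands))
```

A call such as `extract_all([screenshot_cmd("ep01.mkv", 12.5, "w1.jpg"), audio_cmd("ep01.mkv", 12.5, 15.0, "w1.mp3")])` would run both extractions at once, provided ffmpeg is on PATH.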

Installation

Requirements

  • ffmpeg on PATH.
  • Anki with the AnkiConnect add-on. In Anki: Tools → Add-ons → Get Add-ons, paste code 2055492159, restart.

Download

Grab the installer for your platform from the latest release:

| Platform | Installer | Portable |
| --- | --- | --- |
| Windows | AnkiMiner-*-Setup.exe | AnkiMiner-Windows-x86_64.zip |
| Linux (Debian/Ubuntu) | anki-miner_*_amd64.deb | AnkiMiner-*-Linux-x86_64.AppImage |
| Linux (other) | | AnkiMiner-Linux-x86_64.tar.gz |
| macOS (Apple Silicon) | | AnkiMiner-macOS-arm64.tar.gz |

No Python required. Installers and portable archives bundle all dependencies.

Install from PyPI (Python 3.10+)

pipx install anki-miner   # or: pip install anki-miner

Install from source

git clone https://github.com/0xzerolight/anki_miner.git
cd anki_miner
pip install .

Quick Start

After installing, launch Anki Miner from your Start Menu, Applications folder, or app menu. If you installed from PyPI or source, run anki_miner_gui from a terminal. A desktop shortcut is created on first launch; to recreate it later, use Tools → Create Desktop Shortcut... inside the app.

Anki must be running with AnkiConnect installed before mining starts.
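
To check that precondition yourself, the sketch below sends AnkiConnect a `version` request. It assumes AnkiConnect is listening on its default address, `http://127.0.0.1:8765`, and the helper names are illustrative rather than part of Anki Miner.

```python
import json
import urllib.error
import urllib.request

ANKI_CONNECT_URL = "http://127.0.0.1:8765"  # AnkiConnect's default address


def build_request(action, **params):
    """Encode an AnkiConnect request body (API version 6)."""
    body = {"action": action, "version": 6}
    if params:
        body["params"] = params
    return json.dumps(body).encode("utf-8")


def anki_is_reachable(url=ANKI_CONNECT_URL, timeout=2.0):
    """Return True if AnkiConnect answers a 'version' request without error."""
    request = urllib.request.Request(url, data=build_request("version"))
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            reply = json.load(response)
    except (urllib.error.URLError, OSError, ValueError):
        return False  # Anki closed, add-on missing, or malformed reply
    return isinstance(reply, dict) and reply.get("error") is None
```

Calling `anki_is_reachable()` returns `False` when Anki is closed or the add-on is missing, which corresponds to the "Cannot connect to Anki" case.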

Tabs:

  • Single Episode: mine one video/subtitle pair with file selectors and progress tracking.
  • Batch Processing: queue multiple series for sequential processing.
  • YouTube: paste a URL, fetch metadata, then mine.
  • Analytics: history, series difficulty, milestones.
  • Settings: Anki connection, media extraction, dictionary, word filtering. Saved to ~/.anki_miner/gui_config.json.

Recommended Setup

Lapis Note Type

Anki Miner uses the Lapis note type's field names by default. For a custom note type, change the field names in Settings → Anki.

  1. Download the latest .apkg from Lapis releases.
  2. In Anki: File → Import and select the .apkg.

Default field mapping:

| Anki Miner Field | Note Field | Content |
| --- | --- | --- |
| word | Expression | Dictionary form of the word |
| sentence | Sentence | Original subtitle line |
| definition | MainDefinition | English definitions |
| picture | Picture | Screenshot from the video |
| audio | SentenceAudio | Audio clip of the sentence |
| expression_furigana | ExpressionFurigana | Word with furigana reading |
| sentence_furigana | SentenceFurigana | Sentence with furigana reading |
| pitch_position | (unmapped) | Pitch accent position number |
| pitch_category | (unmapped) | Pitch accent category |
| frequency | (unmapped) | Word frequency rank |

Fields marked (unmapped) have no default Lapis mapping. Map them in Settings if your note type has equivalents. Any note type with the required fields works.
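
The default mapping can be pictured as a plain dictionary that translates Anki Miner's field keys into note fields. The sketch below builds the `note` object that AnkiConnect's `addNote` action expects. The `to_note` helper, the deck name `Mining`, and the model name `Lapis` are assumptions made for illustration.

```python
# Default mapping from the table above; unmapped fields are simply omitted.
FIELD_MAP = {
    "word": "Expression",
    "sentence": "Sentence",
    "definition": "MainDefinition",
    "picture": "Picture",
    "audio": "SentenceAudio",
    "expression_furigana": "ExpressionFurigana",
    "sentence_furigana": "SentenceFurigana",
}


def to_note(card, deck, model, field_map=FIELD_MAP):
    """Translate a card dict into an AnkiConnect note, skipping unmapped keys."""
    fields = {field_map[key]: value for key, value in card.items()
              if key in field_map}
    return {"deckName": deck, "modelName": model, "fields": fields}


card = {"word": "猫", "sentence": "猫が好き。", "frequency": "1200"}
note = to_note(card, deck="Mining", model="Lapis")
print(note["fields"])  # {'Expression': '猫', 'Sentence': '猫が好き。'}
```

Mapping `frequency` to a field of your own note type would amount to adding one more entry to the dictionary.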

JMdict Offline Dictionary

For fast offline lookups:

mkdir -p ~/.anki_miner
wget -O ~/.anki_miner/JMdict_e.gz http://ftp.edrdg.org/pub/Nihongo/JMdict_e.gz
gunzip ~/.anki_miner/JMdict_e.gz

Without JMdict, lookups fall back to the Jisho API (slower, online, rate-limited).
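
For readers curious about what an offline lookup involves, the following sketch streams JMdict's XML and returns the English glosses of the first entry whose kanji (`keb`) or kana (`reb`) spelling matches. It illustrates the file format only and is not Anki Miner's lookup code; the function name is hypothetical, and the inline sample stands in for the real file.

```python
import io
import xml.etree.ElementTree as ET


def lookup_glosses(xml_file, word):
    """Stream a JMdict XML file and return the glosses of the first match."""
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        if elem.tag != "entry":
            continue
        spellings = [e.text for e in elem.iter("keb")]
        spellings += [e.text for e in elem.iter("reb")]
        if word in spellings:
            return [g.text for g in elem.iter("gloss")]
        elem.clear()  # release entries that did not match
    return []


# Tiny inline sample using JMdict's element names.
SAMPLE = """<JMdict>
<entry><k_ele><keb>猫</keb></k_ele><r_ele><reb>ねこ</reb></r_ele>
<sense><gloss>cat</gloss></sense></entry>
</JMdict>""".encode("utf-8")

print(lookup_glosses(io.BytesIO(SAMPLE), "ねこ"))  # ['cat']
```

A linear scan like this is slow on the full dictionary, so a practical tool would more likely build an in-memory index once at startup.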

YouTube Mining

Paste a URL, click Fetch Info to probe metadata (title, duration, subtitle availability), then click Mine. The fetch downloads the video and its Japanese subtitle track into a per-run temporary directory, then passes both files to the same pipeline used for file-based mining.

Auto-captions are accepted only when they are natively Japanese. Tracks that YouTube generates by machine-translating from English are rejected, since mining them yields unusable results. Even native auto-captions are lower quality than manual subtitles because they lack sentence boundaries.

Gotchas:

  • Bot-detection prompts: if YouTube asks "Sign in to confirm you're not a bot", open Settings → Cookies → Browser and pick Firefox or Chrome. yt-dlp pulls cookies from that browser's profile on every fetch.
  • Age-restricted videos: same fix.
  • Max duration: defaults to 120 minutes. The probe aborts before downloading if the video is longer. Adjust in Settings.
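
The cookie and duration settings above map naturally onto yt-dlp's Python options. The sketch below is one plausible arrangement, not Anki Miner's actual code: the helper names are hypothetical, and rejecting videos of unknown duration is an assumption. The option keys (`outtmpl`, `writesubtitles`, `writeautomaticsub`, `subtitleslangs`, `cookiesfrombrowser`) and the `duration` metadata field, given in seconds, are standard yt-dlp names.

```python
def build_ydl_opts(temp_dir, cookies_browser=None):
    """Build yt-dlp options for one run: Japanese subtitles, optional cookies."""
    opts = {
        "outtmpl": f"{temp_dir}/%(id)s.%(ext)s",  # keep files in the run's temp dir
        "writesubtitles": True,      # manual subtitle tracks
        "writeautomaticsub": True,   # auto-captions as a fallback
        "subtitleslangs": ["ja"],
    }
    if cookies_browser:
        # yt-dlp expects a tuple whose first item is the browser name
        opts["cookiesfrombrowser"] = (cookies_browser,)
    return opts


def within_max_duration(info, max_minutes=120):
    """Return True if the probed duration (in seconds) is within the limit."""
    duration = info.get("duration")
    # Assumption: treat an unknown duration as too long rather than guess.
    return duration is not None and duration <= max_minutes * 60
```

With yt-dlp installed, a probe would look like `yt_dlp.YoutubeDL(opts).extract_info(url, download=False)`, and the returned dictionary could be passed to `within_max_duration` before any download starts.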

Updates

Anki Miner checks GitHub for new releases on startup (toggle in Settings). When an update is available, a banner offers a one-click download of the asset that matches your install: .deb for Debian/Ubuntu, .AppImage for AppImage, the Inno installer on Windows, the macOS arm64 archive, or the release page for pip/source installs. "Skip this version" suppresses the prompt for that release; the next release prompts again.
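
The asset matching described above could work roughly as follows. This is a guess at the logic, not the project's code: the install-kind keys and the `pick_asset` helper are hypothetical, while the file name patterns come from the download table in the Installation section.

```python
import fnmatch

# Hypothetical install kinds mapped to the release asset patterns.
ASSET_PATTERNS = {
    "deb": "anki-miner_*_amd64.deb",
    "appimage": "AnkiMiner-*-Linux-x86_64.AppImage",
    "windows": "AnkiMiner-*-Setup.exe",
    "macos": "AnkiMiner-macOS-arm64.tar.gz",
}


def pick_asset(asset_names, install_kind):
    """Return the release asset matching the install kind, or None."""
    pattern = ASSET_PATTERNS.get(install_kind)
    if pattern is None:
        return None  # pip/source installs: send the user to the release page
    for name in asset_names:
        if fnmatch.fnmatchcase(name, pattern):
            return name
    return None


# Example asset names, with the version number filled in for illustration.
assets = ["anki-miner_2.3.3_amd64.deb", "AnkiMiner-2.3.3-Setup.exe"]
print(pick_asset(assets, "windows"))  # AnkiMiner-2.3.3-Setup.exe
```

Returning `None` for unknown install kinds keeps the fallback explicit: the caller opens the release page instead of downloading a file.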

Troubleshooting

| Issue | Solution |
| --- | --- |
| "Cannot connect to Anki" | Start Anki and ensure AnkiConnect is installed. |
| "Deck not found" | Create the deck in Anki or update the deck name in Settings. |
| "Note type not found" | Import Lapis (see above) or configure your own in Settings. |
| "ffmpeg not found" | Install ffmpeg and add it to PATH. |
| "JMdict file not found" | Download to ~/.anki_miner/ (see above) or disable offline dictionary. |
| Audio is wrong language | The tool tries Japanese audio tracks first, then falls back to the default. |
| Subtitles out of sync | Use the subtitle offset control in the GUI. |
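
To illustrate what a subtitle offset does, the sketch below shifts a single SRT timestamp by a number of milliseconds. It is a conceptual example only: how the GUI control applies its offset internally is not documented here, and `shift_timestamp` is a hypothetical helper.

```python
import re

TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_timestamp(stamp, offset_ms):
    """Shift one SRT timestamp (HH:MM:SS,mmm) by offset_ms, clamping at zero."""
    match = TIMESTAMP.fullmatch(stamp)
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    h, m, s, ms = (int(part) for part in match.groups())
    total = max(0, ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms)
    h, rest = divmod(total, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


print(shift_timestamp("00:01:02,500", 1500))  # 00:01:04,000
```

A positive offset delays the subtitles relative to the video, and a negative one moves them earlier.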

Issues, Feature Ideas, Contributing

Report bugs in Issues.

See CONTRIBUTING.md for development setup.

Post feature ideas in Discussions.

License

GNU General Public License v3.0. See LICENSE.

Download files

Source distribution

anki_miner-2.3.3.tar.gz (332.2 kB)

  • Tags: Source
  • Uploaded using Trusted Publishing: yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12
  • Publisher: publish.yml on 0xzerolight/anki_miner

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 036383ffdaaf18afdd31e62c4e20ec1eb3c8bcfb7f3d93198d568f9686585a3a |
| MD5 | 92ea20749043cb047b39a3438fab5b3a |
| BLAKE2b-256 | 74a593c5af79c8a59ac8f6be008674c0d7fe559c86e73d37b36469548a305b2c |

Built distribution

anki_miner-2.3.3-py3-none-any.whl (378.8 kB)

  • Tags: Python 3
  • Uploaded using Trusted Publishing: yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12
  • Publisher: publish.yml on 0xzerolight/anki_miner

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | d8323d59059ac734ae8ad01291962c771895e9680e489ed15be24f2b6f7565e8 |
| MD5 | 7c500645ea94b96c1ffde3373e8a15b7 |
| BLAKE2b-256 | 3df238aad9a277dfc9e3657b4d8925477de447cfe3968c8bc985e43aafd08d2e |
