Automated Japanese vocabulary mining from anime subtitles with Anki integration

Anki Miner

Automated Japanese vocabulary mining from anime subtitles. Extracts unknown words, fetches definitions, pulls screenshots and audio from video files, and creates Anki flashcards automatically.

Showcase

[Images: Anki Miner app showcase; cards created with Anki Miner from Cowboy Bebop, Frieren, and Steins;Gate]

How It Works

  1. Parse subtitles — Tokenizes Japanese text using MeCab morphological analysis
  2. Filter words — Keeps content words (nouns, verbs, adjectives, adverbs) and removes words already in your Anki collection
  3. Extract media — Captures screenshots and audio clips from the video at each subtitle's timestamp using ffmpeg
  4. Fetch definitions — Looks up English definitions from JMdict (offline) or the Jisho API
  5. Create cards — Batch uploads everything to Anki via AnkiConnect
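Steps 1 and 2 can be sketched in a few lines. This is only an illustration, not Anki Miner's actual code: it assumes the tokens have already been produced by MeCab with IPADIC-style top-level part-of-speech tags, and the `Token` type and `filter_new_words` function are hypothetical names.

```python
from typing import Iterable, NamedTuple


class Token(NamedTuple):
    surface: str  # form as it appears in the subtitle line
    lemma: str    # dictionary form
    pos: str      # top-level part-of-speech tag (IPADIC-style, assumed)


# Nouns, verbs, adjectives, adverbs (assumed tag names).
CONTENT_POS = {"名詞", "動詞", "形容詞", "副詞"}


def filter_new_words(tokens: Iterable[Token], known: set[str],
                     min_length: int = 2) -> list[str]:
    """Return dictionary forms of unknown content words, in first-seen order."""
    seen: set[str] = set()
    result: list[str] = []
    for tok in tokens:
        if tok.pos not in CONTENT_POS:
            continue  # particles, auxiliaries, symbols, and so on
        if len(tok.lemma) < min_length:
            continue  # too short to be worth a card
        if tok.lemma in known or tok.lemma in seen:
            continue  # already in the collection or already queued
        seen.add(tok.lemma)
        result.append(tok.lemma)
    return result


# Hand-written tokens for 学生が速く走った; a real run would get these from MeCab.
tokens = [
    Token("学生", "学生", "名詞"),
    Token("が", "が", "助詞"),
    Token("速く", "速い", "形容詞"),
    Token("走っ", "走る", "動詞"),
    Token("た", "た", "助動詞"),
]
print(filter_new_words(tokens, known={"学生"}))
```

Here `known` stands in for the words already present in your Anki collection.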

Features

  • CLI and GUI — Use from the terminal or through a desktop application
  • Batch processing — Process entire anime series at once with automatic video/subtitle file pairing
  • Offline dictionary — Fast JMdict lookups with Jisho API fallback
  • Parallel media extraction — Concurrent ffmpeg processes for speed
  • Preview mode — See what words would be mined without creating any cards
  • Smart filtering — Skips particles, pronouns, onomatopoeia, sound effects, and words you already know
  • Theming — Light, Dark, and Japanese-inspired GUI themes
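To make the parallel media extraction idea concrete, here is a minimal sketch of how screenshot and audio commands could be built and run concurrently. The function names are hypothetical and the exact ffmpeg arguments Anki Miner uses are not documented here; the flags shown (`-ss`, `-i`, `-t`, `-frames:v`, `-vn`, `-y`) are standard ffmpeg options.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def screenshot_cmd(video: str, at: float, out: str) -> list[str]:
    # -ss before -i seeks in the input; -frames:v 1 writes a single frame.
    return ["ffmpeg", "-y", "-ss", f"{at:.3f}", "-i", video,
            "-frames:v", "1", out]


def audio_cmd(video: str, start: float, end: float, out: str) -> list[str]:
    # -t is the clip duration; -vn drops the video stream.
    return ["ffmpeg", "-y", "-ss", f"{start:.3f}", "-i", video,
            "-t", f"{end - start:.3f}", "-vn", out]


def run_ffmpeg(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(cmd, capture_output=True).returncode


def run_all(commands: Sequence[Sequence[str]], max_workers: int = 6,
            runner: Callable[[Sequence[str]], int] = run_ffmpeg) -> list[int]:
    """Run commands in a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(runner, commands))


commands = [
    screenshot_cmd("ep01.mkv", 12.5, "shot_0001.jpg"),
    audio_cmd("ep01.mkv", 10.0, 12.25, "clip_0001.mp3"),
]
for cmd in commands:
    print(" ".join(cmd))
```

Threads are enough here because each worker only waits on an external ffmpeg process. Passing `runner` as a parameter keeps the pool logic testable without ffmpeg installed.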

Installation

Requirements

  • Python 3.10+
  • ffmpeg — must be on your PATH
  • Anki with AnkiConnect installed
    • In Anki, go to Tools > Add-ons > Get Add-ons and paste code 2055492159
    • Restart Anki — AnkiConnect runs in the background while Anki is open
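To confirm that AnkiConnect is reachable before mining, a quick check along these lines may help. It is a sketch, assuming AnkiConnect listens on its default address `http://127.0.0.1:8765` and speaks API version 6; `build_request` and `anki_is_reachable` are hypothetical helper names, not part of Anki Miner.

```python
import json
import urllib.request

ANKI_CONNECT_URL = "http://127.0.0.1:8765"  # AnkiConnect default (assumed unchanged)


def build_request(action: str, **params) -> bytes:
    """Encode an AnkiConnect request body as JSON bytes."""
    body = {"action": action, "version": 6}
    if params:
        body["params"] = params
    return json.dumps(body).encode("utf-8")


def anki_is_reachable(url: str = ANKI_CONNECT_URL, timeout: float = 2.0) -> bool:
    """Return True if AnkiConnect answers a 'version' request without error."""
    try:
        req = urllib.request.Request(url, data=build_request("version"))
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            reply = json.load(resp)
    except (OSError, ValueError):
        # Connection refused, timeout, or a reply that is not valid JSON.
        return False
    return isinstance(reply, dict) and reply.get("error") is None


if __name__ == "__main__":
    print("AnkiConnect reachable:", anki_is_reachable())
```

If this prints `False`, check that Anki is open and that the add-on is installed and enabled.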

Install Anki Miner

Install with pipx (recommended, creates an isolated environment):

pipx install anki-miner

Don't have pipx? Install it first: pip install pipx && pipx ensurepath, then restart your terminal.

Or install with pip directly:

pip install anki-miner

Download standalone executable (no Python required)

Download the latest release for your platform:

Platform Download
Windows AnkiMiner-Windows-x86_64.zip
macOS AnkiMiner-macOS-arm64.tar.gz
Linux AnkiMiner-Linux-x86_64.tar.gz

Note: You still need ffmpeg installed and Anki running with the AnkiConnect add-on.

Manual installation (from source)

git clone https://github.com/0xzerolight/anki_miner.git
cd anki_miner
python -m venv venv
source venv/bin/activate  # Linux/macOS
# or: venv\Scripts\activate  # Windows
pip install .

Create Desktop Shortcut (Optional)

Create a clickable shortcut to launch Anki Miner from your desktop or app menu:

anki_miner create-shortcut

  • Linux: Adds "Anki Miner" to your application menu
  • Windows: Creates an "Anki Miner" shortcut on your Desktop and Start Menu

Recommended Setup

These steps are optional but improve the experience.

Lapis Note Type

By default, Anki Miner targets the fields of Lapis, an open-source Anki note type for Japanese learning.

  1. Download the latest .apkg from Lapis releases
  2. In Anki, go to File > Import and select the .apkg file

The default field mapping:

Anki Miner Field | Note Field | Content
word | Expression | Dictionary form of the word
sentence | Sentence | Original subtitle line
definition | MainDefinition | English definitions
picture | Picture | Screenshot from the video
audio | SentenceAudio | Audio clip of the sentence

You can use a different note type by changing the field mappings in the GUI settings. Any note type works as long as it has a field to receive each of the Anki Miner fields listed above.
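As an illustration of how such a mapping could be applied, the sketch below turns one mined card into a note in the shape AnkiConnect expects (`deckName`, `modelName`, `fields`). The `build_note` helper is hypothetical, and the `<img>` and `[sound:...]` values are ordinary Anki media syntax used as sample data, not confirmed Anki Miner output.

```python
DEFAULT_FIELD_MAP = {
    "word": "Expression",
    "sentence": "Sentence",
    "definition": "MainDefinition",
    "picture": "Picture",
    "audio": "SentenceAudio",
}


def build_note(card: dict[str, str], deck: str = "Anki Miner",
               note_type: str = "Lapis",
               field_map: dict[str, str] = DEFAULT_FIELD_MAP) -> dict:
    """Rename Anki Miner fields to note-type fields for an AnkiConnect note."""
    missing = [name for name in field_map if name not in card]
    if missing:
        raise KeyError(f"card is missing fields: {missing}")
    return {
        "deckName": deck,
        "modelName": note_type,
        "fields": {field_map[name]: card[name] for name in field_map},
    }


card = {
    "word": "走る",
    "sentence": "学生が速く走った",
    "definition": "to run",
    "picture": '<img src="shot_0001.jpg">',
    "audio": "[sound:clip_0001.mp3]",
}
note = build_note(card)
print(note["fields"]["Expression"])
```

Switching to another note type then only requires passing a different `field_map` and `note_type`.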

JMdict Offline Dictionary

For fast offline lookups, download JMdict:

mkdir -p ~/.anki_miner
wget -O ~/.anki_miner/JMdict_e.gz http://ftp.edrdg.org/pub/Nihongo/JMdict_e.gz
gunzip ~/.anki_miner/JMdict_e.gz

Without JMdict, Anki Miner falls back to the Jisho API (slower, requires internet, rate-limited).
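The offline-first lookup with a fallback can be sketched as follows. This is an assumption-laden example rather than Anki Miner's implementation: it parses a tiny inline sample that mimics JMdict's `entry` / `keb` / `reb` / `gloss` elements, and `build_index` and `lookup` are hypothetical names. The real file is large and would more likely be parsed as a stream.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional

# Tiny hand-written sample in JMdict's element layout (not real file contents).
SAMPLE = """<JMdict>
<entry>
<k_ele><keb>食べる</keb></k_ele>
<r_ele><reb>たべる</reb></r_ele>
<sense><gloss>to eat</gloss></sense>
</entry>
</JMdict>"""


def build_index(root: ET.Element) -> dict[str, list[str]]:
    """Map every kanji form and reading to the entry's English glosses."""
    index: dict[str, list[str]] = {}
    for entry in root.iter("entry"):
        glosses = [g.text for g in entry.iter("gloss") if g.text]
        forms = entry.findall("k_ele/keb") + entry.findall("r_ele/reb")
        for form in forms:
            if form.text:
                index.setdefault(form.text, []).extend(glosses)
    return index


def lookup(word: str, index: dict[str, list[str]],
           fallback: Optional[Callable[[str], list[str]]] = None) -> list[str]:
    """Try the offline index first, then the fallback (e.g. an online API)."""
    if word in index:
        return index[word]
    return fallback(word) if fallback else []


index = build_index(ET.fromstring(SAMPLE))
print(lookup("食べる", index))
```

In practice the `fallback` callable would wrap a Jisho API request; it is left abstract here so the sketch runs offline.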

Quick Start

CLI

# Mine a single episode
anki_miner mine video.mkv subs.ass

# Preview words without creating cards
anki_miner mine video.mkv subs.ass --preview

# Adjust subtitle timing (negative = earlier, positive = later)
anki_miner mine video.mkv subs.ass --offset -2.5

# Batch process a folder of episodes
anki_miner mine-folder ./episodes/

# Batch preview
anki_miner mine-folder ./episodes/ --preview
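How `mine-folder` pairs videos with subtitles is not described here. One plausible approach, shown purely as an assumption, is to match files that share a name apart from the extension. The extension sets and the `pair_files` helper are hypothetical.

```python
from pathlib import Path
from typing import Iterable

VIDEO_EXTS = {".mkv", ".mp4"}  # assumed; only .mkv appears in the examples
SUB_EXTS = {".ass", ".srt"}    # assumed; only .ass appears in the examples


def pair_files(paths: Iterable[Path]) -> list[tuple[Path, Path]]:
    """Pair each video with the subtitle file that has the same stem."""
    paths = list(paths)  # allow generators such as Path.iterdir()
    videos = {p.stem: p for p in paths if p.suffix.lower() in VIDEO_EXTS}
    subs = {p.stem: p for p in paths if p.suffix.lower() in SUB_EXTS}
    # Sorted stems give a stable episode order; unmatched files are skipped.
    return [(videos[stem], subs[stem]) for stem in sorted(videos) if stem in subs]


listing = [Path("ep02.mkv"), Path("ep01.ass"), Path("ep01.mkv"), Path("notes.txt")]
for video, subtitle in pair_files(listing):
    print(video, subtitle)
```

On a real folder you would call `pair_files(Path("./episodes").iterdir())`. Files such as `ep02.mkv` above, which have no matching subtitle, are simply left out.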

GUI

anki_miner_gui

The GUI provides three tabs:

  • Single Episode — Mine one video/subtitle pair with file selectors and progress tracking
  • Batch Processing — Queue multiple series for sequential processing
  • Settings — Configure Anki connection, media extraction, dictionary, and word filtering options

Configuration

All settings can be adjusted in the GUI Settings tab. Here are the key options:

Setting | Default | Description
anki_deck_name | "Anki Miner" | Target Anki deck
anki_note_type | "Lapis" | Note type to use
audio_padding | 0.3 | Seconds added before/after audio clips
screenshot_offset | 1.0 | Seconds after subtitle start for screenshot
min_word_length | 2 | Minimum characters per word
max_parallel_workers | 6 | Concurrent ffmpeg processes
use_offline_dict | true | Use JMdict instead of Jisho API
subtitle_offset | 0.0 | Global subtitle timing adjustment

GUI settings are saved to ~/.anki_miner/gui_config.json. CLI commands use the default values shown above.
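The sketch below shows one way the timing settings could combine for a single subtitle line. It assumes the JSON file uses the setting names from the table as keys and that `subtitle_offset` is applied before padding; neither is confirmed by this document, and `Settings`, `load_settings`, and `media_times` are hypothetical names.

```python
import json
from dataclasses import dataclass, fields
from pathlib import Path


@dataclass
class Settings:
    # Defaults copied from the table above.
    anki_deck_name: str = "Anki Miner"
    anki_note_type: str = "Lapis"
    audio_padding: float = 0.3
    screenshot_offset: float = 1.0
    min_word_length: int = 2
    max_parallel_workers: int = 6
    use_offline_dict: bool = True
    subtitle_offset: float = 0.0


def load_settings(path: Path) -> Settings:
    """Load overrides from a JSON file, ignoring unknown keys."""
    if not path.exists():
        return Settings()
    raw = json.loads(path.read_text(encoding="utf-8"))
    known = {f.name for f in fields(Settings)}
    return Settings(**{k: v for k, v in raw.items() if k in known})


def media_times(s: Settings, sub_start: float, sub_end: float) -> tuple[float, float, float]:
    """Return (audio_start, audio_end, screenshot_time) in seconds."""
    start = sub_start + s.subtitle_offset
    end = sub_end + s.subtitle_offset
    audio_start = max(0.0, start - s.audio_padding)  # never seek before 0
    audio_end = end + s.audio_padding
    screenshot_time = start + s.screenshot_offset
    return audio_start, audio_end, screenshot_time


settings = Settings(audio_padding=0.5, subtitle_offset=-2.5)
print(media_times(settings, sub_start=10.0, sub_end=12.0))
```

With a negative `subtitle_offset`, both the audio window and the screenshot move earlier, which matches the CLI's "negative = earlier" convention.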

Troubleshooting

Issue | Solution
"Cannot connect to Anki" | Start Anki and ensure AnkiConnect is installed
"Deck not found" | Create the deck in Anki or update the deck name in settings
"Note type not found" | Import the Lapis note type (see Installation above) or configure your own
"ffmpeg not found" | Install ffmpeg and add to PATH
"JMdict file not found" | Download to ~/.anki_miner/ (see Installation above) or disable offline dictionary
Audio is wrong language | The tool tries Japanese audio tracks first, then falls back to the default track
Subtitles out of sync | Use --offset (CLI) or the subtitle offset control (GUI) to adjust timing

Issues and Contributing

Found a bug or have an idea for a feature? Open an issue — all bug reports and suggestions are welcome.

Pull requests are also welcome. See CONTRIBUTING.md for development setup and guidelines.

License

This project is licensed under the GNU General Public License v3.0 — see the LICENSE file for details.
