
A tool for mining sentences from games. Update: dependency updates, replay-buffer-based line searching, and bug fixes.

Project description

gamesentenceminer

GSM (GameSentenceMiner)

Turn your gaming time into language mastery.


🎮 See it in Action

Demo Gif

  • OCR to get text from games that don't support text hooks.
  • Look up words in-game with Yomitan.
  • Create Anki cards with game audio + screenshot (or gif) automatically.

What does it do?

GSM is an application designed to automate the process of creating flashcards while you play. It sits between your game and Anki, handling audio recording, screenshots, and OCR so you don't have to interrupt your gameplay.
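A tool that "sits between your game and Anki" has to write media into Anki somehow. This README does not say how GSM does it. One common route is the AnkiConnect add-on's local HTTP API, so the sketch below assumes AnkiConnect on its default address. It builds a `storeMediaFile` request for a clip and an `updateNoteFields` request that points a note at it. The field names `SentenceAudio` and `Picture` are hypothetical; real note types name these fields differently.

```python
import base64
import json
import urllib.request

# Assumption: the AnkiConnect add-on is running on its default address.
ANKI_CONNECT_URL = "http://127.0.0.1:8765"


def make_request(action: str, **params) -> dict:
    """Build an AnkiConnect request body (API version 6)."""
    return {"action": action, "version": 6, "params": params}


def store_media(filename: str, data: bytes) -> dict:
    """Request that saves raw bytes into Anki's media folder."""
    # storeMediaFile accepts the file contents as base64 text.
    encoded = base64.b64encode(data).decode("ascii")
    return make_request("storeMediaFile", filename=filename, data=encoded)


def attach_context(note_id: int, audio_name: str, image_name: str) -> dict:
    """Request that points an existing note at already-stored media."""
    # Hypothetical field names: adjust to match your note type.
    fields = {
        "SentenceAudio": f"[sound:{audio_name}]",
        "Picture": f'<img src="{image_name}">',
    }
    return make_request("updateNoteFields", note={"id": note_id, "fields": fields})


def invoke(request: dict) -> object:
    """POST a request to AnkiConnect and return its 'result' value."""
    body = json.dumps(request).encode("utf-8")
    with urllib.request.urlopen(ANKI_CONNECT_URL, body) as response:
        reply = json.load(response)
    if reply.get("error"):
        raise RuntimeError(reply["error"])
    return reply["result"]
```

With Anki open, you would call `invoke(store_media(...))` for each file and then `invoke(attach_context(...))` for the freshly mined note. The request builders are kept separate from `invoke` so they can be inspected without a running Anki.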

📝 Anki Card Enhancement

GSM automatically adds context to your Anki cards whenever you create them.

  • Audio Capture: Uses Voice Activity Detection (VAD) to record and trim the specific voice line associated with the text.
  • Screenshots: Captures the game state the moment the line is spoken. GIFs and Black Bar Removal are supported.
  • Mine from History: Go back and create cards from earlier lines you've encountered (e.g., during cutscenes).
  • Multi-Line Support: Capture multiple lines of dialogue into one card using the built-in Texthooker.
  • AI Translation: Optional integration to provide sentence translations using your own API key.
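To show what "record and trim the specific voice line" means, here is a minimal energy-threshold voice activity detector. It is a stand-in, not GSM's implementation: this README does not name the VAD backend GSM uses. The sketch splits mono samples (floats in [-1, 1]) into short frames and treats a frame as speech when its RMS energy exceeds a threshold. It then returns the sample range from the first to the last voiced frame, with a little padding on each side. `trim_to_voice` and its default parameters are hypothetical.

```python
import math
from typing import Sequence


def trim_to_voice(samples: Sequence[float], sample_rate: int,
                  frame_ms: int = 30, threshold: float = 0.02,
                  pad_ms: int = 100) -> tuple[int, int]:
    """Return (start, end) sample indices bracketing the voiced region.

    A frame counts as voiced if its RMS energy exceeds `threshold`.
    Returns (0, 0) when no frame is voiced.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    voiced_starts = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms > threshold:
            voiced_starts.append(start)
    if not voiced_starts:
        return 0, 0
    # Pad both ends so consonant onsets and trailing syllables aren't clipped.
    pad = sample_rate * pad_ms // 1000
    start = max(0, voiced_starts[0] - pad)
    end = min(len(samples), voiced_starts[-1] + frame_len + pad)
    return start, end
```

A fixed energy threshold is easy to follow but struggles with game music and sound effects under the dialogue. A trained VAD model is the usual choice when background audio is loud.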

https://github.com/user-attachments/assets/df6bc38e-d74d-423e-b270-8a82eec2394c

👁️ OCR (Text Recognition)

For games that don't have a text hook (Agent/Textractor), GSM uses a custom fork of OwOCR to read text directly from the screen.

This opens up all kinds of possibilities for games that would otherwise be inaccessible for language learning and sentence mining. For example, I've made cards from games like Metal Gear Solid 1 and 2, Titanfall 2, and Sekiro, all using GSM's OCR.

  • Easy Setup: Managed installation means you don't need to fiddle with terminals.
  • Two-Pass System: Clean, fast output, close to what you'd get from a text hook.
  • Customizable Capture Zones: Define exactly where text appears on your screen for optimal results.
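The README doesn't describe how the two passes are split. The sketch below shows one plausible design, and the class and its parameters are hypothetical. A cheap OCR pass runs on every captured frame (ideally already cropped to a capture zone) only to detect when the on-screen text has stopped changing, such as after a typewriter effect finishes. A slower, more accurate OCR pass then runs once on that settled frame. Both OCR engines are passed in as plain callables, so no particular OCR library is assumed.

```python
from typing import Callable, Optional


class TwoPassOCR:
    """Hypothetical two-pass pipeline: fast OCR watches for the text to
    settle, then accurate OCR runs once per new line of dialogue."""

    def __init__(self, fast_ocr: Callable[[object], str],
                 accurate_ocr: Callable[[object], str],
                 stable_frames: int = 2) -> None:
        self.fast_ocr = fast_ocr
        self.accurate_ocr = accurate_ocr
        self.stable_frames = stable_frames
        self._last_fast = ""
        self._repeat = 0
        self._emitted = False

    def feed(self, frame: object) -> Optional[str]:
        """Process one frame; return final text once the line has settled."""
        text = self.fast_ocr(frame).strip()
        if text != self._last_fast:
            # Text is still changing: start counting again.
            self._last_fast = text
            self._repeat = 1
            self._emitted = False
            return None
        self._repeat += 1
        if text and not self._emitted and self._repeat >= self.stable_frames:
            self._emitted = True  # only one accurate pass per settled line
            return self.accurate_ocr(frame)
        return None
```

Gating the expensive pass this way means it runs once per line instead of once per frame, which is where the "clean, fast" trade-off would come from.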

https://github.com/user-attachments/assets/07240472-831a-40e6-be22-c64b880b0d66

🖥️ Overlay

GSM includes a transparent overlay for instant dictionary lookups.

  • Hover over characters in-game to see definitions via Yomitan.
  • Create cards without ever leaving the game window.
  • Automatically generated furigana displayed in-game.

Overlay Demo

📊 Statistics

Track your immersion habits with the stats dashboard.

  • Kanji Grid: View every Kanji you've encountered and click them to see their source sentences.
  • Goals: Set daily reading targets.
  • Tools: Clean up and organize your mining history.
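The Kanji Grid amounts to an index from each kanji to the sentences that contain it. The sketch below shows that idea and is not GSM's code: `build_kanji_grid` is a hypothetical helper. It treats only the CJK Unified Ideographs block (U+4E00 to U+9FFF) and Extension A (U+3400 to U+4DBF) as kanji. Rarer extension blocks and the iteration mark 々 are ignored.

```python
from collections import defaultdict


def is_kanji(ch: str) -> bool:
    # CJK Unified Ideographs plus Extension A; other blocks are ignored.
    code = ord(ch)
    return 0x4E00 <= code <= 0x9FFF or 0x3400 <= code <= 0x4DBF


def build_kanji_grid(sentences: list[str]) -> dict[str, list[str]]:
    """Map each kanji to the sentences it appears in, in first-seen order."""
    grid: dict[str, list[str]] = defaultdict(list)
    for sentence in sentences:
        # dict.fromkeys dedupes characters while preserving their order,
        # so a sentence is listed once even if a kanji repeats in it.
        for ch in dict.fromkeys(sentence):
            if is_kanji(ch):
                grid[ch].append(sentence)
    return dict(grid)
```

Clicking a kanji in the grid then corresponds to looking up `grid[kanji]` to list its source sentences.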

stats


🚀 Getting Started

  1. Download: Get the latest release.
  2. Install: Watch the Installation Guide.
  3. Requirements:
    • An Anki tool (Yomitan, JL, etc.)
    • A text source (Agent, Textractor, or GSM's built-in OCR)
    • A game

📚 Documentation

For full setup guides and configuration details, check the Wiki (Currently WIP).

❤️ Acknowledgements

Integrated Components

This project includes modified versions of the following libraries. I got tired of submodule hell, so I've included them directly here for easier management. All credit goes to the original authors:

Star History

Star History Chart

Sponsors

Free code signing provided by SignPath.io, certificate by SignPath Foundation.


Download files


Source Distribution

gamesentenceminer-2026.5.10.tar.gz (28.8 MB)

Uploaded Source

Built Distribution


gamesentenceminer-2026.5.10-py3-none-any.whl (29.1 MB)

Uploaded Python 3

File details

Details for the file gamesentenceminer-2026.5.10.tar.gz.

File metadata

  • Download URL: gamesentenceminer-2026.5.10.tar.gz
  • Upload date:
  • Size: 28.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for gamesentenceminer-2026.5.10.tar.gz
Algorithm Hash digest
SHA256 da5ca29e44e8aafbec8b16d1888856b0b0b210a8118b571a0c58421a9cf1997b
MD5 d1370a7b6442cc5ca9e586bdbc9f4e61
BLAKE2b-256 11cdc0203edece7647d225ec8eb9e1a892487b7c7717a95ab97f792a9d9c0e29
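To check a downloaded file against the published digest, you can hash it locally with Python's standard `hashlib`. The sketch below reads the file in chunks so a ~29 MB archive never has to sit in memory all at once. `EXPECTED_SHA256` is the SHA256 value listed above for the source archive. For the wheel, pass its SHA256 from the wheel's hash table instead.

```python
import hashlib
from pathlib import Path

# SHA256 published above for gamesentenceminer-2026.5.10.tar.gz.
EXPECTED_SHA256 = "da5ca29e44e8aafbec8b16d1888856b0b0b210a8118b571a0c58421a9cf1997b"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 of a file, reading it 1 MiB at a time."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For example, `verify(Path("gamesentenceminer-2026.5.10.tar.gz"))` should return `True` for an intact download.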


File details

Details for the file gamesentenceminer-2026.5.10-py3-none-any.whl.

File metadata

File hashes

Hashes for gamesentenceminer-2026.5.10-py3-none-any.whl
Algorithm Hash digest
SHA256 f4225689ca7fc52642f1166ed8b7eeab2b86d9ee59656ddae874b8c5aecc6ecf
MD5 4ada75f377acc5607c6daab2c74ca864
BLAKE2b-256 8b2726c7bd55e594bd1c5bcc67d2ef71e47c073aeb1c2f2cae2305f92682a04d

