A utility to extract vocabulary lists from manga.

Manga Wordlist Extractor

This script automatically scans manga and generates a CSV file of all the words it contains.

It is intended to be used with the community deck feature of Bunpro, hence the CSV format. Once Bunpro publishes its CSV import feature, I will adjust the format of the CSV accordingly. If you would like any other output formats, let me know!

Installation

You need to have Python installed (ideally Python 3.12).

Using pip

Install the package with:

pip install manga-wordlist-extractor

Using the source code directly

Download this repository (using the "code -> download zip" option above the files list at the top). Extract the archive, then open a command prompt in the extracted folder.

Run this to install all dependencies:

pip install -r requirements.txt

You can now run the tool from the src/main/main.py file.

Usage

manga-wordlist-extractor [-h] [--parent] folder

Replace folder with the path to the folder containing the manga files. Surround the path with quotation marks if it contains spaces!

If you enter a parent folder containing multiple volumes, add "--parent" before the folder path.
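
The usage line above maps naturally onto a small argparse definition. The following is a minimal sketch of such an interface, not the tool's actual source: build_parser and the help strings are hypothetical, and only the program name, the --parent flag and the folder argument come from the usage line.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the CLI from the usage line; the real tool may differ.
    parser = argparse.ArgumentParser(prog="manga-wordlist-extractor")
    parser.add_argument(
        "--parent",
        action="store_true",
        help="treat folder as a parent folder containing multiple volumes",
    )
    parser.add_argument("folder", help="path to the folder containing the manga files")
    return parser


if __name__ == "__main__":
    # A quoted path with spaces reaches the program as a single argument.
    args = build_parser().parse_args(["--parent", "My Manga/Series Name"])
    print(args.parent, args.folder)
```

Because --parent is a plain on/off flag, it takes no value of its own, which is why the folder path simply follows it on the command line.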

This will generate a vocab.csv file containing all extracted words.
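
If you want to process the generated list yourself, it can be read with Python's csv module. This is a minimal sketch under assumptions: the column layout of vocab.csv is not described here, so the code assumes a UTF-8 file with the word in the first column of each row, and load_words is a hypothetical helper rather than part of the package. If the file turns out to have a header row, it would need to be skipped as well.

```python
import csv
from pathlib import Path


def load_words(csv_path: str) -> list[str]:
    # Assumption: UTF-8 file with the word in the first column; adjust for the real layout.
    words = []
    with Path(csv_path).open(newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            # Skip blank lines and rows whose first cell is empty.
            if row and row[0].strip():
                words.append(row[0].strip())
    return words


if __name__ == "__main__":
    path = Path("vocab.csv")
    if path.exists():
        words = load_words(str(path))
        print(f"{len(words)} words loaded")
```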

Notices

If you run into errors, look into the mokuro repository linked at the bottom. There might be some issues with Python version compatibility.

Also important: This script is not perfect. The text recognition can make mistakes, so some of the extracted vocab may be wrong. If this proves to be a big issue, I will look for a different method to parse vocabulary from the text.
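
Some recognition errors can be screened out after the fact. As a rough sketch, and not something the tool itself is known to do, the hypothetical helper below keeps only entries that contain at least one kana or kanji character, which drops stray Latin letters and digits. It cannot catch a misread kanji, so treat it as a coarse filter only.

```python
import re

# Hiragana (U+3040-U+309F), katakana (U+30A0-U+30FF), common kanji (U+4E00-U+9FFF).
JAPANESE_CHAR = re.compile(r"[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]")


def keep_japanese(words):
    # Hypothetical helper: drop entries that contain no kana or kanji at all.
    return [word for word in words if JAPANESE_CHAR.search(word)]


if __name__ == "__main__":
    print(keep_japanese(["猫", "abc", "123", "カメラ", "食べる"]))
```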

TODO

  • Live output from Mokuro (it can take a very long time)
  • Separate outputs for each volume
  • Add translations through dictionary lookup?

Acknowledgements

This is hardly my work; I just strung together some amazing libraries:
