A utility to extract vocabulary lists from manga.

Manga Wordlist Extractor

This script automatically scans through manga and generates a CSV file listing all the words it finds.

It is intended to be used with the community deck feature of Bunpro, hence the CSV format. Once Bunpro's CSV import feature is published, I will adjust the format of the CSV to match. If any other outputs are desired, let me know!

Usage

You need to have Python installed (ideally Python 3.12).

Download this repository (using the "code -> download zip" option above the files list at the top) and extract it. Then open a command prompt in the extracted folder.

Run this to install all dependencies:

pip install -r requirements.txt

Once this is done, navigate to the src/main folder in your command prompt. You can now run the tool with this command:

python main.py "FOLDER_PATH"

Replace FOLDER_PATH with the path of the folder containing the manga files. If you enter a parent folder containing multiple volumes, add "--parent" before the folder path, for example: python main.py --parent "FOLDER_PATH"
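To make the difference between the two invocations concrete, here is a minimal sketch of how a command line like this could be parsed. It is not the tool's actual code: the names build_parser and volume_folders are hypothetical, and treating each direct subfolder as one volume is an assumption about what "--parent" does.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser that mirrors the documented command line.
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("folder", type=Path, help="folder containing the manga files")
    parser.add_argument(
        "--parent",
        action="store_true",
        help="treat the folder as a parent folder holding several volumes",
    )
    return parser


def volume_folders(folder: Path, parent: bool) -> list[Path]:
    # Assumption: with --parent, every direct subfolder is one volume.
    if parent:
        return sorted(p for p in folder.iterdir() if p.is_dir())
    # Without --parent, the folder itself is the only volume.
    return [folder]


# Parse an explicit argument list instead of sys.argv for illustration.
args = build_parser().parse_args(["--parent", "FOLDER_PATH"])
print(args.folder, args.parent)
```

Under these assumptions, a single-volume run processes only the given folder, while a "--parent" run processes each subfolder in sorted order.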

This will generate a vocab.csv file containing all words.
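If you want to inspect the result before importing it anywhere, a few lines of Python are enough. The sketch below assumes that vocab.csv holds one word per row in its first column and is UTF-8 encoded; the actual layout may differ, and load_words is a hypothetical helper, not part of this tool.

```python
import csv
from pathlib import Path


def load_words(csv_path: Path) -> list[str]:
    # Assumption: one word per row, stored in the first column.
    with csv_path.open(newline="", encoding="utf-8") as handle:
        # Skip empty rows and rows whose first cell is blank.
        return [row[0] for row in csv.reader(handle) if row and row[0].strip()]


# Only read the file if a previous run has already produced it.
if Path("vocab.csv").exists():
    words = load_words(Path("vocab.csv"))
    print(len(words), "words found")
```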

Notices

If you run into errors, look into the mokuro repository linked at the bottom. There might be some issues with Python version compatibility.

Also important: This script is not perfect. The text recognition can make mistakes, so some of the extracted vocab can be wrong. If this proves to be a big issue, I will look for a different method to parse vocabulary from the text.
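Until then, a crude check on your side can catch some of the most obvious recognition errors. The sketch below is not part of this tool: it simply drops entries that contain no hiragana, katakana, or common kanji, and the helper names are hypothetical. It will not catch misread characters that are themselves valid Japanese.

```python
import re

# Hiragana, katakana, and the main CJK ideograph block.
JAPANESE_CHAR = re.compile(r"[\u3040-\u309F\u30A0-\u30FF\u4E00-\u9FFF]")


def looks_japanese(word: str) -> bool:
    # True if the entry contains at least one Japanese character.
    return bool(JAPANESE_CHAR.search(word))


def drop_suspect_entries(words: list[str]) -> list[str]:
    # Keep only entries that pass the character check, preserving order.
    return [word for word in words if looks_japanese(word)]
```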

TODO

  • Upload to PyPI to make usage much simpler
  • Live output from mokuro (it can take a very long time)
  • Separate outputs for each volume
  • Add translations through dictionary lookup?

Acknowledgements

This is hardly my work; I just strung together some amazing libraries, including mokuro.
