OCR for Japanese manga

These details have been verified by PyPI

Project links

Homepage

GitHub Statistics

Maintainers

kha-white

These details have not been verified by PyPI

Project description

Manga OCR

Optical character recognition for Japanese text, with the main focus being Japanese manga. It uses a custom end-to-end model built with Transformers' Vision Encoder Decoder framework.

Manga OCR can be used as a general-purpose OCR for printed Japanese, but its main goal is to provide high-quality text recognition that is robust against various scenarios specific to manga:

both vertical and horizontal text
text with furigana
text overlaid on images
wide variety of fonts and font styles
low quality images

Unlike many OCR models, Manga OCR supports recognizing multi-line text in a single forward pass, so that text bubbles found in manga can be processed at once, without splitting them into lines.

Installation

You need Python 3.6 or newer. Note that the newest Python release might not be supported because of the PyTorch dependency, which often breaks with new Python releases and needs some time to catch up. Refer to the PyTorch website for a list of supported Python versions.

Some users have reported problems with Python installed from the Microsoft Store. If you see the error `ImportError: DLL load failed while importing fugashi: The specified module could not be found.`, try installing Python from the official site instead.

If you want to run on a GPU, install PyTorch as described on the PyTorch website; otherwise, this step can be skipped. Then install the package from the command line with pip3 install manga-ocr.
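Before installing, it can help to confirm that the interpreter meets the stated minimum and to see whether PyTorch can use a GPU. The following is a minimal sketch: `environment_report` is a hypothetical helper, not part of Manga OCR, and it assumes only that PyTorch, when installed, is importable as `torch`.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 6)  # minimum version stated in the installation notes


def environment_report():
    """Summarize whether this interpreter looks ready for Manga OCR."""
    report = {
        "python_ok": sys.version_info >= MIN_PYTHON,
        # find_spec checks for the package without importing it
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check works without PyTorch

        report["cuda_available"] = torch.cuda.is_available()
    return report
```

Calling `print(environment_report())` shows the three flags. If `cuda_available` is false, OCR should still run on the CPU, since the GPU step is optional.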

Troubleshooting

`ImportError: DLL load failed while importing fugashi: The specified module could not be found.` This might be caused by a Python installation from the Microsoft Store; try installing Python from the official site.
Problems installing mecab-python3 on ARM architecture: try this workaround.

Usage

Python API

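```
from manga_ocr import MangaOcr

# Load the model once, then call it with the path to an image
mocr = MangaOcr()
text = mocr('/path/to/img')
```

The model also accepts a PIL image in place of a path:

```
import PIL.Image

from manga_ocr import MangaOcr

mocr = MangaOcr()
# Open the image yourself and pass the PIL object to the model
img = PIL.Image.open('/path/to/img')
text = mocr(img)
```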
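Because the model is a plain callable, the same instance can be reused for many images. The sketch below shows one way to OCR a whole folder. `ocr_folder` and `IMAGE_SUFFIXES` are hypothetical names introduced for illustration, and the OCR callable is passed in as a parameter so that the helper does not depend on how the model is created.

```python
from pathlib import Path

# File extensions treated as images; an assumption made for this sketch.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def ocr_folder(folder, ocr):
    """Run `ocr` on every image in `folder` and return {file name: text}."""
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            # The callable receives the path as a string, as in the example above
            results[path.name] = ocr(str(path))
    return results
```

With the library installed, `ocr_folder('/path/to/folder', MangaOcr())` would load the model once and apply it to each image in turn.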

Running in the background

Manga OCR can run in the background and process new images as they appear.

You might use a tool like ShareX or Flameshot to manually capture a region of the screen and let the OCR read it either from the system clipboard or from a specified directory. By default, Manga OCR writes recognized text to the clipboard, from which it can be read by a dictionary like Yomichan.

Clipboard mode on Linux requires wl-copy for Wayland sessions or xclip for X11 sessions. You can find out which one your system needs by running echo $XDG_SESSION_TYPE in the terminal.
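The same check can be scripted. The sketch below maps the value of `XDG_SESSION_TYPE` to the utility named above and reports whether it is missing from `PATH`. Both function names are hypothetical and are not part of Manga OCR.

```python
import os
import shutil


def clipboard_tool(session_type=None):
    """Return the clipboard utility for a session type, or None if unknown."""
    if session_type is None:
        session_type = os.environ.get("XDG_SESSION_TYPE", "")
    session_type = session_type.strip().lower()
    if session_type == "wayland":
        return "wl-copy"
    if session_type == "x11":
        return "xclip"
    return None


def missing_clipboard_tool(session_type=None):
    """Return the required tool's name if it is not on PATH, else None."""
    tool = clipboard_tool(session_type)
    if tool is not None and shutil.which(tool) is None:
        return tool
    return None
```

If `missing_clipboard_tool()` returns a name, installing that utility with the system package manager is the likely next step.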

Your full setup for reading manga in Japanese with a dictionary might look like this:

capture region with ShareX -> write image to clipboard -> Manga OCR -> write text to clipboard -> Yomichan

https://user-images.githubusercontent.com/22717958/150238361-052b95d1-0152-485f-a441-48a957536239.mp4

To read images from clipboard and write recognized texts to clipboard, run in command line:
```
manga_ocr
```
To read images from ShareX's screenshot folder, run in command line:
```
manga_ocr "/path/to/sharex/screenshot/folder"
```

Note that when running in the clipboard scanning mode, any image that you copy to clipboard will be processed by OCR and replaced by recognized text. If you want to be able to copy and paste images as usual, you should use the folder scanning mode instead and define a separate task in ShareX just for OCR, which saves screenshots to some folder without copying them to clipboard.
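To illustrate what folder scanning involves, here is a minimal polling sketch. It is not how Manga OCR implements this mode; `new_images` and `watch` are hypothetical names, the suffix list is an assumption, and the handler is passed in so that any callable (for example one that runs OCR and prints the text) can be plugged in.

```python
import time
from pathlib import Path


def new_images(folder, seen, suffixes=(".png", ".jpg", ".jpeg")):
    """Return image paths in `folder` not seen before, recording them in `seen`."""
    fresh = []
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in suffixes and path.name not in seen:
            seen.add(path.name)
            fresh.append(path)
    return fresh


def watch(folder, handle, delay=0.5, max_polls=None):
    """Poll `folder` and call `handle(path)` for each new image.

    `max_polls=None` polls forever; a number stops after that many polls.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in new_images(folder, seen):
            handle(path)
        polls += 1
        time.sleep(delay)
```

Note that this sketch also handles images already present when it starts; a real watcher might prefer to skip those.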

When running for the first time, downloading the model (~400 MB) might take a few minutes. The OCR is ready to use once the "OCR ready" message appears in the logs.

To see other options, run in command line:
```
manga_ocr --help
```

If manga_ocr doesn't work, you might also try replacing it with python -m manga_ocr.

Usage tips

OCR supports multi-line text, but the longer the text, the more likely errors become. If recognition fails for some part of a longer text, you might try running it on a smaller portion of the image.
The model was trained specifically to handle manga well, but it should do a decent job on other types of printed text, such as novels or video games. It probably won't be able to handle handwritten text, though.
The model always attempts to recognize some text in the image, even if there is none. Because it uses a transformer decoder (and therefore has some understanding of the Japanese language), it might even "dream up" some realistic-looking sentences! This shouldn't be a problem for most use cases, but it might be improved in the next version.
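The first tip, running OCR on a smaller portion of the image, can be sketched as computing crop boxes. `column_boxes` is a hypothetical helper that splits an image into vertical strips and orders them right to left, which is the column order of vertical Japanese text. The even split is an assumption for illustration only.

```python
def column_boxes(width, height, parts):
    """Split a width x height image into `parts` vertical strips.

    Boxes are (left, upper, right, lower) tuples, ordered right to left.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    # Strip boundaries along the x axis, rounded to whole pixels
    edges = [round(i * width / parts) for i in range(parts + 1)]
    boxes = [(edges[i], 0, edges[i + 1], height) for i in range(parts)]
    return boxes[::-1]
```

Each box uses the (left, upper, right, lower) layout expected by the `crop` method of a PIL image, and the resulting crop can be passed to the model like any other PIL image. An even split may cut through a column of characters, so choosing the boxes by hand is often more reliable.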

Examples

Here are some cherry-picked examples showing the capability of the model.

Manga OCR result (source images not reproduced here)
素直にあやまるしか
立川で見た〝穴〟の下の巨大な眼は：
実戦剣術も一流です
第３０話重苦しい闇の奥で静かに呼吸づきながら
よかったじゃないわよ！何逃げてるのよ！！早くあいつを退治してよ！
ぎゃっ
ピンポーーン
ＬＩＮＫ！私達７人の力でガノンの塔の結界をやぶります
ファイアパンチ
少し黙っている
わかるかな〜？
警察にも先生にも町中の人達に！！

Contact

For any inquiries, please feel free to contact me at kha-white@mail.com

Acknowledgments

This project was built using:

Manga109-s dataset
CC-100 dataset

Project details

Release history

Version	Release date
0.1.14 (this version)	Jan 1, 2025
0.1.13	Oct 11, 2024
0.1.12	Jun 21, 2024
0.1.12b5 (pre-release)	Jun 21, 2024
0.1.11	Aug 27, 2023
0.1.10	May 7, 2023
0.1.9	May 7, 2023
0.1.8	Nov 5, 2022
0.1.7	Mar 9, 2022
0.1.6 (yanked, reason: bug)	Mar 9, 2022
0.1.5	Jan 23, 2022
0.1.4	Jan 21, 2022
0.1.3	Jan 20, 2022
0.1.2	Jan 20, 2022
0.1.1	Jan 17, 2022
0.1.0	Jan 17, 2022

Download files

Source Distribution

manga_ocr-0.1.14.tar.gz (1.2 MB)

Uploaded Jan 1, 2025 Source

Built Distribution

manga_ocr-0.1.14-py3-none-any.whl (69.4 kB)

Uploaded Jan 1, 2025 Python 3

File details

Details for the file manga_ocr-0.1.14.tar.gz.

File metadata

Download URL: manga_ocr-0.1.14.tar.gz
Upload date: Jan 1, 2025
Size: 1.2 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for manga_ocr-0.1.14.tar.gz
Algorithm	Hash digest
SHA256	`6c82b560fd60123433ca79b5f8e8d31eae1f86cbb732c73879dd5564a66aaf84`
MD5	`15fba6b399fbea4a9dbb75a0ce15d47f`
BLAKE2b-256	`ccae86feb76b749f1f599964d8699b9e8fd7e8aa49e7e5ef1e73ae36102c7cf9`
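A downloaded archive can be checked against the SHA256 digest listed above. The sketch below uses only the standard library. `sha256_of` and `matches` are hypothetical helper names, and the local path of the downloaded file is up to the reader.

```python
import hashlib

# SHA256 digest listed above for manga_ocr-0.1.14.tar.gz
EXPECTED_SHA256 = "6c82b560fd60123433ca79b5f8e8d31eae1f86cbb732c73879dd5564a66aaf84"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals `expected`."""
    return sha256_of(path) == expected.lower()
```

For example, `matches("manga_ocr-0.1.14.tar.gz")` should return True for an intact copy of the source distribution; for the wheel, pass its own digest as `expected`.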

Provenance

The following attestation bundles were made for manga_ocr-0.1.14.tar.gz:

Publisher: publish-to-pypi.yml on kha-white/manga-ocr

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: manga_ocr-0.1.14.tar.gz
- Subject digest: 6c82b560fd60123433ca79b5f8e8d31eae1f86cbb732c73879dd5564a66aaf84
- Sigstore transparency entry: 158761298
- Sigstore integration time: Jan 1, 2025
Source repository:
- Permalink: kha-white/manga-ocr@194e7b54cfa5e5d2f910cf99a83a383f372187ee
- Branch / Tag: refs/tags/v0.1.14
- Owner: https://github.com/kha-white
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-to-pypi.yml@194e7b54cfa5e5d2f910cf99a83a383f372187ee
- Trigger Event: push

File details

Details for the file manga_ocr-0.1.14-py3-none-any.whl.

File metadata

Download URL: manga_ocr-0.1.14-py3-none-any.whl
Upload date: Jan 1, 2025
Size: 69.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for manga_ocr-0.1.14-py3-none-any.whl
Algorithm	Hash digest
SHA256	`23730d277b568be37a9d89046bddc1f806bf1d57483e4073fe49bc6b17351545`
MD5	`18f2b89b6482f5d3297faa98b1d4a25a`
BLAKE2b-256	`395ea3d37403ab385cf50e7e825f5bbc6bd029d63006d62c9106968e833fab40`

Provenance

The following attestation bundles were made for manga_ocr-0.1.14-py3-none-any.whl:

Publisher: publish-to-pypi.yml on kha-white/manga-ocr

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: manga_ocr-0.1.14-py3-none-any.whl
- Subject digest: 23730d277b568be37a9d89046bddc1f806bf1d57483e4073fe49bc6b17351545
- Sigstore transparency entry: 158761299
- Sigstore integration time: Jan 1, 2025
Source repository:
- Permalink: kha-white/manga-ocr@194e7b54cfa5e5d2f910cf99a83a383f372187ee
- Branch / Tag: refs/tags/v0.1.14
- Owner: https://github.com/kha-white
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-to-pypi.yml@194e7b54cfa5e5d2f910cf99a83a383f372187ee
- Trigger Event: push

manga-ocr 0.1.14
