ocrodjvu

OCR for DjVu (Python 3 fork)


Maintainers

FriedrichFroebel


Development Status
- 4 - Beta
Environment
- Console
Intended Audience
- End Users/Desktop
License
- OSI Approved :: GNU General Public License (GPL)
Operating System
- OS Independent
Programming Language
Topic
- Multimedia :: Graphics
- Text Processing

Project description

Overview

ocrodjvu is a wrapper for OCR systems that allows you to perform OCR on DjVu files.

Example

$ wget -q 'https://sources.debian.org/data/main/o/ocropus/0.3.1-3/data/pages/alice_1.png'
$ gm convert -threshold 50% 'alice_1.png' 'alice.pbm'
$ cjb2 'alice.pbm' 'alice.djvu'
$ ocrodjvu --in-place 'alice.djvu'
Processing 'alice.djvu':
- Page #1
$ djvused -e print-txt 'alice.djvu'
(page 0 0 2488 3507
 (column 470 2922 1383 2978
  (para 470 2922 1383 2978
   (line 470 2922 1383 2978
    (word 470 2927 499 2976 "1")
    (word 588 2926 787 2978 "Down")
    (word 817 2925 927 2977 "the")
    (word 959 2922 1383 2976 "Rabbit-Hole"))))
 (column 451 707 2076 2856
  (para 463 2626 2076 2856
   (line 465 2803 2073 2856
    (word 465 2819 569 2856 "Alice")
    (word 592 2819 667 2841 "was")
    (word 690 2808 896 2854 "beginning")
⋮
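The example above processes a single file. To OCR a whole directory of DjVu documents, the same --in-place invocation can be wrapped in a loop. This is a minimal sketch: the function name ocr_djvu_dir is hypothetical, and it uses no ocrodjvu option other than --in-place from the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run "ocrodjvu --in-place" on every *.djvu file in a
# directory (default: the current one). Returns non-zero if any file failed.
ocr_djvu_dir() {
  local dir=${1:-.} file status=0
  for file in "$dir"/*.djvu; do
    [ -e "$file" ] || continue   # the glob matched nothing
    if ! ocrodjvu --in-place "$file"; then
      echo "ocrodjvu failed on $file" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage: ocr_djvu_dir ~/scans
```

The loop keeps going after a failure so that one damaged file does not stop the batch; the combined exit status still reports it.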

Requisites

The following software is required to run ocrodjvu:

- Python ≥ 3.6
- an OCR engine, one of:
  - Cuneiform ≥ 0.7
  - Ocrad ≥ 0.10
  - GOCR ≥ 0.40
  - Tesseract ≥ 2.00
- DjVuLibre ≥ 3.5.26
- djvulibre-python ≥ 0.9
- lxml ≥ 2.0
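Before the first run, it can help to confirm that these requirements are present. The sketch below checks only for the tools and modules named in the list and does not check version numbers. The executable names (djvused, cuneiform, ocrad, gocr, tesseract) and the Python import names (djvu.decode for djvulibre-python, lxml) are assumptions about typical installations, not something ocrodjvu itself defines; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of a pre-flight check for ocrodjvu's requirements (not part of ocrodjvu).

# Print each argument that is not available as a command.
missing_commands() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}

# Succeed if at least one of the given commands is available.
any_command() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 && return 0
  done
  return 1
}

check_ocrodjvu_deps() {
  local status=0 cmd module
  # python3 for ocrodjvu itself; djvused is one of the DjVuLibre tools.
  for cmd in $(missing_commands python3 djvused); do
    echo "missing command: $cmd"
    status=1
  done
  # Assumed executable names of the supported OCR engines; one is enough.
  if ! any_command cuneiform ocrad gocr tesseract; then
    echo "no OCR engine found (cuneiform, ocrad, gocr, tesseract)"
    status=1
  fi
  # Assumed import names for djvulibre-python and lxml.
  for module in djvu.decode lxml; do
    if ! python3 -c "import $module" >/dev/null 2>&1; then
      echo "missing Python module: $module"
      status=1
    fi
  done
  return "$status"
}

# Usage: check_ocrodjvu_deps && echo "all requirements found"
```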

Additionally, some optional features require the following software:

- PyICU ≥ 1.0.1: required for the --word-segmentation=uax29 option
- html5lib: required for the --html5 option
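Since both options depend on extra Python packages, a script can enable them only when those packages are importable. This is a sketch: the function name ocrodjvu_optional_flags and the PYTHON override are hypothetical, the import names (icu for PyICU, html5lib) are assumptions, and combining these options with --in-place as in the example is assumed rather than documented here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print, one per line, the optional ocrodjvu flags whose
# Python dependencies can be imported. PYTHON may name another interpreter.
ocrodjvu_optional_flags() {
  local py=${PYTHON:-python3}
  # Assumption: PyICU is importable as "icu", html5lib as "html5lib".
  if "$py" -c 'import icu' >/dev/null 2>&1; then
    printf '%s\n' '--word-segmentation=uax29'
  fi
  if "$py" -c 'import html5lib' >/dev/null 2>&1; then
    printf '%s\n' '--html5'
  fi
}

# Usage sketch:
# mapfile -t flags < <(ocrodjvu_optional_flags)
# ocrodjvu --in-place "${flags[@]}" alice.djvu
```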

The following software is required to rebuild the manual pages from source:

Installation

The easiest way to install ocrodjvu is from PyPI:

pip install ocrodjvu

Alternatively, you can use ocrodjvu without installing it, straight out of an unpacked source tarball or a VCS checkout.
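Because this fork moved the ocrodjvu script into the package's __main__.py (see the differences from upstream), a source tree can presumably be run with python3 -m ocrodjvu. The wrapper below is a sketch under two assumptions: the ocrodjvu package directory sits at the top level of the unpacked tarball or checkout, and the other requirements are already installed. The function name ocrodjvu_from_source is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run ocrodjvu straight from an unpacked source tree
# without installing it. Assumes SRC contains the "ocrodjvu" package directory.
ocrodjvu_from_source() {
  local src=$1
  shift
  # Prepend the source tree to the module search path, so file arguments stay
  # relative to the caller's working directory. Note that "python3 -m" puts the
  # current directory first on sys.path, so a different "ocrodjvu" package in
  # the current directory would take precedence.
  PYTHONPATH="$src${PYTHONPATH:+:$PYTHONPATH}" python3 -m ocrodjvu "$@"
}

# Usage (paths are placeholders):
# ocrodjvu_from_source ~/src/ocrodjvu --in-place alice.djvu
```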

It’s also possible to install it from source for the current interpreter with:

pip install .

The man pages can be installed with:

make install_manpage

By default, make install_manpage installs them under /usr/local/. To use a different installation prefix, set the PREFIX variable, e.g.:

make install_manpage PREFIX="$HOME/.local"
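With a non-default prefix such as $HOME/.local, man may not find the pages until that prefix is on MANPATH. The helper below is a sketch: the name add_manpath is hypothetical, it assumes the pages end up under $PREFIX/share/man (check the Makefile for the actual location), and it relies on the man-db convention that a trailing colon in MANPATH keeps the system default search path.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make man find pages installed under a custom PREFIX.
# Assumption: the pages are installed below "$PREFIX/share/man".
add_manpath() {
  local dir="$1/share/man"
  case ":${MANPATH-}:" in
    *":$dir:"*) ;;  # already listed, nothing to do
    *)
      # Prepend the directory. If MANPATH was empty or unset, the result ends
      # in ":", which man-db reads as "append the default search path here".
      MANPATH="$dir:${MANPATH-}"
      ;;
  esac
  export MANPATH
}

# Usage, matching the PREFIX example above (e.g. in ~/.bashrc):
# add_manpath "$HOME/.local"
```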

About this fork

This repository is a port of the original repository to Python 3.

The port was done with the 2to3 tool, followed by manual fixes until the existing tests passed. Although this port started from scratch in order to include the latest upstream changes, the fork by @rmast, which collected earlier porting attempts, was a great help (see also Issue #39).

Since the upstream repository has been archived (Issue #46), this fork is now maintained independently. Please note that I currently have no plans to implement completely new features. Nevertheless, I will try to keep at least the parts I use regularly working.

Differences from upstream

- Require Python ≥ 3.6.
- Migrate from nose to the plain unittest module from the standard library.
- Conform to the PEP 8 coding style.
- Use a standardized setup.py-based installation.
- Rename lib to ocrodjvu, and move the ocrodjvu script into __main__.py, exposed as a console script.
- Drop support for ocropus/ocropy, since only the old legacy versions ≤ 0.3.1 (from 2008) were supported.
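With nose gone, the test suite should run under the standard-library runner alone. This sketch assumes the tests live in a tests package at the top of the source tree; the function name run_ocrodjvu_tests is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the test suite with unittest discovery.
# Assumption: the tests live in a "tests" package under the source root.
run_ocrodjvu_tests() {
  local root=${1:-.}
  # Subshell, so the caller's working directory is left unchanged.
  (cd "$root" && python3 -m unittest discover -s tests -t .)
}

# Usage (from an unpacked source tarball or VCS checkout):
# run_ocrodjvu_tests ~/src/ocrodjvu
```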

Acknowledgment

ocrodjvu development was supported by the Polish Ministry of Science and Higher Education’s grant no. N N519 384036 (2009–2012, https://bitbucket.org/jsbien/ndt).

Release history

- 0.14 (this version): Dec 13, 2024
- 0.13.2: Nov 15, 2024
- 0.13.1: May 15, 2024

Download files

Source Distribution

ocrodjvu-0.14.tar.gz (53.9 kB), uploaded Dec 13, 2024

Built Distribution

ocrodjvu-0.14-py3-none-any.whl (58.2 kB), uploaded Dec 13, 2024, Python 3

File details

Details for the file ocrodjvu-0.14.tar.gz.

File metadata

Download URL: ocrodjvu-0.14.tar.gz
Upload date: Dec 13, 2024
Size: 53.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for ocrodjvu-0.14.tar.gz
Algorithm	Hash digest
SHA256	`f81d34e6f5f6f76456d83ce259c8b81cd657b9ca4ce7529e144a41be0c013847`
MD5	`a8a616051b0c36dec51b12a5a1e98280`
BLAKE2b-256	`80aeed7eb357386c4678de0d1915f5364fff4de9b92d490ea22001de6e642337`
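A downloaded file can be checked against the SHA256 digest above before installing it. This is a sketch: the function name verify_sha256 is hypothetical, and it uses sha256sum from GNU coreutils (on macOS, shasum -a 256 is the usual equivalent).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if FILE's SHA256 digest equals EXPECTED.
verify_sha256() {
  local file=$1 expected=$2 actual
  # sha256sum prints "<digest>  <file>"; keep only the digest.
  actual=$(sha256sum -- "$file" | cut -d ' ' -f 1)
  [ "$actual" = "$expected" ]
}

# Usage with the digest listed above for the source distribution:
# verify_sha256 ocrodjvu-0.14.tar.gz \
#   f81d34e6f5f6f76456d83ce259c8b81cd657b9ca4ce7529e144a41be0c013847 && echo OK
```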


Provenance

The following attestation bundles were made for ocrodjvu-0.14.tar.gz:

Publisher: release.yml on FriedrichFroebel/ocrodjvu

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ocrodjvu-0.14.tar.gz
- Subject digest: f81d34e6f5f6f76456d83ce259c8b81cd657b9ca4ce7529e144a41be0c013847
- Sigstore transparency entry: 155208113
- Sigstore integration time: Dec 13, 2024
Source repository:
- Permalink: FriedrichFroebel/ocrodjvu@925756865683782ac82efa2686262a295a7eed44
- Branch / Tag: refs/tags/0.14
- Owner: https://github.com/FriedrichFroebel
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@925756865683782ac82efa2686262a295a7eed44
- Trigger Event: release

File details

Details for the file ocrodjvu-0.14-py3-none-any.whl.

File metadata

Download URL: ocrodjvu-0.14-py3-none-any.whl
Upload date: Dec 13, 2024
Size: 58.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for ocrodjvu-0.14-py3-none-any.whl
Algorithm	Hash digest
SHA256	`010652603b621e59ab927cf48ebef06ab6bb0e623a021ebeb09620b690f24645`
MD5	`a83df7dbc75da1681deb1970117a5e5c`
BLAKE2b-256	`097a541f8f8ef1eadf587372beb3ccfe30b6b9229f39340de9cfa1891aa8671d`


Provenance

The following attestation bundles were made for ocrodjvu-0.14-py3-none-any.whl:

Publisher: release.yml on FriedrichFroebel/ocrodjvu

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ocrodjvu-0.14-py3-none-any.whl
- Subject digest: 010652603b621e59ab927cf48ebef06ab6bb0e623a021ebeb09620b690f24645
- Sigstore transparency entry: 155208114
- Sigstore integration time: Dec 13, 2024
Source repository:
- Permalink: FriedrichFroebel/ocrodjvu@925756865683782ac82efa2686262a295a7eed44
- Branch / Tag: refs/tags/0.14
- Owner: https://github.com/FriedrichFroebel
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@925756865683782ac82efa2686262a295a7eed44
- Trigger Event: release
