[CHaracter Ocr COordination for MUFI iN texts] A simple script to maintain a reasonable training set of HTR/OCR characters

Choco-Mufin

Tools for normalizing the use of some characters and checking file consistency. They mainly target the overly diverse ways of transcribing medieval data (allographetic and graphematic, for example) while keeping information such as abbreviations, hence MUFI.

Install

pip install chocomufin

Commands

The usual workflow is as follows: generate a conversion table (chocomufin generate table.csv your-files.xml), then use this table either to check your files (chocomufin control table.csv your-files.xml) or to convert them (chocomufin convert table.csv your-files.xml). Converted files automatically receive a suffix, which you can set with --suffix.
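To make the idea of a conversion table concrete, here is a minimal sketch of how such a table could be built from the characters found in a text. It is not ChocoMufin's implementation: the function names build_table_rows and rows_to_csv are hypothetical, it reads a plain string rather than XML files, and it leaves out the mufidecode column because its computation is not described here.

```python
import csv
import io
import unicodedata


def build_table_rows(text):
    """Return one row per distinct non-whitespace character in `text` (hypothetical helper)."""
    rows = []
    for char in sorted(set(text)):
        if char.isspace():
            continue
        rows.append({
            "char": char,
            # Fall back to a placeholder for characters without a Unicode name.
            "name": unicodedata.name(char, "UNKNOWN"),
            # Assumption: by default a character is mapped to itself.
            "replacement": char,
            "codepoint": f"{ord(char):04X}",
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["char", "name", "replacement", "codepoint"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# "A", dotless i, a space, and i with macron, written as escapes for safety.
print(rows_to_csv(build_table_rows("A\u0131 \u012b")))
```

After generating a table this way, you would edit the replacement column by hand to decide which characters should be normalized.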

Example table of conversion

char,name,replacement,codepoint,mufidecode
ī,LATIN SMALL LETTER I WITH MACRON,ĩ,012B,i
ı,LATIN SMALL LETTER DOTLESS I,i,0131,i
ﬀ,LATIN SMALL LIGATURE FF,ff,FB00,ff
A,LATIN CAPITAL LETTER A,A,0041,A
B,LATIN CAPITAL LETTER B,B,0042,B
C,LATIN CAPITAL LETTER C,C,0043,C
D,LATIN CAPITAL LETTER D,D,0044,D

As table:

| char | name | replacement | codepoint | mufidecode |
|------|------|-------------|-----------|------------|
| ī | LATIN SMALL LETTER I WITH MACRON | ĩ | 012B | i |
| ı | LATIN SMALL LETTER DOTLESS I | i | 0131 | i |
| ﬀ | LATIN SMALL LIGATURE FF | ff | FB00 | ff |
| A | LATIN CAPITAL LETTER A | A | 0041 | A |
| B | LATIN CAPITAL LETTER B | B | 0042 | B |
| C | LATIN CAPITAL LETTER C | C | 0043 | C |
| D | LATIN CAPITAL LETTER D | D | 0044 | D |
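The following sketch illustrates what checking and converting against such a table could look like. It is an illustration under stated assumptions, not ChocoMufin's actual code: load_table, unknown_chars and convert are hypothetical names, the input is a plain string rather than an XML file, and matching is done one code point at a time (so decomposed sequences such as a letter followed by a combining mark would not match a precomposed table entry).

```python
import csv
import io

# A shortened copy of the example table; special characters are written as
# escapes (U+012B, U+0129, U+0131, U+FB00) so they survive copy and paste.
TABLE_CSV = """char,name,replacement,codepoint,mufidecode
\u012b,LATIN SMALL LETTER I WITH MACRON,\u0129,012B,i
\u0131,LATIN SMALL LETTER DOTLESS I,i,0131,i
\ufb00,LATIN SMALL LIGATURE FF,ff,FB00,ff
A,LATIN CAPITAL LETTER A,A,0041,A
"""


def load_table(csv_text):
    """Map each known character to its replacement (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["char"]: row["replacement"] for row in reader}


def unknown_chars(text, table):
    """Control step: list non-whitespace characters missing from the table."""
    return sorted({c for c in text if not c.isspace() and c not in table})


def convert(text, table):
    """Convert step: replace known characters, leave unknown ones untouched."""
    return "".join(table.get(c, c) for c in text)


table = load_table(TABLE_CSV)
print(convert("A\u0131\ufb00", table))      # prints "Aiff"
print(unknown_chars("A\u0131 z", table))    # prints ['z']
```

In this sketch the control step only reports characters; deciding whether an unknown character should fail a build is left to the caller.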

GitHub Action Template

Replace the path to table.csv and the files to be tested, then save this file in your repository as .github/workflows/chocomufin.yml:

# This workflow installs ChocoMufin with a single version of Python and checks the XML files against the conversion table
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions

name: ChocoMufin

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - name: Set up Python 3.8
      uses: actions/setup-python@v2
      with:
        python-version: "3.8"  # quoted so YAML does not parse the version as a float
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install chocomufin
    - name: Run ChocoMufin
      run: |
        # Without globstar, bash treats ** like * and only matches one directory level
        shopt -s globstar
        chocomufin control table.csv **/*.xml

Logo by Alix Chagué.

The file original_mufi_json's content is under CC BY-SA 4.0 and comes from https://mufi.info/m.php?p=mufiexport
