[CHaracter Ocr COordination for MUFI iN texts] A simple script to maintain a reasonable training set of HTR/OCR characters
Choco-Mufin
[CHaracter Ocr COordination for MUFI iN texts]
Tools for normalizing the use of some characters and checking file consistency. They mainly target the overly diverse ways of transcribing medieval data (allographetic and graphematic transcription, for example) while keeping information such as abbreviations, hence MUFI.
Install
pip install chocomufin
Commands
The workflow is generally the following: you generate a conversion table (`chocomufin generate table.csv your-files.xml`), then use this table to either control (`chocomufin control table.csv your-files.xml`) or convert them (`chocomufin convert table.csv your-files.xml`).
Conversion will automatically add a suffix which you can define with `--suffix`.
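For scripted use, the three commands can be assembled as argument lists. This is only a minimal sketch: the helper name `build_commands` is hypothetical, and passing the suffix value as the argument following `--suffix` is an assumption about the CLI, not something documented here.

```python
from typing import List, Optional


def build_commands(
    table: str, files: List[str], suffix: Optional[str] = None
) -> List[List[str]]:
    """Return argv lists for the generate, control and convert steps."""
    generate = ["chocomufin", "generate", table, *files]
    control = ["chocomufin", "control", table, *files]
    convert = ["chocomufin", "convert", table, *files]
    if suffix is not None:
        # Assumption: --suffix takes its value as the next argument.
        convert += ["--suffix", suffix]
    return [generate, control, convert]


commands = build_commands("table.csv", ["your-files.xml"])
for argv in commands:
    # Each list could be handed to subprocess.run(argv, check=True).
    print(" ".join(argv))
```

In practice you would review and edit the generated table before running the control or convert step, so the three commands are rarely run back to back.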
Example table of conversion
char,name,replacement,codepoint,mufidecode
ī,LATIN SMALL LETTER I WITH MACRON,ĩ,012B,i
ı,LATIN SMALL LETTER DOTLESS I,i,0131,i
ﬀ,LATIN SMALL LIGATURE FF,ff,FB00,ff
A,LATIN CAPITAL LETTER A,A,0041,A
B,LATIN CAPITAL LETTER B,B,0042,B
C,LATIN CAPITAL LETTER C,C,0043,C
D,LATIN CAPITAL LETTER D,D,0044,D
As table:
char | name | replacement | codepoint | mufidecode |
---|---|---|---|---|
ī | LATIN SMALL LETTER I WITH MACRON | ĩ | 012B | i |
ı | LATIN SMALL LETTER DOTLESS I | i | 0131 | i |
ﬀ | LATIN SMALL LIGATURE FF | ff | FB00 | ff |
A | LATIN CAPITAL LETTER A | A | 0041 | A |
B | LATIN CAPITAL LETTER B | B | 0042 | B |
C | LATIN CAPITAL LETTER C | C | 0043 | C |
D | LATIN CAPITAL LETTER D | D | 0044 | D |
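To make the role of the table more concrete, here is a minimal Python sketch of how such a file could be read and applied. It is not chocomufin's own implementation: the function names are hypothetical, and it assumes that `char` is a single character and that `replacement` holds the string to substitute for it.

```python
import csv
import io

# A few rows of the example table, with non-ASCII characters written as escapes.
TABLE_CSV = (
    "char,name,replacement,codepoint,mufidecode\n"
    "\u012b,LATIN SMALL LETTER I WITH MACRON,\u0129,012B,i\n"
    "\u0131,LATIN SMALL LETTER DOTLESS I,i,0131,i\n"
    "\ufb00,LATIN SMALL LIGATURE FF,ff,FB00,ff\n"
    "A,LATIN CAPITAL LETTER A,A,0041,A\n"
)


def load_table(text):
    """Map each `char` to its `replacement` (assumed column semantics)."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["char"]: row["replacement"] for row in reader}


def apply_table(text, table):
    """Replace every character listed in the table; leave others unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


def unknown_chars(text, table):
    """Return the non-whitespace characters that the table does not list."""
    return {ch for ch in text if not ch.isspace() and ch not in table}


table = load_table(TABLE_CSV)
converted = apply_table("A\ufb00\u0131", table)
missing = unknown_chars("A Z", table)
```

The `unknown_chars` helper mirrors the idea behind a control step: any character missing from the table is reported instead of being silently accepted.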
GitHub Action Template
Just replace the path to `table.csv` and the files that need to be tested, then save this file in your repository as `.github/workflows/chocomufin.yml`:
# This workflow will install Python dependencies, run tests and lint with a single version of Python
# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions
name: ChocoMufin
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python 3.8
        uses: actions/setup-python@v2
        with:
          python-version: 3.8
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install chocomufin
      - name: Run ChocoMufin
        run: |
          chocomufin control table.csv **/*.xml
Logo by Alix Chagué.
The content of the file original_mufi_json is under CC BY-SA 4.0 and comes from https://mufi.info/m.php?p=mufiexport