Document recognizer for multiple languages.
batukh
Detection of Languages using CRNN.
Installation
Using pip
For a TensorFlow-only installation:
pip install batukh[tf]
For a PyTorch-only installation:
pip install batukh[torch]
For both TensorFlow and PyTorch:
pip install batukh[full]
Warning:
A simple
pip install batukh
will install neither tensorflow nor pytorch dependencies.
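Because a plain install pulls in neither backend, it can help to check what is importable before training. The sketch below uses only the standard library; the helper name `available_backends` is hypothetical and not part of batukh.

```python
import importlib.util


def available_backends():
    """Report which deep-learning backends can be imported in this environment."""
    # Hypothetical helper, not part of batukh: maps an extras name to the
    # top-level module that extra is expected to provide.
    modules = {"tf": "tensorflow", "torch": "torch"}
    return {extra: importlib.util.find_spec(name) is not None
            for extra, name in modules.items()}


if __name__ == "__main__":
    for extra, found in available_backends().items():
        status = "installed" if found else f"missing (try: pip install batukh[{extra}])"
        print(f"{extra}: {status}")
```

`find_spec` only looks the module up; it does not import it, so the check stays cheap even when TensorFlow is present.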
Training
After all the dependencies have been installed, you can train any model.
For page extraction (TensorFlow):
>>> from batukh.tensorflow.segmenter import PageExtractor
>>> page_extractor = PageExtractor()
>>> page_extractor.load_data("/path/to/data/")
>>> page_extractor.train(n_epochs=10, batch_size=1, repeat=1)
Initializing from scratch
Epoch: 1. Traininig: 100%|██████████| 70/70 [00:02<00:00, 23.95it/s, loss=0.0708]
Epoch: 2. Traininig: 100%|██████████| 70/70 [00:02<00:00, 24.35it/s, loss=0.0682]
Model saved to /content/tf_ckpts/Fri Oct 16 08:23:13 2020/ckpt-14280
Epoch: 3. Traininig: 100%|██████████| 70/70 [00:02<00:00, 23.69it/s, loss=0.0658]
Epoch: 4. Traininig: 100%|██████████| 70/70 [00:02<00:00, 24.74it/s, loss=0.0636]
Model saved to /content/tf_ckpts/Fri Oct 16 08:23:13 2020/ckpt-14420
Epoch: 5. Traininig: 100%|██████████| 70/70 [00:02<00:00, 23.68it/s, loss=0.0616]
Epoch: 6. Traininig: 100%|██████████| 70/70 [00:02<00:00, 23.95it/s, loss=0.0597]
Model saved to /content/tf_ckpts/Fri Oct 16 08:23:13 2020/ckpt-14560
Epoch: 7. Traininig: 100%|██████████| 70/70 [00:03<00:00, 23.24it/s, loss=0.0579]
Epoch: 8. Traininig: 100%|██████████| 70/70 [00:03<00:00, 23.23it/s, loss=0.0563]
Model saved to /content/tf_ckpts/Fri Oct 16 08:23:13 2020/ckpt-14700
Epoch: 9. Traininig: 100%|██████████| 70/70 [00:02<00:00, 23.44it/s, loss=0.0548]
Epoch: 10. Traininig: 100%|██████████| 70/70 [00:02<00:00, 23.54it/s, loss=0.0533]
Model saved to /content/tf_ckpts/Fri Oct 16 08:23:13 2020/ckpt-14840
For OCR (TensorFlow):
>>> from batukh.tensorflow.ocr import OCR
>>> m = OCR()
>>> m.load_data("/path/to/data")
>>> m.train(10, batch_size=1, repeat=1)
Initializing from scratch
Epoch: 1. Traininig: 100%|██████████| 70/70 [00:04<00:00, 15.84it/s, loss=37.1]
Epoch: 2. Traininig: 100%|██████████| 70/70 [00:02<00:00, 23.57it/s, loss=29.7]
Model saved to tf_ckpts/Fri Oct 16 09:44:35 2020/ckpt-140
Epoch: 3. Traininig: 100%|██████████| 70/70 [00:02<00:00, 24.01it/s, loss=26.8]
Epoch: 4. Traininig: 100%|██████████| 70/70 [00:02<00:00, 23.84it/s, loss=25.3]
Model saved to tf_ckpts/Fri Oct 16 09:44:35 2020/ckpt-280
Epoch: 5. Traininig: 100%|██████████| 70/70 [00:02<00:00, 23.46it/s, loss=24.4]
Epoch: 6. Traininig: 100%|██████████| 70/70 [00:02<00:00, 24.33it/s, loss=23.8]
Model saved to tf_ckpts/Fri Oct 16 09:44:35 2020/ckpt-420
Epoch: 7. Traininig: 100%|██████████| 70/70 [00:02<00:00, 23.96it/s, loss=23.3]
Epoch: 8. Traininig: 100%|██████████| 70/70 [00:02<00:00, 23.67it/s, loss=22.9]
Model saved to tf_ckpts/Fri Oct 16 09:44:35 2020/ckpt-560
Epoch: 9. Traininig: 100%|██████████| 70/70 [00:03<00:00, 23.22it/s, loss=22.6]
Epoch: 10. Traininig: 100%|██████████| 70/70 [00:02<00:00, 23.52it/s, loss=22.3]
Model saved to tf_ckpts/Fri Oct 16 09:44:35 2020/ckpt-700
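In the OCR log above, a checkpoint is written after every second epoch and the number in the checkpoint name advances by 140, which matches 2 epochs of 70 batches each. The sketch below reproduces that arithmetic. It describes what this one log shows, not documented behaviour, and `checkpoint_steps` and `save_every` are hypothetical names.

```python
def checkpoint_steps(n_epochs, steps_per_epoch, save_every=2):
    """Step counts at which a checkpoint would be written.

    Assumes (from the OCR log only) one save per `save_every` epochs and a
    step counter that starts at zero and increases by one per batch.
    """
    return [epoch * steps_per_epoch
            for epoch in range(save_every, n_epochs + 1, save_every)]


print(checkpoint_steps(n_epochs=10, steps_per_epoch=70))
# [140, 280, 420, 560, 700]
```

The page extraction log uses a counter that starts from a larger value, so only the differences between consecutive checkpoint numbers (again 140) are comparable there.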
For baseline detection (PyTorch):
>>> from batukh.torch.segmenter import BaselineDetector
>>> m = BaselineDetector()
<All keys matched successfully>
>>> m.load_data("/path/to/data")
>>> m.train(n_epochs=10, device="cpu")
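The call above trains on the CPU. A common pattern is to pick the device at run time with `torch.cuda.is_available()`, as sketched below. Whether `train` accepts values other than `"cpu"` (for example `"cuda"`) is an assumption here, not something the example above confirms, and `pick_device` is a hypothetical helper.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # only present with the torch or full extras
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
# Assumed usage, mirroring the call shown above:
# m.train(n_epochs=10, device=device)
```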
For OCR (PyTorch):
>>> from batukh.torch.ocr import OCR
>>> m = OCR()
>>> m.load_data("/path/to/train_dir", "/path/to/train_labels",
...             "/path/to/val_dir", "/path/to/val_labels")
Building Dictionary. . .
Building Dictionary. . .
>>> m.train(5)
Epoch: 0. Traininig: 100%|███████████████| 140/140 [00:04<00:00, 30.18it/s, loss=2.59]
Epoch: 0. Validating: 100%|███████████████| 140/140 [00:01<00:00, 112.06it/s, loss=2.59]
Models Saved!
Epoch: 1. Traininig: 100%|███████████████| 140/140 [00:04<00:00, 32.39it/s, loss=2.36]
Epoch: 1. Validating: 100%|███████████████| 140/140 [00:01<00:00, 121.36it/s, loss=2.18]
Models Saved!
Epoch: 2. Traininig: 100%|███████████████| 140/140 [00:04<00:00, 31.12it/s, loss=2.54]
Epoch: 2. Validating: 100%|███████████████| 140/140 [00:01<00:00, 108.65it/s, loss=2.48]
Models Saved!
Epoch: 3. Traininig: 100%|███████████████| 140/140 [00:04<00:00, 31.10it/s, loss=2.48]
Epoch: 3. Validating: 100%|███████████████| 140/140 [00:01<00:00, 109.46it/s, loss=2.42]
Models Saved!
Epoch: 4. Traininig: 100%|███████████████| 140/140 [00:04<00:00, 30.17it/s, loss=2.49]
Epoch: 4. Validating: 100%|███████████████| 140/140 [00:01<00:00, 110.09it/s, loss=2.42]
Models Saved!
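The validation loss in this run is lowest at epoch 1 (2.18) and does not improve afterwards. To read that off a longer log, the progress lines can be parsed with a regular expression. The sketch below assumes the line format shown above, including the `Traininig` spelling that the library prints; `parse_losses` and `best_epoch` are hypothetical helpers, not part of batukh.

```python
import re

# Matches lines such as:
#   Epoch: 1. Validating: 100%|...| 140/140 [00:01<00:00, 121.36it/s, loss=2.18]
LINE_RE = re.compile(r"Epoch: (\d+)\. (Traininig|Validating):.*loss=([0-9.]+)\]")


def parse_losses(log_text):
    """Return {"Traininig": {epoch: loss}, "Validating": {epoch: loss}}."""
    losses = {"Traininig": {}, "Validating": {}}
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            epoch, phase, loss = match.groups()
            losses[phase][int(epoch)] = float(loss)
    return losses


def best_epoch(losses, phase="Validating"):
    """Epoch with the lowest loss for the given phase (earliest on ties)."""
    per_epoch = losses[phase]
    return min(sorted(per_epoch), key=per_epoch.get)


sample = """\
Epoch: 0. Validating: 100%|#####| 140/140 [00:01<00:00, 112.06it/s, loss=2.59]
Epoch: 1. Validating: 100%|#####| 140/140 [00:01<00:00, 121.36it/s, loss=2.18]
Epoch: 2. Validating: 100%|#####| 140/140 [00:01<00:00, 108.65it/s, loss=2.48]
"""
print(best_epoch(parse_losses(sample)))
# 1
```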
Download files
File details
Details for the file batukh-0.1.1.1.tar.gz.
File metadata
- Download URL: batukh-0.1.1.1.tar.gz
- Upload date:
- Size: 25.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.48.0 CPython/3.7.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | bc1983c1e1a0269107def1255fd5551a88c4bd6931afb0e02f03d744c1826159
MD5 | 96afc43ef8d2aa34e18f4dc12d430622
BLAKE2b-256 | 516e5a60abb3126841bbf725f59f00cc033e2a246a69711b8e58441a9684eae1
File details
Details for the file batukh-0.1.1.1-py3-none-any.whl.
File metadata
- Download URL: batukh-0.1.1.1-py3-none-any.whl
- Upload date:
- Size: 43.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.48.0 CPython/3.7.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 984589ce6a54b6a712b66dbd88c1519143725bb4aa444c26c64bbfaa73401103
MD5 | 2e938709d86933d91fef93423fb12f63
BLAKE2b-256 | 0a881d4f430d904f672d1fa8fa03f8188d6fb7f212ef2a58eadf59346672db2e
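The SHA256 digests listed above can be used to check a downloaded file before installing it. The sketch below uses only `hashlib` from the standard library; `sha256_of` is a hypothetical helper, and the file path in the usage comment is an assumption about where the download was saved.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Assumed usage; the expected value is the SHA256 listed for the wheel above.
# expected = "984589ce6a54b6a712b66dbd88c1519143725bb4aa444c26c64bbfaa73401103"
# assert sha256_of("batukh-0.1.1.1-py3-none-any.whl") == expected
```

Reading in chunks keeps memory use flat, which matters little for a 43.8 kB wheel but lets the same helper be reused for larger files.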