Normalize/clean text from PDF OCR/extraction (PUA bullets, quotes, dashes, NBSP, control chars)
Project description
textnormx
Cleans text extracted from PDFs or OCR output: private-use-area (PUA) bullets (\uf0b7),
non-breaking spaces (NBSP), quotation marks, dashes, control characters, summary lines, etc.
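The package's own API is not documented on this page, so the following is only a standard-library sketch of the kind of cleanup described above. The function name `clean_extracted_text` and the replacement table are hypothetical choices made for illustration and are not taken from textnormx.

```python
import re
import unicodedata

# Hypothetical replacement table, chosen for illustration only;
# textnormx may map these characters differently.
_TRANSLATE = str.maketrans({
    "\uf0b7": "\u2022",  # Symbol-font PUA bullet -> standard bullet
    "\u00a0": " ",       # non-breaking space -> plain space
    "\u201c": '"',       # left curly double quote
    "\u201d": '"',       # right curly double quote
    "\u2018": "'",       # left curly single quote
    "\u2019": "'",       # right curly single quote
    "\u2013": "-",       # en dash
    "\u2014": "-",       # em dash
})


def clean_extracted_text(text: str) -> str:
    """Apply a few common cleanups to text extracted from a PDF or OCR run."""
    text = text.translate(_TRANSLATE)
    # Drop control characters (Unicode category Cc) except newline and tab.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


if __name__ == "__main__":
    raw = "\uf0b7\u00a0\u201cHello\u201d \u2013 world\x0c"
    print(clean_extracted_text(raw))
```

Mapping typographic quotes and dashes to ASCII is a lossy choice; a real pipeline might prefer to keep them and only repair the PUA and control characters.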
Install
pip install textnormx
Download files
Source Distribution
textnormx-0.2.1.tar.gz (32.8 kB)
Built Distribution
textnormx-0.2.1-py3-none-any.whl (12.6 kB)
File details
Details for the file textnormx-0.2.1.tar.gz.
File metadata
- Download URL: textnormx-0.2.1.tar.gz
- Upload date:
- Size: 32.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a856157838c73f1161356c3fd60bfeaade66bd4603c5775cdce924d4b10222e7 |
| MD5 | c6e307aa017780f70f447de6fbd234fa |
| BLAKE2b-256 | d165855ee1aee69fc64200d46444f186340a0d9229a1d45e5e04d20aff3a8bf4 |
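A downloaded file can be checked against the published SHA256 digest before installing. The sketch below uses only `hashlib` from the standard library; the helper names `sha256_of` and `matches_published_digest` are hypothetical, and it assumes the archive has already been saved locally.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 with a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published_digest("textnormx-0.2.1.tar.gz", "<SHA256 from the table>")` should return `True` for an intact download of the source distribution.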
File details
Details for the file textnormx-0.2.1-py3-none-any.whl.
File metadata
- Download URL: textnormx-0.2.1-py3-none-any.whl
- Upload date:
- Size: 12.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b8b48eadf9bb539ac97ae4a607ce4e1e1a25facf1821e460df88dd98604ab5b1 |
| MD5 | 5ffa28037c20f21fa76b89ec5af76fd2 |
| BLAKE2b-256 | 0f87513abbbe1380c87bbb2f7c8c5237cadd7213cacfc4f0495c55241859a1a0 |