Normalize/clean text from PDF OCR/extraction (PUA bullets, quotes, dashes, NBSP, control chars)
Project description
textnormx
Cleans text extracted from PDFs or OCR output: private-use-area (PUA) bullets (\uf0b7), non-breaking spaces (NBSP), quotation marks, dashes,
control characters, summary lines, etc.
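To make these categories concrete, here is a minimal standard-library sketch of this kind of cleanup. It does not use textnormx and does not reflect its API. The function name `clean_text_sketch` and the replacement choices (for example, mapping the PUA bullet to `•` and both en and em dashes to `-`) are assumptions for illustration only.

```python
import unicodedata

# Hypothetical replacement table; the choices below are illustrative
# assumptions, not the mapping that textnormx itself applies.
_REPLACEMENTS = {
    "\uf0b7": "\u2022",  # private-use-area bullet -> standard bullet
    "\u00a0": " ",       # non-breaking space -> plain space
    "\u201c": '"',       # left double quotation mark
    "\u201d": '"',       # right double quotation mark
    "\u2018": "'",       # left single quotation mark
    "\u2019": "'",       # right single quotation mark
    "\u2013": "-",       # en dash
    "\u2014": "-",       # em dash
}
_TABLE = str.maketrans(_REPLACEMENTS)


def clean_text_sketch(text: str) -> str:
    """Apply the replacement table, then drop control characters."""
    text = text.translate(_TABLE)
    # Remove characters in Unicode category Cc, keeping newline and tab.
    return "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
```

For example, `clean_text_sketch("\uf0b7\u00a0Item \u201cA\u201d \u2013 done\x00")` returns `'• Item "A" - done'`.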
Install
pip install textnormx
Download files
Download the file for your platform.
Source Distribution
textnormx-0.2.0.tar.gz (16.3 kB)
Built Distribution
textnormx-0.2.0-py3-none-any.whl (9.2 kB)
File details
Details for the file textnormx-0.2.0.tar.gz.
File metadata
- Download URL: textnormx-0.2.0.tar.gz
- Upload date:
- Size: 16.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 19cd0ae4485f5160954794ba555fbae8e502d520be26d0f9c3ba97a506f147c6 |
| MD5 | 79c5c38f95cda19c5f941d49f742c98a |
| BLAKE2b-256 | 805b5a4b2f920f49d79be7131936f3c04e4ab0e4c92c91ddb14af8d8853c60b6 |
File details
Details for the file textnormx-0.2.0-py3-none-any.whl.
File metadata
- Download URL: textnormx-0.2.0-py3-none-any.whl
- Upload date:
- Size: 9.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4997625f467ce42461bd0dc224c32db5998fb85f04ef14314d9830dd53bbba63 |
| MD5 | cbd43931e6d99b9cbf158dac34d12df3 |
| BLAKE2b-256 | 1ef7f3cdacdf59a6af1fde83cdf8c4ca97274d38cb265acfcbd9007313dc60ec |
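The SHA256 digests listed above can be used to check a downloaded file before installing it. The following is a minimal sketch using only Python's standard `hashlib` module. The helper names `sha256_of` and `matches_published_digest` are hypothetical, and the sketch assumes the file has already been saved locally.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, this would be called as `matches_published_digest("textnormx-0.2.0.tar.gz", "19cd0ae4485f5160954794ba555fbae8e502d520be26d0f9c3ba97a506f147c6")`, where the local path is an assumption about where the file was saved. Alternatively, pip can enforce digests itself through a requirements file used with `--require-hashes`.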