
Normalize and clean text from PDF extraction or OCR (PUA bullets, quotes, dashes, NBSP, control characters)

Project description

textnormx

Cleans text extracted from PDFs or produced by OCR: Private Use Area (PUA) bullets (\uf0b7), non-breaking spaces (NBSP), quotation marks, dashes, control characters, summary lines, and more.
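This page does not document the textnormx API, so the following is not textnormx code. It is a minimal standard-library sketch of the kinds of cleanup listed above, assuming a hypothetical `normalize` function that maps the U+F0B7 PUA bullet to U+2022, folds curly quotes and dash variants to ASCII, replaces no-break spaces with plain spaces, and drops control characters other than newline and tab. Summary-line handling is left out because the page does not say what it involves.

```python
import re
import unicodedata

# Hypothetical helper, NOT the textnormx API.
# U+F0B7 is the Symbol-font bullet that PDF extractors often emit as a PUA code point.
PUA_BULLET = "\uf0b7"

# Curly single/double quotes -> ASCII quotes
QUOTES = str.maketrans({
    "\u2018": "'", "\u2019": "'", "\u201a": "'", "\u201b": "'",
    "\u201c": '"', "\u201d": '"', "\u201e": '"', "\u201f": '"',
})

# Hyphen, non-breaking hyphen, figure/en/em dash, horizontal bar, minus -> "-"
DASHES = str.maketrans({
    "\u2010": "-", "\u2011": "-", "\u2012": "-", "\u2013": "-",
    "\u2014": "-", "\u2015": "-", "\u2212": "-",
})


def normalize(text: str) -> str:
    """Sketch of PDF/OCR text cleanup; an assumption-based illustration only."""
    text = text.replace(PUA_BULLET, "\u2022")
    # NBSP, figure space and narrow NBSP -> ordinary space
    text = re.sub("[\u00a0\u2007\u202f]", " ", text)
    text = text.translate(QUOTES).translate(DASHES)
    # Drop control characters (Unicode category Cc) except newline and tab
    text = "".join(
        ch for ch in text if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    # Collapse runs of spaces left behind by the replacements above
    return re.sub(r" {2,}", " ", text)
```

For example, `normalize("\uf0b7\u00a0Item \u201cone\u201d \u2013 ok\x0c")` yields `• Item "one" - ok`.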

Install

pip install textnormx



Download files

Download the file for your platform.

Source Distribution

textnormx-0.2.0.tar.gz (16.3 kB)

Uploaded Source

Built Distribution


textnormx-0.2.0-py3-none-any.whl (9.2 kB)

Uploaded Python 3

File details

Details for the file textnormx-0.2.0.tar.gz.

File metadata

  • Download URL: textnormx-0.2.0.tar.gz
  • Upload date:
  • Size: 16.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for textnormx-0.2.0.tar.gz
Algorithm Hash digest
SHA256 19cd0ae4485f5160954794ba555fbae8e502d520be26d0f9c3ba97a506f147c6
MD5 79c5c38f95cda19c5f941d49f742c98a
BLAKE2b-256 805b5a4b2f920f49d79be7131936f3c04e4ab0e4c92c91ddb14af8d8853c60b6
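A downloaded archive can be checked against these digests with Python's `hashlib`. The sketch below is illustrative: `file_digests` and `verify` are hypothetical helper names, and the expected values are copied from the table above. PyPI's BLAKE2b-256 is BLAKE2b with a 32-byte digest, hence `digest_size=32`.

```python
import hashlib
from pathlib import Path

# Digests copied from "Hashes for textnormx-0.2.0.tar.gz"
EXPECTED_SHA256 = "19cd0ae4485f5160954794ba555fbae8e502d520be26d0f9c3ba97a506f147c6"
EXPECTED_BLAKE2B_256 = "805b5a4b2f920f49d79be7131936f3c04e4ab0e4c92c91ddb14af8d8853c60b6"


def file_digests(path: Path, chunk_size: int = 1 << 16) -> tuple[str, str]:
    """Return (sha256, blake2b-256) hex digests, reading the file in chunks."""
    sha256 = hashlib.sha256()
    blake = hashlib.blake2b(digest_size=32)  # 32 bytes = 256 bits
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            sha256.update(chunk)
            blake.update(chunk)
    return sha256.hexdigest(), blake.hexdigest()


def verify(path: Path) -> bool:
    """True only if both digests match the published values."""
    sha, b2 = file_digests(path)
    return sha == EXPECTED_SHA256 and b2 == EXPECTED_BLAKE2B_256
```

Assuming the archive was saved to the current directory, `verify(Path("textnormx-0.2.0.tar.gz"))` should return `True` for an unmodified download. The same approach applies to the wheel with its own digests.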


File details

Details for the file textnormx-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: textnormx-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 9.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for textnormx-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4997625f467ce42461bd0dc224c32db5998fb85f04ef14314d9830dd53bbba63
MD5 cbd43931e6d99b9cbf158dac34d12df3
BLAKE2b-256 1ef7f3cdacdf59a6af1fde83cdf8c4ca97274d38cb265acfcbd9007313dc60ec

