Normalize/clean text from PDF OCR/extraction (PUA bullets, quotes, dashes, NBSP, control chars)

Project description

textnormx

Cleans text extracted from PDFs or OCR output: normalizes PUA bullets (\uf0b7), non-breaking spaces (NBSP), quotation marks, dashes, control characters, summary lines, etc.
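
The kinds of fixes listed above can be illustrated with a minimal standard-library sketch. It does not use or reproduce the textnormx API, which this page does not document. The function name `normalize_extracted` and the replacement table are hypothetical, chosen only to show what such a cleanup typically involves.

```python
import re
import unicodedata

# Hypothetical replacement table for illustration only;
# this is NOT textnormx's actual mapping.
_REPLACEMENTS = str.maketrans({
    "\uf0b7": "\u2022",  # Private Use Area bullet (Symbol font) -> real bullet
    "\u00a0": " ",       # non-breaking space -> ordinary space
    "\u201c": '"',       # left curly double quote
    "\u201d": '"',       # right curly double quote
    "\u2018": "'",       # left curly single quote
    "\u2019": "'",       # right curly single quote
    "\u2013": "-",       # en dash
    "\u2014": "-",       # em dash
})


def normalize_extracted(text: str) -> str:
    """Apply a few common cleanups to PDF/OCR-extracted text."""
    text = text.translate(_REPLACEMENTS)
    # Drop control characters (Unicode category Cc), keeping newline and tab.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )
    # Collapse runs of two or more spaces into one.
    text = re.sub(r" {2,}", " ", text)
    return text.strip()


if __name__ == "__main__":
    raw = "\uf0b7\u00a0Item \u201cone\u201d \u2013 done\x0c"
    print(normalize_extracted(raw))
```

Mapping curly quotes and dashes to ASCII is a lossy choice made here for simplicity; a real tool may preserve them or make the behavior configurable.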

Install

pip install textnormx

Download files

Source Distribution

textnormx-0.2.1.tar.gz (32.8 kB)

Uploaded Source

Built Distribution

textnormx-0.2.1-py3-none-any.whl (12.6 kB)

Uploaded Python 3

File details

Details for the file textnormx-0.2.1.tar.gz.

File metadata

  • Download URL: textnormx-0.2.1.tar.gz
  • Upload date:
  • Size: 32.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for textnormx-0.2.1.tar.gz
Algorithm Hash digest
SHA256 a856157838c73f1161356c3fd60bfeaade66bd4603c5775cdce924d4b10222e7
MD5 c6e307aa017780f70f447de6fbd234fa
BLAKE2b-256 d165855ee1aee69fc64200d46444f186340a0d9229a1d45e5e04d20aff3a8bf4
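
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using only Python's standard `hashlib`; the helper name `sha256_matches` is hypothetical, and the local file path in the usage comment is an assumption about where the archive was saved.

```python
import hashlib


def sha256_matches(path, expected_hex, chunk_size=65536):
    """Return True if the file's SHA256 digest equals expected_hex."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()


# Usage (assumes the sdist was saved in the current directory):
# sha256_matches(
#     "textnormx-0.2.1.tar.gz",
#     "a856157838c73f1161356c3fd60bfeaade66bd4603c5775cdce924d4b10222e7",
# )
```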

File details

Details for the file textnormx-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: textnormx-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 12.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for textnormx-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 b8b48eadf9bb539ac97ae4a607ce4e1e1a25facf1821e460df88dd98604ab5b1
MD5 5ffa28037c20f21fa76b89ec5af76fd2
BLAKE2b-256 0f87513abbbe1380c87bbb2f7c8c5237cadd7213cacfc4f0495c55241859a1a0
