2 projects
textpraline
TextPraline is a text normalization and refinement engine designed to prepare raw extracted content for reliable downstream processing. It cleans and stabilizes text coming from PDF extractors, OCR pipelines, and HTML scrapers. TextPraline removes structural noise and invisible corruption without altering meaning.
textnormx
Normalize and clean text from PDF OCR and extraction output (private-use-area bullets, quotes, dashes, non-breaking spaces, control characters)
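To illustrate the kind of cleanup these summaries describe, here is a minimal sketch using only the Python standard library. It is not the API of textnormx or TextPraline: the function name `normalize_text` and the replacement table are hypothetical, and the choice of U+F0B7 as a private-use bullet is an assumption based on a code point that often appears in PDF extractions.

```python
import unicodedata

# Hypothetical replacement table; not taken from textnormx or TextPraline.
_REPLACEMENTS = {
    "\uf0b7": "\u2022",  # assumed private-use-area bullet -> real bullet
    "\u00a0": " ",       # non-breaking space -> plain space
    "\u2018": "'",       # left single quote
    "\u2019": "'",       # right single quote
    "\u201c": '"',       # left double quote
    "\u201d": '"',       # right double quote
    "\u2013": "-",       # en dash
    "\u2014": "-",       # em dash
}
_TABLE = str.maketrans(_REPLACEMENTS)


def normalize_text(text: str) -> str:
    """Map common typographic characters to plain equivalents and
    drop control characters other than newline and tab."""
    text = text.translate(_TABLE)
    return "".join(
        ch
        for ch in text
        if ch in "\n\t" or unicodedata.category(ch) != "Cc"
    )


if __name__ == "__main__":
    raw = "\uf0b7\u00a0\u201cHi\u201d \u2013 ok\x00\x07\n"
    print(repr(normalize_text(raw)))
```

A translation table keeps the mapping in one place and processes the string in a single pass. The control-character filter relies on the Unicode general category `Cc`, so it leaves zero-width and other format characters untouched; a fuller cleaner would need to decide how to treat those.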