Last released Jul 16, 2024
Easier to deal with PDF, extract readable text and OCR to recognise image text and clean the format. Make it more suitable for knowledge base construction. Best performance with Doc2X.
Supported by