Last released Jul 7, 2025
Toolkit for pre-processing LLM training data.
Last released Apr 5, 2024
Papermage. Casting magic over scientific PDFs.