6 projects
scispacy
A full spaCy pipeline and models for scientific/biomedical documents.
papermage
Papermage. Casting magic over scientific PDFs.
smashed
SMASHED is a toolkit for applying transformations to samples in datasets, such as field extraction, tokenization, prompting, batching, and more. Supports datasets from Hugging Face, torchdata iterables, or simple lists of dictionaries.
shadow-scholar
🎓🕶️ A collection of utilities and demos from the Semantic Scholar Research Team 🕶️🎓
mmda
MMDA - multimodal document analysis
scipdf
multimodal document analysis