18 projects
dolma
Data filters
papermage
Papermage. Casting magic over scientific PDFs.
mmdata
MMData is a toolkit for curating multimodal datasets.
tartare
Data filters
tokreate
Unified APIs for making calls to different LLMs.
quickumls
QuickUMLS is a tool for fast, unsupervised biomedical concept extraction from medical text.
smashed
SMASHED is a toolkit for applying transformations to samples in datasets, such as field extraction, tokenization, prompting, batching, and more. Supports datasets from Hugging Face, torchdata iterables, or simple lists of dictionaries.
decontext
Pipeline for decontextualization of scientific snippets.
springs
A set of utilities to create and manage typed configuration files effectively, built on top of OmegaConf.
necessary
Python package to enforce optional dependencies
shadow-scholar
🎓🕶️ A collection of utilities and demos from the Semantic Scholar Research Team 🕶️🎓
mmda
MMDA - multimodal document analysis
trouting
Trouting (short for Type Routing) is a simple class decorator that lets you define multiple interfaces for a method, each behaving differently depending on input types.
pyterrier-sentence-transformers
Create a PyTerrier index using any sentence-transformers model
scipdf
multimodal document analysis
espresso-config
A struct config parser that you can set up in the
Minimal-Server
Serve a Python object through a simple socket; supports multiple connections.
quickumls-simstring
Clone of simstring designed to work with QuickUMLS. Original version here: http://chokkan.org/software/simstring/