26 projects
ushmm
A suite of tools for working with data at the United States Holocaust Memorial Museum
old-doc
Easily create synthetic data for HTR and OCR
bagpipes-spacy
A collection of spaCy components for rules-based detection and extraction.
spacy-chunks
An easy way to chunk spaCy docs.
gliner-spacy
A spaCy wrapper for the GLiNER model for enhanced named entity recognition capabilities.
spacyex
An extension for spaCy, making pattern matching as flexible as using regular expressions.
spacy-aligner
A spaCy component for connecting entities and building relational graphs in text.
gliner-finetune
A library to create synthetic data with OpenAI and train a GLiNER model on that data.
spacy-annoy
A Python package integrating spaCy and Annoy for efficient text search and analysis.
gender-spacy
A spaCy component for identifying grammatical gender in English texts.
spacy-whisper
Integrate Whisper transcriptions with spaCy for advanced NLP tasks
leet-topic
A new transformer-based topic modeling library.
florida
A collection of Python utilities to simplify coding tasks
weaviate-filter
A package for creating GraphQL filters for Weaviate
label-mapper
A spaCy extension to map NER labels.
keyword-spacy
A spaCy pipeline component for extracting keywords from text using cosine similarity.
number-spacy
A spaCy extension for enhanced number entity recognition and extraction as structured data.
date-spacy
A spaCy extension for enhanced date and number entity recognition and extraction as structured data.
dna-spacy
A spaCy library for working with DNA Sequences.
en-hobbit
A spaCy package for working with Middle-earth data.
spacy-utils
Some basic spaCy utility functions.
streamlit-pandas
Create a Streamlit Pandas App
en-ww2spacy
A spaCy pipeline for processing primary and secondary sources pertaining to World War II.
en-biospacy
A spaCy pipeline for parsing biology texts. Data for the plant EntityRuler was sourced from http://www.worldfloraonline.org/
vulgata-spacy
A library for finding Vulgate references in Medieval Latin texts.