6 projects
SoMaJo
A tokenizer and sentence splitter for German and English web and social media texts.
SoMeWeTa
A part-of-speech tagger with support for domain adaptation and external resources.
textcomplexity
Linguistic and stylistic complexity measures for text.
CorpConv
A converter between various corpus formats.
Pareidoscope
A collection of tools for determining the association between arbitrary linguistic structures.
Usurper
An unsupervised dependency parser.