Tools for processing language data.
What is CorPy?
A fancy plural for corpus ;) Also, a collection of handy but not especially mutually integrated tools for dealing with linguistic data. It abstracts away functionality that is often needed in day-to-day work at the Czech National Corpus, without aspiring to be a fully featured or consistent NLP framework.
Currently available sub-packages are:
morphodita: tokenizing and tagging raw textual data using MorphoDiTa
vertical: parsing corpora in the vertical format, originally devised for CWB and also used by (No)SketchEngine
phonetics: rule-based phonetic transcription of Czech
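For readers unfamiliar with the vertical format mentioned above: it stores one token per line with tab-separated positional attributes, and marks structures such as documents and sentences with XML-like tags on their own lines. The sketch below illustrates the format only. It does not use CorPy's API; the sample data, the attribute order (word, lemma, tag), and the iter_sentences helper are all assumptions made up for this example.

```python
import re
from typing import Iterator, List, Tuple

# Hypothetical sample corpus. The attribute order (word, lemma, tag)
# is an assumption; real corpora define their own attributes.
SAMPLE = """<doc id="example">
<s>
Dogs\tdog\tNNS
bark\tbark\tVBP
.\t.\tSENT
</s>
<s>
Cats\tcat\tNNS
purr\tpurr\tVBP
</s>
</doc>
"""

Token = Tuple[str, ...]

# A structural line is an opening or closing XML-like tag on its own line.
STRUCT_TAG = re.compile(r"^</?\w+(\s[^>]*)?>$")


def iter_sentences(vertical: str) -> Iterator[List[Token]]:
    """Yield each <s>...</s> block as a list of attribute tuples.

    Illustrative helper only, not part of CorPy.
    """
    sentence: List[Token] = []
    for line in vertical.splitlines():
        if not line:
            continue
        if line == "</s>":
            # End of sentence: hand over the collected tokens.
            yield sentence
            sentence = []
        elif STRUCT_TAG.match(line):
            # Other structural tags (<doc>, <s>, </doc>) carry no tokens.
            continue
        else:
            # Token line: split positional attributes on tabs.
            sentence.append(tuple(line.split("\t")))
    if sentence:
        # Tokens after the last </s>, if any, form a final sentence.
        yield sentence


sentences = list(iter_sentences(SAMPLE))
for sent in sentences:
    print(" ".join(token[0] for token in sent))
```

The vertical sub-package presumably handles details this sketch ignores, such as structural attributes and nested structures, so it should be preferred for real corpora.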
Installation
$ pip3 install corpy
Requirements
By design, only recent versions of Python 3 are supported.
License
Copyright © 2016–present ÚČNK/David Lukeš
Distributed under the GNU General Public License v3.