Tools for processing language data.
What is CorPy?
A fancy plural for corpus ;) Also, a collection of handy but only loosely integrated tools for dealing with linguistic data. It abstracts away functionality that is often needed in practice for teaching and/or day-to-day work at the Czech National Corpus, without aspiring to be a fully featured or consistent NLP framework.
The short URL to the docs is: https://corpy.rtfd.io/
Here’s an idea of what you can do with CorPy:
- tokenize and morphologically tag raw textual data using MorphoDiTa
- wrangle corpora in the vertical format originally devised for CWB and also used by (No)SketchEngine
- plus some command line utilities
Installation
$ pip3 install corpy
Requirements
Only recent versions of Python 3 (3.6+) are supported by design.
License
Copyright © 2016–present ÚČNK/David Lukeš
Distributed under the GNU General Public License v3.