Tools for processing language data.
Installation
$ python3 -m pip install corpy
Only recent versions of Python 3 (3.10+) are supported by design.
Help and feedback
If you get stuck, it’s always a good idea to start by searching the documentation at https://corpy.rtfd.io/.
The project is developed on GitHub. You can ask for help via GitHub discussions and report bugs and give other kinds of feedback via GitHub issues. Support is provided gladly, time and other engagements permitting, but cannot be guaranteed.
What is CorPy?
A fancy plural for corpus ;) Also, a collection of handy but not especially well-integrated tools for dealing with linguistic data. It abstracts away functionality that is often needed in practice for teaching and/or day-to-day work at the Czech National Corpus, without aspiring to be a fully featured or consistent NLP framework.
Here’s an idea of what you can do with CorPy:
add linguistic annotation to raw textual data using either UDPipe or MorphoDiTa
run code in a sanitized global environment (useful for debugging in interactive sessions, e.g. with Jupyter notebooks in JupyterLab)
wrangle corpora in the vertical format, originally devised for CWB and also used by (No)SketchEngine
plus some command line utilities
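The vertical format stores one token per line, with positional attributes (such as word form, lemma and tag) separated by tabs, and XML-like structural tags such as `<doc>` and `<s>` on lines of their own. The following is a minimal standalone sketch of reading such data into a stream of events. It does not use CorPy's own API (see the documentation for that), and the function name `parse_vertical`, the attribute names in `POS_ATTRS` and the sample data are all hypothetical, chosen for illustration.

```python
import re
from typing import Iterable, Iterator

# Hypothetical positional attribute names; real corpora define their own.
POS_ATTRS = ("word", "lemma", "tag")

# Opening or self-closing structure tag, e.g. <doc id="d1">, <s> or <g/>.
OPEN_TAG = re.compile(r'<(\w+)((?:\s+\w+="[^"]*")*)\s*(/?)>')
# Closing structure tag, e.g. </s>.
CLOSE_TAG = re.compile(r"</(\w+)>")
# A single structural attribute, e.g. id="d1".
ATTR = re.compile(r'(\w+)="([^"]*)"')


def parse_vertical(lines: Iterable[str]) -> Iterator[tuple]:
    """Yield ("open"|"empty", name, attrs), ("close", name) or ("token", dict)."""
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if m := CLOSE_TAG.fullmatch(line):
            yield ("close", m.group(1))
        elif m := OPEN_TAG.fullmatch(line):
            kind = "empty" if m.group(3) else "open"
            yield (kind, m.group(1), dict(ATTR.findall(m.group(2))))
        else:
            # Token line: zip() drops any columns beyond len(POS_ATTRS).
            yield ("token", dict(zip(POS_ATTRS, line.split("\t"))))


# Hypothetical sample; <g/> marks "no space before the next token".
SAMPLE = """<doc id="d1">
<s>
Hello\thello\tUH
world\tworld\tNN
<g/>
!\t!\t.
</s>
</doc>
"""

events = list(parse_vertical(SAMPLE.splitlines()))
for event in events:
    print(event)
```

Emitting flat events rather than building a tree keeps memory use constant, which matters because vertical files for whole corpora can be very large.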
Development
Dependencies and building the docs
corpy needs to be installed in the ReadTheDocs virtualenv for autodoc to work. The optional dependencies in the doc group are also needed. This is all configured in .readthedocs.yml.
License
Copyright © 2016–present ÚČNK/David Lukeš
Distributed under the GNU General Public License v3.