
Tools for processing language data.



Installation

$ pip3 install corpy

Only recent versions of Python 3 (3.6+) are supported by design.

What is CorPy?

A fancy plural for corpus ;) Also, a collection of handy but not especially mutually integrated tools for dealing with linguistic data. It abstracts away functionality that is often needed in practice for teaching and/or day-to-day work at the Czech National Corpus, without aspiring to be a fully featured or consistent NLP framework.

The short URL to the docs is: https://corpy.rtfd.io/

Here’s an idea of what you can do with CorPy:

  • add linguistic annotation to raw textual data using either UDPipe or MorphoDiTa

Development

Dependencies and building the docs

The canonical dependency requirements are listed in pyproject.toml and frozen in poetry.lock. However, in order to use autodoc to build the API docs, the package has to be installed, and corpy has dependencies that are too resource-intensive to build on ReadTheDocs.

The solution is to use a dummy setup.py which lists only the dependencies needed to build the docs properly, and mock all other dependencies by listing them in autodoc_mock_imports in docs/conf.py. This dummy setup.py is used to install corpy only on ReadTheDocs (via the appropriate config option in .readthedocs.yml). The same goes for the MANIFEST.in file, which duplicates the tool.poetry.include entries in pyproject.toml for the sole benefit of ReadTheDocs.
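
As a rough illustration of the mocking approach described above, the relevant part of docs/conf.py might look something like the fragment below. This is only a sketch: the module names in the list are assumptions chosen for the example (the Python bindings for UDPipe and MorphoDiTa live under the ufal namespace), not a copy of the project's actual configuration.

```python
# docs/conf.py (fragment): an illustrative sketch, not the project's real file.

# autodoc imports the installed package in order to read its docstrings.
extensions = ["sphinx.ext.autodoc"]

# Modules listed here are replaced with mock objects when autodoc imports
# the package, so they do not need to be installed on ReadTheDocs.
# ASSUMPTION: these names are examples only; the real list should contain
# every heavy dependency that the dummy setup.py leaves out.
autodoc_mock_imports = [
    "ufal",  # assumed root package of the UDPipe and MorphoDiTa bindings
]
```

Sphinx only needs the top-level package name of each mocked dependency, so submodules do not have to be listed separately.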

License

Copyright © 2016–present ÚČNK/David Lukeš

Distributed under the GNU General Public License v3.
