Toolkit that simplifies corpus processing

RuMor: Russian Morphology project

Corpuscula: a Python NLP library for corpus processing

Corpuscula is part of the RuMor project. It provides tools that simplify corpus processing. Highlights:

  • full CoNLL-U support (including CoNLL-U Plus)
  • wrappers for known corpora of the Russian language
  • a parser and wrapper for the Russian part of Wikipedia
  • a Corpus Dictionary that can be used for further morphology processing
  • a simple database for storing named entities

Installation

pip

Corpuscula supports Python 3.5 or later. To install it via pip, run:

$ pip install corpuscula

If you currently have a previous version of Corpuscula installed, use:

$ pip install corpuscula -U

From Source

Alternatively, you can install Corpuscula from the source in this git repository:

$ git clone https://github.com/fostroll/corpuscula.git
$ cd corpuscula
$ pip install -e .

This gives you access to examples and data that are not included in the PyPI package.

Setup

After installation, you need to specify a directory where you prefer to store downloaded corpora:

>>> import corpuscula.corpus_utils as cu
>>> cu.set_root_dir(<path>)  # We will keep corpora here

NB: this creates or updates the config file .rumor in your home directory.

If you don't set the root directory, Corpuscula keeps corpora in the directory where it is installed.

Usage

CoNLL-U Support

Management of Corpora

Wrapper for Wikipedia

Corpus Dictionary

Utilities

Items database
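
To give a feel for the data that the CoNLL-U Support item refers to, here is a minimal sketch of reading the standard ten-column CoNLL-U layout. It uses only the Python standard library and is not Corpuscula's API: the function name parse_conllu, the FIELDS tuple, and the sample sentence are assumptions made up for illustration. CoNLL-U Plus files, which may declare a different set of columns, are not covered.

```python
# Stdlib-only illustration of the CoNLL-U layout; NOT Corpuscula's API.
# parse_conllu and FIELDS are hypothetical names used for this sketch.
FIELDS = ('ID', 'FORM', 'LEMMA', 'UPOS', 'XPOS',
          'FEATS', 'HEAD', 'DEPREL', 'DEPS', 'MISC')


def parse_conllu(text):
    """Yield (tokens, metadata) for each sentence in a CoNLL-U string."""
    tokens, meta = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if tokens:
                yield tokens, meta
            tokens, meta = [], {}
        elif line.startswith('#'):
            # Comment lines carry sentence metadata, e.g. "# text = ...".
            key, sep, value = line[1:].partition('=')
            if sep:
                meta[key.strip()] = value.strip()
        else:
            # Token lines hold ten tab-separated fields.
            tokens.append(dict(zip(FIELDS, line.split('\t'))))
    if tokens:
        # The last sentence may not be followed by a blank line.
        yield tokens, meta


# A made-up sample sentence, annotated by hand for this sketch.
sample = (
    '# sent_id = 1\n'
    '# text = Мама мыла раму.\n'
    '1\tМама\tмама\tNOUN\t_\tCase=Nom|Number=Sing\t2\tnsubj\t_\t_\n'
    '2\tмыла\tмыть\tVERB\t_\tTense=Past\t0\troot\t_\t_\n'
    '3\tраму\tрама\tNOUN\t_\tCase=Acc|Number=Sing\t2\tobj\t_\tSpaceAfter=No\n'
    '4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n'
    '\n'
)

for tokens, meta in parse_conllu(sample):
    print(meta.get('text'), [t['LEMMA'] for t in tokens])
```

Each token becomes a plain dict keyed by field name, so later steps can pick out fields such as FORM or LEMMA directly. Every value stays a string, including ID and HEAD.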

Examples

You can find examples in the examples directory of the Corpuscula GitHub repository.

License

Corpuscula is released under the BSD-3 License. See the LICENSE file for more details.
