Toolkit that simplifies corpus processing
Project description
Corpuscula: a Python NLP library for corpus processing
Part of the RuMor project. It provides tools that simplify corpus processing. Highlights:
- full CoNLL-U support (includes CoNLL-U Plus)
- wrappers for well-known Russian-language corpora
- a parser and wrapper for the Russian part of Wikipedia
- Corpus Dictionary that can be used for further morphology processing
- simple database to keep named entities
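To make the CoNLL-U item above more concrete, here is a minimal sketch of what reading that format involves. It uses only the standard library and does not use Corpuscula's own API; the names `parse_conllu`, `CONLLU_FIELDS` and `SAMPLE` are hypothetical and exist only for this illustration. It assumes plain CoNLL-U with the ten standard tab-separated columns and skips CoNLL-U Plus column declarations.

```python
# Column names defined by the CoNLL-U format, in order.
CONLLU_FIELDS = ('ID', 'FORM', 'LEMMA', 'UPOS', 'XPOS',
                 'FEATS', 'HEAD', 'DEPREL', 'DEPS', 'MISC')


def parse_conllu(text):
    """Parse CoNLL-U text into a list of (tokens, metadata) pairs.

    Illustrative only: this is not how Corpuscula implements parsing.
    """
    sentences = []
    tokens, meta = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if tokens:
                sentences.append((tokens, meta))
            tokens, meta = [], {}
        elif line.startswith('#'):
            # Comment line, usually of the form "# key = value".
            key, sep, value = line[1:].partition('=')
            meta[key.strip()] = value.strip() if sep else None
        else:
            # Token line: ten tab-separated fields.
            tokens.append(dict(zip(CONLLU_FIELDS, line.split('\t'))))
    if tokens:
        # The last sentence may lack a trailing blank line.
        sentences.append((tokens, meta))
    return sentences


# A made-up sample sentence, written here only for the example.
SAMPLE = (
    "# sent_id = 1\n"
    "# text = Мама мыла раму.\n"
    "1\tМама\tмама\tNOUN\t_\tCase=Nom|Number=Sing\t2\tnsubj\t_\t_\n"
    "2\tмыла\tмыть\tVERB\t_\tNumber=Sing|Tense=Past\t0\troot\t_\t_\n"
    "3\tраму\tрама\tNOUN\t_\tCase=Acc|Number=Sing\t2\tobj\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)

for tokens, meta in parse_conllu(SAMPLE):
    print(meta.get('text'))
    print([(t['FORM'], t['UPOS']) for t in tokens])
```

For real work, prefer the library's own CoNLL-U tools; the sketch only shows the shape of the data (comment lines, token lines, blank-line sentence breaks).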
Installation
pip
Corpuscula supports Python 3.5 or later. To install it via pip, run:
$ pip install corpuscula
If you currently have a previous version of Corpuscula installed, use:
$ pip install corpuscula -U
From Source
Alternatively, you can install Corpuscula from the source of its Git repository:
$ git clone https://github.com/fostroll/corpuscula.git
$ cd corpuscula
$ pip install -e .
This gives you access to examples and data that are not included in the PyPI package.
Setup
After installation, you need to specify a directory where you prefer to store downloaded corpora:
>>> import corpuscula.corpus_utils as cu
>>> cu.set_root_dir(<path>) # We will keep corpora here
NB: this creates or updates the config file .rumor in your home directory.
If you don't set the root directory, Corpuscula keeps corpora in the directory where it is installed.
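As a small sketch of the setup step, the helper below prepares a directory before it is passed to `set_root_dir`. The helper name `prepare_root_dir` and the `~/corpora` location are hypothetical, and it is an assumption that creating the directory in advance is useful; Corpuscula may well create it itself.

```python
from pathlib import Path


def prepare_root_dir(path):
    """Create the directory if it is missing and return its absolute path."""
    root = Path(path).expanduser()
    # Create first, then resolve: resolving a missing path fails on Python 3.5.
    root.mkdir(parents=True, exist_ok=True)
    return str(root.resolve())


# Intended use (the location is only an example):
#
#     import corpuscula.corpus_utils as cu
#     cu.set_root_dir(prepare_root_dir('~/corpora'))
```

Passing an absolute path keeps the stored setting independent of the directory you happen to run Python from.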
Usage
Examples
You can find examples in the examples directory of the Corpuscula GitHub repository.
License
Corpuscula is released under the BSD License. See the LICENSE file for more details.
Download files
File details
Details for the file corpuscula-1.0.56.tar.gz.
File metadata
- Download URL: corpuscula-1.0.56.tar.gz
- Upload date:
- Size: 30.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 38c15c3230f976c86b5f8732a337e68fadcb5f74d0103f40ec6fec0f6413a1a7
MD5 | bcf748028131f3a956910d18f13ca9c4
BLAKE2b-256 | 050a5135585f53c4692201f286432b66b9b06c90569d877934eebfe97c8b7145
File details
Details for the file corpuscula-1.0.56-py3-none-any.whl.
File metadata
- Download URL: corpuscula-1.0.56-py3-none-any.whl
- Upload date:
- Size: 32.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 449cdc9f856db31985f61eca50aef40653ebcd4586057ecff1045087c78e956d
MD5 | 92c7bd272becfbdf9ed021d905ca2b06
BLAKE2b-256 | be33667c397e98e76ebf6c6bb4b6d0a33f1d6bc499793ded77cd9b1a7a771a9c
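The SHA256 digests listed above can be compared against a file you have downloaded. The following is a minimal sketch using only the standard library; the function names `sha256_of_file` and `matches_published_hash` are hypothetical, and the local file path in the usage comment is an assumption about where the download was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's digest with a published one, ignoring case and spaces."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Intended use (assumes the sdist was saved in the current directory):
#
#     matches_published_hash(
#         'corpuscula-1.0.56.tar.gz',
#         '38c15c3230f976c86b5f8732a337e68fadcb5f74d0103f40ec6fec0f6413a1a7')
```

Reading in chunks keeps memory use flat regardless of file size, which matters little for these small archives but costs nothing.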