toxine

Tiny preprocessor for Russian text

RuMor: Russian Morphology project

Toxine: a tiny python NLP library for Russian text preprocessing

Toxine is part of the RuMor project. It provides a pipeline for preprocessing and tokenizing Russian text, and it also includes preliminary entity tagging. Highlights:

Extracting emojis, emails, dates, phone numbers, URLs, HTML/XML fragments, etc.
Tagging or removing tokens that contain disallowed symbols
Normalizing punctuation
Tokenization (via NLTK)
Russian Wikipedia tokenizer
brat annotations support
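
To make the idea of extraction and tagging concrete, here is a minimal standalone sketch that uses only Python's `re` module. It is not Toxine's API or implementation: the `PATTERNS` table, the `EMAIL`/`URL` labels, and the `tag_entities` function are hypothetical names chosen for illustration, and the regular expressions are deliberately simplistic.

```python
import re

# Illustrative patterns only (an assumption, not Toxine's rules);
# real-world emails and URLs need considerably more care.
PATTERNS = [
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("URL", re.compile(r"https?://\S+")),
]


def tag_entities(text):
    """Replace each match with a <LABEL> placeholder.

    Returns the tagged text and a list of (label, original_value) pairs
    in the order the patterns were applied.
    """
    found = []
    for label, pattern in PATTERNS:
        def repl(match, label=label):
            found.append((label, match.group(0)))
            return "<" + label + ">"

        text = pattern.sub(repl, text)
    return text, found


tagged, entities = tag_entities(
    "Пишите на info@example.com или см. https://example.com/docs"
)
print(tagged)
print(entities)
```

Keeping the original values next to the placeholders lets a later step either restore them or drop them, which mirrors the "tagging or removing" choice in the list above.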

Installation

pip

Toxine supports Python 3.5 or later. To install it via pip, run:

$ pip install toxine

If you currently have a previous version of Toxine installed, use:

$ pip install toxine -U
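
To decide whether an upgrade is needed, it can help to look up the version that is currently installed. The sketch below uses the standard-library `importlib.metadata` module, which requires Python 3.8 or later (newer than the minimum Toxine itself supports); `installed_version` is a hypothetical helper name, not part of Toxine.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed Toxine version, or None if it is not installed.
print(installed_version("toxine"))
```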

From Source

Alternatively, you can install Toxine from the source of its git repository:

$ git clone https://github.com/fostroll/toxine.git
$ cd toxine
$ pip install -e .

This gives you access to examples that are not included in the PyPI package.

Setup

Toxine uses NLTK with the punkt data downloaded. If you haven't downloaded it yet, start the Python interpreter and run:

>>> import nltk
>>> nltk.download('punkt')

NB: If you plan to use the methods for renewing brat annotations, you also need to install the python-Levenshtein library. See the brat annotations support page for details.
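
Before running a pipeline, you may want to confirm that both prerequisites are in place. The following is a rough sketch of such a check, assuming a reasonably recent Python 3; `module_available` and `punkt_available` are hypothetical helper names, not Toxine functions. It relies on `nltk.data.find`, which raises `LookupError` when a resource is missing, and on the fact that python-Levenshtein is imported under the module name `Levenshtein`.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module called `name` can be imported."""
    return importlib.util.find_spec(name) is not None


def punkt_available():
    """Return True if NLTK is installed and its punkt data can be located."""
    if not module_available("nltk"):
        return False
    import nltk  # imported lazily so the check also works without NLTK

    try:
        nltk.data.find("tokenizers/punkt")
    except LookupError:
        return False
    return True


print("nltk punkt data:", "found" if punkt_available() else "missing")
# Only needed for the brat annotation renewal methods.
print("Levenshtein:", "found" if module_available("Levenshtein") else "missing")
```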

Usage

Text Preprocessor

Wrapper for tokenized Wikipedia

brat annotations support

Examples

You can find them in the examples directory of the Toxine GitHub repository.

License

Toxine is released under the BSD License. See the LICENSE file for more details.

Release history

Version	Release date
1.0.52	Aug 30, 2021 (this version)
1.0.51	May 22, 2021
1.0.50	May 16, 2021
1.0.49	May 13, 2021
1.0.48	Apr 26, 2021
1.0.47	Feb 15, 2021
1.0.46	Feb 14, 2021
1.0.45	Feb 14, 2021
1.0.44	Feb 13, 2021
1.0.42	Feb 12, 2021
1.0.41	Feb 12, 2021
1.0.40	Feb 7, 2021
1.0.39	Feb 6, 2021
1.0.37	Feb 4, 2021
1.0.36	Feb 4, 2021
1.0.35	Feb 4, 2021
1.0.34	Feb 4, 2021
1.0.33	Feb 4, 2021
1.0.32	Feb 3, 2021
1.0.31	Jan 23, 2021
1.0.30	Jan 16, 2021
1.0.29	Jan 7, 2021
1.0.28	Jan 6, 2021
1.0.27	Jan 6, 2021
1.0.26	Jan 1, 2021
1.0.25	Dec 26, 2020
1.0.24	Dec 26, 2020
1.0.23	Dec 26, 2020
1.0.22	Dec 26, 2020
1.0.21	Dec 25, 2020
1.0.20	Dec 25, 2020
1.0.19	Dec 25, 2020
1.0.18	Dec 18, 2020
1.0.17	Dec 4, 2020
1.0.16	Nov 2, 2020
1.0.15	Nov 2, 2020
1.0.14	Oct 27, 2020
1.0.13	Oct 14, 2020
1.0.12	Sep 9, 2020
1.0.9	Aug 12, 2020
1.0.8	Jul 27, 2020
1.0.7	Jun 10, 2020
1.0.6	Apr 20, 2020
1.0.5	Apr 20, 2020
1.0.4	Apr 13, 2020
1.0.3	Apr 13, 2020
1.0.2	Apr 13, 2020
1.0.1	Apr 13, 2020

Download files

Download the file for your platform.

Source Distribution

toxine-1.0.52.tar.gz (29.5 kB)

Uploaded Aug 30, 2021 Source

Built Distribution

toxine-1.0.52-py3-none-any.whl (29.5 kB)

Uploaded Aug 30, 2021 Python 3

File details

Details for the file toxine-1.0.52.tar.gz.

File metadata

Download URL: toxine-1.0.52.tar.gz
Upload date: Aug 30, 2021
Size: 29.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.8.5

File hashes

Hashes for toxine-1.0.52.tar.gz
Algorithm	Hash digest
SHA256	`279250ac63c4fe41dab749bbf4a9d9d288c1b40ad89f8a8ad24bfd5ae64fe47d`
MD5	`d928ceea9ed8c5418388ede0ef27e861`
BLAKE2b-256	`0b43bcdac13c164940689d6ef8965e249ed0bda4e0ed233dba8eafcbc45a1ef5`
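
One way to use these digests is to hash the downloaded file locally and compare the result with the published SHA256 value. The sketch below does this with the standard-library `hashlib` module; `sha256_of` and `matches_sha256` are hypothetical helper names, and the file path in the usage comment assumes the archive sits in the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """Return True if the file's SHA256 digest equals `expected_hex`."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage, assuming the source archive was downloaded to the current directory:
# matches_sha256(
#     "toxine-1.0.52.tar.gz",
#     "279250ac63c4fe41dab749bbf4a9d9d288c1b40ad89f8a8ad24bfd5ae64fe47d",
# )
```

Reading in chunks keeps memory use constant regardless of file size. If the comparison returns False, the download should be discarded and repeated.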

File details

Details for the file toxine-1.0.52-py3-none-any.whl.

File metadata

Download URL: toxine-1.0.52-py3-none-any.whl
Upload date: Aug 30, 2021
Size: 29.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.8.5

File hashes

Hashes for toxine-1.0.52-py3-none-any.whl
Algorithm	Hash digest
SHA256	`773de3753814dbc43d5c4ac8be956c9ae5844a497f345fe675afc890c14e77a3`
MD5	`52120115a4a3df17e64ce80d72a38600`
BLAKE2b-256	`49e1b4720450edca8c5fe352fa499160d1965c10d1885dc6716c1afdf02436bd`
