saniti

Sanitise text while keeping your sanity

These details have not been verified by PyPI

Project links

Homepage

Project description

# Saniti

Sanitise lists of text documents quickly, easily and whilst maintaining your sanity

The aim is to streamline turning lists of documents into a consistent output: you simply specify the list of texts and define the sanitisation pipeline.

### Usage:

As a function-ish

>original_text = ["I like to moves it, move its", "I likeing to move it!", "the of"]

>text = saniti(original_text, ["token", "destop", "depunct", "unempty", "stem", "out_corp_dict"]) # sanitise the text while initialising the class

>print(text.text)

As a class

>sani1 = saniti() # initialise the sanitising class

>text = sani1.process(original_text, ["token", "destop", "depunct", "unempty", "lemma", "out_tag_doc"]) # sanitise the text

>print(text)

## Pipeline Components

"token" - tokenise texts
"depunct" - remove punctuation
"unempty" - remove empty words within documents
"lemma" - lemmatise text
"destop" - remove stopwords
"stem" - stem texts
"out_tag_doc" - turns the texts into gensim tagged documents for Doc2Vec
"out_corp_dict" - turns the texts into a gensim corpus and dictionary
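
To give a feel for how a named pipeline like this can be applied to a list of texts, here is a minimal pure-Python sketch. It is not saniti's implementation: the functions, the `STEPS` mapping, `run_pipeline` and the tiny stopword list are all hypothetical stand-ins, and the real components may behave differently (for example, by using a proper tokeniser and stopword corpus). Note that the sketch runs "depunct" before "destop" so that tokens such as `it,` are cleaned before the stopword check.

```python
import string

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"i", "the", "of", "to", "it"}


def token(docs):
    """Split each document string into lowercase whitespace tokens."""
    return [doc.lower().split() for doc in docs]


def depunct(docs):
    """Strip ASCII punctuation characters from every token."""
    table = str.maketrans("", "", string.punctuation)
    return [[word.translate(table) for word in doc] for doc in docs]


def destop(docs):
    """Drop tokens that appear in the stopword list."""
    return [[word for word in doc if word not in STOPWORDS] for doc in docs]


def unempty(docs):
    """Drop tokens that are empty strings (e.g. after punctuation removal)."""
    return [[word for word in doc if word] for doc in docs]


# Map pipeline names to the functions that implement them (hypothetical).
STEPS = {"token": token, "depunct": depunct, "destop": destop, "unempty": unempty}


def run_pipeline(docs, pipeline):
    """Apply each named step in order, feeding each output into the next step."""
    for name in pipeline:
        docs = STEPS[name](docs)
    return docs


original_text = ["I like to moves it, move its", "I likeing to move it!", "the of"]
result = run_pipeline(original_text, ["token", "depunct", "destop", "unempty"])
print(result)
# [['like', 'moves', 'move', 'its'], ['likeing', 'move'], []]
```

The order of the names matters, because each step only sees the output of the previous one.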

Release history

Version	Release date
0.1.51	Jun 19, 2018
0.1.44	Jun 18, 2018
0.1.43	Jun 17, 2018
0.1.42 (this version)	Jun 17, 2018
0.1.41	Jun 17, 2018
0.1.22	Jun 1, 2018
0.1.21	Jun 1, 2018
0.1.4	Jun 13, 2018
0.1.3	Jun 3, 2018
0.1.2	Jun 1, 2018
0.1.1	May 31, 2018
0.1.0	May 31, 2018
0.0.13	Jun 18, 2018
0.0.12	Jun 18, 2018
0.0.11	Jun 18, 2018
0.0.1	Jun 18, 2018

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

saniti-0.1.42.tar.gz (3.0 kB)

Uploaded Jun 17, 2018 Source

Hashes for saniti-0.1.42.tar.gz
Algorithm	Hash digest
SHA256	`e9ba75315230100e07eb561bda705d2919f251432f404264a57cb402b2ee9c9d`
MD5	`f00607096b7c14a571d4fdb99e32521c`
BLAKE2b-256	`b8fc71f6b5ce72ebe250f8409a6f7f6691b1c6da4c547020af6a0eae2cb38aa8`
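
If you download the source distribution by hand, you can check it against the SHA256 digest listed above. The following is a small sketch using Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_expected` are illustrative, and the file path you pass in is assumed to be wherever you saved the archive.

```python
import hashlib

# SHA256 digest listed above for saniti-0.1.42.tar.gz.
EXPECTED_SHA256 = "e9ba75315230100e07eb561bda705d2919f251432f404264a57cb402b2ee9c9d"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("saniti-0.1.42.tar.gz")` would check an archive saved in the current directory.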