
Python library for concurrent text preprocessing


contextpro


contextpro is a Python library for concurrent text preprocessing. It applies functions from well-known NLP packages, including NLTK, spaCy and TextBlob, to a whole corpus at once, optionally spreading the work across several workers.
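
The README does not describe how contextpro distributes work between workers. As a rough mental model only, a batch function can be pictured as mapping a per-document function over the corpus with a worker pool. The sketch below uses a hypothetical batch_apply helper (not part of contextpro) built on concurrent.futures.ThreadPoolExecutor; whether contextpro actually uses threads, processes or another mechanism is an assumption left open here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batch_apply(func: Callable[[T], R], corpus: List[T], num_workers: int = 2) -> List[R]:
    """Apply ``func`` to every document in ``corpus`` with a pool of workers.

    Hypothetical helper for illustration; it is not contextpro's API.
    ``Executor.map`` yields results in input order, so the output lines up
    with the corpus even if documents finish out of order.
    """
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        return list(executor.map(func, corpus))


if __name__ == "__main__":
    docs = ["I'll see you next week", "I'm going for a walk"]
    print(batch_apply(str.upper, docs, num_workers=2))
```

Threads suit I/O-bound work or libraries that release the GIL; for pure-Python, CPU-bound preprocessing, concurrent.futures.ProcessPoolExecutor offers the same map interface but requires picklable, module-level functions.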

Installation

Windows / OS X / Linux:

  • Installation with pip

    pip install contextpro
    python -m spacy download en_core_web_sm
    
  • Installation with poetry

    poetry add contextpro
    poetry run python -m spacy download en_core_web_sm
    

Configuration

  • Before using the package, run the following Python code in your virtual environment to download the NLTK data it needs:

    import nltk
    
    nltk.download("punkt")
    nltk.download("stopwords")
    nltk.download("wordnet")
    

Usage examples

from contextpro.normalization import batch_replace_contractions

corpus = [
    "I don't want to be rude, but you shouldn't do this",
    "Do you think he'll pass his driving test?",
    "I'll see you next week",
    "I'm going for a walk"
]

batch_replace_contractions(corpus)

[
    "I do not want to be rude, but you should not do this",
    "Do you think he will pass his driving test?",
    "I will see you next week",
    "I am going for a walk",
]
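
batch_replace_contractions does not document its contraction list. For intuition only, here is a minimal sketch of rule-based expansion, assuming a small hand-written table (IRREGULAR, SUFFIX_RULES and expand_contractions are hypothetical names, not contextpro's). Irregular forms are checked first, then generic suffixes such as n't and 'll; for this corpus it reproduces the output above.

```python
import re
from typing import List

# Hypothetical, deliberately small rule set; contextpro's real mapping is not shown in the README.
IRREGULAR = {"can't": "cannot", "won't": "will not"}
SUFFIX_RULES = [("n't", " not"), ("'ll", " will"), ("'m", " am"), ("'re", " are"), ("'ve", " have")]

# A word containing one apostrophe, e.g. "don't" or "he'll".
CONTRACTION = re.compile(r"\b[A-Za-z]+'[A-Za-z]+\b")


def expand_contractions(text: str) -> str:
    def replace(match: "re.Match") -> str:
        word = match.group(0)
        lower = word.lower()
        if lower in IRREGULAR:
            expanded = IRREGULAR[lower]
            # Keep a leading capital, e.g. "Won't" -> "Will not".
            return expanded.capitalize() if word[0].isupper() else expanded
        for suffix, replacement in SUFFIX_RULES:
            if lower.endswith(suffix):
                return word[: -len(suffix)] + replacement
        # Unknown forms, such as the possessive in "guy's", are left unchanged.
        return word

    return CONTRACTION.sub(replace, text)


def batch_expand(corpus: List[str]) -> List[str]:
    return [expand_contractions(doc) for doc in corpus]
```

Forms such as 'd are left out on purpose: they are ambiguous ("had" or "would") and cannot be expanded reliably without context.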

from contextpro.normalization import batch_remove_stopwords

corpus = [
    ['My', 'name', 'is', 'Dr', 'Jekyll'],
    ['His', 'name', 'is', 'Mr', 'Hyde'],
    ['This', 'guy', 's', 'name', 'is', 'Edward', 'Scissorhands'],
    ['And', 'this', 'is', 'Tom', 'Parker']
]

batch_remove_stopwords(corpus)

[
    ['My', 'name', 'Dr', 'Jekyll'],
    ['His', 'name', 'Mr', 'Hyde'],
    ['This', 'guy', 'name', 'Edward', 'Scissorhands'],
    ['And', 'Tom', 'Parker']
]
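
The output above keeps capitalized tokens such as 'My', 'His' and 'And' while dropping 'is', 's' and lowercase 'this'. That is consistent with an exact, case-sensitive lookup against a lowercase stopword list such as NLTK's. A minimal sketch of that behaviour, using a tiny hypothetical stopword set in place of NLTK's full list:

```python
from typing import FrozenSet, Iterable, List

# Hypothetical subset of an English stopword list; like NLTK's list, it is all lowercase.
STOPWORDS: FrozenSet[str] = frozenset({"my", "his", "this", "and", "is", "s"})


def remove_stopwords(tokens: Iterable[str], stopwords: FrozenSet[str] = STOPWORDS) -> List[str]:
    # Exact, case-sensitive membership test: "this" is dropped, "This" is kept.
    return [token for token in tokens if token not in stopwords]


def batch_filter(corpus: List[List[str]]) -> List[List[str]]:
    return [remove_stopwords(doc) for doc in corpus]
```

Lowercasing tokens before the lookup would also remove 'My', 'His' and 'And'; whether that is desirable depends on whether capitalization carries meaning downstream.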

from contextpro.normalization import batch_lemmatize

corpus = [
    ["I", "like", "driving", "a", "car"],
    ["I", "am", "going", "for", "a", "walk"],
    ["What", "are", "you", "doing"],
    ["Where", "are", "you", "coming", "from"]
]

batch_lemmatize(corpus, num_workers=2, pos="v")

[
    ['I', 'like', 'drive', 'a', 'car'],
    ['I', 'be', 'go', 'for', 'a', 'walk'],
    ['What', 'be', 'you', 'do'],
    ['Where', 'be', 'you', 'come', 'from']
]
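
The pos="v" argument is presumably passed on to a WordNet-based lemmatizer, which resolves words per part of speech (NLTK's WordNetLemmatizer defaults to nouns). The sketch below is modelled on WordNet's morphy approach: consult an exception list, then try suffix-detachment rules until a candidate appears in the lexicon. The lexicon and exception list are tiny hypothetical stand-ins for WordNet's data, and lemmatize_verb is not a contextpro or NLTK function.

```python
from typing import List

# Tiny hypothetical verb lexicon and exception list; WordNet's are far larger.
VERB_LEXICON = {"be", "come", "do", "drive", "go", "like", "walk"}
VERB_EXCEPTIONS = {"am": "be", "are": "be", "is": "be", "was": "be", "were": "be"}
# Suffix-detachment rules for verbs, tried in order: (suffix, replacement).
VERB_RULES = [("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
              ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")]


def lemmatize_verb(word: str) -> str:
    if word in VERB_EXCEPTIONS:
        return VERB_EXCEPTIONS[word]
    if word in VERB_LEXICON:
        return word
    for suffix, replacement in VERB_RULES:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in VERB_LEXICON:
                return candidate
    # No rule produced a known verb, so the token is returned unchanged.
    return word


def batch_lemmatize_verbs(corpus: List[List[str]]) -> List[List[str]]:
    return [[lemmatize_verb(token) for token in doc] for doc in corpus]
```

Checking each candidate against the lexicon is what stops "going" from becoming "goe": the ("ing", "e") rule fires first, fails the lookup, and the ("ing", "") rule then yields "go".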

from contextpro.normalization import batch_convert_numerals_to_numbers

corpus = [
    "A bunch of five",
    "A picture is worth a thousand words",
    "A stitch in time saves nine",
    "Back to square one",
    "Behind the eight ball",
    "Between two stools",
]

batch_convert_numerals_to_numbers(corpus, num_workers=2)

[
    'A bunch of 5',
    'A picture is worth a 1000 words',
    'A stitch in time saves 9',
    'Back to square 1',
    'Behind the 8 ball',
    'Between 2 stools',
]
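
The README does not say how batch_convert_numerals_to_numbers handles compound numerals such as "twenty-one" or "two hundred". A minimal sketch that covers only single number words, assuming a hypothetical NUMBER_WORDS lookup and whole-word regex matching, reproduces the output above:

```python
import re
from typing import List

# Hypothetical lookup covering single number words only.
NUMBER_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "hundred": 100, "thousand": 1000, "million": 1000000,
}
# Whole words only (\b on both sides), so "someone" or "tone" are never touched.
NUMBER_PATTERN = re.compile(r"\b(" + "|".join(NUMBER_WORDS) + r")\b", re.IGNORECASE)


def numerals_to_numbers(text: str) -> str:
    return NUMBER_PATTERN.sub(lambda m: str(NUMBER_WORDS[m.group(0).lower()]), text)


def batch_numerals_to_numbers(corpus: List[str]) -> List[str]:
    return [numerals_to_numbers(doc) for doc in corpus]
```

Because each word is replaced independently, "two hundred" would become "2 100" here; combining multi-word numerals needs a small parser that accumulates values across adjacent number words.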

from contextpro.statistics import batch_calculate_corpus_statistics

corpus = [
    "My name is Dr. Jekyll.",
    "His name is Mr. Hyde",
    "This guy's name is Edward Scissorhands",
    "And this is Tom Parker"
]

batch_calculate_corpus_statistics(
    corpus,
    lowercase=False,
    remove_stopwords=False,
    num_workers=2,
)

   characters  tokens  punctuation_characters  digits  whitespace_characters  \
0          22       5                       2       0                      4
1          20       5                       1       0                      4
2          38       7                       1       0                      5
3          22       5                       0       0                      4

   ascii_characters  sentiment_score  subjectivity_score
0                22              0.0                 0.0
1                20              0.0                 0.0
2                38              0.0                 0.0
3                22              0.0                 0.0
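
The character-level columns in the table above can be reproduced with plain string operations. The sketch below assumes that punctuation means characters in string.punctuation and whitespace means str.isspace(); these definitions are assumptions, although they give the same counts as the table for this corpus. The tokens, sentiment_score and subjectivity_score columns need a tokenizer and a sentiment model (presumably TextBlob's, given the dependencies listed at the top) and are left out. character_statistics is a hypothetical helper, not contextpro's API.

```python
import string
from typing import Dict

PUNCTUATION = set(string.punctuation)


def character_statistics(text: str) -> Dict[str, int]:
    """Character-level counts for one document (hypothetical helper)."""
    return {
        "characters": len(text),
        "punctuation_characters": sum(ch in PUNCTUATION for ch in text),
        # str.isdigit also accepts some non-ASCII digits, e.g. superscripts.
        "digits": sum(ch.isdigit() for ch in text),
        "whitespace_characters": sum(ch.isspace() for ch in text),
        "ascii_characters": sum(ch.isascii() for ch in text),
    }
```

Mapping this function over the corpus and passing the resulting list of dicts to pandas.DataFrame would give a table shaped like the one above.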


Meta

Łukasz Zawieska – zawieskal@yahoo.com

GitLab account

GitHub account

Distributed under the MIT license. See LICENSE for more information.



Download files


Source Distribution

contextpro-2.0.1.tar.gz (12.9 kB)

Uploaded Source

Built Distribution

contextpro-2.0.1-py3-none-any.whl (14.2 kB)

Uploaded Python 3

File details

Details for the file contextpro-2.0.1.tar.gz.

File metadata

  • Download URL: contextpro-2.0.1.tar.gz
  • Upload date:
  • Size: 12.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.5 CPython/3.8.10 Linux/5.8.0-59-generic

File hashes

Hashes for contextpro-2.0.1.tar.gz
Algorithm Hash digest
SHA256 a4e201e894318d198f79d31744af149772b1fa3e20c8c9a7f6a08a00a3f6c8bc
MD5 9cc7e887fb8d3a282c3d841b12baceb3
BLAKE2b-256 20362180fe67510c481ef625d1f38c3ec452615af2ff5b6646ab5e55d5f93def


File details

Details for the file contextpro-2.0.1-py3-none-any.whl.

File metadata

  • Download URL: contextpro-2.0.1-py3-none-any.whl
  • Upload date:
  • Size: 14.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.5 CPython/3.8.10 Linux/5.8.0-59-generic

File hashes

Hashes for contextpro-2.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 d5f19180e62ff99c0b03caedc1e559d6fe3eceb555ecef1c1543e048d61c018f
MD5 cc7cf5e05d5772e148f86073da7ae96b
BLAKE2b-256 1c30c880c3bd13ec8de3775f6df7c4e1a432d7eb08a4f5a3480fc6f81aac48d7

