Easy-to-use NLP toolbox

Reason

Reason is an easy-to-use Python toolbox for natural language processing, with integrated machine learning packages.

Packages

  • classify
    Naive Bayes classifier
  • cluster
    K-means++ and DBSCAN clusterers, elbow method
  • metrics
    Confusion matrix, accuracy
  • tag
    POS tagger, regex, lookup and default tagging tools
  • tokenize
    Regex word and sentence tokenizer
  • stem
    Porter and regex stemmer
  • analysis
    Frequency distribution
  • util
    Bigrams, trigrams and ngrams

Install

Install the latest stable version using pip:

pip install reason

Quick Start

Classification:

>>> from reason.classify import NaiveBayesClassifier
>>> classifier = NaiveBayesClassifier()
>>> classifier.fit(x, y)
>>> y_pred = classifier.predict(new_data)

>>> from reason.metrics import accuracy
>>> accuracy(y_true, y_pred)
0.9358
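
For readers unfamiliar with the metric, accuracy is the fraction of predictions that match the true labels. The following is a minimal standalone sketch of that definition, not Reason's own implementation; the function name `accuracy_score` and the sample labels are made up for illustration.

```python
def accuracy_score(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels: three of the four predictions are correct.
score = accuracy_score([1, 0, 1, 1], [1, 0, 0, 1])
print(score)  # 0.75
```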

Clustering:

>>> from reason.cluster import KMeansClusterer
>>> from reason.cluster import elbow_method
>>> elbow_method(x, clusterer=KMeansClusterer, max_k=10)
5

>>> clusterer = KMeansClusterer()
>>> labels = clusterer.fit(x, k=5)
>>> pred = clusterer.predict(new_data)

>>> from reason.cluster import DBSCAN
>>> clusterer = DBSCAN()
>>> labels = clusterer.fit(x, eps=0.21)
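
The idea behind an elbow method is to pick the number of clusters at which adding another cluster stops reducing the within-cluster error by much. The sketch below shows one common heuristic (largest change in slope of the error curve). It is an assumption about the general technique, not a description of how Reason's `elbow_method` works; the function `elbow_k` and the error values are hypothetical.

```python
def elbow_k(inertias):
    """Return the k (1-based) where the error curve bends most sharply.

    inertias[i] is the within-cluster sum of squares for k = i + 1.
    """
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(inertias) - 1):
        drop_before = inertias[i - 1] - inertias[i]
        drop_after = inertias[i] - inertias[i + 1]
        bend = drop_before - drop_after
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k


# Hypothetical error values for k = 1..6, made up for illustration.
inertias = [1000.0, 600.0, 350.0, 150.0, 130.0, 120.0]
print(elbow_k(inertias))  # 4
```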

Confusion matrix:

>>> from reason.metrics import ConfusionMatrix
>>> cm = ConfusionMatrix(y_true, y_pred)

>>> cm
68 21 13
16 70 11
14 10 77

>>> cm[actual, predicted]
16
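
A confusion matrix counts how often each actual label was assigned each predicted label. As a rough standalone sketch (not Reason's implementation), the counts can be collected with `collections.Counter`; the labels below are hypothetical, and the row = actual, column = predicted layout is an assumption.

```python
from collections import Counter

# Hypothetical true and predicted labels.
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "b", "b", "b", "c", "a"]

# Count each (actual, predicted) pair once.
counts = Counter(zip(y_true, y_pred))
labels = sorted(set(y_true) | set(y_pred))

# Rows are actual labels, columns are predicted labels (assumed layout).
matrix = [[counts[(actual, predicted)] for predicted in labels]
          for actual in labels]

for row in matrix:
    print(*row)
```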

>>> from reason.metrics import BinaryConfusionMatrix
>>> bcm = BinaryConfusionMatrix(b_y_true, b_y_pred)

>>> bcm.precision()
0.7837
>>> bcm.recall()
0.8055
>>> bcm.f1_score()
0.7944
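
Precision, recall and F1 follow directly from the counts of true positives, false positives and false negatives. The sketch below spells out the standard formulas; it is independent of Reason, and the function name and counts are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard binary metrics from true/false positive and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(precision, recall, f1)
```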

Part-of-speech tagging:

>>> from reason.tag import POSTagger

>>> text = "10 tools from the file"
>>> tagger = POSTagger()
>>> tagger.tag(text)
[('10', 'CD'), ('tools', 'NNS'), ('from', 'IN'), ('the', 'AT'), ('file', 'NN')]
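
The package list mentions lookup, regex and default tagging tools. One plausible way to combine such taggers is a backoff chain: try a lookup table first, then regex rules, then a default tag. The sketch below illustrates that idea only; the table, the rules and the function `backoff_tag` are hypothetical and are not Reason's API.

```python
import re

# Hypothetical lookup table and regex rules, chosen only for illustration.
LOOKUP = {"the": "AT", "from": "IN"}
REGEX_RULES = [
    (re.compile(r"^\d+$"), "CD"),    # whole-number tokens
    (re.compile(r"^\w+s$"), "NNS"),  # crude plural-noun guess
]
DEFAULT_TAG = "NN"


def backoff_tag(tokens):
    """Tag each token by lookup, then regex rules, then the default tag."""
    tagged = []
    for token in tokens:
        tag = LOOKUP.get(token.lower())
        if tag is None:
            tag = next(
                (t for pattern, t in REGEX_RULES if pattern.match(token)),
                DEFAULT_TAG,
            )
        tagged.append((token, tag))
    return tagged


tagged = backoff_tag("10 tools from the file".split())
print(tagged)
```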

Word tokenization:

>>> from reason.tokenize import word_tokenize

>>> text = "Testing reason0.1.0, (on: 127.0.0.1). Cool stuff..."
>>> word_tokenize(text, 'alphanumeric')
['Testing', 'reason0.1.0', 'on', '127.0.0.1', 'Cool', 'stuff']
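
To see how a regex can keep tokens such as version numbers and IP addresses intact while dropping punctuation, here is a small standalone sketch using the standard `re` module. The pattern is an assumption chosen to reproduce the output above; it is not necessarily the pattern Reason uses for `'alphanumeric'`.

```python
import re

# Hypothetical pattern: word characters, optionally joined by single inner dots.
TOKEN_PATTERN = re.compile(r"\w+(?:\.\w+)*")

text = "Testing reason0.1.0, (on: 127.0.0.1). Cool stuff..."
tokens = TOKEN_PATTERN.findall(text)
print(tokens)
# ['Testing', 'reason0.1.0', 'on', '127.0.0.1', 'Cool', 'stuff']
```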

Sentence tokenization:

>>> from reason.tokenize import sent_tokenize

>>> text = "Hey, what's up? I love using Reason library!"
>>> sents = sent_tokenize(text)
>>> for sent in sents:
...     print(sent)
Hey, what's up?
I love using Reason library!
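
A simple regex-based sentence splitter can be sketched by splitting on whitespace that follows sentence-ending punctuation. This is an illustration of the general approach, not Reason's tokenizer; it does not handle abbreviations such as "Dr." and the pattern is an assumption.

```python
import re

text = "Hey, what's up? I love using Reason library!"

# Split on whitespace preceded by '.', '!' or '?' (kept via lookbehind).
sents = re.split(r"(?<=[.!?])\s+", text)
for sent in sents:
    print(sent)
```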

Stemming:

>>> from reason.stem import PorterStemmer

>>> text = "watched birds flying"
>>> stemmer = PorterStemmer()
>>> stemmer.stem(text)
['watch', 'bird', 'fly']

>>> from reason.stem import regex_stem

>>> regex_pattern = r'^(.*?)(ous)?$'
>>> regex_stem('dangerous', regex_pattern)
'danger'
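
The pattern above works because the lazy first group captures as little as possible, leaving an optional `ous` suffix for the second group. The sketch below shows the same idea with the standard `re` module; the helper `strip_suffix` is hypothetical and only assumes that the stem is the first capture group.

```python
import re


def strip_suffix(word, pattern):
    """Return the first capture group of pattern, or the word if no match."""
    match = re.match(pattern, word)
    return match.group(1) if match else word


regex_pattern = r"^(.*?)(ous)?$"
print(strip_suffix("dangerous", regex_pattern))  # danger
print(strip_suffix("happy", regex_pattern))      # happy
```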

Preprocess text (tokenizing + stemming):

>>> from reason import preprocess

>>> text = "What's up? I love using Reason library!"
>>> preprocess(text)
[["what's", 'up', '?'], ['i', 'love', 'us', 'reason', 'librari', '!']]

Frequency distribution:

>>> from reason.analysis import FreqDist

>>> words = ['hey', 'hey', 'oh', 'oh', 'oh', 'yeah']
>>> fd = FreqDist(words)

>>> fd
Frequency Distribution
Most-Common: [('oh', 3), ('hey', 2), ('yeah', 1)]
>>> fd.most_common(2)
[('oh', 3), ('hey', 2)]
>>> fd['yeah']
1
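
For comparison, the same counts can be obtained with `collections.Counter` from the standard library. This is only a point of reference and says nothing about how `FreqDist` is implemented.

```python
from collections import Counter

words = ["hey", "hey", "oh", "oh", "oh", "yeah"]
counts = Counter(words)

print(counts.most_common(2))  # [('oh', 3), ('hey', 2)]
print(counts["yeah"])         # 1
```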

N-grams:

>>> sent = "Reason is easy to use"

>>> from reason.util import bigrams
>>> bigrams(sent)
[('Reason', 'is'), ('is', 'easy'), ('easy', 'to'), ('to', 'use')]

>>> from reason.util import trigrams
>>> trigrams(sent)
[('Reason', 'is', 'easy'), ('is', 'easy', 'to'), ('easy', 'to', 'use')]

>>> from reason.util import ngrams
>>> ngrams(sent, 4)
[('Reason', 'is', 'easy', 'to'), ('is', 'easy', 'to', 'use')]
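
An n-gram list is every run of n consecutive tokens. A minimal sketch of that idea, assuming whitespace tokenization and using a hypothetical helper `make_ngrams` (not Reason's `ngrams`), is:

```python
def make_ngrams(sentence, n):
    """Return all runs of n consecutive whitespace-separated tokens."""
    tokens = sentence.split()
    # zip stops at the shortest shifted slice, giving len(tokens) - n + 1 tuples.
    return list(zip(*(tokens[i:] for i in range(n))))


sent = "Reason is easy to use"
print(make_ngrams(sent, 2))
print(make_ngrams(sent, 4))
```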

Dependencies

  • NumPy
    Used to handle data
  • Pandas
    Used in classify and cluster packages

Note that NumPy is installed automatically with Reason.

License

MIT -- See LICENSE for details.
