
Reason


An easy-to-use natural language processing toolbox for Python.

Packages

  • classify
    Naive Bayes classifier
  • cluster
    k-means++ clusterer
  • metrics
    Confusion matrix, accuracy
  • tag
    POS tagger, regex, lookup and default tagging tools
  • tokenize
    Regex word and sentence tokenizer
  • stem
    Porter and regex stemmer
  • analysis
    Frequency distribution
  • util
    Bigrams, trigrams and ngrams

Install

Install the latest stable version using pip:

pip install reason

Quick Start

Classification:

>>> from reason.classify import NaiveBayesClassifier
>>> classifier = NaiveBayesClassifier()
>>> classifier.fit(x, y)
>>> y_pred = classifier.predict(new_data)

>>> from reason.metrics import accuracy
>>> accuracy(y_true, y_pred)
0.9358
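
Accuracy itself is just the fraction of predictions that match the gold labels; a minimal stand-in for the metric used above:

```python
def accuracy_score(y_true, y_pred):
    """Fraction of predictions equal to the gold labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

accuracy_score([1, 0, 1, 1], [1, 0, 0, 1])  # 3 of 4 correct -> 0.75
```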

Clustering:

>>> from reason.cluster import KMeansClusterer
>>> clusterer = KMeansClusterer()
>>> clusters = clusterer.fit(x, k=2)
>>> pred = clusterer.predict(new_data)
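
Under the hood, k-means alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its cluster. A bare-bones 2-D sketch of that loop (with fixed starting centroids for simplicity, rather than the k-means++ seeding Reason's clusterer advertises):

```python
def kmeans(points, centroids, iters=10):
    """Plain k-means on 2-D tuples."""
    def dist2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    for _ in range(iters):
        # assignment step: each point joins its nearest centroid's cluster
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # update step: move each centroid to its cluster mean
        # (keep the old centroid if the cluster is empty)
        centroids = [
            (sum(p[0] for p in c) / len(c), sum(p[1] for p in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters
```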

Confusion matrix:

>>> from reason.metrics import ConfusionMatrix
>>> cm = ConfusionMatrix(y_true, y_pred)

>>> cm
68 21 13
16 70 11
14 10 77

>>> cm[actual, predicted]
16

>>> from reason.metrics import BinaryConfusionMatrix
>>> bcm = BinaryConfusionMatrix(b_y_true, b_y_pred)

>>> bcm.precision()
0.7837
>>> bcm.recall()
0.8055
>>> bcm.f1_score()
0.7944
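
All three binary metrics reduce to counts read off the confusion matrix: true positives, false positives and false negatives. A quick sketch of the formulas:

```python
def binary_metrics(tp, fp, fn):
    """Precision, recall and F1 from binary confusion-matrix counts."""
    precision = tp / (tp + fp)  # of all positive predictions, how many were right
    recall = tp / (tp + fn)     # of all actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1
```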

Part-of-speech tagging:

>>> from reason.tag import POSTagger

>>> text = "10 tools from the file"
>>> tagger = POSTagger()
>>> tagger.tag(text)
[('10', 'CD'), ('tools', 'NNS'), ('from', 'IN'), ('the', 'AT'), ('file', 'NN')]
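
Two of the tools the tag package lists, regex and default tagging, can be imitated with an ordered rule table where the first matching pattern wins and untagged words fall back to a default. The rules below are hypothetical, chosen only to reproduce the example output above:

```python
import re

# Ordered (pattern, tag) rules; tags follow the Brown-style abbreviations
# shown above: CD cardinal number, AT article, IN preposition, NNS plural noun.
RULES = [
    (r'^\d+(\.\d+)?$', 'CD'),
    (r'^(the|a|an)$', 'AT'),
    (r'^(from|in|of|on)$', 'IN'),
    (r'.*s$', 'NNS'),
]

def regex_tag(words, default='NN'):
    """Tag each word with the first matching rule, else the default tag."""
    tagged = []
    for word in words:
        for pattern, tag in RULES:
            if re.match(pattern, word.lower()):
                tagged.append((word, tag))
                break
        else:
            tagged.append((word, default))
    return tagged
```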

Word tokenization:

>>> from reason.tokenize import word_tokenize

>>> text = "Testing reason0.1.0, (on: 127.0.0.1). Cool stuff..."
>>> word_tokenize(text, 'alphanumeric')
['Testing', 'reason0.1.0', 'on', '127.0.0.1', 'Cool', 'stuff']
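
A single regex can approximate this behavior: keep runs of word characters, allowing internal dots so version strings and IP addresses survive while surrounding punctuation is dropped. This is a hypothetical stand-in for the `'alphanumeric'` mode, not Reason's actual pattern:

```python
import re

def alnum_tokenize(text):
    """Word characters, optionally joined by internal dots
    (so 'reason0.1.0' and '127.0.0.1' stay whole)."""
    return re.findall(r"\w+(?:\.\w+)*", text)

alnum_tokenize("Testing reason0.1.0, (on: 127.0.0.1). Cool stuff...")
# ['Testing', 'reason0.1.0', 'on', '127.0.0.1', 'Cool', 'stuff']
```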

Sentence tokenization:

>>> from reason.tokenize import sent_tokenize

>>> text = "Hey, what's up? I love using Reason library!"
>>> sents = sent_tokenize(text)
>>> for sent in sents:
...     print(sent)
Hey, what's up?
I love using Reason library!
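
The simplest version of sentence splitting breaks after sentence-final punctuation followed by whitespace. A naive sketch (real tokenizers also have to handle abbreviations, decimals, and quotes):

```python
import re

def split_sents(text):
    """Split after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r'(?<=[.!?])\s+', text) if s]

split_sents("Hey, what's up? I love using Reason library!")
# ["Hey, what's up?", 'I love using Reason library!']
```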

Stemming:

>>> from reason.stem import PorterStemmer

>>> text = "watched birds flying"
>>> stemmer = PorterStemmer()
>>> stemmer.stem(text)
['watch', 'bird', 'fly']

>>> from reason.stem import regex_stem

>>> regex_pattern = r'^(.*?)(ous)?$'
>>> regex_stem('dangerous', regex_pattern)
'danger'
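
The call above works by keeping the first capture group of the pattern: the lazy `(.*?)` grows only as far as it must, so an optional suffix group like `(ous)?` anchored at `$` absorbs the ending. A one-function imitation:

```python
import re

def regex_stem_sketch(word, pattern):
    """Return the first capture group of the pattern,
    or the word unchanged if the pattern does not match."""
    match = re.match(pattern, word)
    return match.group(1) if match else word

regex_stem_sketch('dangerous', r'^(.*?)(ous)?$')  # 'danger'
```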

Preprocess text (tokenizing + stemming):

>>> from reason import preprocess

>>> text = "What's up? I love using Reason library!"
>>> preprocess(text)
[["what's", 'up', '?'], ['i', 'love', 'us', 'reason', 'librari', '!']]

Frequency distribution:

>>> from reason.analysis import FreqDist

>>> words = ['hey', 'hey', 'oh', 'oh', 'oh', 'yeah']
>>> fd = FreqDist(words)

>>> fd
Frequency Distribution
Most-Common: [('oh', 3), ('hey', 2), ('yeah', 1)]
>>> fd.most_common(2)
[('oh', 3), ('hey', 2)]
>>> fd['yeah']
1
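
FreqDist behaves much like the standard library's `collections.Counter`, which supports the same two lookups shown above:

```python
from collections import Counter

words = ['hey', 'hey', 'oh', 'oh', 'oh', 'yeah']
fd = Counter(words)

fd.most_common(2)  # [('oh', 3), ('hey', 2)]
fd['yeah']         # 1
```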

N-grams:

>>> sent = "Reason is easy to use"

>>> from reason.util import bigrams
>>> bigrams(sent)
[('Reason', 'is'), ('is', 'easy'), ('easy', 'to'), ('to', 'use')]

>>> from reason.util import trigrams
>>> trigrams(sent)
[('Reason', 'is', 'easy'), ('is', 'easy', 'to'), ('easy', 'to', 'use')]

>>> from reason.util import ngrams
>>> ngrams(sent, 4)
[('Reason', 'is', 'easy', 'to'), ('is', 'easy', 'to', 'use')]
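
All three helpers are sliding windows over the token list, which the classic `zip` trick reproduces in one line:

```python
def ngrams(tokens, n):
    """Every length-n window over the token list, as tuples."""
    return list(zip(*(tokens[i:] for i in range(n))))

tokens = "Reason is easy to use".split()
ngrams(tokens, 2)  # bigrams
ngrams(tokens, 3)  # trigrams
```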

Dependencies

  • NumPy
    Used to handle data
  • Pandas
    Used in classify and cluster packages

Keep in mind that NumPy is installed automatically with Reason.

License

MIT -- See LICENSE for details.

