Easy-to-use NLP toolbox
Reason
An easy-to-use natural language processing toolbox for Python.
Packages
- classify: Naive Bayes classifier
- metrics: Confusion matrix, accuracy
- tokenize: Regex word and sentence tokenizer
- stem: Porter and regex stemmer
- analysis: Frequency distribution
- util: Bigrams, trigrams and n-grams
Install
Install the latest stable version using pip:
pip install reason
Quick Start
Classification:
>>> from reason.classify import NaiveBayesClassifier
>>> classifier = NaiveBayesClassifier(train_set)
>>> y_pred = classifier.classify(new_data)
>>> from reason.metrics import accuracy
>>> accuracy(y_true, y_pred)
0.9358
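For readers who want to see what the accuracy figure measures, here is a minimal plain-Python sketch. It is not Reason's implementation; the name `accuracy_sketch` and the sample labels are hypothetical and exist only for illustration.

```python
def accuracy_sketch(y_true, y_pred):
    """Return the fraction of predictions that equal the true labels.

    Hypothetical helper for illustration; not part of the Reason package.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty label list")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels: three of the four predictions match.
sample_true = ["pos", "neg", "pos", "neg"]
sample_pred = ["pos", "neg", "neg", "neg"]
print(accuracy_sketch(sample_true, sample_pred))  # 0.75
```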
Confusion Matrix:
>>> from reason.metrics import ConfusionMatrix
>>> cm = ConfusionMatrix(y_true, y_pred)
>>> cm
68 21 13
16 70 11
14 10 77
>>> cm[actual, predicted]
16
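The idea behind a confusion matrix can be sketched with the standard library alone: count each (actual, predicted) pair, then lay the counts out with one row per actual label and one column per predicted label. This is an independent illustration under that row/column assumption, not the code behind ConfusionMatrix; `confusion_counts` and the sample labels are hypothetical.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (actual, predicted) label pair occurs.

    Hypothetical helper for illustration; not part of the Reason package.
    """
    return Counter(zip(y_true, y_pred))


# Made-up labels for three classes.
sample_true = ["a", "a", "b", "b", "c"]
sample_pred = ["a", "b", "b", "b", "a"]

counts = confusion_counts(sample_true, sample_pred)
labels = sorted(set(sample_true) | set(sample_pred))

# Rows are actual labels, columns are predicted labels (an assumption of this sketch).
matrix = [[counts[(actual, predicted)] for predicted in labels] for actual in labels]
for row in matrix:
    print(*row)

# Lookup by (actual, predicted): how often was an actual "a" predicted as "b"?
print(counts[("a", "b")])  # 1
```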
>>> from reason.metrics import BinaryConfusionMatrix
>>> bcm = BinaryConfusionMatrix(b_y_true, b_y_pred)
>>> bcm.precision()
0.7837
>>> bcm.recall()
0.8055
>>> bcm.f1_score()
0.7944
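Precision, recall and F1 all derive from three counts: true positives, false positives and false negatives. The sketch below shows the standard formulas in plain Python. It is not BinaryConfusionMatrix itself; the function name `binary_scores`, its `positive` parameter and the sample labels are assumptions made for illustration.

```python
def binary_scores(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the given positive label.

    Hypothetical helper for illustration; not part of the Reason package.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Made-up labels: 3 true positives, 1 false positive, 1 false negative.
sample_true = [1, 1, 1, 1, 0, 0, 0, 0]
sample_pred = [1, 1, 1, 0, 1, 0, 0, 0]
precision, recall, f1 = binary_scores(sample_true, sample_pred)
print(precision, recall, f1)
```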
Word Tokenization:
>>> from reason.tokenize import word_tokenize
>>> text = "Testing reason0.1.0, (on: 127.0.0.1). Cool stuff..."
>>> word_tokenize(text, 'alphanumeric')
['Testing', 'reason0.1.0', 'on', '127.0.0.1', 'Cool', 'stuff']
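A regex-based word tokenizer of this kind can be approximated with `re.findall`. The pattern below is an assumption chosen to reproduce the output above (word characters, optionally joined by inner dots); it is not necessarily the pattern Reason uses for 'alphanumeric', and `word_tokenize_sketch` is a hypothetical name.

```python
import re

# Runs of word characters, allowing inner dots such as "0.1.0" or "127.0.0.1".
# Assumed pattern for illustration only.
ALPHANUMERIC_PATTERN = re.compile(r"\w+(?:\.\w+)*")


def word_tokenize_sketch(text):
    """Return every alphanumeric token in text, dropping punctuation."""
    return ALPHANUMERIC_PATTERN.findall(text)


sample_text = "Testing reason0.1.0, (on: 127.0.0.1). Cool stuff..."
print(word_tokenize_sketch(sample_text))
# ['Testing', 'reason0.1.0', 'on', '127.0.0.1', 'Cool', 'stuff']
```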
Sentence Tokenization:
>>> from reason.tokenize import sent_tokenize
>>> text = "Hey, what's up? I love using Reason library!"
>>> sents = sent_tokenize(text)
>>> for sent in sents:
...     print(sent)
Hey, what's up?
I love using Reason library!
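Sentence splitting with a regular expression can be sketched as a split on whitespace that follows sentence-ending punctuation. This is a deliberately naive illustration (it would also split after abbreviations such as "Dr."), not Reason's sent_tokenize; `sent_tokenize_sketch` is a hypothetical name.

```python
import re

# Split on whitespace that directly follows ".", "!" or "?".
# Naive assumed rule for illustration only.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def sent_tokenize_sketch(text):
    """Return the sentences in text, keeping their end punctuation."""
    return [part for part in SENTENCE_BOUNDARY.split(text.strip()) if part]


sample_text = "Hey, what's up? I love using Reason library!"
for sentence in sent_tokenize_sketch(sample_text):
    print(sentence)
```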
Word Stems:
>>> from reason.stem import PorterStemmer
>>> text = 'watched birds flying'
>>> stemmer = PorterStemmer()
>>> stemmer.stem(text)
['watch', 'bird', 'fly']
>>> from reason.stem import regex_stem
>>> regex_pattern = r'^(.*?)(ous)?$'
>>> regex_stem('dangerous', regex_pattern)
'danger'
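The regex pattern above works because the lazy first group captures as little as possible, which lets the optional second group claim the suffix. The sketch below reproduces that behavior with the standard `re` module; it is not Reason's regex_stem, and `regex_stem_sketch` is a hypothetical name.

```python
import re


def regex_stem_sketch(word, pattern):
    """Return the first capture group of pattern matched against word.

    Hypothetical helper for illustration; falls back to the word itself
    when the pattern does not match.
    """
    match = re.match(pattern, word)
    return match.group(1) if match else word


suffix_pattern = r"^(.*?)(ous)?$"
print(regex_stem_sketch("dangerous", suffix_pattern))  # danger
print(regex_stem_sketch("bird", suffix_pattern))       # bird
```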
Preprocess Text (Tokenizing + Stemming):
>>> from reason import preprocess
>>> text = "What's up? I love using Reason library!"
>>> preprocess(text)
[["what's", 'up', '?'], ['i', 'love', 'us', 'reason', 'librari', '!']]
Frequency Distribution:
>>> from reason.analysis import FreqDist
>>> words = ['hey', 'hey', 'oh', 'oh', 'oh', 'yeah']
>>> fd = FreqDist(words)
>>> fd
Frequency Distribution
Most-Common: [('oh', 3), ('hey', 2), ('yeah', 1)]
>>> fd.most_common(2)
[('oh', 3), ('hey', 2)]
>>> fd['yeah']
1
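The same counts can be obtained with `collections.Counter` from the standard library, which is a convenient way to cross-check results. This is a comparison sketch only; nothing here claims that FreqDist is built on Counter.

```python
from collections import Counter

sample_words = ['hey', 'hey', 'oh', 'oh', 'oh', 'yeah']
word_counts = Counter(sample_words)

# The two most frequent words with their counts.
print(word_counts.most_common(2))  # [('oh', 3), ('hey', 2)]

# Count of a single word; missing words count as 0 instead of raising KeyError.
print(word_counts['yeah'])     # 1
print(word_counts['missing'])  # 0
```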
Ngrams:
>>> sent = 'Reason is easy to use'
>>> from reason.util import bigrams
>>> bigrams(sent)
[('Reason', 'is'), ('is', 'easy'), ('easy', 'to'), ('to', 'use')]
>>> from reason.util import trigrams
>>> trigrams(sent)
[('Reason', 'is', 'easy'), ('is', 'easy', 'to'), ('easy', 'to', 'use')]
>>> from reason.util import ngrams
>>> ngrams(sent, 4)
[('Reason', 'is', 'easy', 'to'), ('is', 'easy', 'to', 'use')]
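An n-gram is a sliding window of n consecutive tokens. The sketch below shows the idea in plain Python, assuming the sentence is split on whitespace; it is not Reason's implementation, and `ngrams_sketch` is a hypothetical name.

```python
def ngrams_sketch(text, n):
    """Return all runs of n consecutive whitespace-separated tokens as tuples.

    Hypothetical helper for illustration; not part of the Reason package.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    tokens = text.split()
    # A text with fewer than n tokens yields an empty list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sample_sent = 'Reason is easy to use'
print(ngrams_sketch(sample_sent, 2))
# [('Reason', 'is'), ('is', 'easy'), ('easy', 'to'), ('to', 'use')]
print(ngrams_sketch(sample_sent, 4))
# [('Reason', 'is', 'easy', 'to'), ('is', 'easy', 'to', 'use')]
```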
Dependencies
Reason depends on NumPy, which pip installs automatically alongside it.
License
MIT -- See LICENSE for details.