Easy-to-use NLP toolbox
Reason
An easy-to-use Python toolbox for natural language processing, with integrated machine learning packages.
Packages
- classify: Naive Bayes classifier
- cluster: K-means++ and DBSCAN clusterers, elbow method
- metrics: Confusion matrix, accuracy
- tag: POS tagger, regex, lookup and default tagging tools
- tokenize: Regex word and sentence tokenizers
- stem: Porter and regex stemmers
- analysis: Frequency distribution
- util: Bigrams, trigrams and n-grams
Install
Install the latest stable version using pip:
pip install reason
Quick Start
Classification:
>>> from reason.classify import NaiveBayesClassifier
>>> classifier = NaiveBayesClassifier()
>>> classifier.fit(x, y)
>>> y_pred = classifier.predict(new_data)
>>> from reason.metrics import accuracy
>>> accuracy(y_true, y_pred)
0.9358
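The quick start calls `fit` and `predict` without showing what naive Bayes computes. As a from-scratch illustration of the technique (not Reason's implementation), the sketch below is a categorical naive Bayes with add-one (Laplace) smoothing: each class is scored by log P(c) plus the sum of log P(x_j | c), and the highest score wins. The class name `CategoricalNaiveBayes`, the `accuracy` helper and the toy weather data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Toy naive Bayes for categorical features (hypothetical, not Reason's class)."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_counts = Counter(y)
        self.n_samples = len(y)
        # feature_counts[c][j][v] = how often feature j has value v in class c
        self.feature_counts = {c: defaultdict(Counter) for c in self.classes}
        # distinct values observed for each feature index, used for smoothing
        self.feature_values = defaultdict(set)
        for row, label in zip(X, y):
            for j, value in enumerate(row):
                self.feature_counts[label][j][value] += 1
                self.feature_values[j].add(value)
        return self

    def _log_score(self, row, c):
        # log P(c) + sum_j log P(x_j | c), with add-one (Laplace) smoothing
        score = math.log(self.class_counts[c] / self.n_samples)
        for j, value in enumerate(row):
            count = self.feature_counts[c][j][value]
            n_values = len(self.feature_values[j])
            score += math.log((count + 1) / (self.class_counts[c] + n_values))
        return score

    def predict(self, X):
        # pick the class with the highest (unnormalised) log posterior
        return [max(self.classes, key=lambda c: self._log_score(row, c)) for row in X]


def accuracy(y_true, y_pred):
    """Fraction of predictions equal to the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: (outlook, temperature) -> play?
X = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
y = ["no", "no", "yes", "yes"]
clf = CategoricalNaiveBayes().fit(X, y)
y_pred = clf.predict([("sunny", "hot"), ("rainy", "cool")])
print(y_pred, accuracy(["no", "yes"], y_pred))  # ['no', 'yes'] 1.0
```

Working in log space avoids numeric underflow when many small probabilities are multiplied.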
Clustering:
>>> from reason.cluster import KMeansClusterer
>>> from reason.cluster import elbow_method
>>> elbow_method(x, clusterer=KMeansClusterer, max_k=10)
5
>>> clusterer = KMeansClusterer()
>>> labels = clusterer.fit(x, k=5)
>>> pred = clusterer.predict(new_data)
>>> from reason.cluster import DBSCAN
>>> clusterer = DBSCAN()
>>> labels = clusterer.fit(x, eps=0.21)
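K-means++ differs from plain k-means only in seeding: each new initial centre is drawn with probability proportional to its squared distance from the nearest centre already chosen, after which the usual assign/update loop (Lloyd's algorithm) runs. The pure-Python sketch below assumes points are tuples of numbers; `kmeans`, `kmeans_pp_init` and `inertia` are hypothetical names, not `reason.cluster` functions. `inertia` is the within-cluster sum of squares that an elbow plot typically tracks against k; how Reason's `elbow_method` picks the elbow is not described here.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: each new centre is drawn with probability proportional to D(x)^2."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # squared distance from each point to its nearest existing centre
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm started from k-means++ centres; returns (labels, centres)."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    labels = None
    for _ in range(n_iter):
        # assignment step: nearest centre for every point
        new_labels = [min(range(k), key=lambda i: sq_dist(p, centers[i])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # update step: move each centre to the mean of its members
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the previous centre if a cluster is empty
                centers[i] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centers


def inertia(points, labels, centers):
    """Within-cluster sum of squares: the quantity an elbow plot tracks against k."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


# Two well-separated hypothetical blobs
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
for k in range(1, 4):
    labels, centers = kmeans(pts, k)
    print(k, round(inertia(pts, labels, centers), 3))
```

Inertia always shrinks as k grows, so the elbow is read from where the decrease levels off rather than from a minimum.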
Confusion matrix:
>>> from reason.metrics import ConfusionMatrix
>>> cm = ConfusionMatrix(y_true, y_pred)
>>> cm
68 21 13
16 70 11
14 10 77
>>> cm[actual, predicted]
16
>>> from reason.metrics import BinaryConfusionMatrix
>>> bcm = BinaryConfusionMatrix(b_y_true, b_y_pred)
>>> bcm.precision()
0.7837
>>> bcm.recall()
0.8055
>>> bcm.f1_score()
0.7944
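Precision, recall and F1 follow directly from the binary confusion counts: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2PR / (P + R). The figures above are consistent with this: 2 × 0.7837 × 0.8055 / (0.7837 + 0.8055) ≈ 0.7944. The sketch below uses plain lists and hypothetical function names (not Reason's classes); its matrix is indexed `[actual][predicted]`, the same orientation as the `cm[actual, predicted]` access.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Counts indexed as matrix[actual][predicted]."""
    index = {lab: i for i, lab in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def precision_recall_f1(y_true, y_pred, positive=1):
    """Binary precision, recall and F1 for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0]
print(confusion_matrix(y_true, y_pred, labels=[0, 1]))  # [[1, 2], [0, 3]]
print(precision_recall_f1(y_true, y_pred))  # precision 0.6, recall 1.0, F1 about 0.75
```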
Part-of-speech tagging:
>>> from reason.tag import POSTagger
>>> text = "10 tools from the file"
>>> tagger = POSTagger()
>>> tagger.tag(text)
[('10', 'CD'), ('tools', 'NNS'), ('from', 'IN'), ('the', 'AT'), ('file', 'NN')]
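The `tag` package also lists lookup, regex and default tagging tools. Such taggers are commonly chained as a backoff sequence: try an exact word lookup, then the first matching regex rule, then a fixed default tag. The sketch below chains them in that order. The lookup table and patterns are hypothetical, chosen only so the output matches the example above; this is not a description of how `POSTagger` works internally.

```python
import re


def make_backoff_tagger(lookup, patterns, default="NN"):
    """Return a tagger that tries a lookup table, then regex rules, then a default tag."""
    compiled = [(re.compile(pattern), tag) for pattern, tag in patterns]

    def tag(tokens):
        tagged = []
        for token in tokens:
            if token.lower() in lookup:  # 1. exact word lookup
                tagged.append((token, lookup[token.lower()]))
                continue
            for regex, regex_tag in compiled:  # 2. first matching regex rule
                if regex.fullmatch(token):
                    tagged.append((token, regex_tag))
                    break
            else:  # 3. nothing matched: fall back to the default tag
                tagged.append((token, default))
        return tagged

    return tag


# Hypothetical table and rules using Brown-style tags (AT = article, CD = number)
tagger = make_backoff_tagger(
    lookup={"from": "IN", "the": "AT"},
    patterns=[(r"-?\d+(?:\.\d+)?", "CD"), (r".*s", "NNS")],
)
print(tagger("10 tools from the file".split()))
```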
Word tokenization:
>>> from reason.tokenize import word_tokenize
>>> text = "Testing reason0.1.0, (on: 127.0.0.1). Cool stuff..."
>>> word_tokenize(text, 'alphanumeric')
['Testing', 'reason0.1.0', 'on', '127.0.0.1', 'Cool', 'stuff']
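In the `'alphanumeric'` mode above, runs of word characters joined by internal dots (version strings, IP addresses) stay whole while surrounding punctuation is dropped. One regular expression that reproduces this example's output is `\w+(?:\.\w+)*`; it is an assumption about the behaviour, not necessarily the pattern Reason uses, and `alnum_tokenize` is a hypothetical name.

```python
import re

# Assumed pattern: word characters, optionally joined by internal dots
ALNUM_TOKEN = re.compile(r"\w+(?:\.\w+)*")


def alnum_tokenize(text):
    """Return every alphanumeric (dot-joined) token in order of appearance."""
    return ALNUM_TOKEN.findall(text)


print(alnum_tokenize("Testing reason0.1.0, (on: 127.0.0.1). Cool stuff..."))
# ['Testing', 'reason0.1.0', 'on', '127.0.0.1', 'Cool', 'stuff']
```

A trailing dot is never absorbed because the group requires a word character after each dot.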
Sentence tokenization:
>>> from reason.tokenize import sent_tokenize
>>> text = "Hey, what's up? I love using Reason library!"
>>> sents = sent_tokenize(text)
>>> for sent in sents:
...     print(sent)
Hey, what's up?
I love using Reason library!
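A minimal regex sentence splitter breaks on whitespace that follows `.`, `!` or `?`, using a lookbehind so the punctuation stays attached to its sentence. This is a hypothetical sketch (`split_sentences` is not Reason's function); a rule this simple will also split after abbreviations such as "Dr.".

```python
import re


def split_sentences(text):
    """Split on whitespace that directly follows ., ! or ?."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


for sent in split_sentences("Hey, what's up? I love using Reason library!"):
    print(sent)
```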
Stemming:
>>> from reason.stem import PorterStemmer
>>> text = "watched birds flying"
>>> stemmer = PorterStemmer()
>>> stemmer.stem(text)
['watch', 'bird', 'fly']
>>> from reason.stem import regex_stem
>>> regex_pattern = r'^(.*?)(ous)?$'
>>> regex_stem('dangerous', regex_pattern)
danger
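Regex stemming treats the first capture group as the stem: the lazy `(.*?)` grows only until an optional suffix group followed by `$` can match the rest of the word. The sketch below assumes that convention (group 1 is the stem); `simple_regex_stem` and the suffix list are hypothetical. With a wider suffix alternation it happens to give the same stems as the Porter example for these three words, although the two techniques differ in general.

```python
import re


def simple_regex_stem(word, pattern):
    """Return capture group 1 of `pattern` as the stem, or the word unchanged if no match."""
    match = re.match(pattern, word)
    return match.group(1) if match else word


print(simple_regex_stem("dangerous", r"^(.*?)(ous)?$"))  # danger

# Hypothetical suffix list; alternatives are tried left to right
SUFFIXES = r"^(.*?)(ing|ly|ed|ious|ies|ive|es|s|ment)?$"
print([simple_regex_stem(w, SUFFIXES) for w in "watched birds flying".split()])
# ['watch', 'bird', 'fly']
```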
Preprocess text (tokenizing + stemming):
>>> from reason import preprocess
>>> text = "What's up? I love using Reason library!"
>>> preprocess(text)
[["what's", 'up', '?'], ['i', 'love', 'us', 'reason', 'librari', '!']]
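The output above has one list of lowercased, stemmed tokens per sentence, with punctuation kept as separate tokens. A hypothetical pipeline with the same shape is sketched below; `preprocess_sketch` is not Reason's function, its tokenizing regexes are assumptions, and its `stem` argument defaults to the identity, so tokens come back unstemmed unless a stemmer (for example a Porter implementation) is passed in.

```python
import re


def preprocess_sketch(text, stem=lambda token: token):
    """Sentence-split, lowercase, word-tokenize, then apply `stem` to every token."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # tokens: runs of word characters/apostrophes, or single punctuation marks
    return [[stem(tok) for tok in re.findall(r"[\w']+|[^\w\s]", s.lower())] for s in sentences]


print(preprocess_sketch("What's up? I love using Reason library!"))
# [["what's", 'up', '?'], ['i', 'love', 'using', 'reason', 'library', '!']]
```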
Frequency distribution:
>>> from reason.analysis import FreqDist
>>> words = ['hey', 'hey', 'oh', 'oh', 'oh', 'yeah']
>>> fd = FreqDist(words)
>>> fd
Frequency Distribution
Most-Common: [('oh', 3), ('hey', 2), ('yeah', 1)]
>>> fd.most_common(2)
[('oh', 3), ('hey', 2)]
>>> fd['yeah']
1
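The same counts can be obtained with the standard library's `collections.Counter`, shown here for comparison only; whether `FreqDist` is built on `Counter` is not stated in this README.

```python
from collections import Counter

words = ['hey', 'hey', 'oh', 'oh', 'oh', 'yeah']
fd = Counter(words)
print(fd.most_common(2))  # [('oh', 3), ('hey', 2)]
print(fd['yeah'])         # 1
print(fd['missing'])      # 0, unseen items count as zero
```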
N-grams:
>>> sent = "Reason is easy to use"
>>> from reason.util import bigrams
>>> bigrams(sent)
[('Reason', 'is'), ('is', 'easy'), ('easy', 'to'), ('to', 'use')]
>>> from reason.util import trigrams
>>> trigrams(sent)
[('Reason', 'is', 'easy'), ('is', 'easy', 'to'), ('easy', 'to', 'use')]
>>> from reason.util import ngrams
>>> ngrams(sent, 4)
[('Reason', 'is', 'easy', 'to'), ('is', 'easy', 'to', 'use')]
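Word n-grams are the contiguous windows of n tokens, so a sequence of T tokens yields T - n + 1 of them (5 - 4 + 1 = 2 in the last example). A compact way to build them is to zip the token list with copies of itself shifted by 1 to n - 1 positions. The sketch assumes, as the examples suggest, that a string input is split on whitespace; `word_ngrams` is a hypothetical name, not a `reason.util` function.

```python
def word_ngrams(text, n):
    """Contiguous n-token tuples; a string is split on whitespace first."""
    tokens = text.split() if isinstance(text, str) else list(text)
    # zip the token list against copies of itself shifted by 1..n-1 positions
    return list(zip(*(tokens[i:] for i in range(n))))


sent = "Reason is easy to use"
print(word_ngrams(sent, 2))
print(word_ngrams(sent, 4))  # [('Reason', 'is', 'easy', 'to'), ('is', 'easy', 'to', 'use')]
```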
Dependencies
NumPy is installed automatically as a dependency when you install Reason.
License
MIT -- See LICENSE for details.
Download files
Filename | Size | File type | Python version |
---|---|---|---|
reason-0.6.1-py3-none-any.whl | 264.0 kB | Wheel | py3 |
reason-0.6.1.tar.gz | 238.3 kB | Source | None |