Basic computational linguistics and natural language processing in Python

text_analytics

Install from PyPI:

pip install textanalytics

Or install directly from the GitHub repository:

pip install git+https://github.com/jonathandunn/text_analytics.git

This package provides code to support introductory courses in computational linguistics or natural language processing. These courses are available free on edX:

Introduction to Text Analytics and Natural Language Processing with Python

Visualizing Text Analytics and Natural Language Processing with Python

Usage

from text_analytics import TextAnalytics

ai = TextAnalytics()

Getting features

style, vocab_size = ai.get_features(df, features="style")

style = Function word n-grams

sentiment = Positive and negative words

content = Top content words with TF-IDF weighting, PMI for finding phrases, no stop words

constructions = A bag-of-constructions syntactic representation
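
To make the TF-IDF weighting mentioned for the content features concrete, here is a minimal standalone sketch. It is not the package's implementation: the function name `tf_idf`, the whitespace tokenizer, and the unsmoothed `log(N / df)` formula are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {word: weight} dict per document (hypothetical helper)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each word
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Term frequency (relative) times inverse document frequency
        weights.append({
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
weights = tf_idf(docs)
```

A word that occurs in every document, such as "the", receives a weight of zero, while a word unique to one document, such as "mat", receives the highest weight in that document.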

Using a classifier

ai.shallow_classification(df, label, features="style", cv=False, classifier='svm')

ai.mlp(df, label, features="style", validation_set=False, test_size=0.10)

Unsupervised methods

Topic Models

ai.train_lda(df, n_topics, min_count)
    
topic_df = ai.use_lda(df, labels="Author")

Vector Semantics

ai.train_word2vec(file, min_count, workers)

Document and Word Clusters

cluster_df = ai.cluster(x, y=None, k=k)

Nearest document searches

y_sample, y_closest = ai.linguistic_distance(x, y, sample=1, n=3)
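
The idea behind a nearest document search can be sketched without the package. The example below assumes bag-of-words vectors and cosine similarity, which may differ from the distance measure the package actually uses; `cosine` and `nearest_documents` are hypothetical helper names.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two word-count dictionaries."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def nearest_documents(query, docs, n=3):
    """Return the indices of the n documents most similar to the query."""
    query_vec = Counter(query.lower().split())
    scored = [
        (cosine(query_vec, Counter(doc.lower().split())), index)
        for index, doc in enumerate(docs)
    ]
    # Highest similarity first; ties are broken by document order
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored[:n]]

docs = [
    "the cat sat on the mat",
    "stock prices fell sharply today",
    "a cat sat on a mat",
]
closest = nearest_documents("the cat sat", docs, n=2)
```

Here `closest` holds the indices of the two cat-related documents, with the first document ranked highest because it shares the most words with the query.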

Corpus Descriptions

PMI-based Phrases

ai.fit_phrases(df)
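
As a rough illustration of how PMI can identify phrases, the sketch below scores adjacent word pairs by pointwise mutual information. It is a simplified standalone example, not the package's code: `bigram_pmi` is a hypothetical name, and the probabilities are plain relative frequencies with no smoothing.

```python
import math
from collections import Counter

def bigram_pmi(tokens, min_count=1):
    """Score each adjacent word pair by PMI = log2(P(w1, w2) / (P(w1) * P(w2)))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # Skip rare pairs, whose PMI estimates are unreliable
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores

tokens = "new york is big and new york is busy and the city is new".split()
scores = bigram_pmi(tokens, min_count=2)
```

Pairs with a high positive score co-occur more often than their individual frequencies would predict, which makes them candidates for being treated as a single phrase.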

Delta P-based Phrases

association_df = ai.get_association(df, min_count=1, save_phraser=True)
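
Delta P is a directional association measure: how much more likely the second word becomes when the first word is present. The sketch below computes the left-to-right version, P(w2 | w1) - P(w2 | not w1), from adjacent word pairs. The function name `delta_p` and the counting scheme are assumptions for illustration and may not match how the package computes it.

```python
def delta_p(tokens, w1, w2):
    """Left-to-right Delta P for the adjacent pair (w1, w2)."""
    pairs = list(zip(tokens, tokens[1:]))
    # 2x2 contingency table over adjacent pairs
    a = sum(1 for x, y in pairs if x == w1 and y == w2)  # w1 followed by w2
    b = sum(1 for x, y in pairs if x == w1 and y != w2)  # w1 followed by other
    c = sum(1 for x, y in pairs if x != w1 and y == w2)  # other followed by w2
    d = len(pairs) - a - b - c                           # neither
    if a + b == 0 or c + d == 0:
        return 0.0
    return a / (a + b) - c / (c + d)

tokens = "of course we will of course they can but of all things".split()
score = delta_p(tokens, "of", "course")
```

In this toy text "course" follows "of" in two of three cases and never follows any other word, so the score is two thirds.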

Basic word frequencies

vocab = ai._get_vocab_list(df, min_count, return_freq=True)
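
For readers who want to see what a frequency-filtered vocabulary looks like, here is a minimal sketch built on `collections.Counter`. The helper `vocab_list` is hypothetical: it mirrors the `min_count` and `return_freq` argument names shown above but takes a plain list of strings and does not reproduce the package's behavior.

```python
from collections import Counter

def vocab_list(texts, min_count=1, return_freq=False):
    """Collect words that occur at least min_count times across all texts."""
    counts = Counter(word for text in texts for word in text.lower().split())
    kept = {word: count for word, count in counts.items() if count >= min_count}
    if return_freq:
        return kept          # {word: frequency}
    return sorted(kept)      # Alphabetical list of words only

texts = ["the cat sat", "the dog sat", "the bird flew"]
vocab = vocab_list(texts, min_count=2, return_freq=True)
```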

Corpus Comparisons

similarity = ai.get_corpus_similarity(df1, df2)
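
One common way to compare two corpora is to rank-correlate their word frequencies. The sketch below does this with a Spearman correlation computed over the combined vocabulary. This is an assumption about the general technique, not a description of what `get_corpus_similarity` does internally; all helper names are hypothetical.

```python
from collections import Counter

def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for i in order[start:end + 1]:
            ranks[i] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5

def corpus_similarity(texts_a, texts_b):
    """Spearman correlation of word frequencies across two corpora."""
    freq_a = Counter(w for t in texts_a for w in t.lower().split())
    freq_b = Counter(w for t in texts_b for w in t.lower().split())
    vocab = sorted(set(freq_a) | set(freq_b))
    if not vocab:
        return 0.0
    # Spearman correlation is the Pearson correlation of the ranks
    ranks_a = average_ranks([freq_a[w] for w in vocab])
    ranks_b = average_ranks([freq_b[w] for w in vocab])
    return pearson(ranks_a, ranks_b)

corpus_a = ["the cat sat on the mat", "the dog sat"]
corpus_b = ["markets fell today", "the markets rose"]
same = corpus_similarity(corpus_a, corpus_a)
different = corpus_similarity(corpus_a, corpus_b)
```

A corpus compared with itself scores 1, and corpora with little shared vocabulary score much lower.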
