Skip to main content

Basic computational linguistics and natural language processing in Python

Project description

text_analytics

Basic computational linguistics and natural language processing in Python.

pip install textanalytics

pip install git+https://github.com/jonathandunn/text_analytics.git

This package provides code to support introductory courses in computational linguistics or natural language processing. These courses are available free on edX:

Introduction to Text Analytics and Natural Language Processing with Python

Visualizing Text Analytics and Natural Language Processing with Python

Usage

from text_analytics import TextAnalytics

ai = TextAnalytics()

Getting features

style, vocab_size = ai.get_features(df, features="style")

style = Function word n-grams

sentiment = Positive and negative words

content = Top content words with TD-IDF weighting, PMI for finding phrases, no stop words

constructions = A bag-of-constructions syntactic representation

Using a classifier

ai.shallow_classification(df, label, features="style", cv=False, classifier='svm')

ai.mlp(df, label, features="style", validation_set=False, test_size=0.10)

Unsupervised methods

Topic Models

ai.train_lda(df, n_topics, min_count)
    
topic_df = ai.use_lda(df, labels="Author")

Vector Semantics

ai.train_word2vec(file, min_count, workers)

Document and Word Clusters

cluster_df = ai.cluster(x, y=None, k)

*Nearest document searches

 y_sample, y_closest = ai.linguistic_distance(x, y, sample=1, n=3)

Corpus Descriptions

PMI-based Phrases

ai.fit_phrases(df)

Delta P-based Phrases

association_df = ai.get_association(df, min_count = 1, save_phraser = True)

Basic word frequencies

vocab = ai._get_vocab_list(df, min_count, return_freq = True)

Corpus Comparisons

similarity = ai.get_corpus_similarity(df1, df2)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

textanalytics-1.1-py2.py3-none-any.whl (100.8 kB view details)

Uploaded Python 2 Python 3

File details

Details for the file textanalytics-1.1-py2.py3-none-any.whl.

File metadata

  • Download URL: textanalytics-1.1-py2.py3-none-any.whl
  • Upload date:
  • Size: 100.8 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/3.10.0 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.0 CPython/3.8.5

File hashes

Hashes for textanalytics-1.1-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 72e04587d4599eda7166304872ad8aecadea8f5a1b8a28cf63831654b723de3b
MD5 1682375ffae5aab0bce110f2e748a8df
BLAKE2b-256 e18062596bd3b6b8ed4230510935a34c70b45a9b18b144cc60674dedfb8854fa

See more details on using hashes here.

Provenance

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page