Python support for 'The Art and Science of Data Analytics'

Project description

AdvancedAnalytics

A collection of Python modules, classes, and methods that simplify building machine learning solutions. It was developed to make learning Python easier, and it accompanies the book The Art and Science of Data Analytics.

Description

Machine learning applications progress through three stages:

  1. Data Preprocessing

  2. Modeling or Analytics

  3. Postprocessing

The classes and methods in AdvancedAnalytics primarily support the first and last stages of machine learning applications.

Data scientists commonly report spending about 80% of their total effort on data preprocessing and postprocessing. The first stage prepares the data for analysis by:

  1. identifying and correcting outliers,

  2. imputing missing values, and

  3. encoding data.
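
The three preprocessing steps above can be sketched in plain Python. This is only an illustration of the ideas, not the AdvancedAnalytics API: the helper names (replace_outliers, impute_median, one_hot), the sample data, and the limits of 0 and 120 are all hypothetical.

```python
from statistics import median

def replace_outliers(values, low, high):
    """Replace values outside [low, high] with None so they can be imputed."""
    return [v if v is not None and low <= v <= high else None for v in values]

def impute_median(values):
    """Fill None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def one_hot(labels):
    """Encode categorical labels as 0/1 indicator columns, one per category."""
    categories = sorted(set(labels))
    return {c: [1 if lab == c else 0 for lab in labels] for c in categories}

# Hypothetical data: one missing age and one implausible age (410).
ages = [34, 29, None, 410, 41]
cleaned = impute_median(replace_outliers(ages, low=0, high=120))

# Hypothetical nominal attribute encoded as indicator columns.
encoded = one_hot(["red", "blue", "red"])
```

Treating an outlier as missing and then imputing it is one common choice. Clipping the value to the nearest limit is another, and which is better depends on the data.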

The last stage, solution postprocessing, involves displaying and graphing solution summaries as well as metrics and graphics used to evaluate the quality of the solution.
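
As a rough illustration of the kind of metrics computed during postprocessing, the sketch below derives accuracy, precision, recall, and F1 for a binary classifier from actual and predicted labels. The function name binary_metrics and the sample labels are hypothetical. The package's own metrics classes may report different or additional measures.

```python
def binary_metrics(actual, predicted, positive=1):
    """Return accuracy, precision, recall and F1 for a binary classifier."""
    pairs = list(zip(actual, predicted))
    if not pairs:
        raise ValueError("actual and predicted must not be empty")
    # Count the cells of the confusion matrix.
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical labels for six observations.
actual = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
scores = binary_metrics(actual, predicted)
```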

Usage

AdvancedAnalytics is most commonly used to support solutions built with these machine learning packages:

  • scikit-learn

  • StatsModels

  • NLTK

Current Modules and Classes

ReplaceImputeEncode
Classes for Data Preprocessing
  • DT defines new data types used in the data dictionary

  • ReplaceImputeEncode a class for data preprocessing

Regression
Classes for Linear and Logistic Regression
  • linreg support for linear regression

  • logreg support for logistic regression

  • stepwise a variable selection class

Tree
Classes for Decision Tree Solutions
  • tree_regressor support for regression decision trees

  • tree_classifier support for classification decision trees

Forest
Classes for Random Forests
  • forest_regressor support for regression random forests

  • forest_classifier support for classification random forests

NeuralNetwork
Classes for Neural Networks
  • nn_regressor support for regression neural networks

  • nn_classifier support for classification neural networks

TextAnalytics
Classes for Text Analytics
  • text_analysis support for topic analysis

  • sentiment_analysis support for sentiment analysis

Internet
Classes for Internet Applications
  • scrape support for web scraping

  • metrics a class for solution metrics

Documentation and Examples

The API and documentation for all classes and examples are available at https://github.com/tandonneur/AdvancedAnalytics .

Installation and Dependencies

AdvancedAnalytics is designed to work on any operating system running Python 3. It can be installed using pip or conda.

pip install AdvancedAnalytics
# or
conda install -c conda-forge AdvancedAnalytics

General Dependencies

Most classes import one or more modules from scikit-learn (referenced as sklearn in module imports) and StatsModels. Both are included with current versions of Anaconda, a popular distribution for developing Python solutions.

Decision Tree and Random Forest Dependencies

The Tree and Forest modules plot decision trees and importance metrics using the pydotplus and graphviz packages. If these are not installed and you plan to use the Tree or Forest modules, they can be installed using the following code.

conda install -c conda-forge pydotplus
conda install -c conda-forge graphviz
pip install graphviz

Note that the conda install of graphviz does not complete the installation of the graphviz package. To complete it, run the pip install after the conda graphviz install.
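
One way to confirm that both installs took effect is a quick check from Python. The sketch below uses only the standard library. The helper name missing_modules is hypothetical, and it assumes that the Python packages import as pydotplus and graphviz and that a working Graphviz install puts the dot program on the PATH.

```python
import importlib.util
import shutil

def missing_modules(names):
    """Return the top-level module names in `names` that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Python packages the Tree and Forest modules rely on for plotting.
missing = missing_modules(["pydotplus", "graphviz"])

# The Graphviz command-line program (assumed to be named "dot").
dot_found = shutil.which("dot") is not None

if missing:
    print("Missing Python packages:", ", ".join(missing))
if not dot_found:
    print("Graphviz 'dot' executable not found on PATH")
```

If either message appears, rerunning the install commands above in the active environment is a reasonable first step.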

Text Analytics Dependencies

The TextAnalytics module is based on the NLTK and scikit-learn text analytics packages. Both are installed with the current version of Anaconda.

However, TextAnalytics includes options to produce word clouds, which are graphic displays of the word collections associated with topics or data clusters. The wordcloud package is used to produce these graphs. If you are using the TextAnalytics module, you can install the wordcloud package with the following code.

conda install -c conda-forge wordcloud

In addition, data used by the NLTK package is not automatically installed with this package. These data include the text dictionary and other data tables.

The following nltk.download commands should be run before using TextAnalytics. However, it is only necessary to run these once to download and install the data NLTK uses for text analytics.

#The following NLTK commands should be run once to
#download and install NLTK data.
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('stopwords')
nltk.download('wordnet')
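
If you prefer to script this step, the downloads can be wrapped in a small helper that reports any resource that failed to download. This is a sketch rather than part of AdvancedAnalytics: the helper name download_nltk_data and its download parameter are hypothetical, and the tagger is spelled averaged_perceptron_tagger, which is NLTK's identifier for it. It relies on nltk.download returning a boolean success flag and accepting quiet=True.

```python
NLTK_RESOURCES = ["punkt", "averaged_perceptron_tagger", "stopwords", "wordnet"]

def download_nltk_data(resources=NLTK_RESOURCES, download=None):
    """Download each NLTK resource and return the names that failed.

    `download` is a callable taking a resource name and returning True on
    success. It defaults to nltk.download with quiet=True.
    """
    if download is None:
        # Imported lazily so the helper can be exercised without NLTK installed.
        import nltk

        def download(name):
            return nltk.download(name, quiet=True)

    return [name for name in resources if not download(name)]
```

Calling download_nltk_data() with no arguments downloads the four resources and returns an empty list when all of them succeed. Running it again is harmless.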

Code of Conduct

Everyone interacting in the AdvancedAnalytics project’s codebases, issue trackers, chat rooms, and mailing lists is expected to follow the PyPA Code of Conduct: https://www.pypa.io/en/latest/code-of-conduct/ .
