Python support for 'The Art and Science of Data Analytics'
Project description
AdvancedAnalytics
A collection of Python modules, classes, and methods that simplify building machine learning solutions. AdvancedAnalytics provides easy access to advanced tools in scikit-learn, NLTK, and other machine learning packages. It was developed to accompany the book The Art and Science of Data Analytics and to make learning Python from that book easier.
Description
From a high-level view, building machine learning applications typically proceeds through three stages:
Data Preprocessing
Modeling or Analytics
Postprocessing
The classes and methods in AdvancedAnalytics primarily support the first and last stages of machine learning applications.
Data scientists report that they spend roughly 80% of their total effort on the first and last stages. The first stage, data preprocessing, is concerned with preparing the data for analysis. This includes (a generic sketch follows the list):
identifying and correcting outliers,
imputing missing values, and
encoding data.
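For orientation, here is a minimal preprocessing sketch that performs these three steps with pandas and scikit-learn directly (a generic illustration, not the AdvancedAnalytics API); the example DataFrame, its columns, and the valid salary range are assumptions.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
# Hypothetical example data: one interval attribute, one nominal attribute
df = pd.DataFrame({"Salary": [52000.0, 61000.0, np.nan, 9.9e9],
                   "Department": ["HR", "Sales", np.nan, "Marketing"]})
# 1. Identify and correct outliers: values outside the valid range become missing
df.loc[~df["Salary"].between(20000.0, 2000000.0), "Salary"] = np.nan
# 2. Impute missing values: median for interval data, most frequent for nominal data
df["Salary"] = SimpleImputer(strategy="median").fit_transform(df[["Salary"]]).ravel()
df["Department"] = SimpleImputer(strategy="most_frequent").fit_transform(df[["Department"]]).ravel()
# 3. Encode data: one-hot encode the nominal attribute
encoded_df = pd.get_dummies(df, columns=["Department"])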
The last stage, solution postprocessing, involves developing graphic summaries of the solution and metrics for evaluating its quality (a generic sketch follows).
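Likewise, a minimal postprocessing sketch using scikit-learn metrics and matplotlib directly (again a generic illustration, not the AdvancedAnalytics API); the fitted regressor model and the data X, y are assumed to exist.
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error, r2_score
# Assumed: a fitted regressor `model`, features X, and target y
y_pred = model.predict(X)
# Metrics for evaluating the quality of the solution
print("R-squared:", r2_score(y, y_pred))
print("Mean squared error:", mean_squared_error(y, y_pred))
# Graphic summary: predicted versus actual values
plt.scatter(y, y_pred, s=10)
plt.xlabel("Actual")
plt.ylabel("Predicted")
plt.show()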
Documentation and Examples
The API and documentation for all classes and examples are available at https://github.com/tandonneur/AdvancedAnalytics.
Usage
Currently, the most popular usage is supporting solutions developed with these advanced machine learning packages:
scikit-learn
StatsModels
NLTK
The intention is to expand this list to other packages. The following simple example uses the data map structure to preprocess data and then fits a decision tree regressor:
from AdvancedAnalytics.ReplaceImputeEncode import DT
from AdvancedAnalytics.ReplaceImputeEncode import ReplaceImputeEncode
from AdvancedAnalytics.Tree import tree_regressor
from sklearn.tree import DecisionTreeRegressor
# Data map using DT, the data types
data_map = {
    "Salary":         [DT.Interval, (20000.0, 2000000.0)],
    "Department":     [DT.Nominal,  ("HR", "Sales", "Marketing")],
    "Classification": [DT.Nominal,  (1, 2, 3, 4, 5)],
    "Years":          [DT.Interval, (18, 60)]}
# Preprocess data from data frame df
rie = ReplaceImputeEncode(data_map=data_map, interval_scaling=None,
                          nominal_encoding="SAS", drop=True)
encoded_df = rie.fit_transform(df)
y = encoded_df["Salary"]
X = encoded_df.drop("Salary", axis=1)
# Fit the tree ("gini" applies only to classifiers; the regressor default criterion is used)
dt = DecisionTreeRegressor(max_depth=4, min_samples_split=5,
                           min_samples_leaf=5)
dt = dt.fit(X, y)
# Postprocessing: display feature importance and fit metrics
tree_regressor.display_importance(dt, X.columns)
tree_regressor.display_metrics(dt, X, y)
Current Modules and Classes
- ReplaceImputeEncode, classes for data preprocessing
    - DT defines the data types used in the data dictionary
    - ReplaceImputeEncode, a class for data preprocessing
- Regression, classes for linear and logistic regression (see the sketch after this list)
    - linreg, support for linear regression
    - logreg, support for logistic regression
    - stepwise, a variable selection class
- Tree, classes for decision tree solutions
    - tree_regressor, support for regression decision trees
    - tree_classifier, support for classification decision trees
- Forest, classes for random forests
    - forest_regressor, support for regression random forests
    - forest_classifier, support for classification random forests
- NeuralNetwork, classes for neural networks
    - nn_regressor, support for regression neural networks
    - nn_classifier, support for classification neural networks
- TextAnalytics, classes for text analytics
    - text_analysis, support for topic analysis
    - sentiment_analysis, support for sentiment analysis
- Internet, classes for internet applications
    - scrape, support for web scraping
    - metrics, a class for solution metrics
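A hypothetical usage sketch for the Regression module, assuming its linreg class follows the same display pattern as tree_regressor in the example above; the display_* names and signatures here are assumptions, not documented API, so consult the GitHub documentation before relying on them.
from AdvancedAnalytics.Regression import linreg
from sklearn.linear_model import LinearRegression
# X and y are assumed to be the encoded features and target from the earlier example
lr = LinearRegression().fit(X, y)
linreg.display_coef(lr, X, y, X.columns)   # assumed signature, by analogy with tree_regressor
linreg.display_metrics(lr, X, y)           # assumed signature, by analogy with tree_regressor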
Installation and Dependencies
AdvancedAnalytics is designed to work on any operating system running Python 3. It can be installed using pip or conda.
pip install AdvancedAnalytics
# or
conda install -c conda-forge AdvancedAnalytics
- General Dependencies
Most classes import one or more modules from scikit-learn, referenced as sklearn in module imports, and StatsModels. Both are installed with current versions of Anaconda, a popular Python distribution for data science.
- Decision Tree and Random Forest Dependencies
The Tree and Forest modules plot decision trees and importance metrics using the pydotplus and graphviz packages. If you plan to use the Tree or Forest modules and these packages are not installed, they can be installed with the following commands.
conda install -c conda-forge pydotplus
conda install -c conda-forge graphviz
pip install graphviz
Note that the conda install alone does not complete the installation of the graphviz package; the pip install must be run after the conda graphviz install to finish it.
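For context, here is a minimal sketch of rendering a fitted tree with these packages using scikit-learn's export_graphviz (a generic pattern, not the AdvancedAnalytics API); the fitted tree dt and the features X are assumed from the earlier example.
import pydotplus
from sklearn.tree import export_graphviz
# Export the fitted tree dt to DOT format, then render it to a PNG file
dot_data = export_graphviz(dt, out_file=None,
                           feature_names=X.columns, filled=True)
graph = pydotplus.graph_from_dot_data(dot_data)
graph.write_png("decision_tree.png")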
- Text Analytics Dependencies
The TextAnalytics module is based on the NLTK and scikit-learn text analytics packages. Both are installed with the current version of Anaconda.
However, TextAnalytics also includes options to produce word clouds, which are graphic displays of the word collections associated with topics or data clusters. The wordcloud package is used to produce these graphs. If you are using the TextAnalytics module, install the wordcloud package with the following command.
conda install -c conda-forge wordcloud
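For illustration, a minimal word cloud sketch using the wordcloud package directly (not the AdvancedAnalytics API); the sample text is a placeholder and would normally come from a topic or cluster.
import matplotlib.pyplot as plt
from wordcloud import WordCloud
# Placeholder text: in practice this would be the terms from a topic or cluster
text = "analytics data model topic cluster text words analytics data"
wc = WordCloud(background_color="white", max_words=100).generate(text)
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()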
In addition, the data used by the NLTK package are not automatically installed with it. These data include the text dictionaries and other data tables.
The following nltk.download commands should be run before using TextAnalytics; they only need to be run once to download and install the data NLTK uses for text analytics.
import nltk
# The following NLTK commands should be run once to
# download and install NLTK data.
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")
nltk.download("stopwords")
nltk.download("wordnet")
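As a quick check that the downloads are in place, a minimal tokenization and stop-word sketch using NLTK directly (not the AdvancedAnalytics API); the sample sentence is an assumption.
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
# Tokenize a sample sentence and drop English stop words
tokens = word_tokenize("The Art and Science of Data Analytics simplifies text analytics.")
stops = set(stopwords.words("english"))
terms = [t.lower() for t in tokens if t.isalpha() and t.lower() not in stops]
print(terms)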
- Internet Dependencies
The Internet module contains a class, scrape, which has functions for scraping newsfeeds. Some of these functions are based on the newspaper3k package, which can be installed using:
conda install -c conda-forge newspaper3k
# or
pip install newspaper3k
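For reference, a minimal article-scraping sketch using newspaper3k directly (not the scrape class); the URL is a placeholder.
from newspaper import Article
# Placeholder URL: substitute a real news article address
url = "https://example.com/some-news-article"
article = Article(url)
article.download()
article.parse()
print(article.title)
print(article.text[:500])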
Code of Conduct
Everyone interacting in the AdvancedAnalytics project’s codebases, issue trackers, chat rooms, and mailing lists is expected to follow the PyPA Code of Conduct: https://www.pypa.io/en/latest/code-of-conduct/ .