Toolbox for easy and effective data exploration

EasyExplore

Description:

A toolbox for easy and effective data exploration in Python. It is designed primarily for use in Jupyter notebooks, but it can also be used in any Python module.

Table of Contents:

1. Installation
2. Requirements
3. Introduction
  - Practical Usage
  - Utilities
    - DataImporter
    - DataExporter
  - DataExplorer
  - DataVisualizer
  - TextMiner
4. Examples

1. Installation:

You can install EasyExplore on any operating system via pip: pip install easyexplore

2. Requirements:

dask>=2.23.0
emoji>=0.5.4
geojson>=2.5.0
googletrans>=3.0.0
ipywidgets>=0.5.1
joblib>=0.14.1
networkx>=2.2
nltk>=3.5
numpy>=1.18.1
pandas>=1.1.0
plotly>=4.5.4
pyod>=0.7.7.1
psutil>=5.5.1
scipy>=1.4.1
spacy>=2.3.2
scikit-learn>=0.23.1
sqlalchemy>=1.3.15
statsmodels>=0.9.0
wheel>=0.35.1
xlrd>=1.2.0
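
To see which of these dependencies are already present in an environment, a small standard-library check can help. This is only a sketch: the helper name installed_versions and the shortened package list are illustrative assumptions, and the check reports installed versions without comparing them against the minimum versions above.

```python
from importlib import metadata

# Illustrative subset of the requirements listed above (not the full list).
REQUIRED_PACKAGES = ["dask", "numpy", "pandas", "plotly", "scikit-learn"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED_PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```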

3. Introduction:

Practical Usage:

EasyExplore is designed as a wrapper that helps data scientists explore data more conveniently and efficiently.

Data Importer:

You can easily import data sets from various file formats as well as databases into a pandas or Dask DataFrame.

Data Exporter:

You can easily export data sets from a pandas DataFrame or other data objects into various file formats or databases.
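
The kind of round trip that the importer and exporter wrap can be sketched with plain pandas and the standard library. This does not use the EasyExplore API; the file name, table name and example data are assumptions made for illustration.

```python
import sqlite3
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"id": [1, 2, 3], "value": [0.5, 1.5, 2.5]})

with tempfile.TemporaryDirectory() as tmp:
    # Export to a CSV file and import it again
    csv_path = Path(tmp) / "data.csv"
    df.to_csv(csv_path, index=False)
    from_csv = pd.read_csv(csv_path)

    # Export to a SQLite database and import it again
    conn = sqlite3.connect(Path(tmp) / "data.db")
    try:
        df.to_sql("observations", conn, index=False, if_exists="replace")
        from_db = pd.read_sql_query("SELECT * FROM observations", conn)
    finally:
        conn.close()

print(from_csv)
print(from_db)
```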

Data Explorer:

Explore your data set quickly and efficiently using the DataExplorer:

-- Data Typing:

    Check whether the data types assigned by pandas match the actual data types occurring in the data
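
The idea behind such a check can be sketched with plain pandas: a column that is stored as text but parses almost entirely as numbers is probably mistyped. The function name detect_mistyped_columns and the 95% threshold are assumptions for illustration, not part of EasyExplore.

```python
import pandas as pd


def detect_mistyped_columns(df, threshold=0.95):
    """Return non-numeric columns whose values mostly parse as numbers."""
    mistyped = {}
    for col in df.columns:
        series = df[col].dropna()
        already_typed = (
            pd.api.types.is_numeric_dtype(series)
            or pd.api.types.is_datetime64_any_dtype(series)
        )
        if series.empty or already_typed:
            continue
        # Share of values that can be converted to a number
        numeric_share = pd.to_numeric(series, errors="coerce").notna().mean()
        if numeric_share >= threshold:
            mistyped[col] = "numeric"
    return mistyped


df = pd.DataFrame(
    {"price": ["1.5", "2", "3.25"], "name": ["a", "b", "c"], "qty": [1, 2, 3]}
)
print(detect_mistyped_columns(df))
```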

-- Data Health Check:

    Check the health of the data set by detecting, describing and visualizing ...
        ... the amount of missing or invalid data vs. valid observations
        ... the amount of duplicated data
        ... the amount of invariant data
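
A minimal sketch of these three checks with plain pandas follows. The function name health_check and the layout of the returned report are assumptions; only missing values (NaN/None) are counted here, so detecting other kinds of invalid data would need additional rules.

```python
import pandas as pd


def health_check(df):
    """Summarize missing values, duplicated rows and invariant columns."""
    n_rows = len(df)
    missing = df.isna().sum()
    return {
        # Share of missing values per column
        "missing_share": (missing / n_rows).to_dict() if n_rows else {},
        # Rows that repeat an earlier row exactly
        "duplicate_rows": int(df.duplicated().sum()),
        # Columns that hold a single distinct value (missing counts as a value)
        "invariant_columns": [
            col for col in df.columns if df[col].nunique(dropna=False) <= 1
        ],
    }


df = pd.DataFrame(
    {
        "a": [1, 1, 2, None],
        "b": ["x", "x", "x", "x"],
        "c": [1.0, 1.0, 3.0, 4.0],
    }
)
report = health_check(df)
print(report)
```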

-- Data Distribution:

    Describing and visualizing statistical distribution of ...
        ... categorical features
        ... continuous features
        ... date features

-- Outlier Detection:

    Analyze outliers or anomalies of continuous features using univariate and multivariate methods:
        a) Univariate: Examines outlier values for each feature separately using the interquartile range (IQR)
        b) Multivariate: Examines outliers for each possible feature pair using several different machine learning algorithms. For further information, see the documentation of the PyOD package, which is used under the hood.
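
The univariate rule can be sketched as follows, assuming the common convention of flagging values more than 1.5 times the IQR outside the first and third quartiles. The function name and the factor k are illustrative, and the exact rule used by EasyExplore may differ.

```python
import pandas as pd


def iqr_outliers(series, k=1.5):
    """Return a boolean mask that is True where a value is an IQR outlier."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (series < lower) | (series > upper)


values = pd.Series([10, 12, 11, 13, 12, 11, 100])
mask = iqr_outliers(values)
print(values[mask])
```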

-- Categorical Breakdown Statistics:

    Descriptive statistics of continuous features grouped by values of each categorical feature in the data set.
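
With plain pandas, such a breakdown corresponds to a group-by followed by aggregation. The sketch below assumes hypothetical column names and a small set of statistics; it is not the EasyExplore implementation.

```python
import pandas as pd


def breakdown_stats(df, categorical, continuous):
    """Descriptive statistics of continuous columns per category value."""
    return df.groupby(categorical)[continuous].agg(
        ["count", "mean", "std", "min", "max"]
    )


df = pd.DataFrame(
    {"color": ["red", "red", "blue"], "size": [1.0, 3.0, 5.0]}
)
stats = breakdown_stats(df, "color", ["size"])
print(stats)
```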


-- Correlation:

    Correlation analysis of continuous features. A partial correlation method is implemented for analyzing multicollinearity. The differences between marginal and partial correlation coefficients are also visualized in a heat map.
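
One standard way to obtain partial correlations is from the inverse of the correlation matrix: the partial correlation of features i and j, given all other features, is -P[i, j] / sqrt(P[i, i] * P[j, j]), where P is that inverse. The sketch below uses NumPy and synthetic data; whether EasyExplore computes it this way is an assumption.

```python
import numpy as np


def partial_correlation(data):
    """Partial correlation matrix for data of shape (n_samples, n_features)."""
    corr = np.corrcoef(data, rowvar=False)
    precision = np.linalg.pinv(corr)  # pseudo-inverse guards against singular matrices
    scale = np.sqrt(np.diag(precision))
    partial = -precision / np.outer(scale, scale)
    np.fill_diagonal(partial, 1.0)
    return partial


# Synthetic example: x and y are related only through the common driver z
rng = np.random.default_rng(0)
z = rng.normal(size=2000)
x = z + 0.5 * rng.normal(size=2000)
y = z + 0.5 * rng.normal(size=2000)
data = np.column_stack([x, y, z])

marginal = np.corrcoef(data, rowvar=False)
partial = partial_correlation(data)
difference = marginal - partial  # the matrix one would show in a heat map
print(np.round(difference, 2))
```

In this example the marginal correlation between x and y is high, while their partial correlation is close to zero once z is controlled for.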

-- Geo Statistics:

    Descriptive statistics of continuous features grouped by values of each geo feature in the data set. Additionally, a geo map (OpenStreetMap) is generated to visualize the statistical distribution.

-- Text Analyzer:

    Analyze potential text features and generate various numerical features from them.

Data Visualizer:

Let's make data visualization great again! Visualize your data set easily with Plotly, an interactive visualization library, under the hood. The DataVisualizer is an efficient wrapper that abstracts the most important chart types for data exploration:

-- Table Chart:
    Visualize matrix (Pandas DataFrame) as an interactive table

-- Heat Map:
    Visualize value range of continuous features as heat map

-- Geo Map:
    Visualize statistics of categorical and continuous features as interactive OpenStreetMap

-- Contour Chart:
    Visualize value ranges of at least two continuous features as contours

-- Pie Chart:
    Visualize occurrences of values of categorical features as an interactive pie chart

-- Bar Chart:
    Visualize occurrences of values of categorical features as an interactive bar chart

-- Histogram:
    Visualize distribution of continuous features as an interactive histogram

-- Box-Whisker-Plot:
    Visualize descriptive statistics of continuous features as an interactive box-whisker-plot

-- Violin Chart:
    Visualize descriptive statistics of continuous features as an interactive violin chart

-- Parallel Category Chart:
    Visualize relationships between categorical features interactively. With brushing, it can also be used for mixed relations between values of categorical and continuous features.

-- Parallel Coordinate Chart:
    Visualize relationships between ranges of continuous features interactively. It can also be used for mixed relations between values of categorical features and ranges of continuous features.

-- Scatter Chart:
    Visualize values of continuous features interactively.

-- Scatter3D Chart:
    Visualize values of three continuous features in one chart interactively.

-- Joint Distribution Chart:
    Visualize values of two continuous features interactively, including contours and histogram for each continuous feature.

-- Ridgeline Chart:
    Visualize changes in distribution of continuous features on certain time steps separately.

-- Line Chart:
    Visualize distribution after certain time steps as an interactive line chart.

-- Candlestick Chart:
    Visualize descriptive statistics for each time step as an interactive candlestick chart.

-- Dendrogram:
    Visualize hierarchical clusters.

-- Silhouette Chart:
    Visualize partitioned clusters.

TextMiner:

Explore text data (natural language) by generating various numerical features describing the text.

-- Segmentation:

    Categorize potential text features into the following segments ...
        -> Web features
            1) URL
            2) Email
        -> Enumerated features
        -> Natural language (original text features)
        -> Identifier (original id features)
        -> Unknown
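
How such a segmentation could work is sketched below with simple regular-expression heuristics from the standard library. The patterns, thresholds, rule order and the function name segment_text_feature are all assumptions for illustration and are not taken from the TextMiner.

```python
import re

URL_RE = re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def segment_text_feature(values, sep=","):
    """Assign a list of strings to one coarse segment using heuristics."""
    values = [v.strip() for v in values if v and v.strip()]
    if not values:
        return "unknown"

    def share(predicate):
        return sum(1 for v in values if predicate(v)) / len(values)

    if share(lambda v: URL_RE.match(v) is not None) >= 0.9:
        return "url"
    if share(lambda v: EMAIL_RE.match(v) is not None) >= 0.9:
        return "email"
    # Separator-delimited tokens without spaces look like enumerations
    if share(lambda v: sep in v and " " not in v) >= 0.9:
        return "enumerated"
    # Unique values without whitespace look like identifiers
    if len(set(values)) == len(values) and all(" " not in v for v in values):
        return "identifier"
    # Values with several words look like natural language
    if share(lambda v: len(v.split()) >= 3) >= 0.5:
        return "natural_language"
    return "unknown"


print(segment_text_feature(["https://example.org", "www.example.com/page"]))
print(segment_text_feature(["The cat sat on the mat", "It rained all day long"]))
```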

-- Simple text processing:
    Apply simple processing methods to text features
        -> Merge two text features by given separator
        -> Replace occurrences
        -> Subset data set or feature list by given string

-- Language methods:
    Apply methods to ...
        -> ... detect language in text
        -> ... translate using Google Translate under the hood

-- Generate linguistic features:
    Apply semantic text processing to generate numeric features
        -> Clean text counter (text after removing stop words, punctuation and special characters, and lemmatizing)
        -> Part-of-Speech Tagging counter & labels
        -> Named Entity Recognition counter & labels
        -> Dependencies counter & labels (Tree based / Noun Chunks)
        -> Emoji counter & labels

-- Generate similarity / clustering features:
    Apply similarity methods to generate continuous features using word embeddings
        -> TF-IDF
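
To illustrate how TF-IDF turns text into continuous similarity features, here is a small pure-Python sketch. It assumes whitespace tokenization and a smoothed inverse document frequency, log((1 + N) / (1 + df)) + 1; the TextMiner may tokenize and weight differently, and the function names are illustrative.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} TF-IDF vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Number of documents that contain each term
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {
        term: math.log((1 + n_docs) / (1 + count)) + 1.0
        for term, count in doc_freq.items()
    }
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append(
            {term: (count / len(tokens)) * idf[term] for term, count in term_freq.items()}
        )
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as dictionaries."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = ["the cat sat", "the cat ran", "dogs bark loudly"]
vectors = tfidf_vectors(docs)
print(cosine_similarity(vectors[0], vectors[1]))
print(cosine_similarity(vectors[0], vectors[2]))
```

Documents that share terms receive a similarity above zero, while documents with no terms in common receive exactly zero.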

4. Examples:

Check the Jupyter notebook for examples. Happy exploration :)

Release history

Version	Release date
0.7.4	Apr 16, 2023
0.7.3	Apr 1, 2023
0.7.2	Jun 8, 2022
0.7.1	Jun 4, 2021
0.7.0	May 18, 2021
0.6.9	May 10, 2021
0.6.8	Apr 9, 2021
0.6.7	Apr 6, 2021
0.6.6	Mar 8, 2021
0.6.5	Mar 8, 2021
0.6.4	Mar 5, 2021
0.6.3	Mar 5, 2021
0.6.2	Mar 5, 2021
0.6.1	Mar 2, 2021
0.6.0	Mar 2, 2021
0.5.9	Mar 2, 2021
0.5.8	Jan 31, 2021
0.5.7	Jan 29, 2021
0.5.6	Jan 2, 2021
0.5.5	Dec 18, 2020
0.5.4	Dec 8, 2020
0.5.3	Oct 2, 2020
0.5.2	Oct 1, 2020
0.5.1	Sep 29, 2020
0.5.0	Sep 25, 2020
0.4.9	Sep 22, 2020
0.4.8	Sep 14, 2020
0.4.7	Sep 14, 2020
0.4.6	Sep 14, 2020
0.4.5	Sep 13, 2020
0.4.4	Sep 9, 2020
0.4.3	Sep 8, 2020
0.4.2	Sep 7, 2020
0.4.1	Sep 5, 2020 (this version)
0.4.0	Sep 4, 2020
0.3.9	Sep 2, 2020
0.3.8	Sep 2, 2020
0.3.7	Sep 1, 2020
0.3.6	Sep 1, 2020
0.3.5	Sep 1, 2020
0.3.4	Aug 31, 2020
0.3.3	Aug 31, 2020
0.3.2	Aug 31, 2020
0.3.1	Aug 31, 2020
0.3.0	Aug 11, 2020
0.2.9	Aug 11, 2020
0.2.8	Aug 11, 2020
0.2.7	Jul 13, 2020
0.2.6	Jul 6, 2020
0.2.5	Jul 6, 2020
0.2.4	Jul 6, 2020
0.2.3	Jul 6, 2020
0.2.2	Jul 4, 2020
0.2.1	Jun 29, 2020
0.2.0	Jun 7, 2020
0.1.9	Jun 6, 2020
0.1.8	Jun 6, 2020
0.1.7	Jun 1, 2020
0.1.6	May 24, 2020
0.1.5	May 20, 2020
0.1.4	May 19, 2020
0.1.3	May 18, 2020
0.1.2	May 12, 2020
0.1.1	May 11, 2020
0.1.0	May 11, 2020

Download files

Source Distribution

easyexplore-0.4.1.tar.gz (96.2 kB), uploaded Sep 5, 2020, Source

Built Distributions

easyexplore-0.4.1-py3.7.egg (206.5 kB), uploaded Sep 5, 2020, Egg
easyexplore-0.4.1-py3-none-any.whl (105.6 kB), uploaded Sep 5, 2020, Python 3

File details

Details for the file easyexplore-0.4.1.tar.gz.

File metadata

Download URL: easyexplore-0.4.1.tar.gz
Upload date: Sep 5, 2020
Size: 96.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.0.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.3

File hashes

Hashes for easyexplore-0.4.1.tar.gz
Algorithm	Hash digest
SHA256	`1d854f25b94055448d11f5c35e5da4d0a4e40b935a3b9468a9fc330cb9fdf071`
MD5	`a605eb3c2ad23f67c5ac138608f6d753`
BLAKE2b-256	`3383722705d806adbd8fd7b0d8864f553ae900b803a0a40b4a8c2fa2badc8f62`

File details

Details for the file easyexplore-0.4.1-py3.7.egg.

File metadata

Download URL: easyexplore-0.4.1-py3.7.egg
Upload date: Sep 5, 2020
Size: 206.5 kB
Tags: Egg
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.0.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.3

File hashes

Hashes for easyexplore-0.4.1-py3.7.egg
Algorithm	Hash digest
SHA256	`e91098f3e0f3ac48b2b5234a1f4df0e70125550bb2bf818a1dadc212d715703b`
MD5	`a61d3be9b8d203ad35bf39a5ea2932b0`
BLAKE2b-256	`8f87c04390c7efb7f580b518cfc148cf125ab3b6a9544ca405f405f080b3d224`

File details

Details for the file easyexplore-0.4.1-py3-none-any.whl.

File metadata

Download URL: easyexplore-0.4.1-py3-none-any.whl
Upload date: Sep 5, 2020
Size: 105.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.0.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.3

File hashes

Hashes for easyexplore-0.4.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`916d0e57901ce41ea713b2c14e9ed3d8f659fca62c6da0f2ce1b83d764b7ed02`
MD5	`6d5ebbeaa1d508e8546f1e484c9296c6`
BLAKE2b-256	`3237b2196acf94e1835940041a871724032741a52ddd7d91ddc5622acfb504b0`
