A simple program to automate exploratory data analysis and reporting.

Development Status
- 4 - Beta
Intended Audience
- Science/Research
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3.8

Project description

`eda-report` - Automated Exploratory Data Analysis

A Python program to help automate the exploratory data analysis and reporting process.

Input data is processed and analysed using pandas' built-in methods, and graphs are plotted using matplotlib and seaborn. The results are then packaged as a Word (.docx) document using python-docx.

Installation

You can install the package from PyPI using:

pip install eda-report

Basic Usage

1. Graphical User Interface

Running the eda-report command with no arguments launches a graphical window for selecting and analysing a CSV or Excel file:

eda-report

You will be prompted to set a report title, an optional target variable, a graph color and an output filename. The contents of the input file are then analysed, and the results are saved as a Word (.docx) document.

2. Command Line Interface

To analyse a file named input.csv, just supply its path to the eda-report command:

eda-report -i input.csv

You can also set the output filename, graph color and report title:

eda-report -i input.csv -o output.docx -c cyan --title 'EDA Report'

For more details on the optional arguments, pass the -h or --help flag to view the help message:

eda-report -h

usage: eda-report [-h] [-i INFILE] [-o OUTFILE] [-t TITLE] [-c COLOR]
                  [-T TARGET]

Automatically analyse data and generate reports. A graphical user interface
will be launched if none of the optional arguments is specified.

optional arguments:
  -h, --help            show this help message and exit
  -i INFILE, --infile INFILE
                        A .csv or .xlsx file to analyse.
  -o OUTFILE, --outfile OUTFILE
                        The output name for analysis results (default: eda-
                        report.docx)
  -t TITLE, --title TITLE
                        The top level heading for the report (default:
                        Exploratory Data Analysis Report)
  -c COLOR, --color COLOR
                        The color to apply to graphs (default: cyan)
  -T TARGET, --target TARGET
                        The target variable (dependent feature), used to
                        color-code plotted values. An integer value is treated
                        as a column index, whereas a string is treated as a
                        column label.
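
The options in the help message above map naturally onto an argparse parser. The following is a minimal sketch, not the package's actual source: the names `build_parser` and `parse_target` are hypothetical, and only the flags, defaults and the integer-versus-string rule for the target are taken from the help text.

```python
import argparse


def parse_target(value):
    """Treat an integer-like value as a column index, anything else as a label."""
    try:
        return int(value)
    except ValueError:
        return value


def build_parser():
    """Hypothetical parser mirroring the flags listed in the help message."""
    parser = argparse.ArgumentParser(
        prog="eda-report",
        description="Automatically analyse data and generate reports.",
    )
    parser.add_argument("-i", "--infile", help="A .csv or .xlsx file to analyse.")
    parser.add_argument("-o", "--outfile", default="eda-report.docx")
    parser.add_argument("-t", "--title", default="Exploratory Data Analysis Report")
    parser.add_argument("-c", "--color", default="cyan")
    parser.add_argument("-T", "--target", type=parse_target)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is testable.
args = build_parser().parse_args(["-i", "input.csv", "-T", "4"])
print(args.infile, args.outfile, args.color, args.target)
```

Passing `-T 4` yields the integer column index 4, while `-T species` would be kept as the column label "species".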

3. Interactive Mode

3.1 Analyse univariate data

>>> from eda_report.univariate import Variable
>>> Variable(range(20), name="0 to 19")
        Overview
        ========
Name: 0 to 19
Type: numeric
Unique Values: 20 -> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, [...]
Missing Values: None
          ***
      Summary Statistics
                         0 to 19
Number of observations  20.00000
Average                  9.50000
Standard Deviation       5.91608
Minimum                  0.00000
Lower Quartile           4.75000
Median                   9.50000
Upper Quartile          14.25000
Maximum                 19.00000
Skewness                 0.00000
Kurtosis                -1.20000
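
Most of the summary above can be cross-checked with Python's standard library. This is only an illustrative sketch: the package itself relies on pandas, and the choice of the `statistics` module and of the "inclusive" quantile method is an assumption made here because it reproduces the quartiles shown for this input. Skewness and kurtosis are left out.

```python
import statistics

values = list(range(20))  # the same data passed to Variable above

# The 'inclusive' method interpolates linearly between data points.
lower_q, median, upper_q = statistics.quantiles(values, n=4, method="inclusive")

summary = {
    "Number of observations": len(values),
    "Average": statistics.mean(values),
    "Standard Deviation": statistics.stdev(values),  # sample (n - 1) deviation
    "Minimum": min(values),
    "Lower Quartile": lower_q,
    "Median": median,
    "Upper Quartile": upper_q,
    "Maximum": max(values),
}

for label, value in summary.items():
    print(f"{label:<24}{value:>10.5f}")
```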

3.2 Analyse multivariate data

>>> from eda_report.multivariate import MultiVariable
>>> from seaborn import load_dataset
>>> data = load_dataset("iris")
>>> MultiVariable(data)
                        OVERVIEW
                        ========
Numeric features: sepal_length, sepal_width, petal_length, petal_width
Categorical features: species
                          ***
          Summary Statistics (Numeric features)
          -------------------------------------
              count    mean     std  min  25%   50%  75%  max  skewness  kurtosis
sepal_length  150.0  5.8433  0.8281  4.3  5.1  5.80  6.4  7.9    0.3149   -0.5521
sepal_width   150.0  3.0573  0.4359  2.0  2.8  3.00  3.3  4.4    0.3190    0.2282
petal_length  150.0  3.7580  1.7653  1.0  1.6  4.35  5.1  6.9   -0.2749   -1.4021
petal_width   150.0  1.1993  0.7622  0.1  0.3  1.30  1.8  2.5   -0.1030   -1.3406
                          ***
          Summary Statistics (Categorical features)
          -----------------------------------------
        count unique     top freq relative freq
species   150      3  setosa   50        33.33%
                          ***
          Bivariate Analysis (Correlation)
          --------------------------------
petal_length & petal_width --> very strong positive correlation (0.96)
sepal_length & petal_length --> strong positive correlation (0.87)
sepal_length & petal_width --> strong positive correlation (0.82)
sepal_length & sepal_width --> very weak negative correlation (-0.12)
sepal_width & petal_length --> weak negative correlation (-0.43)
sepal_width & petal_width --> weak negative correlation (-0.37)
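
The correlation lines above pair each coefficient with a verbal description. A rough sketch of such a mapping follows. The function name `describe_correlation` and the cut-offs (0.3, 0.5, 0.7, 0.9) are hypothetical, chosen only because they agree with the six lines of sample output; the package's real thresholds may differ.

```python
def describe_correlation(r):
    """Return a phrase such as 'strong positive correlation (0.87)'."""
    if r == 0:
        return "no correlation (0.00)"
    size = abs(r)
    # Hypothetical cut-offs, chosen only to agree with the sample output.
    if size >= 0.9:
        strength = "very strong"
    elif size >= 0.7:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    elif size >= 0.3:
        strength = "weak"
    else:
        strength = "very weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation ({r:.2f})"


for pair, r in [("petal_length & petal_width", 0.96),
                ("sepal_length & sepal_width", -0.12)]:
    print(f"{pair} --> {describe_correlation(r)}")
```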

3.3 Generate a report

>>> from eda_report import get_word_report
>>> from seaborn import load_dataset

>>> data = load_dataset("iris")
>>> get_word_report(data)
Bivariate analysis: 100%|███████████████████████████████████| 6/6 numeric pairs.
Univariate analysis: 100%|███████████████████████████████████| 5/5 features.
[INFO 17:31:37.880] Done. Results saved as 'eda-report.docx'
<eda_report.document.ReportDocument object at 0x7f3040c9bcd0>

Visit the official documentation for more details.

Release history

- 2.8.1: Aug 19, 2023
- 2.8.0: Apr 26, 2023
- 2.7.3: Dec 5, 2022
- 2.7.2: Oct 31, 2022
- 2.7.1: Oct 24, 2022
- 2.7.0: Oct 6, 2022
- 2.6.0: Jul 12, 2022
- 2.5.1: May 29, 2022
- 2.5.0: Apr 22, 2022
- 2.4.1: Mar 16, 2022
- 2.4.0: Mar 8, 2022
- 2.3.1: Jan 25, 2022
- 2.3.0: Jan 18, 2022
- 2.2.4: Dec 14, 2021
- 2.2.3: Nov 7, 2021
- 2.2.2: Oct 5, 2021
- 2.2.1: Sep 20, 2021
- 2.2.0 (this version): Sep 10, 2021
- 2.1.0: Aug 20, 2021
- 2.0.0: Jul 28, 2021
- 2.0.0rc0 (pre-release): Jul 27, 2021
- 1.6.2: Jun 29, 2021
- 1.6.1: Jun 22, 2021
- 1.6.0: Jun 14, 2021
- 1.5.0: Jun 8, 2021
- 1.4.0: Jun 4, 2021
- 1.4.0rc0 (pre-release): Jun 4, 2021
- 1.4.0b0 (pre-release): Jun 3, 2021
- 1.3.2: May 16, 2021
- 1.3.2rc0 (pre-release): May 16, 2021
- 1.3.1: Apr 26, 2021
- 1.3.1rc0 (pre-release): Apr 25, 2021
- 1.3.0: Apr 24, 2021
- 1.3.0rc0 (pre-release): Apr 24, 2021
- 1.3.0b0 (pre-release): Apr 24, 2021
- 1.3.0a0 (pre-release): Apr 24, 2021
- 1.2.0: Apr 2, 2021
- 1.2.0rc1 (pre-release): Apr 2, 2021
- 1.2.0b1 (pre-release): Apr 2, 2021
- 1.2.0b0 (pre-release): Apr 2, 2021
- 1.1.3: Mar 28, 2021
- 1.1.3a1 (pre-release): Mar 28, 2021
- 1.1.2: Mar 25, 2021
- 1.1.2rc1 (pre-release): Mar 25, 2021
- 1.1.2rc0 (pre-release): Mar 25, 2021
- 1.1.1: Mar 22, 2021
- 1.1.0: Mar 12, 2021
- 1.0.0: Mar 11, 2021
- 0.0.6: Mar 9, 2021
- 0.0.6b0 (pre-release): Mar 9, 2021
- 0.0.6a0 (pre-release): Mar 9, 2021
- 0.0.5: Mar 7, 2021
- 0.0.5a0 (pre-release): Mar 7, 2021
- 0.0.4: Mar 3, 2021
- 0.0.3: Feb 28, 2021
- 0.0.2: Feb 24, 2021
- 0.0.1: Feb 24, 2021

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

eda_report-2.2.0.tar.gz (42.5 kB)

Uploaded Sep 10, 2021 Source

Built Distribution

eda_report-2.2.0-py3-none-any.whl (42.6 kB)

Uploaded Sep 10, 2021 Python 3

Hashes for eda_report-2.2.0.tar.gz

Algorithm	Hash digest
SHA256	`4ff8c93541122fc5d80daa0c00135ed85a982b80749a31a70858918fad3e27b5`
MD5	`b67d7d8345b71a97a9032cc1b6c9bd38`
BLAKE2b-256	`d396b6b3828a6f84ae783053cb267d7cd5c53d399c6de52e889fd80152879e9a`

Hashes for eda_report-2.2.0-py3-none-any.whl

Algorithm	Hash digest
SHA256	`94bb363ab39fd22f3904385a3e525a76562ca9a78bd52e0b664401c9ffbdaf43`
MD5	`c12be0b24a883147a18c2720f7ba8111`
BLAKE2b-256	`70b5d93428066b96dcb1c4fda6eb88ec14dc5ddf6b92e40aa1f7fb0cce546428`
