A simple program to get a basic EDA report in .docx format.

Development Status: 4 - Beta
Intended Audience: Science/Research
License: OSI Approved :: MIT License
Operating System: OS Independent
Programming Language: Python :: 3.8

Automated Exploratory Data Analysis

A Python program to help automate the EDA process.

Data is analysed using pandas' built-in methods, and graphs are plotted using matplotlib and seaborn. The results are then packaged as a .docx file using python-docx.

Installation

You can install the package from PyPI using:

pip install eda-report

Basic Usage

1. Graphical User Interface

The eda_report command launches a graphical window that helps you select and analyse a CSV or Excel file:

eda_report

[Screencast of the GUI]

You will be prompted to set a report title, graph color and output filename, after which the contents of the input file will be analysed, and the results will be saved in .docx format.

2. Interactive Mode

You can obtain a summary for a single feature (univariate) using the Variable class:

>>> from eda_report.univariate import Variable
>>> x = Variable(data=range(50), name='0 to 49')
>>> x
            Overview
            ========
Name: 0 to 49,
Type: numeric,
Unique Values: 50 -> {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, [...],
Missing Values: None

        Summary Statistics
        ==================
                         0 to 49
Number of observations  50.00000
Average                 24.50000
Standard Deviation      14.57738
Minimum                  0.00000
Lower Quartile          12.25000
Median                  24.50000
Upper Quartile          36.75000
Maximum                 49.00000
Skewness                 0.00000
Kurtosis                -1.20000

>>> x.show_graphs()
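
As a cross-check on the Summary Statistics table, the same figures can be recomputed with Python's standard library. This is a minimal sketch, not eda_report's own code: the helper name `summarise` is hypothetical, and it assumes pandas-style conventions (a sample standard deviation with an n - 1 denominator, and quartiles interpolated linearly between data points, which `statistics.quantiles(..., method="inclusive")` reproduces). Skewness and kurtosis are left out for brevity.

```python
import statistics
from typing import Dict, Iterable


def summarise(values: Iterable[float]) -> Dict[str, float]:
    """Hypothetical helper mirroring the 'Summary Statistics' table."""
    data = sorted(values)
    # 'inclusive' quantiles interpolate linearly between order statistics,
    # the same convention pandas uses for its 25%/50%/75% percentiles.
    lower, median, upper = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "Number of observations": len(data),
        "Average": statistics.mean(data),
        # Sample standard deviation (divides by n - 1), as pandas does.
        "Standard Deviation": statistics.stdev(data),
        "Minimum": data[0],
        "Lower Quartile": lower,
        "Median": median,
        "Upper Quartile": upper,
        "Maximum": data[-1],
    }


# Print the summary in a layout similar to the table above.
for name, value in summarise(range(50)).items():
    print(f"{name:<24}{value:>10.5f}")
```

For range(50) this gives an average and median of 24.5, a standard deviation of about 14.57738, and quartiles of 12.25 and 36.75, in agreement with the table above.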

You can obtain statistics for a set of features (multivariate) using the MultiVariable class:

>>> from eda_report.multivariate import MultiVariable
>>> # Get a dataset
>>> import seaborn as sns
>>> data = sns.load_dataset('iris')
>>> X = MultiVariable(data)
Bivariate analysis: 100%|████████████████████████████████████████████| 6/6 [00:01<00:00,  3.85it/s]
>>> X
        Overview
        ========
Numeric features: sepal_length, sepal_width, petal_length, petal_width
Categorical features: species

        Summary Statistics (Numeric features)
        =====================================
       sepal_length  sepal_width  petal_length  petal_width
count    150.000000   150.000000    150.000000   150.000000
mean       5.843333     3.057333      3.758000     1.199333
std        0.828066     0.435866      1.765298     0.762238
min        4.300000     2.000000      1.000000     0.100000
25%        5.100000     2.800000      1.600000     0.300000
50%        5.800000     3.000000      4.350000     1.300000
75%        6.400000     3.300000      5.100000     1.800000
max        7.900000     4.400000      6.900000     2.500000

        Summary Statistics (Categorical features)
        =========================================
       species
count      150
unique       3
top     setosa
freq        50

        Bivariate Analysis (Correlation)
        ================================
sepal_length & petal_width --> strong positive correlation (0.82)
sepal_width & petal_width --> weak negative correlation (-0.37)
sepal_length & sepal_width --> very weak negative correlation (-0.12)
sepal_length & petal_length --> strong positive correlation (0.87)
sepal_width & petal_length --> weak negative correlation (-0.43)
petal_length & petal_width --> very strong positive correlation (0.96)

>>> X.show_correlation_heatmap()
>>> # Generate a report document
>>> from eda_report import get_word_report
>>> get_word_report(data)
[INFO 10:56:50.241] Assessing correlation in numeric variables...
Bivariate analysis: 100%|████████████████████████████████████████████| 6/6 [00:01<00:00,  3.89it/s]
[INFO 10:56:53.851] Done. Summarising each variable...
Univariate analysis: 100%|███████████████████████████████████████████| 5/5 [00:01<00:00,  2.52it/s]
[INFO 10:56:56.007] Done. Results saved as 'eda-report.docx'
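
The Bivariate Analysis section labels each pair of numeric features by the strength and sign of its correlation coefficient. The sketch below shows one way such labels could be produced with the standard library: it computes Pearson's r for every pair of columns (four numeric features give the 6 pairs counted by the progress bar) and maps |r| to a word. The cut-offs (0.9, 0.7, 0.5, 0.3) and all function names are assumptions chosen to agree with the six iris labels above; eda_report's actual thresholds and correlation method may differ.

```python
import itertools
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    # Raises ZeroDivisionError if either column is constant.
    return sxy / math.sqrt(sxx * syy)


def describe_correlation(r: float) -> str:
    """Map a coefficient to a label.

    The cut-offs are assumptions consistent with the iris example;
    eda_report's own thresholds may differ.
    """
    strength = abs(r)
    if strength >= 0.9:
        level = "very strong"
    elif strength >= 0.7:
        level = "strong"
    elif strength >= 0.5:
        level = "moderate"
    elif strength >= 0.3:
        level = "weak"
    else:
        level = "very weak"
    direction = "positive" if r >= 0 else "negative"
    return f"{level} {direction} correlation"


def bivariate_summary(columns: Dict[str, Sequence[float]]) -> List[str]:
    """One line per pair of numeric columns, in the style shown above."""
    lines = []
    for a, b in itertools.combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        lines.append(f"{a} & {b} --> {describe_correlation(r)} ({r:.2f})")
    return lines


# Toy data: y rises with x, z falls as x rises.
demo = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [4, 3, 2, 1]}
for line in bivariate_summary(demo):
    print(line)
```

A constant column has zero variance, so `pearson` raises ZeroDivisionError for it; a fuller implementation would skip or flag such pairs.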

3. Command Line Interface

To analyse a file named input.csv, just supply its path to the eda_cli command:

eda_cli input.csv

Or, to also set the output file, graph color and report title:

eda_cli input.csv -o output.docx -c cyan --title 'EDA Report'

For more details on the optional arguments, pass the -h or --help flag to view the help message:

eda_cli -h

usage: eda_cli [-h] [-o OUTFILE] [-t TITLE] [-c COLOR] infile

Get a basic EDA report in docx format.

positional arguments:
  infile                A .csv or .xlsx file to process.

optional arguments:
  -h, --help            show this help message and exit
  -o OUTFILE, --outfile OUTFILE
                        The output file (default: eda-report.docx)
  -t TITLE, --title TITLE
                        The top level heading in the report (default: Exploratory Data Analysis Report)
  -c COLOR, --color COLOR
                        A valid matplotlib color specifier (default: orangered)
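
The help message above maps directly onto Python's argparse module. The sketch below is a hypothetical reconstruction of a parser with the same options and defaults, useful for seeing how the optional arguments combine; it is not taken from the eda_report source, and the function names `build_parser` and `parse` are illustrative. On Python 3.10 and later, argparse titles the second help section "options" rather than "optional arguments".

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented eda_cli options."""
    parser = argparse.ArgumentParser(
        prog="eda_cli",
        description="Get a basic EDA report in docx format.",
    )
    parser.add_argument("infile", help="A .csv or .xlsx file to process.")
    parser.add_argument(
        "-o", "--outfile", default="eda-report.docx",
        help="The output file (default: eda-report.docx)",
    )
    parser.add_argument(
        "-t", "--title", default="Exploratory Data Analysis Report",
        help="The top level heading in the report "
        "(default: Exploratory Data Analysis Report)",
    )
    parser.add_argument(
        "-c", "--color", default="orangered",
        help="A valid matplotlib color specifier (default: orangered)",
    )
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv[1:] when None) with the sketch parser."""
    return build_parser().parse_args(argv)


# Same arguments as the longer eda_cli example.
args = parse(["input.csv", "-o", "output.docx", "-c", "cyan", "--title", "EDA Report"])
print(args)
```

Parsing the longer example command yields infile='input.csv', outfile='output.docx', color='cyan' and title='EDA Report'; any option left out falls back to the default listed in the help message.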

Visit the official documentation for more details.

Release history

- 2.8.1 (Aug 19, 2023)
- 2.8.0 (Apr 26, 2023)
- 2.7.3 (Dec 5, 2022)
- 2.7.2 (Oct 31, 2022)
- 2.7.1 (Oct 24, 2022)
- 2.7.0 (Oct 6, 2022)
- 2.6.0 (Jul 12, 2022)
- 2.5.1 (May 29, 2022)
- 2.5.0 (Apr 22, 2022)
- 2.4.1 (Mar 16, 2022)
- 2.4.0 (Mar 8, 2022)
- 2.3.1 (Jan 25, 2022)
- 2.3.0 (Jan 18, 2022)
- 2.2.4 (Dec 14, 2021)
- 2.2.3 (Nov 7, 2021)
- 2.2.2 (Oct 5, 2021)
- 2.2.1 (Sep 20, 2021)
- 2.2.0 (Sep 10, 2021)
- 2.1.0 (Aug 20, 2021)
- 2.0.0 (Jul 28, 2021)
- 2.0.0rc0, pre-release (Jul 27, 2021)
- 1.6.2 (Jun 29, 2021)
- 1.6.1 (Jun 22, 2021)
- 1.6.0 (Jun 14, 2021)
- 1.5.0 (Jun 8, 2021)
- 1.4.0 (Jun 4, 2021)
- 1.4.0rc0, pre-release (Jun 4, 2021)
- 1.4.0b0, pre-release (Jun 3, 2021)
- 1.3.2 (May 16, 2021), this version
- 1.3.2rc0, pre-release (May 16, 2021)
- 1.3.1 (Apr 26, 2021)
- 1.3.1rc0, pre-release (Apr 25, 2021)
- 1.3.0 (Apr 24, 2021)
- 1.3.0rc0, pre-release (Apr 24, 2021)
- 1.3.0b0, pre-release (Apr 24, 2021)
- 1.3.0a0, pre-release (Apr 24, 2021)
- 1.2.0 (Apr 2, 2021)
- 1.2.0rc1, pre-release (Apr 2, 2021)
- 1.2.0b1, pre-release (Apr 2, 2021)
- 1.2.0b0, pre-release (Apr 2, 2021)
- 1.1.3 (Mar 28, 2021)
- 1.1.3a1, pre-release (Mar 28, 2021)
- 1.1.2 (Mar 25, 2021)
- 1.1.2rc1, pre-release (Mar 25, 2021)
- 1.1.2rc0, pre-release (Mar 25, 2021)
- 1.1.1 (Mar 22, 2021)
- 1.1.0 (Mar 12, 2021)
- 1.0.0 (Mar 11, 2021)
- 0.0.6 (Mar 9, 2021)
- 0.0.6b0, pre-release (Mar 9, 2021)
- 0.0.6a0, pre-release (Mar 9, 2021)
- 0.0.5 (Mar 7, 2021)
- 0.0.5a0, pre-release (Mar 7, 2021)
- 0.0.4 (Mar 3, 2021)
- 0.0.3 (Feb 28, 2021)
- 0.0.2 (Feb 24, 2021)
- 0.0.1 (Feb 24, 2021)

Download files

Source Distribution

eda_report-1.3.2.tar.gz (115.5 kB), uploaded May 16, 2021

Built Distribution

eda_report-1.3.2-py3-none-any.whl (114.8 kB, Python 3), uploaded May 16, 2021

Hashes for eda_report-1.3.2.tar.gz

Algorithm     Hash digest
SHA256        `e12063e93ea4ecb8049c866b2f6076f474b5b344474fc0e051764ecde9fe2932`
MD5           `99ac30e443bcd27a5683b2cd3e8764fc`
BLAKE2b-256   `c796bd1b4a16cd970c536f4775272252a6ff1d67a7937ec2409e01624a3d2d0b`

Hashes for eda_report-1.3.2-py3-none-any.whl

Algorithm     Hash digest
SHA256        `455f961b9f957f14045db01f6ea1486491259d88e27fcdbac1696e8432a44d1e`
MD5           `40abb43bfa84cce2120a23fc86ca7e79`
BLAKE2b-256   `ab19412371af4b053053c2ce01169879822c76497e513d21bb5474456baadb7c`