A simple program to automate exploratory data analysis and reporting.
eda-report - Automated Exploratory Data Analysis
A Python program to help automate the exploratory data analysis and reporting process.
Input data is analyzed using pandas and SciPy. Graphs are plotted using matplotlib. The results are then nicely packaged as a Word (.docx) document using python-docx.
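As a rough illustration of the kind of analysis described above, the following sketch builds a table of summary statistics with pandas alone. The function name summarize_numeric and the sample data are hypothetical and are not part of eda-report; the package's own implementation may differ.

```python
import pandas as pd


def summarize_numeric(data: pd.DataFrame) -> pd.DataFrame:
    """Return descriptive statistics for the numeric columns of `data`.

    Hypothetical helper for illustration only; not part of eda-report.
    """
    numeric = data.select_dtypes(include="number")
    # describe() gives count, mean, std, min, quartiles and max per column;
    # transposing puts one variable on each row.
    summary = numeric.describe().T
    # pandas reports bias-adjusted sample skewness and excess kurtosis.
    summary["skewness"] = numeric.skew()
    summary["kurtosis"] = numeric.kurt()
    return summary


# Made-up sample data, used only to exercise the helper.
sample = pd.DataFrame(
    {
        "x": [1.0, 2.0, 3.0, 4.0, 5.0],
        "y": [2.0, 4.0, 4.0, 8.0, 16.0],
        "label": ["a", "a", "b", "b", "c"],
    }
)
numeric_summary = summarize_numeric(sample)
print(numeric_summary.round(4))
```

The non-numeric "label" column is dropped by select_dtypes, so only "x" and "y" appear in the result.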
Installation
You can install the package from PyPI using:
pip install eda-report
Basic Usage
1. Graphical User Interface
The eda-report command launches a graphical window to help select and analyze a CSV/Excel file:
eda-report
You will be prompted to set a report title, group-by variable (optional), graph color and output filename, after which the contents of the input file will be analyzed, and the results will be saved in a Word (.docx) document.
NOTE: For help with Tk-related issues, consider visiting TkDocs.
2. Command Line Interface
To analyze a file named input.csv, just supply its path to the eda-report command:
eda-report -i input.csv
Or even:
eda-report -i input.csv -o output.docx -c cyan --title 'EDA Report'
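If several files need reports, the same command can be assembled and run from a script. The sketch below is one possible approach, using only the flags shown in the command above (-i, -o, -c and --title); the helper names build_command and run_reports are hypothetical and are not part of eda-report.

```python
import subprocess
from pathlib import Path


def build_command(infile, outfile=None, title=None, color=None):
    """Build an argument list for the eda-report command.

    Hypothetical helper; it only assembles flags that appear in the
    command-line examples.
    """
    command = ["eda-report", "-i", str(infile)]
    if outfile is not None:
        command += ["-o", str(outfile)]
    if color is not None:
        command += ["-c", color]
    if title is not None:
        command += ["--title", title]
    return command


def run_reports(paths, color="cyan"):
    """Run eda-report once per input file (requires eda-report on PATH)."""
    for path in map(Path, paths):
        command = build_command(
            path,
            outfile=path.with_suffix(".docx"),
            title=f"EDA Report: {path.stem}",
            color=color,
        )
        # check=True raises CalledProcessError if the command fails.
        subprocess.run(command, check=True)


# Example: inspect the command without running it.
print(build_command("input.csv", outfile="output.docx", title="EDA Report", color="cyan"))
```

Passing the arguments as a list avoids shell quoting problems with titles that contain spaces.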
For more details on the optional arguments, pass the -h or --help flag to view the help message:
eda-report -h
usage: eda-report [-h] [-i INFILE] [-o OUTFILE] [-t TITLE] [-c COLOR]
                  [-g GROUPBY]

Automatically analyze data and generate reports. A graphical user interface
will be launched if none of the optional arguments is specified.

optional arguments:
  -h, --help            show this help message and exit
  -i INFILE, --infile INFILE
                        A .csv or .xlsx file to analyze.
  -o OUTFILE, --outfile OUTFILE
                        The output name for analysis results (default: eda-
                        report.docx)
  -t TITLE, --title TITLE
                        The top level heading for the report (default:
                        Exploratory Data Analysis Report)
  -c COLOR, --color COLOR
                        The color to apply to graphs (default: cyan)
  -g GROUPBY, -T GROUPBY, --groupby GROUPBY, --target GROUPBY
                        The variable to use for grouping plotted values. An
                        integer value is treated as a column index, whereas a
                        string is treated as a column label.
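The group-by rule in the help text (an integer is a column index, a string is a column label) can be sketched as follows. This is a minimal illustration, not the package's actual code: the function name resolve_groupby is hypothetical, and the handling of edge cases (for example, a column whose label is itself a digit string) is an assumption.

```python
def resolve_groupby(columns, groupby):
    """Map a group-by argument to a column label.

    Hypothetical helper: values that parse as integers are treated as
    positional indexes, anything else as a column label.
    """
    columns = list(columns)
    try:
        index = int(groupby)
    except (TypeError, ValueError):
        # Not an integer: treat the value as a label and validate it.
        if groupby not in columns:
            raise KeyError(f"{groupby!r} is not a column label")
        return groupby
    # An integer: look the label up by position (IndexError if out of range).
    return columns[index]


iris_columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]
print(resolve_groupby(iris_columns, "4"))
print(resolve_groupby(iris_columns, "species"))
```

Command-line arguments arrive as strings, which is why the sketch attempts the integer conversion first.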
3. Interactive Mode
3.1 Analyze data
>>> import eda_report
>>> # iris_data holds the iris dataset, loaded beforehand
>>> eda_report.summarize(iris_data)
OVERVIEW
========
Numeric features: sepal_length, sepal_width, petal_length, petal_width
Categorical features: species

Summary Statistics (Numeric features)
-------------------------------------
              count    mean     std  min  25%   50%  75%  max  skewness  kurtosis
sepal_length  150.0  5.8433  0.8281  4.3  5.1  5.80  6.4  7.9    0.3149   -0.5521
sepal_width   150.0  3.0573  0.4359  2.0  2.8  3.00  3.3  4.4    0.3190    0.2282
petal_length  150.0  3.7580  1.7653  1.0  1.6  4.35  5.1  6.9   -0.2749   -1.4021
petal_width   150.0  1.1993  0.7622  0.1  0.3  1.30  1.8  2.5   -0.1030   -1.3406

Summary Statistics (Categorical features)
-----------------------------------------
         count  unique     top  freq  relative freq
species    150       3  setosa    50         33.33%

Pearson's Correlation (Top 20)
------------------------------
petal_length & petal_width   --> very strong positive correlation (0.96)
sepal_length & petal_length  --> very strong positive correlation (0.87)
sepal_length & petal_width   --> very strong positive correlation (0.82)
sepal_width & petal_length   --> moderate negative correlation (-0.43)
sepal_width & petal_width    --> weak negative correlation (-0.37)
sepal_length & sepal_width   --> very weak negative correlation (-0.12)
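The correlation listing above can be approximated with a short, dependency-free sketch. The strength thresholds used here (0.8, 0.6, 0.4, 0.2) are an assumption chosen to be consistent with the labels in the sample output; the cut-offs eda-report actually uses are not stated here. The function names and the sample data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def describe_correlation(r):
    """Label a coefficient by strength and direction (assumed thresholds)."""
    strength = abs(r)
    if strength >= 0.8:
        label = "very strong"
    elif strength >= 0.6:
        label = "strong"
    elif strength >= 0.4:
        label = "moderate"
    elif strength >= 0.2:
        label = "weak"
    else:
        label = "very weak"
    direction = "positive" if r > 0 else "negative"
    return f"{label} {direction} correlation ({r:.2f})"


def rank_correlations(data, top=20):
    """Return (pair, coefficient) tuples sorted by absolute correlation."""
    pairs = [((a, b), pearson(data[a], data[b])) for a, b in combinations(data, 2)]
    pairs.sort(key=lambda item: abs(item[1]), reverse=True)
    return pairs[:top]


# Made-up sample data, used only to exercise the helpers.
sample = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [5.0, 3.0, 4.0, 1.0, 2.0],
}
for (first, second), r in rank_correlations(sample):
    print(f"{first} & {second} --> {describe_correlation(r)}")
```

Sorting by absolute value keeps strong negative relationships near the top of the list alongside strong positive ones.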
3.2 Plot statistical graphs
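>>> import eda_report.plotting as ep  # assumed source of the "ep" alias
>>> # mpg_data holds the mpg dataset, loaded beforehand
>>> fig = ep.regression_plot(mpg_data["acceleration"], mpg_data["horsepower"],
...                          labels=("Acceleration", "Horsepower"))
>>> fig.savefig("regression-plot.png")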
3.3 Generate a report
>>> eda_report.get_word_report(iris_data)
Analyze variables: 100%|███████████████████████████████████| 5/5
Plot variables: 100%|███████████████████████████████████| 5/5
Bivariate analysis: 100%|███████████████████████████████████| 6/6 pairs.
[INFO 16:14:53.648] Done. Results saved as 'eda-report.docx'
<eda_report.document.ReportDocument object at 0x7f196753bd60>
Visit the official documentation for more info.
Hashes for eda_report-2.7.3-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 8aa0b673c40d331993135216af2d58e3d75f24ee2d4d12bb836eae65e0d8087d
MD5 | bb5738d6f54550816fe500ef05639981
BLAKE2b-256 | adb4cae293a9152e9af296d8ba70a3f1f913dfddbd494f0a54b55bc9eece1e91