Generate profile report for pandas DataFrame
Data_analyser
data_analyser is a Python package for generating comprehensive profiling reports from pandas DataFrames, helping you quickly understand your data's structure and quality.
▶️ Quickstart
Install
pip install data_analyser
or
conda install -c conda-forge data_analyser
Start profiling
Start by loading your pandas DataFrame as you normally would, e.g. by using:
import numpy as np
import pandas as pd
from data_analyser import ProfileReport
df = pd.DataFrame(np.random.rand(100, 5), columns=["a", "b", "c", "d", "e"])
To generate the standard profiling report, merely run:
profile = ProfileReport(df, title="Profiling Report")
profile.to_file("output.html")
📊 Key features
- Type inference: automatic detection of columns' data types (Categorical, Numerical, Date, etc.)
- Warnings: A summary of the problems/challenges in the data that you might need to work on (missing data, inaccuracies, skewness, etc.)
- Univariate analysis: descriptive statistics (mean, median, mode, etc.) and informative visualizations such as distribution histograms
- Multivariate analysis: correlations, a detailed analysis of missing data, duplicate rows, and visual support for pairwise interactions between variables
- Time-Series: statistical information for time-dependent data, such as auto-correlation and seasonality, along with ACF and PACF plots
- Text analysis: most common categories (uppercase, lowercase, separator), scripts (Latin, Cyrillic) and blocks (ASCII, Cyrillic)
- File and Image analysis: file sizes, creation dates, dimensions, indication of truncated images and existence of EXIF metadata
- Compare datasets: one-line solution to enable a fast and complete report on the comparison of datasets
- Flexible output formats: every analysis can be exported as an HTML report that is easy to share, as JSON for integration into automated systems, or as a widget in a Jupyter Notebook.
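To make the "Warnings" feature above more concrete, here is a minimal sketch of the kind of checks such a summary could be built from, written with plain pandas. It is not data_analyser's implementation: the function name `simple_warnings` and both thresholds are hypothetical and chosen only for illustration.

```python
import pandas as pd


def simple_warnings(df, missing_threshold=0.2, skew_threshold=1.0):
    """Return (column, warning) pairs for a few basic data quality checks.

    The thresholds are illustrative assumptions, not the package's defaults.
    """
    found = []
    for name in df.columns:
        col = df[name]
        # A column with at most one distinct non-null value carries no information.
        if col.nunique(dropna=True) <= 1:
            found.append((name, "constant"))
        # Fraction of missing entries in the column.
        if col.isna().mean() > missing_threshold:
            found.append((name, "missing"))
        # Sample skewness is only meaningful for numeric columns.
        if pd.api.types.is_numeric_dtype(col) and abs(col.skew()) > skew_threshold:
            found.append((name, "skewed"))
    return found


df = pd.DataFrame(
    {
        "constant": [7, 7, 7, 7, 7, 7],
        "sparse": [1.0, None, None, 2.0, None, 3.0],
        "skewed": [1, 1, 1, 1, 1, 100],
        "ok": [1, 2, 3, 4, 5, 6],
    }
)
warnings_found = simple_warnings(df)
print(warnings_found)
```

Each column is checked independently, so the result lists one pair per problem found, in column order.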
The report contains three additional sections:
- Overview: mostly global details about the dataset (number of records, number of variables, overall missingness and duplicates, memory footprint)
- Alerts: a comprehensive, automatically generated list of potential data quality issues (high correlation, skewness, uniformity, zeros, missing values, constant values, among others)
- Reproduction: technical details about the analysis (time, version and configuration)
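The Overview figures listed above can be approximated by hand with pandas, which helps when you want to sanity-check a report. The sketch below is an assumption about what those figures mean, not the package's own code; the function name `overview` and the dictionary keys are hypothetical.

```python
import numpy as np
import pandas as pd


def overview(df):
    """Compute dataset-level figures similar to those in an Overview section."""
    n_rows, n_cols = df.shape
    n_cells = n_rows * n_cols
    missing_cells = int(df.isna().sum().sum())
    return {
        "n_records": n_rows,
        "n_variables": n_cols,
        "missing_cells": missing_cells,
        # Guard against division by zero for an empty frame.
        "missing_fraction": missing_cells / n_cells if n_cells else 0.0,
        # Rows that repeat an earlier row exactly.
        "duplicate_rows": int(df.duplicated().sum()),
        # deep=True also counts the contents of object/string columns.
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
    }


df = pd.DataFrame({"a": [1.0, 1.0, np.nan, 4.0], "b": ["x", "x", "y", None]})
summary = overview(df)
print(summary)
```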
Exporting the report to a file
To generate an HTML report file, assign the ProfileReport to a variable and call its to_file() method:
profile.to_file("your_report.html")
Alternatively, the report's data can be obtained as JSON, either as a string or as a file:
# As a JSON string
json_data = profile.to_json()
# As a file
profile.to_file("your_report.json")
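Once a JSON report has been written, it can be inspected with the standard library alone. The helper below is a small sketch that lists the top-level keys of a report file; it assumes the file holds a single JSON object, and the name `top_level_sections` is hypothetical. The actual keys depend on the package version, so none are hard-coded here.

```python
import json
from pathlib import Path


def top_level_sections(report_path):
    """Return the sorted top-level keys of a JSON report file."""
    with Path(report_path).open(encoding="utf-8") as fh:
        data = json.load(fh)
    # Assumption: the report is a JSON object, not an array or a scalar.
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return sorted(data)


# Example call, assuming the file from the step above exists:
# print(top_level_sections("your_report.json"))
```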
🛠️ Installation
Using pip
You can install using the pip package manager by running:
pip install -U data_analyser
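To confirm that the installation succeeded, you can ask Python for the installed version. This is a sketch using the standard library only. Note that the install command here uses the name data_analyser while the release files on this page are named leo_data_analyser, so which distribution name applies is an assumption; the snippet simply tries both.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Both names appear on this page; which one is registered is not confirmed.
for candidate in ("data_analyser", "leo_data_analyser"):
    print(candidate, "->", installed_version(candidate))
```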
Extras
The package declares "extras", sets of additional dependencies.
- [notebook]: support for rendering the report in Jupyter notebook widgets.
- [unicode]: support for more detailed Unicode analysis, at the expense of additional disk space.
- [pyspark]: support for pyspark for big dataset analysis.
Install these with e.g.
pip install -U data_analyser[notebook,unicode,pyspark]
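After installing extras, a quick way to see whether their optional dependencies are importable is `importlib.util.find_spec`. The sketch below assumes a mapping from each extra to one module it pulls in; these module names are guesses for illustration and are not taken from the package metadata, so adjust them to whatever the extras actually install.

```python
from importlib.util import find_spec

# Hypothetical mapping: extra name -> a module that extra is assumed to provide.
ASSUMED_EXTRA_MODULES = {
    "notebook": "ipywidgets",
    "unicode": "tangled_up_in_unicode",
    "pyspark": "pyspark",
}


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in module_names if find_spec(name) is None]


missing = missing_modules(ASSUMED_EXTRA_MODULES.values())
print("Modules not found:", missing or "none")
```

`find_spec` only locates a top-level module without importing it, so the check is cheap and has no side effects.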
🙋 Support
Need help? Want to share a perspective? Report a bug? Ideas for collaborations?
Shoot me an email @ leandroofalero@outlook.com
🤝🏽 Contributing
A big thank you to the whole team at Ydata-profiling, on whose work I based this package.
License
This project is licensed under the MIT License
Download files
File details
Details for the file leo_data_analyser-1.0.0.tar.gz.
File metadata
- Download URL: leo_data_analyser-1.0.0.tar.gz
- Upload date:
- Size: 267.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2b14548ce9c1033d6a92fb40baa53685c9cbe07cc12ef1e63bafbcd66426b6ce |
| MD5 | 02f4c13c4661902348a97d6d739ecd1e |
| BLAKE2b-256 | 2b8052acc96dae40397361472d8bb469ff2ccdb80928129be5c6a0249f9bfd54 |
File details
Details for the file leo_data_analyser-1.0.0-py2.py3-none-any.whl.
File metadata
- Download URL: leo_data_analyser-1.0.0-py2.py3-none-any.whl
- Upload date:
- Size: 350.8 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 110800e69ef67efeef3bcab7c5ebb3973af8ee37d989ed578535d1966fbf3b97 |
| MD5 | f16613af4ecc81ef84146a7d976fae8a |
| BLAKE2b-256 | a5a6261882ec9052ffe50f54ccb1a87b66c502c2592e0e0b247f8a46307a8292 |