Generate profile report for pandas DataFrame

Project description

Data_analyser

data_analyser is a Python package for generating comprehensive profiling reports from pandas DataFrames, helping you quickly understand your data's structure and quality.

▶️ Quickstart

Install

pip install leo-data-analyser

conda install -c conda-forge data_analyser

Start profiling

Start by loading your pandas DataFrame as you normally would, e.g. by using:

import numpy as np
import pandas as pd
from data_analyser import ProfileReport

df = pd.DataFrame(np.random.rand(100, 5), columns=["a", "b", "c", "d", "e"])

To generate the standard profiling report, simply run:

profile = ProfileReport(df, title="Profiling Report")
profile.to_file("output.html")

📊 Key features

Type inference: automatic detection of each column's data type (Categorical, Numerical, Date, etc.)
Warnings: a summary of the problems or challenges in the data that you might need to work on (missing data, inaccuracies, skewness, etc.)
Univariate analysis: descriptive statistics (mean, median, mode, etc.) and informative visualizations such as distribution histograms
Multivariate analysis: correlations, a detailed analysis of missing data, duplicate rows, and visual support for pairwise interactions between variables
Time series: statistical information specific to time-dependent data, such as autocorrelation and seasonality, along with ACF and PACF plots
Text analysis: most common categories (uppercase, lowercase, separator), scripts (Latin, Cyrillic) and blocks (ASCII, Cyrillic)
File and image analysis: file sizes, creation dates, dimensions, indication of truncated images and existence of EXIF metadata
Compare datasets: a one-line solution for a fast and complete report comparing datasets
Flexible output formats: every analysis can be exported as an HTML report that is easy to share, as JSON for integration into automated systems, or as a widget in a Jupyter Notebook

The report contains three additional sections:

Overview: mostly global details about the dataset (number of records, number of variables, overall missingness and duplicates, memory footprint)
Alerts: a comprehensive, automatically generated list of potential data quality issues (high correlation, skewness, uniformity, zeros, missing values, constant values, among others)
Reproduction: technical details about the analysis (time, version and configuration)
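
To make the Overview and Alerts figures more concrete, here is a minimal standard-library sketch of how such numbers can be computed by hand for a small table of records. It is not the package's implementation: the function name `overview` and the keys of the returned dictionary are hypothetical and chosen only for illustration.

```python
from collections import Counter

def overview(records):
    """Summarise a list of dict records (hypothetical helper, not the package API)."""
    columns = sorted({key for row in records for key in row})
    n_rows = len(records)
    n_cells = n_rows * len(columns)

    # Missing cells: a column absent from a row, or explicitly None.
    missing = sum(1 for row in records for col in columns if row.get(col) is None)

    # Duplicate rows: every repeat of an already-seen row counts once.
    row_counts = Counter(tuple(row.get(col) for col in columns) for row in records)
    duplicates = sum(count - 1 for count in row_counts.values())

    # Constant columns: exactly one distinct non-missing value.
    constant = [
        col for col in columns
        if len({row.get(col) for row in records if row.get(col) is not None}) == 1
    ]

    return {
        "n_rows": n_rows,
        "n_columns": len(columns),
        "missing_fraction": missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "constant_columns": constant,
    }

records = [
    {"a": 1, "b": "x", "c": 0.5},
    {"a": 1, "b": "x", "c": 0.5},   # exact duplicate of the first row
    {"a": 1, "b": "y", "c": None},  # one missing value
    {"a": 1, "b": "z", "c": 0.9},
]
summary = overview(records)
print(summary)
```

In this toy table, column "a" is constant, one row is duplicated and one of the twelve cells is missing, which are the kinds of findings the Alerts section is described as surfacing.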

Exporting the report to a file

To generate an HTML report file, assign the ProfileReport to a variable and call its to_file() method:

profile.to_file("your_report.html")

Alternatively, the report's data can be obtained as JSON, either as a string or as a file:

# As a JSON string
json_data = profile.to_json()

# As a file
profile.to_file("your_report.json")
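
Since to_json() returns a JSON string, downstream tooling should be able to load it with the standard `json` module. The sketch below uses a tiny stand-in string so that it runs without the package installed. The keys in that string are hypothetical placeholders, not the documented layout of the real report, so inspect the actual output before relying on any particular key.

```python
import json

def top_level_sections(json_text):
    """Return the sorted top-level keys of a JSON report string."""
    report = json.loads(json_text)
    if not isinstance(report, dict):
        raise ValueError("expected a JSON object at the top level")
    return sorted(report)

# Stand-in for `profile.to_json()`; these keys are illustrative assumptions only.
json_data = '{"table": {"n": 100}, "variables": {}, "alerts": []}'
print(top_level_sections(json_data))
```

Listing the top-level keys first is a cheap way to discover which sections a report actually contains before writing code that depends on them.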

🛠️ Installation

Using pip

You can install using the pip package manager by running:

pip install -U leo-data-analyser

Extras

The package declares "extras", sets of additional dependencies.

[notebook]: support for rendering the report in Jupyter notebook widgets.
[unicode]: support for more detailed Unicode analysis, at the expense of additional disk space.
[pyspark]: support for PySpark, for analysing large datasets.

Install these with, for example (the quotes stop shells such as zsh from interpreting the square brackets):

pip install -U "leo-data-analyser[notebook,unicode,pyspark]"
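
After installing extras, it can be handy to check which optional dependencies are actually importable in the current environment. The following sketch does so with `importlib` from the standard library. The mapping from extras to module names is an assumption made for illustration (the README does not list what each extra pulls in), so adjust it to match the package metadata.

```python
from importlib.util import find_spec

# Assumed mapping from each extra to one module it is expected to provide.
# These module names are guesses for illustration; check the package metadata.
EXTRA_MODULES = {
    "notebook": "ipywidgets",
    "pyspark": "pyspark",
}

def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return find_spec(module_name) is not None

for extra, module in EXTRA_MODULES.items():
    status = "available" if is_installed(module) else "missing"
    print(f"[{extra}] {module}: {status}")
```

Using `find_spec` avoids importing heavy libraries just to test for their presence.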

🙋 Support

Need help? Want to share a perspective? Report a bug? Ideas for collaborations?

Shoot me an email @ leandroofalero@outlook.com

🤝🏽 Contributing

A big thank you to the whole team at ydata-profiling, on whose work this package is based.

License

This project is licensed under the MIT License

Project details

This version

1.0.0

Aug 30, 2024

Download files

Source Distribution

leo_data_analyser-1.0.0.tar.gz (267.5 kB view details)

Uploaded Aug 30, 2024 Source

Built Distribution

leo_data_analyser-1.0.0-py2.py3-none-any.whl (350.8 kB view details)

Uploaded Aug 30, 2024 Python 2, Python 3

File details

Details for the file leo_data_analyser-1.0.0.tar.gz.

File metadata

Download URL: leo_data_analyser-1.0.0.tar.gz
Upload date: Aug 30, 2024
Size: 267.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.12.5

File hashes

Hashes for leo_data_analyser-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`2b14548ce9c1033d6a92fb40baa53685c9cbe07cc12ef1e63bafbcd66426b6ce`
MD5	`02f4c13c4661902348a97d6d739ecd1e`
BLAKE2b-256	`2b8052acc96dae40397361472d8bb469ff2ccdb80928129be5c6a0249f9bfd54`

File details

Details for the file leo_data_analyser-1.0.0-py2.py3-none-any.whl.

File metadata

Download URL: leo_data_analyser-1.0.0-py2.py3-none-any.whl
Upload date: Aug 30, 2024
Size: 350.8 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.12.5

File hashes

Hashes for leo_data_analyser-1.0.0-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`110800e69ef67efeef3bcab7c5ebb3973af8ee37d989ed578535d1966fbf3b97`
MD5	`f16613af4ecc81ef84146a7d976fae8a`
BLAKE2b-256	`a5a6261882ec9052ffe50f54ccb1a87b66c502c2592e0e0b247f8a46307a8292`
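
The published digests can be used to check that a downloaded file is intact. As a minimal sketch, the snippet below recomputes the SHA256 digest of a local file with `hashlib` and compares it with the value listed above for the wheel. The helper names `sha256_of` and `matches` are hypothetical, and the local file path assumes the wheel was saved to the current directory under its original name.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches(path, expected_hex):
    """Compare a file's SHA256 digest with a published value, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()

# SHA256 listed above for the wheel; the local path is an assumption.
EXPECTED = "110800e69ef67efeef3bcab7c5ebb3973af8ee37d989ed578535d1966fbf3b97"
wheel = Path("leo_data_analyser-1.0.0-py2.py3-none-any.whl")
if wheel.exists():
    print("OK" if matches(wheel, EXPECTED) else "MISMATCH")
else:
    print(f"{wheel} not found; download it first")
```

Reading in chunks keeps memory use flat regardless of file size. The same pattern works for the source distribution by swapping in its file name and SHA256 value.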

leo-data-analyser 1.0.0
