Generate profile report for pandas DataFrame

Project description

Data_analyser

data_analyser is a Python package for generating comprehensive profiling reports from pandas DataFrames, helping you quickly understand your data's structure and quality.

▶️ Quickstart

Install

pip install data_analyser

or

conda install -c conda-forge data_analyser

Start profiling

Start by loading your pandas DataFrame as you normally would, e.g. by using:

import numpy as np
import pandas as pd
from data_analyser import ProfileReport

df = pd.DataFrame(np.random.rand(100, 5), columns=["a", "b", "c", "d", "e"])

To generate the standard profiling report, run:

profile = ProfileReport(df, title="Profiling Report")
profile.to_file("output.html")

📊 Key features

  • Type inference: automatic detection of columns' data types (Categorical, Numerical, Date, etc.)
  • Warnings: A summary of the problems/challenges in the data that you might need to work on (missing data, inaccuracies, skewness, etc.)
  • Univariate analysis: including descriptive statistics (mean, median, mode, etc.) and informative visualizations such as distribution histograms
  • Multivariate analysis: including correlations, a detailed analysis of missing data, duplicate rows, and visual support for pairwise variable interactions
  • Time-Series: including statistical information specific to time-dependent data, such as auto-correlation and seasonality, along with ACF and PACF plots
  • Text analysis: most common categories (uppercase, lowercase, separator), scripts (Latin, Cyrillic) and blocks (ASCII, Cyrillic)
  • File and Image analysis: file sizes, creation dates, dimensions, indication of truncated images and existence of EXIF metadata
  • Compare datasets: one-line solution to enable a fast and complete report on the comparison of datasets
  • Flexible output formats: all analyses can be exported to an HTML report that is easy to share, to JSON for integration into automated systems, or rendered as a widget in a Jupyter Notebook

The report contains three additional sections:

  • Overview: mostly global details about the dataset (number of records, number of variables, overall missingness and duplicates, memory footprint)
  • Alerts: a comprehensive and automatic list of potential data quality issues (high correlation, skewness, uniformity, zeros, missing values, constant values, among others)
  • Reproduction: technical details about the analysis (time, version and configuration)
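
To make the Overview figures concrete, here is a minimal sketch of how similar dataset-level numbers could be computed with plain pandas. It is an illustration only, not how data_analyser computes them internally; the overview() helper and its dictionary keys are hypothetical names chosen for this example.

```python
import numpy as np
import pandas as pd


def overview(df: pd.DataFrame) -> dict:
    """Hypothetical helper: dataset-level figures like those in the Overview section."""
    n_cells = df.shape[0] * df.shape[1]
    missing_cells = int(df.isna().sum().sum())
    return {
        "n_records": len(df),
        "n_variables": df.shape[1],
        "missing_cells": missing_cells,
        # Share of all cells that are missing (0.0 for an empty frame).
        "missing_fraction": missing_cells / n_cells if n_cells else 0.0,
        # Rows that exactly repeat an earlier row.
        "duplicate_rows": int(df.duplicated().sum()),
        # deep=True also counts the memory held by object (string) columns.
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
    }


# Small example frame with one duplicated row and two missing cells.
df = pd.DataFrame({"a": [1.0, 1.0, np.nan, 4.0], "b": ["x", "x", "y", None]})
stats = overview(df)
print(stats)
```

The report presents figures of this kind alongside the per-variable analysis, so a quick manual check like this can be useful when sanity-checking a report.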

Exporting the report to a file

To generate an HTML report file, assign the ProfileReport to a variable and call its to_file() method:

profile.to_file("your_report.html")

Alternatively, the report's data can be obtained as JSON, either as a string or written to a file:

# As a JSON string
json_data = profile.to_json()

# As a file
profile.to_file("your_report.json")
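
Since to_json() returns a string, it can be parsed with the standard json module for use in automated checks. The sketch below only lists the top-level keys and makes no assumption about the report's actual schema; the top_level_keys() helper is a hypothetical name, and the stand-in JSON string with its key names is made up so the example runs without the package installed.

```python
import json


def top_level_keys(json_text: str) -> list:
    """Hypothetical helper: parse a JSON string and return its top-level keys, sorted."""
    parsed = json.loads(json_text)
    if not isinstance(parsed, dict):
        raise TypeError("expected a JSON object at the top level")
    return sorted(parsed)


# Stand-in data with made-up keys; in practice use: json_data = profile.to_json()
json_data = '{"example_section_b": {"n": 100}, "example_section_a": {}}'

print(top_level_keys(json_data))
```

Inspecting the keys first is a cautious way to explore the structure before relying on any particular field.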

🛠️ Installation

Using pip

You can install using the pip package manager by running:

pip install -U data_analyser

Extras

The package declares "extras", sets of additional dependencies.

  • [notebook]: support for rendering the report in Jupyter notebook widgets.
  • [unicode]: support for more detailed Unicode analysis, at the expense of additional disk space.
  • [pyspark]: support for PySpark, for analysing large datasets.

Install one or more of these with, for example:

# Quoting keeps shells such as zsh from interpreting the square brackets
pip install -U "data_analyser[notebook,unicode,pyspark]"

🙋 Support

Need help? Want to share a perspective? Report a bug? Ideas for collaborations?

Shoot me an email at leandroofalero@outlook.com

🤝🏽 Contributing

A big thank you to the whole team at Ydata-profiling, on whose work I based this package.

License

This project is licensed under the MIT License.

Download files

Source Distribution

leo_data_analyser-1.0.0.tar.gz (267.5 kB view details)

Uploaded Source

Built Distribution

leo_data_analyser-1.0.0-py2.py3-none-any.whl (350.8 kB view details)

Uploaded Python 2, Python 3

File details

Details for the file leo_data_analyser-1.0.0.tar.gz.

File metadata

  • Download URL: leo_data_analyser-1.0.0.tar.gz
  • Upload date:
  • Size: 267.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.5

File hashes

Hashes for leo_data_analyser-1.0.0.tar.gz
Algorithm Hash digest
SHA256 2b14548ce9c1033d6a92fb40baa53685c9cbe07cc12ef1e63bafbcd66426b6ce
MD5 02f4c13c4661902348a97d6d739ecd1e
BLAKE2b-256 2b8052acc96dae40397361472d8bb469ff2ccdb80928129be5c6a0249f9bfd54
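
A downloaded archive can be checked against the SHA256 digest listed above using only the standard library. This is a minimal sketch: the sha256_of_file() and matches_expected() helpers are hypothetical names, and the local file path is whatever location the archive was saved to.

```python
import hashlib

# SHA256 digest listed above for leo_data_analyser-1.0.0.tar.gz
EXPECTED_SHA256 = "2b14548ce9c1033d6a92fb40baa53685c9cbe07cc12ef1e63bafbcd66426b6ce"


def sha256_of_file(path, chunk_size=65536):
    """Hypothetical helper: hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected.lower()
```

For example, matches_expected("leo_data_analyser-1.0.0.tar.gz") should return True for an intact download saved under that name in the current directory.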

File details

Details for the file leo_data_analyser-1.0.0-py2.py3-none-any.whl.

File hashes

Hashes for leo_data_analyser-1.0.0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 110800e69ef67efeef3bcab7c5ebb3973af8ee37d989ed578535d1966fbf3b97
MD5 f16613af4ecc81ef84146a7d976fae8a
BLAKE2b-256 a5a6261882ec9052ffe50f54ccb1a87b66c502c2592e0e0b247f8a46307a8292
