A package that provides quick summaries of datasets, including data types, missing value counts, and basic statistics.

summarease

Project Summary

Summarease is a package designed to provide quick insights into a dataset by summarizing its key features. It offers functions that help users understand the structure of the data, making it easier to plan data cleaning and exploratory data analysis (EDA) tasks.

Package Features

  • summarize_dtypes:
    Summarize the data types in the dataset.

  • summarize_target:
    Summarize and evaluate the target variable, whether categorical or numerical. Generates a summary table for a numerical target or a proportion table for a categorical target, plus a visualization for checking class balance in a categorical target.

  • summarize_numeric:
    Summarize the numeric variables in the dataset, either as a table of summary statistics (e.g., mean, standard deviation, min, max) for each numeric column or as plots: a density plot for each numeric column and a correlation heatmap showing the relationships between the specified numeric columns.

  • summarize:
    Generate a comprehensive PDF report for a dataset, including statistical summaries, visualizations, and target variable analysis. Supports options such as showing sample observations, automatic data cleaning, and a choice of summarization method (tables, plots, or both). Intended for automating exploratory data analysis (EDA).
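
As a rough illustration of the kinds of summaries listed above, the sketch below uses plain pandas rather than summarease itself. The DataFrame, its column names, and its values are made up for the example, and nothing here is meant to show how the package implements its functions.

```python
import pandas as pd

# Hypothetical toy dataset (not shipped with summarease)
df = pd.DataFrame(
    {
        "age": [23, 35, 31, None],
        "income": [40.0, 52.5, 61.0, 58.0],
        "city": ["A", "B", "A", "C"],
        "churned": ["yes", "no", "no", "no"],
    }
)

# Data types: how many columns there are of each dtype
dtype_counts = df.dtypes.astype(str).value_counts()

# Categorical target: class proportions, useful as a balance check
target_props = df["churned"].value_counts(normalize=True)

# Numeric columns: summary statistics and pairwise correlations
numeric = df.select_dtypes(include="number")
numeric_stats = numeric.describe()
correlations = numeric.corr()

print(dtype_counts)
print(target_props)
print(numeric_stats)
print(correlations)
```

summarease wraps this kind of output into tables, plots, and a PDF report, so the pandas calls above are only a way to picture what each function summarizes.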

Fit Within Python Ecosystem

Summarease is a lightweight and compact Python package designed for efficiency and ease of use. Despite its simplicity, it offers users great flexibility to customize the output format, whether through detailed tables or insightful visualizations.

Why Choose Summarease?

Several other Python packages also offer dataset summarization, such as:

  • ydata-profiling – Generates a detailed HTML report but can be slow for large datasets.
  • sweetviz – Provides comparative EDA reports, but lacks customization options for PDF output.
  • dtale – Offers interactive dashboards, but may not be suitable for quick, static reports.

summarease stands out because:

  • Lightweight & Fast – Summarization and reporting are optimized for performance.
  • Customizable Reports – Users can configure tables, plots, and formats to match reporting needs.
  • PDF Export Support – Unlike sweetviz and dtale, summarease directly generates PDF reports.

Installation

$ pip install summarease

To install the development version from git, use:

$ pip install git+https://github.com/UBC-MDS/summarease.git
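
To confirm that the installation succeeded, one option is to look up the installed version from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of summarease.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints the version string if summarease is installed, otherwise None
print(installed_version("summarease"))
```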

Documentation

Package documentation can be found here.

Usage

First, import the summarize function from the summarease.summarize module.

from summarease.summarize import summarize

Next, depending on how you want to summarize your dataset (using tables or plots), run one of the following commands:

For generating a report using plots:

The code below generates a report consisting mainly of plots that describe the numeric columns and the target variable, along with a correlation heatmap and a table summarizing the data types in the dataset. It is intended as a reference for the syntax of the function. For more information, including a walkthrough on how to load the dataset, please see the Example usage section in the docs for the summarize function.

summarize(
    dataset=iris_df, 
    dataset_name="Iris Dataset Summary", 
    description="Iris Dataset can be found on the UCI Machine Learning Repository",
    summarize_by="plot",
    target_variable="target",
    target_type="categorical",
    output_file="iris_summary.pdf",
    output_dir="./dataset_summary/"
)
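
The call above assumes that `iris_df` is already a pandas DataFrame with a `target` column. The sketch below builds a small stand-in so the expected shape is clear. The rows and feature column names are hand-typed for illustration only and are an assumption of this example; the walkthrough in the docs covers loading the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for iris_df: four numeric measurements plus a
# categorical "target" column, matching target_variable="target" above.
iris_df = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
        "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
        "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
        "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
        "target": [
            "setosa", "setosa",
            "versicolor", "versicolor",
            "virginica", "virginica",
        ],
    }
)

# Sanity checks worth running before handing the frame to summarize()
assert "target" in iris_df.columns
print(iris_df.dtypes)
print(iris_df["target"].value_counts())
```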

For generating a report using tables:

The code below generates a report containing tables that describe the numeric columns, the target variable, and the data types.

summarize(
    dataset=iris_df, 
    dataset_name="Iris Dataset Summary", 
    description="Iris Dataset can be found on the UCI Machine Learning Repository",
    summarize_by="table",
    target_variable="target",
    target_type="categorical",
    output_file="iris_summary.pdf",
    output_dir="./dataset_summary/"
)

For full details on the function and its parameters, you can always run the following code:

help(summarize)

If you find an error or inconsistency, please refer to the Contributing section.

Running tests

You can run the tests to check that the package works as expected. Before doing so, make sure you have a local clone of the repository (https://github.com/UBC-MDS/summarease, the same URL used in the Installation section) and that pytest is installed.

pip install pytest

Navigate to the root directory of the package and run:

pytest

You can also get the coverage score by running the following command (the --cov option requires the pytest-cov plugin):

pytest --cov=summarease

Contributing

Interested in contributing? Check out the contributing guidelines.

Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

summarease is licensed under the terms of the MIT license.

Contributors

summarease was created by Hrayr Muradyan, Yun Zhou, Stephanie Wu, and Zuer Zhong.

Credits

summarease was created with cookiecutter and the py-pkgs-cookiecutter template.
