rostaing-report

Generate comprehensive EDA and statistical reports from Pandas DataFrames with a single line of code.

Intended Audience
- Developers
- Science/Research
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Scientific/Engineering

Project description

Rostaing Report, created by Davila Rostaing.

rostaing-report is an easy-to-use Python package that speeds up Exploratory Data Analysis (EDA). With one line of code, it generates a complete, formatted report from a Pandas DataFrame, covering both descriptive statistics and common inferential tests.

This toolkit is built for Data Scientists and Data Analysts who need to gain a deep, initial understanding of their data quickly and efficiently. By providing a holistic view of variable types, distributions, missing values, outliers, and correlations, rostaing-report empowers you to make informed, data-driven decisions about feature engineering, modeling strategy, and data cleaning priorities.

Key Features

📊 Detailed Overview: Get a bird's-eye view of your dataset, including row/column counts, memory usage, duplicate rows, and a clear breakdown of variable types.
🔢 In-depth Numerical Analysis: For each numerical column, instantly see statistics like mean, standard deviation, quantiles, variance, skewness, kurtosis, standard error, and outlier detection.
🔠 Insightful Categorical Analysis: Understand your categorical variables with counts, unique values, top occurrences, and frequencies.
🔗 Smart Correlation Analysis: Instead of a giant matrix, view a clean, sorted table of the most significant variable correlations, complete with a plain-English interpretation (e.g., "Strong Positive Correlation").
🧪 Built-in Statistical Tests: Perform common inferential statistics tests directly from your EDA object, including:
- Normality Tests (Shapiro-Wilk, Jarque-Bera, D'Agostino & Pearson)
- Goodness-of-fit Test (Kolmogorov-Smirnov) to check if data follows a specific distribution.
- Independence Test (Chi-squared)
- Group Comparison Tests (T-test, Mann-Whitney U, Kruskal-Wallis)
✨ Beautiful & Flexible Display: The report is automatically rendered as a stylish HTML table in notebooks (Jupyter, VS Code) and as a clean, readable text table in terminals.
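The dual display described in the last feature is commonly achieved with the notebook rich-display hook `_repr_html_` alongside `__str__`. The sketch below illustrates that general mechanism only; `TinyReport` is a hypothetical stand-in, and nothing here is taken from the package's actual implementation.

```python
import html


class TinyReport:
    """Hypothetical stand-in for a report object; not the package's class."""

    def __init__(self, rows):
        # rows: iterable of (label, value) pairs to display
        self.rows = list(rows)

    def __str__(self):
        # Plain-text table, used by print() in a terminal or script.
        width = max(len(str(label)) for label, _ in self.rows)
        return "\n".join(
            f"{str(label).ljust(width)} | {value}" for label, value in self.rows
        )

    def _repr_html_(self):
        # Notebook front ends call this hook, when present, to render rich output.
        cells = "".join(
            f"<tr><th>{html.escape(str(label))}</th>"
            f"<td>{html.escape(str(value))}</td></tr>"
            for label, value in self.rows
        )
        return f"<table>{cells}</table>"


report_demo = TinyReport([("rows", 100), ("columns", 6)])
print(report_demo)
```

In a notebook, leaving `report_demo` as the last expression of a cell would show the HTML table, while `print(report_demo)` always shows the text form.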

Installation

Install the package from PyPI with a single command:

pip install rostaing-report

Quick Start

Getting a full data profile is as simple as this:

import pandas as pd
import numpy as np
from rostaing import rostaing_report

# 1. Create a sample DataFrame
data = {
    'product_id': range(100),
    'price': np.random.normal(150, 40, 100).round(2),
    'customer_age': np.random.normal(35, 8, 100).astype(int),
    'category': np.random.choice(['Electronics', 'Books', 'Home Goods', 'Apparel'], 100),
    'rating': np.random.choice([1, 2, 3, 4, 5, np.nan], 100, p=[0.05, 0.05, 0.1, 0.3, 0.4, 0.1]),
    'is_member': np.random.choice([True, False], 100)
}
df = pd.DataFrame(data)

# 2. Generate the full EDA report
report = rostaing_report(df)

# 3. Display the report
# In a Jupyter Notebook or similar environment, just run:
# report

# In a standard Python script or terminal, use print():
print(report)
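The sample data above is drawn without a seed, so each run produces different values and different test results. If reproducible output matters, the same DataFrame can be built from a seeded generator. This is a sketch using NumPy's `default_rng`; the seed value and the name `seeded_df` are arbitrary choices, and the statistics shown elsewhere in this README were not produced from this seed.

```python
import numpy as np
import pandas as pd

# A seeded generator makes the sample data identical on every run.
rng = np.random.default_rng(42)  # the seed value is arbitrary

n = 100
seeded_df = pd.DataFrame({
    "product_id": range(n),
    "price": rng.normal(150, 40, n).round(2),
    "customer_age": rng.normal(35, 8, n).astype(int),
    "category": rng.choice(["Electronics", "Books", "Home Goods", "Apparel"], n),
    # np.nan in the choices yields a float column with missing ratings
    "rating": rng.choice([1, 2, 3, 4, 5, np.nan], n, p=[0.05, 0.05, 0.1, 0.3, 0.4, 0.1]),
    "is_member": rng.choice([True, False], n),
})
print(seeded_df.shape)
```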

In-Depth Usage

Beyond the main report, you can access powerful statistical methods directly.

The Main Report Breakdown

The rostaing_report(df) object provides several detailed sections:

Overview Statistics: Key metrics about the entire dataset.
Variable Types: A summary table of all data types (int64, float64, object, etc.) and their counts.
Numerical Variables Analysis: A deep dive into each number-based column. The has_outliers column (based on the IQR method) is especially useful for spotting anomalies.
Categorical Variables Analysis: A summary of all text-based, boolean, or categorical columns.
Top Correlations: A sorted list of the most correlated numerical variables, making it easy to spot multicollinearity or interesting relationships. The interpretation column saves you time.
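To make two of the derived columns above concrete, here is a minimal pandas sketch of an IQR outlier flag and a sorted correlation table with a text label. The function names, the 1.5 multiplier, and the strength thresholds (0.7 and 0.3) are illustrative assumptions; the package's own rules may differ.

```python
from itertools import combinations

import pandas as pd


def has_iqr_outliers(series: pd.Series, k: float = 1.5) -> bool:
    """True if any value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = series.dropna()
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return bool(((values < q1 - k * iqr) | (values > q3 + k * iqr)).any())


def interpret(r: float) -> str:
    # Thresholds are illustrative assumptions, not the package's rules.
    strength = "Strong" if abs(r) >= 0.7 else "Moderate" if abs(r) >= 0.3 else "Weak"
    direction = "Positive" if r >= 0 else "Negative"
    return f"{strength} {direction} Correlation"


def top_correlations(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Pairwise Pearson correlations, strongest first, each pair listed once."""
    corr = df.select_dtypes(include="number").corr()
    rows = [(a, b, corr.loc[a, b]) for a, b in combinations(corr.columns, 2)]
    pairs = pd.DataFrame(rows, columns=["var_1", "var_2", "r"]).dropna()
    pairs["interpretation"] = pairs["r"].apply(interpret)
    pairs = pairs.sort_values("r", key=lambda s: s.abs(), ascending=False)
    return pairs.head(n).reset_index(drop=True)


demo = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2, 4, 6, 8, 10, 12],   # perfectly correlated with x
    "z": [5, 1, 4, 2, 6, 100],   # contains one obvious outlier
})
print(has_iqr_outliers(demo["z"]))
print(top_correlations(demo, n=3))
```

Listing each pair once and sorting by absolute value is what keeps such a table short compared with a full correlation matrix.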

Performing Statistical Tests

Validate your hypotheses directly from the report object.

1. Test for Normality

Check if a variable follows a normal distribution.

# H0: The 'price' data is drawn from a normal distribution.
# Use test='shapiro', 'normaltest', or 'jarque_bera'.
normality_results = report.normality_test('price', test='normaltest')
print(pd.Series(normality_results))

# Output:
# test                       D'Agostino & Pearson's test
# column                                           price
# statistic                                     0.478335
# p_value                                       0.787285
# conclusion (alpha=0.05)    The null hypothesis (normality) cannot be r...
# dtype: object

2. Test for Goodness-of-Fit (Kolmogorov-Smirnov)

Check if your data conforms to a specific theoretical distribution, like the normal distribution ('norm').

# H0: The 'price' data follows a normal ('norm') distribution.
ks_results = report.ks_test('price', dist='norm')
print(pd.Series(ks_results))

# Output:
# test                       Kolmogorov-Smirnov Test
# column                                       price
# distribution_tested                           norm
# statistic                                 0.081123
# p_value                                   0.518872
# conclusion (alpha=0.05)    The data may follow a 'norm' distribution (p...
# dtype: object
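One detail worth knowing when reading a Kolmogorov-Smirnov result: the reference distribution's parameters matter. The SciPy sketch below contrasts a test against the standard normal with a test against a normal whose mean and standard deviation are estimated from the sample. How `ks_test` chooses its parameters is not stated here, so treat this as an illustration of the underlying test, not of the package.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(150, 40, 100)  # synthetic stand-in for the 'price' column

# Against the standard normal N(0, 1): the data is centered near 150, so this fails.
standard = stats.kstest(sample, "norm")

# Against a normal with mean and std estimated from the same sample.
fitted = stats.kstest(sample, "norm", args=(sample.mean(), sample.std(ddof=1)))

print(f"standard normal: D={standard.statistic:.4f}, p={standard.pvalue:.4g}")
print(f"fitted normal:   D={fitted.statistic:.4f}, p={fitted.pvalue:.4g}")
```

When the parameters are estimated from the same data being tested, the standard KS p-value is only approximate and tends to be too large; the Lilliefors variant of the test corrects for this.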

3. Test for Independence (Categorical Variables)

Check if two categorical variables are independent.

# H0: 'category' and 'is_member' are independent variables.
chi2_results = report.chi2_test('category', 'is_member')

print(f"P-value: {chi2_results['p_value']:.4f}")
print(f"Conclusion: {chi2_results['conclusion (alpha=0.05)']}")
# Output:
# P-value: 0.8876
# Conclusion: The variables are independent (p >= 0.05).
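The same kind of test can be reproduced with a contingency table and SciPy's `chi2_contingency`. This is a sketch on synthetic data; that `chi2_test` is built this way is an assumption, and the small-expected-count check is an addition for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "category": rng.choice(["Electronics", "Books", "Home Goods", "Apparel"], 200),
    "is_member": rng.choice([True, False], 200),
})

# Observed counts for every (category, is_member) combination.
table = pd.crosstab(demo["category"], demo["is_member"])
chi2, p_value, dof, expected = stats.chi2_contingency(table)

print(table)
print(f"chi2={chi2:.4f}, dof={dof}, p={p_value:.4f}")

# The chi-squared approximation is unreliable when expected counts are small.
if (expected < 5).any():
    print("Warning: some expected counts are below 5.")
```

A p-value at or above alpha means the test found no evidence of association in this sample; it does not prove that the variables are independent.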

4. Compare Two Independent Groups (Non-parametric)

Check if the distribution of a numerical variable is the same across two groups. This is useful when your data is not normally distributed.

# H0: The distribution of 'price' is the same for members and non-members.
mw_results = report.mann_whitney_u_test(col='price', group_col='is_member')
print(pd.Series(mw_results))

# Output:
# test                                                 Mann-Whitney U
# compared_variable                                           price
# groups                                               False vs True
# U_statistic                                               1241.0
# p_value                                                   0.963973
# conclusion (alpha=0.05)    No significant difference between distributi...
# dtype: object
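For reference, the underlying test is available as `scipy.stats.mannwhitneyu`. The sketch below splits a synthetic 'price' column by a boolean group column and runs a two-sided test; that the package method delegates to this function is an assumption.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "price": rng.normal(150, 40, 100).round(2),
    "is_member": rng.choice([True, False], 100),
})

# Split the numerical column into the two groups being compared.
members = demo.loc[demo["is_member"], "price"]
non_members = demo.loc[~demo["is_member"], "price"]

u_stat, p_value = stats.mannwhitneyu(non_members, members, alternative="two-sided")
print(f"n(False)={len(non_members)}, n(True)={len(members)}")
print(f"U={u_stat:.1f}, p={p_value:.4f}")
```

The U statistic always lies between 0 and the product of the two group sizes, so values near half that product indicate heavily overlapping groups.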

Why rostaing-report?

Speed: Go from a raw DataFrame to a full, insightful report in seconds. Drastically reduce the time spent on boilerplate EDA code.
Clarity: The structured output, both in notebooks and terminals, is designed for maximum readability. The plain-English interpretations for correlations help you communicate findings faster.
Completeness: It bridges the gap between descriptive statistics and initial hypothesis testing by bundling both into one cohesive interface.
Better Decision-Making: By quickly identifying potential issues like outliers, high cardinality, skewness, or unexpected correlations, you can make smarter, evidence-backed decisions on how to proceed with your data modeling or business analysis.

Contributing

Contributions are welcome! If you have ideas for new features, find a bug, or want to improve the documentation, please feel free to open an issue or submit a pull request on the project's repository.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Useful Links

GitHub: https://github.com/Rostaing/rostaing-report
PyPI: https://pypi.org/project/rostaing-report/
LinkedIn: https://www.linkedin.com/in/davila-rostaing/
YouTube: youtube.com/@RostaingAI


Release history

0.2.0 (Jul 15, 2025)
0.1.2 (Jul 12, 2025), this version
0.1.1 (Jul 11, 2025)
0.1.0 (Jul 11, 2025)

Download files

Source Distribution

rostaing_report-0.1.2.tar.gz (12.1 kB)

Uploaded Jul 12, 2025 Source

Built Distribution

rostaing_report-0.1.2-py3-none-any.whl (9.7 kB)

Uploaded Jul 12, 2025 Python 3

File details

Details for the file rostaing_report-0.1.2.tar.gz.

File metadata

Download URL: rostaing_report-0.1.2.tar.gz
Upload date: Jul 12, 2025
Size: 12.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for rostaing_report-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`a9f437a3128889d7909cabbdaee51c0ab48d15f89a4ca09c4935b94328deff24`
MD5	`e1629570dcbea4e3914f00ea6729f857`
BLAKE2b-256	`27c43f5c7a18552251c2e9c18de22717d664fcc51e9ccde2e72dcf23a72f2050`
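A downloaded archive can be checked against the SHA256 digest listed above with Python's standard `hashlib`. This is a minimal sketch; the helper name `sha256_of` and the assumption that the file sits in the current directory are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest copied from the table above.
EXPECTED_SHA256 = "a9f437a3128889d7909cabbdaee51c0ab48d15f89a4ca09c4935b94328deff24"
archive = Path("rostaing_report-0.1.2.tar.gz")  # assumed local download location

if archive.exists():
    matches = sha256_of(archive) == EXPECTED_SHA256
    print("SHA256 matches" if matches else "SHA256 MISMATCH: do not install this file")
else:
    print(f"{archive} not found; download it first")
```

pip can enforce the same check at install time when a requirements file pins hashes and `--require-hashes` is used.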

File details

Details for the file rostaing_report-0.1.2-py3-none-any.whl.

File metadata

Download URL: rostaing_report-0.1.2-py3-none-any.whl
Upload date: Jul 12, 2025
Size: 9.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for rostaing_report-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`417c963cb4eefc987e57d22a46d9b165fc13c705322454fdd0421cdea45e6a5a`
MD5	`682ac332cbf62605faf7d4241bdeb3de`
BLAKE2b-256	`03cbbd315b3e00d421978af584973b3b5b14792fbfe430643fd4b5448f881f52`
