Customizable data and model summaries in Python.

statsframe

statsframe creates tables that provide descriptive statistics of numeric and categorical data.

The goal is to provide a simple -- yet customizable -- way to summarize data and models in Python.

statsframe is heavily inspired by modelsummary in R. The goal is not to replicate all that modelsummary does, but to provide a way of achieving similar results in Python.

In order to achieve this, statsframe builds on the polars library to produce tables that can be easily customized and exported to other formats.

Basic Usage

As an example of statsframe usage, the skim_frame function provides a summary of a DataFrame (either polars.DataFrame or pandas.DataFrame). The default summary statistics returned by statsframe.skim_frame() are unique values, percentage missing, mean, standard deviation, minimum, median, and maximum.

Where possible, statsframe will print a table to the console and return a polars DataFrame with the summary statistics. This allows for easy customization. For example, the polars.DataFrame with statistics from statsframe can be modified using the Great Tables package.

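import polars as pl
import statsframe as sf

# Load the mtcars dataset and drop the row-label column.
df = (
    pl.read_csv("https://vincentarelbundock.github.io/Rdatasets/csv/datasets/mtcars.csv")
    .drop("rownames")
)

# Print the summary table and keep the returned polars DataFrame.
stats = sf.skim_frame(df)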

Summary Statistics
Rows: 32, Columns: 11
┌──────┬────────────┬─────────────┬───────┬───────┬──────┬────────┬───────┐
│      ┆ Unique (#) ┆ Missing (%) ┆  Mean ┆    SD ┆  Min ┆ Median ┆   Max │
╞══════╪════════════╪═════════════╪═══════╪═══════╪══════╪════════╪═══════╡
│  mpg ┆         25 ┆         0.0 ┆  20.1 ┆   6.0 ┆ 10.4 ┆   19.2 ┆  33.9 │
│  cyl ┆          3 ┆         0.0 ┆   6.2 ┆   1.8 ┆  4.0 ┆    6.0 ┆   8.0 │
│ disp ┆         27 ┆         0.0 ┆ 230.7 ┆ 123.9 ┆ 71.1 ┆  196.3 ┆ 472.0 │
│   hp ┆         22 ┆         0.0 ┆ 146.7 ┆  68.6 ┆ 52.0 ┆  123.0 ┆ 335.0 │
│ drat ┆         22 ┆         0.0 ┆   3.6 ┆   0.5 ┆  2.8 ┆    3.7 ┆   4.9 │
│   wt ┆         29 ┆         0.0 ┆   3.2 ┆   1.0 ┆  1.5 ┆    3.3 ┆   5.4 │
│ qsec ┆         30 ┆         0.0 ┆  17.8 ┆   1.8 ┆ 14.5 ┆   17.7 ┆  22.9 │
│   vs ┆          2 ┆         0.0 ┆   0.4 ┆   0.5 ┆  0.0 ┆    0.0 ┆   1.0 │
│   am ┆          2 ┆         0.0 ┆   0.4 ┆   0.5 ┆  0.0 ┆    0.0 ┆   1.0 │
│ gear ┆          3 ┆         0.0 ┆   3.7 ┆   0.7 ┆  3.0 ┆    4.0 ┆   5.0 │
│ carb ┆          6 ┆         0.0 ┆   2.8 ┆   1.6 ┆  1.0 ┆    2.0 ┆   8.0 │
└──────┴────────────┴─────────────┴───────┴───────┴──────┴────────┴───────┘
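
To make the meaning of each default column concrete, here is a minimal sketch that computes the same seven statistics for one column using only the Python standard library. It is not statsframe's implementation: the helper name summarize_column is hypothetical, the toy data is invented, and the choices of sample standard deviation (n - 1) and of excluding missing values from the unique count are assumptions.

```python
import statistics


def summarize_column(values):
    """Summarize one numeric column; None marks a missing value.

    Hypothetical helper for illustration, not part of statsframe.
    """
    present = [v for v in values if v is not None]
    n_missing = len(values) - len(present)
    return {
        # Assumption: missing values are not counted as a unique value.
        "Unique (#)": len(set(present)),
        "Missing (%)": round(100 * n_missing / len(values), 1),
        "Mean": round(statistics.mean(present), 1),
        # Assumption: sample standard deviation (divides by n - 1).
        "SD": round(statistics.stdev(present), 1),
        "Min": min(present),
        "Median": statistics.median(present),
        "Max": max(present),
    }


# Invented toy column with one missing value (not taken from mtcars).
toy = [2.0, 4.0, 4.0, None, 6.0]
summary = summarize_column(toy)
print(summary)
```

Each key matches a header in the printed table, so the dictionary reads like a single row of the summary.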

The same kind of summary can be produced from a pandas DataFrame.

import pandas as pd
import statsframe as sf

trees_df = pd.read_csv(
    "https://vincentarelbundock.github.io/Rdatasets/csv/datasets/trees.csv"
).drop(columns=["rownames"])

trees_stats = sf.skim_frame(trees_df)

Summary Statistics
Rows: 31, Columns: 3
┌────────┬────────────┬─────────────┬──────┬──────┬──────┬────────┬──────┐
│        ┆ Unique (#) ┆ Missing (%) ┆ Mean ┆   SD ┆  Min ┆ Median ┆  Max │
╞════════╪════════════╪═════════════╪══════╪══════╪══════╪════════╪══════╡
│  Girth ┆         27 ┆         0.0 ┆ 13.2 ┆  3.1 ┆  8.3 ┆   12.9 ┆ 20.6 │
│ Height ┆         21 ┆         0.0 ┆ 76.0 ┆  6.4 ┆ 63.0 ┆   76.0 ┆ 87.0 │
│ Volume ┆         30 ┆         0.0 ┆ 30.2 ┆ 16.4 ┆ 10.2 ┆   24.2 ┆ 77.0 │
└────────┴────────────┴─────────────┴──────┴──────┴──────┴────────┴──────┘

Contributing

If you encounter a bug, have usage questions, or want to share ideas to make the statsframe package more useful, please feel free to file an issue.

Code of Conduct

Please note that the statsframe project is released with a contributor code of conduct.

By participating in this project you agree to abide by its terms.

License

statsframe is licensed under the MIT license.

Governance

This project is primarily maintained by Niall Keleher. Contributions from other authors are welcome.

