Skimpy

A lightweight tool for creating summary statistics from dataframes.

Read the documentation at https://aeturrell.github.io/skimpy/

skimpy is a lightweight tool that provides summary statistics about variables in pandas or Polars data frames within the console or your interactive Python window.

Think of it as a super-charged version of pandas' df.describe(). You can find the documentation here.

Quickstart

Skim a pandas or Polars dataframe and produce summary statistics within the console using:

from skimpy import skim

skim(df)

where df is a pandas or Polars dataframe.

If you need a dataset to try skimpy out on, you can use the built-in test pandas data frame:

from skimpy import generate_test_data, skim

df = generate_test_data()
skim(df)
╭──────────────────────────────────────────────── skimpy summary ─────────────────────────────────────────────────╮
│          Data Summary                Data Types               Categories                                        │
│ ┏━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓ ┏━━━━━━━━━━━━━┳━━━━━━━┓ ┏━━━━━━━━━━━━━━━━━━━━━━━┓                                │
│ ┃ Dataframe         ┃ Values ┃ ┃ Column Type ┃ Count ┃ ┃ Categorical Variables ┃                                │
│ ┡━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩ ┡━━━━━━━━━━━━━╇━━━━━━━┩ ┡━━━━━━━━━━━━━━━━━━━━━━━┩                                │
│ │ Number of rows    │ 1000   │ │ float64     │ 3     │ │ class                 │                                │
│ │ Number of columns │ 13     │ │ category    │ 2     │ │ location              │                                │
│ └───────────────────┴────────┘ │ datetime64  │ 2     │ └───────────────────────┘                                │
│                                │ object      │ 2     │                                                          │
│                                │ int64       │ 1     │                                                          │
│                                │ bool        │ 1     │                                                          │
│                                │ string      │ 1     │                                                          │
│                                │ timedelta64 │ 1     │                                                          │
│                                └─────────────┴───────┘                                                          │
│                                                     number                                                      │
│ ┏━━━━━━━━━┳━━━━━━┳━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━┓  │
│ ┃ column  ┃ NA   ┃ NA %  ┃ mean      ┃ sd      ┃ p0         ┃ p25     ┃ p50        ┃ p75    ┃ p100  ┃ hist   ┃  │
│ ┡━━━━━━━━━╇━━━━━━╇━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━┩  │
│ │ length  │    0 │     0 │    0.5016 │  0.3597 │  1.573e-06 │   0.134 │     0.4976 │ 0.8602 │     1 │ █▃▃▃▄█ │  │
│ │ width   │    0 │     0 │     2.037 │   1.929 │   0.002057 │   0.603 │      1.468 │  2.953 │ 13.91 │  █▃▁   │  │
│ │ depth   │    0 │     0 │     10.02 │   3.208 │          2 │       8 │         10 │     12 │    20 │ ▁▄█▆▃▁ │  │
│ │ rnd     │  118 │  11.8 │  -0.01977 │   1.002 │     -2.809 │ -0.7355 │ -0.0007736 │ 0.6639 │ 3.717 │ ▁▄█▅▁  │  │
│ └─────────┴──────┴───────┴───────────┴─────────┴────────────┴─────────┴────────────┴────────┴───────┴────────┘  │
│                                                    category                                                     │
│ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓  │
│ ┃ column                      ┃ NA         ┃ NA %            ┃ ordered                 ┃ unique              ┃  │
│ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩  │
│ │ class                       │          0 │               0 │ False                   │                   2 │  │
│ │ location                    │          1 │             0.1 │ False                   │                   5 │  │
│ └─────────────────────────────┴────────────┴─────────────────┴─────────────────────────┴─────────────────────┘  │
│                                                      bool                                                       │
│ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓  │
│ ┃ column                          ┃ true             ┃ true rate                      ┃ hist                 ┃  │
│ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩  │
│ │ booly_col                       │            516 │                           0.52 │        █    █        │  │
│ └─────────────────────────────────┴──────────────────┴────────────────────────────────┴──────────────────────┘  │
│                                                    datetime                                                     │
│ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓  │
│ ┃ column                       ┃ NA    ┃ NA %     ┃ first              ┃ last              ┃ frequency       ┃  │
│ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩  │
│ │ datetime                     │     0 │        0 │     2018-01-31     │    2101-04-30     │ ME              │  │
│ │ datetime_no_freq             │     3 │      0.3 │     1992-01-05     │    2023-03-04     │ None            │  │
│ └──────────────────────────────┴───────┴──────────┴────────────────────┴───────────────────┴─────────────────┘  │
│                                            <class 'datetime.date'>                                              │
│ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓  │
│ ┃ column                            ┃ NA    ┃ NA %    ┃ first            ┃ last             ┃ frequency      ┃  │
│ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩  │
│ │ datetime.date                     │     0 │       0 │ 2018-01-31       │ 2101-04-30       │ ME             │  │
│ │ datetime.date_no_freq             │     0 │       0 │ 1992-01-05       │ 2023-03-04       │ None           │  │
│ └───────────────────────────────────┴───────┴─────────┴──────────────────┴──────────────────┴────────────────┘  │
│                                                  timedelta64                                                    │
│ ┏━━━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┓  │
│ ┃ column         ┃ NA   ┃ NA %    ┃ mean                   ┃ median                 ┃ max                    ┃  │
│ ┡━━━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━┩  │
│ │ time diff      │    5 │     0.5 │        8 days 00:05:47 │        0 days 00:00:00 │       26 days 00:00:00 │  │
│ └────────────────┴──────┴─────────┴────────────────────────┴────────────────────────┴────────────────────────┘  │
│                                                     string                                                      │
│ ┏━━━━━━━━┳━━━━┳━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━┓  │
│ ┃        ┃    ┃      ┃            ┃           ┃            ┃           ┃ chars per  ┃ words per ┃ total      ┃  │
│ ┃ column ┃ NA ┃ NA % ┃ shortest   ┃ longest   ┃ min        ┃ max       ┃ row        ┃ row       ┃ words      ┃  │
│ ┡━━━━━━━━╇━━━━╇━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━┩  │
│ │ text   │  6 │  0.6 │ How are    │ Indeed,   │ How are    │ What      │       31.1 │       5.8 │       5761 │  │
│ │        │    │      │ you?       │ it was    │ you?       │ weather!  │            │           │            │  │
│ │        │    │      │            │ the most  │            │           │            │           │            │  │
│ │        │    │      │            │ outrageou │            │           │            │           │            │  │
│ │        │    │      │            │ sly       │            │           │            │           │            │  │
│ │        │    │      │            │ pompous   │            │           │            │           │            │  │
│ │        │    │      │            │ cat I     │            │           │            │           │            │  │
│ │        │    │      │            │ have ever │            │           │            │           │            │  │
│ │        │    │      │            │ seen.     │            │           │            │           │            │  │
│ └────────┴────┴──────┴────────────┴───────────┴────────────┴───────────┴────────────┴───────────┴────────────┘  │
│                                                     object                                                      │
│ ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓  │
│ ┃ column                                                                   ┃ NA          ┃ NA %              ┃  │
│ ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩  │
│ │ datetime.date                                                            │           0 │                 0 │  │
│ │ datetime.date_no_freq                                                    │           0 │                 0 │  │
│ └──────────────────────────────────────────────────────────────────────────┴─────────────┴───────────────────┘  │
╰────────────────────────────────────────────────────── End ──────────────────────────────────────────────────────╯

It is recommended that you set your datatypes before using skimpy (for example, converting any text columns to the pandas string datatype), as this will produce richer statistical summaries. If you do not, the skim() function will try to guess the datatypes of your columns.
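As a rough illustration of setting datatypes up front, the sketch below converts a small pandas data frame before it is summarised. The data, the column names and the set_dtypes helper are all hypothetical and are not part of skimpy; only standard pandas conversions are used.

```python
import pandas as pd

# Hypothetical example data: every column starts out as plain Python strings.
raw = pd.DataFrame(
    {
        "comment": ["How are you?", "What weather!", None],
        "grade": ["a", "b", "a"],
        "joined": ["2021-01-05", "2021-02-11", "2021-03-19"],
        "score": ["1.5", "2.0", "3.25"],
    }
)


def set_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with an explicit dtype for each column.

    This helper is an illustrative assumption, not a skimpy function.
    """
    return df.assign(
        comment=df["comment"].astype("string"),   # pandas string dtype
        grade=df["grade"].astype("category"),     # categorical dtype
        joined=pd.to_datetime(df["joined"]),      # datetime64 dtype
        score=pd.to_numeric(df["score"]),         # numeric dtype
    )


typed = set_dtypes(raw)
print(typed.dtypes)
```

Passing typed to skim() afterwards should then place each column in the summary table for its datatype, rather than leaving skimpy to guess.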

Requirements

You can find a full list of requirements in the pyproject.toml file.

You can try this package out right now in your browser using this Google Colab notebook (requires a Google account). Note that the Google Colab notebook uses the latest package released on PyPI (rather than the development release).

Installation

You can install the latest release of skimpy via pip from PyPI:

$ pip install skimpy

To install the development version from git, use:

$ pip install git+https://github.com/aeturrell/skimpy.git
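To confirm which version ended up in your environment after either command, one option is to ask the standard library's importlib.metadata. This is a minimal sketch; the installed_version helper is a hypothetical name, not something skimpy provides.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent.

    Hypothetical helper built on importlib.metadata from the standard library.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("skimpy")
if found is None:
    print("skimpy is not installed in this environment")
else:
    print(f"skimpy version: {found}")
```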

For development, see contributing.

License

Distributed under the terms of the MIT license, skimpy is free and open source software.

Issues

If you encounter any problems, please file an issue along with a detailed description.

Credits

This project was generated from @cjolowicz's Hypermodern Python Cookiecutter template.

skimpy was inspired by the R package skimr and by exploratory Python packages including ydata_profiling and dataprep, from which the clean_columns function comes.

This package would not have been possible without the Rich package.

The package is built with uv, while the documentation is built with Quarto and great-docs (a Python package). Tests are run with nox.

Using skimpy in your paper? Let us know by raising an issue beginning with "citation" and we'll add it to this page.


Download files

Download the file for your platform.

Source Distribution

skimpy-0.0.21.tar.gz (32.3 kB)

Uploaded Source

Built Distribution

skimpy-0.0.21-py3-none-any.whl (18.2 kB)

Uploaded Python 3

File details

Details for the file skimpy-0.0.21.tar.gz.

File metadata

  • Download URL: skimpy-0.0.21.tar.gz
  • Upload date:
  • Size: 32.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for skimpy-0.0.21.tar.gz
Algorithm    Hash digest
SHA256       1240ed2c592d19404ee9ee976ec0ec1b6dacd3e6c6a4aa34b60de98790bc2503
MD5          790ff9292256aef2bc227fd74803a3c5
BLAKE2b-256  4d279e32392036ba41c166465557b778241b0c65d83eaf3099a7db86a8ae92ec

See more details on using hashes here.
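If you download the source distribution by hand, a check along the following lines can compare the file against the SHA256 digest listed above. It is a sketch using only the standard library; the local path and the helper names are assumptions, so adjust them to wherever the archive was saved.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for skimpy-0.0.21.tar.gz.
EXPECTED_SHA256 = "1240ed2c592d19404ee9ee976ec0ec1b6dacd3e6c6a4aa34b60de98790bc2503"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()


# Hypothetical download location (an assumption, not a path from this page).
archive = Path("skimpy-0.0.21.tar.gz")
if archive.exists():
    print("hash OK" if matches_expected(archive) else "hash MISMATCH")
else:
    print(f"{archive} not found; download it first")
```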

Provenance

The following attestation bundles were made for skimpy-0.0.21.tar.gz:

Publisher: release.yml on aeturrell/skimpy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file skimpy-0.0.21-py3-none-any.whl.

File metadata

  • Download URL: skimpy-0.0.21-py3-none-any.whl
  • Upload date:
  • Size: 18.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for skimpy-0.0.21-py3-none-any.whl
Algorithm    Hash digest
SHA256       2caf100f9569cb14df1a422889c0b20ea5d88447097823d7d87cd1e439a4dad6
MD5          5e1ae93995c43c430512bfcf1edd2f6e
BLAKE2b-256  96b48a4230c1f4a2fc7766f055b8fb625f8c5080960e44561a879b9d2b071198

See more details on using hashes here.

Provenance

The following attestation bundles were made for skimpy-0.0.21-py3-none-any.whl:

Publisher: release.yml on aeturrell/skimpy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
