Skip to main content

No project description provided

Project description

Tablespam - Simple Tables Made Simple

Lifecycle: experimental

The objective of tablespam is to provide a “good enough” approach to creating tables in python. tablespam is a port of the R package tablespan.

tablespam currently builds on the awesome package great_tables, which allows tables created with tablespam to be exported to the following formats:

Installation

tablespam is available from pip:

pip install tablespam

To install the development version from GitHub, run:

pip install git+https://github.com/jhorzek/tablespam.git#egg=tablespam

Introduction

Python has a large set of great packages that allow you to create and export tables that look exactly like you envisioned. However, sometimes you may just need a good-enough table that is easy to create and share with others. This is where tablespam can be of help.

Let’s assume that we want to share the following table:

import polars as pl

# Create data set inspired by the cars data in R:
cars = pl.DataFrame(
    {
        'mpg': [21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2],
        'cyl': [6, 6, 4, 6, 8, 6, 8, 4, 4, 6],
        'disp': [160.0, 160.0, 108.0, 258.0, 360.0, 225.0, 360.0, 146.7, 140.8, 167.6],
        'hp': [110, 110, 93, 110, 175, 105, 245, 62, 95, 123],
        'drat': [3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 4.08, 3.92, 3.92],
        'wt': [2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190, 3.150, 3.440],
        'qsec': [16.46, 17.02, 18.61, 19.44, 17.02, 20.22, 15.84, 20.00, 22.90, 18.30],
        'vs': [0, 0, 1, 1, 0, 1, 0, 1, 1, 1],
        'am': [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
        'gear': [4, 4, 4, 3, 3, 3, 3, 4, 4, 4],
        'carb': [4, 4, 1, 1, 2, 1, 4, 2, 2, 4],
    }
)

summarized_table = cars.group_by(['cyl', 'vs'], 
                                  maintain_order=True).agg(
    [
        pl.len().alias('N'),
        pl.col('hp').mean().alias('mean_hp'),
        pl.col('hp').std().alias('sd_hp'),
        pl.col('wt').mean().alias('mean_wt'),
        pl.col('wt').std().alias('sd_wt'),
    ]
)

print(summarized_table)
shape: (4, 7)
┌─────┬─────┬─────┬────────────┬───────────┬──────────┬──────────┐
│ cyl ┆ vs  ┆ N   ┆ mean_hp    ┆ sd_hp     ┆ mean_wt  ┆ sd_wt    │
│ --- ┆ --- ┆ --- ┆ ---        ┆ ---       ┆ ---      ┆ ---      │
│ i64 ┆ i64 ┆ u32 ┆ f64        ┆ f64       ┆ f64      ┆ f64      │
╞═════╪═════╪═════╪════════════╪═══════════╪══════════╪══════════╡
│ 6   ┆ 0   ┆ 2   ┆ 110.0      ┆ 0.0       ┆ 2.7475   ┆ 0.180312 │
│ 4   ┆ 1   ┆ 3   ┆ 83.333333  ┆ 18.502252 ┆ 2.886667 ┆ 0.491155 │
│ 6   ┆ 1   ┆ 3   ┆ 112.666667 ┆ 9.291573  ┆ 3.371667 ┆ 0.136045 │
│ 8   ┆ 0   ┆ 2   ┆ 210.0      ┆ 49.497475 ┆ 3.505    ┆ 0.091924 │
└─────┴─────┴─────┴────────────┴───────────┴──────────┴──────────┘

Note: tablespam currently only supports polars data frames.

We don’t want to share the table as is - the variable names are all a bit technical and the table could need some spanners summarizing columns. So, we want to share a table that looks something like this:

|                   | Horse Power |   Weight  |
| Cylinder | Engine | Mean  |  SD | Mean | SD |
| -------- | ------ | ----- | --- | ---- | -- |
|                   |                         |

tablespam allows us to create this table with a single formula.

Creating a Basic Table

In tablespam, the table headers are defined with a formula inspired by R. For example, "cyl ~ mean_hp + sd_hp" defines a table with cyl as the row names and mean_hp and sd_hp as columns:

from tablespam import TableSpam
tbl = TableSpam(data = summarized_table,
                formula = "cyl ~ mean_hp + sd_hp")
print(tbl.as_string())
| cyl | mean_hp sd_hp |
| --- - ------- ----- |
| 6   | 110.0   0.0   |
| 4   | 83.33   18.5  |
| 6   | 112.67  9.29  |
| ... | ...     ...   |

Note that the row names (cyl) are in a separate block to the left.

Adding Spanners

Spanners are defined using braces and spanner names. For example, the following defines a spanner for mean_hp and sd_hp with the name Horsepower: "cyl ~ (Horsepower = mean_hp + sd_hp)":

tbl = TableSpam(data = summarized_table,
                formula = "cyl ~ (Horsepower = mean_hp + sd_hp)")
print(tbl.as_string())
|     | Horsepower       |
| cyl | mean_hp    sd_hp |
| --- - ---------- ----- |
| 6   | 110.0      0.0   |
| 4   | 83.33      18.5  |
| 6   | 112.67     9.29  |
| ... | ...        ...   |

Spanners can also be nested:

tbl = TableSpam(data = summarized_table,
                formula = "cyl ~ (Horsepower = (Mean = mean_hp) + (SD  = sd_hp))")
print(tbl.as_string())
|     | Horsepower       |
|     | Mean       SD    |
| cyl | mean_hp    sd_hp |
| --- - ---------- ----- |
| 6   | 110.0      0.0   |
| 4   | 83.33      18.5  |
| 6   | 112.67     9.29  |
| ... | ...        ...   |

Renaming Columns

Variable names in an data frame are often very technical (e.g., mean_hp and sd_hp). When sharing the table, we may want to replace those names. In the example above, we may want to replace mean_hp and sd_hp with “Mean” and “SD”. In tablespam renaming variables is achieved with new_name:old_name. For example, "cyl ~ (Horsepower = Mean:mean_hp + SD:sd_hp)" renames mean_hp to Mean and sd_hp to SD:

tbl = TableSpam(data = summarized_table,
                formula = "cyl ~ (Horsepower = Mean:mean_hp + SD:sd_hp)")
print(tbl.as_string())
|     | Horsepower      |
| cyl | Mean       SD   |
| --- - ---------- ---- |
| 6   | 110.0      0.0  |
| 4   | 83.33      18.5 |
| 6   | 112.67     9.29 |
| ... | ...        ...  |

Creating the Full Table

The combination of row names, spanners, and renaming of variables allows creating the full table:

tbl = TableSpam(data = summarized_table,
                formula = """Cylinder:cyl + Engine:vs ~
                   N +
                   (`Horse Power` = Mean:mean_hp + SD:sd_hp) +
                   (`Weight` = Mean:mean_wt + SD:sd_wt)""",
                 title = "Motor Trend Car Road Tests",
                 subtitle = "A table created with tablespam",
                 footnote = "Data from the infamous mtcars data set.")
print(tbl.as_string())
Motor Trend Car Road Tests
A table created with tablespam

|                 |     Horse Power      Weight      |
| Cylinder Engine | N   Mean        SD   Mean   SD   |
| -------- ------ - --- ----------- ---- ------ ---- |
| 6        0      | 2   110.0       0.0  2.75   0.18 |
| 4        1      | 3   83.33       18.5 2.89   0.49 |
| 6        1      | 3   112.67      9.29 3.37   0.14 |
| ...      ...    | ... ...         ...  ...    ...  |
Data from the infamous mtcars data set.

Exporting to Excel

Note: The screenshots below are from the R-package tablespan and may deviate slightly from the actual output. The screenshots will be updated soon.

Tables created with tablespam can be exported to openpyxl workbooks, which allows saving as .xlsx files.

# Translate to openpyxl:
tbl_xlsx = tbl.as_excel()

# save the table:
# tbl_xlsx.save("my_table.xlsx")

Styling

While tablespam provides limited styling options, some elements can be adjusted. For example, we may want to print some elements in bold or format numbers differently. In tablespam, styling happens when translating the table to an openpyxl workbook with as_excel.

Changing the Overall Look

The easiest way to customize tables is to change the default color scheme. The class XlsxStyles provides control over most elements in the table, but in many cases style_color may be sufficient. The following creates a table with teal-colored backgrounds for the title, header, and row names:

from tablespam.Excel.xlsx_styles import style_color

tbl_xlsx = tbl.as_excel(styles = style_color(primary_color = "008080"))

# save the table:
# tbl_xlsx.save("my_table.xlsx")

Similarly, a dark background can be defined as follows:

tbl_xlsx = tbl.as_excel(styles = style_color(primary_color = "000000"))

# save the table:
# tbl_xlsx.save("my_table.xlsx")

Formatting Cells

Let’s assume we want all mean_hp values with a value $\geq 100$ to be printed in bold. To this end, we first create a function that takes in a single openpyxl cell and applies a style to it:

import openpyxl
def bold(c):
    c.font = openpyxl.styles.Font(bold=True)

Next, we have to define a CellStyle for the column mean_hp, where we pass in the index of the rows that should be bold.

Note: openpyxl uses 1-based indexing!

# get the indices:
geq_100 = (summarized_table.with_row_index(name="index") 
            .filter(pl.col("mean_hp") >= 100) 
            .select("index") 
            .to_series() 
            .to_list())
# translate to 1-based index:
geq_100 = [i + 1 for i in geq_100]

Define cell styles:

from tablespam.Excel.xlsx_styles import XlsxStyles, CellStyle
styles=XlsxStyles(
            cell_styles=[
                CellStyle(
                    rows=geq_100,
                    cols=['mean_hp'],
                    style=bold,
                ),
            ]
        )

Finally, we pass this style to as_excel:

tbl_xlsx = tbl.as_excel(styles=styles)

# save the table:
# tbl_xlsx.save("my_table.xlsx")

Formatting Data Types

tablespan also allows formatting specific data types. Let’s assume that we want to round all doubles to 3 instead of the default 2 digits. To this end, we use DataStyles, where we specify (1) a function that checks for the data type we want to style (here for doubles) and (2) a style for all columns that match that style:

from tablespam.Excel.xlsx_styles import DataStyle
# Define test: The function will be passed a polars data frame and should
# check for the data types defined here: 
# https://docs.pola.rs/api/python/stable/reference/datatypes.html
def is_double(x: pl.DataFrame) -> bool:
    return all([tp in [pl.Float32, pl.Float64] for tp in x.dtypes])

# Function that applies our style to a single cell:
def three_digits(c):
    c.number_format = "0.000"

# Now we define the data_style. The data_style must be a 
# dict.
data_style = {
    "double": DataStyle(test = is_double,
                         style = three_digits) 
} 
tbl_xlsx = tbl.as_excel(
        styles=XlsxStyles(
            data_styles=data_style
        )
    )
# tbl_xlsx.save("my_table.xlsx")

Exporting to HTML, LaTeX, and RTF

Tables created with tablespam can be exported to great_tables which allows saving as HTML, LaTeX, or RTF file. To this end, we simply have to call as_gt on our table:

# Translate to gt:
gt_tbl = tbl.as_gt()
gt_tbl.show()

Standard table

Styling Great Tables

The great_tables package provides a wide range of functions to adapt the style of the table created with as_gt. For instance, opt_stylize adds a pre-defined style to the entire table:

(gt_tbl
  .opt_stylize(style = 6,
               color = 'gray')
  .show())

Styled table

When adapting the great_tables object, there is an important detail to keep in mind: To ensure that each table spanner has a unique ID, tablespam will create IDs that differ from the text shown in the spanner. To demonstrate this, Let’s assume that we want to add a spanner above Horse Power and Weight:

(gt_tbl
  .tab_spanner(label = "New Spanner", 
               spanners = ["Horse Power", "Weight"]))
AssertionError: 
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[16], line 2
      1 (gt_tbl
----> 2   .tab_spanner(label = "New Spanner", 
      3                spanners = ["Horse Power", "Weight"]))

File ~/Documents/Programming/Python/tablespam/.venv/lib/python3.13/site-packages/great_tables/_spanners.py:165, in tab_spanner(self, label, columns, spanners, level, id, gather, replace)
    161 # select spanner ids ----
    162 # TODO: this supports tidyselect
    163 # TODO: could we use something like resolve_vector_l
    164 if spanners is not None:
--> 165     assert set(spanners).issubset(set(crnt_spanner_ids))
    166     spanner_ids = spanners
    167 else:

AssertionError: 

This will throw an error because the spanner IDs are different from the spanner labels.

In general, the ID of a spanner is given by a concatenation of “BASE_LEVEL” and the names of all spanners above the current spanner. For example, the IDs for Horse Power and Weight are “__BASE_LEVEL__Horse Power” and “__BASE_LEVEL__Weight”:

(gt_tbl
  .tab_spanner(label = "New Spanner", 
                  spanners = ["__BASE_LEVEL__Horse Power", 
                               "__BASE_LEVEL__Weight"])
  .show())

Table with additional spanner

Tables without row names

Using 1 on the left hand side of the formula creates a table without row names. For example, "1 ~ (Horsepower = Mean:mean_hp + SD:sd_hp)" defines

tbl = TableSpam(data = summarized_table,
                formula = "1 ~ (Horsepower = Mean:mean_hp + SD:sd_hp)")
print(tbl.as_string())
| Horsepower      |
| Mean       SD   |
| ---------- ---- |
| 110.0      0.0  |
| 83.33      18.5 |
| 112.67     9.29 |
| ...        ...  |

References

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tablespam-0.1.1.tar.gz (46.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

tablespam-0.1.1-py3-none-any.whl (49.8 kB view details)

Uploaded Python 3

File details

Details for the file tablespam-0.1.1.tar.gz.

File metadata

  • Download URL: tablespam-0.1.1.tar.gz
  • Upload date:
  • Size: 46.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.0.1 CPython/3.13.0 Darwin/24.2.0

File hashes

Hashes for tablespam-0.1.1.tar.gz
Algorithm Hash digest
SHA256 5cfd8bca4d944f60e363e1d30ca1aaf460d8f788dc2b58fcb923c9ca24003a79
MD5 4f190303e22d47fa71b201ab2bc354fb
BLAKE2b-256 9feae610a6d3d0b27bdd65934c2ea6924714701a2ec229da852a06244f8af065

See more details on using hashes here.

File details

Details for the file tablespam-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: tablespam-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 49.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.0.1 CPython/3.13.0 Darwin/24.2.0

File hashes

Hashes for tablespam-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 1bcbdcdc09dec9944fcd81f94a8674c95fa7ba8e94e0f6812069466a66344bb9
MD5 ce18f3948db4e1bc4181deb52ff2dfbe
BLAKE2b-256 f3c4588e359fa284f5e9b8ef587b9f2cb21de0b83a39e59e39197f9fcd583715

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page