This API and command-line program describes data in tables with metadata and generates LaTeX tables in a `.sty` file from CSV files.
Describe and optimize data
In this package, Pythonic objects are used to easily (un)serialize data to create LaTeX tables, figures and Excel files. The API and command-line program describe data in tables with metadata using YAML and CSV files, and integrate with Pandas. The paths to the CSV files from which to create tables, along with their metadata, are given in a YAML configuration file.
Features:
- Create LaTeX tables (with captions) and Excel files (with notes) of tabular metadata from CSV files.
- Create LaTeX friendly encapsulated postscript (.eps) files from CSV files.
- Data and metadata are viewable in a nice format with paging in a web browser using the Render program.
- Usable as an API during data collection for research projects.
Documentation
See the full documentation. The API reference is also available.
Obtaining
The library can be installed with pip from the PyPI repository:
    pip3 install zensols.datdesc
Binaries are also available on PyPI.
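After installing, one way to confirm that the package is visible to the interpreter is to query its distribution metadata. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of this package.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the pip install above succeeded, else None.
print(installed_version("zensols.datdesc"))
```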
Usage
The library can be used as a Python API to programmatically create tables,
figures, and/or represent tabular data. However, it also has a robust
command line that is intended to be used by GNU make. The command line can
create LaTeX .sty files on the fly, in which tables are generated as commands,
and figures are generated as Encapsulated PostScript (.eps) files.

The YAML file format is used to create both tables and figures. Parameters are
either both files or both directories. When using directories, only files that
match *-table.yml are considered on the command line.
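The directory rule above can be pictured with a short sketch. It assumes a non-recursive search (whether the real program descends into subdirectories is not stated here), and `find_table_configs` is a hypothetical helper, not a function from this package.

```python
import tempfile
from pathlib import Path
from typing import List


def find_table_configs(directory: Path) -> List[Path]:
    """Return the *-table.yml files directly inside ``directory``, sorted."""
    return sorted(p for p in directory.glob("*-table.yml") if p.is_file())


# Demonstrate on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("results-table.yml", "iris-figure.yml", "notes.txt"):
        (root / name).touch()
    # Only the file ending in -table.yml is selected.
    print([p.name for p in find_table_configs(root)])
```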
Tables
First create the table's configuration file. For example, to create a LaTeX
.sty file from the CSV file test-resources/section-id.csv, using the first
column as the index (which removes that column) and a variable size and
placement, use:
    intercodertab:
      type: slack
      slack_col: 0
      single_column: true
      path: some-path/some-file.csv
      caption: >-
        A caption ...
      column_keeps:
        - dataset
        - split
        - count
        - portion
      column_renames:
        dataset: Dataset
        split: Split
        count: Count
        portion: Portion
      read_params:
        index_col: 0
      make_percent_column_names:
        portion: 0
      format_thousands_column_names:
        count: null
      tabulate_params:
        disable_numparse: true
      replace_nan: ' '
      blank_columns: [0]
      bold_cells: [[0, 0], [1, 0], [2, 0], [3, 0]]
Some of these fields include:
- make_percent_column_names: columns to format as percents with decimal points
- format_thousands_column_names: columns to format with commas and decimal points
- index_col: uses column 0 as the index, which clears that column
- bold_cells: makes certain cells bold
- disable_numparse: tells the tabulate module not to reformat numbers
See the Table class for a full listing of options.
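The effect of the two formatting fields above can be sketched with the standard library. This is not the package's implementation: the helper names and the sample rows are hypothetical, and the sketch assumes the integer given for a percent column is the number of decimal places.

```python
import csv
import io
from typing import Dict, List

# Hypothetical sample data with the column names used in the configuration.
CSV_TEXT = """dataset,split,count,portion
iris,train,12000,0.8
iris,test,3000,0.2
"""


def make_percent(value: float, decimals: int = 0) -> str:
    """Render a fraction such as 0.8 as a percent string such as '80%'."""
    return f"{value * 100:.{decimals}f}%"


def format_thousands(value: int) -> str:
    """Add thousands separators, e.g. 12000 -> '12,000'."""
    return f"{value:,}"


def format_rows(text: str) -> List[Dict[str, str]]:
    """Read CSV text and format the count and portion columns as strings."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["count"] = format_thousands(int(row["count"]))
        row["portion"] = make_percent(float(row["portion"]), decimals=0)
        rows.append(row)
    return rows


for row in format_rows(CSV_TEXT):
    print(row["dataset"], row["split"], row["count"], row["portion"])
# prints:
# iris train 12,000 80%
# iris test 3,000 20%
```

Because the cells are already strings at this point, a later tabulation step should leave them alone, which is the role the configuration gives to disable_numparse.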
Figures
Figures can be generated in any format supported by matplotlib (namely
.eps, .svg, and .pdf). Figures are configured in a very similar fashion
to tables. The configuration also points to a CSV file, but
describes the plot.
The primary difference is that the YAML is parsed using the Zensols parsing
rules, so the string 'path: target' is given to a new Plot instance as a
pathlib.Path.
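The prefix convention described above can be pictured as a small dispatcher on the text before the first colon. This is only an illustration of the idea, not the Zensols parser: `parse_value` is a hypothetical name, and only the `path:` and `str:` prefixes are handled.

```python
from pathlib import Path
from typing import Any


def parse_value(raw: str) -> Any:
    """Convert 'prefix: value' strings to typed objects (illustrative only)."""
    prefix, sep, rest = raw.partition(": ")
    if not sep:
        return raw  # no prefix, keep the string as-is
    if prefix == "path":
        return Path(rest)
    if prefix == "str":
        return rest
    return raw  # unknown prefix, left untouched in this sketch


print(isinstance(parse_value("path: target"), Path))  # prints True
print(parse_value("str: .9"))                         # prints .9
```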
A bar plot is configured below:
    irisFig:
      image_dir: 'path: target'
      seaborn:
        style:
          style: darkgrid
          rc:
            axes.facecolor: 'str: .9'
        context:
          context: 'paper'
          font_scale: 1.3
      plots:
        - type: bar
          data: 'dataframe: test-resources/fig/iris.csv'
          title: 'Iris Splits'
          x_column_name: ds_type
          y_column_name: count
          code_pre: |
            plot.data = plot.data.groupby('ds_type').agg({'ds_type': 'count'}).\
              rename(columns={'ds_type': 'count'}).reset_index()
This configuration means:
- The top level irisFig creates a Figure instance and, when used with the command line, uses this root level string as the output file name in the image_dir directory.
- The image_dir tells where to write the image. This should be left out when invoking from the command line to allow it to decide where to write the file.
- The seaborn section configures the seaborn module.
- The plots are a list of Plot instances that, like the Figure level, are populated with all the values.
- The code_pre (optionally) allows the massaging of the plot (bound to the variable plot) and/or its Pandas dataframe (plot.data in the example above), along with all other properties and attributes.
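The snippet in the example configuration appears to replace the plot's data with one row per ds_type plus a count column. A dependency-free sketch of that counting step follows; the rows are hypothetical rather than the contents of iris.csv, and `count_by` is an illustrative helper, not part of this package.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical rows; only the ds_type column matters for the count.
SAMPLE_ROWS = [
    {"ds_type": "train"},
    {"ds_type": "train"},
    {"ds_type": "test"},
    {"ds_type": "dev"},
    {"ds_type": "train"},
]


def count_by(rows: List[Dict[str, str]], column: str) -> List[Dict[str, object]]:
    """Return one record per distinct value of ``column`` with its row count."""
    counts = Counter(row[column] for row in rows)
    # Sort by key, which is also the default ordering of a pandas groupby.
    return [{column: key, "count": counts[key]} for key in sorted(counts)]


print(count_by(SAMPLE_ROWS, "ds_type"))
# prints [{'ds_type': 'dev', 'count': 1}, {'ds_type': 'test', 'count': 1}, {'ds_type': 'train', 'count': 3}]
```

The resulting ds_type and count fields line up with the x_column_name and y_column_name settings of the bar plot.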
If code_post is given, it is called after the plot is created and is accessible
with the variable plot. If code_post_render is given, it is executed after the
plot is rendered by matplotlib.
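The ordering of the three hooks can be sketched with a stand-in class. Everything here is hypothetical: `SketchPlot` is not the library's Plot class, and running the snippets with `exec` is an assumption about how such strings could be evaluated, not a description of what the package does.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchPlot:
    """Hypothetical stand-in for a plot; not the library's Plot class."""
    title: str
    code_pre: Optional[str] = None
    code_post: Optional[str] = None
    code_post_render: Optional[str] = None
    log: List[str] = field(default_factory=list)

    def _run(self, code: Optional[str]) -> None:
        # Execute the configured snippet with this object bound to ``plot``.
        if code is not None:
            exec(code, {"plot": self})

    def render(self) -> None:
        self._run(self.code_pre)
        self.log.append("created")    # the plot would be drawn here
        self._run(self.code_post)
        self.log.append("rendered")   # matplotlib would render here
        self._run(self.code_post_render)


plot = SketchPlot(
    title="Iris Splits",
    code_pre="plot.log.append('pre')",
    code_post="plot.log.append('post')",
    code_post_render="plot.log.append('post_render')",
)
plot.render()
print(plot.log)
# prints ['pre', 'created', 'post', 'rendered', 'post_render']
```

Since snippets of this kind run arbitrary Python, configuration files that contain them should come only from trusted sources.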
Other plot configuration examples are given in the test cases directory. See the Figure and Plot classes for a full listing of options.
Changelog
An extensive changelog is available here.
Community
Please star this repository and let me know how and where you use this API. Contributions as pull requests, feedback, and any input are welcome.
License
Copyright (c) 2023 - 2026 Paul Landes