Generate LaTeX tables in a .sty file from CSV files

Describe and optimize data

This API and command line program describes data in tables with metadata and generates LaTeX tables in a .sty file from CSV files. The paths to the CSV files from which to create tables, along with their metadata, are given in a YAML configuration file. The parameters are either both files or both directories. When using directories, only files that match *-table.yml are considered. In addition, the described data can be hyperparameter metadata, which can be optimized with the hyperparameter module.
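The file-or-directory rule above can be sketched in a few lines of standard-library Python. This is only an illustration of the matching rule, not the package's own code: the function name find_table_configs is hypothetical, and the non-recursive search is an assumption.

```python
from pathlib import Path
from typing import List


def find_table_configs(path: Path) -> List[Path]:
    """Return the table definition files selected by ``path``.

    Hypothetical helper: a file is returned as is, while a directory is
    searched (non-recursively, by assumption) for ``*-table.yml`` names.
    """
    if path.is_dir():
        # Only names matching *-table.yml are considered in a directory.
        return sorted(path.glob("*-table.yml"))
    return [path]
```

With this rule, a directory holding results-table.yml and notes.yml would yield only results-table.yml.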

Features:

Associate metadata with each column in a Pandas DataFrame.
DataFrame metadata is used to format LaTeX data and is exported to Excel as column header notes.
Data and metadata are viewable in a nice format with paging in a web browser using the Render program.
Usable as an API during data collection for research projects.

Documentation

See the full documentation. The API reference is also available.

Obtaining

The easiest way to install the command line program is via the pip installer:

pip3 install zensols.datdesc

Binaries are also available on PyPI.

Usage

First create the table's configuration file. For example, to create a LaTeX .sty file from the CSV file test-resources/section-id.csv, using the first column as the index (which makes that column go away) and a variable size and placement, use:

intercodertab:
  path: test-resources/section-id.csv
  caption: >-
    Krippendorff’s ...
  size: VAR
  placement: VAR
  single_column: true
  uses: zentable
  read_kwargs:
    index_col: 0
  write_kwargs:
    disable_numparse: true
  replace_nan: ' '
  blank_columns: [0]
  bold_cells: [[0, 0], [1, 0], [2, 0], [3, 0]]

Some of these fields include:

placement: the placement (e.g. h!); VAR means the generated command takes the placement as its first parameter
size: the font size (e.g. small); VAR means the generated command takes the size as its second parameter
index_col: uses column 0 as the index, which clears that column
bold_cells: makes certain cells bold
disable_numparse: tells the tabulate module not to reformat numbers

See the Table class for a full listing of options.
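To make the effect of a few of these fields concrete, here is a minimal standard-library sketch of how replace_nan, blank_columns and bold_cells might shape the LaTeX rows. It is not how the package renders tables (the source indicates that the tabulate module is involved). The function render_rows is hypothetical, and several readings are assumptions: that blank_columns blanks the header of the listed columns, that bold_cells holds [row, column] pairs counted from the first data row, and that an empty CSV cell stands in for NaN.

```python
import csv
import io
from typing import List, Sequence


def render_rows(csv_text: str,
                blank_columns: Sequence[int],
                bold_cells: Sequence[Sequence[int]],
                replace_nan: str = " ") -> List[str]:
    """Render CSV text as LaTeX tabular rows (illustrative only)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    # Assumed meaning of blank_columns: blank the header of those columns.
    header = ["" if i in blank_columns else h for i, h in enumerate(header)]
    # Assumed meaning of bold_cells: (row, column) pairs in the data rows.
    bold = {tuple(cell) for cell in bold_cells}
    lines = [" & ".join(header) + r" \\"]
    for r, row in enumerate(body):
        cells = []
        for c, cell in enumerate(row):
            # An empty cell is treated as a missing value here.
            text = cell if cell != "" else replace_nan
            if (r, c) in bold:
                text = r"\textbf{%s}" % text
            cells.append(text)
        lines.append(" & ".join(cells) + r" \\")
    return lines


if __name__ == "__main__":
    sample = "id,alpha\na,0.5\nb,\n"
    for line in render_rows(sample, blank_columns=[0], bold_cells=[[0, 0], [1, 0]]):
        print(line)
```

The real option semantics are defined by the Table class, so treat the behavior above as a reading aid rather than a specification.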

Hyperparameters

Hyperparameter metadata: access and documentation. This package was designed for the following purposes:

Provide basic scaffolding for updating model hyperparameters with tools such as hyperopt.
Generate LaTeX tables of the hyperparameters and their descriptions for academic papers.

Access to the hyperparameters via the API is done by calling the set or model levels with a dotted path notation string. For example, svm.C first navigates to model svm, then to the hyperparameter named C.
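The dotted path idea can be illustrated with a plain nested dictionary. This is a sketch only and does not use the package's API: the lookup function and the HYPERPARAMS dictionary are hypothetical stand-ins that mirror the svm model from this section.

```python
from typing import Any, Dict

# Hypothetical stand-in for a hyperparameter set, mirroring the svm model.
HYPERPARAMS: Dict[str, Any] = {
    "svm": {
        "doc": "support vector machine",
        "params": {
            "kernel": {"type": "choice", "choices": ["radial", "linear"]},
            "C": {"type": "float"},
            "max_iter": {"type": "int", "value": 20, "interval": [1, 30]},
        },
    },
}


def lookup(hyperset: Dict[str, Any], path: str) -> Dict[str, Any]:
    """Resolve ``model`` or ``model.param`` against a nested dictionary."""
    model_name, _, param_name = path.partition(".")
    model = hyperset[model_name]
    if not param_name:
        # A bare model name returns the model level.
        return model
    return model["params"][param_name]


if __name__ == "__main__":
    print(lookup(HYPERPARAMS, "svm.C"))
```

Here svm.C first selects the svm entry and then the C entry under its params, which is the same navigation order described above.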

Command line access to create LaTeX tables from the hyperparameter definitions is available with the hyper action. An example of a hyperparameter set (a grouping of models that in turn have hyperparameters) follows:

svm:
  doc: 'support vector machine'
  params:
    kernel:
      type: choice
      choices: [radial, linear]
      doc: 'maps the observations into some feature space'
    C:
      type: float
      doc: 'regularization parameter'
    max_iter:
      type: int
      doc: 'number of iterations'
      value: 20
      interval: [1, 30]

In the example, the svm model has the hyperparameters kernel, C and max_iter. The kernel type is set as a choice, which is a string constrained to match one of the strings in the list. The C hyperparameter is a floating point number, and max_iter is an integer that must be between 1 and 30.
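The constraints just described can be expressed as a small check over a parameter definition. The validate function below is a hypothetical sketch, not the package's validation logic. It assumes the interval bounds are inclusive and that a float parameter must be a Python float.

```python
from typing import Any, Dict


def validate(spec: Dict[str, Any], value: Any) -> bool:
    """Check ``value`` against one parameter definition (illustrative)."""
    kind = spec["type"]
    if kind == "choice":
        # A choice is a string that must match one of the listed strings.
        return isinstance(value, str) and value in spec["choices"]
    if kind == "float":
        return isinstance(value, float)
    if kind == "int":
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            return False
        interval = spec.get("interval")
        # Assumption: both interval bounds are inclusive.
        return interval is None or interval[0] <= value <= interval[1]
    raise ValueError(f"unsupported type: {kind}")


if __name__ == "__main__":
    max_iter = {"type": "int", "value": 20, "interval": [1, 30]}
    print(validate(max_iter, 20), validate(max_iter, 31))
```

Under these assumptions, 20 is accepted for max_iter and 31 is rejected, while kernel accepts only radial or linear.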

In this next example, the k_means model uses the string k-means (its desc field) in human-readable documentation, which can be generated Python code in a dataclass.

k_means:
  desc: k-means
  doc: 'k-means clustering'
  params:
    n_clusters:
      type: int
      doc: 'number of clusters'
    copy_x:
      type: bool
      value: True
      doc: 'When pre-computing distances it is more numerically accurate to center the data first'
    strata:
      type: list
      doc: 'An array of stratified hyperparameters (made up for test cases).'
      value: [1, 2]
    kwargs:
      type: dict
      doc: 'Model keyword arguments (made up for test cases).'
      value:
        learning_rate: 0.01
        epochs: 3
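One of the stated purposes of the hyperparameter metadata is to produce LaTeX tables of the parameters and their descriptions. The sketch below shows the general idea for part of the k_means model using plain dictionaries. It is not the output format of the hyper action: the hyperparam_rows function and the three column layout (name, type, description) are assumptions made for illustration.

```python
from typing import Any, Dict, List

# Hypothetical stand-in for part of the k_means model defined above.
K_MEANS: Dict[str, Any] = {
    "desc": "k-means",
    "doc": "k-means clustering",
    "params": {
        "n_clusters": {"type": "int", "doc": "number of clusters"},
        "strata": {
            "type": "list",
            "doc": "An array of stratified hyperparameters (made up for test cases).",
        },
    },
}


def hyperparam_rows(model: Dict[str, Any]) -> List[str]:
    """Build one LaTeX tabular row per hyperparameter (illustrative)."""
    rows = []
    for name, spec in model["params"].items():
        # Underscores must be escaped in LaTeX text mode.
        escaped = name.replace("_", r"\_")
        rows.append(f"{escaped} & {spec['type']} & {spec['doc']} \\\\")
    return rows


if __name__ == "__main__":
    for row in hyperparam_rows(K_MEANS):
        print(row)
```

Only the parameter names are escaped here; descriptions containing LaTeX special characters would need the same treatment.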

Changelog

An extensive changelog is available here.

Community

Please star this repository and let me know how and where you use this API. Contributions as pull requests, feedback and any input are welcome.

License

MIT License

Release history

0.2.2 (this version): Mar 5, 2024
0.2.1: Dec 29, 2023
0.2.0: Dec 5, 2023
0.1.1: Nov 30, 2023
0.1.0: Aug 16, 2023
0.0.2: Jun 10, 2023
0.0.1: Jun 7, 2023

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

zensols.datdesc-0.2.2-py3-none-any.whl (35.2 kB), uploaded Mar 5, 2024, Python 3

Hashes for zensols.datdesc-0.2.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f83298c5548253dec19c3132c4ebf78231403d893e0d03cf2dd1732afb844d10`
MD5	`ce923b69b8aca40b60735098a68b8203`
BLAKE2b-256	`bfb46fa886289d5e5f9b25aea5a0a66ae66477ce6138a6d4c790dbb5a43bde6e`