Skip to main content

Human-friendly, VCS-friendly file format for Python Pandas and Polars DataFrames.

Project description

dftxt

A Python library for a simple DataFrame text file format that facilitates easier specification of a Pandas and Polars DataFrame in a human-readable text format for use in testing and where source data is small and human managed. By, example here's what the format would look like:

Name      Planet    Numeral  Mean Radius (km)  Discovery Year  Discoverer
          &&cat              &&float           &&Int
Moon      Earth     I        1738              None            None
Phobos    Mars      I        11.267            1877            Hall
Deimos    Mars      II       6.2               1877            Hall
Io        Jupiter   I        1821              1610            Galileo
Europa    Jupiter   II       1560              1610            Galileo
Ganymede  Jupiter   III      2634              1610            Galileo
Callisto  Jupiter   IV       2410              1610            Galileo
Amalthea  Jupiter   V        83.5              1892            Barnard
Himalia   Jupiter   VI       69.8              1904            Perrine
Mimas     Saturn    I        198.2             1789            Herschel

This is a fixed-width file format that uses two+ spaces separating column names to define the width of each column.

The benefits of the format are:

1. Preserves DataFrame Structure

Most importantly, this format retains the necessary information to reload the DataFrame in an identical fashion as the file was specified. This includes data types, column ordering, and indexing (Pandas only as Polars has no index). In testing, it should be possible to use (pandas|polars).testing.assert_frame_equal() on a loaded DataFrame without any transformation when read from a file.

For example,

sku           price_usd  originally_released_on  product_name
&&int_index   &decimal   &date
109456        119.99     2023-07-09              Fancy Socks
450213        24.49      2020-11-12              Simple Socks
90210         299.99     1998-03-28              LA Heartthrob Socks

2. Human Friendly

The format is easy to read and modify by humans and requires little to no machine characters in its specification. Whitespace is used as the delimiter - specifically 2+ spaces between column names - which also serves to align columns for easy readability. Quoting and escaping are rarely needed as a result.

3. Diff/Code Review Friendly

TODO

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dftxt-1.0.1.tar.gz (37.5 kB view hashes)

Uploaded Source

Built Distribution

dftxt-1.0.1-py3-none-any.whl (71.3 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page