Skip to main content

Subpackage of utilities for extended CSV (XCSV) files

Project description

xcsv-utils

xcsv-utils is a subpackage of xcsv. It's main purpose is to provide utilities for working with extended CSV (XCSV) files.

Install

The package can be installed from PyPI:

$ pip install xcsv-utils

Using the package

XCSV data can be printed directly, as the data table is a pandas table:

>>> content.data
     time (year) [a]  depth (m)
0               2012      0.575
1               2011      1.125
2               2010      2.225
3               2009      2.825
4               2008      3.508
..               ...        ...
387             1625    132.240
388             1624    132.426
389             1623    132.680
390             1622    132.880
391             1621    133.180

[392 rows x 2 columns]

But an XCSV object usually contains an extended header section as well as the data. Pandas doesn't handle this header, but the XCSV object does. In addition, the XCSV object parses the data table column headers and makes their content machine-readable.

The xcsv-utils subpackage provides simple access and visual inspection of the attributes of an XCSV object - the metadata (header and column_headers) and the data. The main class is Print, which provides a formatted, themed view of an XCSV object, and can pretty-print an XCSV file directly from the command line. This is the purpose of the xcsv-utils subpackage. For example:

$ python3 -m xcsv.utils example.csv 
id: 1
title: The title
summary: This dataset...
The second summary paragraph.
The third summary paragraph.  Escaped because it contains the delimiter in a URL https://dummy.domain
authors: A B, C D
institution: BAS (British Antarctic Survey).
latitude: -73.86 (degree_north)
longitude: -65.46 (degree_east)
elevation: 1897 (m a.s.l.)
[a]: 2012 not a complete year
+----+--------------+-------------+
|    |         time |   depth (m) |
|    |   (year) [a] |             |
|----+--------------+-------------|
|  0 |         2012 |       0.575 |
|  1 |         2011 |       1.125 |
|  2 |         2010 |       2.225 |
+----+--------------+-------------+

All the colour theming is handled by the blessed package, and the data table formatting is handled by the tabulate package.

Note here that we're calling xcsv-utils as a module main. As a convenience, this invocation is wrapped as a console script (called xcsv_print) when installing the package, hence the following invocation is equivalent:

$ xcsv_print example.csv

A key feature of the CLI, is that the attributes of the XCSV object can be inspected individually. When called with no options, the header is printed in a themed and formatted way, followed by the data table, which is also themed and formatted. This is more readable than simply cating the file. The modes of the CLI are:

  • No options: Print the header, followed by the data table.
  • -H: Print the header only.
  • -C: Print the column headers only. These are printed with a numeric index, so that optionally a subset of columns can be specified when printing the data table.
  • -D: Print the data table only. The rows have a leading row index column (as per usual with pandas dataframes), so that optionally a subset of rows can be specified when printing the data table.

For example:

$ xcsv_print example.csv -H
id: 1
title: The title
summary: This dataset...
The second summary paragraph.
The third summary paragraph.  Escaped because it contains the delimiter in a URL https://dummy.domain
authors: A B, C D
institution: BAS (British Antarctic Survey).
latitude: -73.86 (degree_north)
longitude: -65.46 (degree_east)
elevation: 1897 (m a.s.l.)
[a]: 2012 not a complete year
$ xcsv_print example.csv -C
   0 time (year) [a]
   1 depth (m)
$ xcsv_print example.csv -D
+----+--------------+-------------+
|    |         time |   depth (m) |
|    |   (year) [a] |             |
|----+--------------+-------------|
|  0 |         2012 |       0.575 |
|  1 |         2011 |       1.125 |
|  2 |         2010 |       2.225 |
+----+--------------+-------------+

These modes are mutually exclusive. Each of these options can be combined with the verbose (-v) option, to highlight how that attribute was parsed. For example, when verbosely printing the header, any numeric values with units, and any list items, will be highlighted:

$ xcsv_print example.csv -Hv
id: 1
title: The title
summary: ['This dataset...', 'The second summary paragraph.', 'The third summary paragraph.  Escaped because it contains the delimiter in a URL https://dummy.domain']
authors: A B, C D
institution: BAS (British Antarctic Survey).
latitude: value: -73.86, units: degree_north
longitude: value: -65.46, units: degree_east
elevation: value: 1897, units: m a.s.l.
[a]: 2012 not a complete year

Similarly for the column headers:

$ xcsv_print example.csv -Cv
   0 name: time, units: year, notes: a
   1 name: depth, units: m, notes: None

and the data table:

$ xcsv_print example.csv -Dv
+----+---------------+---------------+
|    |    name: time |   name: depth |
|    |   units: year |      units: m |
|    |      notes: a |   notes: None |
|----+---------------+---------------|
|  0 |          2012 |         0.575 |
|  1 |          2011 |         1.125 |
|  2 |          2010 |         2.225 |
+----+---------------+---------------+

The data table also has an extra verbose (-vv) option whereby it will resolve any table notes to the notes text from the coresponding header item:

$ xcsv_print example.csv -Dvv
+----+-----------------------------------+---------------+
|    |                        name: time |   name: depth |
|    |                       units: year |      units: m |
|    |   notes: a -> 2012 not a complete |   notes: None |
|    |                              year |               |
|----+-----------------------------------+---------------|
|  0 |                              2012 |         0.575 |
|  1 |                              2011 |         1.125 |
|  2 |                              2010 |         2.225 |
+----+-----------------------------------+---------------+

There are further useful options for printing the data table:

  • -c: Specify a subset of columns to print. This can be useful when the data table contains a large number of columns.
  • -r: Similarly, specify a subset of rows to print.
  • --head: Convenience function. Print only the first 10 data rows.
  • --tail: Convenience function. Print only the last 10 data rows.

All column and row indices are zero-based, and can be ascertained from the -C and -D options respectively. They can be specified as comma-separated lists and/or hyphen-separated (inclusive) ranges.

Of course the OS head and tail (and less) commands can be used to limit the output of the table, but doing so will lose the theming. Moreover though, if the output includes the header, then head will apply to the header, and will likely not get as far as showing any of the data table.

For example:

$ xcsv_print -D --head A68_icebergs_positions_and_dimensions_2020-21-A68A.csv
+----+------------+--------------------+--------------------+---------------------------+---------------------------+---------------------------+---------------------------+-------------------------+--------------------------+-------------+-----------------------------------------------------------------+
|    | Date       |   A68A_Central_Lat |   A68A_Central_Lon |   A68A_Long_axis_end_Lat1 |   A68A_Long_axis_end_Lon1 |   A68A_Long_axis_end_Lat2 |   A68A_Long_axis_end_Lon2 |   A68A_Long_axis_length |   A68A_Short_axis_length |   A68A_Area | Dimensions_source                                               |
|    |            |     (degree_north) |      (degree_east) |            (degree_north) |             (degree_east) |            (degree_north) |             (degree_east) |                    (km) |                     (km) |       (km2) |                                                                 |
|----+------------+--------------------+--------------------+---------------------------+---------------------------+---------------------------+---------------------------+-------------------------+--------------------------+-------------+-----------------------------------------------------------------|
|  0 | 2020-09-01 |             -59.42 |             -49.47 |                    -59.63 |                    -50.51 |                    -59.26 |                    -47.98 |                 nan     |                  nan     |      nan    | 9999                                                            |
|  1 | 2020-09-02 |             -59.42 |             -49.49 |                    -59.8  |                    -50.52 |                    -59.11 |                    -48.3  |                 nan     |                  nan     |      nan    | 9999                                                            |
|  2 | 2020-09-03 |             -59.46 |             -49.7  |                    -60.04 |                    -50.18 |                    -58.88 |                    -48.9  |                 149.375 |                   49.875 |     4451.66 | s1a-ew-grd-hh-20200903t223918-20200903t224018-034200-03f92b-001 |
|  3 | 2020-09-04 |             -59.43 |             -49.72 |                    -60.11 |                    -49.77 |                    -58.78 |                    -49.43 |                 nan     |                  nan     |      nan    | 9999                                                            |
|  4 | 2020-09-05 |             -59.45 |             -49.72 |                    -60.06 |                    -49.49 |                    -58.73 |                    -49.72 |                 nan     |                  nan     |      nan    | 9999                                                            |
|  5 | 2020-09-06 |             -59.44 |             -49.66 |                    -59.98 |                    -49.19 |                    -58.69 |                    -50.02 |                 nan     |                  nan     |      nan    | 9999                                                            |
|  6 | 2020-09-07 |             -59.38 |             -49.69 |                    -59.84 |                    -49    |                    -58.69 |                    -50.27 |                 nan     |                  nan     |      nan    | 9999                                                            |
|  7 | 2020-09-08 |             -59.31 |             -49.66 |                    -59.73 |                    -48.85 |                    -58.7  |                    -50.48 |                 nan     |                  nan     |      nan    | 9999                                                            |
|  8 | 2020-09-09 |             -59.29 |             -49.68 |                    -59.72 |                    -48.85 |                    -58.67 |                    -50.47 |                 nan     |                  nan     |      nan    | 9999                                                            |
|  9 | 2020-09-10 |             -59.26 |             -49.7  |                    -59.7  |                    -48.85 |                    -58.64 |                    -50.45 |                 nan     |                  nan     |      nan    | 9999                                                            |
+----+------------+--------------------+--------------------+---------------------------+---------------------------+---------------------------+---------------------------+-------------------------+--------------------------+-------------+-----------------------------------------------------------------+
$ xcsv_print -C A68_icebergs_positions_and_dimensions_2020-21-A68A.csv
   0 Date
   1 A68A_Central_Lat (degree_north)
   2 A68A_Central_Lon (degree_east)
   3 A68A_Long_axis_end_Lat1 (degree_north)
   4 A68A_Long_axis_end_Lon1 (degree_east)
   5 A68A_Long_axis_end_Lat2 (degree_north)
   6 A68A_Long_axis_end_Lon2 (degree_east)
   7 A68A_Long_axis_length (km)
   8 A68A_Short_axis_length (km)
   9 A68A_Area (km2)
  10 Dimensions_source
$ xcsv_print -D --head -c 0-2 A68_icebergs_positions_and_dimensions_2020-21-A68A.csv
+----+------------+--------------------+--------------------+
|    | Date       |   A68A_Central_Lat |   A68A_Central_Lon |
|    |            |     (degree_north) |      (degree_east) |
|----+------------+--------------------+--------------------|
|  0 | 2020-09-01 |             -59.42 |             -49.47 |
|  1 | 2020-09-02 |             -59.42 |             -49.49 |
|  2 | 2020-09-03 |             -59.46 |             -49.7  |
|  3 | 2020-09-04 |             -59.43 |             -49.72 |
|  4 | 2020-09-05 |             -59.45 |             -49.72 |
|  5 | 2020-09-06 |             -59.44 |             -49.66 |
|  6 | 2020-09-07 |             -59.38 |             -49.69 |
|  7 | 2020-09-08 |             -59.31 |             -49.66 |
|  8 | 2020-09-09 |             -59.29 |             -49.68 |
|  9 | 2020-09-10 |             -59.26 |             -49.7  |
+----+------------+--------------------+--------------------+
$ xcsv_print -D --head -c 0,3-6 A68_icebergs_positions_and_dimensions_2020-21-A68A.csv
+----+------------+---------------------------+---------------------------+---------------------------+---------------------------+
|    | Date       |   A68A_Long_axis_end_Lat1 |   A68A_Long_axis_end_Lon1 |   A68A_Long_axis_end_Lat2 |   A68A_Long_axis_end_Lon2 |
|    |            |            (degree_north) |             (degree_east) |            (degree_north) |             (degree_east) |
|----+------------+---------------------------+---------------------------+---------------------------+---------------------------|
|  0 | 2020-09-01 |                    -59.63 |                    -50.51 |                    -59.26 |                    -47.98 |
|  1 | 2020-09-02 |                    -59.8  |                    -50.52 |                    -59.11 |                    -48.3  |
|  2 | 2020-09-03 |                    -60.04 |                    -50.18 |                    -58.88 |                    -48.9  |
|  3 | 2020-09-04 |                    -60.11 |                    -49.77 |                    -58.78 |                    -49.43 |
|  4 | 2020-09-05 |                    -60.06 |                    -49.49 |                    -58.73 |                    -49.72 |
|  5 | 2020-09-06 |                    -59.98 |                    -49.19 |                    -58.69 |                    -50.02 |
|  6 | 2020-09-07 |                    -59.84 |                    -49    |                    -58.69 |                    -50.27 |
|  7 | 2020-09-08 |                    -59.73 |                    -48.85 |                    -58.7  |                    -50.48 |
|  8 | 2020-09-09 |                    -59.72 |                    -48.85 |                    -58.67 |                    -50.47 |
|  9 | 2020-09-10 |                    -59.7  |                    -48.85 |                    -58.64 |                    -50.45 |
+----+------------+---------------------------+---------------------------+---------------------------+---------------------------+
$ xcsv_print -D --head -c 0,7-8 A68_icebergs_positions_and_dimensions_2020-21-A68A.csv
+----+------------+-------------------------+--------------------------+
|    | Date       |   A68A_Long_axis_length |   A68A_Short_axis_length |
|    |            |                    (km) |                     (km) |
|----+------------+-------------------------+--------------------------|
|  0 | 2020-09-01 |                 nan     |                  nan     |
|  1 | 2020-09-02 |                 nan     |                  nan     |
|  2 | 2020-09-03 |                 149.375 |                   49.875 |
|  3 | 2020-09-04 |                 nan     |                  nan     |
|  4 | 2020-09-05 |                 nan     |                  nan     |
|  5 | 2020-09-06 |                 nan     |                  nan     |
|  6 | 2020-09-07 |                 nan     |                  nan     |
|  7 | 2020-09-08 |                 nan     |                  nan     |
|  8 | 2020-09-09 |                 nan     |                  nan     |
|  9 | 2020-09-10 |                 nan     |                  nan     |
+----+------------+-------------------------+--------------------------+
# Replicate the combined --head and --tail functionality with the -r option
$ xcsv_print -D -c 0,7-8 -r 0-9,218-227 A68_icebergs_positions_and_dimensions_2020-21-A68A.csv
+-----+------------+-------------------------+--------------------------+
|     | Date       |   A68A_Long_axis_length |   A68A_Short_axis_length |
|     |            |                    (km) |                     (km) |
|-----+------------+-------------------------+--------------------------|
|   0 | 2020-09-01 |                 nan     |                  nan     |
|   1 | 2020-09-02 |                 nan     |                  nan     |
|   2 | 2020-09-03 |                 149.375 |                   49.875 |
|   3 | 2020-09-04 |                 nan     |                  nan     |
|   4 | 2020-09-05 |                 nan     |                  nan     |
|   5 | 2020-09-06 |                 nan     |                  nan     |
|   6 | 2020-09-07 |                 nan     |                  nan     |
|   7 | 2020-09-08 |                 nan     |                  nan     |
|   8 | 2020-09-09 |                 nan     |                  nan     |
|   9 | 2020-09-10 |                 nan     |                  nan     |
| 218 | 2021-04-07 |                 nan     |                  nan     |
| 219 | 2021-04-08 |                 nan     |                  nan     |
| 220 | 2021-04-09 |                 nan     |                  nan     |
| 221 | 2021-04-10 |                 nan     |                  nan     |
| 222 | 2021-04-11 |                 nan     |                  nan     |
| 223 | 2021-04-12 |                 nan     |                  nan     |
| 224 | 2021-04-13 |                 nan     |                  nan     |
| 225 | 2021-04-14 |                 nan     |                  nan     |
| 226 | 2021-04-15 |                 nan     |                  nan     |
| 227 | 2021-04-16 |                 nan     |                  nan     |
+-----+------------+-------------------------+--------------------------+

In general, the default theme works OK in a dark- or light- background terminal. The default theme (light) assumes a dark background. If highlighted text is difficult to read on a light background, then specify the dark theme instead (-t dark).

In addition to using the CLI, the package can be used as a Python library. The main class is Print which provides methods to pretty-print a given dataset (XCSV object):

>>> import xcsv
>>> import xcsv.utils as xu
>>> filename = 'example.csv'
>>> with xcsv.File(filename) as f:
>>>     dataset = f.read()
>>> printer = xu.Print(metadata=dataset.metadata, data=dataset.data)
>>> printer.print_data()
+----+--------------+-------------+
|    |         time |   depth (m) |
|    |   (year) [a] |             |
|----+--------------+-------------|
|  0 |         2012 |       0.575 |
|  1 |         2011 |       1.125 |
|  2 |         2010 |       2.225 |
+----+--------------+-------------+
# To access the formatted and themed string, we can do the following
>>> output = printer.format_data()
>>> output
'+----+----------+---------+\n|    |     \x1b[38;2;190;190;190m\x1b[38;2;190;190;190m\x1b[38;2;190;190;190mtime\x1b[m\x1b[m\x1b[m |   \x1b[38;2;190;190;190m\x1b[38;2;190;190;190m\x1b[38;2;190;190;190mdepth\x1b[m\x1b[m |\n|    |   \x1b[38;2;190;190;190m\x1b[38;2;190;190;190m\x1b[38;2;190;190;190m(year)\x1b[m\x1b[m |     \x1b[38;2;190;190;190m\x1b[38;2;190;190;190m(m)\x1b[m\x1b[m\x1b[m |\n|    |      \x1b[38;2;190;190;190m\x1b[38;2;190;190;190m[a]\x1b[m\x1b[m\x1b[m |         |\n|----+----------+---------|\n|  0 |     2012 |   0.575 |\n|  1 |     2011 |   1.125 |\n|  2 |     2010 |   2.225 |\n+----+----------+---------+'
>>> print(output)
+----+----------+---------+
|    |     time |   depth |
|    |   (year) |     (m) |
|    |      [a] |         |
|----+----------+---------|
|  0 |     2012 |   0.575 |
|  1 |     2011 |   1.125 |
|  2 |     2010 |   2.225 |
+----+----------+---------+
>>> printer.columns = [1]
>>> printer.print_data()
+----+---------+
|    |   depth |
|    |     (m) |
|----+---------|
|  0 |   0.575 |
|  1 |   1.125 |
|  2 |   2.225 |
+----+---------+
>>> printer.print_header()
id: 1
title: The title
summary: This dataset...
The second summary paragraph.
The third summary paragraph.  Escaped because it contains the delimiter in a URL https://dummy.domain
authors: A B, C D
institution: BAS (British Antarctic Survey).
latitude: -73.86 (degree_north)
longitude: -65.46 (degree_east)
elevation: 1897 (m a.s.l.)
[a]: 2012 not a complete year
>>> printer.verbose = 1
>>> printer.print_header()
id: 1
title: The title
summary: ['This dataset...', 'The second summary paragraph.', 'The third summary paragraph.  Escaped because it contains the delimiter in a URL https://dummy.domain']
authors: A B, C D
institution: BAS (British Antarctic Survey).
latitude: value: -73.86, units: degree_north
longitude: value: -65.46, units: degree_east
elevation: value: 1897, units: m a.s.l.
[a]: 2012 not a complete year
>>> printer.print_column_headers()
   0 name: time, units: year, notes: a
   1 name: depth, units: m, notes: None

Command line usage

Calling the script with the --help option will show the following usage:

$ python3 -m xcsv.utils --help
usage: xcsv_print [-h] [-H] [-C] [-D] [-c COLUMNS] [-r ROWS] [--head]
                  [--tail] [-t {light,dark}] [-v] [-V]
                  in_file

print xcsv file

positional arguments:
  in_file               input XCSV file

optional arguments:
  -h, --help            show this help message and exit
  -H, --header-only     show the extended header section only
  -C, --column-headers-only
                        show the data table column headers only
  -D, --data-only       show the data table only
  -c COLUMNS, --columns COLUMNS
                        columns to include in the data table (specify
                        multiple columns separated by commas and/or as
                        hyphen-separated ranges)
  -r ROWS, --rows ROWS  rows to include in the data table (specify multiple
                        rows separated by commas and/or as hyphen-separated
                        ranges)
  --head                only show the first 10 rows of the data table
  --tail                only show the last 10 rows of the data table
  -t {light,dark}, --theme {light,dark}
                        use the named theme to apply styling to the output
  -v, --verbose         show incrementally verbose output
  -V, --version         show program's version number and exit

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

xcsv_utils-0.1.1.tar.gz (18.1 kB view details)

Uploaded Source

Built Distribution

xcsv_utils-0.1.1-py3-none-any.whl (14.4 kB view details)

Uploaded Python 3

File details

Details for the file xcsv_utils-0.1.1.tar.gz.

File metadata

  • Download URL: xcsv_utils-0.1.1.tar.gz
  • Upload date:
  • Size: 18.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.2.2 CPython/3.8.10 Linux/5.15.0-91-generic

File hashes

Hashes for xcsv_utils-0.1.1.tar.gz
Algorithm Hash digest
SHA256 7cfc756e24ec47047760124f7f08166eb53d4c7707308b38ed7b830172e2653a
MD5 4594947004a7427e235e96f9dbbde31a
BLAKE2b-256 6138e287982e3a549cb8c050d2ca7ddbb738dc00a94c0b8448ff721c50f21b5b

See more details on using hashes here.

File details

Details for the file xcsv_utils-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: xcsv_utils-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 14.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.2.2 CPython/3.8.10 Linux/5.15.0-91-generic

File hashes

Hashes for xcsv_utils-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 3eabb1323bae36760b08e9d30df95376d8520dc502d520349dd9b5dfdc2bdb96
MD5 6a1a2e53883d4d6c1d1882d1d7c8c995
BLAKE2b-256 f25f4a7d528091e30e88557ab012f2dffca96bac6cdaccefa73fbbf7561bcf50

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page