
Vertical view of delimited text records


csvchk

Check one record of a delimited text file

This program will show you the first record of a delimited text file transposed vertically. It is meant to complement the many features of the csvkit tools. For example, given a file like this:

$ csvlook test/test.csv
| id | val |
| -- | --- |
|  1 | foo |
|  2 | bar |
|  3 | baz |

This program will show:

// ****** Record 1 ****** //
id  : 1
val : foo
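The vertical layout above can be approximated in a few lines of Python. This is a minimal sketch and not csvchk's actual source: the function name `format_record` and the in-memory stand-in for test/test.csv are assumptions made for illustration.

```python
import csv
import io


def format_record(record: dict, number: int) -> str:
    """Render one record vertically, one aligned 'name : value' line per field."""
    # Pad every field name to the width of the longest name in this record
    width = max((len(name) for name in record), default=0)
    lines = [f"// ****** Record {number} ****** //"]
    for name, value in record.items():
        lines.append(f"{name:<{width}} : {value}")
    return "\n".join(lines)


# Hypothetical in-memory stand-in for test/test.csv
data = io.StringIO("id,val\n1,foo\n2,bar\n")
first = next(csv.DictReader(data))
print(format_record(first, 1))
```

Padding names to the longest one in the record is what keeps the colons aligned.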

Usage and options

Run with -h or --help for a full usage:

$ ./csvchk.py -h
usage: csvchk.py [-h] [-s sep] [-f names] [-l nrecs] [-L nrecs] [-g grep] [-d]
                 [-n] [-N] [-e encode] [--version]
                 FILE [FILE ...]

Check a delimited text file

positional arguments:
  FILE                  Input file(s)

optional arguments:
  -h, --help            show this help message and exit
  -s sep, --sep sep     Field separator (default: )
  -f names, --fieldnames names
                        Field names (no header) (default: )
  -l nrecs, --limit nrecs
                        How many records to show (default: 1)
  -L nrecs, --field-limit nrecs
                        How many fields to show (default: 0)
  -g grep, --grep grep  Only show records with a given value (default: )
  -d, --dense           Not sparse (skip empty fields) (default: False)
  -n, --number          Show field number (e.g., for awk) (default: False)
  -N, --noheaders       No headers in first row (default: False)
  -e encode, --encoding encode
                        File encoding (default: utf-8)
  --version             show program's version number and exit

Separator

The default field separator is a tab character, or a comma if the input file has the extension .csv. You can override this with the -s or --sep option.

For example, given this file:

$ cat test/test2.txt
id:val
1:foo
2:bar

You could run:

$ csvchk -s ':' test/test2.txt
// ****** Record 1 ****** //
id  : 1
val : foo
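The separator rule described in this section could be sketched as follows. The helper name `guess_separator` is hypothetical, and the choice of a comma for .csv files and an exact, case-sensitive extension match are assumptions rather than confirmed csvchk behavior.

```python
import os


def guess_separator(filename: str, sep: str = "") -> str:
    """Return an explicit separator if one is given, else guess from the extension."""
    if sep:
        return sep
    # Assumption: .csv means comma-separated; everything else is tab-separated
    ext = os.path.splitext(filename)[1]
    return "," if ext == ".csv" else "\t"


print(repr(guess_separator("test/test.csv")))
print(repr(guess_separator("test/test2.txt", ":")))
```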

Field names

The input file is assumed to contain column headers (field names) in the first row. If a file has no headers, you can use the -f or --fieldnames option to provide a comma-separated string of names to use instead.

For example, given this file:

$ cat test/nohdr.csv
1,foo
2,bar

You can run:

$ csvchk -f 'id, value' test/nohdr.csv
// ****** Record 1 ****** //
id    : 1
value : foo
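In Python terms, supplying field names for a headerless file resembles passing `fieldnames` to `csv.DictReader`, which then treats the first row as data. The sketch below is illustrative only; `parse_fieldnames` is a hypothetical helper, and stripping whitespace around each name is an assumption based on the 'id, value' example above.

```python
import csv
import io


def parse_fieldnames(names: str) -> list:
    """Split a comma-separated string of names, trimming surrounding spaces."""
    return [name.strip() for name in names.split(",")]


# Hypothetical in-memory stand-in for test/nohdr.csv
data = io.StringIO("1,foo\n2,bar\n")
reader = csv.DictReader(data, fieldnames=parse_fieldnames("id, value"))
first = next(reader)
print(dict(first))
```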

Limit

By default, the -l or --limit value is 1, so only the first record is shown. You can increase this value, for example:

$ csvchk -l 2 test/test.csv
// ****** Record 1 ****** //
id  : 1
val : foo
// ****** Record 2 ****** //
id  : 2
val : bar

To see all the records, use a negative value like -1:

$ csvchk -l -1 test/test.csv
// ****** Record 1 ****** //
id  : 1
val : foo
// ****** Record 2 ****** //
id  : 2
val : bar
// ****** Record 3 ****** //
id  : 3
val : baz
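The limit semantics, where a negative value means "all records," map naturally onto `itertools.islice`. This is a sketch under that reading; the function name `take` is made up for the example.

```python
import itertools


def take(records, limit: int = 1):
    """Return an iterable of at most `limit` records; a negative limit means all."""
    if limit < 0:
        return records
    return itertools.islice(records, limit)


rows = ["1,foo", "2,bar", "3,baz"]
print(list(take(iter(rows), 2)))
print(list(take(iter(rows), -1)))
```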

Dense output

By default, all fields and values will be shown for each record. For example, given this file:

$ cat test/sparse.csv
id,val
1,foo
2,
,baz

This will be shown:

$ csvchk test/sparse.csv -l -1
// ****** Record 1 ****** //
id  : 1
val : foo
// ****** Record 2 ****** //
id  : 2
val :
// ****** Record 3 ****** //
id  :
val : baz

You can use the -d or --dense option to omit fields that have no values:

$ csvchk test/sparse.csv -l -1 -d
// ****** Record 1 ****** //
id  : 1
val : foo
// ****** Record 2 ****** //
id : 2
// ****** Record 3 ****** //
val : baz
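Dense output amounts to dropping empty fields from each record before printing. A minimal sketch, assuming that "no value" means an empty string or a missing (None) value; `drop_empty` is a hypothetical name.

```python
def drop_empty(record: dict) -> dict:
    """Return a copy of the record without fields whose value is empty or missing."""
    return {name: value for name, value in record.items() if value not in ("", None)}


# Records corresponding to test/sparse.csv
records = [
    {"id": "1", "val": "foo"},
    {"id": "2", "val": ""},
    {"id": "", "val": "baz"},
]
for record in records:
    print(drop_empty(record))
```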

Numbering fields

The -n or --number option will prepend the field number to each line of output:

$ csvchk -n test/test.tab
// ****** Record 1 ****** //
  1 id  : 1
  2 val : foo

This is useful when you need the field number for a tool like awk. For example, to find records where the val column (in the second position) contains an "a":

$ awk '$2 ~ /a/' test/test.tab
id	val
2	bar
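Field numbering is 1-based, matching awk's `$1`, `$2`, and so on. A rough sketch of producing the numbered labels, with the three-character number width chosen to match the sample output above (an assumption, as is the name `number_fields`):

```python
def number_fields(names):
    """Prefix each field name with its 1-based position, right-aligned to width 3."""
    return [f"{position:3} {name}" for position, name in enumerate(names, start=1)]


for label in number_fields(["id", "val"]):
    print(label)
```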

No headers

If the input file does not have headers (column names) in the first row, you can use the -N or --noheaders option to have the program create names like Field1, Field2, and so on:

$ csvchk -N test/nohdr.csv
// ****** Record 1 ****** //
Field1 : 1
Field2 : foo
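Generating placeholder names only requires knowing how many fields the first row has. The following is a sketch, not the program's own code; `default_fieldnames` is a hypothetical helper.

```python
import csv
import io


def default_fieldnames(num_fields: int) -> list:
    """Build names Field1..FieldN for a file that has no header row."""
    return [f"Field{i}" for i in range(1, num_fields + 1)]


# Hypothetical in-memory stand-in for test/nohdr.csv
rows = list(csv.reader(io.StringIO("1,foo\n2,bar\n")))
names = default_fieldnames(len(rows[0]))
first = dict(zip(names, rows[0]))
print(first)
```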

Filter by record contents

You can use the -g or --grep option to view only records containing a string:

$ csvchk -g ba -l 2 tests/test.csv
// ****** Record 1 ****** //
id  : 2
val : bar
// ****** Record 2 ****** //
id  : 3
val : baz
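Filtering can be pictured as keeping only records in which some field contains the search string. The sketch below assumes a plain substring match against field values; whether csvchk treats the value as a regular expression or matches the raw line is not stated here, and `grep_records` is a hypothetical name.

```python
def grep_records(records, pattern: str):
    """Yield records in which at least one field value contains `pattern`."""
    for record in records:
        if any(pattern in str(value) for value in record.values()):
            yield record


records = [
    {"id": "1", "val": "foo"},
    {"id": "2", "val": "bar"},
    {"id": "3", "val": "baz"},
]
for record in grep_records(records, "ba"):
    print(record)
```

Note that in the output above the matching records are renumbered from 1, so the record number reflects position among the matches rather than position in the file.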

Multiple file inputs

If given multiple files as inputs, the program will insert a header noting the basename of each file:

$ csvchk test/test.csv test/test.tab
==> test.csv <==
// ****** Record 1 ****** //
id  : 1
val : foo

==> test.tab <==
// ****** Record 1 ****** //
id  : 1
val : foo
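The per-file banner uses only the basename of each path. A small sketch of building that line with `os.path.basename` (the helper name `file_header` is an assumption):

```python
import os


def file_header(path: str) -> str:
    """Format the banner shown before each file when several files are given."""
    return f"==> {os.path.basename(path)} <=="


for path in ["test/test.csv", "test/test.tab"]:
    print(file_header(path))
```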

Duplicate column names

Duplicate column names will have a suffix of _<num> starting at the second occurrence. For instance, this file:

$ cat tests/duplicate_cols.csv
name,age,age
Keith,42,42
Jorge,35,35
Geoffrey,51,51

Will produce this output:

$ csvchk tests/duplicate_cols.csv
// ****** Record 1 ****** //
name  : Keith
age   : 42
age_2 : 42
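One way to produce such suffixes is to count how often each name has been seen so far. This is an illustrative sketch rather than the program's implementation; `dedupe_names` is a hypothetical name, and the sketch does not handle the case where a generated name such as age_2 already exists as a real column.

```python
from collections import Counter


def dedupe_names(names):
    """Append _2, _3, ... to the second and later occurrences of a name."""
    seen = Counter()
    unique = []
    for name in names:
        seen[name] += 1
        unique.append(name if seen[name] == 1 else f"{name}_{seen[name]}")
    return unique


print(dedupe_names(["name", "age", "age"]))
```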

Limiting the columns shown

You can limit the number of columns shown with the -L or --field-limit option:

$ csvchk --field-limit 1 tests/test.csv
// ****** Record 1 ****** //
id  : 1
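Limiting columns can be sketched as keeping only the first N fields of each record. The sketch assumes that the default of 0 means "no limit"; `limit_fields` is a hypothetical helper.

```python
def limit_fields(record: dict, field_limit: int = 0) -> dict:
    """Keep only the first `field_limit` fields; 0 (or less) keeps them all."""
    if field_limit <= 0:
        return dict(record)
    return dict(list(record.items())[:field_limit])


print(limit_fields({"id": "1", "val": "foo"}, 1))
```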

Author

Ken Youens-Clark <kyclark@gmail.com>
