A tool to manipulate csv files with headers.
Project description
csvspoon: a tool to manipulate csv file with headers
Again, again, and again.
Installing
From pypi:
pipx install csvspoon
Or developer version:
git clone <this repo>
cd csvspoon
# make and activate a venv
pip3 install -e .
Enable completion (for bash or other shells using bash-completion)
mkdir -p ~/.local/share/bash-completion/completions
register-python-argcomplete csvspoon > ~/.local/share/bash-completion/completions/csvspoon
Python module
All methods and functions are accessible in the python module.
Cli example
csvspoon cat: Concatenate CSV files
- Cat two csv files:
csvspoon cat file1.csv file2.csv
- Display a csv file with a high number of columns:
csvspoon cat -S wide_data.csv
- Change delimiter of a csv file:
csvspoon cat -d "\t" -u ";" file.csv > result.csv
- Change delimiter of a csv file with specified output:
csvspoon cat -o result.csv -d "\t" -u ";" file.csv
- Reformat two columns of a csv files:
csvspoon cat -f a_colname:5.1f -f another_colname:04d file.csv
- Cat one csv file, keeping only a column:
csvspoon cat file.csv:a_col
- Cat two csv files, renaming a column on the second file:
csvspoon cat file1.csv file2.csv:new_col=old_col,another_col
- Cat a csv file, keeping all columns except one:
csvspoon cat file.csv:-col_to_drop
- Cat a csv file with a renamed column and one excluded (all other cols kept):
csvspoon cat file.csv:display_name=internal_name,-skip_col
csvspoon apply: Apply functions to add columns
- Combine text columns by a formula:
csvspoon apply -a name "lastname.upper()+' '+firstname.lower()" file.csv
- Column names with spaces (use --rowvar):
csvspoon apply --rowvar r -a "full name" "r['first name']+' '+r['last name']" file.csv
- Combine text columns by a formula and remove original columns:
csvspoon apply -a name "lastname.upper()+' '+firstname.lower()" \
-O-lastname,-firstname file.csv
- Sum to integer columns:
csvspoon apply -t cola:int -t colb:int -a colsum "cola+colb" file.csv
- Sum to integer columns and format the result:
csvspoon apply -t cola:int -t colb:int -a colsum:05d "cola+colb" file.csv
- Compute complex expression between columns:
csvspoon apply \
-b "import math" \
-t x:float \
-t y:float \
-a norm "math.sqrt(x**2+y**2)" \
file.csv
- Use a custom function from a local module (current dir) for a new column:
csvspoon apply -p . -b "from stuff import myfun" -a new_col "myfun(col1, col2)" file.csv
- Multiple computation can be done reusing newly created columns:
csvspoon apply -t x:int -a x2p1 "x**2+1" -a x2p1m1 "x2p1-1" file.csv
csvspoon sort: Sort CSV file
- Sort csv file using column cola:
csvspoon sort -k cola file.csv
- Sort csv file using columns cola and colb:
csvspoon sort -k cola -k colb file.csv
- Sort csv file using numerical mode on column numcol:
csvspoon sort -n -k numcol file.csv
- Shuffle csv file:
csvspoon sort -R file.csv
csvspoon filter: Filter CSV from given conditions
- Filter csv file using two columns:
csvspoon filter -a "lastname!=firstname" file.csv
- Column name with spaces (use --rowvar):
csvspoon filter --rowvar r -a "r['unit price']>10" file.csv
- Chain filters on csv file:
csvspoon filter \
-a "lastname.startswith('Doe')" \
-a "firstname.starswith('John')" \
file.csv
- Filter csv file with float column price:
csvspoon filter -t price:float -a "price>12.5" file.csv
- Filter csv file with complex expression:
csvspoon filter \
-b "import math" \
-t x:float \
-t y:float \
-t z:float \
-a "math.sqrt(x**2+y**2)>z" \
file.csv
csvspoon join: Join CSV files
- Operate NATURAL JOIN on two csv files:
csvspoon join file1.csv file2.csv
- Operate two NATURAL JOIN on three csv files:
csvspoon join file1.csv file2.csv file3.csv
- Operate LEFT JOIN on two csv files
csvspoon join -l file1.csv file2.csv
- Operate RIGHT JOIN on two csv files
csvspoon join -r file1.csv file2.csv
- Operate OUTER JOIN on two csv files
csvspoon join -lr file1.csv file2.csv
csvspoon aggregate: Compute aggregation on CSV file
- Keeping unique lines, one line per group:
csvspoon aggregate \
-k group \
file.csv
- Column name with spaces (use --rowvar):
csvspoon aggregate --rowvar r -k category -a total "sum(r['unit price'])" file.csv
- Computing the total mean grade:
csvspoon aggregate \
--np \
-t grade:float \
-a meangrade "np.mean(grade)" \
file.csv
- Computing the total mean grade specifing a format:
csvspoon aggregate \
--np \
-t grade:float \
-a meangrade:.2f "np.mean(grade)" \
file.csv
- Computing the mean grade by group:
csvspoon aggregate \
--np \
-t grade:float \
-a meangrade "np.mean(grade)" \
-k group \
file.csv
- Computing the mean grade, median, standard deviation by group:
csvspoon aggregate \
--np \
-t grade:float \
-a meangrade "np.mean(grade)" \
-a mediangrade "np.median(grade)" \
-a stdgrade "np.std(grade)" \
-k group \
file.csv
csvspoon sample: Sample rows in CSV file
- Sample 10 rows without replacement:
csvspoon sample -k 10 file.csv
- Weighted sample of 10 rows using column weight:
csvspoon sample -W weight -k 10 file.csv
- Sample 30 rows with replacement:
csvspoon sample -r -k 30 file.csv
Cli usage
usage: csvspoon [-h]
{aggregate,agg,apply,ap,cat,c,filter,fi,join,jo,sample,sa,sort,so} ...
A tool to manipulate csv files with headers. Again, again and again.
options:
-h, --help show this help message and exit
subcommands:
{aggregate,agg,apply,ap,cat,c,filter,fi,join,jo,sample,sa,sort,so}
aggregate (agg) Apply a aggregation formula to compute a new column.
apply (ap) Apply a formula to compute a new column.
cat (c) Concatenate csv files.
filter (fi) Filter a csv with a formula.
join (jo) Operate join on csv files
sample (sa) Random sampling in streaming (with or without
replacement).
sort (so) Sort csv files.
csvspoon aggregate
usage: csvspoon aggregate [-h] [-d DELIM] [-c INPUTENC] [-o OUTPUT]
[-f FORMAT] [-O COLS] [-F OUTPUT_FORMAT] [-u ODELIM]
[-C OUTPUTENC] [-H N | -T N] [-D] [-w TEXTWIDTH |
-S] [--infer-lines INFER_LINES] [-L | -N]
[-b BEFORE] [--np] [--sp] [-p PATH] [-t TYPE]
[-r NAME] [-a COLSPEC FORMULA] [-k KEYS]
[input]
Apply a formula to compute a new column.
The formula must be a valid python expression evaluated for each groupped row.
With --rowvar NAME, do not use column names as variables; use the dictionary
NAME[column_name] instead.
Only aggregation or column with non ambiguous values are keeped. Warning: this
method need to store in memory all the input csv file.
positional arguments:
input Input file specification. If no input file is
provided, stdin is used as input file. Can be a
filename (e.g. "file.csv"), or a filename followed by
a semicolon and column names separated by commas (e.g.
"file.csv:a_colname,another_colname"). A column can be
renamed with "new_name=old_name" (e.g.
"file.csv:new_col=old_col"). When column names are
specified (and no exclusion is used), only those
columns are used, in the given order. Prefixing a
column with "-" excludes it. When at least one column
is excluded, all columns from the file are kept except
the excluded ones; columns that are also listed
(possibly with a rename) appear under their (new)
name. Excluded columns cannot be renamed (no "=" after
"-col").
options:
-h, --help show this help message and exit
Input options:
-d, --delim DELIM Input delimiter. (default: ',')
-c, --inputenc INPUTENC
Input encoding. (default: 'utf8')
Output options:
-o, --output OUTPUT Output file, else output on stdout.
-f, --format FORMAT Apply a format on a column on output. The argument
must be a column name followed by a colon and a format
specifier. e.g. "a_colname:5d" or "a_colname:+07.2f".
This option can be specified multiple time to format
different columns.
-O, --output-columns COLS
Output only the given columns, in that order. Same
format as after ':' in the input: column names,
optional renames (newname=oldname), optional
exclusions (-col). e.g. "a,b,label=id,-skip". These
transformations are applied after all other processing
(including the --format option). When excluding
columns (-col), use -O-col or --output-columns=-col
(no space, or use =) so that -col is not parsed as a
separate option.
-F, --output-format OUTPUT_FORMAT
Output format. Choices: auto, csv, terminal, ascii,
unicode, markdown (or md), latex (or tex). auto
(default): CSV except when output on stdout and stdout
is a tty, then terminal. csv: Raw CSV using the output
delimiter (-u/--output-delim). terminal: Pretty-print
with Unicode box-drawing for a nice display. ascii:
Pretty-print as text table with ASCII characters only.
unicode: Pretty-print as text table with Unicode box-
drawing characters. markdown (md): Pretty-print as a
Markdown table. latex (tex): Pretty-print as a LaTeX
tabular (booktabs).
-u, --output-delim ODELIM
Output delimiter. (default: ',')
-C, --outputenc OUTPUTENC
Output encoding. (default: 'utf8')
-H, --head N Output only the first N rows.
-T, --tail N Output only the last N rows.
-D, --dark-background
Use dark background colors for terminal output. This
option only has an effect when terminal pretty-print
is used.
-w, --textwidth TEXTWIDTH
Maximum total width of the table in characters
(including borders and padding) for pretty output
(terminal, ascii, unicode, markdown). Content is
truncated with "…" when needed. (default: terminal
width when stdout is a tty, else no limit)
-S, --keep-long-lines
Do not wrap or truncate long lines (mutually exclusive
with -w). Implies -L in terminal output mode except
when -N is specified.
--infer-lines INFER_LINES
Number of rows used to infer column types and sizes
for pretty output (markdown, terminal, etc.). Use
"inf" for no limit (infer on all rows). (default:
1000)
-L, --less Pipe output through less (only when writing to stdout
on a tty). Automatically activated if the number of
lines exceed the terminal size (except when '--no-
less' or '-N' is used).
-N, --no-less Never output through less (only when writing to stdout
on a tty).
Processing options:
-b, --before BEFORE Run the following code before evaluate the expression
on each row. Can be specified multiple times. (e.g.
"import math").
--np Shortcut to `--before "import numpy as np"`
--sp Shortcut to `--np --before "import scipy as sp"`
-p, --python-path PATH
Append path provided in argument to Python path.
Usefull in with -b. e.g. -p . -b "import myfunctions".
-t, --type TYPE Apply type conversion on specified command prior to
expression. The argument must be a column name
followed by a valid Python type. See "--before" to
define non standard type. e.g. "a_column:int" or
"a_column:float". This option can be specified
multiple time to type different columns.
-r, --rowvar NAME When set, row values are not exposed as local
variables in expressions; instead a single dictionary
mapping column names to values is exposed under this
name. Useful when column names contain spaces or
special characters. e.g. --rowvar row then use
row["col name"] in apply/aggregate/filter expressions.
-a, --add, --add-aggregation COLSPEC FORMULA
Append a new column by aggregation of values. Take two
argument, COLSPEC and FORMULA. COLSPEC is the name of
the created column obtained the aggregation. The
COLSPEC can also contains a colon and a format
specifier, see "--format" for example. FORMULA must be
a valid python expression. For each column, list of
values to aggregate are accessible as local variable
(or via the dict given by --rowvar NAME). The formula
should return a single value. e.g. "sum(a_colname) +
sum(other_colname)" or with --rowvar r: "sum(r['a
col']) + sum(r['other col'])". See "--type" for typing
other columns and "--before" for run code before
evaluating expression. Can be specified multiple time.
-k, --key KEYS Column used groupping the aggregate. Can be specified
multiple time. Similar to "GROUP BY" in SQL.
Examples:
Keeping unique lines, one line per group:
csvspoon aggregate \
-k group \
file.csv
Column name with spaces (use --rowvar):
csvspoon aggregate --rowvar r -k category -a total "sum(r['unit price'])" file.csv
Computing the total mean grade:
csvspoon aggregate \
--np \
-t grade:float \
-a meangrade "np.mean(grade)" \
file.csv
Computing the total mean grade specifing a format:
csvspoon aggregate \
--np \
-t grade:float \
-a meangrade:.2f "np.mean(grade)" \
file.csv
Computing the mean grade by group:
csvspoon aggregate \
--np \
-t grade:float \
-a meangrade "np.mean(grade)" \
-k group \
file.csv
Computing the mean grade, median, standard deviation by group:
csvspoon aggregate \
--np \
-t grade:float \
-a meangrade "np.mean(grade)" \
-a mediangrade "np.median(grade)" \
-a stdgrade "np.std(grade)" \
-k group \
file.csv
csvspoon apply
usage: csvspoon apply [-h] [-d DELIM] [-c INPUTENC] [-o OUTPUT] [-f FORMAT]
[-O COLS] [-F OUTPUT_FORMAT] [-u ODELIM] [-C OUTPUTENC]
[-H N | -T N] [-D] [-w TEXTWIDTH | -S]
[--infer-lines INFER_LINES] [-L | -N] [-b BEFORE] [--np]
[--sp] [-p PATH] [-t TYPE] [-r NAME]
[-a COLSPEC FORMULA]
[input]
Apply a formula to compute a new column.
The formula must be a valid python expression evaluated on each row. With
--rowvar NAME, do not use column names as variables; use the dictionary
NAME[column_name] instead.
This method is completely streamed and no data is stored in memory.
positional arguments:
input Input file specification. If no input file is
provided, stdin is used as input file. Can be a
filename (e.g. "file.csv"), or a filename followed by
a semicolon and column names separated by commas (e.g.
"file.csv:a_colname,another_colname"). A column can be
renamed with "new_name=old_name" (e.g.
"file.csv:new_col=old_col"). When column names are
specified (and no exclusion is used), only those
columns are used, in the given order. Prefixing a
column with "-" excludes it. When at least one column
is excluded, all columns from the file are kept except
the excluded ones; columns that are also listed
(possibly with a rename) appear under their (new)
name. Excluded columns cannot be renamed (no "=" after
"-col").
options:
-h, --help show this help message and exit
Input options:
-d, --delim DELIM Input delimiter. (default: ',')
-c, --inputenc INPUTENC
Input encoding. (default: 'utf8')
Output options:
-o, --output OUTPUT Output file, else output on stdout.
-f, --format FORMAT Apply a format on a column on output. The argument
must be a column name followed by a colon and a format
specifier. e.g. "a_colname:5d" or "a_colname:+07.2f".
This option can be specified multiple time to format
different columns.
-O, --output-columns COLS
Output only the given columns, in that order. Same
format as after ':' in the input: column names,
optional renames (newname=oldname), optional
exclusions (-col). e.g. "a,b,label=id,-skip". These
transformations are applied after all other processing
(including the --format option). When excluding
columns (-col), use -O-col or --output-columns=-col
(no space, or use =) so that -col is not parsed as a
separate option.
-F, --output-format OUTPUT_FORMAT
Output format. Choices: auto, csv, terminal, ascii,
unicode, markdown (or md), latex (or tex). auto
(default): CSV except when output on stdout and stdout
is a tty, then terminal. csv: Raw CSV using the output
delimiter (-u/--output-delim). terminal: Pretty-print
with Unicode box-drawing for a nice display. ascii:
Pretty-print as text table with ASCII characters only.
unicode: Pretty-print as text table with Unicode box-
drawing characters. markdown (md): Pretty-print as a
Markdown table. latex (tex): Pretty-print as a LaTeX
tabular (booktabs).
-u, --output-delim ODELIM
Output delimiter. (default: ',')
-C, --outputenc OUTPUTENC
Output encoding. (default: 'utf8')
-H, --head N Output only the first N rows.
-T, --tail N Output only the last N rows.
-D, --dark-background
Use dark background colors for terminal output. This
option only has an effect when terminal pretty-print
is used.
-w, --textwidth TEXTWIDTH
Maximum total width of the table in characters
(including borders and padding) for pretty output
(terminal, ascii, unicode, markdown). Content is
truncated with "…" when needed. (default: terminal
width when stdout is a tty, else no limit)
-S, --keep-long-lines
Do not wrap or truncate long lines (mutually exclusive
with -w). Implies -L in terminal output mode except
when -N is specified.
--infer-lines INFER_LINES
Number of rows used to infer column types and sizes
for pretty output (markdown, terminal, etc.). Use
"inf" for no limit (infer on all rows). (default:
1000)
-L, --less Pipe output through less (only when writing to stdout
on a tty). Automatically activated if the number of
lines exceed the terminal size (except when '--no-
less' or '-N' is used).
-N, --no-less Never output through less (only when writing to stdout
on a tty).
Processing options:
-b, --before BEFORE Run the following code before evaluate the expression
on each row. Can be specified multiple times. (e.g.
"import math").
--np Shortcut to `--before "import numpy as np"`
--sp Shortcut to `--np --before "import scipy as sp"`
-p, --python-path PATH
Append path provided in argument to Python path.
Usefull in with -b. e.g. -p . -b "import myfunctions".
-t, --type TYPE Apply type conversion on specified command prior to
expression. The argument must be a column name
followed by a valid Python type. See "--before" to
define non standard type. e.g. "a_column:int" or
"a_column:float". This option can be specified
multiple time to type different columns.
-r, --rowvar NAME When set, row values are not exposed as local
variables in expressions; instead a single dictionary
mapping column names to values is exposed under this
name. Useful when column names contain spaces or
special characters. e.g. --rowvar row then use
row["col name"] in apply/aggregate/filter expressions.
-a, --add, --add-column COLSPEC FORMULA
Append a new column (or update existing one). Take two
argument, COLSPEC and FORMULA. COLSPEC is the name of
the created column obtained by appling the formula.
The column is remplaced if already exists. The COLSPEC
can also contains a colon and a format specifier, see
"--format" for example. FORMULA must be a valid python
expression. For the current row, columns values are
accessible as local variable (or via the dict given by
--rowvar NAME). e.g. "a_colname + other_colname" or
"min(a_colname, other_colname)"; with --rowvar row:
row["col name"] + row["other col"]. See "--type" for
typing other columns and "--before" for run code
before evaluating expression. Can be specified
multiple time.
Examples:
Combine text columns by a formula:
csvspoon apply -a name "lastname.upper()+' '+firstname.lower()" file.csv
Column names with spaces (use --rowvar):
csvspoon apply --rowvar r -a "full name" "r['first name']+' '+r['last name']" file.csv
Combine text columns by a formula and remove original columns:
csvspoon apply -a name "lastname.upper()+' '+firstname.lower()" \
-O-lastname,-firstname file.csv
Sum to integer columns:
csvspoon apply -t cola:int -t colb:int -a colsum "cola+colb" file.csv
Sum to integer columns and format the result:
csvspoon apply -t cola:int -t colb:int -a colsum:05d "cola+colb" file.csv
Compute complex expression between columns:
csvspoon apply \
-b "import math" \
-t x:float \
-t y:float \
-a norm "math.sqrt(x**2+y**2)" \
file.csv
Use a custom function from a local module (current dir) for a new column:
csvspoon apply -p . -b "from stuff import myfun" -a new_col "myfun(col1, col2)" file.csv
Multiple computation can be done reusing newly created columns:
csvspoon apply -t x:int -a x2p1 "x**2+1" -a x2p1m1 "x2p1-1" file.csv
csvspoon cat
usage: csvspoon cat [-h] [-d DELIM] [-c INPUTENC] [-o OUTPUT] [-f FORMAT]
[-O COLS] [-F OUTPUT_FORMAT] [-u ODELIM] [-C OUTPUTENC]
[-H N | -T N] [-D] [-w TEXTWIDTH | -S]
[--infer-lines INFER_LINES] [-L | -N]
[input ...]
Concatenate csv files.
Empty fields added if some columns do not exist in all files.
This method is completely streamed and no data is stored in memory.
positional arguments:
input Input file specification. If no input file is
provided, stdin is used as first input file, otherwise
use explicitly "-" for stdin. Can be a filename (e.g.
"file.csv"), or a filename followed by a semicolon and
column names separated by commas (e.g.
"file.csv:a_colname,another_colname"). A column can be
renamed with "new_name=old_name" (e.g.
"file.csv:new_col=old_col"). When column names are
specified (and no exclusion is used), only those
columns are used, in the given order. Prefixing a
column with "-" excludes it. When at least one column
is excluded, all columns from the file are kept except
the excluded ones; columns that are also listed
(possibly with a rename) appear under their (new)
name. Excluded columns cannot be renamed (no "=" after
"-col").
options:
-h, --help show this help message and exit
Input options:
-d, --delim DELIM Input delimiter. (default: ',')
-c, --inputenc INPUTENC
Input encoding. (default: 'utf8')
Output options:
-o, --output OUTPUT Output file, else output on stdout.
-f, --format FORMAT Apply a format on a column on output. The argument
must be a column name followed by a colon and a format
specifier. e.g. "a_colname:5d" or "a_colname:+07.2f".
This option can be specified multiple time to format
different columns.
-O, --output-columns COLS
Output only the given columns, in that order. Same
format as after ':' in the input: column names,
optional renames (newname=oldname), optional
exclusions (-col). e.g. "a,b,label=id,-skip". These
transformations are applied after all other processing
(including the --format option). When excluding
columns (-col), use -O-col or --output-columns=-col
(no space, or use =) so that -col is not parsed as a
separate option.
-F, --output-format OUTPUT_FORMAT
Output format. Choices: auto, csv, terminal, ascii,
unicode, markdown (or md), latex (or tex). auto
(default): CSV except when output on stdout and stdout
is a tty, then terminal. csv: Raw CSV using the output
delimiter (-u/--output-delim). terminal: Pretty-print
with Unicode box-drawing for a nice display. ascii:
Pretty-print as text table with ASCII characters only.
unicode: Pretty-print as text table with Unicode box-
drawing characters. markdown (md): Pretty-print as a
Markdown table. latex (tex): Pretty-print as a LaTeX
tabular (booktabs).
-u, --output-delim ODELIM
Output delimiter. (default: ',')
-C, --outputenc OUTPUTENC
Output encoding. (default: 'utf8')
-H, --head N Output only the first N rows.
-T, --tail N Output only the last N rows.
-D, --dark-background
Use dark background colors for terminal output. This
option only has an effect when terminal pretty-print
is used.
-w, --textwidth TEXTWIDTH
Maximum total width of the table in characters
(including borders and padding) for pretty output
(terminal, ascii, unicode, markdown). Content is
truncated with "…" when needed. (default: terminal
width when stdout is a tty, else no limit)
-S, --keep-long-lines
Do not wrap or truncate long lines (mutually exclusive
with -w). Implies -L in terminal output mode except
when -N is specified.
--infer-lines INFER_LINES
Number of rows used to infer column types and sizes
for pretty output (markdown, terminal, etc.). Use
"inf" for no limit (infer on all rows). (default:
1000)
-L, --less Pipe output through less (only when writing to stdout
on a tty). Automatically activated if the number of
lines exceed the terminal size (except when '--no-
less' or '-N' is used).
-N, --no-less Never output through less (only when writing to stdout
on a tty).
Examples:
Cat two csv files:
csvspoon cat file1.csv file2.csv
Display a csv file with a high number of columns:
csvspoon cat -S wide_data.csv
Change delimiter of a csv file:
csvspoon cat -d "\t" -u ";" file.csv > result.csv
Change delimiter of a csv file with specified output:
csvspoon cat -o result.csv -d "\t" -u ";" file.csv
Reformat two columns of a csv files:
csvspoon cat -f a_colname:5.1f -f another_colname:04d file.csv
Cat one csv file, keeping only a column:
csvspoon cat file.csv:a_col
Cat two csv files, renaming a column on the second file:
csvspoon cat file1.csv file2.csv:new_col=old_col,another_col
Cat a csv file, keeping all columns except one:
csvspoon cat file.csv:-col_to_drop
Cat a csv file with a renamed column and one excluded (all other cols kept):
csvspoon cat file.csv:display_name=internal_name,-skip_col
csvspoon filter
usage: csvspoon filter [-h] [-d DELIM] [-c INPUTENC] [-o OUTPUT] [-f FORMAT]
[-O COLS] [-F OUTPUT_FORMAT] [-u ODELIM] [-C OUTPUTENC]
[-H N | -T N] [-D] [-w TEXTWIDTH | -S]
[--infer-lines INFER_LINES] [-L | -N] [-b BEFORE]
[--np] [--sp] [-p PATH] [-t TYPE] [-r NAME]
[-a FILTER_FORMULA]
[input]
Evaluate a formula on each row, and keep only rows where the formula is
evaluated True.
The formula must be a valid python expression evaluated on each row. With
--rowvar NAME, do not use column names as variables; use the dictionary
NAME[column_name] instead.
This method is completely streamed and no data is stored in memory.
positional arguments:
input Input file specification. If no input file is
provided, stdin is used as input file. Can be a
filename (e.g. "file.csv"), or a filename followed by
a semicolon and column names separated by commas (e.g.
"file.csv:a_colname,another_colname"). A column can be
renamed with "new_name=old_name" (e.g.
"file.csv:new_col=old_col"). When column names are
specified (and no exclusion is used), only those
columns are used, in the given order. Prefixing a
column with "-" excludes it. When at least one column
is excluded, all columns from the file are kept except
the excluded ones; columns that are also listed
(possibly with a rename) appear under their (new)
name. Excluded columns cannot be renamed (no "=" after
"-col").
options:
-h, --help show this help message and exit
Input options:
-d, --delim DELIM Input delimiter. (default: ',')
-c, --inputenc INPUTENC
Input encoding. (default: 'utf8')
Output options:
-o, --output OUTPUT Output file, else output on stdout.
-f, --format FORMAT Apply a format on a column on output. The argument
must be a column name followed by a colon and a format
specifier. e.g. "a_colname:5d" or "a_colname:+07.2f".
This option can be specified multiple time to format
different columns.
-O, --output-columns COLS
Output only the given columns, in that order. Same
format as after ':' in the input: column names,
optional renames (newname=oldname), optional
exclusions (-col). e.g. "a,b,label=id,-skip". These
transformations are applied after all other processing
(including the --format option). When excluding
columns (-col), use -O-col or --output-columns=-col
(no space, or use =) so that -col is not parsed as a
separate option.
-F, --output-format OUTPUT_FORMAT
Output format. Choices: auto, csv, terminal, ascii,
unicode, markdown (or md), latex (or tex). auto
(default): CSV except when output on stdout and stdout
is a tty, then terminal. csv: Raw CSV using the output
delimiter (-u/--output-delim). terminal: Pretty-print
with Unicode box-drawing for a nice display. ascii:
Pretty-print as text table with ASCII characters only.
unicode: Pretty-print as text table with Unicode box-
drawing characters. markdown (md): Pretty-print as a
Markdown table. latex (tex): Pretty-print as a LaTeX
tabular (booktabs).
-u, --output-delim ODELIM
Output delimiter. (default: ',')
-C, --outputenc OUTPUTENC
Output encoding. (default: 'utf8')
-H, --head N Output only the first N rows.
-T, --tail N Output only the last N rows.
-D, --dark-background
Use dark background colors for terminal output. This
option only has an effect when terminal pretty-print
is used.
-w, --textwidth TEXTWIDTH
Maximum total width of the table in characters
(including borders and padding) for pretty output
(terminal, ascii, unicode, markdown). Content is
truncated with "…" when needed. (default: terminal
width when stdout is a tty, else no limit)
-S, --keep-long-lines
Do not wrap or truncate long lines (mutually exclusive
with -w). Implies -L in terminal output mode except
when -N is specified.
--infer-lines INFER_LINES
Number of rows used to infer column types and sizes
for pretty output (markdown, terminal, etc.). Use
"inf" for no limit (infer on all rows). (default:
1000)
-L, --less Pipe output through less (only when writing to stdout
on a tty). Automatically activated if the number of
lines exceed the terminal size (except when '--no-
less' or '-N' is used).
-N, --no-less Never output through less (only when writing to stdout
on a tty).
Processing options:
-b, --before BEFORE Run the following code before evaluate the expression
on each row. Can be specified multiple times. (e.g.
"import math").
--np Shortcut to `--before "import numpy as np"`
--sp Shortcut to `--np --before "import scipy as sp"`
-p, --python-path PATH
Append path provided in argument to Python path.
Usefull in with -b. e.g. -p . -b "import myfunctions".
-t, --type TYPE Apply type conversion on specified command prior to
expression. The argument must be a column name
followed by a valid Python type. See "--before" to
define non standard type. e.g. "a_column:int" or
"a_column:float". This option can be specified
multiple time to type different columns.
-r, --rowvar NAME When set, row values are not exposed as local
variables in expressions; instead a single dictionary
mapping column names to values is exposed under this
name. Useful when column names contain spaces or
special characters. e.g. --rowvar row then use
row["col name"] in apply/aggregate/filter expressions.
-a, --add, --add-filter FILTER_FORMULA
FORMULA must be a valid python expression, which is
casted to bool(). For the current row, columns values
are accessible as local variable (or via the dict
given by --rowvar NAME). e.g. "a_colname >
other_colname" or "a_colname=='fixedvalue'"; with
--rowvar row: "row['col name'] > row['other col']".
See "--type" for typing other columns and "--before"
for run code before evaluating filter expression. Can
be specified multiple time.
Examples:
Filter csv file using two columns:
csvspoon filter -a "lastname!=firstname" file.csv
Column name with spaces (use --rowvar):
csvspoon filter --rowvar r -a "r['unit price']>10" file.csv
Chain filters on csv file:
csvspoon filter \
-a "lastname.startswith('Doe')" \
-a "firstname.starswith('John')" \
file.csv
Filter csv file with float column price:
csvspoon filter -t price:float -a "price>12.5" file.csv
Filter csv file with complex expression:
csvspoon filter \
-b "import math" \
-t x:float \
-t y:float \
-t z:float \
-a "math.sqrt(x**2+y**2)>z" \
file.csv
csvspoon join
usage: csvspoon join [-h] [-d DELIM] [-c INPUTENC] [-o OUTPUT] [-f FORMAT]
[-O COLS] [-F OUTPUT_FORMAT] [-u ODELIM] [-C OUTPUTENC]
[-H N | -T N] [-D] [-w TEXTWIDTH | -S]
[--infer-lines INFER_LINES] [-L | -N] [-l] [-r] [-e]
input [input ...]
Natural join of csv files.
Joins are performed from left to right.
Warning: this method need to store in memory all csv except the first which is
streamed.
If neither --left or --right specified, inner join is realized. For complete
outer join, use --left and --right together.
positional arguments:
input Input file specification. If less than two input files
are provided, stdin is used as first input file,
otherwise use explicitly "-" for stdin. Can be a
filename (e.g. "file.csv"), or a filename followed by
a semicolon and column names separated by commas (e.g.
"file.csv:a_colname,another_colname"). A column can be
renamed with "new_name=old_name" (e.g.
"file.csv:new_col=old_col"). When column names are
specified (and no exclusion is used), only those
columns are used, in the given order. Prefixing a
column with "-" excludes it. When at least one column
is excluded, all columns from the file are kept except
the excluded ones; columns that are also listed
(possibly with a rename) appear under their (new)
name. Excluded columns cannot be renamed (no "=" after
"-col").
options:
-h, --help show this help message and exit
Input options:
-d, --delim DELIM Input delimiter. (default: ',')
-c, --inputenc INPUTENC
Input encoding. (default: 'utf8')
Output options:
-o, --output OUTPUT Output file, else output on stdout.
-f, --format FORMAT Apply a format on a column on output. The argument
must be a column name followed by a colon and a format
specifier. e.g. "a_colname:5d" or "a_colname:+07.2f".
This option can be specified multiple time to format
different columns.
-O, --output-columns COLS
Output only the given columns, in that order. Same
format as after ':' in the input: column names,
optional renames (newname=oldname), optional
exclusions (-col). e.g. "a,b,label=id,-skip". These
transformations are applied after all other processing
(including the --format option). When excluding
columns (-col), use -O-col or --output-columns=-col
(no space, or use =) so that -col is not parsed as a
separate option.
-F, --output-format OUTPUT_FORMAT
Output format. Choices: auto, csv, terminal, ascii,
unicode, markdown (or md), latex (or tex). auto
(default): CSV except when output on stdout and stdout
is a tty, then terminal. csv: Raw CSV using the output
delimiter (-u/--output-delim). terminal: Pretty-print
with Unicode box-drawing for a nice display. ascii:
Pretty-print as text table with ASCII characters only.
unicode: Pretty-print as text table with Unicode box-
drawing characters. markdown (md): Pretty-print as a
Markdown table. latex (tex): Pretty-print as a LaTeX
tabular (booktabs).
-u, --output-delim ODELIM
Output delimiter. (default: ',')
-C, --outputenc OUTPUTENC
Output encoding. (default: 'utf8')
-H, --head N Output only the first N rows.
-T, --tail N Output only the last N rows.
-D, --dark-background
Use dark background colors for terminal output. This
option only has an effect when terminal pretty-print
is used.
-w, --textwidth TEXTWIDTH
Maximum total width of the table in characters
(including borders and padding) for pretty output
(terminal, ascii, unicode, markdown). Content is
truncated with "…" when needed. (default: terminal
width when stdout is a tty, else no limit)
-S, --keep-long-lines
Do not wrap or truncate long lines (mutually exclusive
with -w). Implies -L in terminal output mode except
when -N is specified.
--infer-lines INFER_LINES
Number of rows used to infer column types and sizes
for pretty output (markdown, terminal, etc.). Use
"inf" for no limit (infer on all rows). (default:
1000)
-L, --less Pipe output through less (only when writing to stdout
on a tty). Automatically activated if the number of
lines exceed the terminal size (except when '--no-
less' or '-N' is used).
-N, --no-less Never output through less (only when writing to stdout
on a tty).
Processing options:
-l, --left Perform left join. If more than two files are
provided, each join in a left join. Can be used with
`-r` to obtain a outer join.
-r, --right Perform right join. If more than two files are
provided, each join in a right join. Can be used with
`-l` to obtain a outer join.
-e, --empty Indicate than empty field have to be considered as a
value.
Examples:
Operate NATURAL JOIN on two csv files:
csvspoon join file1.csv file2.csv
Operate two NATURAL JOIN on three csv files:
csvspoon join file1.csv file2.csv file3.csv
Operate LEFT JOIN on two csv files
csvspoon join -l file1.csv file2.csv
Operate RIGHT JOIN on two csv files
csvspoon join -r file1.csv file2.csv
Operate OUTER JOIN on two csv files
csvspoon join -lr file1.csv file2.csv
csvspoon sample
usage: csvspoon sample [-h] [-d DELIM] [-c INPUTENC] [-o OUTPUT] [-f FORMAT]
[-O COLS] [-F OUTPUT_FORMAT] [-u ODELIM] [-C OUTPUTENC]
[-H N | -T N] [-D] [-w TEXTWIDTH | -S]
[--infer-lines INFER_LINES] [-L | -N] [-W COL] [-r |
-R] [--seed SEED] [-k K]
[input]
Random sampling in streaming.
Two modes: without replacement (default) returns at most k distinct rows; with
replacement (-r) returns exactly k rows (each row may appear multiple times).
This method stores only in memory up to k rows.
positional arguments:
input Input file specification. If no input file is
provided, stdin is used as input file. Can be a
filename (e.g. "file.csv"), or a filename followed by
a semicolon and column names separated by commas (e.g.
"file.csv:a_colname,another_colname"). A column can be
renamed with "new_name=old_name" (e.g.
"file.csv:new_col=old_col"). When column names are
specified (and no exclusion is used), only those
columns are used, in the given order. Prefixing a
column with "-" excludes it. When at least one column
is excluded, all columns from the file are kept except
the excluded ones; columns that are also listed
(possibly with a rename) appear under their (new)
name. Excluded columns cannot be renamed (no "=" after
"-col").
options:
-h, --help show this help message and exit
Input options:
-d, --delim DELIM Input delimiter. (default: ',')
-c, --inputenc INPUTENC
Input encoding. (default: 'utf8')
Output options:
-o, --output OUTPUT Output file, else output on stdout.
-f, --format FORMAT Apply a format on a column on output. The argument
must be a column name followed by a colon and a format
specifier. e.g. "a_colname:5d" or "a_colname:+07.2f".
This option can be specified multiple time to format
different columns.
-O, --output-columns COLS
Output only the given columns, in that order. Same
format as after ':' in the input: column names,
optional renames (newname=oldname), optional
exclusions (-col). e.g. "a,b,label=id,-skip". These
transformations are applied after all other processing
(including the --format option). When excluding
columns (-col), use -O-col or --output-columns=-col
(no space, or use =) so that -col is not parsed as a
separate option.
-F, --output-format OUTPUT_FORMAT
Output format. Choices: auto, csv, terminal, ascii,
unicode, markdown (or md), latex (or tex). auto
(default): CSV except when output on stdout and stdout
is a tty, then terminal. csv: Raw CSV using the output
delimiter (-u/--output-delim). terminal: Pretty-print
with Unicode box-drawing for a nice display. ascii:
Pretty-print as text table with ASCII characters only.
unicode: Pretty-print as text table with Unicode box-
drawing characters. markdown (md): Pretty-print as a
Markdown table. latex (tex): Pretty-print as a LaTeX
tabular (booktabs).
-u, --output-delim ODELIM
Output delimiter. (default: ',')
-C, --outputenc OUTPUTENC
Output encoding. (default: 'utf8')
-H, --head N Output only the first N rows.
-T, --tail N Output only the last N rows.
-D, --dark-background
Use dark background colors for terminal output. This
option only has an effect when terminal pretty-print
is used.
-w, --textwidth TEXTWIDTH
Maximum total width of the table in characters
(including borders and padding) for pretty output
(terminal, ascii, unicode, markdown). Content is
truncated with "…" when needed. (default: terminal
width when stdout is a tty, else no limit)
-S, --keep-long-lines
Do not wrap or truncate long lines (mutually exclusive
with -w). Implies -L in terminal output mode except
when -N is specified.
--infer-lines INFER_LINES
Number of rows used to infer column types and sizes
for pretty output (markdown, terminal, etc.). Use
"inf" for no limit (infer on all rows). (default:
1000)
-L, --less Pipe output through less (only when writing to stdout
on a tty). Automatically activated if the number of
lines exceed the terminal size (except when '--no-
less' or '-N' is used).
-N, --no-less Never output through less (only when writing to stdout
on a tty).
Processing options:
-W, --weight COL Column name used as weight. Non-numeric or negative
values are treated as zero. If not set, sampling is
uniform.
-r, --with-replacement
Sample with replacement: output exactly k rows (rows
may repeat).
-R, --revert Return the complement of sampling: the k lines left
after selecting all others except k. Incompatible with
-r (with replacement).
--seed SEED Random seed for reproducibility (any string of
characters).
-k K Number of rows to return. Without replacement: at most
k rows; if the input has fewer than k lines, all are
selected. With replacement (-r): exactly k rows.
(default: 1)
Examples:
Sample 10 rows without replacement:
csvspoon sample -k 10 file.csv
Weighted sample of 10 rows using column weight:
csvspoon sample -W weight -k 10 file.csv
Sample 30 rows with replacement:
csvspoon sample -r -k 30 file.csv
csvspoon sort
usage: csvspoon sort [-h] [-d DELIM] [-c INPUTENC] [-o OUTPUT] [-f FORMAT]
[-O COLS] [-F OUTPUT_FORMAT] [-u ODELIM] [-C OUTPUTENC]
[-H N | -T N] [-D] [-w TEXTWIDTH | -S]
[--infer-lines INFER_LINES] [-L | -N] [-k KEYS] [-n] [-r]
[-R] [--seed SEED]
[input]
Sort csv file.
Warning: this method need to store in memory all the input csv file.
positional arguments:
input Input file specification. If no input file is
provided, stdin is used as input file. Can be a
filename (e.g. "file.csv"), or a filename followed by
a semicolon and column names separated by commas (e.g.
"file.csv:a_colname,another_colname"). A column can be
renamed with "new_name=old_name" (e.g.
"file.csv:new_col=old_col"). When column names are
specified (and no exclusion is used), only those
columns are used, in the given order. Prefixing a
column with "-" excludes it. When at least one column
is excluded, all columns from the file are kept except
the excluded ones; columns that are also listed
(possibly with a rename) appear under their (new)
name. Excluded columns cannot be renamed (no "=" after
"-col").
options:
-h, --help show this help message and exit
Input options:
-d, --delim DELIM Input delimiter. (default: ',')
-c, --inputenc INPUTENC
Input encoding. (default: 'utf8')
Output options:
-o, --output OUTPUT Output file, else output on stdout.
-f, --format FORMAT Apply a format on a column on output. The argument
must be a column name followed by a colon and a format
specifier. e.g. "a_colname:5d" or "a_colname:+07.2f".
This option can be specified multiple time to format
different columns.
-O, --output-columns COLS
Output only the given columns, in that order. Same
format as after ':' in the input: column names,
optional renames (newname=oldname), optional
exclusions (-col). e.g. "a,b,label=id,-skip". These
transformations are applied after all other processing
(including the --format option). When excluding
columns (-col), use -O-col or --output-columns=-col
(no space, or use =) so that -col is not parsed as a
separate option.
-F, --output-format OUTPUT_FORMAT
Output format. Choices: auto, csv, terminal, ascii,
unicode, markdown (or md), latex (or tex). auto
(default): CSV except when output on stdout and stdout
is a tty, then terminal. csv: Raw CSV using the output
delimiter (-u/--output-delim). terminal: Pretty-print
with Unicode box-drawing for a nice display. ascii:
Pretty-print as text table with ASCII characters only.
unicode: Pretty-print as text table with Unicode box-
drawing characters. markdown (md): Pretty-print as a
Markdown table. latex (tex): Pretty-print as a LaTeX
tabular (booktabs).
-u, --output-delim ODELIM
Output delimiter. (default: ',')
-C, --outputenc OUTPUTENC
Output encoding. (default: 'utf8')
-H, --head N Output only the first N rows.
-T, --tail N Output only the last N rows.
-D, --dark-background
Use dark background colors for terminal output. This
option only has an effect when terminal pretty-print
is used.
-w, --textwidth TEXTWIDTH
Maximum total width of the table in characters
(including borders and padding) for pretty output
(terminal, ascii, unicode, markdown). Content is
truncated with "…" when needed. (default: terminal
width when stdout is a tty, else no limit)
-S, --keep-long-lines
Do not wrap or truncate long lines (mutually exclusive
with -w). Implies -L in terminal output mode except
when -N is specified.
--infer-lines INFER_LINES
Number of rows used to infer column types and sizes
for pretty output (markdown, terminal, etc.). Use
"inf" for no limit (infer on all rows). (default:
1000)
-L, --less Pipe output through less (only when writing to stdout
on a tty). Automatically activated if the number of
lines exceed the terminal size (except when '--no-
less' or '-N' is used).
-N, --no-less Never output through less (only when writing to stdout
on a tty).
Processing options:
-k, --key KEYS Column used for sorting. Can be specified multiple
time.
-n, --numeric-sort Compare according to numerical value.
-r, --reverse Reverse the result of comparisons.
-R, --random-sort Shuffle. If key specified, shuffle is performed inside
lines with the same key.
--seed SEED Random seed for reproducibility (any string of
characters).
Examples:
Sort csv file using column cola:
csvspoon sort -k cola file.csv
Sort csv file using columns cola and colb:
csvspoon sort -k cola -k colb file.csv
Sort csv file using numerical mode on column numcol:
csvspoon sort -n -k numcol file.csv
Shuffle csv file:
csvspoon sort -R file.csv
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
csvspoon-1.3.1.tar.gz
(49.0 kB
view details)
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
csvspoon-1.3.1-py3-none-any.whl
(58.3 kB
view details)
File details
Details for the file csvspoon-1.3.1.tar.gz.
File metadata
- Download URL: csvspoon-1.3.1.tar.gz
- Upload date:
- Size: 49.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
aeb4fa177d894c183868a0857d40257869ab3f6757019bbf9c94ad8043740ef3
|
|
| MD5 |
fe8f192c9ae825b99fc21112a59ab34a
|
|
| BLAKE2b-256 |
6cde3fe148bebd82d7fce7b9432910dd45354c3366e338edf20fecc7e68b8acf
|
File details
Details for the file csvspoon-1.3.1-py3-none-any.whl.
File metadata
- Download URL: csvspoon-1.3.1-py3-none-any.whl
- Upload date:
- Size: 58.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
28c20615f845a53610d4426c8e5a10f418f8c3af0e5651900de7288da35bc391
|
|
| MD5 |
49c01ff525af8a5123f8b6f9ca7d969e
|
|
| BLAKE2b-256 |
0cf58b46864f658414d06d5fb4f4034ffd7afb1b06c97e43f8b2db4aa1d824e0
|