A small tool to merge CSV/TSV files

Project description

combine_csv

Based on an idea from https://github.com/ekapope/Combine-CSV-files-in-the-folder/blob/master/Combine_CSVs.py, this small script focuses on merging CSV/TSV files, combining either lines or columns.

Item Project site
Source https://github.com/gmtsciencedev/combine_csv
Documentation https://combine_csv.readthedocs.io/
Download https://pypi.org/project/combine-csv/
Keywords python, csv, merge, combine

Basic usage

The tool can be used in either of two modes:

  • line mode (default), which uses each input CSV to add new lines to the merged CSV,
  • column mode (flag -c), which uses each input CSV to add new columns, using the first column of every file as an index.

Line mode

combine_csv -i '*.csv' -o my_merged_csv.csv

Thus, if the folder contains:

1.csv

name,age
Jean,23
Paul,12

2.csv

name,age,sex
Jane,19,female
John,74,male

It will create this file: my_merged_csv.csv

name,age,sex
Jean,23,
Paul,12,
Jane,19,female
John,74,male
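The behaviour shown above (the header is the union of all input headers, and missing values are left empty) can be approximated with the standard csv module. This is only a sketch of the idea, not the code combine_csv actually runs; the function name merge_lines is hypothetical, and the inputs are passed as strings rather than read from files.

```python
import csv
import io

def merge_lines(sources, delimiter=","):
    """Stack the rows of several CSV texts under the union of their headers."""
    header = []  # union of column names, in first-seen order
    rows = []
    for text in sources:
        reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
        for name in reader.fieldnames or []:
            if name not in header:
                header.append(name)
        rows.extend(reader)
    out = io.StringIO()
    # restval fills the columns that a given input file does not have
    writer = csv.DictWriter(out, fieldnames=header, delimiter=delimiter,
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

first = "name,age\nJean,23\nPaul,12\n"
second = "name,age,sex\nJane,19,female\nJohn,74,male\n"
merged = merge_lines([first, second])
print(merged)
```

With the two sample files, the sketch yields the same five lines as my_merged_csv.csv above.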

Column mode

combine_csv -c -i '*.csv' -o my_merged_csv.csv

Thus, if the folder contains:

1.csv

task_id,name,desc
1,create,create a new object
2,delete,delete an object

2.csv

task_id,program
1,create.py
2,delete.py
3,random.py

It will create this file: my_merged_csv.csv

task_id,name,desc,program
1,create,create a new object,create.py
2,delete,delete an object,delete.py
3,,,random.py
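Column mode as illustrated above behaves like an outer join on the first column. The sketch below reproduces that result with the standard csv module; it is an illustration under assumptions (the hypothetical merge_columns takes the index column name from the first input and works on strings), not the tool's own implementation.

```python
import csv
import io

def merge_columns(sources, delimiter=","):
    """Outer-join several CSV texts on their first column."""
    header = []  # index column first, then other columns in first-seen order
    table = {}   # index value -> {column name: value}
    for text in sources:
        reader = csv.reader(io.StringIO(text), delimiter=delimiter)
        names = next(reader)
        if not header:
            header.append(names[0])
        header.extend(n for n in names[1:] if n not in header)
        for row in reader:
            if not row:  # skip blank lines
                continue
            table.setdefault(row[0], {}).update(zip(names[1:], row[1:]))
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    for key, record in table.items():
        # columns missing for this index value are left empty
        writer.writerow([key] + [record.get(n, "") for n in header[1:]])
    return out.getvalue()

tasks = "task_id,name,desc\n1,create,create a new object\n2,delete,delete an object\n"
programs = "task_id,program\n1,create.py\n2,delete.py\n3,random.py\n"
joined = merge_columns([tasks, programs])
print(joined)
```

Task 3 exists only in the second input, so its name and desc cells stay empty, as in the example output.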

Main options

Run combine_csv -h to see all options. Here we point out the most convenient ones.

As you have seen, -i is the input selector, which takes a Python glob.glob pattern (protect it with single quotes, as in the examples above, to prevent shell interpretation), and -o gives the name of the output file (which defaults to combine.csv).

-s --separator : Changes the field separator from the default , to whatever you need. For TSV files, use \t (wrap it in single quotes or double the backslash so that the shell does not interpret it, e.g. -s '\t' or -s \\t). This separator is used both to read the input files and to write the output file. To write the output with a different separator, use the -t option, which behaves the same way.
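To make the separator handling concrete, here is a small sketch of reading with one delimiter and writing with another. It rests on an assumption: when the shell passes the two characters \t literally, something has to turn them into a real tab, and codecs.decode with "unicode_escape" is one way to do that. Whether combine_csv does it this way is not stated here; decode_separator and convert are hypothetical names.

```python
import codecs
import csv
import io

def decode_separator(arg):
    """Turn a literal backslash escape (e.g. backslash + t) into the real character."""
    return codecs.decode(arg, "unicode_escape")

def convert(text, in_sep, out_sep):
    """Re-write CSV text, reading with in_sep and writing with out_sep."""
    rows = csv.reader(io.StringIO(text), delimiter=in_sep)
    out = io.StringIO()
    csv.writer(out, delimiter=out_sep, lineterminator="\n").writerows(rows)
    return out.getvalue()

tab = decode_separator(r"\t")  # what a program receives from -s '\t'
converted = convert("name\tage\nJean\t23\n", tab, ",")
print(converted)
```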

-a --addname : Adds the name of the input files (without extension). In line mode, this adds a new column named source (whose name can be changed with the --source-column option) containing the name of the file each line came from. In column mode, this adds the names to non-index columns, preceded by an underscore.
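For line mode, the effect of -a can be pictured as tagging every row with the stem of the file it came from. The sketch below is a hypothetical helper (rows_with_source) written under that reading; where the real tool places the new column in the output is not specified here.

```python
import csv
import io
from pathlib import Path

def rows_with_source(path, text, source_column="source"):
    """Yield each row of a CSV text, tagged with the file name without extension."""
    name = Path(path).stem  # "data/1.csv" -> "1"
    for row in csv.DictReader(io.StringIO(text)):
        row[source_column] = name
        yield row

rows = list(rows_with_source("data/1.csv", "name,age\nJean,23\n"))
print(rows)
```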

Split

Basic usage

Version 2.0 introduced a new utility, split_csv, which does the opposite of merging: splitting. It uses the same flags as above (-s for the input separator, -t for the output separator, -c for column mode).

For instance, with a sample CSV file my.csv like this:

x,a,b
1,0,1
2,0,1
3,1,0

Using line mode like this:

split_csv my.csv

will produce three files:

my_1.csv :

x,a,b
1,0,1

my_2.csv :

x,a,b
2,0,1

my_3.csv:

x,a,b
3,1,0

While using the column mode with the -c option:

split_csv -c my.csv

will produce two files:

my_a.csv :

x,a
1,0
2,0
3,1

my_b.csv:

x,b
1,1
2,1
3,0
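Both split modes can be sketched in a few lines of Python. This is an illustration only: it returns the would-be file contents in a dictionary instead of writing files, and it assumes that the "name" of a line is the value in its first column. The function names are hypothetical.

```python
import csv
import io

def _dump(rows, delimiter):
    """Serialise a list of rows to CSV text."""
    out = io.StringIO()
    csv.writer(out, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return out.getvalue()

def split_lines(text, delimiter=","):
    """Line mode: one output per data line, each keeping the full header."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    return {row[0]: _dump([header, row], delimiter) for row in reader if row}

def split_columns(text, delimiter=","):
    """Column mode: one output per non-index column, each keeping the first column."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=delimiter) if r]
    header = rows[0]
    return {name: _dump([[r[0], r[i]] for r in rows], delimiter)
            for i, name in enumerate(header) if i > 0}

sample = "x,a,b\n1,0,1\n2,0,1\n3,1,0\n"
by_line = split_lines(sample)
by_column = split_columns(sample)
print(by_line["3"])
print(by_column["a"])
```

The dictionary keys (1, 2, 3 in line mode; a, b in column mode) correspond to the suffixes of the file names shown above.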

Controlling new file naming

Output files are named with the default pattern {input}_{index}.{ext}, which uses Python f-string syntax. You can provide your own pattern with the -o option. Each braced term is replaced dynamically during splitting:

  • input: input is replaced by the base name of the input file including any path if provided,
  • index: index is replaced by the current name of either the column or the line depending on the mode,
  • ext: ext is replaced by the extension (csv in the above example).

In line-split mode, the pattern may also use any of the column names: the term is replaced by the value of that column for the current line.

For instance, with the sample above (here stored as /tmp/test1/t.csv):

split_csv -o 'my{index}-{a}.csv' /tmp/test1/t.csv

will create three files: my1-0.csv, my2-0.csv and my3-1.csv
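The pattern expansion described above can be mimicked with str.format, which fills the same brace placeholders from a dictionary at run time. This is a sketch under assumptions: output_name is a hypothetical helper, and it supposes that column values are simply made available under their column names alongside input, index and ext.

```python
from pathlib import Path

def output_name(pattern, input_path, index, row=None):
    """Expand a naming pattern for one output file."""
    path = Path(input_path)
    fields = dict(row or {})  # column values (line mode only)
    fields.update(
        input=str(path.with_suffix("")),  # base name, including any directory
        index=index,                      # current line or column name
        ext=path.suffix.lstrip("."),      # extension without the dot
    )
    return pattern.format(**fields)

default_name = output_name("{input}_{index}.{ext}", "my.csv", "1")
custom_name = output_name("my{index}-{a}.csv", "/tmp/test1/t.csv", "3",
                          {"x": "3", "a": "1", "b": "0"})
print(default_name, custom_name)
```

In this sketch, a pattern that names a column absent from the file raises KeyError; how split_csv itself reacts in that case is not documented here.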
