Aggregate CSV files
Project description
Agg
A Python library to aggregate files and data. This release supports merging two or more CSV files.
Documentation
merge_csv(files_to_merge: tuple,
          output_file: Union[str, pathlib.Path],
          first_line_is_header: Optional[bool] = None) -> dict:
The method merge_csv merges multiple CSV files in the order they are specified. It overwrites any existing output file with the same name.
Parameters:
- files_to_merge: A tuple containing the paths to the files, in the order they are to be merged.
- output_file: The path to the result file. The folder must already exist. An existing file with the same name will be overwritten.
- first_line_is_header: If True, agg removes the first line of every CSV file except the first. If not set, agg guesses whether the first line is a header.
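To make the effect of first_line_is_header concrete, here is a minimal sketch that uses only the standard library. It is not agg's implementation: merge_keeping_one_header is a hypothetical helper written for illustration, it assumes UTF-8 text files, and it does not cover the case where agg guesses whether a header is present.

```python
import tempfile
from pathlib import Path
from typing import Sequence, Union

PathLike = Union[str, Path]


def merge_keeping_one_header(files_to_merge: Sequence[PathLike],
                             output_file: PathLike,
                             first_line_is_header: bool) -> int:
    """Hypothetical helper (not part of agg): concatenate files line by line.

    Returns the number of lines written to output_file.
    """
    lines_written = 0
    with open(output_file, "w", encoding="utf-8", newline="") as out:
        for index, path in enumerate(files_to_merge):
            with open(path, "r", encoding="utf-8", newline="") as src:
                for line_number, line in enumerate(src):
                    # Skip the header of every file except the first one.
                    if first_line_is_header and index > 0 and line_number == 0:
                        continue
                    # A file without a trailing newline must not run into
                    # the first row of the next file.
                    if not line.endswith(("\n", "\r")):
                        line += "\n"
                    out.write(line)
                    lines_written += 1
    return lines_written


# Small demonstration with two temporary files.
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "file_01.csv"
    second = Path(tmp) / "file_02.csv"
    merged = Path(tmp) / "merged_file"
    first.write_text("id,name\n1,Ada\n", encoding="utf-8")
    second.write_text("id,name\n2,Bob", encoding="utf-8")  # no trailing newline
    line_count = merge_keeping_one_header((first, second), merged, True)
    merged_text = merged.read_text(encoding="utf-8")
```

With first_line_is_header set to False, the same helper would copy every line, so the header of the second file would appear as a data row in the result.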
merge_csv returns a dictionary containing:
- a SHA256 hash of the result file,
- the name of the result file,
- its absolute path,
- a boolean indicating whether the first line is a header or not,
- its size in bytes,
- its number of lines (including the header),
- a list of the files merged (absolute path).
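The hash, size, and line count in the returned dictionary can be cross-checked with the standard library. The sketch below is only an illustration: describe_file is a hypothetical helper, not part of agg, and the key names are taken from agg's example output. How agg itself counts lines is not stated in this description, so the count used here (every line, including a last one without a trailing newline) is an assumption.

```python
import hashlib
import tempfile
from pathlib import Path


def describe_file(path):
    """Hypothetical helper (not part of agg): recompute basic file metadata."""
    data = Path(path).read_bytes()  # reads the whole file into memory
    return {
        "sha256hash": hashlib.sha256(data).hexdigest(),
        "file_size_bytes": len(data),
        # splitlines() also counts a last line without a trailing newline.
        "line_count": len(data.splitlines()),
    }


# Small demonstration on a temporary file with a header and two rows.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "merged_file"
    sample.write_bytes(b"id,name\n1,a\n2,b\n")
    info = describe_file(sample)
```

Comparing info["sha256hash"] with the 'sha256hash' entry returned by merge_csv would show whether the result file has changed since it was written.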
Example
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import agg
# tuples are ordered:
my_files = ('file_01.csv', 'file_02.csv')
# Merge the CSV files, in the order given by the tuple, into a new file
# called "merged_file". Because first_line_is_header is True, the header
# (first line) is copied only once, from the first file.
merged_file = agg.merge_csv(my_files, 'merged_file', True)
# The return value is a dictionary!
print(merged_file)
# {'sha256hash': 'fff30942d3d042c5128062d1a29b2c50494c3d1d033749a58268d2e687fc98c6',
# 'file_name': 'merged_file',
# 'file_path': '/home/exampleuser/merged_file',
# 'first_line_is_header': True,
# 'file_size_bytes': 76,
# 'line_count': 8,
# 'merged_files': ['/home/exampleuser/file_01.csv',
# '/home/exampleuser/file_02.csv']
# }
print(merged_file['file_path'])
# '/home/exampleuser/merged_file'
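Because the merge order follows the tuple and the output folder must already exist, it can help to prepare both before calling merge_csv. The following is a sketch under assumptions: collect_csv_files and prepare_output_path are hypothetical helpers, not part of agg, and sorting by file name is just one possible ordering.

```python
import tempfile
from pathlib import Path


def collect_csv_files(folder):
    """Hypothetical helper: return the folder's *.csv paths as a sorted tuple."""
    return tuple(sorted(str(p) for p in Path(folder).glob("*.csv")))


def prepare_output_path(output_file):
    """Hypothetical helper: create the output folder if it is missing."""
    output_path = Path(output_file)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    return output_path


# Small demonstration in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("file_02.csv", "file_01.csv", "notes.txt"):
        (Path(tmp) / name).write_text("id,name\n", encoding="utf-8")
    my_files = collect_csv_files(tmp)
    output_file = prepare_output_path(Path(tmp) / "results" / "merged_file")
    file_names = [Path(p).name for p in my_files]
    output_folder_exists = output_file.parent.is_dir()
    # With agg installed, the merge itself would then be:
    # merged_file = agg.merge_csv(my_files, output_file, True)
```

Sorting makes the merge order reproducible between runs, since the order in which a folder listing returns files is not guaranteed.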
Download files
Source Distribution
agg-0.3.1.tar.gz (4.5 kB)
Built Distribution
agg-0.3.1-py3-none-any.whl (9.2 kB)