CSV GP: Diagnose all your CSV issues
CSVs are a ubiquitous format for data transfer, but they are commonly riddled with issues. Most CSV libraries abort with an unhelpful error; CSV GP instead lets you pinpoint the common issues in a CSV file and export just the parsable lines from it.
Installation
CSV GP can be used in three ways.
Standalone binary
- Install Rust
- Clone the repo and navigate into it
- Run cargo install --path csv_gp
- The csv-gp command will now be available to run; see csv-gp --help for usage
Rust library
Add the following to your Cargo.toml:
csv-gp = { git = "https://github.com/xelixdev/csv-gp", rev = "<optional git tag>" }
Python library
From package manager
The library is available on PyPI at https://pypi.org/project/csv-gp/, so you can install it with:
pip install csv-gp
Compiling from source
- Install Rust
- Install maturin (pip install maturin)
- Clone the repo
- Run make all
- Run cd csv_gp_python && maturin develop
Usage
Rust standalone binary
After installing the binary, the default usage is csv-gp $FILE, which prints a diagnosis of the file. The command provides options to change the delimiter and the encoding of the file; see csv-gp -h for details.
The --correct-rows-path option exports only the correct rows to the provided path.
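If you want to drive the binary from a script, the invocation described above can be wrapped roughly as follows. This is only a sketch: the helper names build_csv_gp_command and run_csv_gp are hypothetical, and it assumes the diagnosis is written to standard output. Only the positional file argument and --correct-rows-path are used, since those are the options named in this README; check csv-gp -h for the delimiter and encoding options.

```python
import shutil
import subprocess
from typing import List, Optional


def build_csv_gp_command(file_path: str, correct_rows_path: Optional[str] = None) -> List[str]:
    """Build the argument list for one csv-gp invocation (hypothetical helper)."""
    command = ["csv-gp", file_path]
    if correct_rows_path is not None:
        # Ask csv-gp to export only the correct rows to this path.
        command += ["--correct-rows-path", correct_rows_path]
    return command


def run_csv_gp(file_path: str, correct_rows_path: Optional[str] = None) -> str:
    """Run csv-gp and return whatever it printed (assumed to be on stdout)."""
    if shutil.which("csv-gp") is None:
        raise FileNotFoundError("csv-gp was not found on PATH; install the binary first")
    result = subprocess.run(
        build_csv_gp_command(file_path, correct_rows_path),
        capture_output=True,
        text=True,
        check=False,  # the exit-code convention is not documented here, so do not assume it
    )
    return result.stdout
```

Calling run_csv_gp("data.csv", correct_rows_path="data.valid.csv") would then return the printed diagnosis and leave the correct rows in data.valid.csv; both file names are placeholders.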
Python library
The Python library exposes two main functions, check_file and get_rows.
The check_file function takes the path to a file, the delimiter and the encoding (see https://github.com/xelixdev/csv-gp/blob/0f77c62841509c134a3bbe06ec178426e9c5aa10/csv_gp_python/csv_gp.pyi) and returns an instance of the CSVDetails class, which provides details about the file. The same file lists all the available attributes with their names and types.
If the valid_rows_output_path argument is provided to the function, only the correct rows will be exported to that path.
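A call to check_file might be wrapped as in the sketch below. The wrapper name diagnose_csv and its defaults ("," and "utf-8") are assumptions made for illustration, as is the positional order path, delimiter, encoding, which follows the description above; the linked csv_gp.pyi remains the authoritative reference for the signature and for the attributes of the returned CSVDetails object.

```python
from typing import Any, Optional


def diagnose_csv(
    path: str,
    delimiter: str = ",",        # assumed default, not taken from the library
    encoding: str = "utf-8",     # assumed default, not taken from the library
    valid_rows_output_path: Optional[str] = None,
) -> Any:
    """Return the CSVDetails object produced by csv_gp.check_file (hypothetical wrapper)."""
    # Imported lazily so that this module can be loaded without csv-gp installed.
    import csv_gp

    if valid_rows_output_path is None:
        return csv_gp.check_file(path, delimiter, encoding)
    # When an output path is given, the correct rows are also exported there.
    return csv_gp.check_file(
        path, delimiter, encoding, valid_rows_output_path=valid_rows_output_path
    )
```

For example, diagnose_csv("data.csv", delimiter=";", valid_rows_output_path="data.valid.csv") would diagnose a semicolon-separated file and write its correct rows to data.valid.csv; both file names are placeholders.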
The get_rows function likewise takes the path to a file, the delimiter and the encoding, plus a list of row numbers. It returns the parsed cells for the given rows. See the file linked above for the exact typing of the parameters and return value.
Releasing a new version of the Python lib
- Update version numbers in csv_gp_python/Cargo.toml and csv_gp/Cargo.toml
- Merge this change into main
- Create a new release on GitHub, creating a tag in the form vX.Y.Z
- The 'Publish' pipeline should begin running, and the new version will be published
Running tests
Running Rust tests
Run cargo test.
Running Python tests
Follow the instructions on compiling from source. Then you can run pytest.