CSV GP: Diagnose all your CSV issues
CSVs are a ubiquitous format for data transfer, and they are commonly riddled with issues. Most CSV libraries abort with an unhelpful error; CSV GP lets you pinpoint these common issues in a CSV file and export just the parsable lines from it.
Installation
CSV GP can be used in three ways.
Standalone binary
- Install rust
- Clone the repo and navigate into it
- Run cargo install --path csv_gp
- The csv-gp command will now be available to run; please see csv-gp --help for usage
Rust library
Add the following to your Cargo.toml:
csv-gp = { git = "https://github.com/xelixdev/csv-gp", rev = "<optional git tag>" }
Python library
From package manager
The library is available on PyPI at https://pypi.org/project/csv-gp/, so you can just run:
pip install csv-gp
Compiling from source
- Install rust
- Install maturin (pip install maturin)
- Clone the repo
- Run make all
- Run cd csv_gp_python && maturin develop
Usage
Rust standalone binary
After installing the binary, the default usage is running csv-gp $FILE. This will print a diagnosis of the file. The command provides options to change the delimiter and the encoding of the file. See csv-gp -h for details.
Another option is --correct-rows-path, which exports only the correct rows to the provided path.
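To make the idea of "diagnose, then export only the correct rows" concrete, here is a minimal standard-library sketch. It is not how csv-gp is implemented and it does not use csv-gp's API: the function name split_rows_by_width is hypothetical, the only issue it detects is a column count that differs from the header, and the 0-based row numbering (header is row 0) is an assumption that may not match what csv-gp reports.

```python
import csv
import io


def split_rows_by_width(text, delimiter=","):
    """Split parsed rows into (valid_rows, bad_row_numbers).

    A row counts as valid when it has as many cells as the first row.
    Row numbers are 0-based and include the header (an assumption of
    this sketch, not necessarily csv-gp's convention).
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return [], []
    expected = len(rows[0])
    valid, bad = [], []
    for number, row in enumerate(rows):
        if len(row) == expected:
            valid.append(row)
        else:
            bad.append(number)
    return valid, bad


sample = "id,name\n1,Ada\n2,Grace,extra\n3\n4,Linus\n"
valid, bad = split_rows_by_width(sample)
print(bad)    # [2, 3]
print(valid)  # [['id', 'name'], ['1', 'Ada'], ['4', 'Linus']]
```

A real diagnosis covers more than column counts (quoting and encoding problems, for example), which is the reason to reach for csv-gp rather than a check like this.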
Python library
The Python library exposes two main functions, check_file and get_rows.
The check_file function takes a path to a file, the delimiter, and the encoding (see https://github.com/xelixdev/csv-gp/blob/0f77c62841509c134a3bbe06ec178426e9c5aa10/csv_gp_python/csv_gp.pyi) and returns an instance of the CSVDetails class, which provides details about the file. The same file lists all the available attributes with their names and types.
If the valid_rows_output_path argument is provided to the function, only the correct rows will be exported to that path.
The get_rows function likewise takes a path to a file, the delimiter, and the encoding, plus a list of row numbers. It returns the parsed cells for the given rows. See the above file for the exact types of the parameters and the return value.
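As a rough illustration of the two calls described above, the sketch below wraps them in small helper functions. The helper names diagnose and fetch_rows, the default delimiter and encoding values, and the positional argument order (path, delimiter, encoding, then row numbers) are assumptions based on the prose here; check csv_gp.pyi for the authoritative signatures before relying on it.

```python
def diagnose(path, delimiter=",", encoding="utf-8", valid_rows_output_path=None):
    """Return the CSVDetails object for a file (hypothetical helper).

    The argument order passed to check_file is assumed from the README text.
    """
    import csv_gp  # imported lazily so this module loads without the package

    if valid_rows_output_path is None:
        return csv_gp.check_file(path, delimiter, encoding)
    # When an output path is given, the correct rows are also exported there.
    return csv_gp.check_file(
        path, delimiter, encoding, valid_rows_output_path=valid_rows_output_path
    )


def fetch_rows(path, row_numbers, delimiter=",", encoding="utf-8"):
    """Return the parsed cells for the given row numbers (hypothetical helper)."""
    import csv_gp

    return csv_gp.get_rows(path, delimiter, encoding, row_numbers)
```

A typical flow would be to call diagnose first, read the row numbers of interest from the returned CSVDetails attributes (their names are listed in csv_gp.pyi), and then pass those numbers to fetch_rows to inspect the offending cells.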
Releasing a new version of the Python lib
- Update version numbers in csv_gp_python/Cargo.toml, csv_gp/Cargo.toml, and csv_gp_python/pyproject.toml
- Run cargo check to update the lock files with new versions
- Merge this change into main
- Create a new release on GitHub, creating a tag in the form vX.Y.Z
- The 'Publish' pipeline should begin running, and the new version will be published
Running tests
Running Rust tests
Run cargo test.
Running Python tests
Follow the instructions on compiling from source. Then you can run pytest.
File details
Details for the file csv_gp-0.4.0.tar.gz.
File metadata
- Download URL: csv_gp-0.4.0.tar.gz
- Upload date:
- Size: 24.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 98ba7bc4e8b5faa42d984c79d9dc0d3487f82158ad35ad8cc38a22c5b68b60e0 |
| MD5 | ffb106dc21dec3a35384448b511560c3 |
| BLAKE2b-256 | 9ab247d4dec3cef1b1acd928c5f2c6bc2872996e98bd75cf9f9d8b700a827415 |
Provenance
The following attestation bundles were made for csv_gp-0.4.0.tar.gz:
Publisher: publish.yml on xelixdev/csv-gp
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: csv_gp-0.4.0.tar.gz
- Subject digest: 98ba7bc4e8b5faa42d984c79d9dc0d3487f82158ad35ad8cc38a22c5b68b60e0
- Sigstore transparency entry: 904894876
- Permalink: xelixdev/csv-gp@5d365fc5fdb06502fe86d88826fa1c02ee75f573
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/xelixdev
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5d365fc5fdb06502fe86d88826fa1c02ee75f573
- Trigger Event: push
File details
Details for the file csv_gp-0.4.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: csv_gp-0.4.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 479.4 kB
- Tags: CPython 3.9+, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 68e7c2b9bc2ab59753c6b8da383f36a1e7ca6704ac4c397b60577795df5819b4 |
| MD5 | d4af510bdc1b1d2482bd56399172af73 |
| BLAKE2b-256 | 2e7841c0e1a7f8c8b06fca2f94e45b2b04d6d8c6c82bd5807e2e093f5d3633b9 |
Provenance
The following attestation bundles were made for csv_gp-0.4.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:
Publisher: publish.yml on xelixdev/csv-gp
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: csv_gp-0.4.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Subject digest: 68e7c2b9bc2ab59753c6b8da383f36a1e7ca6704ac4c397b60577795df5819b4
- Sigstore transparency entry: 904894945
- Permalink: xelixdev/csv-gp@5d365fc5fdb06502fe86d88826fa1c02ee75f573
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/xelixdev
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5d365fc5fdb06502fe86d88826fa1c02ee75f573
- Trigger Event: push