validated reading of tabular files (CSV, Excel, ODS, PRN)
Project description
Cutplace is a tool and API to validate that tabular data stored in CSV, Excel, ODS and PRN files conform to a cutplace interface definition (CID).
As an example, consider the following customers.csv file that stores data about customers:
ID,surname,first_name,born,gender
3798,Miller,John,1978-11-27,male
19253,Webster Inc.,,1950-01-12,
46418,Jane,Doe,2003-06-29,female
A CID can describe such a file in an easy-to-read way. It consists of three sections. First, there is the general data format:
| | Property | Value |
|---|---|---|
| D | Format | Delimited |
| D | Encoding | UTF-8 |
| D | Header | 1 |
| D | Line delimiter | LF |
| D | Item delimiter | , |
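To make these properties concrete, the following sketch reads the sample data with Python's standard `csv` module using equivalent settings: UTF-8 encoding, a comma as item delimiter, and one header row to skip. It does not use cutplace. The helper `read_delimited` and the inline `SAMPLE` text are assumptions made only for illustration.

```python
import csv
import io

# Inline copy of customers.csv with LF line delimiters (illustration only).
SAMPLE = (
    "ID,surname,first_name,born,gender\n"
    "3798,Miller,John,1978-11-27,male\n"
    "19253,Webster Inc.,,1950-01-12,\n"
    "46418,Jane,Doe,2003-06-29,female\n"
)


def read_delimited(data, encoding="utf-8", header_rows=1, item_delimiter=","):
    """Return the data rows of delimited bytes, skipping the header rows.

    Hypothetical helper, not part of cutplace.
    """
    # newline="" leaves the handling of line endings to the csv module.
    stream = io.StringIO(data.decode(encoding), newline="")
    all_rows = list(csv.reader(stream, delimiter=item_delimiter))
    return all_rows[header_rows:]


rows = read_delimited(SAMPLE.encode("utf-8"))
for row in rows:
    print(row)
```

Every value arrives as a string, and empty items such as the missing first name of "Webster Inc." arrive as empty strings. Deciding whether such values are acceptable is the job of the field definitions.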
Next there are the fields stored in the data file:
| | Name | Example | Empty | Length | Type | Rule |
|---|---|---|---|---|---|---|
| F | customer_id | 3798 | | | Integer | 0…99999 |
| F | surname | Miller | | …60 | | |
| F | first_name | John | X | …60 | | |
| F | date_of_birth | 1978-11-27 | | | DateTime | YYYY-MM-DD |
| F | gender | male | X | | Choice | female, male |
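As a rough illustration of what these field definitions express, the sketch below checks a single row by hand using only the standard library. It is an approximation written for this example, not how cutplace implements its checks; the function `check_customer_row` and its messages are hypothetical.

```python
from datetime import datetime


def check_customer_row(row):
    """Return a list of error messages for one data row (empty if valid).

    Hypothetical helper that approximates the field rules by hand.
    """
    customer_id, surname, first_name, date_of_birth, gender = row
    errors = []

    # customer_id: Integer with rule 0...99999, must not be empty.
    try:
        id_is_valid = 0 <= int(customer_id) <= 99999
    except ValueError:
        id_is_valid = False
    if not id_is_valid:
        errors.append("customer_id must be an integer between 0 and 99999")

    # surname: must not be empty, at most 60 characters.
    if not 1 <= len(surname) <= 60:
        errors.append("surname must have 1 to 60 characters")

    # first_name: may be empty (X in column Empty), at most 60 characters.
    if len(first_name) > 60:
        errors.append("first_name must have at most 60 characters")

    # date_of_birth: DateTime with rule YYYY-MM-DD, must not be empty.
    try:
        datetime.strptime(date_of_birth, "%Y-%m-%d")
    except ValueError:
        errors.append("date_of_birth must match format YYYY-MM-DD")

    # gender: may be empty, otherwise one of the listed choices.
    if gender and gender not in ("female", "male"):
        errors.append("gender must be female or male")

    return errors


print(check_customer_row(["3798", "Miller", "John", "1978-11-27", "male"]))
print(check_customer_row(["19253", "Webster Inc.", "", "1950-01-12", ""]))
```

Both sample rows pass: the second one has an empty first_name and gender, which the X in the Empty column permits.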
Optionally you can describe conditions that must be met across the whole file:
| | Description | Type | Rule |
|---|---|---|---|
| C | customer must be unique | IsUnique | customer_id |
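Unlike a field rule, such a condition cannot be decided by looking at one row in isolation, because it needs state that spans all rows. A minimal sketch of a uniqueness check, assuming rows are lists of strings with customer_id in the first position (the function `find_duplicate_ids` is hypothetical and not part of cutplace):

```python
def find_duplicate_ids(rows, key_index=0):
    """Return the values that occur more than once in the given column,
    in the order in which the first repetition is encountered."""
    seen = set()
    duplicates = []
    for row in rows:
        key = row[key_index]
        if key in seen and key not in duplicates:
            duplicates.append(key)
        seen.add(key)
    return duplicates


sample_rows = [
    ["3798", "Miller", "John", "1978-11-27", "male"],
    ["19253", "Webster Inc.", "", "1950-01-12", ""],
    ["3798", "Jane", "Doe", "2003-06-29", "female"],  # repeated ID, illustration only
]
print(find_duplicate_ids(sample_rows))
```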
The CID can be stored in common spreadsheet formats, in particular Excel and ODS, for example customers_cid.ods.
Cutplace can validate that the data file conforms to the CID:
$ cutplace customers_cid.ods customers.csv
Now add a new line with a broken date_of_birth:
73921,Harris,Diana,04.08.1913,female
Cutplace rejects this file with the error message:
customers.csv (R5C4): cannot accept field ‘date_of_birth’: date must match format YYYY-MM-DD (%Y-%m-%d) but is: ‘04.08.1913’
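The location R5C4 presumably stands for row 5, column 4: the header is row 1, the three original customers are rows 2 to 4, and date_of_birth is the fourth column. The `%Y-%m-%d` in parentheses is the same format written in the notation of Python's `strptime`. The following sketch shows one way such a translation and check could work. The token table and both function names are assumptions for this example and are not taken from cutplace.

```python
from datetime import datetime

# Human-readable tokens and their strptime directives (assumed subset).
_TOKENS = (("YYYY", "%Y"), ("MM", "%m"), ("DD", "%d"))


def to_strptime_format(human_format):
    """Translate a format such as 'YYYY-MM-DD' to '%Y-%m-%d'."""
    result = human_format
    for token, directive in _TOKENS:
        result = result.replace(token, directive)
    return result


def date_error(value, human_format="YYYY-MM-DD"):
    """Return an error message if value does not match, otherwise None."""
    python_format = to_strptime_format(human_format)
    try:
        datetime.strptime(value, python_format)
    except ValueError:
        return "date must match format {0} ({1}) but is: {2!r}".format(
            human_format, python_format, value
        )
    return None


print(date_error("1978-11-27"))
print(date_error("04.08.1913"))
```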
Additionally, cutplace provides an easy-to-use API to read and write tabular data files through a common interface, without having to deal with the intricacies of format-specific modules. To read and validate the above example:
import cutplace
import cutplace.errors

cid_path = 'customers_cid.ods'
data_path = 'customers.csv'
try:
    for row in cutplace.rows(cid_path, data_path):
        pass  # We could also do something useful with the data in ``row`` here.
except cutplace.errors.DataError as error:
    print(error)
For more information, read the documentation at http://cutplace.readthedocs.org/ or visit the project at https://github.com/roskakori/cutplace.
Hashes for cutplace-0.8.7-py2.py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | e6b51fa0ea0634c7ffda9ce6db6cd92a41d5e19994c1b8975576e5b43ac60e13 |
| MD5 | 6e4783522c73974a848ed44f380b089c |
| BLAKE2b-256 | aa852f5e6d513c1d77d534b814048a315e51ce14ee9579dca7289d067285efaa |