DEVO — CSV to iCSV enrichment and Frictionless validation
DEVO
Data Enrichment and Validation Operator. Takes a plain CSV, infers types and constraints, writes a self-documenting iCSV file plus a Frictionless schema, and validates the data against it.
Given a .csv, DEVO enriches it, builds the schema, and validates the data. Given an .icsv, it skips enrichment.
Install from PyPI
pip install py-devo
Install from local cloned repository
pip install -e .
For the Flask web demo:
pip install -e ".[webui]"
Requires Python 3.9+ and frictionless (v4 or v5).
Try it out
A small sample dataset lives at examples/sample.csv — three columns (timestamp, PSUM, TA) representing hourly weather observations. Use it to take DEVO for a spin without needing your own data.
CLI
# Enrich, build schema, and validate in one command
devo run examples/sample.csv
# Results land in DEVO_output/ by default:
# sample.icsv — annotated iCSV
# sample_schema.json — Frictionless Table Schema
# sample_DEVO_report.txt — human-readable validation report
Run devo run examples/sample.csv --out my_output to write to a different directory.
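Judging from the sample.csv and data.csv examples in this README, output names follow a simple pattern based on the input file's stem. The minimal sketch below predicts them, which can be handy in scripts that pick up the results. expected_outputs is a hypothetical helper, not part of DEVO.

```python
from pathlib import Path
from typing import Dict


def expected_outputs(csv_path: str, out_dir: str = "DEVO_output") -> Dict[str, Path]:
    """Predict the files `devo run` writes for an input file.

    Hypothetical helper: the naming pattern (<stem>.icsv, <stem>_schema.json,
    <stem>_DEVO_report.txt) is inferred from the examples in this README.
    """
    stem = Path(csv_path).stem  # "examples/sample.csv" -> "sample"
    out = Path(out_dir)
    return {
        "icsv": out / f"{stem}.icsv",
        "schema": out / f"{stem}_schema.json",
        "report": out / f"{stem}_DEVO_report.txt",
    }
```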
Python
from devo.enrich import ICSVEnricher
from devo.validate import validate_icsv
icsv, schema = ICSVEnricher().make_icsv("examples/sample.csv", "DEVO_output")
report_path, valid = validate_icsv(icsv, schema_path=schema)
print(f"Valid: {valid}")
print(f"Report written to: {report_path}")
Web demo
Install the optional Flask dependency first (if you haven't already):
pip install -e ".[webui]"
Start the local server:
flask --app devo.webui run
Then open http://127.0.0.1:5000 in your browser. Click Choose File, select examples/sample.csv, and click Upload. The page will display the paths to the generated iCSV, schema, and report, along with the overall Valid result.
The web UI is a local demo only — do not expose it to a network.
CLI
devo enrich data.csv # write data.icsv + data_schema.json
devo validate data.icsv # validate against the neighbouring schema
devo run data.csv # do both in one go
Common flags: --out DIR (default DEVO_output/), --delimiter CHAR, --nodata VALUE, --app PROFILE, --schema PATH.
Exit codes: 0 = success, 1 = validation failed, 2 = usage or runtime error.
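Because the exit codes distinguish a validation failure from a usage or runtime error, scripts can branch on them. A minimal sketch, assuming the devo console script is on PATH; run_devo and describe_exit are hypothetical helper names, not part of DEVO's API.

```python
import subprocess
from typing import List, Optional

# Documented meanings of the devo CLI exit codes.
EXIT_MEANINGS = {
    0: "success",
    1: "validation failed",
    2: "usage or runtime error",
}


def describe_exit(code: int) -> str:
    """Translate a devo exit code into its documented meaning (hypothetical helper)."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_devo(csv_path: str, out_dir: Optional[str] = None) -> int:
    """Run `devo run` on a file and return the process exit code.

    Hypothetical helper; assumes the `devo` executable installed by py-devo is on PATH.
    """
    cmd: List[str] = ["devo", "run", csv_path]
    if out_dir is not None:
        cmd += ["--out", out_dir]
    return subprocess.run(cmd, check=False).returncode
```

For example, describe_exit(run_devo("examples/sample.csv", "my_output")) yields "validation failed" when the data does not satisfy the schema, and "usage or runtime error" for problems such as a missing input file.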
What lands on disk
For input data.csv, after devo run:
| File | Contents |
|---|---|
| DEVO_output/data.icsv | iCSV with # [METADATA], # [FIELDS], and # [DATA] sections |
| DEVO_output/data_schema.json | Frictionless Table Schema JSON |
| DEVO_output/data_DEVO_report.txt | Validation report (read this) |
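An iCSV file is plain delimited data preceded by #-prefixed header sections. DEVO's own reader is _parser.py. The sketch below is a hypothetical stand-in that shows the general shape of the format and how field_delimiter from # [METADATA] is used to split # [FIELDS] values. The first header line, the "key = value" layout, and the sample values are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Illustrative iCSV text with made-up values; the header layout is assumed.
SAMPLE_ICSV = """\
# iCSV 1.0 UTF-8
# [METADATA]
# field_delimiter = |
# nodata = -999
# [FIELDS]
# fields = timestamp|PSUM|TA
# [DATA]
2024-01-01T00:00|0.0|-3.2
2024-01-01T01:00|1.4|-999
"""


def parse_icsv(text: str) -> Tuple[Dict[str, str], Dict[str, List[str]], List[List[str]]]:
    """Split iCSV text into metadata, per-field value lists, and data rows (sketch)."""
    metadata: Dict[str, str] = {}
    raw_fields: Dict[str, str] = {}
    rows: List[List[str]] = []
    section: Optional[str] = None
    for line in text.splitlines():
        if section == "DATA":
            # Everything after "# [DATA]" is a delimited data row.
            if line.strip():
                rows.append(line.split(metadata["field_delimiter"]))
            continue
        if not line.startswith("#"):
            continue
        body = line[1:].strip()
        if body.startswith("[") and body.endswith("]"):
            section = body[1:-1]  # METADATA, FIELDS or DATA
        elif "=" in body and section in ("METADATA", "FIELDS"):
            key, _, value = body.partition("=")
            target = metadata if section == "METADATA" else raw_fields
            target[key.strip()] = value.strip()
    # FIELDS values are split with the delimiter declared in METADATA.
    delim = metadata["field_delimiter"]
    fields = {key: value.split(delim) for key, value in raw_fields.items()}
    return metadata, fields, rows
```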
Python API
from devo.enrich import ICSVEnricher
from devo.validate import validate_icsv
icsv, schema = ICSVEnricher().make_icsv("data.csv", "DEVO_output")
report_path, valid = validate_icsv(icsv, schema_path=schema)
Files
devo/
├── cli.py # argparse front-end (enrich / validate / run)
├── enrich.py # CSV → iCSV + schema (ICSVEnricher class)
├── validate.py # iCSV + schema → Frictionless validation + report
├── _infer.py # pure type-inference functions (shared by enrich + validate)
├── _parser.py # iCSV header parser (shared by enrich + validate)
├── _schema.py # per-column statistics + Frictionless schema builder
├── _report.py # plain-text report writer
├── exceptions.py # DEVOError hierarchy
└── webui.py # Flask demo (optional; requires pip install -e ".[webui]")
tests/
├── conftest.py
├── fixtures/ # sample CSV and iCSV files
└── test_*.py
How it works
Enrichment (devo enrich)
- Read: the CSV is read in one pass. If no --delimiter is given, csv.Sniffer detects it from the first 10 lines.
- Delimiter mapping: a comma delimiter is remapped to a pipe in the iCSV output (pipe is also the default fallback for delimiters outside the iCSV spec). Column names that contain the output delimiter are rejected with a clear error.
- Normalisation: every row is padded or clipped to the header length, and every value is stripped of leading and trailing whitespace.
- Type inference: each column is assigned the first type that fits, in the order integer → number → datetime → string. Scientific notation (1.5e-3, 2E10) is recognised as number. Missing-value sentinels (and any custom --nodata value) are excluded before inference.
- Statistics: per-column min, max, and missing_count are computed from the normalised data and written to the iCSV # [FIELDS] section. They do not appear in the Frictionless schema JSON.
- Geometry detection: if the header contains lat/latitude plus lon/lng/longitude, DEVO writes geometry = column:lat,lon and srid = EPSG:4326 to the metadata. A single column named geometry (WKT) gets only geometry = column:geometry, with no srid, because WKT embeds its own CRS.
- Write: the normalised rows are written to the iCSV # [DATA] section, and the Frictionless schema is written to the matching _schema.json file (data_schema.json for data.csv).
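The sniffing, normalisation, type-inference, and geometry steps above can be sketched with the standard library alone. This is a hypothetical illustration, not DEVO's code. The function names, the MISSING sentinel set, case-insensitive column matching, and the use of datetime.fromisoformat as the only datetime test are all assumptions (DEVO also tries a fixed list of strptime formats).

```python
import csv
from datetime import datetime
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Assumed missing-value sentinels; DEVO's real set may differ.
MISSING: Set[str] = {"", "NA", "NaN", "null"}
LAT_NAMES = {"lat", "latitude"}
LON_NAMES = {"lon", "lng", "longitude"}


def sniff_delimiter(text: str, max_lines: int = 10) -> str:
    """Guess the delimiter from the first `max_lines` lines with csv.Sniffer."""
    sample = "\n".join(text.splitlines()[:max_lines])
    return csv.Sniffer().sniff(sample, delimiters=",;\t|").delimiter


def normalise(rows: List[List[str]], width: int) -> List[List[str]]:
    """Pad or clip every row to `width` cells and strip whitespace from each cell."""
    return [[cell.strip() for cell in (row + [""] * width)[:width]] for row in rows]


def _parses(convert: Callable[[str], object], value: str) -> bool:
    """True if `convert(value)` succeeds without raising ValueError."""
    try:
        convert(value)
        return True
    except ValueError:
        return False


# Checked in order; the first type that fits every value wins.
LADDER: Tuple[Tuple[str, Callable[[str], object]], ...] = (
    ("integer", int),
    ("number", float),  # float() also accepts 1.5e-3 and 2E10
    ("datetime", datetime.fromisoformat),
)


def infer_type(values: Iterable[str], missing: Set[str] = MISSING) -> str:
    """Return the narrowest ladder type that fits all non-missing values."""
    present = [v for v in values if v not in missing]
    if not present:
        return "string"  # assumption: all-missing columns default to string
    for name, convert in LADDER:
        if all(_parses(convert, v) for v in present):
            return name
    return "string"


def detect_geometry(header: List[str]) -> Dict[str, str]:
    """Build geometry/srid metadata from column names (case-insensitive here)."""
    lowered = [h.lower() for h in header]
    lat = next((h for h, low in zip(header, lowered) if low in LAT_NAMES), None)
    lon = next((h for h, low in zip(header, lowered) if low in LON_NAMES), None)
    if lat and lon:
        return {"geometry": f"column:{lat},{lon}", "srid": "EPSG:4326"}
    if "geometry" in lowered:
        return {"geometry": "column:geometry"}  # WKT column: no srid
    return {}
```

A custom --nodata value would be passed as an extra sentinel, e.g. infer_type(values, MISSING | {"-999"}), so that it never forces a numeric column to string.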
Validation (devo validate)
- Parse header: _parser.py reads the # [METADATA] and # [FIELDS] sections, using field_delimiter from the metadata to split field values.
- Metadata check: required keys are verified. geometry and srid are only checked when spatial column names are present; srid is only required for lat/lon columns (not WKT).
- Type cross-check (Option A): column types are re-inferred from up to 500 data rows and compared with the declared types. The iCSV's own nodata sentinel is merged with the standard missing-value set before re-inference, so custom sentinels are not mistaken for real data. An inferred type narrower than or equal to the declared type gives [OK]; a wider inferred type gives [WARN].
- Frictionless validation: the data is written to a temporary comma-delimited CSV and validated against the schema using frictionless.Resource. The temp file is always deleted in a finally block.
- Report: a plain-text .txt report is written with three sections: METADATA, TYPE CONSISTENCY, and DATA VALIDATION. It shows Valid: YES only when the metadata check has no [FAIL] entries and Frictionless reports no data errors. Type warnings do not affect the valid flag.
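The [OK]/[WARN] comparison and the rule for Valid: YES can be expressed compactly. A hypothetical sketch, not DEVO's code: it assumes the enrichment order integer → number → datetime → string defines which type counts as "wider", and the function names and report line format are invented for illustration.

```python
from typing import Dict, List, Sequence

# Assumption: a type further along the enrichment ladder counts as "wider".
LADDER: List[str] = ["integer", "number", "datetime", "string"]


def sample_column(rows: Sequence[Sequence[str]], index: int, limit: int = 500) -> List[str]:
    """Collect one column from at most `limit` data rows for re-inference."""
    return [row[index] for row in rows[:limit]]


def cross_check(declared: str, inferred: str) -> str:
    """[OK] if the re-inferred type is narrower than or equal to the declared one,
    [WARN] if it is wider."""
    return "[OK]" if LADDER.index(inferred) <= LADDER.index(declared) else "[WARN]"


def type_report(declared: Dict[str, str], inferred: Dict[str, str]) -> List[str]:
    """One TYPE CONSISTENCY line per declared column (line format is invented)."""
    return [
        f"{cross_check(declared[c], inferred[c])} {c}: declared {declared[c]}, inferred {inferred[c]}"
        for c in declared
    ]


def overall_valid(metadata_results: Sequence[str], data_error_count: int) -> bool:
    """Valid: YES requires no metadata [FAIL] and no Frictionless data errors.
    Type-consistency warnings are intentionally not consulted."""
    return "[FAIL]" not in metadata_results and data_error_count == 0
```

Keeping warnings out of overall_valid mirrors the behaviour described above: a column declared integer that now contains decimals is flagged for attention, but only metadata failures and schema violations make the file invalid.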
Limitations
- Type inference is conservative (integer → number → datetime → string). Mixed-format columns fall back to string.
- Datetime detection uses datetime.fromisoformat() and a fixed list of common strptime formats. Unusual formats need a custom schema.
- Column descriptions are left blank in the iCSV # [FIELDS] section; fill them in by hand.
- The web UI (webui.py) is a local demo only; do not expose it to a network.
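Datetime detection of the kind described above, datetime.fromisoformat() first and then a fixed list of strptime formats, might look like the sketch below. FALLBACK_FORMATS is a hypothetical list, not DEVO's actual one, and treating a column as datetime only when every value parses is an assumption consistent with the string fallback.

```python
from datetime import datetime
from typing import Iterable, Optional

# Hypothetical fallback formats; DEVO's own list may differ.
FALLBACK_FORMATS = (
    "%d.%m.%Y %H:%M",
    "%d/%m/%Y",
    "%Y/%m/%d %H:%M:%S",
)


def parse_datetime(value: str) -> Optional[datetime]:
    """Try ISO 8601 first, then each fallback format; return None if nothing fits."""
    try:
        return datetime.fromisoformat(value)
    except ValueError:
        pass
    for fmt in FALLBACK_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    return None


def is_datetime_column(values: Iterable[str]) -> bool:
    """True when every value parses; one unrecognised value sends the column to string."""
    return all(parse_datetime(v) is not None for v in values)
```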
License
MIT. See LICENSE.
File details
Details for the file py_devo-0.2.1.tar.gz.
File metadata
- Download URL: py_devo-0.2.1.tar.gz
- Upload date:
- Size: 22.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c682292bc5ca396f392cdbb5226c35f20bcdb005c2e8dd7b46aa0c3ebbccd1ea |
| MD5 | 987de7262c60a18819a157fea7f1992e |
| BLAKE2b-256 | f3f6d013f49b48095763faec8198edb331d5872e799eb74371e1e99599d99eb1 |
File details
Details for the file py_devo-0.2.1-py3-none-any.whl.
File metadata
- Download URL: py_devo-0.2.1-py3-none-any.whl
- Upload date:
- Size: 19.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5dcc5917a8f33dc90f9375f8c35afb57746ca1925e06a48f5db41076e62aaa52 |
| MD5 | b0bf3cd3fd6b0802df9c7ed70e2bfa6e |
| BLAKE2b-256 | 9a924007d32b22bd0e6e83de4d0a62c21c2c53c97b81ee914843c5e666e4c167 |