
DEVO — CSV to iCSV enrichment and Frictionless validation

DEVO

Data Enrichment and Validation Operator. Takes a plain CSV, infers types and constraints, writes a self-documenting iCSV file plus a Frictionless schema, and validates the data against it.

If you give it a .csv, it enriches the file, builds the schema, and then validates. If you give it an .icsv, it skips enrichment and goes straight to validation.

Install from PyPI

pip install py-devo

Install from local cloned repository

pip install -e .

For the Flask web demo:

pip install -e ".[webui]"

Requires Python 3.9+ and frictionless (v4 or v5).

Try it out

A small sample dataset lives at examples/sample.csv — three columns (timestamp, PSUM, TA) representing hourly weather observations. Use it to take DEVO for a spin without needing your own data.

CLI

# Enrich, build schema, and validate in one command
devo run examples/sample.csv

# Results land in DEVO_output/ by default:
#   sample.icsv               — annotated iCSV
#   sample_schema.json        — Frictionless Table Schema
#   sample_DEVO_report.txt    — human-readable validation report

To write to a different directory, pass --out, for example: devo run examples/sample.csv --out my_output

Python

from devo.enrich import ICSVEnricher
from devo.validate import validate_icsv

icsv, schema = ICSVEnricher().make_icsv("examples/sample.csv", "DEVO_output")
report_path, valid = validate_icsv(icsv, schema_path=schema)

print(f"Valid: {valid}")
print(f"Report written to: {report_path}")

Web demo

Install the optional Flask dependency first (if you haven't already):

pip install -e ".[webui]"

Start the local server:

flask --app devo.webui run

Then open http://127.0.0.1:5000 in your browser. Click Choose File, select examples/sample.csv, and click Upload. The page will display the paths to the generated iCSV, schema, and report, along with the overall Valid result.

The web UI is a local demo only — do not expose it to a network.


CLI reference

devo enrich   data.csv                    # write data.icsv + data_schema.json
devo validate data.icsv                   # validate against neighbouring schema
devo run      data.csv                    # do both in one go

Common flags: --out DIR (default DEVO_output/), --delimiter CHAR, --nodata VALUE, --app PROFILE, --schema PATH.

Exit codes: 0 = success, 1 = validation failed, 2 = usage or runtime error.
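
As a rough illustration of how a wrapper script might act on these exit codes, the sketch below maps a return code to a short status string. The helpers describe_exit_code and run_devo are hypothetical and not part of DEVO; only the devo run command, the --out flag, and the three exit codes come from the text above.

```python
import subprocess

# Exit codes as documented above; anything else is reported as unexpected.
EXIT_MEANINGS = {
    0: "success",
    1: "validation failed",
    2: "usage or runtime error",
}


def describe_exit_code(code: int) -> str:
    """Translate a devo exit code into a status string (hypothetical helper)."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_devo(csv_path: str, out_dir: str = "DEVO_output") -> str:
    """Run `devo run` on one file and describe the outcome (hypothetical helper).

    Requires the `devo` command to be installed and on PATH.
    """
    result = subprocess.run(["devo", "run", csv_path, "--out", out_dir])
    return describe_exit_code(result.returncode)

# Example call, once DEVO is installed:
#   status = run_devo("examples/sample.csv")
```

Keeping the mapping in one dictionary makes it easy to branch on "validation failed" (data problem) separately from "usage or runtime error" (invocation problem) in CI scripts.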

What lands on disk

For input data.csv, after devo run:

| File | What |
| --- | --- |
| DEVO_output/data.icsv | iCSV with # [METADATA], # [FIELDS], # [DATA] |
| DEVO_output/data_schema.json | Frictionless Table Schema JSON |
| DEVO_output/data_DEVO_report.txt | Validation report (read this) |
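
To make the three iCSV sections more concrete, here is a minimal sketch of splitting such a file into header sections and data lines. It is not DEVO's parser: the function split_icsv_sections is hypothetical, and it assumes header lines start with #, section markers look like # [NAME], and header entries look like # key = value. The sample text, including its first line and the data values, is made up for illustration.

```python
from typing import Dict, List, Optional, Tuple


def split_icsv_sections(text: str) -> Tuple[Dict[str, Dict[str, str]], List[str]]:
    """Split iCSV text into {section: {key: value}} and raw data lines.

    Assumes '#'-prefixed header lines, '# [NAME]' section markers and
    '# key = value' entries (an assumption, not DEVO's implementation).
    """
    sections: Dict[str, Dict[str, str]] = {}
    data: List[str] = []
    current: Optional[str] = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#"):
            body = stripped[1:].strip()
            if body.startswith("[") and body.endswith("]"):
                # A new section starts, e.g. METADATA, FIELDS or DATA.
                current = body[1:-1]
                sections.setdefault(current, {})
            elif current is not None and "=" in body:
                key, value = body.split("=", 1)
                sections[current][key.strip()] = value.strip()
        elif current == "DATA":
            data.append(stripped)
    return sections, data


# Made-up sample using the column names from examples/sample.csv.
sample = """# iCSV 1.0 UTF-8
# [METADATA]
# field_delimiter = |
# [FIELDS]
# fields = timestamp|PSUM|TA
# [DATA]
2024-01-01T00:00:00|0.0|-1.5
"""

sections, rows = split_icsv_sections(sample)
print(sections["METADATA"]["field_delimiter"])
print(sections["FIELDS"]["fields"].split(sections["METADATA"]["field_delimiter"]))
print(rows)
```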

Python API

from devo.enrich import ICSVEnricher
from devo.validate import validate_icsv

icsv, schema = ICSVEnricher().make_icsv("data.csv", "DEVO_output")
report_path, valid = validate_icsv(icsv, schema_path=schema)
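
If you have several files, the two calls above can be wrapped in a loop. The sketch below is one possible way to do that; validate_many is a hypothetical helper, not part of DEVO. It relies only on the make_icsv and validate_icsv call shapes shown above, and it accepts stand-in callables so the loop can be tried without DEVO installed.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional, Tuple


def validate_many(
    csv_paths: Iterable[str],
    out_dir: str = "DEVO_output",
    enrich: Optional[Callable[[str, str], Tuple[str, str]]] = None,
    validate: Optional[Callable[..., Tuple[str, bool]]] = None,
) -> Dict[str, bool]:
    """Enrich and validate several CSV files; return {file name: valid flag}."""
    if enrich is None or validate is None:
        # Imported lazily so the helper can be exercised with stand-ins.
        from devo.enrich import ICSVEnricher
        from devo.validate import validate_icsv

        enrich = enrich or ICSVEnricher().make_icsv
        validate = validate or validate_icsv

    results: Dict[str, bool] = {}
    for path in csv_paths:
        icsv, schema = enrich(path, out_dir)
        _report_path, valid = validate(icsv, schema_path=schema)
        results[Path(path).name] = valid
    return results

# With DEVO installed:
#   results = validate_many(["data.csv", "other.csv"])
```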

Files

devo/
├── cli.py          # argparse front-end (enrich / validate / run)
├── enrich.py       # CSV → iCSV + schema (ICSVEnricher class)
├── validate.py     # iCSV + schema → Frictionless validation + report
├── _infer.py       # pure type-inference functions (shared by enrich + validate)
├── _parser.py      # iCSV header parser (shared by enrich + validate)
├── _schema.py      # per-column statistics + Frictionless schema builder
├── _report.py      # plain-text report writer
├── exceptions.py   # DEVOError hierarchy
└── webui.py        # Flask demo (optional; requires pip install -e ".[webui]")
tests/
├── conftest.py
├── fixtures/       # sample CSV and iCSV files
└── test_*.py

How it works

Enrichment (devo enrich)

  1. Read — the CSV is read in one pass. If no --delimiter is given, csv.Sniffer detects it from the first 10 lines.
  2. Delimiter mapping — comma is remapped to pipe in the iCSV output (pipe is also the default fallback for non-spec delimiters). Column names that contain the output delimiter are rejected with a clear error.
  3. Normalisation — every row is padded or clipped to header length and stripped of leading/trailing whitespace.
  4. Type inference — each column is classified: integer → number → datetime → string. Scientific notation (1.5e-3, 2E10) is recognised as number. Missing-value sentinels (and any custom --nodata value) are excluded before inference.
  5. Statistics — per-column min, max, and missing_count are computed from the normalised data and written to the iCSV # [FIELDS] section. They do not appear in the Frictionless schema JSON.
  6. Geometry detection — if the header contains lat/latitude + lon/lng/longitude, DEVO writes geometry = column:lat,lon and srid = EPSG:4326 to metadata. A single column named geometry (WKT) gets geometry = column:geometry only — no srid, because WKT embeds its own CRS.
  7. Write — the normalised rows are written to the iCSV # [DATA] section, and the Frictionless schema is written to _schema.json.
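
Steps 3 to 5 can be sketched roughly as follows. This is an independent illustration, not the code in _infer.py or _schema.py: the sentinel set, the regular expressions, and the function names normalise, infer_type, and column_stats are assumptions, datetime detection is reduced to datetime.fromisoformat(), and treating an all-missing column as string is a guess.

```python
import re
from datetime import datetime
from typing import Dict, List, Sequence

# Assumed sentinel set for illustration; DEVO's real list may differ.
MISSING = frozenset({"", "NA", "NaN", "null", "-999"})

_INT = re.compile(r"[+-]?\d+")
_NUM = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")


def normalise(rows: Sequence[Sequence[str]], width: int) -> List[List[str]]:
    """Pad or clip every row to `width` cells and strip each cell."""
    return [[cell.strip() for cell in (list(row) + [""] * width)[:width]]
            for row in rows]


def _is_datetime(value: str) -> bool:
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False


# Checked in order, narrowest first; "string" is the fallback.
CHECKS = (
    ("integer", lambda v: _INT.fullmatch(v) is not None),
    ("number", lambda v: _NUM.fullmatch(v) is not None),
    ("datetime", _is_datetime),
)


def infer_type(values: Sequence[str], missing=MISSING) -> str:
    """Return the narrowest type that fits every non-missing value."""
    present = [v for v in values if v not in missing]
    if present:
        for name, check in CHECKS:
            if all(check(v) for v in present):
                return name
    return "string"


def column_stats(values: Sequence[str], missing=MISSING) -> Dict[str, object]:
    """Compute type, missing_count and (for numeric columns) min and max."""
    present = [v for v in values if v not in missing]
    kind = infer_type(values, missing)
    stats: Dict[str, object] = {"type": kind,
                                "missing_count": len(values) - len(present)}
    if kind in ("integer", "number"):
        numbers = [float(v) for v in present]
        stats["min"], stats["max"] = min(numbers), max(numbers)
    return stats


raw = [[" 1 ", "2.5", "2024-01-01T00:00:00"],
       ["2", "1e3"],                                   # short row: padded
       ["3", "NA", "2024-01-01T01:00:00", "extra"]]    # long row: clipped
clean = normalise(raw, 3)
for column in zip(*clean):
    print(column_stats(column))
```

Excluding sentinels before inference is what keeps a single "NA" from demoting an otherwise numeric column to string.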

Validation (devo validate)

  1. Parse header — _parser.py reads the # [METADATA] and # [FIELDS] sections, using field_delimiter from metadata to split field values.
  2. Metadata check — required keys are verified. geometry and srid are only checked when spatial column names are present; srid is only required for lat/lon columns (not WKT).
  3. Type cross-check (Option A) — column types are re-inferred from up to 500 data rows and compared to the declared types. The iCSV's own nodata sentinel is merged with the standard missing-value set before re-inference so custom sentinels are not mistaken for real data. Inferred type narrower than or equal to declared → [OK]. Inferred wider → [WARN].
  4. Frictionless validation — data is written to a temporary comma-delimited CSV and validated against the schema using frictionless.Resource. The temp file is always deleted in a finally block.
  5. Report — a plain-text .txt report is written with three sections: METADATA, TYPE CONSISTENCY, and DATA VALIDATION. Valid: YES only when metadata has no [FAIL] entries and Frictionless reports no data errors. Type warnings do not affect the valid flag.
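
The [OK]/[WARN] rule in step 3 and the valid flag in step 5 can be illustrated with a small sketch. It assumes the types form a single chain from narrowest to widest (integer, number, datetime, string), mirroring the inference order; whether DEVO compares types exactly this way is not stated here. The names TYPE_ORDER, cross_check, and overall_valid are hypothetical.

```python
from typing import Sequence

# Assumed linear ordering, narrowest first (mirrors the inference order).
TYPE_ORDER = ["integer", "number", "datetime", "string"]


def cross_check(declared: str, inferred: str) -> str:
    """Return '[OK]' if the inferred type is no wider than the declared one."""
    if TYPE_ORDER.index(inferred) <= TYPE_ORDER.index(declared):
        return "[OK]"
    return "[WARN]"


def overall_valid(metadata_results: Sequence[str],
                  data_error_count: int,
                  type_results: Sequence[str]) -> bool:
    """Valid only if metadata has no [FAIL] and there are no data errors.

    type_results is accepted but deliberately ignored: type warnings
    do not affect the valid flag.
    """
    return "[FAIL]" not in metadata_results and data_error_count == 0


print(cross_check("number", "integer"))   # declared number, data looks integer
print(cross_check("integer", "string"))   # declared integer, data looks like text
print(overall_valid(["[OK]"], 0, ["[WARN]"]))
```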

Limitations

  • Type inference is conservative: integer → number → datetime → string. Mixed-format columns fall back to string.
  • Datetime detection uses datetime.fromisoformat() and a fixed list of common strptime formats. Unusual formats need a custom schema.
  • Column descriptions are left blank in the iCSV # [FIELDS] section; fill them in by hand.
  • The web UI (webui.py) is a local demo only — do not expose it to a network.
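
The datetime limitation is easier to see with a sketch of the general approach: try datetime.fromisoformat() first, then fall back to a fixed list of strptime formats. The format list below is illustrative only, since DEVO's actual list is not shown in this README, and parse_datetime is a hypothetical helper.

```python
from datetime import datetime
from typing import Optional, Sequence

# Illustrative formats only; DEVO's actual list is not given in this README.
FALLBACK_FORMATS = ("%d.%m.%Y %H:%M", "%d/%m/%Y", "%Y%m%d")


def parse_datetime(value: str,
                   formats: Sequence[str] = FALLBACK_FORMATS) -> Optional[datetime]:
    """Return a datetime if `value` matches ISO 8601 or a known format, else None."""
    try:
        return datetime.fromisoformat(value)
    except ValueError:
        pass
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    return None


print(parse_datetime("2024-01-01T12:00:00"))
print(parse_datetime("31.12.2023 23:00"))
print(parse_datetime("next Tuesday"))
```

Any value that matches neither path comes back as None, which is why a column with an unusual date format ends up typed as string unless you supply a custom schema.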

License

MIT. See LICENSE.
