Skip to main content

Vladiate is a strict validation tool for CSV files

Project description

https://travis-ci.org/di/vladiate.svg?branch=master https://coveralls.io/repos/di/vladiate/badge.svg?branch=master Requirements Status

Description

Vladiate helps you write explicit assertions for every field of your CSV file.

Features

Write validation schemas in plain-old Python

No UI, no XML, no JSON, just code.

Write your own validators

Vladiate comes with a few by default, but there’s no reason you can’t write your own.

Validate multiple files at once

Either with the same schema, or different ones.

Documentation

Installation

Installing:

$ pip install vladiate

Quickstart

Below is an example of a vladfile.py

from vladiate import Vlad
from vladiate.validators import UniqueValidator, SetValidator
from vladiate.inputs import LocalFile

class YourFirstValidator(Vlad):
    source = LocalFile('vampires.csv')
    validators = {
        'Column A': [
            UniqueValidator()
        ],
        'Column B': [
            SetValidator(['Vampire', 'Not A Vampire'])
        ]
    }

Here we define a number of validators for a local file vampires.csv, which would look like this:

Column A,Column B
Vlad the Impaler,Not A Vampire
Dracula,Vampire
Count Chocula,Vampire

We then run vladiate in the same directory as your .csv file:

$ vladiate

And get the following output:

Validating YourFirstValidator(source=LocalFile('vampires.csv'))
Passed! :)

Handling Changes

Let’s imagine that you’ve gotten a new CSV file, potential_vampires.csv, that looks like this:

Column A,Column B
Vlad the Impaler,Not A Vampire
Dracula,Vampire
Count Chocula,Vampire
Ronald Reagan,Maybe A Vampire

If we were to update our first validator to use this file as follows:

- class YourFirstValidator(Vlad):
-     source = LocalFile('vampires.csv')
+ class YourFirstFailingValidator(Vlad):
+     source = LocalFile('potential_vampires.csv')

we would get the following error:

Validating YourFirstFailingValidator(source=LocalFile('potential_vampires.csv'))
Failed :(
  SetValidator failed 1 time(s) (25.0%) on field: 'Column B'
    Invalid fields: ['Maybe A Vampire']

And we would know that we’d either need to sanitize this field, or add it to the SetValidator.

Starting from scratch

To make writing a new vladfile.py easy, Vladiate will give meaningful error messages.

Given the following as real_vampires.csv:

Column A,Column B,Column C
Vlad the Impaler,Not A Vampire
Dracula,Vampire
Count Chocula,Vampire
Ronald Reagan,Maybe A Vampire

We could write a bare-bones validator as follows:

class YourFirstEmptyValidator(Vlad):
    source = LocalFile('real_vampires.csv')
    validators = {}

Running this with vladiate would give the following error:

Validating YourFirstEmptyValidator(source=LocalFile('real_vampires.csv'))
Missing...
  Missing validators for:
    'Column A': [],
    'Column B': [],
    'Column C': [],

Vladiate expects something to be specified for every column, even if it is an empty list (more on this later). We can easily copy and paste from the error into our vladfile.py to make it:

class YourFirstEmptyValidator(Vlad):
    source = LocalFile('real_vampires.csv')
    validators = {
        'Column A': [],
        'Column B': [],
        'Column C': [],
    }

When we run this with vladiate, we get:

Validating YourSecondEmptyValidator(source=LocalFile('real_vampires.csv'))
Failed :(
  EmptyValidator failed 4 time(s) (100.0%) on field: 'Column A'
    Invalid fields: ['Dracula', 'Vlad the Impaler', 'Count Chocula', 'Ronald Reagan']
  EmptyValidator failed 4 time(s) (100.0%) on field: 'Column B'
    Invalid fields: ['Maybe A Vampire', 'Not A Vampire', 'Vampire']
  EmptyValidator failed 4 time(s) (100.0%) on field: 'Column C'
    Invalid fields: ['Real', 'Not Real']

This is because Vladiate interprets an empty list of validators for a field as an EmptyValidator, which expects an empty string in every field. This helps us make meaningful decisions when adding validators to our vladfile.py. It also ensures that we are not forgetting about a column or field which is not empty.

Built-in Validators

Vladiate comes with a few common validators built-in:

class Validator

Generic validator. Should be subclassed by any custom validators. Not to be used directly.

class CastValidator

Generic “can-be-cast-to-x” validator. Should be subclassed by any cast-test validator. Not to be used directly.

class IntValidator

Validates whether a field can be cast to an int type or not.

empty_ok=False:

Specify whether a field which is an empty string should be ignored.

class FloatValidator

Validates whether a field can be cast to an float type or not.

empty_ok=False:

Specify whether a field which is an empty string should be ignored.

class SetValidator

Validates whether a field is in the specified set of possible fields.

valid_set=[]:

List of valid possible fields

empty_ok=False:

Implicity adds the empty string to the specified set.

class UniqueValidator

Ensures that a given field is not repeated in any other column. Can optionally determine “uniqueness” with other fields in the row as well via unique_with.

unique_with=[]:

List of field names to make the primary field unique with.

class RegexValidator

Validates whether a field matches the given regex using re.match().

pattern=r'di^':

The regex pattern. Fails for all fields by default.

class EmptyValidator

Ensure that a field is always empty. Essentially the same as an empty SetValidator. This is used by default when a field has no validators.

class Ignore

Always passes validation. Used to explicity ignore a given column.

Built-in Input Types

Vladiate comes with the following input types:

class VladInput

Generic input. Should be subclassed by any custom inputs. Not to be used directly.

class LocalFile

Read from a file local to the filesystem.

filename:

Path to a local CSV file.

class S3File

Read from a file in S3. Uses the boto library. Optionally can specify either a full path, or a bucket/key pair.

path=None:

A full S3 filepath (e.g., s3://foo.bar/path/to/file.csv)

bucket=None:

S3 bucket. Must be specified with a key.

key=None:

S3 key. Must be specified with a bucket.

class String

Read CSV from a string. Can take either an str or a StringIO.

:string_input=None

Regular Python string input.

:string_io=None

StringIO input.

Running Vlads Programatically

class Vlad

Initialize a Vlad programatically

source:

Required. Any VladInput.

validators={}:

List of validators. Optional, defaults to the class variable validators if set, otherwise uses EmptyValidator for all fields.

delimiter=',':

The delimiter used within your csv source. Optional, defaults to ,.

ignore_missing_validators=False:

Whether to fail validation if there are fields in the file for which the Vlad does not have validators. Optional, defaults to False.

For example:

from vladiate import Vlad
from vladiate.inputs import LocalFile
Vlad(source=LocalFile('path/to/local/file.csv').validate()

Testing

To run the tests:

make test

To run the linter:

make lint

Command Line Arguments

Usage: vladiate [options] [VladClass [VladClass2 ... ]]

Options:
  -h, --help            show this help message and exit
  -f VLADFILE, --vladfile=VLADFILE
                        Python module file to import, e.g. '../other.py'.
                        Default: vladfile
  -l, --list            Show list of possible vladiate classes and exit
  -V, --version         show version number and exit
  -p PROCESSES, --processes=PROCESSES
                        attempt to use this number of processes, Default: 1

Authors

License

Open source MIT license.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vladiate-0.0.14.tar.gz (16.9 kB view details)

Uploaded Source

Built Distributions

vladiate-0.0.14-py3.6.egg (37.5 kB view details)

Uploaded Source

vladiate-0.0.14-py2-none-any.whl (20.4 kB view details)

Uploaded Python 2

File details

Details for the file vladiate-0.0.14.tar.gz.

File metadata

  • Download URL: vladiate-0.0.14.tar.gz
  • Upload date:
  • Size: 16.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for vladiate-0.0.14.tar.gz
Algorithm Hash digest
SHA256 2d4b5e6ca4c3de5bcd1c29ed5499ec516b1144a277b4fdc8b092dc50c2cfe617
MD5 8e9715dd4060e389b8de39a39caac1c8
BLAKE2b-256 7938a50008295f858f48246b0e7326b1c896e6ca52fcc590eeb9db9745c047a5

See more details on using hashes here.

File details

Details for the file vladiate-0.0.14-py3.6.egg.

File metadata

File hashes

Hashes for vladiate-0.0.14-py3.6.egg
Algorithm Hash digest
SHA256 d30b3311f853f364ffefb986b23506abb2a0fb65e38066730b37e14f75c9c948
MD5 207c86ddaa285b8522db607741e1394f
BLAKE2b-256 b0f9eed2648030278903efc50ab914a33dd029882c68d2ec73bf744283f49571

See more details on using hashes here.

File details

Details for the file vladiate-0.0.14-py2-none-any.whl.

File metadata

File hashes

Hashes for vladiate-0.0.14-py2-none-any.whl
Algorithm Hash digest
SHA256 94622bb70a627cc9c76aa087b22c0f26671fb7287852c446a04c324bc12e6db0
MD5 130ac1f0885aebb8aee363d41d55e2c0
BLAKE2b-256 24496f41ba660b75275d4746b7f30f4c8d0e2217b1c48275448e830c61e3efce

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page