CsvSchema·PyPI

Module for describing CSV data structure.

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 5 - Production/Stable
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 2.7
Topic
- Software Development
- Text Processing

Project description

CsvSchema is easy to use module designed to make CSV file checking easier. It allows to create more complex validation rules faster thanks to some predefined building blocks.

Basics

Like most similar modules build for CSV file checking CsvSchema allows you to describe how proper file should look like. In order to do that you will need to implement a subclass of csv_schema.structure.BaseCsvStructure.

Example:

from csv_schema.structure.base import BaseCsvStructure
from csv_schema.columns import (
  IntColumn,
  DecimalColumn,
  StringColumn,
)

class TestCsvStructure(BaseCsvStructure):

   a = StringColumn()
   b = IntColumn()
   c = DecimalColumn()

In above example we defined CSV schema class that represents file with three columns. First column may contain any kind of characters. Second allows only numerical values and third one may contain only decimal values.

NOTE:: Order of attributes in schema class is important. If you put b before a then column b in CSV file will be the first one. Of course this will change validation and first column will no longer allow values like e.g 'Python'

After defining your schema you can use it like this:

schema = TestCsvStructure(['A', '6', ''], 1)
schema.is_valid()

First constructor parameter is a list of data representing single line form CSV file. Second is a position in file from which it was taken.

NOTE:: You can use whatever CSV reading method you like. Just make sure that each CSV line is transformed into list.

If is_valid return False you can see errors that has been found in schema.errors. Each error message is formatted in two ways:

Line <line number>: <error message> when error message applies to whole line
Line <line number>, <column number>: <error message> when error message applies to particular column

More about columns

There are three types of columns. Their behavior can be altered by some additional keyword arguments:

StringColumn([blank, min_length, max_length, permissible_values]):

blank

If set to True column does not has to be filled
min_length

Value can not be shorter that min_length
max_length

Maximal lenght of value
permissible_values

List of allowed values

IntColumn([blank, min_exclusive, max_exclusive, min_inclusive, max_inclusive]):

blank

If set to True column does not has to be filled
min_exclusive

Minimum allowed value, exclusive
max_exclusive

Maximal allowed value, exclusive
min_inclusive

Minimum allowed value, inclusive
max_inclusive

Maximal allowed value, inclusive

DecimalColumn([blank, min_exclusive, max_exclusive, min_inclusive, max_inclusive, fraction_digits, total_digits]):

blank

If set to True column does not has to be filled
min_exclusive

Minimum allowed value, exclusive
max_exclusive

Maximal allowed value, exclusive
min_inclusive

Minimum allowed value, inclusive
max_inclusive

Maximal allowed value, inclusive
fraction_digits

Number of digits before comma
total_digits

Total number of digits in whole value (before and after comma)

NOTE:: DecimalColumn operates on decimal.Decimal objects. Have that in mind when you will be setting min_exclusive, max_exclusive, min_inclusive or max_inclusive.

Remember that you can always make your own columns by simply subclassing csv_schema.columns.base.BaseColumn:

from csv_schema.columns.base import BaseColumn
from csv_schema.exceptions import ImproperValueRestrictionException

class MyColumn(BaseColumn):

   value_template = ''  # Regular expression describing how proper value should look like in CSV file

   def convert(self, raw_val):  # This method is called in order to transform raw value into Python object
      return None

   def check_restriction(self, value):  # This method is optional. It allows you to specify keyword arguments that can alter column behavior.
      required_value = self.options.get('required_value', None)
      if required_value is not None:
         if required_value != value:
            # Message from ImproperValueRestrictionException will be added to structure errors
            raise ImproperValueRestrictionException('That is not the value you are looking for...')

Column set

Till now you have seen how to use CsvSchema for simple CSV file description. Sometimes specifying types of columns and their behavior just is not enough. What if you would like to describe more complex validation rules? Let’s say that you want a validation rule that says: you have to fill column A or column B or both of them. This is the situation when you need Cs objects.

Cs stands for Column Set and allows you to express more complex validation rules by simply combining Cs with use of some logic operators. Let’s consider simple validation rule that we mentioned earlier: you have to fill column A or column B or both of them:

from csv_schema.structure.set import Cs

class TestCsvStructure(BaseCsvStructure):

   a = IntColumn(blank=True)
   b = IntColumn(blank=True)

   class Rules(object):
      a_or_b_rule = Cs('a') | Cs('b')

Changed in 1.1.0: CsvSchema will now store rules in special inner class - Rules

NOTE:: If you are going to use column sets remeber to set columns used in Cs instances as blank.

Each Cs instance has assigned columns that needs to be filled in order to evaluate Cs as true. In our example each Cs instance has only one column but you can assign them as many as you need. For example, if you create Cs instance like this:

Cs('a', 'b')

will mean that you want both column, a and b to be filled because Cs will evaluate true only if every column in set is filled. We used | operator to combine two Cs. | can be referred as rule that demands at least one Cs instance to be evaluated as true. Cs supports also ^ operator. It is used to express rule that demands only one Cs instance to be filled. If you create rule Cs('a') ^ Cs('b') and fill both columns the whole expression will be evaluated as false.

NOTE:

Defined rules are evaluated during is_valid() call and their error messages are added to structure errors attribute. If custom error message is not appropriate to your needs you can override it by calling error method on whole rule:

...
class Rules(object):
   a_or_b_rule = (Cs('a') | Cs('b')).error('Column A or B needs to be filled')
...

If you want to define more than one rule in single structure class you can do it like this:

...
class Rules(object):
   rule_1 = Cs('a') | Cs('b')
   rule_2 = Cs('c') ^ (Cs('d') | Cs('e'))
...

Similarly as columns, Cs behavior can be altered by keyword arguments:

...
class Rules(object):
   rule = Cs('a', b='B')
...

In above example Cs instance will be evaluated true if column a is filled and column b has value equal to 'B'. Table below shows possible Cs states depending on different data and settings:

Setting	Column a	Column b	Evaluation
Cs(‘a’)	‘’	‘’	False
Cs(‘a’)	‘A’	‘’	True
Cs(‘a’)	‘’	‘B’	False
Cs(‘a’)	‘A’	‘B’	True
Cs(‘a’, b=’B’)	‘’	‘’	True
Cs(‘a’, b=’B’)	‘A’	‘’	False
Cs(‘a’, b=’B’)	‘’	‘B’	False
Cs(‘a’, b=’B’)	‘A’	‘B’	True
Cs(‘a’, b=’B’)	‘’	‘C’	True
Cs(‘a’, b=’B’)	‘A’	‘C’	False

Notice that when column b is empty or has wrong value column a can not be filled.

NOTE:

Rememer that you can have more than one value condition in Cs. Creating object like this:

Cs('a', b='B', c='C')

will make it true if b is equal to 'B' and c is equal to 'C' (and of course, a is not empty). You can even demand that particular column has to have specific value:

Cs('a', a='A')

Project details

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 5 - Production/Stable
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 2.7
Topic
- Software Development
- Text Processing

Release history Release notifications | RSS feed

This version

1.1.1

Mar 22, 2014

1.1.0

Apr 20, 2013

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

CsvSchema-1.1.1.zip (15.4 kB view details)

Uploaded Mar 22, 2014 Source

File details

Details for the file CsvSchema-1.1.1.zip.

File metadata

Download URL: CsvSchema-1.1.1.zip
Upload date: Mar 22, 2014
Size: 15.4 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for CsvSchema-1.1.1.zip
Algorithm	Hash digest
SHA256	`fa3650cb39748f8fd0a755e734e237ee14520d5cb90e7565a48cea28b49029ab`
MD5	`5619d38e63390869293d3b833b76d6ed`
BLAKE2b-256	`785a6146e737bb2b4333d9b48b602b8666377db675eb18f04477cdc6936478e2`

See more details on using hashes here.

CsvSchema 1.1.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Basics

More about columns

Column set

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

File details

File metadata

File hashes