
A better csv.DictReader (supporting UTF-8 and type conversion)

Project description

DocReader

A utility for reading a CSV file and mapping its contents to Python objects based on the field names in the document.

Contains classes for reading loosely structured documents made up of rows of text (e.g. CSV files) and turning them into slightly higher-level lists of dictionaries.

Typical usage is to create a subclass of DocReader and define its cols attribute once. To use it, instantiate the subclass with an open file-like object that supports the iterator protocol and returns a full line of text on each next() call.

Iterating over an instance of the subclass then yields one dict per data row, where the keys are those defined in the cols attribute and the values are the cells' logical values, each passed through the appropriate conversion function.
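To illustrate that flow, here is a minimal Python 3 sketch built on the standard csv.DictReader. It is not DocReader's implementation: SketchReader and MyReader are hypothetical names, only the dictionary form of cols is handled, and the defaults for an omitted column (the key itself) and an omitted convert (leave the string unchanged) are assumptions inferred from the doctest below.

```python
import csv
import io


class SketchReader:
    """Hypothetical stand-in for DocReader; subclasses define ``cols``."""

    # key -> options dict with optional 'column' and 'convert' entries
    cols = {}

    def __init__(self, fileobj):
        # csv.DictReader accepts any iterable of text lines and uses
        # the first line as the header.
        self._rows = csv.DictReader(fileobj)

    def __iter__(self):
        for row in self._rows:
            record = {}
            for key, opts in self.cols.items():
                # Assumed default: the document column has the same name as the key.
                column = opts.get('column', key)
                # Assumed default: keep the cell text unchanged.
                convert = opts.get('convert', lambda s: s)
                record[key] = convert(row[column])
            yield record


class MyReader(SketchReader):
    cols = {
        'a': {'column': 'a', 'convert': int},
        'q': {'column': 'b'},  # document column 'b' is exposed as key 'q'
    }


data = io.StringIO("a,b\n1,2\n4,5\n")
print(list(MyReader(data)))  # [{'a': 1, 'q': '2'}, {'a': 4, 'q': '5'}]
```

Delegating the line splitting to csv.DictReader keeps quoting rules (embedded commas, quoted fields) correct, which a naive split on commas would not.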

The cols attribute itself is either a list of column names or a dictionary mapping keys (which become the keys in the returned data) to an options dictionary for that column.
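A hedged sketch of how the two accepted forms of cols could be normalized into a single key-to-options mapping. normalize_cols is a hypothetical helper, and the defaults it fills in (column falls back to the key, convert falls back to str) are assumptions rather than documented behaviour.

```python
def normalize_cols(cols):
    """Hypothetical helper: turn either form of ``cols`` into
    {key: {'column': ..., 'convert': ...}}."""
    if isinstance(cols, dict):
        items = cols.items()
    else:
        # A plain list of column names: each name is also the output key.
        items = ((name, {}) for name in cols)
    normalized = {}
    for key, opts in items:
        normalized[key] = {
            # Assumed default: the column shares the key's name.
            'column': opts.get('column', key),
            # Assumed default: str leaves text cells unchanged.
            'convert': opts.get('convert', str),
        }
    return normalized


cols = normalize_cols(['a', 'b'])
print(sorted(cols))         # ['a', 'b']
print(cols['a']['column'])  # a
opts = normalize_cols({'price': {'convert': float}})['price']
print(opts['column'], opts['convert']('45.6'))  # price 45.6
```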

An options dictionary can have the following keys:
  • column: The name of the column as it appears in the document.
  • convert: A callable that is called with the string contents of the cell. Often a type (e.g. int, float) or a lambda.
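Conversion callables receive the raw cell text, so blank cells need some care: int('') raises ValueError. The sketch below shows one way to handle them, assuming empty or whitespace-only cells should become None; blank_as_none is a hypothetical helper, not part of DocReader.

```python
import decimal


def blank_as_none(convert):
    """Hypothetical wrapper: empty or whitespace-only cells become None,
    anything else is passed (stripped) to ``convert``."""
    def wrapper(text):
        text = text.strip()
        return convert(text) if text else None
    return wrapper


to_int = blank_as_none(int)
to_price = blank_as_none(lambda s: decimal.Decimal(s.replace('$', '')))

print(to_int(' 7 '))      # 7
print(to_int(''))         # None
print(to_price('$45.6'))  # 45.6
```

The wrapped callables can be used directly as convert values in an options dictionary.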

>>> import decimal
>>> def _price(val):
...     txt = val.replace('$', '').strip().lower()
...     # An empty cell becomes None; Decimal(None) would raise TypeError.
...     return decimal.Decimal(txt) if txt else None
...
>>> class MyReader(DocReader):
...     cols = dict(
...         a = dict(column='a', convert=int),
...         q = dict(column='b'),
...         c = dict(column='c', convert=lambda x: int(x)),
...         price = dict(convert=_price),
...     )
...
>>> import StringIO
>>> #Ignore the use of chr(10), doctest doesn't like \n.
>>> list(MyReader(StringIO.StringIO("a,b,c,price" + chr(10) + "1,2,3,$45.6" + chr(10) + "4,5,6,$55")))
[{u'a': 1, u'q': u'2', u'c': 3, u'price': Decimal('45.6')}, {u'a': 4, u'q': u'5', u'c': 6, u'price': Decimal('55')}]
>>> list(MyReader(StringIO.StringIO("a,b" + chr(10) + "1,2")))
Traceback (most recent call last):
...
MissingColumnException: Columns missing from input: c, price
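The missing-column check in the last example could look roughly like this. MissingColumnException here is a local stand-in sharing the name from the doctest, check_columns is hypothetical, and in practice the header would come from csv.DictReader's fieldnames attribute. Missing names are reported in cols order, which may differ from the library's own ordering.

```python
class MissingColumnException(Exception):
    """Local stand-in for the exception named in the doctest."""


def check_columns(header, cols):
    """Hypothetical check: raise if any column required by ``cols`` is absent."""
    # Assumed default: an options dict without 'column' uses its key.
    required = [opts.get('column', key) for key, opts in cols.items()]
    missing = [name for name in required if name not in header]
    if missing:
        raise MissingColumnException(
            'Columns missing from input: ' + ', '.join(missing))


cols = {
    'a': {'column': 'a', 'convert': int},
    'q': {'column': 'b'},
    'c': {'column': 'c', 'convert': int},
    'price': {},
}
try:
    check_columns(['a', 'b'], cols)
except MissingColumnException as exc:
    print(exc)  # Columns missing from input: c, price
```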

Project details



This version

1.0

Download files


Source Distribution

docreader-1.0.tar.gz (3.4 kB)


File details

Details for the file docreader-1.0.tar.gz.

File metadata

  • Download URL: docreader-1.0.tar.gz
  • Upload date:
  • Size: 3.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for docreader-1.0.tar.gz
Algorithm Hash digest
SHA256 e770f8742de05d10d1c848592a90db1803cf2f52aaad92dab92b8d8fac221361
MD5 7d180d6c376b2206d0d128e04cc55005
BLAKE2b-256 18330f8e08bb3b0b3e4488b0eb9f610ee1111b667f676b4ff3a3d4fbf4177090

