Map CSV data into dataclasses

Dataclass CSV

Dataclass CSV makes working with CSV files easier than working with plain dicts. It uses Python's dataclasses to store the data of each row in the CSV file, and it uses the type annotations on the dataclass to check and validate that data.

Installation

pipenv install dataclass-csv

Getting started

First, add the necessary imports:

from dataclasses import dataclass

from dataclass_csv import DataclassReader

Assuming that we have a CSV file with the contents below:

firstname,email,age
Elsa,elsa@test.com, 11
Astor,astor@test.com, 7
Edit,edit@test.com, 3
Ella,ella@test.com, 2

Let's create a dataclass that will represent a row in the CSV file above:

@dataclass
class User:
    firstname: str
    email: str
    age: int

The dataclass User has three properties: firstname and email are of type str, and age is of type int.

To load and read the contents of the CSV file, we do the same thing we would do with the DictReader from the csv module in Python's standard library. After opening the file, we create an instance of DataclassReader and pass it two arguments: the file, and the dataclass that should represent each row of the CSV file. Like so:

with open(filename) as users_csv:
    reader = DataclassReader(users_csv, User)
    for row in reader:
        print(row)

The DataclassReader internally uses the DictReader from the csv module to read the CSV file, which means that you can pass it the same arguments that you would pass to the DictReader. The complete argument list is shown below:

dataclass_csv.DataclassReader(f, cls, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds)

If you run the loop shown above, you should see output like this:

User(firstname='Elsa', email='elsa@test.com', age=11)
User(firstname='Astor', email='astor@test.com', age=7)
User(firstname='Edit', email='edit@test.com', age=3)
User(firstname='Ella', email='ella@test.com', age=2)
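
To make the idea of a DictReader-backed reader more concrete, here is a minimal standard-library sketch of that kind of conversion: read each row with csv.DictReader and build one dataclass instance from it. It is an illustration only and not the library's actual implementation. The helper name read_as and the use of io.StringIO in place of a real file are assumptions made for this example, and the conversion only works for simple annotations such as str and int.

```python
import csv
import io
from dataclasses import dataclass, fields


@dataclass
class User:
    firstname: str
    email: str
    age: int


def read_as(f, cls, **reader_kwargs):
    """Yield one `cls` instance per CSV row (hypothetical helper)."""
    # Extra keyword arguments are forwarded to csv.DictReader unchanged.
    for row in csv.DictReader(f, **reader_kwargs):
        values = {}
        for field in fields(cls):
            raw = row[field.name].strip()
            # Call the annotated type to convert the text, e.g. int("11").
            values[field.name] = field.type(raw)
        yield cls(**values)


# An in-memory stand-in for the CSV file from the Getting started section.
sample = io.StringIO(
    "firstname,email,age\n"
    "Elsa,elsa@test.com, 11\n"
    "Astor,astor@test.com, 7\n"
)
users = list(read_as(sample, User))
for user in users:
    print(user)
```

Forwarding the keyword arguments is what would let options such as delimiter or dialect reach the underlying DictReader in a design like this.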

Error handling

One advantage of the DataclassReader is that it makes it easy to detect when the data in the CSV file does not have the type your application's model expects. The DataclassReader also reports errors that help identify the problematic rows in your CSV file.

For example, say we change the contents of the CSV file shown in the Getting started section and replace the age of the user Astor with a string value:

Astor, astor@test.com, test

Remember that in the dataclass User the age property is annotated with int. If we run the code again, an exception will be raised with the message below:

ValueError: The field age is of type <class 'int'> but received a value of type <class 'str'>
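
One way to act on such an error is to catch it per row and record which line of the file failed. The sketch below reproduces the situation with the standard library only. The helper to_user is hypothetical, and the message it raises is merely modelled on the one above; it is not produced by the library.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class User:
    firstname: str
    email: str
    age: int


def to_user(row):
    """Convert one DictReader row, raising ValueError on a bad age (hypothetical helper)."""
    try:
        age = int(row["age"])
    except ValueError:
        raise ValueError(
            f"The field age is of type {int} but received the value {row['age']!r}"
        ) from None
    return User(row["firstname"].strip(), row["email"].strip(), age)


data = io.StringIO(
    "firstname,email,age\n"
    "Elsa,elsa@test.com, 11\n"
    "Astor,astor@test.com, test\n"
)
errors = []
# start=2 because line 1 of the file is the header row.
for line_number, row in enumerate(csv.DictReader(data), start=2):
    try:
        to_user(row)
    except ValueError as exc:
        errors.append((line_number, str(exc)))

print(errors)
```

Collecting the failures instead of stopping at the first one is a design choice for this example; it makes it possible to report every bad row in a single pass.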

Default values

The DataclassReader also handles properties with default values. Let's modify the dataclass User and add a default value for the field email. Because a dataclass field with a default value cannot be followed by a field without one, email moves to the end of the class. Columns are matched to fields by name, so the field order does not have to follow the column order:

@dataclass
class User:
    firstname: str
    age: int
    email: str = 'Not specified'

And we modify the CSV file and remove the email for the user Astor:

Astor,, 7

If we run the code we should see the output below:

User(firstname='Elsa', age=11, email='elsa@test.com')
User(firstname='Astor', age=7, email='Not specified')
User(firstname='Edit', age=3, email='edit@test.com')
User(firstname='Ella', age=2, email='ella@test.com')

Note that the object for the user Astor now has the default value Not specified assigned to the email property.
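
The fallback to a default can be sketched with the standard library alone. The example below is a hedged illustration of one possible approach, not the library's own code: the helper build is hypothetical, and it treats an empty cell as "no value" whenever the dataclass field declares a default.

```python
from dataclasses import MISSING, dataclass, fields


@dataclass
class User:
    firstname: str
    age: int
    # Fields with a default must come after fields without one.
    email: str = 'Not specified'


def build(cls, row):
    """Create `cls` from a dict of strings, using defaults for empty cells (hypothetical helper)."""
    values = {}
    for field in fields(cls):
        raw = row.get(field.name, '').strip()
        if raw == '' and field.default is not MISSING:
            # Leave the argument out so the dataclass applies its default.
            continue
        values[field.name] = field.type(raw)
    return cls(**values)


# The row for Astor as csv.DictReader would deliver it, with an empty email.
astor = build(User, {'firstname': 'Astor', 'email': '', 'age': ' 7'})
print(astor)
```

Omitting the keyword argument, rather than passing the default explicitly, keeps the default defined in one place: the dataclass itself.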

Mapping dataclass fields to columns

The mapping between a dataclass property and a column in the CSV file is done automatically if the names match. However, there are situations where the header of a column has a different name. We can tell the DataclassReader how the mapping should be done using the method map. Assume that we have a CSV file with the contents below:

First Name,email,age
Elsa,elsa@test.com, 11

Note that the column is now called First Name and not firstname.

And we can use the method map, like so:

reader = DataclassReader(users_csv, User)
reader.map('First Name').to('firstname')

Now the DataclassReader will know how to extract the data from the column First Name and add it to the dataclass property firstname.
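
Conceptually, such a mapping amounts to renaming header keys before the row is turned into a dataclass. The following standard-library sketch shows that renaming step in isolation. The dictionary column_to_field is a hypothetical stand-in for whatever the reader stores internally, which the text above does not describe.

```python
import csv
import io

# Hypothetical mapping from CSV header text to dataclass field name.
column_to_field = {'First Name': 'firstname'}

data = io.StringIO(
    "First Name,email,age\n"
    "Elsa,elsa@test.com, 11\n"
)
renamed_rows = [
    # Headers without an entry in the mapping keep their original name.
    {column_to_field.get(header, header): value.strip() for header, value in row.items()}
    for row in csv.DictReader(data)
]
print(renamed_rows)
```

Note that the lookup is case-sensitive, so the header text has to be spelled exactly as it appears in the file.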

Copyright and License

Copyright (c) 2018 Daniel Furtado. Code released under the BSD 3-clause license.

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

History

0.1.0 (2018-11-25)

  • First release on PyPI.

0.1.1 (2018-11-25)

  • Documentation fixes.

0.1.2 (2018-11-25)

  • Documentation fixes.
