Map CSV data into dataclasses
Dataclass CSV
Dataclass CSV makes working with CSV files easier than working with dicts. It uses Python's dataclasses to store the data of each row in the CSV file, and it uses type annotations to enable type checking and validation.
Installation
pipenv install dataclass-csv
Getting started
First, add the necessary imports:
from dataclasses import dataclass
from dataclass_csv import DataclassReader
Assuming that we have a CSV file with the contents below:
firstname,email,age
Elsa,elsa@test.com, 11
Astor,astor@test.com, 7
Edit,edit@test.com, 3
Ella,ella@test.com, 2
Let's create a dataclass that will represent a row in the CSV file above:
@dataclass
class User:
    firstname: str
    email: str
    age: int
The dataclass User has three properties: firstname and email are of type str, and age is of type int.
To load and read the contents of the CSV file, we do the same as we would with the DictReader from the csv module in Python's standard library. After opening the file, we create an instance of the DataclassReader, passing two arguments: the first is the file and the second is the dataclass that we wish to use to represent the data of each row of the CSV file. Like so:
with open(filename) as users_csv:
    reader = DataclassReader(users_csv, User)
    for row in reader:
        print(row)
The DataclassReader internally uses the DictReader from the csv module to read the CSV file, which means that you can pass the same arguments that you would pass to the DictReader. The complete argument list is shown below:
dataclass_csv.DataclassReader(f, cls, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds)
If you run the reader loop above, you should see output like this:
User(firstname='Elsa', email='elsa@test.com', age=11)
User(firstname='Astor', email='astor@test.com', age=7)
User(firstname='Edit', email='edit@test.com', age=3)
User(firstname='Ella', email='ella@test.com', age=2)
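To make the idea more concrete, here is a minimal sketch of how rows from a DictReader can be turned into dataclass instances using only the standard library. It is not the DataclassReader implementation; the helper read_rows is a hypothetical name used for illustration, and it assumes every annotation is a plain callable type such as str or int.

```python
import csv
import dataclasses
import io
from dataclasses import dataclass


@dataclass
class User:
    firstname: str
    email: str
    age: int


def read_rows(f, cls):
    """Hypothetical helper: yield one `cls` instance per CSV row."""
    for row in csv.DictReader(f):
        values = {}
        for field in dataclasses.fields(cls):
            # field.type is the annotation (str or int here), used as a converter.
            values[field.name] = field.type(row[field.name].strip())
        yield cls(**values)


# An in-memory file stands in for the CSV file on disk.
CSV_TEXT = "firstname,email,age\nElsa,elsa@test.com, 11\nAstor,astor@test.com, 7\n"
users = list(read_rows(io.StringIO(CSV_TEXT), User))
for user in users:
    print(user)
```

The sketch looks up each column by the field name, which is why the header names in the CSV file need to match the dataclass properties.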
Error handling
One of the advantages of using the DataclassReader is that it makes it easy to detect when the type of data in the CSV file is not what your application's model is expecting. The DataclassReader also raises errors that help identify the problematic rows in your CSV file.
For example, say we change the contents of the CSV file shown in the Getting started section and change the age of the user Astor to a string value:
Astor, astor@test.com, test
Remember that in the dataclass User the age property is annotated with int. If we run the code again, an exception will be raised with the message below:
ValueError: The field age is of type <class 'int'> but received a value of type <class 'str'>
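As a rough illustration of this kind of check, the sketch below converts a raw CSV value to the annotated type and raises a ValueError with a message shaped like the one above when the conversion fails. The function convert is a hypothetical helper, not part of dataclass-csv, and the library's own logic may differ.

```python
def convert(field_name, field_type, raw):
    """Hypothetical helper: convert raw CSV text to the annotated type."""
    try:
        return field_type(raw.strip())
    except ValueError as exc:
        # Report the expected type and the type of the value that was read.
        raise ValueError(
            f"The field {field_name} is of type {field_type} "
            f"but received a value of type {type(raw)}"
        ) from exc


age = convert('age', int, ' 7')

try:
    convert('age', int, ' test')
except ValueError as error:
    message = str(error)

print(message)
```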
Default values
The DataclassReader also handles properties with default values. Let's modify the dataclass User and add a default value for the field email. Because a dataclass does not allow a field without a default to follow a field with one, email moves to the end of the class:
@dataclass
class User:
    firstname: str
    age: int
    email: str = 'Not specified'
Then we modify the CSV file and remove the email for the user Astor:
Astor,, 7
If we run the code we should see the output below:
User(firstname='Elsa', age=11, email='elsa@test.com')
User(firstname='Astor', age=7, email='Not specified')
User(firstname='Edit', age=3, email='edit@test.com')
User(firstname='Ella', age=2, email='ella@test.com')
Note that the object for the user Astor now has the default value Not specified assigned to the email property.
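The fallback to a default can be pictured with the standard library alone. The sketch below is an assumption about the general approach, not the library's code: value_or_default is a hypothetical helper, the tags field is added only to show a default_factory, and fields with defaults are placed after those without so that the dataclass is valid.

```python
import dataclasses
from dataclasses import dataclass, field


@dataclass
class User:
    firstname: str
    age: int
    email: str = 'Not specified'
    tags: list = field(default_factory=list)  # illustrative mutable default


def value_or_default(f, raw):
    """Hypothetical helper: return the raw text, or the field default if empty."""
    if raw.strip():
        return raw
    if f.default is not dataclasses.MISSING:
        return f.default
    if f.default_factory is not dataclasses.MISSING:
        # A factory builds a fresh object for every row.
        return f.default_factory()
    raise ValueError(f"The field {f.name} is required")


fields_by_name = {f.name: f for f in dataclasses.fields(User)}
missing_email = value_or_default(fields_by_name['email'], '')
given_email = value_or_default(fields_by_name['email'], 'astor@test.com')
missing_tags = value_or_default(fields_by_name['tags'], '')
print(missing_email, given_email, missing_tags)
```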
Mapping dataclass fields to columns
The mapping between a dataclass property and a column in the CSV file is done automatically if the names match. However, there are situations where the header of a column has a different name. We can easily tell the DataclassReader how the mapping should be done using the method map. Assuming that we have a CSV file with the contents below:
First Name,email,age
Elsa,elsa@test.com, 11
Note that the column is now called First Name and not firstname. We can use the method map like so:
reader = DataclassReader(users_csv, User)
reader.map('First Name').to('firstname')
Now the DataclassReader knows how to extract the data from the column First Name and assign it to the dataclass property firstname.
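Conceptually, a mapping like this is a lookup from the dataclass property name to the column header. The following standard-library sketch shows that idea; FIELD_TO_COLUMN and row_to_user are hypothetical names, and this is not how the map method is implemented in the library.

```python
import csv
import io
from dataclasses import dataclass, fields


@dataclass
class User:
    firstname: str
    email: str
    age: int


# Hypothetical mapping: dataclass property name -> CSV column header.
FIELD_TO_COLUMN = {'firstname': 'First Name'}


def row_to_user(row):
    values = {}
    for f in fields(User):
        # Fall back to the property name when no explicit mapping exists.
        column = FIELD_TO_COLUMN.get(f.name, f.name)
        values[f.name] = f.type(row[column].strip())
    return User(**values)


csv_file = io.StringIO("First Name,email,age\nElsa,elsa@test.com, 11\n")
mapped_users = [row_to_user(row) for row in csv.DictReader(csv_file)]
print(mapped_users)
```

Note that the lookup is case sensitive, so the header passed to the mapping has to match the column name in the file exactly.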
Supported type annotations
At the moment the DataclassReader supports int, str, float, complex and datetime. When defining a datetime property, it is necessary to use the dateformat decorator, for example:
from dataclasses import dataclass
from datetime import datetime
from dataclass_csv import DataclassReader, dateformat
@dataclass
@dateformat('%Y/%m/%d')
class User:
    name: str
    email: str
    birthday: datetime

if __name__ == '__main__':
    with open('users.csv') as f:
        reader = DataclassReader(f, User)
        for row in reader:
            print(row)
Assuming that the CSV file has the following contents:
name,email,birthday
Edit,edit@test.com,2018/11/23
The output would look like this:
User(name='Edit', email='edit@test.com', birthday=datetime.datetime(2018, 11, 23, 0, 0))
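The format string uses the same directives as datetime.strptime from the standard library. As a quick check of a format before using it with the decorator, we can parse a sample value directly. This snippet only exercises the standard library; that the library parses with the same directives is an assumption based on the format shown above.

```python
from datetime import datetime

# The same format string that is passed to the dateformat decorator above.
birthday = datetime.strptime('2018/11/23', '%Y/%m/%d')
print(repr(birthday))  # datetime.datetime(2018, 11, 23, 0, 0)

# A value in a different layout does not match the format and is rejected.
try:
    datetime.strptime('23-11-2018', '%Y/%m/%d')
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)
```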
Fields metadata
It is important to note that the dateformat decorator defines the date format used to parse dates for all properties in the class. There are situations, however, where the data in a CSV file contains two or more columns with date values in different formats. It is possible to set a specific format for each property using dataclasses.field. Let's say that we now have a CSV file with the following contents:
name,email,birthday,create_date
Edit,edit@test.com,2018/11/23,2018/11/23 10:43
As you can see, the create_date column contains time information as well.
The dataclass User can be defined like this:
from dataclasses import dataclass, field
from datetime import datetime
from dataclass_csv import DataclassReader, dateformat
@dataclass
@dateformat('%Y/%m/%d')
class User:
    name: str
    email: str
    birthday: datetime
    create_date: datetime = field(metadata={'dateformat': '%Y/%m/%d %H:%M'})
Note that the format for the birthday field was not specified using the field metadata. In this case the format specified in the dateformat decorator will be used.
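The precedence between the two settings can be sketched with the standard library: look in the field's metadata first and fall back to the class-level format. In this sketch CLASS_DATEFORMAT is a hypothetical constant standing in for the value given to the dateformat decorator, and format_for is a hypothetical helper; the library's internals may be organized differently.

```python
from dataclasses import dataclass, field, fields
from datetime import datetime

# Stands in for the class-level format set with the dateformat decorator.
CLASS_DATEFORMAT = '%Y/%m/%d'


@dataclass
class User:
    name: str
    email: str
    birthday: datetime
    create_date: datetime = field(metadata={'dateformat': '%Y/%m/%d %H:%M'})


def format_for(f):
    """Hypothetical helper: field metadata wins over the class-level format."""
    return f.metadata.get('dateformat', CLASS_DATEFORMAT)


formats = {f.name: format_for(f) for f in fields(User) if f.type is datetime}
created = datetime.strptime('2018/11/23 10:43', formats['create_date'])
born = datetime.strptime('2018/11/23', formats['birthday'])
print(formats)
```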
Handling values with empty spaces
When defining a property of type str in the dataclass, the DataclassReader will treat values with only white spaces as invalid. To change this behavior, there is a decorator called @accept_whitespaces. When the class is decorated with @accept_whitespaces, all the properties in the class will accept values with only white spaces.
If you need a specific field to accept white spaces, you can set the property accept_whitespaces in the field's metadata, like so:
@dataclass
class User:
    name: str
    email: str = field(metadata={'accept_whitespaces': True})
    birthday: datetime
    created_at: datetime
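A check along these lines could look like the sketch below, which reads the accept_whitespaces flag from the field's metadata and otherwise falls back to a class-wide setting. The helper check_str, its class_accepts parameter and the error message are all hypothetical and only illustrate the rule described above.

```python
from dataclasses import dataclass, field, fields


@dataclass
class User:
    name: str
    email: str = field(metadata={'accept_whitespaces': True})


def check_str(f, raw, class_accepts=False):
    """Hypothetical helper: reject whitespace-only text unless it is allowed."""
    accepts = f.metadata.get('accept_whitespaces', class_accepts)
    if raw.strip() == '' and not accepts:
        # Illustrative message; the library's wording is not shown in this README.
        raise ValueError(f"The field {f.name} received only white spaces")
    return raw


fields_by_name = {f.name: f for f in fields(User)}
accepted = check_str(fields_by_name['email'], '   ')

try:
    check_str(fields_by_name['name'], '   ')
    rejected = False
except ValueError:
    rejected = True

print(repr(accepted), rejected)
```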
Copyright and License
Copyright (c) 2018 Daniel Furtado. Code released under the BSD 3-clause license.
Credits
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
History
0.1.0 (2018-11-25)
- First release on PyPI.
0.1.1 (2018-11-25)
- Documentation fixes.
0.1.2 (2018-11-25)
- Documentation fixes.
0.1.3 (2018-11-26)
- Bug fixes
- Removed the requirement of setting the dataclass init to True
0.1.5 (2018-11-29)
- Support for parsing datetime values.
- Better handling when default values are set to None
0.1.6 (2018-12-01)
- Added support for reading default values from the default property of dataclasses.field.
- Added support for allowing string values with only white spaces at the class level using the @accept_whitespaces decorator, or per field through the dataclasses.field metadata.
- Added support for specifying the date format using the dataclasses.field metadata.
0.1.7 (2018-12-01)
- Added support for default values from default_factory in the field's metadata. This allows adding mutable default values to the dataclass properties.
Hashes for dataclass_csv-0.1.7-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 53ecfbe87a2a4c6159f58ecf29a9c4e629c72a1151cc097087a1426e7f7819b7
MD5 | 9b50a99a1ca29f62f7b48ed12b638cea
BLAKE2b-256 | 609b79e27c884bdacfad053b561a9a873eeb7c4c25272b2951fd24cec230ffb4