Skip to main content

Import CSV files to django models

Project description

Django CSV Import

Ed Crewe - December 2023

Overview

django-csvimport is a generic importer tool to allow the upload of CSV files for populating data. The egg installs an admin csvimport model that has a file upload field. Add a new csvimport and upload a comma separated values file or MS Excel file.

The upload triggers the import mechanism which matches the header line of the files field names to the fields in the selected model. Importing any rows that include all required fields. Optionally required fields can be specified as part of the upload. By default duplicate value rows are not inserted.

The import can also be run as a custom command, ie manage.py importcsv filename for possible use via cronjob etc.

For CSV files import where their schema is unknown, and there is no existing model to import to, there is another command, inspectcsv, to generate the model code from the CSV file, guessing data types from the data using code from https://messytables.readthedocs.org

The core import code was based on http://djangosnippets.org/snippets/633/ by Jonathan Holst. It adds character encoding handling, model field and column autodetection, admin interface, custom command etc.

Version 3 - Dec 2023

  1. Test with Django 5 and Python 3.12

  2. Incorporate messytables relevant code, the types.py file, since its no longer supported for Python 3.12

  3. Use dateparser for auto-handling a much wider range of date formats

Version 2 - Sept 2014

  1. New management command csvinspect to generate models from CSV files

  2. General code refactor

  3. Management command renamed from csvimport to importcsv

  4. More features to cope with bad encoding and date types

Version Compatibility

  • version 3.0 tested with Django 5.0 Python 3.12

  • version 2.16 tested with Django 3.2.16 on Python 3.9.6

  • version 2.14 tested with Django 3.0.5 on Python 3.7.6, 3.8.2

  • version 2.13 was tested with Django 2.2.5 on Python 3.7.3

  • version 2.6 was tested with Django 1.7, 1.8, 1.9, 1.10, 1.11 on Python 2.7.13, Python 3.5.6

Please use version 2.1, eg. pip install django-csvimport==2.1 for Django versions prior to 1.7

This Django >= 1.7 requirement is because django-csvimport uses the newly added AppConfig for versions > 2.1 (NB: To fix this issue you could install django-appconf to django 1.6 or earlier and tweak csvimport to use it in csvimport.app)

For really old Django versions < 1.4 you may have to dial back the versions until it works!

Note that only versions > 2.2 are compatible with Python 3.4

Installation instructions

Add the following to the INSTALLED_APPS in the settings.py of your project:

>>>  pip install django-csvimport
...
...  INSTALLED_APPS = (
...  ...
...  'csvimport.app.CSVImportConf',  # use AppConfig for django >=1.7 csvimport >=2.2
...  )
...
...  python manage.py migrate  (or syncdb if django < 1.9)

Note that migrate has the core tables in 0001_initial migration and test tables in 0002 so rm migrations/0002_test_models.py if you do not want these cluttering your database

Custom commands

INSPECTCSV

manage.py inspectcsv importfile.csv > models.py

This returns the code for a new models file with a guesstimated model for the CSV file. Add it to your app then run

>>> makemigrations your_app
>>> migrate

You can then run the import to that model for importfile.csv

NB: As it says its a guesstimate, you may have to manually tweak the generated models.py to get the import to work better.

If there are no headings in the CSV file, then it just uses automated ones col_1, col_2 … etc.

IMPORTCSV

(Please note this command used to be csvimport but that caused name clash issues with the module)

manage.py importcsv –mappings=’’ –model=’app_label.model_name’ –delimiter=’t’ importfile.csv

For mappings enter a list of fields in order only if you dont have a header row with matching field names - or you want to override it, eg.

–mappings = ‘1=shared_code,2=org(otherapp.Organisation|name),3=date’

where (model|foreign key field) is used to specify relations if again, you want to override what would be looked up from your models.

If you have no real field names in your csv file, then you can use –mappings=’none’ and it will assume the fields are named col_1, col_2 … etc.

Note that if you have a header row and specify mappings then it will treat the header as a data row, so delete it first.

Admin interface import

Just add a csvimport item, fill in the form and submit. Failed import rows are added to the log field.

Demonstration installation instructions

To see how it works, you can install a demo easily enough eg. via virtual environment, then use the tests settings to have some sample models for importing data, and the fixtures are sample csv files.

  • Run the following in your shell:

>>> virtualenv mysite
... cd mysite
... pip install django
... pip install django-csvimport
...
... cat > bin/django-admin.py << EOF
... #!/usr/bin/env python
... from django.core import management
... import os
... os.environ["DJANGO_SETTINGS_MODULE"] = "csvimport.settings"
... if __name__ == "__main__":
...     management.execute_from_command_line()
... EOF
...
... django-admin.py migrate
... django-admin.py runserver
  • Go to http://127.0.0.1:8000/admin/ in your browser - pay attention to the trailing / !

  • Click on add Csvimport

  • Pick the django-csvimport/csvimport/tests/fixtures/countries.csv [1] and upload it

  • Check to see if the Country model is now populated.

[1] also available from https://raw.github.com/edcrewe/django-importcsv/master/importcsv/tests/fixtures/countries.csv

Alternatively you can use the command line to upload

django-admin.py importcsv –model=’csvimport.Country’ django-csvimport/csvimport/tests/fixtures/countries.csv –settings=’csvimport.settings’

Tests

To run the django-csvimport tests use:

>>> bin/python3 -m django test --settings='csvimport.settings' csvimport.tests

Foreign Keys

It is not viable for csvimport to determine complex table relations. However if it finds something marked as an ForeignKey with a lookup field in its model mappings, then it checks if the data exists already for the related model and pulls back an id for the field or creates a new entry if possible in the fkey model and pulls back a new id.

For this to be useful then you need a related table that has a unique and more meaningful field that is being used in your data than a numeric primary key.

eg. for an organisation column, org, that holds the unique name of the organisation from a separate table, you can add

column2=org(Organisation|name)

to the mappings, so it knows that the org field relates to a separate Organisation table with a unique name field to be used for it to lookup and replace with org_id FKey

More complex relations

For any more sophisticated relations, eg. multiple keys, many to many fields etc. The recommended approach is to create a temporary or intermediate import table that holds the data from your CSV file with the lookup data in as columns, you can use

inspectcsv importfile.csv > models.py

to automatically generate the import model from your CSV file.

Whenever you do an import to that table you would use a bulk insert database query to take the data in it and populate complex relations of the final model tables appropriately. If imports are happening repeatedly, eg. once a day, you retain your import CSV format table, and can add a database trigger for the table to automatically run your stored data conversion synchronisation query into the target tables.

DateTime data

Note that the importer uses dateparser to try to convert any datetime types you have in your CSV file. See https://dateparser.readthedocs.io/en/latest/settings.html for DATEPARSER_SETTINGS env variable as a JSON map.

Acknowledgements

This package was created as part of a django dash at the House of Omni, Bristol UK, organised by Dan Fairs and my local django users group, #DBBUG. It was a core component for an application for aid agency supply chain sharing, prompted by Fraser Stephens of the HELIOS foundation and developed by Ed Crewe and Tom Dunham.

Other Django CSV importers

Changelog

Contributors listed as [github.com username] where they are on github

3.2 - Use dateparser for import - 26 Dec 2023

  1. Use dateparser for importing a much wider range of mixed date formats without specifying a format

  2. Remove old tz monkeypatch

3.1 - Add messytables for django 5 - 23 Dec 2023

  1. Use dateparser rather than python-dateutil and add to setup.py dependencies

3.0 - Add messytables for django 5 - 23 Dec 2023

  1. Upgrade to Django 5, hence Python 3.12

  2. Add csvimport/messytables with two files to provide functionality used from OKN messytables, since the whole package is no longer maintained for Python 3.12 [OKN]

  3. Autoformat the code with black

2.17 - Fix for django 4 - 2 Jan 2023

  1. Fix incompatible methods for django 4

  2. Add a test for single row CSV and fix issue of trailing line returns

  3. Minor issue PRs [edouard-gv]

2.16 - Fix and add tests - 28 June 2020

  1. Move test fixture migrations to separate migration folder picked up by custom testrunner

  2. Add regression tests for all reported issues

  3. Add admin UI tests

  4. Enable and fix all existing tests

  5. Use regex match group for failover CSV splitter

2.15 - Fix compatiblity issue for Django 3 - 27 April 2020

2.13 - Fix issue with inspectcsv for different header and data line number of cols - 26th Sept 2019

2.12 - Fix for latest django required on_delete and duplicated log msgs - 24 April 2019

2.11 - Fix command line summary messages - 14 July 2017

2.10 - Bug with models var not set if not bulk import - 6 July 2017

2.9 - Fix for using foreign keys from other apps - 4 July 2017

  1. Handle boolean types actually being ints for sqlite

  2. Allow dots in –mappings so app labels of foreign keys can be set by the user, since model lookup may assume wrong label.

2.8 - Minor fixes - 19 June 2017

  1. Force datetimes to local timezone to prevent warning on insert

  2. Rebuild initial migrations file

  3. Add option to do bulk create with –bulk to speed up import

2.7 - Merge in pull requests for minor fixes and test with Django 1.11 - 30 May 2017

  1. Fix uninitiated variable bug if type not detected

  2. Fix command line arg usage for Django 1.10 or 1.11 and make it backwards compatible down to 1.7

  3. For multiple django version compatibility code try excepts make sure new syntax is first to avoid deprecation warnings

  4. Add test shell script to make it easy to run test suite and a command line test

  5. Report back matched and unmatched cols in CSV rather than just showing list of all of them

  6. Allow clean=false option to not fix numeric values to match allowed range

  7. Use atomic to wrap row inserts so we can still report count of inserts Ok after a transactional exception

2.5 - Merge in various pull requests and test with Django 1.7 to 1.10 - 15 Nov 2016

  1. Fix for all recent versions of Django and Python

  2. Add in delimiter option eg. for tab separated values

  3. Add migrations

  4. Make importcsv parsing more tolerant of unparsable rows in CSV

2.4 - Fix app config path so that management commands are found - 11 Nov 2014

  1. Fix log print for commands

2.3 - Make compatible with Python 3 - 29 October 2014

2.2 - Move to core appconfig usage - 28 October 2014

  1. Add appconfig path to cater for module install test running

  2. Check for SMALLINT_DBS including sqlite - since can fail big int imports

2.1 - Fix for django 1.7 - 27 October 2014

2.0 - Added model creation and datetime clean up - 29th Sept 2014

  1. Generate django model files based on CSV data type sniffing with messytables

  2. Use flag to generate model code eg. –model=create_new_model.modelname and add test

  3. Use the settings date formats and add clean up for them

  4. Factor out type clean up method for data values

  5. Allow defaults to be passed as a command line argument

  6. Use –mappings=none for CSV created models with no column name row

  7. Handle badly encoded files as raw

  8. Set date limit in case of strftime bug for pre-1900 dates

[edcrewe]

1.1 - Added mappings and defaults tests - 18 March 2014

  1. Allow mapping format to be simple header row style

  2. Refactored heading parsing and added tests for optional args

[edcrewe]

1.0 - Added signals and code cleanup - 29th January 2013

  1. Fixed number overflow test and tested with django 1.5.1 [edcrewe]

  2. Added signals to the app, to allow the model to custom format a row.

  3. added appconf to allow the app to be more customizable - eg. select models for import.

  4. Removed except all and return database errors.

[cwood]

0.9 - Missed import of re for change to display of error log - 9th Dec 2012

  1. Import re in csvimport/models.py [blorenz]

0.8 - Tidy up logging and Boolean handling - 9th Dec 2012

  1. Fix display of error log so there are newlines for each entry [smeyfroi]

  2. Log processed rows to the csvimport.management.commands.csvimport logger [smeyfroi]

  3. Fix for boolean data [smeyfroi]

  4. Add setting for using standard logging machinery for Admin UI log lines [edcrewe]

  5. Add test to check creation and content of csvimport log file [edcrewe]

0.7 - Fixes and merge in changes for running up the test app - 24th Nov 2012

  1. Merge in changes for 1.4 templates in settings [elena]

  2. Merge in doc changes [elena, valhallasw]

  3. Fix bug with columns that are floats [imposeren]

  4. Fix imports via the admin not having charset encoding argument [edcrewe]

0.6 - Handle text not number or special float to integer - 7th March 2012

  1. Handle numeric columns with text in - zero if not nan or inf

[edcrewe]

0.5 - Add command line encoding option and error return - 6th March 2012

  1. Make sure command line usage returns errors and warnings

  2. Add the option to force –charset=utf-8 for example rather than chardet

  3. Add numeric type handling for empty fields or other issues plus tests

[Tessa Alexander, edcrewe]

0.4 - Add settings to allow demo site install from tests - 4th October 2011

  1. Add demo countries.csv file

  2. Add MEDIA_ROOT for file uploads

[edcrewe]

0.3 - Fix issue with adding in admin - 25th Sept 2011

  1. Fix empty mapping submitted

  2. Remove non-generic filename country processing code

  3. Use get_models to populate drop down for admin import form

  4. Add upload message about whether mapping is from CSV header row

  5. Ensure header row only used if mapping is not supplied.

[edcrewe]

0.2 - Initial beta release version - 29th July 2011

  1. Build it as a django-csvimport egg

  2. Create csvimport management command

  3. Improve character set detection and unicode handling

  4. Add admin csvimport model to upload CSV files with logging of import

  5. Use file upload save to trigger csvimport command

  6. Autodetect column mappings by matching model fields and CSV header text

  7. Add a deduplicate feature

  8. Start to add test suite with data / app for HELIOS cross agency supply chain data sharing http://www.helios-foundation.org/cbha-project/index_html

[edcrewe]

0.1 - Unreleased

  1. Use django snippet as starting point http://djangosnippets.org/snippets/633/

[Jonathan Holst]

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

django-csvimport-3.2.tar.gz (60.6 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page