Django import data is a command-line tool for importing XML, HTML or JSON data into Django models via XSLT mapping.

The source code is available at https://github.com/lev-veshnyakov/django-import-data.

Basic features

Django import data can take any XML, HTML or JSON source file or URL as input and save entities from it to Django models without modifying existing code.

It also supports saving related data in one-to-many and many-to-many relationships.

Dependencies

It uses the lxml library for all XML manipulations and the dicttoxml library for converting JSON to XML.

Installation

First, install the dependencies of the lxml library:

sudo apt-get install libxml2-dev libxslt-dev python-dev

Then install django-import-data using pip:

pip install django-import-data

If you want the latest version, you can install it from GitHub:

pip install git+https://github.com/lev-veshnyakov/django-import-data

Add import_data to INSTALLED_APPS:

INSTALLED_APPS = [
    ...
    'import_data',
]

Usage

Django import data is a set of management commands that can also be called from code.

To see the list of console commands, type:

python manage.py help

In the output you will find an import_data section like the one below:

[import_data]
    process_xslt
    validate_xml

To get help for a particular command, type:

python manage.py help process_xslt
python manage.py help validate_xml
python manage.py help json_to_xml

To call console commands from your code, use django.core.management.call_command:

from django.core.management import call_command

call_command('process_xslt', 'http://stackoverflow.com/', 'transform.xslt', '--save')

How it works

In short, it takes a source in XML or HTML (JSON is first converted to XML), applies the XSLT file you provide to transform the source into a specific XML representation, and then saves the data from that XML to the database using your models.

The point is that you don’t need to write procedural code for saving data. You only need to write XSLT files, which are themselves XML: one file per source. By a source I mean a set of XML or HTML files in the same format. For example, all Google search result pages share one schema, so a single XSLT transformation file is enough to import data from all of them.
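
Because one stylesheet covers every page that shares a format, importing several pages comes down to a loop over URLs. The sketch below is only an illustration and not part of the package: import_pages, its runner parameter and the example.com URLs are hypothetical, while the process_xslt command and the --save flag are taken from the usage shown above.

```python
def import_pages(urls, xslt_path, save=False, runner=None):
    """Run process_xslt once per URL, reusing the same XSLT file.

    `runner` is a hypothetical hook for dry runs; by default the real
    django.core.management.call_command is used.
    """
    if runner is None:
        # Imported lazily so the helper can be tried without Django.
        from django.core.management import call_command as runner

    for url in urls:
        args = ['process_xslt', url, xslt_path]
        if save:
            args.append('--save')  # persist the result to the models
        runner(*args)


# Dry run: record the calls instead of executing them.
calls = []
import_pages(
    ['http://example.com/page1', 'http://example.com/page2'],
    'transform.xslt',
    save=True,
    runner=lambda *args: calls.append(args),
)
```

Leaving runner unset inside a configured Django project would execute the real command for each URL.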

The hard part is that you have to be familiar with XSLT and XPath.

XSLT and XPath

XSLT is a language for transforming XML documents into XHTML documents or other XML documents.

XSLT uses XPath to find information in an XML document. XPath is used to navigate through elements and attributes in XML documents.

If you are not familiar with them, I recommend reading a short tutorial on www.w3schools.com.

You also need to know what an XML schema is, and in particular the RELAX NG schema language.

XML Schema and RELAX NG

Django import data uses RELAX NG to validate the results of transformations. That means that if your XSLT file produces output that does not match the schema, it will not be accepted.

But you don’t have to write a RELAX NG schema yourself; it’s already included in the module.

Resulting XML

After XSLT transformation and schema validation, the resulting XML should look like this:

<?xml version="1.0" encoding="UTF-8"?>
<mapping>
    <model model="app.Author">
        <item key="1">
            <field name="name">Andrew Tanenbaum</field>
        </item>
        <item key="2">
            <field name="name">Donald Knuth</field>
        </item>
    </model>
    <model model="app.Book">
        <item key="1">
            <field name="name">Computer Networks</field>
            <field name="ISBN">0130661023</field>
            <fk model="app.Author" key="1"/>
        </item>
        <item key="2">
            <field name="name">The Art of Computer Programming</field>
            <field name="ISBN">0321751043</field>
            <m2mk model="app.Author" key="2"/>
        </item>
    </model>
</mapping>

This XML can be automatically saved to the models.

It contains the root element <mapping/>, in which <model/> elements are nested. Each model element represents a particular Django model. You must provide a model="" attribute that specifies the related model. The path to the model has the format application_name.ModelName, the same format that manage.py dumpdata uses.

Model elements don’t have to be unique. If you specify several model elements with the same model attribute, they will be merged together. The same applies to item elements.

Model elements contain <item/> elements, representing particular records in the database. Items in turn contain <field/> elements, which have only one required attribute, name="", that sets the name of the related model field.
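
To make the structure concrete, here is a minimal sketch that reads such a mapping with the standard library and groups the records by model. It is not the package's own code: collect_items is a hypothetical helper, and merging items that share a key is an assumption about how repeated elements are combined.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Two <model/> elements deliberately share the same model attribute.
MAPPING = """<mapping>
    <model model="app.Author">
        <item key="1">
            <field name="name">Andrew Tanenbaum</field>
        </item>
    </model>
    <model model="app.Author">
        <item key="2">
            <field name="name">Donald Knuth</field>
        </item>
    </model>
</mapping>"""


def collect_items(xml_text):
    """Group items by model path, merging repeated <model/> elements."""
    records = defaultdict(dict)
    for model in ET.fromstring(xml_text).iter('model'):
        for item in model.findall('item'):
            fields = {f.get('name'): f.text for f in item.findall('field')}
            # Assumption: items sharing a key contribute to one record.
            by_key = records[model.get('model')]
            by_key.setdefault(item.get('key'), {}).update(fields)
    return dict(records)


records = collect_items(MAPPING)
```

Both authors end up under the single 'app.Author' entry, keyed by their key="" values.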

Foreign keys

Django import data supports importing related entities in one-to-many and many-to-many relationships. To save such entities, your models should have the appropriate foreign keys.

In the resulting XML you can use <fk/> and <m2mk/> elements (see above). They have model="" and key="" attributes that point to the related <item/> elements.
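
The way a reference points at an item can be sketched as a lookup by model path and key. The following is an illustration under that assumption, not the package's implementation; resolve_foreign_keys is a hypothetical name.

```python
import xml.etree.ElementTree as ET

MAPPING = """<mapping>
    <model model="app.Author">
        <item key="1">
            <field name="name">Andrew Tanenbaum</field>
        </item>
    </model>
    <model model="app.Book">
        <item key="1">
            <field name="name">Computer Networks</field>
            <fk model="app.Author" key="1"/>
        </item>
    </model>
</mapping>"""


def resolve_foreign_keys(xml_text):
    """Return (item name, referenced item name) pairs for every <fk/>."""
    root = ET.fromstring(xml_text)

    # Index every item by (model path, key) so references can be looked up.
    index = {}
    for model in root.iter('model'):
        for item in model.findall('item'):
            index[(model.get('model'), item.get('key'))] = item

    links = []
    for item in root.iter('item'):
        for fk in item.findall('fk'):
            target = index[(fk.get('model'), fk.get('key'))]
            links.append((
                item.findtext("field[@name='name']"),
                target.findtext("field[@name='name']"),
            ))
    return links


links = resolve_foreign_keys(MAPPING)
```

Note that the book and the author both use key="1": keys only need to be unique within one model, since the lookup also includes the model path.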

Setting key attribute

The key="" attribute of <item/> elements must be unique for each distinct record. It does not have to match the primary key value in the database, and it is not stored (if you want to store a primary key value, use a <field/> element).

Therefore, the value of the key="" attribute does not have to be an integer. You can use any string. Often it’s convenient to use a URL as the key.

You can even leave that attribute empty if you don’t have related items.

One case is special: when the source has no unique attributes. In that case you can use the generate-id(..) XSLT function, which generates a unique ID for every separate XML node in the source.

Using JSON sources

It’s possible to use JSON sources. Because the transformation is XSLT-based, JSON is first converted to the corresponding XML.

For example, the following JSON:

{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "fax",
      "number": "646 555-4567"
    }
  ],
  "gender": {
    "type": "male"
  }
}

will be converted to this XML:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <firstName type="str">John</firstName>
  <lastName type="str">Smith</lastName>
  <age type="int">25</age>
  <address type="dict">
    <postalCode type="str">10021</postalCode>
    <city type="str">New York</city>
    <streetAddress type="str">21 2nd Street</streetAddress>
    <state type="str">NY</state>
  </address>
  <phoneNumber type="list">
    <item type="dict">
      <type type="str">home</type>
      <number type="str">212 555-1234</number>
    </item>
    <item type="dict">
      <type type="str">fax</type>
      <number type="str">646 555-4567</number>
    </item>
  </phoneNumber>
  <gender type="dict">
    <type type="str">male</type>
  </gender>
</root>

This XML is what your XSLT transformation should be written against.
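
The shape of the conversion can be approximated with the standard library. This sketch is a simplified stand-in and not how dicttoxml is implemented: to_element and json_to_xml are hypothetical helpers, the top-level JSON value is assumed to be an object, and values such as null or booleans are not given special treatment.

```python
import json
import xml.etree.ElementTree as ET


def to_element(tag, value):
    """Build an element labelled with the Python type name of `value`."""
    element = ET.Element(tag, {'type': type(value).__name__})
    if isinstance(value, dict):
        for key, child in value.items():
            element.append(to_element(key, child))
    elif isinstance(value, list):
        # List entries have no name of their own, so each becomes <item/>.
        for child in value:
            element.append(to_element('item', child))
    else:
        element.text = str(value)
    return element


def json_to_xml(text):
    """Wrap the members of a JSON object in a <root/> element."""
    root = ET.Element('root')
    for key, value in json.loads(text).items():
        root.append(to_element(key, value))
    return root


root = json_to_xml('{"age": 25, "phoneNumber": [{"type": "home"}]}')
xml_text = ET.tostring(root, encoding='unicode')
```

An XPath such as /root/phoneNumber/item/type would then select the phone type in a stylesheet.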

If you use a JSON source and want to see the XML it corresponds to, use the command:

python manage.py json_to_xml <URL>

After writing an XSLT transformation file, you can use process_xslt, specifying the URL of the JSON source.

JSON to XML conversion is performed by the dicttoxml library written by Ryan McGreal: https://github.com/quandyfactory/dicttoxml.

Examples

Save data to one model

In this simple example we will parse the main page of stackoverflow.com and save titles of recent questions to this model:

from django.db import models

class Question(models.Model):
    title = models.CharField(max_length=255)

First we need to write an XSLT file:

<?xml version="1.0" encoding="UTF-8"?>
<mapping xsl:version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <model model="test_app.Question">
        <xsl:for-each select="//a[@class='question-hyperlink']">
            <item key="">
                <field name="title">
                    <xsl:value-of select="."/>
                </field>
            </item>
        </xsl:for-each>
    </model>
</mapping>

Name it transform.xslt and run the following command:

python manage.py process_xslt http://stackoverflow.com/questions transform.xslt --validate

The output will look like this (but longer):

<?xml version="1.0" encoding="utf-8"?>
<mapping>
  <model model="test_app.Question">
    <item key="">
      <field name="title">customizing soap response attribute format</field>
    </item>
    <item key="">
      <field name="title">Second fragment loaded but not visible on screen</field>
    </item>
    <item key="">
      <field name="title">django-oscar :first time use "python manage.py migrate" gets error</field>
    </item>
    <item key="">
      <field name="title">JTable fireTableDataChanged() method doesn't refresh table</field>
    </item>
    <item key="">
      <field name="title">why the dynamic nodes dont respond to click in jstree?</field>
    </item>
    <item key="">
      <field name="title">Connecting kdb+ to R</field>
    </item>
  </model>
</mapping>

The --validate parameter adds the line Document is valid to the output.

To save the result, add the --save parameter to the command above.
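
To see what the XPath in the stylesheet selects, the same expression can be tried in Python on a small sample. This is a sketch for experimenting and not a replacement for the XSLT file: PAGE is a made-up, well-formed snippet (real pages usually need an HTML parser), and build_mapping is a hypothetical helper that imitates the stylesheet's output.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; only two links carry the selected class.
PAGE = """<html><body>
  <a class="question-hyperlink" href="/q/1">First question</a>
  <a class="other" href="/about">About</a>
  <a class="question-hyperlink" href="/q/2">Second question</a>
</body></html>"""


def build_mapping(page_text, model_path):
    """Imitate the stylesheet: one <item/> per matching link."""
    page = ET.fromstring(page_text)
    mapping = ET.Element('mapping')
    model = ET.SubElement(mapping, 'model', {'model': model_path})
    # ElementTree writes the stylesheet's //a[...] as .//a[...]
    for link in page.findall(".//a[@class='question-hyperlink']"):
        item = ET.SubElement(model, 'item', {'key': ''})
        field = ET.SubElement(item, 'field', {'name': 'title'})
        field.text = link.text
    return mapping


mapping = build_mapping(PAGE, 'test_app.Question')
titles = [field.text for field in mapping.iter('field')]
```

Only the links with class question-hyperlink produce items, which mirrors the xsl:for-each loop in transform.xslt.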

Release History

0.4.2 (this version), 0.4.1, 0.3.1, 0.3.0, 0.2.2, 0.2.1, 0.2.0, 0.1.4, 0.1.2, 0.1.1, 0.1.0

Download Files

django_import_data-0.4.2-py2.py3-none-any.whl (67.6 kB), py2.py3, Wheel, uploaded Feb 8, 2016
django-import-data-0.4.2.tar.gz (64.2 kB), Source, uploaded Feb 8, 2016
