xmljson

Converts XML into JSON/Python dicts/arrays and vice-versa.

This library is not actively maintained. Alternatives are xmltodict and untangle. Use only if you need to parse using specific XML to JSON conventions.

xmljson converts XML into Python dictionary structures (trees, like in JSON) and vice-versa.

About

XML can be converted to a data structure (such as JSON) and back. For example:

<employees>
    <person>
        <name value="Alice"/>
    </person>
    <person>
        <name value="Bob"/>
    </person>
</employees>

can be converted into this data structure (which is also a valid JSON object):

{
    "employees": {
        "person": [{
            "name": {
                "@value": "Alice"
            }
        }, {
            "name": {
                "@value": "Bob"
            }
        }]
    }
}

This uses the BadgerFish convention that prefixes attributes with @. The conventions supported by this library are:

Abdera: Use "attributes" for attributes, "children" for nodes
BadgerFish: Use "$" for text content, @ to prefix attributes
Cobra: Use "attributes" for sorted attributes (even when empty), "children" for nodes, values are strings
GData: Use "$t" for text content, attributes added as-is
Parker: Use tail nodes for text content, ignore attributes
Yahoo: Use "content" for text content, attributes added as-is
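
To make the naming differences concrete, here is a minimal standard-library sketch that emulates only the key naming of the BadgerFish and GData conventions listed above. It is not xmljson's implementation: the function to_data and its parameters attr_prefix and text_key are hypothetical names chosen for illustration, and unlike the library it leaves every value as a string.

```python
from xml.etree.ElementTree import fromstring


def to_data(elem, attr_prefix, text_key):
    """Convert an Element to nested dicts (illustrative sketch only)."""
    # Attributes become keys, optionally prefixed (BadgerFish uses '@').
    node = {attr_prefix + name: value for name, value in elem.attrib.items()}
    # Non-empty text content is stored under the convention's text key.
    text = (elem.text or '').strip()
    if text:
        node[text_key] = text
    # Children are keyed by tag; repeated tags are collected into a list.
    for child in elem:
        value = to_data(child, attr_prefix, text_key)[child.tag]
        if child.tag not in node:
            node[child.tag] = value
        elif isinstance(node[child.tag], list):
            node[child.tag].append(value)
        else:
            node[child.tag] = [node[child.tag], value]
    return {elem.tag: node}


root = fromstring('<p id="main">Hello<b>bold</b></p>')
badgerfish_style = to_data(root, attr_prefix='@', text_key='$')
gdata_style = to_data(root, attr_prefix='', text_key='$t')
print(badgerfish_style)
print(gdata_style)
```

The only difference between the two results is the key naming: "@id" and "$" in the first, "id" and "$t" in the second.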

Convert data to XML

To convert from a data structure to XML using the BadgerFish convention:

>>> from xmljson import badgerfish as bf
>>> bf.etree({'p': {'@id': 'main', '$': 'Hello', 'b': 'bold'}})

This returns a list of etree.Element structures. In this case, the result is identical to:

>>> from xml.etree.ElementTree import fromstring
>>> [fromstring('<p id="main">Hello<b>bold</b></p>')]
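
As a rough illustration of the mapping described above, the following sketch builds elements from a BadgerFish-style dict using only the standard library. It is an assumption-laden approximation, not the library's code: build_elements and fill_element are hypothetical helper names, and the sketch does no validation of tag names.

```python
from xml.etree.ElementTree import Element, tostring


def fill_element(elem, value):
    """Apply a BadgerFish-style value to an existing Element."""
    if isinstance(value, dict):
        for key, child in value.items():
            if key == '$':
                elem.text = str(child)          # '$' holds the text content
            elif key.startswith('@'):
                elem.set(key[1:], str(child))   # '@name' becomes attribute name
            else:
                elem.extend(build_elements({key: child}))
    else:
        # A bare value is treated as the node text.
        elem.text = str(value)


def build_elements(data):
    """Return a list of Elements, one per key (or per list item) in data."""
    elements = []
    for tag, value in data.items():
        items = value if isinstance(value, list) else [value]
        for item in items:
            elem = Element(tag)
            fill_element(elem, item)
            elements.append(elem)
    return elements


result = build_elements({'p': {'@id': 'main', '$': 'Hello', 'b': 'bold'}})
xml_text = tostring(result[0], encoding='unicode')
print(xml_text)
```

Returning a list rather than a single element mirrors the behaviour described above, since a dict may contain several top-level keys.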

The result can be inserted into any existing root etree.Element:

>>> from xml.etree.ElementTree import Element, tostring
>>> result = bf.etree({'p': {'@id': 'main'}}, root=Element('root'))
>>> tostring(result)
'<root><p id="main"/></root>'

This includes lxml.html as well:

>>> from lxml.html import Element, tostring
>>> result = bf.etree({'p': {'@id': 'main'}}, root=Element('html'))
>>> tostring(result, doctype='<!DOCTYPE html>')
'<!DOCTYPE html>\n<html><p id="main"></p></html>'

For ease of use, strings are treated as node text. For example, the following two calls are equivalent:

>>> bf.etree({'p': {'$': 'paragraph text'}})
>>> bf.etree({'p': 'paragraph text'})

By default, non-string values are converted to strings using Python’s str, except for booleans, which are converted into true and false (lower case). Override this behaviour using xml_tostring:

>>> tostring(bf.etree({'x': 1.23, 'y': True}, root=Element('root')))
'<root><y>true</y><x>1.23</x></root>'
>>> from xmljson import BadgerFish              # import the class
>>> bf_str = BadgerFish(xml_tostring=str)       # convert using str()
>>> tostring(bf_str.etree({'x': 1.23, 'y': True}, root=Element('root')))
'<root><y>True</y><x>1.23</x></root>'
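
The default conversion described above can be pictured as a small function. This is a sketch of the described behaviour, not the library's source; to_xml_text is a hypothetical name.

```python
def to_xml_text(value):
    """Convert a Python value to XML text: booleans in lower case, else str()."""
    # bool must be handled explicitly, because str(True) gives 'True'.
    if isinstance(value, bool):
        return 'true' if value else 'false'
    return str(value)


converted = [to_xml_text(v) for v in (1.23, True, False, 'text')]
print(converted)
```

Any callable with this shape (one value in, one string out) is the kind of function that would be passed as xml_tostring.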

If the data contains invalid XML keys, these can be dropped via invalid_tags='drop' in the constructor:

>>> bf_drop = BadgerFish(invalid_tags='drop')
>>> data = bf_drop.etree({'$': '1', 'x': '1'}, root=Element('root'))    # Drops invalid <$> tag
>>> tostring(data)
'<root>1<x>1</x></root>'

Convert XML to data

To convert from XML to a data structure using the BadgerFish convention:

>>> bf.data(fromstring('<p id="main">Hello<b>bold</b></p>'))
{"p": {"$": "Hello", "@id": "main", "b": {"$": "bold"}}}

To convert this to JSON, use:

>>> from json import dumps
>>> dumps(bf.data(fromstring('<p id="main">Hello<b>bold</b></p>')))
'{"p": {"b": {"$": "bold"}, "@id": "main", "$": "Hello"}}'

To preserve the order of attributes and children, specify the dict_type as OrderedDict (or any other dictionary-like type) in the constructor:

>>> from collections import OrderedDict
>>> from xmljson import BadgerFish              # import the class
>>> bf = BadgerFish(dict_type=OrderedDict)      # pick dict class

By default, values are parsed into boolean, int or float where possible (except in the Yahoo method). Override this behaviour using xml_fromstring:

>>> dumps(bf.data(fromstring('<x>1</x>')))
'{"x": {"$": 1}}'
>>> bf_str = BadgerFish(xml_fromstring=False)   # Keep XML values as strings
>>> dumps(bf_str.data(fromstring('<x>1</x>')))
'{"x": {"$": "1"}}'
>>> bf_str = BadgerFish(xml_fromstring=repr)    # Custom string parser
>>> dumps(bf_str.data(fromstring('<x>1</x>')))
'{"x": {"$": "\'1\'"}}'

xml_fromstring can be any custom function that takes a string and returns a value. In the example below, only values made up entirely of digits are converted to integers. Everything else is retained as a string:

>>> def convert_only_int(val):
...     return int(val) if val.isdigit() else val
>>> bf_int = BadgerFish(xml_fromstring=convert_only_int)
>>> dumps(bf_int.data(fromstring('<p><x>1</x><y>2.5</y><z>NaN</z></p>')))
'{"p": {"x": {"$": 1}, "y": {"$": "2.5"}, "z": {"$": "NaN"}}}'
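
For comparison, the default parsing described in this section (boolean, int or float where possible) can be approximated by a function of the same shape. This is a hedged sketch: parse_value is a hypothetical name, and the library's exact rules may differ at the edges (for example, Python's float() also accepts strings such as "nan" and "inf").

```python
def parse_value(text):
    """Parse XML text into bool, int or float where possible, else keep it."""
    lowered = text.strip().lower()
    if lowered == 'true':
        return True
    if lowered == 'false':
        return False
    # Try the narrower type first so that '1' becomes 1 rather than 1.0.
    for convert in (int, float):
        try:
            return convert(lowered)
        except ValueError:
            pass
    return text  # not a recognised literal: return the original string


parsed = [parse_value(s) for s in ('1', '2.5', 'true', 'Hello')]
print(parsed)
```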

Conventions

To use a different conversion method, replace BadgerFish with one of the other classes. Currently, these are supported:

>>> from xmljson import abdera          # == xmljson.Abdera()
>>> from xmljson import badgerfish      # == xmljson.BadgerFish()
>>> from xmljson import cobra           # == xmljson.Cobra()
>>> from xmljson import gdata           # == xmljson.GData()
>>> from xmljson import parker          # == xmljson.Parker()
>>> from xmljson import yahoo           # == xmljson.Yahoo()

Options

Conventions may support additional options.

The Parker convention absorbs the root element by default. Passing preserve_root=True to parker.data() preserves the root element:

>>> from xmljson import parker, Parker
>>> from xml.etree.ElementTree import fromstring
>>> from json import dumps
>>> dumps(parker.data(fromstring('<x><a>1</a><b>2</b></x>')))
'{"a": 1, "b": 2}'
>>> dumps(parker.data(fromstring('<x><a>1</a><b>2</b></x>'), preserve_root=True))
'{"x": {"a": 1, "b": 2}}'
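
The effect of preserve_root can be sketched with a hand-rolled, Parker-like converter built on the standard library. It is only an approximation of the convention as summarised earlier (text content as values, attributes ignored): parker_like is a hypothetical name, and unlike the library the sketch keeps all values as strings.

```python
from xml.etree.ElementTree import fromstring


def parker_like(elem, preserve_root=False):
    """Convert an Element to Parker-style data (illustrative sketch only)."""
    children = list(elem)
    if children:
        value = {}
        for child in children:
            item = parker_like(child)
            # Repeated child tags are collected into a list.
            if child.tag not in value:
                value[child.tag] = item
            elif isinstance(value[child.tag], list):
                value[child.tag].append(item)
            else:
                value[child.tag] = [value[child.tag], item]
    else:
        value = elem.text  # leaf element: its text is the value
    # The root tag is dropped unless the caller asks to keep it.
    return {elem.tag: value} if preserve_root else value


root = fromstring('<x id="ignored"><a>1</a><b>2</b></x>')
absorbed = parker_like(root)
preserved = parker_like(root, preserve_root=True)
print(absorbed)
print(preserved)
```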

Installation

This is a pure-Python package built for Python 2.7+ and Python 3.0+. To set up:

pip install xmljson

Simple CLI utility

After installation, you can use this package as a simple CLI utility. Currently, only XML to JSON conversion is supported. Example:

$ python -m xmljson -h
usage: xmljson [-h] [-o OUT_FILE]
            [-d {abdera,badgerfish,cobra,gdata,parker,xmldata,yahoo}]
            [in_file]

positional arguments:
in_file               defaults to stdin

optional arguments:
-h, --help            show this help message and exit
-o OUT_FILE, --out_file OUT_FILE
                        defaults to stdout
-d {abdera,badgerfish,...}, --dialect {...}
                        defaults to parker

$ python -m xmljson -d parker tests/mydata.xml
{
  "foo": "spam",
  "bar": 42
}

This is a typical UNIX filter program: it reads a file (or stdin), processes it in some way (converting XML to JSON in this case), then prints the result to stdout (or a file). Example with a pipe:

$ some-xml-producer | python -m xmljson | some-json-processor

A console script entry point is also installed, so you can call this utility as xml2json:

$ xml2json -d abdera mydata.xml
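
The filter pattern described above (optional input file defaulting to stdin, optional -o output defaulting to stdout) can be sketched in a few lines of standard-library Python. This is not the xmljson command-line module: main and element_to_data are hypothetical names, the conversion is a simplified Parker-like one that keeps values as strings, and the stdin/stdout parameters exist only to make the sketch easy to exercise.

```python
import argparse
import json
import sys
from xml.etree.ElementTree import parse


def element_to_data(elem):
    """Simplified Parker-like conversion; repeated tags overwrite each other."""
    children = list(elem)
    if not children:
        return elem.text
    return {child.tag: element_to_data(child) for child in children}


def main(argv=None, stdin=None, stdout=None):
    parser = argparse.ArgumentParser(description='XML to JSON filter (sketch)')
    parser.add_argument('in_file', nargs='?', help='defaults to stdin')
    parser.add_argument('-o', '--out_file', help='defaults to stdout')
    args = parser.parse_args(argv)

    # Read from the named file if given, otherwise from standard input.
    source = args.in_file if args.in_file else (stdin or sys.stdin)
    root = parse(source).getroot()
    text = json.dumps(element_to_data(root), indent=2) + '\n'

    # Write to the named file if given, otherwise to standard output.
    if args.out_file:
        with open(args.out_file, 'w') as handle:
            handle.write(text)
    else:
        (stdout or sys.stdout).write(text)
```

Calling main() with no arguments from a script would read XML from stdin and print JSON to stdout, which is what allows such a program to sit in the middle of a pipe.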

Roadmap

Test cases for Unicode
Support for namespaces and namespace prefixes
Support XML comments

History

0.2.1 (25 Apr 2020)

Bugfix: Don’t strip whitespace in xml text values (@imoore76)
Bugfix: Yahoo convention should convert <x>0</x> into {x: 0}. Empty elements become '' not {}
Suggest alternate libraries in documentation

0.2.0 (21 Nov 2018)

xmljson command line script converts from XML to JSON (@tribals)
invalid_tags='drop' in the constructor drops invalid XML tags in .etree() (@Zurga)
Bugfix: Parker converts {'x': null} to <x></x> instead of <x>None</x> (@jorndoe #29)

0.1.9 (1 Aug 2017)

Bugfix and test cases for multiple nested children in Abdera convention

Thanks to @mukultaneja

0.1.8 (9 May 2017)

Add Abdera and Cobra conventions
Add Parker.data(preserve_root=True) option to preserve root element in Parker convention.

Thanks to @dagwieers

0.1.6 (18 Feb 2016)

Add xml_fromstring= and xml_tostring= parameters to constructor to customise string conversion from and to XML.

0.1.5 (23 Sep 2015)

Add the Yahoo XML to JSON conversion method.

0.1.4 (20 Sep 2015)

Fix GData.etree() conversion of attributes. (They were ignored. They should be added as-is.)

0.1.3 (20 Sep 2015)

Simplify {'p': {'$': 'text'}} to {'p': 'text'} in BadgerFish and GData conventions.
Add test cases for .etree() – mainly from the MDN JXON article.
dict_type/list_type do not need to inherit from dict/list

0.1.2 (18 Sep 2015)

Always use the dict_type class to create dictionaries (which defaults to OrderedDict to preserve order of keys)
Update documentation, test cases
Remove support for Python 2.6 (since we need collections.Counter)
Make the Travis CI build pass

0.1.1 (18 Sep 2015)

Convert true, false and numeric values from strings to Python types
xmljson.parker.data() is compliant with Parker convention (bugs resolved)

0.1.0 (15 Sep 2015)

Two-way conversions via BadgerFish, GData and Parker conventions.
First release on PyPI.

Release history

0.2.1 (Apr 25, 2020)
0.2.0 (Nov 21, 2018)
0.1.9 (Aug 1, 2017)
0.1.8 (May 9, 2017)
0.1.7 (Sep 13, 2016)
0.1.6 (Feb 18, 2016)
0.1.5 (Sep 23, 2015)
0.1.4 (Sep 20, 2015)
0.1.3 (Sep 20, 2015)
0.1.2 (Sep 18, 2015)
0.1.1 (Sep 18, 2015)
0.1.0 (Sep 15, 2015)

Download files

Source Distribution

xmljson-0.2.1.tar.gz (29.2 kB)

Uploaded Apr 25, 2020 Source

Built Distribution

xmljson-0.2.1-py2.py3-none-any.whl (10.1 kB)

Uploaded Apr 25, 2020 Python 2, Python 3

File details

Details for the file xmljson-0.2.1.tar.gz.

File metadata

Download URL: xmljson-0.2.1.tar.gz
Upload date: Apr 25, 2020
Size: 29.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.3

File hashes

Hashes for xmljson-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`b4158e66aa1e62ee39f7f80eb2fe4f767670ba3c0d5de9804420dc53427fdec8`
MD5	`fc4df2390ad209928ee4311a3540cb17`
BLAKE2b-256	`e86fd9f109ba19be510fd3098bcb72143c67ca6743cedb48ac75aef05ddfe960`

File details

Details for the file xmljson-0.2.1-py2.py3-none-any.whl.

File metadata

Download URL: xmljson-0.2.1-py2.py3-none-any.whl
Upload date: Apr 25, 2020
Size: 10.1 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.3

File hashes

Hashes for xmljson-0.2.1-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`8f1d7aba2c0c1bfa0203b577f21a1d95fde4485205ff638b854cb4d834e639b0`
MD5	`527685fc40c28fd696124737840389ca`
BLAKE2b-256	`912d7191efe15406b8b99e2b5905ca676a8a3dc2936416ade7ed17752902c250`
