
A set of utilities for processing XML documents and converting to other formats


# xmlutils.py
xmlutils.py is a set of Python utilities for processing XML files serially and converting
them to other formats (SQL, CSV, JSON). The scripts use ElementTree.iterparse() to iterate
through the nodes of an XML file, so the whole DOM never has to be loaded into memory.
This lets the scripts churn through large XML files without memory hiccups, although conversion can take a long time.
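
The streaming approach described above can be sketched with the standard library alone. This is a minimal illustration and not the package's actual code; the function name `iter_records` and the sample document are assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, tag):
    """Yield one dict per <tag> element without building the whole tree first."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            # Collect the direct children of the record as field -> text.
            yield {child.tag: (child.text or "").strip() for child in elem}
            # Drop the finished record's children and text to release memory.
            elem.clear()


# Hypothetical sample document, used only for this sketch.
sample = io.BytesIO(
    b"<fruits>"
    b"<item><name>apple</name><color>red</color></item>"
    b"<item><name>banana</name><color>yellow</color></item>"
    b"</fruits>"
)
records = list(iter_records(sample, "item"))
```

Because records are yielded one at a time, a caller can write each one out and discard it instead of collecting them in a list as done here.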

Blind conversion of XML to CSV and SQL is not recommended.
It only works if the structure of the XML document is simple (flat).
On the other hand, xml2json supports complex XML documents with multiple nested hierarchies.
Lastly, the XML files are not validated at the time of conversion.
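
One way to check in advance whether a document is flat enough for CSV or SQL conversion is to look for fields that contain child elements. The helper below is a hypothetical sketch (`is_flat` is not part of xmlutils), and for brevity it parses the whole document in memory rather than streaming it.

```python
import xml.etree.ElementTree as ET


def is_flat(xml_text, tag):
    """Return True if no field inside a <tag> record has child elements."""
    root = ET.fromstring(xml_text)
    for record in root.iter(tag):
        for field in record:
            if len(field) > 0:  # this field itself contains elements
                return False
    return True


# Hypothetical sample documents, used only for this sketch.
flat_doc = "<fruits><item><name>apple</name><color>red</color></item></fruits>"
nested_doc = (
    "<fruits><item><name>apple</name>"
    "<origin><country>IN</country></origin></item></fruits>"
)
```

Here `flat_doc` maps cleanly onto one row per `item`, while the `origin` field in `nested_doc` has no obvious single-column representation.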


Kailash Nadh, October 2011

License: MIT License

Documentation: http://nadh.in/code/xmlutils.py

PyPI: https://pypi.python.org/pypi/xmlutils


#Installation
With pip or easy_install

```pip install xmlutils``` or ```easy_install xmlutils```

Or from the source

```python setup.py install```

#Commandline utilities

##xml2csv
Convert an XML document to a CSV file.

<pre>
xml2csv --input "samples/fruits.xml" --output "samples/fruits.csv" --tag "item"
</pre>

######Arguments
```
--input      Input XML document's filename*
--output     Output CSV file's filename*
--tag        The tag of the node that represents a single record (e.g. item, record)*
--delimiter  Delimiter for separating items in a row. Default is ", " (a comma followed by a space)
--ignore     A space-separated list of element tags in the XML document to ignore
--header     Whether to print the CSV header (list of fields) on the first line; 1=yes, 0=no. Default is 1
--encoding   Character encoding of the document. Default is utf-8
--limit      Maximum number of records to process from the document. Default is no limit (-1)
--buffer     Number of records to keep in memory before they are written to the output CSV file.
             Helps reduce the number of disk writes. Default is 1000
```
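
The idea behind `--buffer` can be illustrated with a small sketch: rows accumulate in a list and are written out in batches. This is an assumption about the general technique, not the package's implementation; `write_buffered` and its parameters are hypothetical names.

```python
import csv
import io


def write_buffered(rows, out, buffer_size=1000):
    """Write rows to `out` in batches of `buffer_size`; return the row count."""
    writer = csv.writer(out)
    pending = []
    total = 0
    for row in rows:
        pending.append(row)
        total += 1
        if len(pending) >= buffer_size:
            writer.writerows(pending)  # one batched write instead of many
            pending = []
    if pending:  # flush whatever is left at the end
        writer.writerows(pending)
    return total


out = io.StringIO()
count = write_buffered(
    [["apple", "red"], ["banana", "yellow"], ["kiwi", "green"]],
    out,
    buffer_size=2,
)
```

A larger buffer means fewer writes at the cost of holding more rows in memory at once.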

##xml2sql
Convert an XML document to an SQL file.

```xml2sql --input "samples/fruits.xml" --output "samples/fruits.sql" --tag "item" --table "myfruits"```

######Arguments
```
tag -- the record tag. eg: item
table -- table name
ignore -- list of tags to ignore
limit -- maximum number of records to process
packet -- maximum size of an insert query in MB (MySQL's max_allowed_packet)

Returns:
{ num: number of records converted,
num_insert: number of sql insert statements generated
}
```
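
The `packet` limit can be pictured as grouping rows into multi-row INSERT statements that stay under a size cap. The sketch below is an illustration under assumptions, not the package's code: `batch_inserts` is a hypothetical helper, the cap is given in bytes rather than MB, and the quoting is deliberately naive.

```python
def batch_inserts(table, columns, rows, max_bytes):
    """Group rows into multi-row INSERT statements of at most max_bytes each."""
    head = "INSERT INTO {} ({}) VALUES ".format(table, ", ".join(columns))
    statements, current = [], []

    def render(values):
        return head + ", ".join(values) + ";"

    for row in rows:
        # Naive quoting for illustration only; real code must escape properly.
        value = "(" + ", ".join(
            "'{}'".format(str(v).replace("'", "''")) for v in row
        ) + ")"
        # Start a new statement if adding this row would exceed the cap.
        if current and len(render(current + [value]).encode("utf-8")) > max_bytes:
            statements.append(render(current))
            current = []
        current.append(value)  # a single oversized row is still emitted alone
    if current:
        statements.append(render(current))
    return statements


statements = batch_inserts(
    "myfruits",
    ["name", "color"],
    [("apple", "red"), ("banana", "yellow"), ("kiwi", "green")],
    max_bytes=90,
)
```

With this cap the first two rows share one statement and the third row starts a second one, which mirrors the two counters in the return value above: records converted and INSERT statements generated.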

##xml2json
Convert XML to JSON.
xml2json supports hierarchies nested to any number of levels.

```xml2json --input "samples/fruits.xml" --output "samples/fruits.json"```
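
To show why nested hierarchies map naturally onto JSON, here is a minimal recursive sketch using only the standard library. It is not the package's implementation and the JSON shape xml2json produces may differ; `element_to_dict` is a hypothetical helper that ignores attributes and mixed text for simplicity.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Recursively convert an element into nested dicts, lists and strings."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # Repeated sibling tags are collected into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


# Hypothetical sample document, used only for this sketch.
doc = (
    "<fruits>"
    "<item><name>apple</name><origin><country>IN</country></origin></item>"
    "<item><name>kiwi</name></item>"
    "</fruits>"
)
root = ET.fromstring(doc)
data = {root.tag: element_to_dict(root)}
json_text = json.dumps(data, sort_keys=True)
```

Each level of nesting becomes a nested object, and the two `item` elements become a JSON array.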

#Modules

##xmlutils.xml2sql
```python
from xmlutils.xml2sql import xml2sql

converter = xml2sql("samples/fruits.xml", "samples/fruits.sql", encoding="utf-8")
converter.convert(tag="item", table="table")
```

######Arguments
```
tag -- the record tag. eg: item
table -- table name
ignore -- list of tags to ignore
limit -- maximum number of records to process
packet -- maximum size of an insert query in MB (MySQL's max_allowed_packet)

Returns:
{ num: number of records converted,
num_insert: number of sql insert statements generated
}
```

##xmlutils.xml2csv
```python
from xmlutils.xml2csv import xml2csv

converter = xml2csv("samples/fruits.xml", "samples/fruits.csv", encoding="utf-8")
converter.convert(tag="item")
```

######Arguments
```
tag -- the record tag. eg: item
delimiter -- csv field delimiter
ignore -- list of tags to ignore
limit -- maximum number of records to process
buffer -- number of records to keep in buffer before writing to disk

Returns:
number of records converted
```

##xmlutils.xml2json
```python
from xmlutils.xml2json import xml2json

converter = xml2json("samples/fruits.xml", "samples/fruits.json", encoding="utf-8")
converter.convert()

# to get a json string
converter = xml2json("samples/fruits.xml", encoding="utf-8")
print(converter.get_json())
```

######Arguments
```
pretty -- pretty print?
```
