A small Python library for quickly traversing XML data.
## Basic Usage

    from __future__ import print_function  # keeps print() working on Python 2 too

    import drill

    doc = drill.parse(path_or_url_or_string)

    # Drill down to a specific element.
    # `unicode` exists only on Python 2; on Python 3, `str` should be the equivalent.
    print(unicode(doc.book.title))

    # Iterate through all "title" tags in the document.
    for t in doc.iter('title'):
        print(t.attrs, t.data)

    # Find all "bar" nodes with a "baz" child and a "foo" parent.
    q = doc.find('//foo/bar[baz]')

    # Easily access the first and last elements of matching results.
    print(q.first(), q.last())

    # Iterate over results.
    for e in q:
        do_something(e)

    # Parse only elements matching some path.
    for e in drill.iterparse(filelike, xpath='root/*/something'):
        print(e.tagname, e.data)

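
To make the `//foo/bar[baz]` query concrete, here is a minimal sketch of which elements such a path selects. It uses the standard library's ElementTree rather than drill, so it shows the query semantics only, not drill's own API; the sample document and the `matching_ids` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: only bar id="1" has both a <foo> parent
# and a <baz> child.
SAMPLE = """
<root>
  <foo>
    <bar id="1"><baz/></bar>
    <bar id="2"/>
  </foo>
  <other>
    <bar id="3"><baz/></bar>
  </other>
</root>
""".strip()


def matching_ids(xml_text):
    """Return the id attributes of <bar> elements with a <foo> parent and a <baz> child."""
    root = ET.fromstring(xml_text)
    # ElementTree wants a leading "." so the path is relative to the root element.
    return [bar.get("id") for bar in root.findall(".//foo/bar[baz]")]


print(matching_ids(SAMPLE))  # prints ['1']
```

`bar` 2 is skipped because it has no `baz` child, and `bar` 3 because its parent is `other` rather than `foo`.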
## Features
* Runnable test suite
* Python 3 support
## Advantages
* Pure Python
* Faster, more efficient parsing than ElementTree
    * Using ElementTree, a ~150 MB XML file (~3,000,000 elements) took ~46 seconds to parse, consuming ~1.3 GB of RAM
    * Parsing the same file using drill took ~24 seconds and consumed ~950 MB of RAM
    * Very unscientific benchmarks performed on a Core i5 @ 2.8 GHz, running Windows 7. YMMV.
* Lots of convenience methods for accessing elements quickly:
    * `doc.response.resultCode.data`
    * `root[2].child['attr']`
    * first/last/prev/next methods for traversing siblings
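
Since the benchmark figures above are informal, it may help to measure on your own machine. The following is a rough sketch of a timing harness that uses only the standard library and times ElementTree on a small synthetic document. The `make_sample` and `measure` helpers are hypothetical names, not part of drill, and this is not the setup that produced the numbers above.

```python
import time
import tracemalloc
import xml.etree.ElementTree as ET


def make_sample(n_items):
    """Build a synthetic XML string with n_items <item> children (made-up test data)."""
    body = "".join(
        '<item id="%d"><name>n%d</name></item>' % (i, i) for i in range(n_items)
    )
    return "<root>" + body + "</root>"


def measure(parse, xml_text):
    """Call parse(xml_text) once and return (result, elapsed_seconds, peak_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = parse(xml_text)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


xml_text = make_sample(10000)
root, seconds, peak = measure(ET.fromstring, xml_text)
print("ElementTree: %.3f s, peak %.1f MiB" % (seconds, peak / 2.0 ** 20))
```

To compare, pass `drill.parse` as the `parse` argument (the usage section indicates it accepts a string). Note that `tracemalloc` only counts memory allocated through Python's allocators, so its peak will not match process-level RAM figures like those quoted above.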