xdwarf
Mining XML to tabular data, FAAAAAAST
Install
pip install xdwarf
The library is a wrapper around rust_dwarf, a Rust-based mining tool.
Mining
# Arguments: glob pattern of files to mine, project name, CPU count (-2 = all CPUs minus 2)
dwarf = Dwarf.from_glob("../test/data/*.xml", "PMC", -2)
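The third argument above is described as "all - 2 CPUs". As a rough illustration of that convention, here is a minimal standard-library sketch of how a negative count could be turned into a worker count. `resolve_workers` is a hypothetical helper written for this example, not part of xdwarf, and the handling of zero and of very large negative values is an assumption.

```python
import os


def resolve_workers(n: int) -> int:
    """Hypothetical helper: resolve a possibly negative CPU count.

    A positive n is used as-is. Zero or a negative n is read as
    "all CPUs plus n" (so -2 means all CPUs minus 2), clamped so that
    at least one worker always remains. This convention is assumed.
    """
    total = os.cpu_count() or 1  # os.cpu_count() may return None
    if n > 0:
        return n
    return max(1, total + n)


workers = resolve_workers(-2)
```

On a machine with 8 CPUs this would leave 6 workers; on a machine with 2 or fewer it falls back to a single worker.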
Define the mining details as XPath-style query patterns. Chaining multistage mining is well supported.
dwarf.find_one('article-meta > article-id[pub-id-type=pmid]', "pmid")
dwarf.find_one("abstract", "abstract").find_many("p", "paragraph")
# mining stages can be chained to extract deeper details
reference = dwarf.find_one("ref-list", "ref_list").find_many("ref","reference")
reference.find_one("pub-id[pub-id-type=pmid]", "ref_id")
reference.find_one("pub-id[pub-id-type=doi]", "doi")
ref_name = reference.find_many("name", "ref_name")
ref_name.find_one("surname", "ref_surname")
dwarf.set_necessary("pmid")
dwarf.create_children()
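To make the query patterns above more concrete, the following sketch shows which nodes patterns such as `article-meta > article-id[pub-id-type=pmid]` and the chained `ref-list` then `ref` stages would select. It uses Python's standard `xml.etree.ElementTree`, not xdwarf, and rewrites the patterns in ElementTree's own path syntax. The sample document and its values are made up for illustration, and reading `find_one` as "first match" and `find_many` as "all matches" is an assumption based on the method names.

```python
import xml.etree.ElementTree as ET

# Made-up sample document, loosely shaped like the XML the README queries.
SAMPLE = """
<article>
  <front>
    <article-meta>
      <article-id pub-id-type="pmid">12345</article-id>
      <article-id pub-id-type="doi">10.1000/example</article-id>
    </article-meta>
  </front>
  <back>
    <ref-list>
      <ref><pub-id pub-id-type="pmid">111</pub-id></ref>
      <ref><pub-id pub-id-type="pmid">222</pub-id></ref>
    </ref-list>
  </back>
</article>
"""

root = ET.fromstring(SAMPLE)

# "Find one": first article-id that is a direct child of article-meta
# and has pub-id-type="pmid".
pmid_node = root.find(".//article-meta/article-id[@pub-id-type='pmid']")
pmid = pmid_node.text if pmid_node is not None else None

# Chained stages: first ref-list, then every ref below it ("find many"),
# then one pub-id per ref.
ref_list = root.find(".//ref-list")
refs = ref_list.findall(".//ref") if ref_list is not None else []
ref_ids = [r.findtext(".//pub-id[@pub-id-type='pmid']") for r in refs]

print(pmid, ref_ids)
```

Each chained stage searches only inside the nodes matched by the previous stage, which is what keeps the reference IDs separate from the article's own ID.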
Start mining
result = dwarf()
View the result
result.child_df().head(2)
View a child result
result['ref_list'].child_df().head()
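The parent and child results above suggest a layout with one table per mining stage. The sketch below illustrates that general idea with the standard library only: one row per document in a parent table, and one row per reference in a child table that carries the parent's key. The function `mine`, the column names, the sample documents, and the rule that documents without a `pmid` are skipped (a guess at what `set_necessary` implies) are all assumptions for illustration, not xdwarf's actual output format.

```python
import xml.etree.ElementTree as ET

# Made-up input: file name -> XML text. "b.xml" has no pmid on purpose.
DOCS = {
    "a.xml": (
        "<article><article-id pub-id-type='pmid'>1</article-id>"
        "<ref-list><ref><surname>Lee</surname></ref>"
        "<ref><surname>Kim</surname></ref></ref-list></article>"
    ),
    "b.xml": (
        "<article><ref-list><ref><surname>Roe</surname></ref>"
        "</ref-list></article>"
    ),
}


def mine(docs):
    """Hypothetical flattening of nested XML into parent and child rows."""
    parent_rows, child_rows = [], []
    for name, text in docs.items():
        root = ET.fromstring(text)
        pmid = root.findtext(".//article-id[@pub-id-type='pmid']")
        if pmid is None:
            continue  # assumed meaning of a "necessary" field: skip the document
        parent_rows.append({"file": name, "pmid": pmid})
        for i, ref in enumerate(root.findall(".//ref")):
            # Each child row repeats the parent key so the tables can be joined.
            child_rows.append(
                {"pmid": pmid, "ref_index": i, "ref_surname": ref.findtext("surname")}
            )
    return parent_rows, child_rows


parents, references = mine(DOCS)
```

Lists of dictionaries like these can be passed directly to `pandas.DataFrame(...)` if a DataFrame view is wanted.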