
xdwarf

Mining XML to tabular data, FAAAAAAST


Install

pip install xdwarf

The library is a wrapper around rust_dwarf, a Rust-based mining tool.

Mining

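from xdwarf import Dwarf  # assumed import path

# Collect the files matching a glob pattern, set the project name ("PMC"),
# and use all CPUs except 2 (the third argument, -2)
dwarf = Dwarf.from_glob("../test/data/*.xml", "PMC", -2)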
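The comment above reads the third argument of from_glob as "all CPUs minus 2". Assuming that reading, here is a minimal standard-library sketch of what the two inputs resolve to: a sorted file list from the glob pattern and a worker count from the negative CPU value. Both `resolve_workers` and `collect_files` are hypothetical helpers for illustration, not part of xdwarf.

```python
import glob
import os
from typing import List, Optional


def resolve_workers(n_jobs: int, total: Optional[int] = None) -> int:
    """Translate an n_jobs-style value into a worker count (hypothetical helper).

    Positive values are used as given; a negative value -k is read as
    "all CPUs minus k", following the README comment. At least one worker
    is always returned.
    """
    if n_jobs == 0:
        raise ValueError("n_jobs must be non-zero")
    if total is None:
        total = os.cpu_count() or 1
    if n_jobs > 0:
        return n_jobs
    return max(1, total + n_jobs)


def collect_files(pattern: str) -> List[str]:
    """Expand a glob pattern such as "../test/data/*.xml" into a sorted file list."""
    return sorted(glob.glob(pattern))
```

For example, `resolve_workers(-2, total=8)` returns 6. Note that joblib uses a different convention for negative `n_jobs` (there, -2 means all CPUs but one), so it is worth confirming which one xdwarf follows.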

Define what to mine with query patterns; the examples use CSS-selector-style syntax such as article-meta > article-id[pub-id-type=pmid]. Chaining stages for multistage mining is well supported.

# Each stage takes a query pattern and the name of the output field
dwarf.find_one("article-meta > article-id[pub-id-type=pmid]", "pmid")
dwarf.find_one("abstract", "abstract").find_many("p", "paragraph")

# Mining stages can be chained to reach more deeply nested details
reference = dwarf.find_one("ref-list", "ref_list").find_many("ref", "reference")
reference.find_one("pub-id[pub-id-type=pmid]", "ref_id")
reference.find_one("pub-id[pub-id-type=doi]", "doi")
ref_name = reference.find_many("name", "ref_name")
ref_name.find_one("surname", "ref_surname")
dwarf.set_necessary("pmid")
dwarf.create_children()
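To see what these patterns should match, here is a rough standard-library equivalent of the chain using xml.etree.ElementTree, whose limited XPath support spells "article-meta > article-id[pub-id-type=pmid]" as `article-meta/article-id[@pub-id-type='pmid']`. It is a sketch, not how xdwarf evaluates queries. The sample XML is a hand-written, trimmed PMC-style article. `mine_article` is hypothetical. Two assumptions: articles without a pmid are dropped, which is our guess at what set_necessary("pmid") does; and the third stage (names, then surnames) is kept as a list column rather than its own table.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

# Hand-written, heavily trimmed PMC-style article (illustrative data only).
SAMPLE = """<article>
  <front><article-meta>
    <article-id pub-id-type="pmid">12345</article-id>
    <abstract><p>First paragraph.</p><p>Second paragraph.</p></abstract>
  </article-meta></front>
  <back><ref-list>
    <ref id="r1"><element-citation>
      <person-group>
        <name><surname>Smith</surname><given-names>J</given-names></name>
        <name><surname>Lee</surname><given-names>K</given-names></name>
      </person-group>
      <pub-id pub-id-type="pmid">111</pub-id>
      <pub-id pub-id-type="doi">10.1000/example.1</pub-id>
    </element-citation></ref>
    <ref id="r2"><element-citation>
      <pub-id pub-id-type="doi">10.1000/example.2</pub-id>
    </element-citation></ref>
  </ref-list></back>
</article>"""

Row = Dict[str, object]


def text_of(node: Optional[ET.Element]) -> Optional[str]:
    """Stripped text of an element and its descendants, or None if it is missing."""
    return None if node is None else "".join(node.itertext()).strip()


def mine_article(xml_text: str) -> Optional[Tuple[Row, List[Row]]]:
    """Return (parent_row, reference_rows), or None when the pmid is missing."""
    root = ET.fromstring(xml_text)
    # CSS 'article-meta > article-id[pub-id-type=pmid]' in ElementTree's XPath subset
    pmid = text_of(root.find(".//article-meta/article-id[@pub-id-type='pmid']"))
    if pmid is None:  # assumed semantics of set_necessary("pmid")
        return None
    # 'abstract' then 'p': one list entry per paragraph
    abstract = root.find(".//abstract")
    paragraphs = [text_of(p) for p in abstract.iter("p")] if abstract is not None else []
    parent: Row = {"pmid": pmid, "paragraph": paragraphs}

    references: List[Row] = []
    ref_list = root.find(".//ref-list")
    refs = list(ref_list.iter("ref")) if ref_list is not None else []
    for ref in refs:
        references.append({
            "pmid": pmid,  # key linking each child row back to its article
            "ref_id": text_of(ref.find(".//pub-id[@pub-id-type='pmid']")),
            "doi": text_of(ref.find(".//pub-id[@pub-id-type='doi']")),
            # third stage kept as a list column here, not a separate table
            "ref_surname": [text_of(n.find("surname")) for n in ref.iter("name")],
        })
    return parent, references


parent, references = mine_article(SAMPLE)
print(parent)
for row in references:
    print(row)
```

Copying the parent's pmid into every reference row is one common way to keep one-to-many data joinable after it has been split into a parent table and a child table.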

Start mining

result = dwarf()
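Calling dwarf() runs the configured stages over every matched file. Beyond the CPU argument to from_glob, the README does not say how the work is scheduled. As a rough standard-library picture of "the same extraction, many documents, a fixed number of workers", here is a thread-pool sketch. `extract_row` and `mine_all` are hypothetical names. Threads keep the example simple, but CPU-bound XML parsing in pure Python is limited by the GIL, which is one reason to push this kind of work into native code.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Optional

PMID_PATH = ".//article-meta/article-id[@pub-id-type='pmid']"


def extract_row(xml_text: str) -> Optional[Dict[str, str]]:
    """One mining stage applied to one document; None if the pmid is absent."""
    hit = ET.fromstring(xml_text).find(PMID_PATH)
    if hit is None or not (hit.text or "").strip():
        return None
    return {"pmid": hit.text.strip()}


def mine_all(documents: Iterable[str], workers: int) -> List[Dict[str, str]]:
    """Map extract_row over the documents with a worker pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=max(1, workers)) as pool:
        rows = pool.map(extract_row, documents)
        # Drop documents that lack the required field
        return [row for row in rows if row is not None]
```

Executor.map returns results in input order, so the output rows line up with the input documents regardless of which worker finishes first.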

View the result

result.child_df().head(2)

View a child result

result['ref_list'].child_df().head()



Download files


Source Distribution

xdwarf-0.1.0.tar.gz (4.8 kB)

Built Distribution

xdwarf-0.1.0-py3-none-any.whl (16.1 kB)

File details

Details for the file xdwarf-0.1.0.tar.gz.

File metadata

  • Download URL: xdwarf-0.1.0.tar.gz
  • Upload date:
  • Size: 4.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/47.3.1 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.4

File hashes

Hashes for xdwarf-0.1.0.tar.gz
Algorithm Hash digest
SHA256 fa640061be87bba9e3fde8270b2ef266927ee0f1d437177834b2efc5822a9818
MD5 b721104f82667a5338a87762e3d4b34e
BLAKE2b-256 83738f2519f6e77c4b01cc1b6f68bd39102896359468d93ea687873c0486f8ce


File details

Details for the file xdwarf-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: xdwarf-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 16.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/47.3.1 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.4

File hashes

Hashes for xdwarf-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ff133af7e271f6ff9b188d2bbbc77604997338ff0fe036ebbe807de6addf0f44
MD5 451ef4a6a1b95ae5f2370826cc2fe319
BLAKE2b-256 b096af1021f53c9c75906264ffcbed2b44e94082f61651c71184d14ffc1636e4

