Xml Subsetter

Decimate the data elements of an XML file while keeping every other element intact.

Before

# bulk.xml
<r>
    <meta>
    some meta data
    </meta>

    <something>
    thing thing thing thing
    </something>

    <e>e0</e>
    <e>e1</e>
    <e>e2</e>
    <e>e3</e>
    <some-annoying-non-data-you-have-to-keep-1>ah yah yah</some-annoying-non-data-you-have-to-keep-1>
    <some-annoying-non-data-you-have-to-keep-2>ah yah yah</some-annoying-non-data-you-have-to-keep-2>
    <some-annoying-non-data-you-have-to-keep-3>ah yah yah</some-annoying-non-data-you-have-to-keep-3>
    <e>e4</e>
    <e>e5</e>
    <e>e6</e>
    <e>e7</e>
    <e>e8</e>
    <e>e9</e>
    <e>e10</e>
    ...
    <e>e99</e>
</r>

Keep only the first 5% of the <e> elements and write the result to a new file:

subset_head("bulk.xml", target_file="/tmp/small.xml", data_tag="e", ratio=0.05)

After

# small.xml
<r>
    <meta>
    some meta data
    </meta>

    <something>
    thing thing thing thing
    </something>

    <e>e0</e>
    <e>e1</e>
    <e>e2</e>
    <e>e3</e>
    <some-annoying-non-data-you-have-to-keep-1>ah yah yah</some-annoying-non-data-you-have-to-keep-1>
    <some-annoying-non-data-you-have-to-keep-2>ah yah yah</some-annoying-non-data-you-have-to-keep-2>
    <some-annoying-non-data-you-have-to-keep-3>ah yah yah</some-annoying-non-data-you-have-to-keep-3>
    <e>e4</e>
</r>
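
The before/after behaviour above can be approximated with the standard library alone. The following is a minimal sketch, not the package's actual implementation: the name `subset_head_sketch` is hypothetical, it assumes the data elements are direct children of the root, and it assumes the number of elements to keep is `ratio` times the count, rounded down.

```python
import xml.etree.ElementTree as ET


def subset_head_sketch(source, target_file, data_tag, ratio):
    """Keep only the leading fraction of <data_tag> children of the root.

    Hypothetical helper for illustration; every child with a different
    tag is left where it is.
    """
    tree = ET.parse(source)
    root = tree.getroot()

    # Collect the data elements in document order.
    data = [child for child in root if child.tag == data_tag]

    # Assumption: round down, so 100 elements at ratio 0.05 keep 5.
    keep = int(len(data) * ratio)

    # Drop every data element after the first `keep`.
    for child in data[keep:]:
        root.remove(child)

    tree.write(target_file, encoding="utf-8", xml_declaration=True)
    return keep
```

Because the whole document is parsed into memory, this sketch only suits files that fit in RAM; a streaming parser would be the usual choice for very large inputs.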

Download files

Source Distribution

xml-subsetter-0.0.2.tar.gz (4.7 kB view details)

Uploaded Source

File details

Details for the file xml-subsetter-0.0.2.tar.gz.

File metadata

  • Download URL: xml-subsetter-0.0.2.tar.gz
  • Size: 4.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.6.0 requests-toolbelt/0.9.1 tqdm/4.37.0 CPython/3.7.3

File hashes

Hashes for xml-subsetter-0.0.2.tar.gz

Algorithm    Hash digest
SHA256       3c102e20288d37d1fff6e081b3e22339acc710e8e24458a050bd777985589837
MD5          f641bb0e774ad2d20c89b96d49b14aec
BLAKE2b-256  6eab37c989d1c2bda5f47e5e5157bd5ec1ee4c13ae4577ec8b50cbbf13fc904c
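
A downloaded archive can be checked against the SHA256 digest listed above. This is a small sketch using `hashlib` from the standard library; the helper names `sha256_of` and `matches_published_hash` are illustrative, and the local path of the download is whatever you saved it as.

```python
import hashlib

# SHA256 digest published for xml-subsetter-0.0.2.tar.gz.
EXPECTED_SHA256 = "3c102e20288d37d1fff6e081b3e22339acc710e8e24458a050bd777985589837"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """True if the file at `path` has the expected SHA256 digest."""
    return sha256_of(path) == expected
```

For example, `matches_published_hash("xml-subsetter-0.0.2.tar.gz")` should return `True` for an unmodified download saved under that name.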
