
Split an XML document by milestone element.

Project description

This is useful for XML files that encode more than one hierarchy at once, where the hierarchy you need is marked only by empty milestone elements.

Example

One example is TEI, an XML dialect. A TEI file may have been created to represent a book by chapter (div elements), but you want to work with the text by page (pb elements).

Imagine an XML file called myfile.xml that contains milestone elements <pb/> dotted throughout the XML.
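To make the idea of a second hierarchy concrete, here is a minimal sketch that groups text by the most recent pb milestone using only the standard library. It is not the milestone module's own implementation; the sample document and the helper names walk and text_by_milestone are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: chapters are div elements, pages are pb milestones.
SAMPLE = """<text>
  <div type="chapter">
    <pb n="1"/>
    <p>First paragraph.</p>
    <pb n="2"/>
    <p>Second paragraph.</p>
  </div>
  <div type="chapter">
    <p>Still on page two.</p>
    <pb n="3"/>
    <p>Third page.</p>
  </div>
</text>"""


def walk(elem):
    """Yield start tags and text chunks in true document order."""
    yield ("start", elem)
    if elem.text:
        yield ("text", elem.text)
    for child in elem:
        yield from walk(child)
        # The tail is the text that follows the child's closing tag.
        if child.tail:
            yield ("text", child.tail)


def text_by_milestone(xml_string, tag="pb", attr="n"):
    """Group non-blank text under the most recently seen milestone."""
    root = ET.fromstring(xml_string)
    pages = {}
    current = None
    for kind, value in walk(root):
        if kind == "start":
            if value.tag == tag:
                current = value.get(attr)
                pages.setdefault(current, [])
        elif current is not None and value.strip():
            pages[current].append(value.strip())
    return pages


if __name__ == "__main__":
    for page, chunks in text_by_milestone(SAMPLE).items():
        print(page, chunks)
```

In this sample, page 2 starts in the first chapter and continues into the second, which is exactly the kind of overlap that the div hierarchy alone cannot express.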

The following command will split the input file into separate output files for each pb element:

python3 milestone.py -t pb myfile.xml

Or if you have installed the module through pip, you can use:

python3 -m milestone -t pb myfile.xml

The above commands will name the output files with integers.

Now imagine that the <pb> elements have an attribute called ‘n’ that we want to use for the name of each output file.

The following command will split the input file into separate output files, named according to the ‘n’ attribute:

python3 milestone.py -t pb -n n myfile.xml
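The two naming schemes can be sketched as follows. This is an illustration only, not the tool's actual code: the function name output_names, the .xml suffix, and counting from 1 are all assumptions, and the real output names may differ.

```python
import xml.etree.ElementTree as ET


def output_names(xml_string, tag="pb", attr=None, suffix=".xml"):
    """Return one hypothetical output name per milestone element.

    With attr set, the attribute value is used as the name; otherwise
    (or when the attribute is missing) a running integer is used.
    """
    root = ET.fromstring(xml_string)
    names = []
    for index, elem in enumerate(root.iter(tag), start=1):
        label = elem.get(attr) if attr else None
        names.append(f"{label or index}{suffix}")
    return names


if __name__ == "__main__":
    doc = '<text><pb n="i"/><p>a</p><pb n="ii"/><p>b</p></text>'
    print(output_names(doc))            # integer names
    print(output_names(doc, attr="n"))  # names taken from the n attribute
```

Falling back to the integer when an attribute is missing is a design choice of this sketch, so that every milestone still maps to some file name.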

If you want to transform the hierarchy but not split the data into separate files, you can use the -x flag:

python3 milestone.py -x -t pb -n n myfile.xml > outputfile.xml

To use this as a library in your own code, import the Milestone class:

from milestone import Milestone

To share ideas or improvements, please visit the GitHub project at:

https://github.com/zeth/milestone

