A simple XML file and string reader to read big XML files and strings using iterators with optional conversion to dict

xml_stream

A simple XML file and string reader that is able to read big XML files and strings by using streams (iterators), with an option to convert to dictionaries

Description

xml_stream comprises two helper functions:

read_xml_file

When given a path to a file and the name of the tag that holds the relevant data, it returns an iterator that yields the data as xml.etree.ElementTree.Element objects by default, or as dicts when the to_dict argument is True.
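To illustrate the general idea of streaming records out of a large file, here is a minimal sketch built only on the standard library's `xml.etree.ElementTree.iterparse`. It is not taken from xml_stream's source, and the helper name `iter_records` is hypothetical; it assumes records are not nested inside one another.

```python
import xml.etree.ElementTree as ET
from typing import Iterator


def iter_records(file_path: str, records_tag: str) -> Iterator[ET.Element]:
    """Yield each <records_tag> element from the file, one at a time.

    Hypothetical helper for illustration; not xml_stream's implementation.
    """
    # iterparse reads the file incrementally instead of loading it whole
    context = ET.iterparse(file_path, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, element in context:
        # an "end" event means the element and its children are fully parsed
        if event == "end" and element.tag == records_tag:
            yield element
            # Free the record once the consumer asks for the next one,
            # so each element must be used before advancing the iterator
            element.clear()
            root.clear()
```

Because each element is cleared as soon as the next one is requested, any data needed later should be copied out inside the loop body.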

read_xml_string

When given an XML string and the name of the tag that holds the relevant data, it returns an iterator that yields the data as xml.etree.ElementTree.Element objects by default, or as dicts when the to_dict argument is True.
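For a sense of what converting an Element to a dict can involve, the sketch below shows one possible mapping using only the standard library. The function name `element_to_dict` and the exact output shape are assumptions made for illustration; the dictionaries produced by xml_stream with `to_dict=True` may be structured differently. The `_value` key for element text mirrors the key that appears in this README's sample output.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict


def element_to_dict(element: ET.Element) -> Dict[str, Any]:
    """Convert an Element into a plain dict (illustrative format only)."""
    result: Dict[str, Any] = dict(element.attrib)  # attributes become keys
    for child in element:
        value: Any = element_to_dict(child)
        # a leaf without attributes or children collapses to its text
        if not child.attrib and len(child) == 0:
            value = (child.text or "").strip()
        if child.tag in result:
            # repeated tags are gathered into a list
            existing = result[child.tag]
            if not isinstance(existing, list):
                result[child.tag] = existing = [existing]
            existing.append(value)
        else:
            result[child.tag] = value
    text = (element.text or "").strip()
    if text:
        result["_value"] = text  # text that sits next to attributes or children
    return result
```

In this sketch, repeated child tags such as several `bio` elements end up as a list under a single key, while a text-only child such as `team` becomes a plain string.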

Main Dependencies

Python 3.6+

Getting Started

Install the package
```
pip install xml_stream
```

Import the read_xml_file and read_xml_string functions and use them accordingly

```
from xml_stream import read_xml_file, read_xml_string

xml_string = """
<company>
    <staff>
        <operations_department>
            <employees>
                <team>Marketing</team>
                <location name="head office" address="Kampala, Uganda" />
                <bio first_name="John" last_name="Doe">John Doe</bio>
                <bio first_name="Jane" last_name="Doe">Jane Doe</bio>
                <bio first_name="Peter" last_name="Doe">Peter Doe</bio>
            </employees>
            <employees>
                <team>Customer Service</team>
                <location name="Kampala branch" address="Kampala, Uganda" />
                <bio first_name="Mary" last_name="Doe">Mary Doe</bio>
                <bio first_name="Harry" last_name="Doe">Harry Doe</bio>
                <bio first_name="Paul" last_name="Doe">Paul Doe</bio>
            </employees>
        </operations_department>
    </staff>
</company>
"""

file_path = '...' # path to your XML file

# For XML strings, use read_xml_string which returns an iterator
for element in read_xml_string(xml_string, records_tag='staff'):
    # returns the element as xml.etree.ElementTree.Element by default
    # ...do something with the element
    print(element)

for element_as_dict in read_xml_string(xml_string, records_tag='staff', to_dict=True):
    # returns the element as dictionary
    # ...do something with the element dictionary
    print(element_as_dict)
    # will print
    """
    {
        'operations_department': {
            'employees': [
                [
                    {
                        'team': 'Marketing',
                        'location': {
                            'name': 'head office',
                            'address': 'Kampala, Uganda'
                        },
                        'first_name': 'John',
                        'last_name': 'Doe',
                        '_value': 'John Doe'
                    },
                    {
                        'team': 'Marketing',
                        'location': {
                            'name': 'head office',
                            'address': 'Kampala, Uganda'
                        },
                        'first_name': 'Jane',
                        'last_name': 'Doe',
                        '_value': 'Jane Doe'
                    },
                    {
                        'team': 'Marketing',
                        'location': {
                            'name': 'head office',
                            'address': 'Kampala, Uganda'
                        },
                        'first_name': 'Peter',
                        'last_name': 'Doe',
                        '_value': 'Peter Doe'
                    },
                ],
                [
                    {
                        'team': 'Customer Service',
                        'location': {
                            'name': 'Kampala branch',
                            'address': 'Kampala, Uganda'
                        },
                        'first_name': 'Mary',
                        'last_name': 'Doe',
                        '_value': 'Mary Doe'
                    },
                    {
                        'team': 'Customer Service',
                        'location': {
                            'name': 'Kampala branch',
                            'address': 'Kampala, Uganda'
                        },
                        'first_name': 'Harry',
                        'last_name': 'Doe',
                        '_value': 'Harry Doe'
                    },
                    {
                        'team': 'Customer Service',
                        'location': {
                            'name': 'Kampala branch',
                            'address': 'Kampala, Uganda'
                        },
                        'first_name': 'Paul',
                        'last_name': 'Doe',
                        '_value': 'Paul Doe'
                    }
                ],
            ]
        }
    }
    """

# For XML files (even really large ones), use read_xml_file which also returns an iterator
for element in read_xml_file(file_path, records_tag='staff'):
    # returns the element as xml.etree.ElementTree.Element by default
    # ...do something with the element
    print(element)

for element_as_dict in read_xml_file(file_path, records_tag='staff', to_dict=True):
    # returns the element as dictionary
    # ...do something with the element dictionary
    print(element_as_dict)
    # see the print output for read_xml_string
```
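As a sketch of the "do something with the element" step, the snippet below pulls team members out of a single `staff` element using standard `xml.etree.ElementTree` methods. The helper `names_by_team` is hypothetical, and the element is built here with `ET.fromstring` as a stand-in for one item yielded by read_xml_string or read_xml_file.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def names_by_team(staff: ET.Element) -> Dict[str, List[str]]:
    """Map each team name to the full names of its members."""
    teams: Dict[str, List[str]] = {}
    # iter() walks all descendants, so the nesting depth does not matter
    for employees in staff.iter("employees"):
        team = employees.findtext("team", default="unknown")
        names = [
            f"{bio.get('first_name')} {bio.get('last_name')}"
            for bio in employees.findall("bio")
        ]
        teams.setdefault(team, []).extend(names)
    return teams


# Stand-in for one element yielded with records_tag='staff'
staff = ET.fromstring(
    "<staff><operations_department><employees>"
    "<team>Marketing</team>"
    '<bio first_name="John" last_name="Doe">John Doe</bio>'
    '<bio first_name="Jane" last_name="Doe">Jane Doe</bio>'
    "</employees></operations_department></staff>"
)
print(names_by_team(staff))  # {'Marketing': ['John Doe', 'Jane Doe']}
```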

How to test

Clone the repo and enter its root folder

```
git clone https://github.com/sopherapps/xml_stream.git && cd xml_stream
```

Create a virtual environment and activate it

```
virtualenv -p /usr/bin/python3.6 env && source env/bin/activate
```

Install the dependencies
```
pip install -r requirements.txt
```
Download a huge XML file for test purposes and save it in the /test folder as huge_mock.xml

Run the test command
```
python -m unittest
```
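If no large XML file is at hand, one option is to generate a synthetic one. The sketch below is only an illustration: the helper `write_mock_xml` and the record layout are assumptions, and the repository's tests may expect a different structure inside huge_mock.xml.

```python
import os
import tempfile


def write_mock_xml(path: str, n_records: int) -> None:
    """Write n_records <staff> records to path, one line at a time."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("<company>\n")
        for i in range(n_records):
            # each record is written directly, so memory use stays small
            f.write(
                f'  <staff><bio first_name="First{i}" last_name="Last{i}">'
                f"First{i} Last{i}</bio></staff>\n"
            )
        f.write("</company>\n")


if __name__ == "__main__":
    # Small demo in a temporary folder; raise n_records for a larger file
    target = os.path.join(tempfile.mkdtemp(), "huge_mock.xml")
    write_mock_xml(target, n_records=1000)
    print(target, os.path.getsize(target), "bytes")
```

Writing record by record keeps the generator itself from needing much memory, whatever the target file size.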

Acknowledgements

This Stack Overflow answer about converting XML to dict was very helpful.
This Real Python tutorial on publishing packages was very helpful.

License

Release history

0.0.8: Sep 9, 2021
0.0.7: Jan 22, 2021
0.0.6: Jan 22, 2021 (this version)
0.0.5: Jan 22, 2021
0.0.4: Sep 28, 2020
0.0.3: Sep 28, 2020
0.0.2: Sep 26, 2020
0.0.1: Sep 26, 2020

Download files

Source Distribution

xml_stream-0.0.6.tar.gz (7.0 kB view details)

Uploaded Jan 22, 2021 Source

Built Distribution

xml_stream-0.0.6-py3-none-any.whl (6.2 kB view details)

Uploaded Jan 22, 2021 Python 3

File details

Details for the file xml_stream-0.0.6.tar.gz.

File metadata

Download URL: xml_stream-0.0.6.tar.gz
Upload date: Jan 22, 2021
Size: 7.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.6.9

File hashes

Hashes for xml_stream-0.0.6.tar.gz
Algorithm	Hash digest
SHA256	`8fb473693e2c34eb1034a672b44034df9b84ac22a8226efcf66edd378184210c`
MD5	`df4610dc9f0ae311fad0258fbbf6689d`
BLAKE2b-256	`e6bb9da342274b53f19b5e495caafc2c85c38609e64b77218e03e2b071fe05b5`

File details

Details for the file xml_stream-0.0.6-py3-none-any.whl.

File metadata

Download URL: xml_stream-0.0.6-py3-none-any.whl
Upload date: Jan 22, 2021
Size: 6.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.6.9

File hashes

Hashes for xml_stream-0.0.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`378bc4be10d0ad903e8f26a2a476acb182bfbcbf9f8ae1d822378b14b8983572`
MD5	`1c0ae623bcbf665b4bb08be37f4f2f7d`
BLAKE2b-256	`e0a25814d05812b050e3d797448976b10c402357ce2831f5d6f667132e807ad3`
