A simple XML file and string reader to read big XML files and strings using iterators with optional conversion to dict

xml_stream

A simple XML file and string reader that can read large XML files and strings as streams (iterators), with an option to convert each record to a dictionary

Description

xml_stream comprises two helper functions:

read_xml_file

When given a path to a file and the name of the tag that holds the relevant data, it returns an iterator over the matching data as xml.etree.ElementTree.Element objects by default, or as dicts when the to_dict argument is True

read_xml_string

When given an XML string and the name of the tag that holds the relevant data, it returns an iterator over the matching data as xml.etree.ElementTree.Element objects by default, or as dicts when the to_dict argument is True
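
To illustrate the general idea of streaming records out of XML, here is a minimal sketch that uses only the standard library's xml.etree.ElementTree.iterparse. It is not xml_stream's actual implementation; the iter_records helper and the sample data are hypothetical and exist only for this illustration.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator


def iter_records(source, records_tag: str) -> Iterator[ET.Element]:
    """Yield each element named `records_tag` once its closing tag is parsed.

    Hypothetical helper for illustration; not part of xml_stream.
    """
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag == records_tag:
            yield element
            # Once the consumer asks for the next record, drop this one's
            # children and text so they can be garbage collected.
            element.clear()


sample = (
    "<company>"
    "<staff><team>Marketing</team></staff>"
    "<staff><team>Sales</team></staff>"
    "</company>"
)

# iterparse accepts a file name or a file object; a BytesIO stands in for a file
teams = [
    record.findtext("team")
    for record in iter_records(io.BytesIO(sample.encode("utf-8")), "staff")
]
print(teams)
```

Because the generator yields one record at a time, the caller never needs the whole document in memory, which is the property that makes an iterator-based reader suitable for large files.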

Main Dependencies

Getting Started

  • Install the package

    pip install xml_stream
    
  • Import the read_xml_file and read_xml_string functions and use them as needed

    from xml_stream import read_xml_file, read_xml_string
    
    xml_string = """
    <company>
    <staff>
        <operations_department>
            <employees>
                <team>Marketing</team>
                <location name="head office" address="Kampala, Uganda" />
                <bio first_name="John" last_name="Doe">John Doe</bio>
                <bio first_name="Jane" last_name="Doe">Jane Doe</bio>
                <bio first_name="Peter" last_name="Doe">Peter Doe</bio>
            </employees>
            <employees>
                <team>Customer Service</team>
                <location name="Kampala branch" address="Kampala, Uganda" />
                <bio first_name="Mary" last_name="Doe">Mary Doe</bio>
                <bio first_name="Harry" last_name="Doe">Harry Doe</bio>
                <bio first_name="Paul" last_name="Doe">Paul Doe</bio>
            </employees>
        </operations_department>
    </staff>
    </company>
    """
    
    file_path = '...' # path to your XML file
    
    # For XML strings, use read_xml_string which returns an iterator  
    for element in read_xml_string(xml_string, records_tag='staff'):
        # returns the element as xml.etree.ElementTree.Element by default
        # ...do something with the element
        print(element)
    
    for element_as_dict in read_xml_string(xml_string, records_tag='staff', to_dict=True):
        # returns the element as dictionary
        # ...do something with the element dictionary
        print(element_as_dict)
        # will print
        """
        {
              'operations_department': {
                  'employees': [
                      [
                          {
                              'team': 'Marketing',
                              'location': {
                                  'name': 'head office',
                                  'address': 'Kampala, Uganda'
                              },
                              'first_name': 'John',
                              'last_name': 'Doe',
                              '_value': 'John Doe'
    
                          },
                          {
                              'team': 'Marketing',
                              'location': {
                                  'name': 'head office',
                                  'address': 'Kampala, Uganda'
                              },
                              'first_name': 'Jane',
                              'last_name': 'Doe',
                              '_value': 'Jane Doe'
    
                          },
                          {
                              'team': 'Marketing',
                              'location': {
                                  'name': 'head office',
                                  'address': 'Kampala, Uganda'
                              },
                              'first_name': 'Peter',
                              'last_name': 'Doe',
                              '_value': 'Peter Doe'
    
                          }, ],
                      [
                          {
                              'team': 'Customer Service',
                              'location': {
                                  'name': 'Kampala branch',
                                  'address': 'Kampala, Uganda'
                              },
                              'first_name': 'Mary',
                              'last_name': 'Doe',
                              '_value': 'Mary Doe'
    
                          },
                          {
                              'team': 'Customer Service',
                              'location': {
                                  'name': 'Kampala branch',
                                  'address': 'Kampala, Uganda'
                              },
                              'first_name': 'Harry',
                              'last_name': 'Doe',
                              '_value': 'Harry Doe'
    
                          },
                          {
                              'team': 'Customer Service',
                              'location': {
                                  'name': 'Kampala branch',
                                  'address': 'Kampala, Uganda'
                              },
                              'first_name': 'Paul',
                              'last_name': 'Doe',
                              '_value': 'Paul Doe'
    
                          }
                      ],
                  ]
              }
        }
        """
    
    # For XML files (even really large ones), use read_xml_file which also returns an iterator  
    for element in read_xml_file(file_path, records_tag='staff'):
        # returns the element as xml.etree.ElementTree.Element by default
        # ...do something with the element
        print(element)
    
    for element_as_dict in read_xml_file(file_path, records_tag='staff', to_dict=True):
        # returns the element as dictionary
        # ...do something with the element dictionary
        print(element_as_dict)
        # see the print output for read_xml_string
    

How to test

  • Clone the repo and enter its root folder

    git clone https://github.com/sopherapps/xml_stream.git && cd xml_stream
    
  • Create a virtual environment and activate it

    virtualenv -p /usr/bin/python3.6 env && source env/bin/activate
    
  • Install the dependencies

    pip install -r requirements.txt
    
  • Download a huge XML file for test purposes and save it in the /test folder as huge_mock.xml

  • Run the test command

    python -m unittest
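
If no large XML file is at hand, one could be generated locally instead of downloaded. The sketch below is only an assumption about what a usable mock might look like: the write_mock_xml helper, the tag names, and the record count are all hypothetical, and the repository's tests may expect a different structure, so check them before relying on a generated file.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_mock_xml(path: str, record_count: int) -> None:
    """Write `record_count` <staff> records to `path`, one record per line.

    Hypothetical helper; the tag names are made up for illustration.
    """
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("<company>\n")
        for index in range(record_count):
            handle.write(
                f'  <staff><bio first_name="Person{index}" last_name="Doe">'
                f"Person{index} Doe</bio></staff>\n"
            )
        handle.write("</company>\n")


# Written to a temporary folder here; adjust the path and record count as needed
mock_path = os.path.join(tempfile.mkdtemp(), "huge_mock.xml")
write_mock_xml(mock_path, record_count=1000)

# Sanity check: count the records back with the standard library
record_count = sum(1 for _ in ET.parse(mock_path).getroot().iter("staff"))
print(record_count)
```

Writing the records in a loop keeps the generator's own memory use flat, so the record count can be raised until the file is as large as the test needs.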
    

Acknowledgements

License

Copyright (c) 2020 Martin Ahindura. Licensed under the MIT License.
