lightweight, simple, and fast declarative XML and JSON data extraction

Yankee - Simple Declarative Data Extraction from XML and JSON

This is kind of like Marshmallow, but it only does deserialization. What it lacks in reversibility, it makes up for in speed: schemas are compiled in advance, so data extraction happens very quickly.

Motivation

I have another package called patent_client. I also do a lot with legal data, some of which is in XML, and some of which is in JSON. But there's a lot of it. And I mean a lot, so speed matters.

Quick Start

There are two main modules: yankee.json.schema and yankee.xml.schema. Both support defining class-style deserializers. In each case you subclass a Schema class and then define attributes using the fields submodule. The examples below import Schema and fields directly from yankee.json and yankee.xml.

JSON Deserializer Example

    from yankee.json import Schema, fields

    class JsonExample(Schema):
        name = fields.String()
        birthday = fields.Date("birthdate")
        deep_data = fields.Int("something.0.many.levels.deep")

    obj = {
        "name": "Johnny Appleseed",
        "birthdate": "2000-01-01",
        "something": [
            {"many": {
                "levels": {
                    "deep": 123
                }
            }}
        ]
    }

    result = JsonExample().deserialize(obj)
    # result is:
    # {
    #     "name": "Johnny Appleseed",
    #     "birthday": datetime.date(2000, 1, 1),
    #     "deep_data": 123
    # }

For JSON, the attributes are filled by pulling values off the JSON object. If no path is provided, the attribute name is used as the key. Otherwise, a dotted string can be used to pluck an item from the JSON object; as the example shows, a numeric segment such as 0 indexes into a list.
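To make the dotted-path idea concrete, here is a minimal sketch in plain Python. It does not use Yankee and is not how Yankee is implemented; the `pluck` helper is a hypothetical name introduced only for illustration, and it assumes that a segment applied to a list is an integer index.

```python
from typing import Any


def pluck(obj: Any, path: str) -> Any:
    """Follow a dotted path through nested dicts and lists (illustrative helper)."""
    current = obj
    for segment in path.split("."):
        if isinstance(current, list):
            # Assumption: a segment applied to a list is an integer index.
            current = current[int(segment)]
        else:
            current = current[segment]
    return current


obj = {
    "name": "Johnny Appleseed",
    "birthdate": "2000-01-01",
    "something": [
        {"many": {"levels": {"deep": 123}}}
    ],
}

name = pluck(obj, "name")
deep = pluck(obj, "something.0.many.levels.deep")
```

A field such as `fields.Int("something.0.many.levels.deep")` combines a lookup of this kind with a type conversion.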

XML Deserializer Example

    import lxml.etree as ET
    from yankee.xml import Schema, fields

    class XmlExample(Schema):
        name = fields.String("./name")
        birthday = fields.Date("./birthdate")
        deep_data = fields.Int("./something/many/levels/deep")

    obj = ET.fromstring(b"""
    <xmlObject>
        <name>Johnny Appleseed</name>
        <birthdate>2000-01-01</birthdate>
        <something>
            <many>
                <levels>
                    <deep>123</deep>
                </levels>
            </many>
        </something>
    </xmlObject>
    """.strip())

    result = XmlExample().deserialize(obj)
    # result is:
    # {
    #     "name": "Johnny Appleseed",
    #     "birthday": datetime.date(2000, 1, 1),
    #     "deep_data": 123
    # }

For XML, the attributes are filled using XPath expressions. If no path is provided, the entire object is passed to the field (there are no implicit paths). Any valid XPath expression can be used.
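For comparison, the same extraction can be sketched by hand. This is only an illustration of what the XPath expressions select, not Yankee's implementation. It assumes the standard library's xml.etree.ElementTree, which supports only a subset of XPath, rather than lxml; `text_at` is a hypothetical helper.

```python
import datetime
import xml.etree.ElementTree as ET

root = ET.fromstring(
    """
<xmlObject>
    <name>Johnny Appleseed</name>
    <birthdate>2000-01-01</birthdate>
    <something>
        <many>
            <levels>
                <deep>123</deep>
            </levels>
        </many>
    </something>
</xmlObject>
""".strip()
)


def text_at(element, path):
    """Return the text of the first node matching path, or None (illustrative helper)."""
    node = element.find(path)
    return None if node is None else node.text


# Each line pairs a path lookup with a type conversion, as a schema field would.
name = text_at(root, "./name")
birthday = datetime.date.fromisoformat(text_at(root, "./birthdate"))
deep_data = int(text_at(root, "./something/many/levels/deep"))
```

Unlike the JSON case, every path here is written out explicitly, which matches the "no implicit paths" rule above.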
