lightweight, simple, and fast declarative XML and JSON data extraction

Gelatin Extract

This is kind of like Marshmallow, but it only does deserialization. What it lacks in reversibility, it makes up for in speed: schemas are compiled in advance, so deserialization runs very quickly.

Motivation

I have another package called patent_client, and I work with a lot of legal data, some of it in XML and some of it in JSON. There's a lot of it, and I mean a lot, so speed matters.

Quick Start

There are two main modules: gelatin_extract.json.schema and gelatin_extract.xml.schema. Both support class-style deserializers: subclass the module's Schema class, then define attributes using its fields submodule.

JSON Deserializer Example

    from gelatin_extract.json.schema import Schema
    from gelatin_extract.json.schema import fields

    class JsonExample(Schema):
        name = fields.String()
        birthday = fields.Date("birthdate")
        deep_data = fields.Int("something.0.many.levels.deep")

    obj = {
        "name": "Johnny Appleseed",
        "birthdate": "2000-01-01",
        "something": [
            {"many": {
                "levels": {
                    "deep": 123
                }
            }}
        ]
    }

    JsonExample().deserialize(obj)
    # Returns:
    # {
    #     "name": "Johnny Appleseed",
    #     "birthday": datetime.date(2000, 1, 1),
    #     "deep_data": 123
    # }

For JSON, the attributes are filled by pulling values off the JSON object. If no path is provided, the attribute name is used as the key. Otherwise, a dotted string can be used to pluck a nested item from the JSON object; as the example shows, a numeric segment such as "0" indexes into a list.
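To make the dotted-path idea concrete, here is a minimal sketch in plain Python. It is not the library's implementation: `compile_path` and `pluck` are hypothetical helper names invented for this illustration. It assumes that all-digit segments are list indices and that a missing key yields a default value. Splitting the path once and reusing the result is one simple way to picture the "compiled in advance" approach.

```python
from typing import Any, Tuple, Union

Step = Union[str, int]


def compile_path(path: str) -> Tuple[Step, ...]:
    """Split a dotted path once; all-digit segments become list indices.

    Assumption for this sketch: a dict key that is itself a digit string
    (e.g. {"0": ...}) is not supported.
    """
    return tuple(int(part) if part.isdigit() else part for part in path.split("."))


def pluck(obj: Any, steps: Tuple[Step, ...], default: Any = None) -> Any:
    """Walk the precomputed steps through nested dicts and lists."""
    current = obj
    for step in steps:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            # Path does not exist in this object.
            return default
    return current


obj = {
    "name": "Johnny Appleseed",
    "birthdate": "2000-01-01",
    "something": [{"many": {"levels": {"deep": 123}}}],
}

# "Compile" once, then reuse the steps for every object of the same shape.
deep_steps = compile_path("something.0.many.levels.deep")
deep = pluck(obj, deep_steps)
print(deep)  # 123
```

Precomputing the steps means the string splitting happens once per schema rather than once per record, which matters when the same path is applied to many objects.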

XML Deserializer Example

    import lxml.etree as ET
    from gelatin_extract.xml.schema import Schema
    from gelatin_extract.xml.schema import fields

    class XmlExample(Schema):
        name = fields.String("./name")
        birthday = fields.Date("./birthdate")
        deep_data = fields.Int("./something/many/levels/deep")

    obj = ET.fromstring(b"""
    <xmlObject>
        <name>Johnny Appleseed</name>
        <birthdate>2000-01-01</birthdate>
        <something>
            <many>
                <levels>
                    <deep>123</deep>
                </levels>
            </many>
        </something>
    </xmlObject>
    """.strip())

    XmlExample().deserialize(obj)
    # Returns:
    # {
    #     "name": "Johnny Appleseed",
    #     "birthday": datetime.date(2000, 1, 1),
    #     "deep_data": 123
    # }

For XML, the attributes are filled using XPath expressions. If no path is provided, the entire element is passed to the field (there are no implicit paths). Any valid XPath expression can be used.
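For comparison, the same extraction can be written by hand, which shows what each field's path expression selects. This sketch deliberately uses the standard library's xml.etree.ElementTree rather than lxml, so it only supports a limited XPath subset, and it is not how the library itself is implemented. The variable names are illustrative.

```python
import datetime
import xml.etree.ElementTree as ET

root = ET.fromstring("""
<xmlObject>
    <name>Johnny Appleseed</name>
    <birthdate>2000-01-01</birthdate>
    <something>
        <many>
            <levels>
                <deep>123</deep>
            </levels>
        </many>
    </something>
</xmlObject>
""".strip())

# Each path is relative to the element being deserialized, as in the schema above.
result = {
    "name": root.findtext("./name"),
    "birthday": datetime.date.fromisoformat(root.findtext("./birthdate")),
    "deep_data": int(root.findtext("./something/many/levels/deep")),
}
print(result)
```

A schema class expresses the same three lookups declaratively, with the type conversion (string, date, integer) attached to each field instead of written inline.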

Source Distribution

yankee-0.1.2.tar.gz (14.3 kB)

Uploaded Source

Built Distribution

yankee-0.1.2-py3-none-any.whl (18.5 kB)

Uploaded Python 3
