lightweight, simple, and fast declarative XML and JSON data extraction
Project description
Summary
Simple declarative data extraction and loading in Python, featuring:
- 🍰 Ease of use: Data extraction is performed using simple, declarative types.
- ⚙ XML / HTML / JSON Extraction: Extraction can be performed across a wide array of structured data formats.
- 🐼 Pandas Integration: Results are easily castable to pandas DataFrames and Series.
- 😀 Custom Output Classes: Results can be automatically loaded into autogenerated dataclasses, or custom model types.
- 🚀 Performance: XML loading is backed by the fast lxml library. JSON parsing is backed by UltraJSON, with jsonpath_ng providing flexible data extraction.
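To make the "autogenerated dataclasses" idea concrete, here is a minimal sketch that uses only the standard library. It is not yankee's own mechanism, which this page does not show. The `record` dictionary and the `Person` class name are hypothetical stand-ins for an extraction result and its generated output class.

```python
import datetime
from dataclasses import asdict, make_dataclass

# Hypothetical extraction result (assumed shape, not produced by yankee here).
record = {
    "name": "Johnny Appleseed",
    "birthday": datetime.date(2000, 1, 1),
    "deep_data": 123,
}

# Generate a dataclass whose fields mirror the keys and value types of the result.
Person = make_dataclass(
    "Person",
    [(key, type(value)) for key, value in record.items()],
)

# Load the plain dictionary into the generated class.
person = Person(**record)
print(person.name)
print(asdict(person) == record)
```

The generated class gives attribute access and a readable `repr`, while `asdict` converts it back to a plain dictionary when needed.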
Quick Start
To extract data from XML, use this import statement, and see the example below:
from yankee.xml.schema import Schema, fields as f, CSSSelector
To extract data from JSON, use this import statement, and see the example below:
from yankee.json.schema import Schema, fields as f, JSONPath
To extract data from HTML, use this import statement:
from yankee.html.schema import Schema, fields as f, CSSSelector
To extract data from Python objects (either objects or dictionaries), use this import statement:
from yankee.base.schema import Schema, fields as f
Documentation
Complete documentation is available on Read the Docs.
Examples
Extract data from XML
When extracting data from XML, data keys are XPath expressions by default, but they can also be CSS selectors.
Take this:
<xmlObject>
    <name>Johnny Appleseed</name>
    <birthdate>2000-01-01</birthdate>
    <something>
        <many>
            <levels>
                <deep>123</deep>
            </levels>
        </many>
    </something>
</xmlObject>
Do this:
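from yankee.xml.schema import Schema, fields as f, CSSSelector

class XmlExample(Schema):
    name = f.String("./name")                          # XPath expression (the default)
    birthday = f.Date(CSSSelector("birthdate"))        # CSS selector instead of XPath
    deep_data = f.Int("./something/many/levels/deep")  # nested XPath

# xml_doc holds the XML document shown above
XmlExample().load(xml_doc)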
Get this:
{
    "name": "Johnny Appleseed",
    "birthday": datetime.date(2000, 1, 1),
    "deep_data": 123
}
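For comparison, the same three values can be pulled out by hand with the standard library. This is only a sketch of what the schema above declares, not how yankee implements it (yankee uses lxml). The helper name `extract_person` is hypothetical.

```python
import datetime
import xml.etree.ElementTree as ET

xml_doc = """
<xmlObject>
    <name>Johnny Appleseed</name>
    <birthdate>2000-01-01</birthdate>
    <something>
        <many>
            <levels>
                <deep>123</deep>
            </levels>
        </many>
    </something>
</xmlObject>
"""

def extract_person(xml_text):
    """Hand-written equivalent of the three declared fields (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    return {
        # Each value is located by path, then converted to the target type.
        "name": root.findtext("./name"),
        "birthday": datetime.date.fromisoformat(root.findtext("./birthdate")),
        "deep_data": int(root.findtext("./something/many/levels/deep")),
    }

result = extract_person(xml_doc)
print(result)
```

Every field here repeats the same two steps, locating a node and converting its text. The declarative schema folds both into a single field definition.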
Extract data from JSON
When extracting data from JSON, data keys are implied from the field names by default, but they can also be JSONPath expressions.
Take this:
{
    "name": "Johnny Appleseed",
    "birthdate": "2000-01-01",
    "something": [
        {"many": {
            "levels": {
                "deep": 123
            }
        }}
    ]
}
Do this:
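from yankee.json.schema import Schema, fields as f

class JsonExample(Schema):
    name = f.String()                                  # key implied from the field name
    birthday = f.Date("birthdate")                     # explicit key
    deep_data = f.Int("something.0.many.levels.deep")  # nested path with a list index

# Assumed by analogy with the XML example: json_doc holds the JSON shown above
JsonExample().load(json_doc)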
Get this:
{
    "name": "Johnny Appleseed",
    "birthday": datetime.date(2000, 1, 1),
    "deep_data": 123
}
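The dotted path in `deep_data` mixes dictionary keys with a list index. The sketch below shows one way such a path could be resolved using only the standard library. It illustrates the idea and is not yankee's or jsonpath_ng's implementation. The helper name `get_path` is hypothetical.

```python
import datetime
import json

json_doc = """
{
    "name": "Johnny Appleseed",
    "birthdate": "2000-01-01",
    "something": [
        {"many": {"levels": {"deep": 123}}}
    ]
}
"""

def get_path(data, dotted_path):
    """Walk a dotted path; segments index into lists, or act as keys in dicts."""
    current = data
    for segment in dotted_path.split("."):
        if isinstance(current, list):
            current = current[int(segment)]  # numeric segment selects a list item
        else:
            current = current[segment]       # otherwise treat it as a dictionary key
    return current

data = json.loads(json_doc)
result = {
    "name": get_path(data, "name"),
    "birthday": datetime.date.fromisoformat(get_path(data, "birthdate")),
    "deep_data": int(get_path(data, "something.0.many.levels.deep")),
}
print(result)
```

This helper raises `KeyError` or `IndexError` when a path is missing. A real extraction library would typically decide per field whether to raise or return a default.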
Download files
Source Distribution
Built Distribution
File details
Details for the file yankee-0.1.46.tar.gz.
File metadata
- Download URL: yankee-0.1.46.tar.gz
- Upload date:
- Size: 88.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.5.1 CPython/3.11.4 Darwin/23.3.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 49fe7255152e8a7766470962e55b002962c016c370e13c4cbfc15f09744ace58
MD5 | 4ee9c7fe24cfca3b48fe8c607eb75b39
BLAKE2b-256 | 582d0de858d7393eb462c15b8f480abae305d329795e0c2b11ae6cdf058ebdf8
File details
Details for the file yankee-0.1.46-py3-none-any.whl.
File metadata
- Download URL: yankee-0.1.46-py3-none-any.whl
- Upload date:
- Size: 103.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.5.1 CPython/3.11.4 Darwin/23.3.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9932ce72e8fc5146ec9f429f8efb2d204be48996b13fadcfc2df3cffe4520444
MD5 | 02374bef140b5110b7f237c159f66aa5
BLAKE2b-256 | b61663f4bbaa035ce9a599031cc20fbd2e39d39d7a5004b605e148973b1b3f7a