
Data transformation and manipulation library


Warning:

EasyData is in the early stages of development. Backwards-incompatible changes may be made without a deprecation warning until beta status is reached, so it is not yet suitable for production use.

Overview

EasyData is a data object pattern that transforms item data from various sources (text, HTML, XML, JSON, dictionaries, lists and others) into a Python dictionary. Different source types can also be combined in a single transformation.

It uses component-based mapping at its heart, and its concept is similar to ORM-like models.
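To make the idea of component-based, ORM-like mapping concrete, here is a minimal conceptual sketch in plain Python. The `Field`, `Model` and `ProductModel` classes are hypothetical and are not part of EasyData's API; they only illustrate how declarative class attributes can be collected and applied to a source to build a dictionary.

```python
# Conceptual sketch only: these classes are hypothetical, NOT EasyData's API.


class Field:
    """A tiny 'component': pulls one value from a source dict and cleans it."""

    def __init__(self, source_key, transform=None):
        self.source_key = source_key
        # Fall back to an identity function when no transform is given.
        self.transform = transform or (lambda value: value)

    def parse(self, data):
        return self.transform(data.get(self.source_key))


class Model:
    """Collects Field attributes declared on a subclass and maps them to a dict."""

    def parse_item(self, data):
        # Class attributes keep their definition order in the class namespace.
        fields = {
            name: attr
            for name, attr in vars(type(self)).items()
            if isinstance(attr, Field)
        }
        return {name: field.parse(data) for name, field in fields.items()}


class ProductModel(Model):
    name = Field("title", transform=str.strip)
    price = Field("cost", transform=float)


result = ProductModel().parse_item({"title": "  Test Product  ", "cost": "49.9"})
print(result)  # {'name': 'Test Product', 'price': 49.9}
```

Each field can also be exercised on its own, for example `Field("cost", transform=float).parse({"cost": "1.5"})`, which is the property that makes this style of mapping straightforward to test.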

Documentation

Documentation is available online at https://easydata.readthedocs.io/ and in the docs directory.

The benefits of using EasyData are:

  • focusing on the object-oriented business logic

  • uniform extraction logic between various sources

  • significantly faster development of a transformer/parser

  • less maintenance time, since it is clear what each component does

  • reusable extraction and parsing logic

  • high- and low-level parsing options, so that we don’t hit any limitations

  • option to create custom components for specific needs

  • defaults can be changed through configuration on various levels

  • creating test cases is a breeze, since each component is designed to be usable independently

  • autocomplete works for all parameters on public classes or methods.

Applications:

  • Web scraping. It can easily be integrated with Scrapy, any other Python-based solution, or your own.

  • Transforming API and feed data from various formats.

  • Transforming/preparing data for an API or feed.

  • Transforming/preparing data for a database.

Requirements

  • Python 3.8+

  • Works on Linux, Windows, macOS, BSD

Install

The quick way:

pip install easydata

See the install section in the documentation at https://easydata.readthedocs.io/en/latest/installation.html for more details.

Example

Below is a simple example to give an impression of how EasyData works. For more advanced examples and tutorials, please refer to the documentation.

Let's transform the following HTML:

test_html = """
    <html>
        <body>
            <h2 class="name">
                <div class="brand">EasyData</div>
                Test Product Item
            </h2>
            <div id="description">
                <p>Basic product info. EasyData product is newest
                addition to python <b>world</b></p>
                <ul>
                    <li>Color: Black</li>
                    <li>Material: Aluminium</li>
                </ul>
            </div>
            <div id="price">Was 99.9</div>
            <div id="sale-price">49.9</div>
            <div class="images">
                <img src="http://demo.com/img1.jpg" />
                <img src="http://demo.com/img2.jpg" />
                <img src="http://demo.com/img3.jpg" />
            </div>
            <div class="stock" available="Yes">In Stock</div>
        </body>
    </html>
"""

Now let's create an ItemModel that will process the HTML above and parse it into an item dict.

import easydata as ed


class ProductItemModel(ed.ItemModel):
    item_name = ed.Text(
        ed.pq('.name::text'),
    )

    item_brand = ed.Text(
        ed.pq('.brand::text')
    )

    item_description = ed.Description(
        ed.pq('#description::text')
    )

    item_price = ed.PriceFloat(
        ed.pq('#price::text')
    )

    item_sale_price = ed.PriceFloat(
        ed.pq('#sale-price::text')
    )

    item_color = ed.Feature(
        ed.pq('#description::text'),
        key='color'
    )

    item_stock = ed.Has(
        ed.pq('.stock::attr(available)'),
        contains=['yes']
    )

    item_images = ed.List(
        ed.pq('.images img::items'),
        parser=ed.UrlParser(
            ed.pq('::src')
        )
    )

    """
    Alternative with selecting src values in a first css query:

        item_images = ed.ListParser(
            ed.pq('.images img::src-items'),
            parser=ed.UrlParser()
        )
    """

The example below demonstrates how the newly created ProductItemModel parses the provided HTML into a dict object.

>>> item_model = ProductItemModel()

>>> item_model.parse_item(test_html)

Output:

{
    'brand': 'EasyData',
    'description': 'Basic product info. EasyData product is newest addition '
                   'to python world. Color: Black. Material: Aluminium.',
    'color': 'Black',
    'images': [
        'http://demo.com/img1.jpg',
        'http://demo.com/img2.jpg',
        'http://demo.com/img3.jpg'
    ],
    'name': 'EasyData Test Product Item',
    'price': 99.9,
    'sale_price': 49.9,
    'stock': True
}
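The output above shows the kinds of conversions the components perform: "Was 99.9" becomes the float 99.9, "Color: Black" yields the value for the `color` key, and the `available="Yes"` attribute becomes `True`. The following standard-library sketch imitates those three conversions to show what is happening to the text. The function names `price_float`, `feature` and `has` are hypothetical, and this is an assumption about the general technique, not EasyData's implementation.

```python
import re

# Illustrative stand-ins only: NOT how EasyData implements its components.


def price_float(text):
    """Return the first decimal number found in text as a float, or None."""
    match = re.search(r"\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None


def feature(lines, key):
    """Return the value of the first 'Key: value' line matching key (case-insensitive)."""
    for line in lines:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == key.lower():
            return value.strip()
    return None


def has(text, contains):
    """Return True if any of the given tokens occurs in text (case-insensitive)."""
    lowered = text.lower()
    return any(token.lower() in lowered for token in contains)


print(price_float("Was 99.9"))                                   # 99.9
print(feature(["Color: Black", "Material: Aluminium"], "color"))  # Black
print(has("Yes", ["yes"]))                                        # True
```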

Contributing

Yes please! We are always looking for contributions, additions and improvements.

See https://easydata.readthedocs.io/en/latest/contributing.html for more details.
