
DataYoga for Python


datayoga-py

Introduction

datayoga-py is the transformation engine used in DataYoga, a framework for building and generating data pipelines.

Installation

pip install datayoga

Quick Start

This section demonstrates how to transform a data record using a DataYoga job.

Create a Job

Save the following job definition as example.yaml:

steps:
  - uses: add_field
    with:
      field: full_name
      language: jmespath
      expression: '{ "fname": fname, "lname": lname} | join('' '', values(@))'
  - uses: rename_field
    with:
      from_field: fname
      to_field: first_name
  - uses: rename_field
    with:
      from_field: lname
      to_field: last_name
  - uses: remove_field
    with:
      field: credit_card
  - uses: add_field
    with:
      field: country
      language: sql
      expression: country_code || ' - ' || UPPER(country_name)
  - uses: remove_field
    with:
      field: country_name
  - uses: remove_field
    with:
      field: country_code
  - uses: map
    with:
      object:
        {
          first_name: first_name,
          last_name: last_name,
          greeting: "'Hello ' || CASE WHEN gender = 'F' THEN 'Ms.' WHEN gender = 'M' THEN 'Mr.' ELSE 'N/A' END || ' ' || full_name",
          country: country,
          full_name: full_name
        }
      language: sql
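
The greeting expression in the map step is a standard SQL CASE expression. As a rough illustration only, it can be evaluated with Python's built-in sqlite3 module by binding the record's fields as named parameters. This sketch assumes SQLite semantics and is not necessarily how DataYoga evaluates SQL internally; the helper name greeting_for is hypothetical.

```python
import sqlite3

# The CASE expression from the map step, with the record's fields replaced by
# named parameters (:gender, :full_name) so SQLite can bind them.
GREETING_SQL = (
    "SELECT 'Hello ' || CASE WHEN :gender = 'F' THEN 'Ms.' "
    "WHEN :gender = 'M' THEN 'Mr.' ELSE 'N/A' END || ' ' || :full_name"
)


def greeting_for(record: dict) -> str:
    """Hypothetical helper: evaluate the greeting expression for one record."""
    conn = sqlite3.connect(":memory:")
    try:
        # Extra keys in the record are ignored; only :gender and :full_name are read.
        (value,) = conn.execute(GREETING_SQL, record).fetchone()
    finally:
        conn.close()
    return value


print(greeting_for({"gender": "F", "full_name": "jane smith"}))  # Hello Ms. jane smith
print(greeting_for({"gender": "X", "full_name": "alex lee"}))    # Hello N/A alex lee
```

Any gender value other than F or M falls through to the ELSE branch and produces N/A.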

Transform Data Using datayoga-py

Use this code snippet to transform a data record using the job defined above:

import datayoga as dy
from datayoga.utils import read_yaml

# Load the job definition from example.yaml and compile it into a job
job_settings = read_yaml("example.yaml")
job = dy.compile(job_settings)

record = {"fname": "jane", "lname": "smith", "country_code": 1, "country_name": "usa",
          "credit_card": "1234-5678-0000-9999", "gender": "F"}
expected = {"first_name": "jane", "last_name": "smith", "country": "1 - USA",
            "full_name": "jane smith", "greeting": "Hello Ms. jane smith"}

# transform() runs the job's steps, in order, on a single record
assert job.transform(record) == expected

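As shown above, the record has been transformed according to the job's steps:

  • full_name field added based on a JMESPath expression.
  • fname field renamed to first_name.
  • lname field renamed to last_name.
  • credit_card field removed.
  • country field added based on an SQL expression, after which the country_name and country_code fields are removed.
  • greeting field added based on an SQL expression in the final map step, which keeps only the fields it lists (so gender is dropped).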
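
To make the step order concrete, the following sketch replays the job in plain Python on the same record. It does not use DataYoga; each step is hand-translated (the JMESPath join becomes str.join, and the SQL ||, UPPER and CASE become f-strings, str.upper() and a dict lookup). The function name run_example_job is hypothetical. The point it illustrates is that each step sees the output of the previous one, which is why country_code and country_name can be removed once country has been built from them.

```python
def run_example_job(record: dict) -> dict:
    """Hypothetical plain-Python replay of example.yaml, one step at a time."""
    r = dict(record)  # work on a shallow copy so the caller's record is unchanged

    # add_field full_name (JMESPath: join(' ', values({fname, lname})))
    r["full_name"] = " ".join([r["fname"], r["lname"]])

    # rename_field fname -> first_name, then lname -> last_name
    r["first_name"] = r.pop("fname")
    r["last_name"] = r.pop("lname")

    # remove_field credit_card
    del r["credit_card"]

    # add_field country (SQL: country_code || ' - ' || UPPER(country_name))
    r["country"] = f"{r['country_code']} - {r['country_name'].upper()}"

    # remove_field country_name, then remove_field country_code
    del r["country_name"]
    del r["country_code"]

    # map (SQL): emit a new record with only the listed fields, so gender is dropped
    title = {"F": "Ms.", "M": "Mr."}.get(r.get("gender"), "N/A")
    return {
        "first_name": r["first_name"],
        "last_name": r["last_name"],
        "greeting": f"Hello {title} {r['full_name']}",
        "country": r["country"],
        "full_name": r["full_name"],
    }


record = {"fname": "jane", "lname": "smith", "country_code": 1, "country_name": "usa",
          "credit_card": "1234-5678-0000-9999", "gender": "F"}
print(run_example_job(record))
```

The resulting dictionary has the same contents as the expected output in the assertion above.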

Examples

  • Add a new field, country, using an SQL expression that concatenates the country_code and country_name fields after converting the latter to uppercase:

    uses: add_field
    with:
      field: country
      language: sql
      expression: country_code || ' - ' || UPPER(country_name)
    
  • Rename field lname to last_name:

    uses: rename_field
    with:
      from_field: lname
      to_field: last_name
    
  • Remove credit_card field:

    uses: remove_field
    with:
      field: credit_card
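
To see how the add_field example's SQL expression evaluates, including how || turns a numeric country_code into text, it can be run through Python's built-in sqlite3 module. This is a sketch that assumes SQLite's concatenation and UPPER semantics; DataYoga's own SQL evaluation may differ in edge cases such as NULL handling. The function name eval_country is hypothetical.

```python
import sqlite3

# The add_field expression, with the record's fields bound as named parameters.
COUNTRY_SQL = "SELECT :country_code || ' - ' || UPPER(:country_name)"


def eval_country(record: dict):
    """Hypothetical helper: evaluate the country expression for one record."""
    conn = sqlite3.connect(":memory:")
    try:
        (value,) = conn.execute(COUNTRY_SQL, record).fetchone()
    finally:
        conn.close()
    return value


print(eval_country({"country_code": 1, "country_name": "usa"}))   # 1 - USA
print(eval_country({"country_code": 1, "country_name": None}))    # None
```

In SQL, concatenating with NULL yields NULL, so a missing country_name produces no value at all rather than a partial string.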
    

For a full list of supported block types, see the reference.

Expression Language

DataYoga supports both SQL and JMESPath expressions. JMESPath expressions are especially useful for handling nested JSON data, while SQL is better suited to flat, row-like structures.

JMESPath Custom Functions

DataYoga adds the following custom functions to the standard JMESPath library:

  • capitalize: Capitalizes all the words in a string.

    Input: {name: "john doe"}
    Expression: capitalize(name)
    Output: John Doe

  • concat: Concatenates an array of variables or literals.

    Input: {fname: "john", lname: "doe"}
    Expression: concat([fname,' ',lname])
    Output: john doe
    This is equivalent to the built-in JMESPath expression join(' ', [fname, lname]).

  • lower: Converts all uppercase characters in a string to lowercase.

    Input: {fname: "John"}
    Expression: lower(fname)
    Output: john

  • upper: Converts all lowercase characters in a string to uppercase.

    Input: {fname: "john"}
    Expression: upper(fname)
    Output: JOHN
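
For readers more familiar with Python, the four custom functions behave roughly like the string operations below. This is an approximation based on the descriptions and examples above, not DataYoga's implementation; in particular, how capitalize treats letters that are already uppercase, and how concat handles non-string values, are assumptions. The jp_-prefixed names are hypothetical.

```python
def jp_capitalize(s: str) -> str:
    # Capitalize the first letter of each space-separated word (assumed behavior).
    return " ".join(word.capitalize() for word in s.split(" "))


def jp_concat(items: list) -> str:
    # Concatenate variables and literals; non-strings go through str() (assumption).
    return "".join(str(item) for item in items)


def jp_lower(s: str) -> str:
    return s.lower()


def jp_upper(s: str) -> str:
    return s.upper()


record = {"name": "john doe", "fname": "john", "lname": "doe"}
print(jp_capitalize(record["name"]))                       # John Doe
print(jp_concat([record["fname"], " ", record["lname"]]))  # john doe
print(jp_lower("John"))                                    # john
print(jp_upper(record["fname"]))                           # JOHN
```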


Download files

  • Source distribution: datayoga-0.15.0.tar.gz (13.8 kB)
  • Built distribution: datayoga-0.15.0-py3-none-any.whl (19.1 kB, Python 3)
