DataYoga for Python

DataYoga Core

Introduction

datayoga-core is the transformation engine used in DataYoga, a framework for building and generating data pipelines.

Installation

pip install datayoga-core

Quick Start

This quick start shows how to transform data using a DataYoga job.

Create a Job

Use this example.dy.yaml:

steps:
  - uses: add_field
    with:
      fields:
        - field: full_name
          language: jmespath
          expression: concat([fname, ' ' , lname])
        - field: country
          language: sql
          expression: country_code || ' - ' || UPPER(country_name)
  - uses: rename_field
    with:
      fields:
        - from_field: fname
          to_field: first_name
        - from_field: lname
          to_field: last_name
  - uses: remove_field
    with:
      fields:
        - field: credit_card
        - field: country_name
        - field: country_code
  - uses: map
    with:
      expression:
        {
          first_name: first_name,
          last_name: last_name,
          greeting: "'Hello ' || CASE WHEN gender = 'F' THEN 'Ms.' WHEN gender = 'M' THEN 'Mr.' ELSE 'N/A' END || ' ' || full_name",
          country: country,
          full_name: full_name
        }
      language: sql

Transform Data Using datayoga-core

Use this code snippet to transform a data record using the job defined above. The transform method returns a tuple of processed, filtered, and rejected records; the snippet reads the processed records through its processed attribute:

import datayoga_core as dy
from datayoga_core.job import Job
from datayoga_core.result import Result, Status
from datayoga_core.utils import read_yaml

job_settings = read_yaml("example.dy.yaml")
job = dy.compile(job_settings)

assert job.transform([{"fname": "jane", "lname": "smith", "country_code": 1, "country_name": "usa", "credit_card": "1234-5678-0000-9999", "gender": "F"}]).processed == [
  Result(status=Status.SUCCESS, payload={"first_name": "jane", "last_name": "smith", "country": "1 - USA", "full_name": "jane smith", "greeting": "Hello Ms. jane smith"})]

The job can also be defined inline as a YAML string and parsed before it is compiled:

import datayoga_core as dy
from datayoga_core.job import Job
from datayoga_core.result import Result, Status
import yaml
import textwrap

job_settings = textwrap.dedent("""
  steps:
    - uses: add_field
      with:
        fields:
          - field: full_name
            language: jmespath
            expression: concat([fname, ' ' , lname])
          - field: country
            language: sql
            expression: country_code || ' - ' || UPPER(country_name)
    - uses: rename_field
      with:
        fields:
          - from_field: fname
            to_field: first_name
          - from_field: lname
            to_field: last_name
    - uses: remove_field
      with:
        fields:
          - field: credit_card
          - field: country_name
          - field: country_code
    - uses: map
      with:
        expression:
          {
            first_name: first_name,
            last_name: last_name,
            greeting: "'Hello ' || CASE WHEN gender = 'F' THEN 'Ms.' WHEN gender = 'M' THEN 'Mr.' ELSE 'N/A' END || ' ' || full_name",
            country: country,
            full_name: full_name
          }
        language: sql
""")
job = dy.compile(yaml.safe_load(job_settings))

assert job.transform([{"fname": "jane", "lname": "smith", "country_code": 1, "country_name": "usa", "credit_card": "1234-5678-0000-9999", "gender": "F"}]).processed == [
  Result(status=Status.SUCCESS, payload={"first_name": "jane", "last_name": "smith", "country": "1 - USA", "full_name": "jane smith", "greeting": "Hello Ms. jane smith"})]

The record has been transformed according to the job:

  • fname field renamed to first_name.
  • lname field renamed to last_name.
  • country field added based on an SQL expression.
  • full_name field added based on a JMESPath expression.
  • credit_card, country_name, and country_code fields removed.
  • greeting field added based on an SQL expression.
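
To make the effect of each step concrete, the following is a hand-written, plain-Python sketch that mirrors the example job step by step. It does not use datayoga-core and is not how the library is implemented; transform_record is a hypothetical helper introduced only for illustration.

```python
def transform_record(record):
    """Hand-written equivalent of the example job's four steps (illustration only)."""
    out = dict(record)  # work on a copy so the caller's record is untouched

    # add_field: derive full_name and country from the original fields
    out["full_name"] = f"{out['fname']} {out['lname']}"
    out["country"] = f"{out['country_code']} - {out['country_name'].upper()}"

    # rename_field: fname -> first_name, lname -> last_name
    out["first_name"] = out.pop("fname")
    out["last_name"] = out.pop("lname")

    # remove_field: drop the fields listed in the job
    for name in ("credit_card", "country_name", "country_code"):
        out.pop(name, None)

    # map: build the final record, including the gender-based greeting
    title = {"F": "Ms.", "M": "Mr."}.get(out.get("gender"), "N/A")
    return {
        "first_name": out["first_name"],
        "last_name": out["last_name"],
        "greeting": f"Hello {title} {out['full_name']}",
        "country": out["country"],
        "full_name": out["full_name"],
    }


result = transform_record({
    "fname": "jane", "lname": "smith", "country_code": 1, "country_name": "usa",
    "credit_card": "1234-5678-0000-9999", "gender": "F",
})
print(result)
```

The returned dictionary has the same keys and values as the payload asserted in the snippets above. Note that the final map step keeps only the fields it lists, so gender does not appear in the output.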

Examples

  • Add a new field, country, from an SQL expression that concatenates the country_code and country_name fields after upper-casing the latter:

    uses: add_field
    with:
      field: country
      language: sql
      expression: country_code || ' - ' || UPPER(country_name)
    
  • Rename fname field to first_name and lname field to last_name:

    uses: rename_field
    with:
      fields:
        - from_field: fname
          to_field: first_name
        - from_field: lname
          to_field: last_name
    
  • Remove credit_card field:

    uses: remove_field
    with:
      field: credit_card
    

For a full list of supported block types, see the reference.

Expression Language

DataYoga supports both SQL and JMESPath expressions. JMESPath expressions are especially useful for handling nested JSON data, while SQL is better suited to flat, row-like structures.
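
As a rough illustration of why SQL suits flat records, the sketch below evaluates the SQL expressions from the example job against a single record using Python's built-in sqlite3 module, exposing each key of the record as a column. This is an assumption-laden stand-in: eval_sql_expression is a hypothetical helper, not part of datayoga-core, and DataYoga's own SQL engine may differ in dialect and behavior.

```python
import sqlite3


def eval_sql_expression(expression, record):
    """Evaluate a SQL expression with the record's keys exposed as columns.

    Illustration only: column names are interpolated into the query, so this
    is suitable for trusted, non-empty, flat records, not arbitrary input.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Bind each value as a parameter and alias it to the field name.
        columns = ", ".join(f'? AS "{name}"' for name in record)
        query = f"SELECT {expression} FROM (SELECT {columns})"
        return conn.execute(query, list(record.values())).fetchone()[0]
    finally:
        conn.close()


country = eval_sql_expression(
    "country_code || ' - ' || UPPER(country_name)",
    {"country_code": 1, "country_name": "usa"},
)
print(country)  # 1 - USA
```

A nested record would first have to be flattened into columns for this approach to work, which is the kind of case where a JMESPath expression is the more natural fit.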

For more information about custom functions and the supported expression language syntax, see the reference.


