DataYoga for Python

DataYoga Core

Introduction

datayoga-core is the transformation engine used in DataYoga, a framework for building and generating data pipelines.

Installation

pip install datayoga-core

Quick Start

This demonstrates how to transform data using a DataYoga job.

Create a Job

Use this example.yaml:

- steps:
    - uses: add_field
      with:
        fields:
          - field: full_name
            language: jmespath
            expression: concat([fname, ' ' , lname])
          - field: country
            language: sql
            expression: country_code || ' - ' || UPPER(country_name)
    - uses: rename_field
      with:
        fields:
          - from_field: fname
            to_field: first_name
          - from_field: lname
            to_field: last_name
    - uses: remove_field
      with:
        fields:
          - field: credit_card
          - field: country_name
          - field: country_code
    - uses: map
      with:
        expression:
          {
            first_name: first_name,
            last_name: last_name,
            greeting: "'Hello ' || CASE WHEN gender = 'F' THEN 'Ms.' WHEN gender = 'M' THEN 'Mr.' ELSE 'N/A' END || ' ' || full_name",
            country: country,
            full_name: full_name,
          }
        language: sql

Transform Data Using `datayoga-core`

Use this code snippet to transform a data record using the job defined above:

import datayoga_core as dy
from datayoga_core.utils import read_yaml

# Load the job definition and compile it into a runnable job
job_settings = read_yaml("example.yaml")
job = dy.compile(job_settings)

record = {"fname": "jane", "lname": "smith", "country_code": 1, "country_name": "usa", "credit_card": "1234-5678-0000-9999", "gender": "F"}
expected = {"first_name": "jane", "last_name": "smith", "country": "1 - USA", "full_name": "jane smith", "greeting": "Hello Ms. jane smith"}

# Transform a single record and check the result
assert job.transform(record) == expected

As can be seen, the record has been transformed based on the job:

fname field renamed to first_name.
lname field renamed to last_name.
country field added based on an SQL expression.
full_name field added based on a JMESPath expression.
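greeting field added based on an SQL expression.
credit_card, country_name, and country_code fields removed.
gender field dropped, because the map expression does not include it.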
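
To make the step order concrete, the plain-Python sketch below imitates what each step does to the sample record. It does not use datayoga-core at all: `apply_example_job` is a hypothetical helper written only for this explanation, and it hard-codes the expressions that the real engine would evaluate as SQL or JMESPath.

```python
def apply_example_job(record: dict) -> dict:
    """Hand-written imitation of example.yaml (not the datayoga-core API)."""
    rec = dict(record)  # work on a copy so the input stays untouched

    # add_field: derive full_name and country from the original fields
    rec["full_name"] = f"{rec['fname']} {rec['lname']}"
    rec["country"] = f"{rec['country_code']} - {rec['country_name'].upper()}"

    # rename_field: fname -> first_name, lname -> last_name
    rec["first_name"] = rec.pop("fname")
    rec["last_name"] = rec.pop("lname")

    # remove_field: drop credit_card, country_name and country_code
    for name in ("credit_card", "country_name", "country_code"):
        rec.pop(name, None)

    # map: build a new record that contains only the listed keys
    title = {"F": "Ms.", "M": "Mr."}.get(rec.get("gender"), "N/A")
    return {
        "first_name": rec["first_name"],
        "last_name": rec["last_name"],
        "greeting": f"Hello {title} {rec['full_name']}",
        "country": rec["country"],
        "full_name": rec["full_name"],
    }


sample = {
    "fname": "jane",
    "lname": "smith",
    "country_code": 1,
    "country_name": "usa",
    "credit_card": "1234-5678-0000-9999",
    "gender": "F",
}
result = apply_example_job(sample)
print(result)
```

Because add_field runs before rename_field, its expressions still refer to fname and lname; the map step runs last and therefore sees first_name, last_name, and full_name.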

Examples

Add a new field country from an SQL expression that concatenates the country_code and country_name fields after upper-casing the latter:

uses: add_field
with:
  field: country
  language: sql
  expression: country_code || ' - ' || UPPER(country_name)
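
As a rough way to see what this expression yields, the sketch below runs it through Python's built-in sqlite3 module, binding the record's fields as named parameters. This assumes SQLite's dialect; which SQL engine datayoga-core uses internally is not stated here, so treat it as an illustration of the expression and not of the library's implementation.

```python
import sqlite3

record = {"country_code": 1, "country_name": "usa"}

conn = sqlite3.connect(":memory:")
# Named parameters stand in for the record's fields.
row = conn.execute(
    "SELECT :country_code || ' - ' || UPPER(:country_name)", record
).fetchone()
conn.close()

country = row[0]
print(country)  # 1 - USA
```

In SQLite, the || operator converts the integer country_code to text before concatenating, which is why no explicit cast is needed.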

Rename fname field to first_name and lname field to last_name:

uses: rename_field
with:
  fields:
    - from_field: fname
      to_field: first_name
    - from_field: lname
      to_field: last_name

Remove credit_card field:

uses: remove_field
with:
  field: credit_card

For a full list of supported block types, see the reference.

Expression Language

DataYoga supports both SQL and JMESPath expressions. JMESPath expressions are especially useful for handling nested JSON data, while SQL is better suited to flat, row-like structures.

Notes

Dot notation in an expression represents nested fields in the object. For example, name.first_name refers to { "name": { "first_name": "John" } }.
To refer to a field that contains a dot in its name, escape the dot. For example, name\.first_name refers to { "name.first_name": "John" }.
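
The two notes above can be sketched in plain Python. The helpers `split_path` and `resolve` are hypothetical and exist only for this illustration; they are not part of datayoga-core, and the real expression engines may resolve paths differently.

```python
import re


def split_path(path: str) -> list:
    """Split on dots that are not escaped with a backslash."""
    parts = re.split(r"(?<!\\)\.", path)
    # Turn each escaped dot back into a literal dot inside the key name
    return [part.replace("\\.", ".") for part in parts]


def resolve(record: dict, path: str):
    """Walk the record one key at a time; return None if a key is missing."""
    current = record
    for key in split_path(path):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


nested = {"name": {"first_name": "John"}}
dotted = {"name.first_name": "John"}

print(resolve(nested, "name.first_name"))    # John
print(resolve(dotted, r"name\.first_name"))  # John
print(resolve(dotted, "name.first_name"))    # None
```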

JMESPath Custom Functions

DataYoga adds the following custom functions to the standard JMESPath library:

Function	Description	Example	Comments
`capitalize`	Capitalizes all the words in the string	Input: `{"name": "john doe"}` Expression: `capitalize(name)` Output: `John Doe`
`concat`	Concatenates an array of variables or literals	Input: `{"fname": "john", "lname": "doe"}` Expression: `concat([fname, ' ', lname])` Output: `john doe`	For this input, this is equivalent to the built-in JMESPath expression `join(' ', [fname, lname])`
`hash`	Calculates a hash using the `hash_name` hash function and returns its hexadecimal representation	Input: `{"some_str": "some_value"}` Expression: ``hash(some_str, `sha1`)`` Output: `8c818171573b03feeae08b0b4ffeb6999e3afc05`	Supported algorithms: sha1 (default), sha256, md5, sha384, sha3_384, blake2b, sha512, sha3_224, sha224, sha3_256, sha3_512, blake2s
`left`	Returns a specified number of characters from the start of a given text string	Input: `{"greeting": "hello world!"}` Expression: ``left(greeting, `5`)`` Output: `hello`
`lower`	Converts all uppercase characters in a string into lowercase characters	Input: `{"fname": "John"}` Expression: `lower(fname)` Output: `john`
`mid`	Returns a specified number of characters from the middle of a given text string	Input: `{"greeting": "hello world!"}` Expression: ``mid(greeting, `4`, `3`)`` Output: `o w`
`replace`	Replaces all the occurrences of a substring with a new one	Input: `{"sentence": "one four three four!"}` Expression: `replace(sentence, 'four', 'two')` Output: `one two three two!`
`right`	Returns a specified number of characters from the end of a given text string	Input: `{"greeting": "hello world!"}` Expression: ``right(greeting, `6`)`` Output: `world!`
`split`	Splits a string into a list of strings after breaking the given string by the specified delimiter (comma by default)	Input: `{"departments": "finance,hr,r&d"}` Expression: `split(departments)` Output: `['finance', 'hr', 'r&d']`	Default delimiter is comma - a different delimiter can be passed to the function as the second argument, for example: `split(departments, ';')`
`time_delta_days`	Returns the number of days between a given `dt` and now (positive) or the number of days that have passed from now (negative)	Input: `{"dt": "2021-10-06T18:56:16.701670+00:00"}` Expression: `time_delta_days(dt)` Output: `365`	If `dt` is a string, ISO datetime (2011-11-04T00:05:23+04:00, for example) is assumed. If `dt` is a number, Unix timestamp (1320365123, for example) is assumed.
`time_delta_seconds`	Returns the number of seconds between a given `dt` and now (positive) or the number of seconds that have passed from now (negative)	Input: `{"dt": "2021-10-06T18:56:16.701670+00:00"}` Expression: `time_delta_seconds(dt)` Output: `31557600`	If `dt` is a string, ISO datetime (2011-11-04T00:05:23+04:00, for example) is assumed. If `dt` is a number, Unix timestamp (1320365123, for example) is assumed.
`upper`	Converts all lowercase characters in a string into uppercase characters	Input: `{"fname": "john"}` Expression: `upper(fname)` Output: `JOHN`
`uuid`	Generates a random UUID4 and returns it as a string in standard format	Input: None Expression: `uuid()` Output: `3264b35c-ff5d-44a8-8bc7-9be409dac2b7`
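
For readers who want to reason about a few of these functions without running a job, here is a minimal sketch of standard-library equivalents. The Python function names and signatures are assumptions made for this illustration and are not datayoga-core's implementation; in particular, the sign convention (positive when `dt` is in the past) is inferred from the table's example, naive ISO strings are assumed to be UTC, and the optional `now` argument is added only to make the result reproducible.

```python
import hashlib
from datetime import datetime, timezone


def left(text, n):
    """First n characters of text."""
    return text[:n]


def mid(text, start, length):
    """length characters of text, beginning at zero-based index start."""
    return text[start:start + length]


def right(text, n):
    """Last n characters of text."""
    return text[-n:] if n > 0 else ""


def hash_hex(value, hash_name="sha1"):
    """Hex digest of value using any algorithm name hashlib knows."""
    return hashlib.new(hash_name, value.encode("utf-8")).hexdigest()


def to_datetime(dt):
    """Numbers are treated as Unix timestamps, strings as ISO datetimes."""
    if isinstance(dt, (int, float)):
        return datetime.fromtimestamp(dt, tz=timezone.utc)
    parsed = datetime.fromisoformat(dt)
    # Assumption: a string without an offset is interpreted as UTC
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def time_delta_seconds(dt, now=None):
    """Seconds elapsed from dt until now (positive when dt is in the past)."""
    now = now or datetime.now(timezone.utc)
    return int((now - to_datetime(dt)).total_seconds())


def time_delta_days(dt, now=None):
    """Whole days elapsed from dt until now (positive when dt is in the past)."""
    now = now or datetime.now(timezone.utc)
    return (now - to_datetime(dt)).days


greeting = "hello world!"
print(left(greeting, 5), "|", mid(greeting, 4, 3), "|", right(greeting, 6))

# A fixed "now", exactly one non-leap year after dt
fixed_now = datetime(2022, 10, 6, 18, 56, 16, 701670, tzinfo=timezone.utc)
days = time_delta_days("2021-10-06T18:56:16.701670+00:00", now=fixed_now)
seconds = time_delta_seconds("2021-10-06T18:56:16.701670+00:00", now=fixed_now)
print(days, seconds)
```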

Project details

Release history

1.122.0

Sep 19, 2024

1.121.0

Aug 13, 2024

1.120.0

Aug 8, 2024

1.119.0

Aug 7, 2024

1.117.0

May 19, 2024

1.113.0

Mar 20, 2024

1.112.0

Mar 13, 2024

1.111.0

Mar 13, 2024

1.110.0

Mar 13, 2024

1.109.0

Mar 11, 2024

1.108.0

Nov 29, 2023

1.107.0

Nov 14, 2023

1.106.0

Nov 13, 2023

1.105.0

Nov 8, 2023

1.104.0

Nov 4, 2023

1.103.0

Nov 1, 2023

1.102.0

Nov 1, 2023

1.101.0

Oct 31, 2023

1.100.0

Oct 31, 2023

1.99.0

Oct 31, 2023

1.98.0

Oct 31, 2023

1.97.0

Oct 30, 2023

1.96.0

Oct 17, 2023

1.95.0

Sep 27, 2023

1.94.0

Sep 7, 2023

1.93.0

Sep 6, 2023

1.92.0

Sep 5, 2023

1.91.0

Aug 24, 2023

1.90.0

Jul 20, 2023

1.89.0

Jul 18, 2023

1.88.0

Jul 12, 2023

1.87.0

Jun 14, 2023

1.86.0

Jun 8, 2023

1.85.0

Jun 8, 2023

1.84.0

May 31, 2023

1.83.0

May 24, 2023

1.82.0

May 1, 2023

1.81.0

Apr 20, 2023

1.80.0

Apr 13, 2023

1.79.0

Apr 11, 2023

1.78.0

Apr 3, 2023

1.77.0

Apr 2, 2023

1.76.0

Apr 2, 2023

1.75.0

Apr 2, 2023

1.73.0

Mar 30, 2023

1.72.0

Mar 14, 2023

1.71.0

Mar 14, 2023

1.70.0

Mar 12, 2023

1.69.0

Mar 2, 2023

1.68.0

Feb 26, 2023

1.67.0

Feb 23, 2023

1.66.0

Feb 23, 2023

1.65.0

Feb 23, 2023

1.64.0

Feb 13, 2023

1.63.0

Feb 7, 2023

1.62.0

Feb 6, 2023

1.61.0

Feb 6, 2023

1.60.0

Feb 5, 2023

1.59.0

Feb 2, 2023

1.58.0

Feb 2, 2023

1.57.0

Jan 29, 2023

1.56.0

Jan 24, 2023

1.55.0

Jan 24, 2023

1.54.0

Jan 24, 2023

1.53.0

Jan 10, 2023

1.52.0

Jan 8, 2023

1.51.0

Jan 8, 2023

1.50.0

Jan 5, 2023

1.49.0

Jan 4, 2023

1.48.0

Jan 4, 2023

1.47.0

Jan 4, 2023

1.46.0

Jan 4, 2023

1.45.0

Jan 2, 2023

1.44.0

Jan 1, 2023

1.43.0

Jan 1, 2023

1.42.0

Jan 1, 2023

1.41.0

Jan 1, 2023

1.40.0

Jan 1, 2023

1.39.0

Dec 29, 2022

1.38.0

Dec 29, 2022

1.37.0

Dec 29, 2022

1.36.0

Dec 29, 2022

1.35.0

Dec 29, 2022

1.34.0

Dec 29, 2022

1.33.0

Dec 21, 2022

1.32.0

Dec 21, 2022

1.31.0

Dec 20, 2022

1.30.0

Dec 20, 2022

1.29.0

Dec 20, 2022

1.28.0

Dec 19, 2022

1.27.0

Dec 18, 2022

1.26.0

Dec 18, 2022

1.25.0

Dec 16, 2022

1.24.0

Dec 15, 2022

1.23.0

Dec 15, 2022

1.22.0

Dec 15, 2022

1.21.0

Dec 15, 2022

1.20.0

Dec 14, 2022

1.19.0

Dec 14, 2022

1.18.0

Dec 8, 2022

1.17.0

Dec 8, 2022

1.16.0

Dec 7, 2022

This version

1.15.0

Nov 29, 2022

1.14.0

Nov 28, 2022

1.13.0

Nov 26, 2022

1.12.0

Nov 25, 2022

1.11.0

Nov 24, 2022

1.10.0

Nov 24, 2022

1.9.0

Nov 23, 2022

1.8.0

Nov 22, 2022

1.7.0

Nov 22, 2022

1.6.0

Nov 21, 2022

1.5.0

Nov 16, 2022

1.4.0

Nov 15, 2022

1.3.0

Nov 8, 2022

1.2.0

Nov 8, 2022

1.1.0

Nov 8, 2022

1.0.1

Nov 7, 2022

1.0.1b12 pre-release

Nov 7, 2022

1.0.1b11 pre-release

Nov 7, 2022

1.0.1b10 pre-release

Nov 7, 2022

0.0.1

Nov 7, 2022

0.0.0

Nov 7, 2022

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

datayoga_core-1.15.0.tar.gz (24.7 kB)

Uploaded Nov 29, 2022 Source

Built Distribution

datayoga_core-1.15.0-py3-none-any.whl (38.3 kB)

Uploaded Nov 29, 2022 Python 3

File details

Details for the file datayoga_core-1.15.0.tar.gz.

File metadata

Download URL: datayoga_core-1.15.0.tar.gz
Upload date: Nov 29, 2022
Size: 24.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.9.15

File hashes

Hashes for datayoga_core-1.15.0.tar.gz
Algorithm	Hash digest
SHA256	`e8cb0914b80fde9e3b0660e54180f0cff3bbad8533c3f9798d4d30ea2c4fcb22`
MD5	`3a7293581c133b81efeba124549db94b`
BLAKE2b-256	`8c381d466492102e6e6046db84b772d10d461c04b5d931ef6f577029048bba1b`
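
One way to use these digests is to recompute the SHA256 of a downloaded file and compare it with the published value. The sketch below uses only Python's hashlib; `sha256_of` and `matches_published` are hypothetical helper names, and the demo runs against a throwaway temporary file, not the actual distribution.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=65536):
    """Stream the file in chunks so large downloads need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex):
    """Compare a file's SHA256 with a published hex digest."""
    return sha256_of(path) == published_hex.strip().lower()


# Demo on a temporary file whose digest is computed independently
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"demo bytes")
    demo_path = tmp.name

demo_ok = matches_published(demo_path, hashlib.sha256(b"demo bytes").hexdigest())
os.remove(demo_path)
print(demo_ok)  # True
```

For the real check, assuming the archive has been downloaded to the working directory, call `matches_published("datayoga_core-1.15.0.tar.gz", ...)` with the SHA256 value from the table above.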


File details

Details for the file datayoga_core-1.15.0-py3-none-any.whl.

File metadata

Download URL: datayoga_core-1.15.0-py3-none-any.whl
Upload date: Nov 29, 2022
Size: 38.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.9.15

File hashes

Hashes for datayoga_core-1.15.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`194ffa6d00d3c87892101bcd8363b768940680a0e0fc891ebb57a04031b3a7b6`
MD5	`73dfa1bea5fb1e6299bc5bd40cc3046d`
BLAKE2b-256	`d92a967e7fda9978a651b4057b63b5364f003b53b6bbfe4d0eb06b4c2ba06c32`

