Declarative Data Orchestration

awk_plus_plus

A language designed for data orchestration.

Features

  • Fuzzy regex engine and semantic search for retrieving information from an in-process database.
  • End-user programming.
  • Orthogonal persistence based on DuckDB.
  • Transparent references with Jsonnet. We plan to execute this feature with Dask.
  • URL interpreter to manage data sources.

Installation from pip

Install the package with:

pip install awk_plus_plus

CLI Usage

Use the cti command to output your data as JSON.

Web service

The following command runs a web service with Gradio, which lets you execute your expressions through a user-friendly interface or by making HTTP requests.

cti run-webservice

Jsonnet support

Hello world

cti i "Hello world" -p -v 4

Jsonnet support

cti i '{"keys":: ["AWK", "SED", "SHELL"], "languages": [std.asciiLower(x) for x in self.keys]}'
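For readers less familiar with Jsonnet, the expression above can be approximated in plain Python. This is only an illustrative sketch: in Jsonnet, a field declared with `::` is hidden and is left out of the rendered JSON, which the code imitates by keeping `keys` as a local variable instead of adding it to the output dictionary. `str.lower` stands in for `std.asciiLower`; the two agree for the ASCII strings used here.

```python
import json

# Hidden field in the Jsonnet example ("keys"::): used for the computation,
# but not part of the rendered output.
keys = ["AWK", "SED", "SHELL"]

# Visible field: mirrors [std.asciiLower(x) for x in self.keys].
result = {"languages": [key.lower() for key in keys]}

print(json.dumps(result))  # prints {"languages": ["awk", "sed", "shell"]}
```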

URL interpreter

Going a step further, the URL interpreter lets you manage different data sources with a single, uniform syntax across a set of plugins.
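The idea of one URL syntax routed to many plugins can be sketched in Python as a dispatch on the URL scheme. This is not the awk_plus_plus implementation, whose internals are not described here: `PLUGINS`, `plugin`, `dispatch`, and both handlers are hypothetical names, and reading the keyring path as `service/name` is an assumption made for illustration. The handlers only describe what they would do; they perform no I/O.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical registry mapping a URL scheme to a handler function.
PLUGINS = {}


def plugin(scheme):
    """Decorator that registers a handler for the given URL scheme."""
    def register(func):
        PLUGINS[scheme] = func
        return func
    return register


@plugin("stream")
def describe_stream(parts):
    # parse_qs returns a list per key, e.g. {"strip": ["true"]}; keep the first value.
    options = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {"plugin": "stream", "target": parts.netloc, "options": options}


@plugin("keyring")
def describe_keyring(parts):
    # Assumption: the path has the form /<service>/<name>.
    service, _, name = parts.path.lstrip("/").partition("/")
    return {"plugin": "keyring", "backend": parts.netloc,
            "service": service, "name": name}


def dispatch(url):
    """Split the URL and hand it to the handler registered for its scheme."""
    parts = urlsplit(url)
    if parts.scheme not in PLUGINS:
        raise ValueError(f"no plugin registered for scheme {parts.scheme!r}")
    return PLUGINS[parts.scheme](parts)


print(dispatch("stream://stdin?strip=true"))
print(dispatch("keyring://backend/awk_plus_plus/email"))
```

Keeping the routing in one table means a new data source only needs a new handler; callers keep writing URLs.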

STDIN, STDOUT, STDERR

cti i '{"lines": interpret("stream://stdin?strip=true")}'

Imap

cti i '{"emails": interpret("imap://USER:PASSWORD@HOST:993/INBOX")}'

Keyring

cti i '{"email":: interpret("keyring://backend/awk_plus_plus/email"), "emails": interpret($.email)}'

Files

cti i 'interpret("**/*.csv")'

SQL

cti i 'interpret("sql:SELECT * FROM email")'

Leverage the Power of Reference with Jsonnet

Many programming languages require several steps to reference data, whereas Jsonnet needs only one, thanks to its reference mechanism. This is particularly useful for data engineers who want to connect different services in topological order. The Python code below represents this scenario:

import requests

def fetch_character(character_id):
    # Fetch a single character record from the Rick and Morty API
    url = f"https://rickandmortyapi.com/api/character/{character_id}"
    response = requests.get(url)
    return response.json()

def process_character(character):
    # Append a download hint to the existing 'image' URL
    character['image'] += f"?awk_download=data/{character['name'].replace(' ', '_').lower()}.jpeg"

    # Replace each episode URL in 'episode' with the data fetched from it
    character['episode'] = [requests.get(episode).json() for episode in character['episode']]

    return character


print([process_character(fetch_character(character_id)) for character_id in [1, 2, 3, 4, 5, 6]])

In contrast to the Python code above, Jsonnet lets you take advantage of referential transparency. The equivalent Jsonnet code is:

[
   i("https://rickandmortyapi.com/api/character/%s" % id) + 
    {image: i(super.image+"?awk_download=data/"+std.strReplace(std.asciiLower(super.name), " ", "_")+".jpeg")} + 
    {episode: [i(episode) for episode in super.episode]}
   for id in [1,2,3,4,5,6]
]
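What `+` and `super` contribute in the Jsonnet version can be approximated in Python as a chain of overlays, where each overlay computes new fields from the object built so far. The sketch below is a rough analogue, not how Jsonnet or awk_plus_plus works internally. `fake_fetch` is a hypothetical stand-in for `i(...)` that reads from an in-memory dictionary so the example runs offline, `extend` is a hypothetical helper, and the sample records and `example.test` URLs are invented. Unlike the Jsonnet code, the image overlay only builds the URL and does not interpret it.

```python
# Hypothetical in-memory "API" so the sketch runs without network access.
FAKE_API = {
    "https://example.test/character/1": {
        "name": "Ada Lovelace",
        "image": "https://example.test/img/1.jpeg",
        "episode": ["https://example.test/episode/1"],
    },
    "https://example.test/episode/1": {"title": "Pilot"},
}


def fake_fetch(url):
    """Stand-in for i(url): look the URL up instead of calling a service."""
    return FAKE_API[url]


def extend(base, *overlays):
    """Apply overlays left to right. Each overlay receives the object built
    so far, which plays the role of Jsonnet's `super`."""
    obj = dict(base)
    for overlay in overlays:
        obj = {**obj, **overlay(obj)}
    return obj


character = extend(
    fake_fetch("https://example.test/character/1"),
    # Mirrors {image: super.image + "?awk_download=data/" + ... + ".jpeg"}
    lambda sup: {
        "image": sup["image"] + "?awk_download=data/"
        + sup["name"].lower().replace(" ", "_") + ".jpeg"
    },
    # Mirrors {episode: [i(episode) for episode in super.episode]}
    lambda sup: {"episode": [fake_fetch(url) for url in sup["episode"]]},
)

print(character["image"])
print(character["episode"])
```

Because every overlay returns new fields instead of mutating its input, the fetched record itself is left untouched.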

Connect and call different data sources in one expression

{
   "emails": i("sql:SELECT subject FROM `%s`" %  self.email),
   // This expression saves the unseen emails from your inbox, as defined in your keyring, using IMAP query criteria. It then returns the netloc hash, which refers to the table.
   "email": i(i("keyring://backend/awk_plus_plus/primary_email")+"?q=UNSEEN")
}
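Note that `emails` refers to `self.email` even though `email` is declared after it. Jsonnet evaluates fields lazily, so declaration order does not matter and dependencies are resolved on demand. The Python sketch below imitates that behavior with cached thunks. It is an analogy under stated assumptions: `LazyObject` and `fake_i` are hypothetical, `fake_i` performs no I/O, and the `imap://example` URL and the `inbox_table` name it returns are invented placeholders.

```python
class LazyObject:
    """Each field is a function of the object itself, evaluated on first
    access and cached afterwards."""

    def __init__(self, **thunks):
        self._thunks = thunks
        self._cache = {}

    def __getitem__(self, name):
        if name not in self._cache:
            self._cache[name] = self._thunks[name](self)
        return self._cache[name]


calls = []  # records the order in which the data sources are "called"


def fake_i(url):
    """Hypothetical stand-in for i(...): logs the URL, performs no I/O."""
    calls.append(url)
    # Placeholder results: a table name for IMAP, a row list for SQL.
    return "inbox_table" if url.startswith("imap://") else [url]


obj = LazyObject(
    # Declared first, but depends on "email", as in the Jsonnet example.
    emails=lambda self: fake_i("sql:SELECT subject FROM `%s`" % self["email"]),
    email=lambda self: fake_i("imap://example?q=UNSEEN"),
)

obj["emails"]
print(calls)
```

Accessing `emails` forces `email` first, so the IMAP call is logged before the SQL call, and the cache ensures each source is called only once.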

Note

This project has been set up using PyScaffold 4.5 and the dsproject extension 0.0.post167+g4386552.
