Declarative Data Orchestration
Project description
awk_plus_plus
A language designed for data orchestration.
Features
- Fuzzy regex engine and semantic search to retrieve information from an in-process database.
- End-user programming.
- Orthogonal persistence based on DuckDB.
- Referential transparency with Jsonnet; we plan to execute this feature with Dask.
- URL interpreter to manage data sources.
Installation from pip
Install the package with:
pip install awk_plus_plus
CLI Usage
You can output your data to JSON with the cti command.
Web service
The following command runs a web service with Gradio, allowing you to execute your expressions through a user-friendly interface or by making HTTP requests.
cti run-webservice
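Once the service is running, you could also call it from Python with gradio_client. This is only a minimal sketch: the address, endpoint name, and argument shape below are assumptions, so check what cti run-webservice prints on startup.

# Minimal sketch of calling the running web service from Python.
# The URL and api_name are assumptions: Gradio usually listens on
# http://127.0.0.1:7860, but use the address and endpoint name that
# the service reports when it starts.
from gradio_client import Client

client = Client("http://127.0.0.1:7860")     # address printed by the service
result = client.predict(
    '{"greeting": "Hello world"}',           # an expression, as typed in the UI
    api_name="/predict",                     # assumed default endpoint name
)
print(result)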
Jsonnet support
Hello world
cti i "Hello world" -p -v 4
Jsonnet support
cti i '{"keys":: ["AWK", "SED", "SHELL"], "languages": [std.asciiLower(x) for x in self.keys]}'
URL interpreter
A step further is the URL interpreter, which lets you manage different data sources with a unique syntax across a set of plugins.
STDIN, STDOUT, STDERR
cti i '{"lines": interpret("stream://stdin?strip=true")}'
Imap
cti i '{"emails": interpret("imap://USER:PASSWORD@HOST:993/INBOX")}'
Keyring
cti i '{"email":: interpret("keyring://backend/awk_plus_plus/email"), "emails": interpret($.email)}'
Files
cti i 'interpret("**/*.csv")'
SQL
cti i 'interpret("sql:SELECT * FROM email")'
Leverage the Power of Reference with Jsonnet
Unlike other programming languages that require multiple steps to reference data, Jsonnet requires only one step, thanks to its reference mechanism. This is particularly useful for data engineers who want to connect different services in topological order. The code below shows this scenario in Python:
import requests

def fetch_character(id):
    url = f"https://rickandmortyapi.com/api/character/{id}"
    response = requests.get(url)
    return response.json()

def process_character(character):
    # Append a download hint to the existing 'image' URL
    character['image'] += f"?awk_download=data/{character['name'].replace(' ', '_').lower()}.jpeg"
    # Resolve the 'episode' field by fetching each episode URL
    character['episode'] = [requests.get(episode).json() for episode in character['episode']]
    return character

print([process_character(fetch_character(id)) for id in [1, 2, 3, 4, 5, 6]])
In contrast to the Python code above, Jsonnet lets you leverage the power of referential transparency. The previous code is equivalent to the following Jsonnet:
[
  i("https://rickandmortyapi.com/api/character/%s" % id) +
  {image: i(super.image + "?awk_download=data/" + std.strReplace(std.asciiLower(super.name), " ", "_") + ".jpeg")} +
  {episode: [i(episode) for episode in super.episode]}
  for id in [1, 2, 3, 4, 5, 6]
]
Connect and call different data sources in one expression
{
  "emails": i("sql:SELECT subject FROM `%s`" % self.email),
  // This expression saves the unseen emails from your inbox, as defined in your keyring,
  // using IMAP query criteria. It then returns the netloc hash, which refers to the table.
  "email": i(i("keyring://backend/awk_plus_plus/primary_email") + "?q=UNSEEN")
}
Note
This project has been set up using PyScaffold 4.5 and the dsproject extension 0.0.post167+g4386552.
File details
Details for the file awk_plus_plus-0.15.0.tar.gz.
File metadata
- Download URL: awk_plus_plus-0.15.0.tar.gz
- Upload date:
- Size: 35.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | c53fb60c1cfb6cc3fa72264c76a95e88e26b2d4672e60f02f8511ba73c7f7158
MD5 | 71cee56fa1c732a34892a3e5686b7406
BLAKE2b-256 | 0494795ea789e625458a36f516f9a3dff60700dca2710bcbd41a207255dfce73
File details
Details for the file awk_plus_plus-0.15.0-py3-none-any.whl.
File metadata
- Download URL: awk_plus_plus-0.15.0-py3-none-any.whl
- Upload date:
- Size: 14.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | ea4ab6d00fc2b708dbd2fc507b428e562325a9e0337209d9b919f13ac6027bed
MD5 | 812baee713c43ad72fafa1084e59aea5
BLAKE2b-256 | 744370bb352e7497ddbe53e3bfe63a77af692a9b693ab1f4b7f0c342bb63baab