Programming language to create data manipulation pipelines.
Plai
Plai is a domain-specific language (DSL) for building data manipulation pipelines, with a focus on data treatment, validation, and a simpler syntax. It uses pandas as its data manipulation engine, so it is meant for small datasets that fit in memory.
Examples
Example of a pipeline with basic data manipulation using Plai:
df = read_file('issues.csv')
pipeline(df) as 'gh_pct_issues_by_language.csv':
    $.groupby(.name, as_index=False).sum()
    (.count/.count.sum()) * 100 as pct
    {.name, .count, .pct}
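For readers more familiar with pandas, the pipeline above can be read roughly as the following Python. This is a sketch of one plausible translation, not Plai's actual generated code: the sample rows are made up, and the mapping of each Plai line to a pandas call is an assumption based on the syntax shown.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for issues.csv.
csv_text = """name,year,quarter,count
Python,2021,1,30
Python,2021,2,45
Go,2021,1,25
"""
df = pd.read_csv(io.StringIO(csv_text))

# $.groupby(.name, as_index=False).sum()
result = df.groupby("name", as_index=False).sum()

# (.count/.count.sum()) * 100 as pct
# Bracket access is needed here because DataFrame.count is a method in pandas.
result["pct"] = (result["count"] / result["count"].sum()) * 100

# {.name, .count, .pct}
result = result[["name", "count", "pct"]]

# The "as 'gh_pct_issues_by_language.csv'" clause presumably writes the result
# to that file; here the CSV is only rendered to a string to avoid side effects.
csv_out = result.to_csv(index=False)
print(csv_out)
```

Under this reading, each line in the pipeline body is applied to the result of the previous one, which is what the Plai syntax avoids spelling out.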
To validate the dataframes being manipulated, define a dictionary that maps each column to a type and apply it to a dataframe or a pipeline. When applied to a dataframe, it validates the dataframe's schema against the dictionary, that is, it checks that each column is present and has the expected data type. When applied to a pipeline, the resulting dataframe is validated. The following snippet is an example:
input_type = {
    'name': 'str',
    'year': 'int',
    'quarter': 'int',
    'count': 'int'
}
output_type = {
    'name': 'str',
    'count': 'int',
    'pct': 'float'
}
input_type::df = read_file('issues.csv')
output_type::pipeline(df) as 'gh_pct_issues_by_language.csv':
    $.groupby(.name, as_index=False).sum()
    (.count/.count.sum()) * 100 as pct
    {.name, .count, .pct}
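To make the idea of a schema check concrete, here is a minimal pandas sketch of what validating column presence and data type could look like. It is not Plai's implementation: `validate_schema` and `TYPE_CHECKS` are hypothetical names, the mapping from type names to pandas dtype checks is an assumption, and whether Plai also rejects extra columns is not stated, so this sketch ignores them.

```python
import pandas as pd

# Hypothetical mapping from the type names used in the dictionaries
# to pandas dtype predicates.
TYPE_CHECKS = {
    "str": pd.api.types.is_string_dtype,
    "int": pd.api.types.is_integer_dtype,
    "float": pd.api.types.is_float_dtype,
}


def validate_schema(df, schema):
    """Return a list of problems; an empty list means df matches schema."""
    problems = []
    for column, type_name in schema.items():
        if column not in df.columns:
            problems.append(f"missing column: {column}")
        elif not TYPE_CHECKS[type_name](df[column]):
            problems.append(
                f"column {column}: expected {type_name}, got {df[column].dtype}"
            )
    return problems


output_type = {"name": "str", "count": "int", "pct": "float"}

# Made-up result dataframe that satisfies output_type.
good = pd.DataFrame(
    {"name": ["Go", "Python"], "count": [25, 75], "pct": [25.0, 75.0]}
)
print(validate_schema(good, output_type))

# Dropping a required column is reported as a problem.
bad = good.drop(columns=["pct"])
print(validate_schema(bad, output_type))
```

Returning a list of problems rather than raising on the first one is a design choice of this sketch; it lets a caller report every schema mismatch at once.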
Development
- Install the dependencies by running the following command in the root folder of the project:
pip install -r requirements-dev.txt
- To run all the tests execute:
pytest tests
- To run a specific test, execute:
# For a specific test file
pytest tests/test_grammar.py
# For a specific test class
pytest tests/test_grammar.py::TestBasicTokens
# For a specific test method
pytest tests/test_grammar.py::TestBasicTokens::test_token_number
- To run the interactive terminal, execute in the root folder:
python -m plai
- To execute the code from a file:
python -m plai file.plai