Programming language to create data manipulation pipelines.

Plai

Plai is a domain-specific language (DSL) for creating data manipulation pipelines, with a focus on data treatment, validation, and a simpler syntax. It uses pandas as its data manipulation engine, so it is meant for small datasets that fit in memory.

Examples

Example of a pipeline with basic data manipulation using Plai:

df = read_file('issues.csv')

pipeline(df) as 'gh_pct_issues_by_language.csv':
    $.groupby(.name, as_index=False).sum()
    (.count/.count.sum()) * 100 as pct
    {.name, .count, .pct}
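
For readers who know pandas, the three pipeline steps can be read as roughly the following Python. This is only a sketch: the mapping from Plai syntax to pandas calls is inferred from the example above rather than taken from Plai's implementation, the sample rows are made up, and writing the result to 'gh_pct_issues_by_language.csv' is left out.

```python
import pandas as pd

# Made-up sample rows standing in for issues.csv (illustrative values only).
df = pd.DataFrame({
    "name": ["Python", "Python", "Go", "Rust"],
    "year": [2021, 2021, 2021, 2021],
    "quarter": [1, 2, 1, 1],
    "count": [30, 20, 25, 25],
})

# Step 1: $.groupby(.name, as_index=False).sum()
grouped = df.groupby("name", as_index=False).sum()

# Step 2: (.count/.count.sum()) * 100 as pct
grouped["pct"] = (grouped["count"] / grouped["count"].sum()) * 100

# Step 3: {.name, .count, .pct} keeps only these columns
result = grouped[["name", "count", "pct"]]

print(result)
```

Under this reading, each line of the pipeline body takes the dataframe produced by the previous line, which is why the Plai version does not need intermediate variable names.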

To validate the dataframes being manipulated, you can define dictionaries that map each column to a specific type and apply them to a dataframe or a pipeline. When applied to a dataframe, the dictionary is used to validate the dataframe's schema: it checks that each listed column is present and has the expected data type. When applied to a pipeline, the resulting dataframe is validated. The following snippet shows an example:

input_type = {
    'name': 'str',
    'year': 'int',
    'quarter': 'int',
    'count': 'int'
}

output_type = {
    'name': 'str',
    'count': 'int',
    'pct': 'float'
}

input_type::df = read_file('issues.csv')

output_type::pipeline(df) as 'gh_pct_issues_by_language.csv':
    $.groupby(.name, as_index=False).sum()
    (.count/.count.sum()) * 100 as pct
    {.name, .count, .pct}
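
To make the two checks concrete (column presence and data type), here is a minimal pandas sketch of what such a schema validation could look like. The function name validate_schema, the CHECKS table, and the choice to ignore extra columns are assumptions made for illustration; Plai's actual validator may behave differently.

```python
import pandas as pd
from pandas.api import types as pdt

# Hypothetical mapping from the type names used in the dictionaries above
# to pandas dtype checks; not taken from Plai's source.
CHECKS = {
    "str": pdt.is_string_dtype,
    "int": pdt.is_integer_dtype,
    "float": pdt.is_float_dtype,
}


def validate_schema(df, schema):
    """Return a list of problems; an empty list means the frame matches."""
    problems = []
    for column, type_name in schema.items():
        if column not in df.columns:
            problems.append(f"missing column: {column}")
        elif not CHECKS[type_name](df[column]):
            problems.append(
                f"column {column} is {df[column].dtype}, expected {type_name}"
            )
    return problems


output_type = {"name": "str", "count": "int", "pct": "float"}

# One frame that matches the schema and one that does not (made-up data).
good = pd.DataFrame({"name": ["Go"], "count": [25], "pct": [25.0]})
bad = pd.DataFrame({"name": ["Go"], "count": [2.5]})

print(validate_schema(good, output_type))
print(validate_schema(bad, output_type))
```

Returning a list of problems instead of raising on the first one is a design choice of this sketch; it lets a caller report every schema mismatch at once.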

Development

  1. Install the dependencies by running the following command in the root folder of the project:

     pip install -r requirements-dev.txt

  2. To run all the tests, execute:

     pytest tests

     To run a specific test, execute:

     # For a specific test file
     pytest tests/test_grammar.py

     # For a specific test class
     pytest tests/test_grammar.py::TestBasicTokens

     # For a specific test method
     pytest tests/test_grammar.py::TestBasicTokens::test_token_number

  3. To run the interactive terminal, execute the following in the root folder:

     python -m plai

  4. To execute the code from a file:

     python -m plai file.plai
