
Easy pipelines for pandas.



Website: https://pdpipe.readthedocs.io/en/latest/

Easy pipelines for pandas DataFrames (learn how!).

>>> import pandas as pd
>>> import pdpipe as pdp
>>> df = pd.DataFrame(
...     data=[[4, 165, 'USA'], [2, 180, 'UK'], [2, 170, 'Greece']],
...     index=['Dana', 'Jane', 'Nick'],
...     columns=['Medals', 'Height', 'Born']
... )
>>> pipeline = pdp.ColDrop('Medals').OneHotEncode('Born')
>>> pipeline(df)
      Height  Born_UK  Born_USA
Dana     165        0         1
Jane     180        1         0
Nick     170        0         0

1 📚 Documentation

This is the repository of the pdpipe package, and this readme file aims to help potential contributors to the project.

To learn more about how to use pdpipe, either visit pdpipe’s homepage or read the getting started section.

2 🔩 Installation

Install pdpipe with:

pip install pdpipe

Some pipeline stages require scikit-learn; they will simply not be loaded if scikit-learn is not found on the system, and pdpipe will issue a warning. To use them you must also install scikit-learn.

Similarly, some pipeline stages require nltk; they will simply not be loaded if nltk is not found on your system, and pdpipe will issue a warning. To use them you must additionally install nltk.
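If you want to check whether these optional dependencies are present on your system, you can probe for them with the standard library. This is a minimal sketch that only tests importability; it does not reflect how pdpipe performs its conditional loading internally:

```python
import importlib.util

def has_module(name: str) -> bool:
    """Return True if the named top-level module can be imported."""
    return importlib.util.find_spec(name) is not None

# Note: scikit-learn's import name is 'sklearn'.
print("scikit-learn available:", has_module("sklearn"))
print("nltk available:", has_module("nltk"))
```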

3 🎁 Contributing

The package author and current maintainer is Shay Palachy (shay.palachy@gmail.com); you are more than welcome to approach him for help. Contributions are very welcome, especially since this package is very much in its infancy and many additional pipeline stages can be added.

🪛 Installing for development

Clone:

git clone git@github.com:pdpipe/pdpipe.git

Install in development mode with test dependencies:

cd pdpipe
pip install -e ".[test]"

3.1 ⚗️ Running the tests

To run the tests, use:

python -m pytest

Note that pytest runs are configured by the pytest.ini file; read it to understand the exact pytest arguments used.

3.2 🔬 Adding tests

At the time of writing, pdpipe is maintained with a test coverage of 100%. Although challenging, I hope to maintain this status. If you add code to the package, please make sure you thoroughly test it. Codecov automatically reports changes in coverage on each PR, so PRs reducing test coverage will not be reviewed until that is fixed.

Tests reside under the tests directory in the root of the repository. Each module has a separate test folder, with each class - usually a pipeline stage - having a dedicated file (always starting with the string "test") containing several tests (each a global function starting with the string "test"). Please adhere to this structure, and try to separate test cases into different test functions; this allows us to quickly focus on problem areas and use cases. Thank you! :)
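For illustration, a test file following this structure might look like the sketch below. The file path and test names are hypothetical, and a plain pandas operation stands in for an actual pipeline stage so the example is self-contained:

```python
# tests/core/test_coldrop.py  (hypothetical path and names)
import pandas as pd

def _example_df():
    return pd.DataFrame(
        data=[[4, 165, 'USA'], [2, 180, 'UK']],
        index=['Dana', 'Jane'],
        columns=['Medals', 'Height', 'Born'],
    )

def test_coldrop_drops_target_column():
    # One focused test function per use case.
    res = _example_df().drop(columns=['Medals'])
    assert 'Medals' not in res.columns

def test_coldrop_keeps_other_columns():
    res = _example_df().drop(columns=['Medals'])
    assert list(res.columns) == ['Height', 'Born']
```

Keeping each behavior in its own function means a failing run points directly at the broken use case.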

3.3 ⚙️ Configuration

pdpipe can be configured using both a configuration file - located at either $XDG_CONFIG_HOME/pdpipe/cfg.json or, if the XDG_CONFIG_HOME environment variable is not set, at ~/.pdpipe/cfg.json - and environment variables.

At the moment, these configuration options are only relevant for development. The available options are:

  • LOAD_STAGE_ATTRIBUTES - True by default. If set to False, stage attributes - which enable the chained construction pattern, e.g. pdp.ColDrop('b').Bin('f') - are not loaded. This is used for sensible documentation generation. Set it with "LOAD_STAGE_ATTRIBUTES": false in cfg.json, or with export PDPIPE__LOAD_STAGE_ATTRIBUTES=False for environment-variable-driven configuration.
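The configuration-file lookup described above can be sketched as follows; this illustrates the documented search order only and is not pdpipe's actual internal code:

```python
import os
from pathlib import Path

def pdpipe_cfg_path() -> Path:
    """Resolve the pdpipe config file location.

    Uses $XDG_CONFIG_HOME/pdpipe/cfg.json when XDG_CONFIG_HOME is set,
    falling back to ~/.pdpipe/cfg.json otherwise.
    """
    xdg = os.environ.get("XDG_CONFIG_HOME")
    if xdg:
        return Path(xdg) / "pdpipe" / "cfg.json"
    return Path.home() / ".pdpipe" / "cfg.json"
```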

3.4 ✒️ Code style

pdpipe code is written to adhere to the coding style dictated by flake8. Practically, this means that one of the jobs run on the project's Travis for each commit and pull request checks for a successful run of the flake8 CLI command in the repository's root, so pull requests will be flagged red by the Travis bot if non-flake8-compliant code was added.

To solve this, please run flake8 on your code (whether through your text editor/IDE or using the command line) and fix all resulting errors. Thank you! :)

3.5 📓 Adding documentation

This project is documented using the numpy docstring conventions, which were chosen as they are perhaps the most widely-spread conventions that are both supported by common tools such as Sphinx and result in human-readable docstrings (in my personal opinion, of course). When documenting code you add to this project, please follow these conventions.

Additionally, if you update this README.rst file, use python setup.py checkdocs to validate it compiles.

3.6 📋 Adding doctests

Please note that for pdoc3 - the Python package used to generate the HTML documentation files for pdpipe - to successfully include doctests in the generated documentation, the whole doctest must be indented relative to the opening of the docstring's triple-quoted string, like so:

class ApplyByCols(PdPipelineStage):
    """A pipeline stage applying an element-wise function to columns.

    Parameters
    ----------
    columns : str or list-like
        Names of columns on which to apply the given function.
    func : function
        The function to be applied to each element of the given columns.
    result_columns : str or list-like, default None
        The names of the new columns resulting from the mapping operation. Must
        be of the same length as columns. If None, behavior depends on the
        drop parameter: If drop is True, the name of the source column is used;
        otherwise, the name of the source column is used with the suffix
        '_app'.
    drop : bool, default True
        If set to True, source columns are dropped after being mapped.
    func_desc : str, default None
        A function description of the given function; e.g. 'normalizing revenue
        by company size'. A default description is used if None is given.


    Example
    -------
        >>> import pandas as pd; import pdpipe as pdp; import math
        >>> data = [[3.2, "acd"], [7.2, "alk"], [12.1, "alk"]]
        >>> df = pd.DataFrame(data, [1,2,3], ["ph","lbl"])
        >>> round_ph = pdp.ApplyByCols("ph", math.ceil)
        >>> round_ph(df)
           ph  lbl
        1   4  acd
        2   8  alk
        3  13  alk
    """

4 💳 Credits

Created by Shay Palachy (shay.palachy@gmail.com).

4.1 🐞 Bugfixes & Documentation:
