Easy pipelines for pandas.

Easy pipelines for pandas DataFrames (learn how!).

Website: https://pdpipe.github.io/pdpipe/

Documentation: https://pdpipe.github.io/pdpipe/doc/pdpipe/

>>> import pandas as pd
>>> import pdpipe as pdp
>>> df = pd.DataFrame(
...     data=[[4, 165, 'USA'], [2, 180, 'UK'], [2, 170, 'Greece']],
...     index=['Dana', 'Jane', 'Nick'],
...     columns=['Medals', 'Height', 'Born']
... )
>>> pipeline = pdp.ColDrop('Medals').OneHotEncode('Born')
>>> pipeline(df)
      Height  Born_UK  Born_USA
Dana     165        0         1
Jane     180        1         0
Nick     170        0         0
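To make the example above concrete, here is a sketch of roughly equivalent plain pandas code. It assumes, consistent with the output shown, that ColDrop simply drops the named column and that OneHotEncode replaces the column with dummy columns while dropping the first (alphabetically sorted) category, which is why no Born_Greece column appears. It is an illustration only, not pdpipe's actual implementation.

```python
import pandas as pd

df = pd.DataFrame(
    data=[[4, 165, 'USA'], [2, 180, 'UK'], [2, 170, 'Greece']],
    index=['Dana', 'Jane', 'Nick'],
    columns=['Medals', 'Height', 'Born'],
)

# Roughly what ColDrop('Medals') does: drop the named column.
step1 = df.drop(columns=['Medals'])

# Roughly what OneHotEncode('Born') does here (assumed behavior): swap the
# column for 0/1 dummy columns, dropping the first sorted category (Greece).
dummies = pd.get_dummies(step1['Born'], prefix='Born', drop_first=True).astype(int)
result = pd.concat([step1.drop(columns=['Born']), dummies], axis=1)

print(result)
```

The pipeline version is more convenient because the two steps are bundled into a single reusable object that can be applied to any DataFrame with the same columns.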

1 Documentation

This is the repository of the pdpipe package. This README is aimed at helping potential contributors to the project.

To learn more about how to use pdpipe, either visit pdpipe’s homepage or read the online documentation of pdpipe.

2 Installation

Install pdpipe with:

pip install pdpipe

Some pipeline stages require scikit-learn; they will simply not be loaded if scikit-learn is not found on the system, and pdpipe will issue a warning. To use them you must also install scikit-learn.

Similarly, some pipeline stages require nltk; they will simply not be loaded if nltk is not found on your system, and pdpipe will issue a warning. To use them you must additionally install nltk.
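Since pdpipe only warns when these optional dependencies are missing, it can be handy to check beforehand whether the extra stages will be available. A minimal sketch using the standard library's importlib; missing_optional_deps is a hypothetical helper, not part of pdpipe, and it only checks that the packages are importable, not that their versions are compatible with pdpipe.

```python
import importlib.util

# Optional dependencies mentioned above: pip package name -> import name.
OPTIONAL_DEPS = {"scikit-learn": "sklearn", "nltk": "nltk"}


def missing_optional_deps(deps=OPTIONAL_DEPS):
    """Return the pip names of optional dependencies that cannot be found.

    Hypothetical helper for illustration; it is not part of pdpipe.
    """
    return [
        pip_name
        for pip_name, module_name in deps.items()
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_optional_deps()
    if missing:
        print("Install these to enable the extra pdpipe stages:", " ".join(missing))
    else:
        print("scikit-learn and nltk are both available.")
```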

3 Contributing

Package author and current maintainer is Shay Palachy (shay.palachy@gmail.com); you are more than welcome to approach him for help. Contributions are very welcome, especially since this package is very much in its infancy and many other pipeline stages can be added. Intuit are nice.

3.1 Installing for development

Clone:

git clone git@github.com:shaypal5/pdpipe.git

Install in development mode with test dependencies:

cd pdpipe
pip install -e ".[test]"

3.2 Running the tests

To run the tests, use:

python -m pytest --cov=pdpipe

3.3 Adding documentation

This project is documented using the NumPy docstring conventions, which were chosen as they are perhaps the most widespread conventions that are both supported by common tools such as Sphinx and result in human-readable docstrings (in my personal opinion, of course). When documenting code you add to this project, please follow these conventions.

Additionally, if you update this README.rst file, run python setup.py checkdocs to validate that it compiles.

3.4 Adding doctests

Please note that for pdoc3 (the tool used to generate pdpipe's documentation) to successfully include doctests in the generated documentation, the whole doctest must be indented relative to the opening triple quotes of the docstring, as in the Example section of the following docstring:

class ApplyByCols(PdPipelineStage):
    """A pipeline stage applying an element-wise function to columns.

    Parameters
    ----------
    columns : str or list-like
        Names of columns on which to apply the given function.
    func : function
        The function to be applied to each element of the given columns.
    result_columns : str or list-like, default None
        The names of the new columns resulting from the mapping operation. Must
        be of the same length as columns. If None, behavior depends on the
        drop parameter: If drop is True, the name of the source column is used;
        otherwise, the name of the source column is used with the suffix
        '_app'.
    drop : bool, default True
        If set to True, source columns are dropped after being mapped.
    func_desc : str, default None
        A function description of the given function; e.g. 'normalizing revenue
        by company size'. A default description is used if None is given.


    Example
    -------
        >>> import pandas as pd; import pdpipe as pdp; import math;
        >>> data = [[3.2, "acd"], [7.2, "alk"], [12.1, "alk"]]
        >>> df = pd.DataFrame(data, [1,2,3], ["ph","lbl"])
        >>> round_ph = pdp.ApplyByCols("ph", math.ceil)
        >>> round_ph(df)
           ph  lbl
        1   4  acd
        2   8  alk
        3  13  alk
    """

4 Credits

Created by Shay Palachy (shay.palachy@gmail.com).

Download files

Source Distribution

pdpipe-0.0.35.tar.gz (42.1 kB)

Built Distribution

pdpipe-0.0.35-py3-none-any.whl (28.6 kB)

File details

Details for the file pdpipe-0.0.35.tar.gz.

File metadata

  • Download URL: pdpipe-0.0.35.tar.gz
  • Upload date:
  • Size: 42.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.40.2 CPython/3.7.5

File hashes

Hashes for pdpipe-0.0.35.tar.gz
Algorithm    Hash digest
SHA256       b2ae6b452863ec829ed1c77bcf4b7a5c33d7decc3600cac01edbe494b25046ee
MD5          096b635d56b1dfd8ca499f6848e19878
BLAKE2b-256  706055b173f743fb573d81fe24a31910233a07e807450b52d865b2e379939b56

File details

Details for the file pdpipe-0.0.35-py3-none-any.whl.

File metadata

  • Download URL: pdpipe-0.0.35-py3-none-any.whl
  • Upload date:
  • Size: 28.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.40.2 CPython/3.7.5

File hashes

Hashes for pdpipe-0.0.35-py3-none-any.whl
Algorithm    Hash digest
SHA256       52f173a688dd65b3241e71bbfa4d11f4199525bf86b77f90bc9285e253c5e174
MD5          5197e7ce90a5d8f3bf48d6fe5b6fabf1
BLAKE2b-256  1e20c5f0a06a88150b4a9d04ebca9f7e67d61c368741f199fbb0dc6c731e3a84
