Simple, composable selectors for loc[], iloc[], assign() and others for fluent-API style Pandas code.

Pandas Paddles

Access the calling pandas data frame in loc[], iloc[], assign() and other methods with DF to write better chains of data frame operations, e.g.:

df = (df
      # Select all rows with column "x" < 2
      .loc[DF["x"] < 2]
      .assign(
          # Shift "x" by its minimum.
          y = DF["x"] - DF["x"].min(),
          # Clip "x" to its central 50% window. Note how DF is used
          # in the argument to `clip()`.
          z = DF["x"].clip(
              lower=DF["x"].quantile(0.25),
              upper=DF["x"].quantile(0.75)
          ),
      )
     )
Overview

  • Motivation: Make chaining Pandas operations easier and bring functionality to Pandas similar to Spark’s col() function or referencing columns in R’s dplyr.

  • Install from PyPI with pip install pandas-paddles. Pandas versions 1+ (>=1,<3) are supported.

  • Documentation can be found at readthedocs.

  • Source code can be obtained from GitHub.

  • Changelog

Example: Create new column and filter

Instead of writing “traditional” Pandas like this:

import pandas as pd
df_in = pd.DataFrame({"x": range(5)})
df = df_in.assign(y = df_in["x"] // 2)
df = df.loc[df["y"] <= 1]
df
#    x  y
# 0  0  0
# 1  1  0
# 2  2  1
# 3  3  1

One can write:

from pandas_paddles import DF
df = (df_in
      .assign(y = DF["x"] // 2)
      .loc[DF["y"] <= 1]
     )

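This is especially handy when iterating on data frame manipulations interactively, e.g. in a notebook: if you rename df to df_out, the traditional version needs every reference updated, while the chained version does not mention the variable name at all.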
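For comparison, the same chain can be written with plain pandas alone, because loc[] and assign() both accept a callable that receives the calling data frame. The sketch below uses lambdas where the example above uses DF; the argument name d is arbitrary. It shows the verbosity that DF is meant to remove, and makes no claim about how DF is implemented.

```python
import pandas as pd

df_in = pd.DataFrame({"x": range(5)})

# pandas calls each lambda with the data frame as it exists at that
# point in the chain, so "y" is already available inside loc[].
df = (df_in
      .assign(y=lambda d: d["x"] // 2)
      .loc[lambda d: d["y"] <= 1]
     )
print(df)
```

Each lambda has to repeat the frame argument, which gets noisy once several columns are combined in one expression.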

You can also access all methods and attributes of the data frame from the context:

df = pd.DataFrame({
    "X": range(5),
    "y": ["1", "a", "c", "D", "e"],
})
df.loc[DF["y"].str.isupper() | DF["y"].str.isnumeric()]
#    X  y
# 0  0  1
# 3  3  D
df.loc[:, DF.columns.str.isupper()]
#    X
# 0  0
# 1  1
# 2  2
# 3  3
# 4  4

You can even use DF in the arguments to methods:

df = pd.DataFrame({
    "x": range(5),
    "y": range(2, 7),
})
df.assign(z = DF['x'].clip(lower=2.2, upper=DF['y'].median()))
#    x  y    z
# 0  0  2  2.2
# 1  1  3  2.2
# 2  2  4  2.2
# 3  3  5  3.0
# 4  4  6  4.0
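To give an intuition for how an object like DF can be used before any data frame exists, here is a minimal sketch of a deferred expression. It is not pandas-paddles' actual implementation: the class name Deferred and its helper _binary are hypothetical, and only item access and two operators are covered. The idea is that each operation returns a new callable, and pandas evaluates that callable against the calling data frame inside loc[] or assign().

```python
import operator

import pandas as pd


class Deferred:
    """Hypothetical stand-in for a DF-like selector (not the library's code)."""

    def __init__(self, func=None):
        # func maps the calling data frame to a value; default: the frame itself.
        self._func = func if func is not None else (lambda df: df)

    def __call__(self, df):
        # pandas invokes callables passed to loc[] and assign() with the frame.
        return self._func(df)

    def __getitem__(self, key):
        # Record the item access; it runs only once a frame is supplied.
        return Deferred(lambda df: self._func(df)[key])

    def _binary(self, op, other):
        def run(df):
            # Resolve the right-hand side too if it is itself deferred.
            rhs = other(df) if isinstance(other, Deferred) else other
            return op(self._func(df), rhs)
        return Deferred(run)

    def __floordiv__(self, other):
        return self._binary(operator.floordiv, other)

    def __le__(self, other):
        return self._binary(operator.le, other)


D = Deferred()
df_in = pd.DataFrame({"x": range(5)})
result = df_in.assign(y=D["x"] // 2).loc[D["y"] <= 1]
print(result)
```

A full selector would also need to forward attribute access, method calls and the remaining operators, which this sketch leaves out on purpose.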

When working with a pd.Series, use the S object. It can be used similarly to DF:

from pandas_paddles import S
s = pd.Series(range(5))
s[S < 3]
# 0    0
# 1    1
# 2    2
# dtype: int64

Similar projects for pandas

  • siuba

    • (+) active

    • (-) new API to learn

  • pandas-ply

    • (-) stale(?), last change 6 years ago

    • (-) new API to learn

    • (-) Symbol / pandas_ply.X works only with ply_* functions

  • pandas-select

    • (+) no explicit df necessary

    • (-) new API to learn

  • pandas-selectable

    • (+) simple select accessor

    • (-) usage inside chains is clumsy (needs explicit df):

      ((df
        .select.A == 'a')
        .select.B == 'b'
      )
    • (-) hard-coded str, dt accessor methods

    • (?) composable?

Development

Development is containerized with [Docker](https://www.docker.com/) to separate it from the host system and improve reproducibility. No other prerequisites are needed on the host system.

Recommendation for Windows users: install WSL 2 (tested on Ubuntu 20.04), and for containerized workflows, Docker Desktop for Windows.

Common tasks are collected in the Makefile (see make help for a complete list):

  • Run the unit tests: make test, or make watch to re-run the tests continuously on code changes.

  • Build the documentation: make docs

  • TODO: Update the poetry.lock file: make lock

  • Add a dependency:

    1. Start a shell in a new container.

    2. Add dependency with poetry add in the running container. This will update poetry.lock automatically:

      # 1. On the host system
      % make shell
      # 2. In the container instance:
      I have no name!@7d0e85b3a303:/app$ poetry add --dev --lock falcon
  • Build the development image: make devimage (Note: This should be done automatically for the other targets.)
