phenopipe

Phenopipe

Python Functions for Phenotyping and Analysis

Phenopipe is a Python library that automates phenotyping and downstream analysis. Its main development target is the All of Us research platform. Phenopipe is heavily inspired by, and borrows query definitions from, https://github.com/annisjs/aou.phenotyper2 and https://github.com/annisjs/aou.reader.

Tasks

The basic building block of the phenopipe library is a task, which is a class from the tasks module. A task represents a single step in phenotyping and analysis. Its inputs are polars dataframes and it produces a single polars dataframe as output, so the output of one task can be passed to another. This chaining can be automated with the input_tasks attribute of the task class: each task in input_tasks is completed before the task itself, and its output is added to the task's inputs. Each task needs a complete method containing the logic of the data retrieval and/or analysis step; the completion decorator automates the common operations that run before and after a task is completed.
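
The chaining described above can be sketched without phenopipe itself. This is a minimal illustration, not the library's implementation: SketchTask, the completion function, and the dictionary form of input_tasks are all hypothetical stand-ins, and plain lists of dictionaries stand in for polars dataframes.

```python
import functools


def completion(method):
    """Hypothetical stand-in for a completion decorator (not phenopipe's own)."""
    @functools.wraps(method)
    def wrapper(self):
        # Complete every upstream task first and expose its output as a named input.
        for name, task in self.input_tasks.items():
            if task.output is None:
                task.complete()
            self.inputs[name] = task.output
        method(self)
        self.completed = True
        return self.output
    return wrapper


class SketchTask:
    """Hypothetical base class; input_tasks is assumed to map input name -> task."""
    def __init__(self, input_tasks=None):
        self.input_tasks = input_tasks or {}
        self.inputs = {}
        self.output = None
        self.completed = False


class LoadPeople(SketchTask):
    @completion
    def complete(self):
        # A list of dicts stands in for a polars dataframe.
        self.output = [{"person_id": 1, "age": 70}, {"person_id": 2, "age": 40}]


class OverSixty(SketchTask):
    @completion
    def complete(self):
        self.output = [row for row in self.inputs["people"] if row["age"] > 60]


result = OverSixty(input_tasks={"people": LoadPeople()}).complete()
```

Calling complete on the downstream task is enough here: the decorator completes LoadPeople first and hands its output to OverSixty under the name "people".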

Data Queries

A common task type retrieves data from the database. These tasks are collected under the get_data module in the tasks module. They have additional attributes for features such as caching and lazy dataframe evaluation. Data queries are planned by tasks but run inside query connection objects, which are provided in the query_connections module. The goal is to allow running the same data tasks on different platforms by simply changing the query connection in the environment variables. This is partially achieved for other platforms designed around the OMOP Data Model, like AOU; however, AOU-specific data structures do not fully allow it. To track whether a task is compatible with (or tested on) a platform, the state attribute is used. This attribute holds key-value pairs where the keys are shared with the query platform attribute of query connections and the values indicate whether the task is compatible. A value can be one of the following:

- incompatible: The task is known to be incompatible with the platform.
- parsed: The task was parsed from another library and is not yet tested.
- untested: The task is not tested on this platform.
- unverified: The resulting data is not yet verified.
- tested: The task is tested on this platform.
- verified: The task is verified on this platform, meaning the resulting data was used in an analysis and showed no inconsistencies.
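
As an illustration of how such a state mapping could be consulted before running a task, here is a small sketch. The function check_state, the platform keys "aou" and "std_omop", and the choice to treat unlisted platforms as untested are assumptions made for this example, not part of phenopipe.

```python
# The six state values, in the order they are listed above.
STATE_VALUES = ("incompatible", "parsed", "untested", "unverified", "tested", "verified")


def check_state(state, platform):
    """Return the task's state on `platform` and refuse incompatible tasks.

    Hypothetical helper: `state` maps a platform key to one of STATE_VALUES.
    """
    # Assumption: a platform missing from the mapping counts as untested.
    value = state.get(platform, "untested")
    if value not in STATE_VALUES:
        raise ValueError(f"unknown state value: {value!r}")
    if value == "incompatible":
        raise RuntimeError(f"task is incompatible with platform {platform!r}")
    return value


# Hypothetical platform keys, chosen only for the example.
example_state = {"aou": "verified", "std_omop": "parsed"}
aou_state = check_state(example_state, "aou")
other_state = check_state(example_state, "some_other_platform")
```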

Environment Variables

The env_vars attribute of a task object holds variables that are shared between tasks, including variables common to all tasks in an analysis. An example is the query connection, which is stored in env_vars so that tasks can delegate communication with the database to it.
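
The sharing can be pictured as several tasks holding a reference to the same mapping. In the sketch below, EchoConnection, QueryTask, and the run method are hypothetical names; only the query_conn key is taken from the template example later in this description.

```python
class EchoConnection:
    """Hypothetical query connection that records queries instead of running them."""

    def __init__(self):
        self.sent = []

    def run(self, sql):
        self.sent.append(sql)
        return [{"sql": sql}]  # a list of dicts stands in for a dataframe


class QueryTask:
    """Hypothetical task that delegates database access to env_vars["query_conn"]."""

    def __init__(self, sql, env_vars):
        self.sql = sql
        self.env_vars = env_vars

    def complete(self):
        return self.env_vars["query_conn"].run(self.sql)


# One env_vars mapping is shared by every task in the analysis.
env_vars = {"query_conn": EchoConnection()}
first = QueryTask("SELECT 1", env_vars)
second = QueryTask("SELECT 2", env_vars)
first.complete()
second.complete()
```

Swapping the object stored under query_conn would redirect both tasks at once, which is the point of keeping the connection in the shared variables rather than in each task.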

Plan

A Pipe object holds a phenotyping and analysis plan as a dictionary of tasks and an env_vars attribute. The Pipe object has a run method that completes each task and merges each result onto its anchor. Only the outputs of tasks without an anchor are saved in the outputs dictionary.
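
One way to read the run behavior is sketched below. It is an interpretation under stated assumptions, not phenopipe's code: run_plan and the task specification format are hypothetical, results are merged onto the anchor with a left join on a person_id column, and lists of dictionaries stand in for dataframes.

```python
def run_plan(tasks):
    """Complete each task, merge anchored results onto their anchor, keep the rest.

    Hypothetical format: tasks maps a task id to a dict with a "complete"
    callable and an optional "anchor" key naming another task id.
    """
    results = {task_id: spec["complete"]() for task_id, spec in tasks.items()}

    for task_id, spec in tasks.items():
        anchor_id = spec.get("anchor")
        if anchor_id is None:
            continue
        # Left join on person_id: anchor rows without a match stay unchanged.
        by_pid = {row["person_id"]: row for row in results[task_id]}
        results[anchor_id] = [
            {**row, **by_pid.get(row["person_id"], {})} for row in results[anchor_id]
        ]

    # Only tasks without an anchor appear in the outputs.
    return {
        task_id: results[task_id]
        for task_id, spec in tasks.items()
        if spec.get("anchor") is None
    }


plan = {
    "hypertension": {
        "complete": lambda: [
            {"person_id": 1, "hypertension_date": "2020-01-01"},
            {"person_id": 2, "hypertension_date": "2019-03-05"},
        ],
    },
    "first_hf_hospitalization": {
        "complete": lambda: [{"person_id": 1, "hf_date": "2020-06-01"}],
        "anchor": "hypertension",
    },
}
outputs = run_plan(plan)
```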

Inputs

A task can accept other tasks or their outputs as inputs. Each task may have a minimal input schema, which describes the column names and data types required at minimum for the task to run successfully. Similarly, every task has a minimal output schema, which describes the column names and data types guaranteed at minimum in the output dataframe, so any task can determine whether it accepts another task as input. All input schemas and the output schema are validated during task completion.
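
A minimal-schema check amounts to a subset test: every required column must be present with the expected type, and extra columns are allowed. The sketch below assumes schemas are plain dictionaries from column name to a type name; validate_schema and the string type names are hypothetical and do not reflect how phenopipe represents schemas.

```python
def validate_schema(actual, minimal):
    """Raise ValueError unless `actual` contains every column in `minimal`
    with a matching type. Extra columns in `actual` are accepted."""
    missing = [col for col in minimal if col not in actual]
    mismatched = [
        col for col in minimal if col in actual and actual[col] != minimal[col]
    ]
    if missing or mismatched:
        raise ValueError(
            f"schema check failed: missing={missing}, mismatched={mismatched}"
        )


# Hypothetical schemas using type names as strings.
minimal_input = {"person_id": "Int64", "condition_start_date": "Date"}
upstream_output = {
    "person_id": "Int64",
    "condition_start_date": "Date",
    "condition_source_value": "String",
}
validate_schema(upstream_output, minimal_input)  # passes: extra column is fine

try:
    validate_schema({"person_id": "String"}, minimal_input)
    schema_error = None
except ValueError as exc:
    schema_error = str(exc)
```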

Anchor Input

The anchor keyword in the inputs dictionary is reserved for a dataframe that defines the selection criteria for the output. The selection is described using the anchor_date, anchor_range, and anchor_pid attributes of the task object. anchor_range is a list of two literals, each either a column name in the anchor dataframe or an integer, that determine the time window for selection around the anchor_date_col column of the anchor dataframe. anchor_pid is the name of the person ID column in the anchor dataframe used for subsetting.
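
For the integer form of anchor_range, the selection can be sketched as a per-person date window. The function below is an illustration under assumptions: select_in_window and event_date_col are hypothetical names, the integers are read as day offsets, both bounds are inclusive, and the event rows are assumed to carry a person_id column.

```python
from datetime import date, timedelta


def select_in_window(events, anchor, anchor_pid, anchor_date_col,
                     anchor_range, event_date_col):
    """Keep events whose date lies within anchor_range days of the person's anchor date."""
    start_offset, end_offset = anchor_range
    anchor_dates = {row[anchor_pid]: row[anchor_date_col] for row in anchor}

    selected = []
    for event in events:
        anchor_date = anchor_dates.get(event["person_id"])
        if anchor_date is None:
            continue  # person is not in the anchor, so the event is dropped
        lower = anchor_date + timedelta(days=start_offset)
        upper = anchor_date + timedelta(days=end_offset)
        if lower <= event[event_date_col] <= upper:
            selected.append(event)
    return selected


anchor_rows = [{"pid": 1, "dx_date": date(2020, 1, 1)}]
event_rows = [
    {"person_id": 1, "hosp_date": date(2020, 6, 1)},  # inside the window
    {"person_id": 1, "hosp_date": date(2022, 1, 1)},  # outside the window
    {"person_id": 2, "hosp_date": date(2020, 1, 2)},  # person not in anchor
]
kept = select_in_window(
    event_rows, anchor_rows,
    anchor_pid="pid", anchor_date_col="dx_date",
    anchor_range=[-365, 365], event_date_col="hosp_date",
)
```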

Data Aggregate

Any task can specify an aggregate function that runs after or alongside the anchoring. It can be first, last, closest:nearest, closest:forward, or closest:backward. The closest aggregates require an anchor. Ties are broken randomly but consistently.
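
The five aggregate names can be illustrated on a list of dates for one person. This sketch makes several assumptions that the description does not confirm: forward is read as "on or after the anchor date", backward as "on or before", and ties are broken by taking the earlier date, which is a simplification of the random-but-consistent tie-breaking mentioned above. The function name aggregate is hypothetical.

```python
from datetime import date


def aggregate(dates, how, anchor_date=None):
    """Pick one date from `dates` according to the aggregate name `how`."""
    if how == "first":
        return min(dates)
    if how == "last":
        return max(dates)
    if not how.startswith("closest:"):
        raise ValueError(f"unknown aggregate: {how!r}")
    if anchor_date is None:
        raise ValueError("closest aggregates need an anchor date")

    direction = how.split(":", 1)[1]
    if direction == "nearest":
        candidates = list(dates)
    elif direction == "forward":
        candidates = [d for d in dates if d >= anchor_date]
    elif direction == "backward":
        candidates = [d for d in dates if d <= anchor_date]
    else:
        raise ValueError(f"unknown aggregate: {how!r}")

    if not candidates:
        return None
    # Smallest distance to the anchor wins; the earlier date breaks ties (assumption).
    return min(candidates, key=lambda d: (abs((d - anchor_date).days), d))


visits = [date(2020, 1, 1), date(2020, 3, 1), date(2020, 12, 1)]
anchor_day = date(2020, 4, 1)
picked = {
    how: aggregate(visits, how, anchor_day)
    for how in ("first", "last", "closest:nearest", "closest:forward", "closest:backward")
}
```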

Templating

Phenopipe provides a templating structure for defining a Pipe object using YAML files (or strings or dictionaries in the same format). The function build_pipe_from_yaml accepts the file name of a YAML file. The Pipe object built from the example below collects initial hypertension diagnoses that have a heart failure hospitalization within a one-year window before or after, and returns them with the date of the first heart failure hospitalization in that window. Each task is given either as an absolute import, such as phenopipe.tasks.get_data.hospitalization.FirstHfHospitalizationData, or, for convenience, as a relative import through a commonly used module declared under the modules keyword, such as modules.phenotype.HypertensionPt. The query connection is translated to the camel-case class name corresponding to the underscored name given in the template. All parameters under a task ID are passed to the task's init method. The inputs of a task can be other tasks in the plan, referenced by their identifiers.

target: examples
cache: false
lazy: false
env_vars:
  query_conn: big_query_connection
modules:
  phenotype: phenopipe.tasks.get_data.phenotype
tasks:
  hypertension:
    task_name: modules.phenotype.HypertensionPt
    cache_type: std
  first_hf_hospitalization:
    task_name: phenopipe.tasks.get_data.hospitalization.FirstHfHospitalizationData
    cache_type: std
    inputs:
      anchor: hypertension
    anchor_range: [-365, 365]
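
The two name translations described above (module aliases and the query connection class name) can be sketched on the dictionary form of the same template. The helpers to_class_name and resolve_task_name are hypothetical, and the resulting class name BigQueryConnection is an inference from the description rather than a confirmed phenopipe identifier.

```python
def to_class_name(underscored):
    """Turn an underscored name into a capitalized camel-case class name."""
    return "".join(part.capitalize() for part in underscored.split("_"))


def resolve_task_name(task_name, modules):
    """Expand a 'modules.<alias>.<Class>' name; leave absolute imports unchanged."""
    prefix, _, rest = task_name.partition(".")
    if prefix != "modules":
        return task_name
    alias, _, class_name = rest.partition(".")
    return f"{modules[alias]}.{class_name}"


# Dictionary form of the template above.
template = {
    "env_vars": {"query_conn": "big_query_connection"},
    "modules": {"phenotype": "phenopipe.tasks.get_data.phenotype"},
    "tasks": {
        "hypertension": {"task_name": "modules.phenotype.HypertensionPt"},
        "first_hf_hospitalization": {
            "task_name": "phenopipe.tasks.get_data.hospitalization.FirstHfHospitalizationData",
            "inputs": {"anchor": "hypertension"},
            "anchor_range": [-365, 365],
        },
    },
}

connection_class = to_class_name(template["env_vars"]["query_conn"])
resolved = {
    task_id: resolve_task_name(spec["task_name"], template["modules"])
    for task_id, spec in template["tasks"].items()
}
```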

Release history

0.8.0 (Apr 30, 2026)
0.7.0 (Feb 20, 2026, this version)
0.6.0a1 pre-release (Aug 26, 2025)
0.5.0 (Aug 6, 2025)
0.4.0a1 pre-release (Aug 2, 2025)
0.3.0a1 pre-release (Jul 1, 2025)
0.2.0 (Jun 20, 2025)
0.1.0 (Jun 18, 2025)

Download files

Source Distribution

phenopipe-0.7.0.tar.gz (74.5 kB)

Uploaded Feb 20, 2026 Source

Built Distribution

phenopipe-0.7.0-py3-none-any.whl (193.5 kB)

Uploaded Feb 20, 2026 Python 3

File details

Details for the file phenopipe-0.7.0.tar.gz.

File metadata

Download URL: phenopipe-0.7.0.tar.gz
Upload date: Feb 20, 2026
Size: 74.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for phenopipe-0.7.0.tar.gz
Algorithm	Hash digest
SHA256	`226019309fa7cc18869308730fe7bbc657551e53baa17d7d5a923f3e12e16621`
MD5	`04e9e6f6c2e46317ef4fd565e0160918`
BLAKE2b-256	`1436388d04b1067163f4e57907640e089b5e9012ec389106434930b7a8987d35`

Provenance

The following attestation bundles were made for phenopipe-0.7.0.tar.gz:

Publisher: python-publish.yml on cakarac/phenopipe

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: phenopipe-0.7.0.tar.gz
- Subject digest: 226019309fa7cc18869308730fe7bbc657551e53baa17d7d5a923f3e12e16621
- Sigstore transparency entry: 974998392
- Sigstore integration time: Feb 20, 2026
Source repository:
- Permalink: cakarac/phenopipe@a66054745be20c702414817dc329693f917800d1
- Branch / Tag: refs/tags/0.7.0
- Owner: https://github.com/cakarac
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@a66054745be20c702414817dc329693f917800d1
- Trigger Event: push

File details

Details for the file phenopipe-0.7.0-py3-none-any.whl.

File metadata

Download URL: phenopipe-0.7.0-py3-none-any.whl
Upload date: Feb 20, 2026
Size: 193.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for phenopipe-0.7.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`278da5a78f64232d9e1896fa80baa9eccbcc90acc85342ddfb1c063f75818b78`
MD5	`c4a74edb1e2744dd1e8fb1d7299f8f44`
BLAKE2b-256	`e498620ec00c27deb240719414d75c7767d84d4161adf2e793c3e68067d1bb61`

Provenance

The following attestation bundles were made for phenopipe-0.7.0-py3-none-any.whl:

Publisher: python-publish.yml on cakarac/phenopipe

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: phenopipe-0.7.0-py3-none-any.whl
- Subject digest: 278da5a78f64232d9e1896fa80baa9eccbcc90acc85342ddfb1c063f75818b78
- Sigstore transparency entry: 974998438
- Sigstore integration time: Feb 20, 2026
Source repository:
- Permalink: cakarac/phenopipe@a66054745be20c702414817dc329693f917800d1
- Branch / Tag: refs/tags/0.7.0
- Owner: https://github.com/cakarac
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@a66054745be20c702414817dc329693f917800d1
- Trigger Event: push
