pyterrier-alpha

Alpha channel of features for PyTerrier.

Features in this package are under development and are intended to be merged into the main package, or split into a separate package, once they are stable.

Table of Contents

Getting Started
pta.validate
pta.DataFrameBuilder

Getting Started

pip install pyterrier-alpha

Import pyterrier_alpha alongside pyterrier:

import pyterrier as pt
import pyterrier_alpha as pta

pta.validate

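It's a good idea to check a transformer's input to make sure it is compatible before you start using it. pta.validate provides functions that check the columns of an input DataFrame and raise an error if they don't match what the transformer expects: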

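import pandas as pd
import pyterrier as pt
import pyterrier_alpha as pta

class MyTransformer(pt.Transformer):
    def transform(self, inp: pd.DataFrame) -> pd.DataFrame:
        # e.g., expects a query frame with a query_vec column
        pta.validate.query_frame(inp, extra_columns=['query_vec'])
        # raises an error if the specification doesn't match
        return inp  # placeholder: compute and return the output frame here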

| Function | Must have column(s) | Must NOT have column(s) |
|---|---|---|
| `pta.validate.query_frame(inp, extra_columns=...)` | `qid` + `extra_columns` | `docno` |
| `pta.validate.document_frame(inp, extra_columns=...)` | `docno` + `extra_columns` | `qid` |
| `pta.validate.result_frame(inp, extra_columns=...)` | `qid` + `docno` + `extra_columns` | |
| `pta.validate.columns(inp, includes=..., excludes=...)` | `includes` | `excludes` |
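The column rules in this table amount to simple set checks. The sketch below re-implements them in plain pandas purely to illustrate the semantics; `check_columns`, `check_query_frame`, `check_document_frame`, `check_result_frame` and `InputValidationError` are hypothetical names, not part of `pta.validate`, whose actual error types and messages may differ.

```python
from typing import Iterable, Sequence

import pandas as pd


class InputValidationError(ValueError):
    """Hypothetical error type for this sketch; pta.validate may raise something else."""


def check_columns(inp: pd.DataFrame,
                  includes: Iterable[str] = (),
                  excludes: Iterable[str] = ()) -> None:
    """Raise if any `includes` column is absent or any `excludes` column is present."""
    present = set(inp.columns)
    missing = [c for c in includes if c not in present]
    unexpected = [c for c in excludes if c in present]
    if missing or unexpected:
        raise InputValidationError(
            f"missing columns: {missing}; unexpected columns: {unexpected}")


def check_query_frame(inp: pd.DataFrame, extra_columns: Sequence[str] = ()) -> None:
    # qid + extra_columns required; docno forbidden
    check_columns(inp, includes=['qid', *extra_columns], excludes=['docno'])


def check_document_frame(inp: pd.DataFrame, extra_columns: Sequence[str] = ()) -> None:
    # docno + extra_columns required; qid forbidden
    check_columns(inp, includes=['docno', *extra_columns], excludes=['qid'])


def check_result_frame(inp: pd.DataFrame, extra_columns: Sequence[str] = ()) -> None:
    # qid + docno + extra_columns required; nothing forbidden
    check_columns(inp, includes=['qid', 'docno', *extra_columns])


# Usage: a query frame without a query_vec column fails the check.
queries = pd.DataFrame({'qid': ['q1'], 'query': ['chemical reactions']})
check_query_frame(queries, extra_columns=['query'])  # passes silently
try:
    check_query_frame(queries, extra_columns=['query_vec'])
except InputValidationError as err:
    print(err)  # missing columns: ['query_vec']; unexpected columns: []
```

Raising rather than returning a boolean mirrors the "raises an error if the specification doesn't match" behaviour described for `pta.validate`.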

Advanced Usage

Sometimes a transformer accepts more than one input specification, e.g., if it can act either as a retriever (taking a query frame) or as a re-ranker (taking a result frame). In this case, you can list each acceptable configuration inside a `with pta.validate.any(inp) as v:` block:

import pandas as pd
import pyterrier as pt
import pyterrier_alpha as pta

class MyTransformer(pt.Transformer):
    def transform(self, inp: pd.DataFrame) -> pd.DataFrame:
        # e.g., accepts either a query frame (retrieval) or a result frame (re-ranking)
        with pta.validate.any(inp) as v:
            v.query_frame(extra_columns=['query'], mode='retrieve')
            v.result_frame(extra_columns=['query', 'text'], mode='rerank')
        # raises an error if NONE of the specifications match
        # v.mode is set to the FIRST specification that matches
        if v.mode == 'retrieve':
            ...
        elif v.mode == 'rerank':
            ...
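The "first match wins" behaviour of `pta.validate.any` can be pictured as a loop over candidate specifications. This is a hypothetical sketch, not how pyterrier-alpha implements it: `first_matching_mode` and the `(mode, includes, excludes)` spec tuples are illustrative names, and the two specs simply mirror the `retrieve` and `rerank` configurations in the example.

```python
from typing import Iterable, Sequence, Tuple

import pandas as pd

# Hypothetical spec format for this sketch: (mode, required columns, forbidden columns)
Spec = Tuple[str, Sequence[str], Sequence[str]]


def first_matching_mode(inp: pd.DataFrame, specs: Iterable[Spec]) -> str:
    """Return the mode of the first spec the frame satisfies; raise if none match."""
    present = set(inp.columns)
    failures = []
    for mode, includes, excludes in specs:
        missing = [c for c in includes if c not in present]
        unexpected = [c for c in excludes if c in present]
        if not missing and not unexpected:
            return mode  # earlier specs take priority over later ones
        failures.append(f"{mode}: missing {missing}, unexpected {unexpected}")
    # Only reached when every spec failed
    raise ValueError("no input specification matched (" + "; ".join(failures) + ")")


SPECS = [
    ('retrieve', ['qid', 'query'], ['docno']),           # query frame
    ('rerank', ['qid', 'docno', 'query', 'text'], []),   # result frame
]

queries = pd.DataFrame({'qid': ['q1'], 'query': ['chemical reactions']})
results = queries.assign(docno=['d1'], text=['some passage'])
print(first_matching_mode(queries, SPECS))  # retrieve
print(first_matching_mode(results, SPECS))  # rerank
```

Collecting every failure before raising lets the error message explain why each candidate was rejected, which is helpful when none of them match.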

pta.DataFrameBuilder

A common pattern when implementing a Transformer is to build up an intermediate representation of the output DataFrame and convert it at the end, but this can be clunky, as shown below:

import numpy as np
import pandas as pd
import pyterrier as pt

class MyTransformer(pt.Transformer):
    def transform(self, inp: pd.DataFrame) -> pd.DataFrame:
        # one list of per-query arrays for each output column
        result = {
            'qid': [],
            'query': [],
            'docno': [],
            'score': [],
        }
        for qid, query in zip(inp['qid'], inp['query']):
            docnos, scores = self.some_function(qid, query)
            # qid and query must be repeated manually to match the number of results
            result['qid'].append([qid] * len(docnos))
            result['query'].append([query] * len(docnos))
            result['docno'].append(docnos)
            result['score'].append(scores)
        # flatten the per-query lists into full-length columns
        result = pd.DataFrame({
            'qid': np.concatenate(result['qid']),
            'query': np.concatenate(result['query']),
            'docno': np.concatenate(result['docno']),
            'score': np.concatenate(result['score']),
        })
        return result

pta.DataFrameBuilder removes much of this boilerplate. It also automatically handles various value types and ensures that all columns end up with the same length. The example above can be rewritten with pta.DataFrameBuilder as follows:

import pandas as pd
import pyterrier as pt
import pyterrier_alpha as pta

class MyTransformer(pt.Transformer):
    def transform(self, inp: pd.DataFrame) -> pd.DataFrame:
        result = pta.DataFrameBuilder(['qid', 'query', 'docno', 'score'])
        for qid, query in zip(inp['qid'], inp['query']):
            docnos, scores = self.some_function(qid, query)
            result.extend({
                'qid': qid,  # automatically repeated to the length of this batch
                'query': query,  # ditto
                'docno': docnos,
                'score': scores,
            })
        return result.to_df()
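To illustrate two of the behaviours described for `pta.DataFrameBuilder` (repeating scalar values to the batch length, and rejecting batches whose columns have mismatched lengths), here is a minimal sketch. `MiniFrameBuilder` is a hypothetical class written only for this illustration; it does not reproduce the library's type handling, and only its `extend`/`to_df` method names follow the example.

```python
from typing import Any, Dict, List, Sequence

import numpy as np
import pandas as pd


class MiniFrameBuilder:
    """Hypothetical, simplified stand-in for pta.DataFrameBuilder (illustration only)."""

    def __init__(self, columns: Sequence[str]):
        self.columns = list(columns)
        self._parts: Dict[str, List[np.ndarray]] = {c: [] for c in self.columns}

    def extend(self, values: Dict[str, Any]) -> None:
        if set(values) != set(self.columns):
            raise ValueError(f"expected columns {self.columns}, got {list(values)}")
        # Convert non-scalar values to arrays and collect their lengths
        arrays = {c: np.asarray(v) for c, v in values.items() if not np.isscalar(v)}
        lengths = {len(a) for a in arrays.values()}
        if len(lengths) > 1:
            raise ValueError(f"columns in this batch have different lengths: {sorted(lengths)}")
        n = lengths.pop() if lengths else 1  # an all-scalar batch becomes one row
        for c in self.columns:
            # Scalars (e.g. one qid per query) are repeated to the batch length
            self._parts[c].append(arrays[c] if c in arrays else np.full(n, values[c]))

    def to_df(self) -> pd.DataFrame:
        # Concatenate each column's batches into a single array
        return pd.DataFrame({
            c: np.concatenate(parts) if parts else np.array([])
            for c, parts in self._parts.items()
        })


builder = MiniFrameBuilder(['qid', 'query', 'docno', 'score'])
builder.extend({'qid': 'q1', 'query': 'chemical reactions',
                'docno': ['d1', 'd2'], 'score': [2.0, 1.5]})
builder.extend({'qid': 'q2', 'query': 'atomic bonds',
                'docno': ['d7'], 'score': [0.9]})
print(builder.to_df())
```

Validation happens before anything is appended, so a rejected batch leaves previously added rows untouched.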

Release history

| Version | Release date |
|---|---|
| 0.9.0 | Oct 13, 2024 |
| 0.8.1 | Sep 1, 2024 |
| 0.8.0 | Sep 1, 2024 |
| 0.7.0 | Aug 24, 2024 |
| 0.6.2 | Aug 23, 2024 |
| 0.6.1 | Aug 22, 2024 |
| 0.6.0 | Aug 22, 2024 |
| 0.5.1 | Aug 20, 2024 |
| 0.5.0 | Aug 20, 2024 |
| 0.4.3 | Aug 18, 2024 |
| 0.4.2 | Aug 17, 2024 |
| 0.4.1 | Aug 17, 2024 |
| 0.4.0 | Aug 17, 2024 |
| 0.3.1 | Jul 24, 2024 |
| 0.3.0 | Jul 23, 2024 |
| 0.2.2 | Jul 23, 2024 |
| 0.2.1 | Jul 7, 2024 |
| 0.2.0 (yanked) | Jul 7, 2024 |
| 0.1.2 | Jun 10, 2024 |
| 0.1.1 (this version) | Jun 10, 2024 |
| 0.1.0 | Jun 10, 2024 |
| 0.0.1 | Jun 10, 2024 |

Download files


Source Distribution

pyterrier-alpha-0.1.1.tar.gz (4.4 kB), uploaded Jun 10, 2024

Hashes for pyterrier-alpha-0.1.1.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | `976ffcf09e631b9b919a34a6a5e2b17ec70ab8c6e15abce12a0a4a2844a7d911` |
| MD5 | `770c0274201bbef21335fa1c67e3ce22` |
| BLAKE2b-256 | `c71d7fc6e088ec95870e942a90535702adc6ea094648647415f672fcf0dc7ee3` |