Python toolkit to read and analyse TREC results.
Project description
# TREC TOOLS
TrecTools is an open-source Python library for assisting Information Retrieval (IR) practitioners with TREC-like campaigns.
If this package helps your research somehow, please reference our paper:
```
@inproceedings{palotti2019,
author = {Palotti, Joao and Scells, Harrisen and Zuccon, Guido},
title = {TrecTools: an open-source Python library for Information Retrieval practitioners involved in TREC-like campaigns},
series = {SIGIR'19},
year = {2019},
location = {Paris, France},
publisher = {ACM}
}
```
## Installing
```
pip install trectools
```
## Background
IR practitioners tasked with activities like building test collections, evaluating systems, or analysing results from empirical experiments commonly have to resort to use a number of different software tools and scripts that each perform an individual functionality – and at times they even have to implement ad-hoc scripts of their own. TrecTools aims to provide a unified environment for performing these common activities.
### Features
TrecTools is implemented in Python using standard data science libraries (NumPy, SciPy, Pandas, and Matplotlib) and using the object-oriented paradigm.
Each of the key components of an evaluation campaign is mapped to a class: classes for runs (TrecRun),topics/queries (TrecTopic), assessment pools (TrecPools), relevance assessments (TrecQrel) and the evaluation results (TrecRes). [See file format for each object below](https://github.com/joaopalotti/trec_tools#file-formats).
Evaluation results can be produced by TrecTools itself using the evaluation metrics implemented in the tool, or be imported from the output file of trec_eval and derivatives. The features that are currently implemented in TrecTools are:
- **Querying IR Systems:** Benchmark runs can be obtained di-rectly from one of the IR toolkits that are integrated in TrecTools. There is support for issuing full-text queries to [Indri](https://www.lemurproject.org/indri/) and [Terrier](http://terrier.org/) toolkits. Future releases will include other toolkits (e.g., [Elastic-search](), [Anserini](https://dl.acm.org/citation.cfm?id=3239571), etc.) and support for specific query languages(Indri’s query language, Boolean queries). See code snipets in [Example 1](https://github.com/joaopalotti/trec_tools#example-1).
- **Pooling Techniques:** The following techniques for assessment pool creation from a runs set are implemented: [Depth@K](https://sigir.org/files/museum/pub-14/pub_14.pdf), [Comb[Min/Max/Med/Sum/ANZ/MNZ]](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.100.9316&rep=rep1&type=pdf), [Take@N](https://link.springer.com/chapter/10.1007/978-3-319-56608-5_28), [RRFTake@N](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.150.2291&rep=rep1&type=pdf), [RBPTake@N](https://people.eng.unimelb.edu.au/jzobel/fulltext/acmtois08.pdf). [See Example 2](https://github.com/joaopalotti/trec_tools#example-2).
- **Evaluation Measures:** Currently implemented and verified measures include: Precision at depth K, Recall at depth K, MAP, NDCG, Bpref, [uBpref](http://zuccon.net/publications/sigir2016_L2R_readability.pdf), [RBP]((https://people.eng.unimelb.edu.au/jzobel/fulltext/acmtois08.pdf)), [uRBP](https://link.springer.com/chapter/10.1007/978-3-319-30671-1_21). Implemented in TrecTools is the option to break ties using document score (i.e., similar to trec_eval), or document ranking (i.e., similar to the original implementation of [RBP]( https://people.eng.unimelb.edu.au/ammoffat/abstracts/mz08acmtois.html
)). Additionally, TrecTools also allows to compute the residual of the evaluation measure and analyse the relative presence of unassessed documents. [See Example 3](https://github.com/joaopalotti/trec_tools#example-3).
- **Correlation and Agreement Analysis:** The Pearson, Spearman, Kendall and τ-ap correlation between system rankings can be computed [(see Example 4)](https://github.com/joaopalotti/trec_tools#example-4). Agreement measures between relevance assessment sets can be obtained with Kappa or Jaccard [(see Example 5)](https://github.com/joaopalotti/trec_tools#example-5).
- **Fusion Techniques.** Runs can be fused using the following techniques: Comb[Max/Min/Sum/Mnz/Anz/Med] - both using the scores and document rankings, RBPFusion, RRFFusion,or BordaCountFusion. Fusion techniques are provided for meta-analysis. [See Example 6](https://github.com/joaopalotti/trec_tools#example-6).
### File Formats
The three main modules found in TrecTools are inspired by the main files created in TREC campaigns: a participant run (TrecRun), a qrel (TrecQrel) e a result file (TrecRes).
**TrecRun format**
qid Q0 docno rank score tag
where:
- **qid** is the query number
- **Q0** is the literal Q0
- **docno** is the id of a document returned for qid
- **rank** (1-999) is the rank of this response for this qid
- **score** is a system-dependent indication of the quality of the response
- **tag** is the identifier for the system
Example:
1 Q0 nhslo3844_12_012186 1 1.73315273652 mySystem
1 Q0 nhslo1393_12_003292 2 1.72581054377 mySystem
1 Q0 nhslo3844_12_002212 3 1.72522727817 mySystem
1 Q0 nhslo3844_12_012182 4 1.72522727817 mySystem
1 Q0 nhslo1393_12_003296 5 1.71374426875 mySystem
**TrecQrel format**
qid 0 docno relevance
where:
- **qid** is the query number
- **0** is the literal 0
- **docno** is the id of a document in your collection
- **relevance** is how relevant is docno for qid
Example:
1 0 aldf.1864_12_000027 1
1 0 aller1867_12_000032 2
1 0 aller1868_12_000012 0
1 0 aller1871_12_000640 1
1 0 arthr0949_12_000945 0
1 0 arthr0949_12_000974 1
**TrecRes format**
label qid value
where:
- **label** is any string, usually representing a metric
- **qid** is the query number or 'all' to represent a aggregate value
- **value** is numeral result of a metric
Example:
num_rel_ret 7 77
map 7 0.4653
P_10 9 0.9000
num_rel_ret all 1180
map all 0.1323
gm_map all 0.0504
### Code Examples
#### Example 0
Code Snippets and toy examples with TrecTools.
```python
from trectools import TrecQrel, procedures
qrels_file = "./qrel/robust03_qrels.txt"
qrels = TrecQrel(qrels_file)
# Generates a P@10 graph with all the runs in a directory
path_to_runs = "./robust03/runs/"
runs = procedures.list_of_runs_from_path(path_to_runs, "*.gz")
results = procedures.evaluate_runs(runs, qrels, per_query=True)
p10 = procedures.extract_metric_from_results(results, "P_10")
procedures.plot_system_rank(p10, display_metric="P@10")
# Sample output with one run for each participating team in robust03:
```
![](robust03/robust03.png)
#### Example 1
Code Snippets for manipulating topic formats and querying different IR toolkits (shown here: Terrier and Indri)
```python
from trectools import TrecTopics, TrecTerrier, TrecIndri
# Loads some topics from a file (e.g., topics.txt)
"""
<topics>
<topic number="201" type="single">
<query>amazon raspberry pi</query>
<description> You have heard quite a lot about cheap computing as being the way of the future,
including one recent model called a Raspberry Pi. You start thinking about buying one, and wonder how much they cost.
</description>
</topic>
</topics>
"""
topics = TrecTopics().read_topics_from_file("topics.txt")
# Or...load topics from a Python dictionary
topics = TrecTopics(topics={'201': u'amazon raspberry pi'})
topics.printfile(fileformat="terrier")
#<topics>
# <top>
# <num>201</num>
# <title>amazon raspberry pi</title>
# </top>
#</topics>
topics.printfile(fileformat="indri")
#<parameters>
# <trecFormat>true</trecFormat>
# <query>
# <id>201</id>
# <text>#combine( amazon raspberry pi )</text>
# </query>
#</parameters>
topics.printfile(fileformat="indribaseline")
#<parameters>
# <trecFormat>true</trecFormat>
# <query>
# <id>201</id>
# <text>amazon raspberry pi</text>
# </query>
#</parameters>
tt = TrecTerrier(bin_path="<PATH>/terrier/bin/") # where trec_terrier.sh is located
# Runs PL2 model from Terrier with Query Expansion
tr = tt.run(index="<PATH>/terrier/var/index", topics="topics.xml.gz", qexp=True,
model="PL2", result_file="terrier.baseline", expTerms=5, expDocs=3, expModel="Bo1")
ti = TrecIndri(bin_path="~/<PATH>/indri/bin/") # where IndriRunQuery is located
ti.run(index="<PATH>/indriindex", topics, model="dirichlet", parameters={"mu":2500},
result_file="trec_indri.run", ndocs=1000, qexp=True, expTerms=5, expDocs=3)
```
#### Example 2
Code Snippets for generating and exporting document pools using different pooling strategies.
```python
from trectools import TrecPool, TrecRun
r1 = TrecRun("./robust03/runs/input.aplrob03a.gz")
r2 = TrecRun("./robust03/runs/input.UIUC03Rd1.gz")
len(r1.topics()) # 100 topics
# Creates document pools with r1 and r2 using different strategies:
# Strategy1: Creates a pool with top 10 documents of each run:
pool1 = TrecPool.make_pool([r1, r2], strategy="topX", topX=10) # Pool with 1636 unique documents.
# Strategy2: Creates a pool with 2000 documents (20 per topic) using the reciprocal ranking strategy by Gordon, Clake and Buettcher:
pool2 = TrecPool.make_pool([r1,r2], strategy="rrf", topX=20, rrf_den=60) # Pool with 2000 unique documents.
# Check to see which pool covers better my run r1
pool1.check_coverage(r1, topX=10) # 10.0
pool2.check_coverage(r1, topX=10) # 8.35
# Export documents to be judged using Relevation! visual assessing system
pool1.export_document_list(filename="mypool.txt", with_format="relevation")
```
#### Example 3
Code snippets showing case evaluation options available in TrecTools.
```python
from trectools import TrecQrel, TrecRun, TrecEval
# A typical evaluation workflow
r1 = TrecRun("./robust03/runs/input.aplrob03a.gz")
r1.topics()[:5] # Shows the first 5 topics: 601, 602, 603, 604, 605
qrels = TrecQrel("./robust03/qrel/robust03_qrels.txt")
te = TrecEval(r1, qrels)
rbp, residuals = te.getRBP() # RBP: 0.474, Residuals: 0.001
p100 = te.getPrecisionAtDepth(100) # P@100: 0.186
# Check if documents retrieved by the system were judged:
r1.get_mean_coverage(qrels, topX=10) # 9.99
r1.get_mean_coverage(qrels, topX=1000) # 481.390
# On average for system 'input.aplrob03a' participating in robust03, 480 documents out of 1000 were judged.
# Loads another run
r2 = TrecRun("./robust03/runs/input.UIUC03Rd1.gz")
# Check how many documents, on average, in the top 10 of r1 were retrieved in the top 10 of r2
r1.check_run_coverage(r2, topX=10) # 3.64
# Evaluates r1 and r2 using all implemented evaluation metrics
result_r1 = r1.evaluate_run(qrels, per_query=True)
result_r2 = r2.evaluate_run(qrels, per_query=True)
# Inspect for statistically significant differences between the two runs for P_10 using two-tailed Student t-test
pvalue = result_r1.compare_with(result_r2, metric="P_10") # pvalue: 0.0167
```
#### Example 4
Code Snippets for obtaining correlation measures from a set of runs.
```python
from trectools import misc, TrecRun, TrecQrel, procedures
qrels_file = "./robust03/qrel/robust03_qrels.txt"
path_to_runs = "./robust03/runs/"
qrels = TrecQrel(qrels_file)
runs = procedures.list_of_runs_from_path(path_to_runs, "*.gz")
results = procedures.evaluate_runs(runs, qrels, per_query=True)
# check the system correlation between P@10 and MAP using Kendall's tau for all systems participating in a campaign
misc.get_correlation( misc.sort_systems_by(results, "P_10"),
misc.sort_systems_by(results, "map"), correlation = "kendall") # Correlation: 0.7647
# check the system correlation between P@10 and MAP using Tau's ap for all systems participating in a campaign
misc.get_correlation( misc.sort_systems_by(results, "P_10"),
misc.sort_systems_by(results, "map"), correlation = "tauap") # Correlation: 0.77413
```
#### Example 5
Code Snippets for obtaining agreement measures from a pair of relevance assessments.
```python
# Code snippet to check correlation between two sets of relevance assessment (e.g., made by different cohorts - assessments made by medical doctors Vs. crowdsourced assessments)
from trectools import TrecQrel
original_qrels_file = "./robust03/qrel/robust03_qrels.txt"
# Changed the first 10 assessments from 0 to 1
modified_qrels_file = "./robust03/qrel/mod_robust03_qrels.txt"
original_qrels = TrecQrel(original_qrels_file)
modified_qrels = TrecQrel(modified_qrels_file)
# Overall agreement
original_qrels.check_agreement(modified_qrels) # 0.99
# Fleiss' kappa agreement
original_qrels.check_kappa(modified_qrels) # P0: 1.00, Pe = 0.90
# Jaccard similarity coefficient
original_qrels.check_jaccard(modified_qrels) # 0.99
# 3x3 confusion matrix (labels 0, 1 or 2)
original_qrels.check_confusion_matrix(modified_qrels)
# [[122712 10 0]
# [ 0 5667 0]
# [ 0 0 407]]
```
#### Example 6
Code Snippets for generating fusing two runs (Reciprocal Rank fusion shown here)
```python
from trectools import TrecRun, TrecEval, fusion
r1 = TrecRun("./robust03/runs/input.aplrob03a.gz")
r2 = TrecRun("./robust03/runs/input.UIUC03Rd1.gz")
# Easy way to create new baselines by fusing existing runs:
fused_run = fusion.reciprocal_rank_fusion([r1,r2])
TrecEval(r1, qrels).getPrecisionAtDepth(25) # P@25: 0.3392
TrecEval(r2, qrels).getPrecisionAtDepth(25) # P@25: 0.2872
TrecEval(fused_run, qrels).getPrecisionAtDepth(25) # P@25: 0.3436
# Save run to disk with all its topics
fused_run.print_subset("my_fused_run.txt", topics=fused_run.topics())
```
## ToDos
- [x] Upload examples with a famous Trec campaing (e.g., robust3)
- [ ] Explain other file formats, such as TrecPool
TrecTools is an open-source Python library for assisting Information Retrieval (IR) practitioners with TREC-like campaigns.
If this package helps your research somehow, please reference our paper:
```
@inproceedings{palotti2019,
author = {Palotti, Joao and Scells, Harrisen and Zuccon, Guido},
title = {TrecTools: an open-source Python library for Information Retrieval practitioners involved in TREC-like campaigns},
series = {SIGIR'19},
year = {2019},
location = {Paris, France},
publisher = {ACM}
}
```
## Installing
```
pip install trectools
```
## Background
IR practitioners tasked with activities like building test collections, evaluating systems, or analysing results from empirical experiments commonly have to resort to use a number of different software tools and scripts that each perform an individual functionality – and at times they even have to implement ad-hoc scripts of their own. TrecTools aims to provide a unified environment for performing these common activities.
### Features
TrecTools is implemented in Python using standard data science libraries (NumPy, SciPy, Pandas, and Matplotlib) and using the object-oriented paradigm.
Each of the key components of an evaluation campaign is mapped to a class: classes for runs (TrecRun),topics/queries (TrecTopic), assessment pools (TrecPools), relevance assessments (TrecQrel) and the evaluation results (TrecRes). [See file format for each object below](https://github.com/joaopalotti/trec_tools#file-formats).
Evaluation results can be produced by TrecTools itself using the evaluation metrics implemented in the tool, or be imported from the output file of trec_eval and derivatives. The features that are currently implemented in TrecTools are:
- **Querying IR Systems:** Benchmark runs can be obtained di-rectly from one of the IR toolkits that are integrated in TrecTools. There is support for issuing full-text queries to [Indri](https://www.lemurproject.org/indri/) and [Terrier](http://terrier.org/) toolkits. Future releases will include other toolkits (e.g., [Elastic-search](), [Anserini](https://dl.acm.org/citation.cfm?id=3239571), etc.) and support for specific query languages(Indri’s query language, Boolean queries). See code snipets in [Example 1](https://github.com/joaopalotti/trec_tools#example-1).
- **Pooling Techniques:** The following techniques for assessment pool creation from a runs set are implemented: [Depth@K](https://sigir.org/files/museum/pub-14/pub_14.pdf), [Comb[Min/Max/Med/Sum/ANZ/MNZ]](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.100.9316&rep=rep1&type=pdf), [Take@N](https://link.springer.com/chapter/10.1007/978-3-319-56608-5_28), [RRFTake@N](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.150.2291&rep=rep1&type=pdf), [RBPTake@N](https://people.eng.unimelb.edu.au/jzobel/fulltext/acmtois08.pdf). [See Example 2](https://github.com/joaopalotti/trec_tools#example-2).
- **Evaluation Measures:** Currently implemented and verified measures include: Precision at depth K, Recall at depth K, MAP, NDCG, Bpref, [uBpref](http://zuccon.net/publications/sigir2016_L2R_readability.pdf), [RBP]((https://people.eng.unimelb.edu.au/jzobel/fulltext/acmtois08.pdf)), [uRBP](https://link.springer.com/chapter/10.1007/978-3-319-30671-1_21). Implemented in TrecTools is the option to break ties using document score (i.e., similar to trec_eval), or document ranking (i.e., similar to the original implementation of [RBP]( https://people.eng.unimelb.edu.au/ammoffat/abstracts/mz08acmtois.html
)). Additionally, TrecTools also allows to compute the residual of the evaluation measure and analyse the relative presence of unassessed documents. [See Example 3](https://github.com/joaopalotti/trec_tools#example-3).
- **Correlation and Agreement Analysis:** The Pearson, Spearman, Kendall and τ-ap correlation between system rankings can be computed [(see Example 4)](https://github.com/joaopalotti/trec_tools#example-4). Agreement measures between relevance assessment sets can be obtained with Kappa or Jaccard [(see Example 5)](https://github.com/joaopalotti/trec_tools#example-5).
- **Fusion Techniques.** Runs can be fused using the following techniques: Comb[Max/Min/Sum/Mnz/Anz/Med] - both using the scores and document rankings, RBPFusion, RRFFusion,or BordaCountFusion. Fusion techniques are provided for meta-analysis. [See Example 6](https://github.com/joaopalotti/trec_tools#example-6).
### File Formats
The three main modules found in TrecTools are inspired by the main files created in TREC campaigns: a participant run (TrecRun), a qrel (TrecQrel) e a result file (TrecRes).
**TrecRun format**
qid Q0 docno rank score tag
where:
- **qid** is the query number
- **Q0** is the literal Q0
- **docno** is the id of a document returned for qid
- **rank** (1-999) is the rank of this response for this qid
- **score** is a system-dependent indication of the quality of the response
- **tag** is the identifier for the system
Example:
1 Q0 nhslo3844_12_012186 1 1.73315273652 mySystem
1 Q0 nhslo1393_12_003292 2 1.72581054377 mySystem
1 Q0 nhslo3844_12_002212 3 1.72522727817 mySystem
1 Q0 nhslo3844_12_012182 4 1.72522727817 mySystem
1 Q0 nhslo1393_12_003296 5 1.71374426875 mySystem
**TrecQrel format**
qid 0 docno relevance
where:
- **qid** is the query number
- **0** is the literal 0
- **docno** is the id of a document in your collection
- **relevance** is how relevant is docno for qid
Example:
1 0 aldf.1864_12_000027 1
1 0 aller1867_12_000032 2
1 0 aller1868_12_000012 0
1 0 aller1871_12_000640 1
1 0 arthr0949_12_000945 0
1 0 arthr0949_12_000974 1
**TrecRes format**
label qid value
where:
- **label** is any string, usually representing a metric
- **qid** is the query number or 'all' to represent a aggregate value
- **value** is numeral result of a metric
Example:
num_rel_ret 7 77
map 7 0.4653
P_10 9 0.9000
num_rel_ret all 1180
map all 0.1323
gm_map all 0.0504
### Code Examples
#### Example 0
Code Snippets and toy examples with TrecTools.
```python
from trectools import TrecQrel, procedures
qrels_file = "./qrel/robust03_qrels.txt"
qrels = TrecQrel(qrels_file)
# Generates a P@10 graph with all the runs in a directory
path_to_runs = "./robust03/runs/"
runs = procedures.list_of_runs_from_path(path_to_runs, "*.gz")
results = procedures.evaluate_runs(runs, qrels, per_query=True)
p10 = procedures.extract_metric_from_results(results, "P_10")
procedures.plot_system_rank(p10, display_metric="P@10")
# Sample output with one run for each participating team in robust03:
```
![](robust03/robust03.png)
#### Example 1
Code Snippets for manipulating topic formats and querying different IR toolkits (shown here: Terrier and Indri)
```python
from trectools import TrecTopics, TrecTerrier, TrecIndri
# Loads some topics from a file (e.g., topics.txt)
"""
<topics>
<topic number="201" type="single">
<query>amazon raspberry pi</query>
<description> You have heard quite a lot about cheap computing as being the way of the future,
including one recent model called a Raspberry Pi. You start thinking about buying one, and wonder how much they cost.
</description>
</topic>
</topics>
"""
topics = TrecTopics().read_topics_from_file("topics.txt")
# Or...load topics from a Python dictionary
topics = TrecTopics(topics={'201': u'amazon raspberry pi'})
topics.printfile(fileformat="terrier")
#<topics>
# <top>
# <num>201</num>
# <title>amazon raspberry pi</title>
# </top>
#</topics>
topics.printfile(fileformat="indri")
#<parameters>
# <trecFormat>true</trecFormat>
# <query>
# <id>201</id>
# <text>#combine( amazon raspberry pi )</text>
# </query>
#</parameters>
topics.printfile(fileformat="indribaseline")
#<parameters>
# <trecFormat>true</trecFormat>
# <query>
# <id>201</id>
# <text>amazon raspberry pi</text>
# </query>
#</parameters>
tt = TrecTerrier(bin_path="<PATH>/terrier/bin/") # where trec_terrier.sh is located
# Runs PL2 model from Terrier with Query Expansion
tr = tt.run(index="<PATH>/terrier/var/index", topics="topics.xml.gz", qexp=True,
model="PL2", result_file="terrier.baseline", expTerms=5, expDocs=3, expModel="Bo1")
ti = TrecIndri(bin_path="~/<PATH>/indri/bin/") # where IndriRunQuery is located
ti.run(index="<PATH>/indriindex", topics, model="dirichlet", parameters={"mu":2500},
result_file="trec_indri.run", ndocs=1000, qexp=True, expTerms=5, expDocs=3)
```
#### Example 2
Code Snippets for generating and exporting document pools using different pooling strategies.
```python
from trectools import TrecPool, TrecRun
r1 = TrecRun("./robust03/runs/input.aplrob03a.gz")
r2 = TrecRun("./robust03/runs/input.UIUC03Rd1.gz")
len(r1.topics()) # 100 topics
# Creates document pools with r1 and r2 using different strategies:
# Strategy1: Creates a pool with top 10 documents of each run:
pool1 = TrecPool.make_pool([r1, r2], strategy="topX", topX=10) # Pool with 1636 unique documents.
# Strategy2: Creates a pool with 2000 documents (20 per topic) using the reciprocal ranking strategy by Gordon, Clake and Buettcher:
pool2 = TrecPool.make_pool([r1,r2], strategy="rrf", topX=20, rrf_den=60) # Pool with 2000 unique documents.
# Check to see which pool covers better my run r1
pool1.check_coverage(r1, topX=10) # 10.0
pool2.check_coverage(r1, topX=10) # 8.35
# Export documents to be judged using Relevation! visual assessing system
pool1.export_document_list(filename="mypool.txt", with_format="relevation")
```
#### Example 3
Code snippets showing case evaluation options available in TrecTools.
```python
from trectools import TrecQrel, TrecRun, TrecEval
# A typical evaluation workflow
r1 = TrecRun("./robust03/runs/input.aplrob03a.gz")
r1.topics()[:5] # Shows the first 5 topics: 601, 602, 603, 604, 605
qrels = TrecQrel("./robust03/qrel/robust03_qrels.txt")
te = TrecEval(r1, qrels)
rbp, residuals = te.getRBP() # RBP: 0.474, Residuals: 0.001
p100 = te.getPrecisionAtDepth(100) # P@100: 0.186
# Check if documents retrieved by the system were judged:
r1.get_mean_coverage(qrels, topX=10) # 9.99
r1.get_mean_coverage(qrels, topX=1000) # 481.390
# On average for system 'input.aplrob03a' participating in robust03, 480 documents out of 1000 were judged.
# Loads another run
r2 = TrecRun("./robust03/runs/input.UIUC03Rd1.gz")
# Check how many documents, on average, in the top 10 of r1 were retrieved in the top 10 of r2
r1.check_run_coverage(r2, topX=10) # 3.64
# Evaluates r1 and r2 using all implemented evaluation metrics
result_r1 = r1.evaluate_run(qrels, per_query=True)
result_r2 = r2.evaluate_run(qrels, per_query=True)
# Inspect for statistically significant differences between the two runs for P_10 using two-tailed Student t-test
pvalue = result_r1.compare_with(result_r2, metric="P_10") # pvalue: 0.0167
```
#### Example 4
Code Snippets for obtaining correlation measures from a set of runs.
```python
from trectools import misc, TrecRun, TrecQrel, procedures
qrels_file = "./robust03/qrel/robust03_qrels.txt"
path_to_runs = "./robust03/runs/"
qrels = TrecQrel(qrels_file)
runs = procedures.list_of_runs_from_path(path_to_runs, "*.gz")
results = procedures.evaluate_runs(runs, qrels, per_query=True)
# check the system correlation between P@10 and MAP using Kendall's tau for all systems participating in a campaign
misc.get_correlation( misc.sort_systems_by(results, "P_10"),
misc.sort_systems_by(results, "map"), correlation = "kendall") # Correlation: 0.7647
# check the system correlation between P@10 and MAP using Tau's ap for all systems participating in a campaign
misc.get_correlation( misc.sort_systems_by(results, "P_10"),
misc.sort_systems_by(results, "map"), correlation = "tauap") # Correlation: 0.77413
```
#### Example 5
Code Snippets for obtaining agreement measures from a pair of relevance assessments.
```python
# Code snippet to check correlation between two sets of relevance assessment (e.g., made by different cohorts - assessments made by medical doctors Vs. crowdsourced assessments)
from trectools import TrecQrel
original_qrels_file = "./robust03/qrel/robust03_qrels.txt"
# Changed the first 10 assessments from 0 to 1
modified_qrels_file = "./robust03/qrel/mod_robust03_qrels.txt"
original_qrels = TrecQrel(original_qrels_file)
modified_qrels = TrecQrel(modified_qrels_file)
# Overall agreement
original_qrels.check_agreement(modified_qrels) # 0.99
# Fleiss' kappa agreement
original_qrels.check_kappa(modified_qrels) # P0: 1.00, Pe = 0.90
# Jaccard similarity coefficient
original_qrels.check_jaccard(modified_qrels) # 0.99
# 3x3 confusion matrix (labels 0, 1 or 2)
original_qrels.check_confusion_matrix(modified_qrels)
# [[122712 10 0]
# [ 0 5667 0]
# [ 0 0 407]]
```
#### Example 6
Code Snippets for generating fusing two runs (Reciprocal Rank fusion shown here)
```python
from trectools import TrecRun, TrecEval, fusion
r1 = TrecRun("./robust03/runs/input.aplrob03a.gz")
r2 = TrecRun("./robust03/runs/input.UIUC03Rd1.gz")
# Easy way to create new baselines by fusing existing runs:
fused_run = fusion.reciprocal_rank_fusion([r1,r2])
TrecEval(r1, qrels).getPrecisionAtDepth(25) # P@25: 0.3392
TrecEval(r2, qrels).getPrecisionAtDepth(25) # P@25: 0.2872
TrecEval(fused_run, qrels).getPrecisionAtDepth(25) # P@25: 0.3436
# Save run to disk with all its topics
fused_run.print_subset("my_fused_run.txt", topics=fused_run.topics())
```
## ToDos
- [x] Upload examples with a famous Trec campaing (e.g., robust3)
- [ ] Explain other file formats, such as TrecPool
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
trectools-0.0.38.tar.gz
(26.2 kB
view hashes)