Terrier IR Python API
Pyterrier
Terrier Python API
Installation
pip install python-terrier
Windows
Linux
Colab notebooks
Indexing
Indexing TREC formatted collections
# Directory where the index will be written
index_path = "/home/alex/Documents/index"
# TREC-formatted collection file to index
path = "/home/alex/Downloads/books/doc-text.trec"
index_path = createTRECIndex(index_path, path)
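For readers who do not yet have a TREC-formatted file, the following is a minimal sketch of how one might be produced with the standard library only. It assumes the common TREC layout of `<DOC>`, `<DOCNO>` and `<TEXT>` tags and that the document text contains no markup of its own. The helper `write_trec_file` and the temporary output path are hypothetical names used for illustration; they are not part of this package.

```python
import os
import tempfile


def write_trec_file(path, docs):
    """Write (docno, text) pairs to `path` using TREC-style tags.

    Hypothetical helper for illustration; not part of python-terrier.
    Assumes the text itself contains no '<' or '>' markup.
    """
    with open(path, "w", encoding="utf-8") as f:
        for docno, text in docs:
            f.write("<DOC>\n")
            f.write(f"<DOCNO>{docno}</DOCNO>\n")
            f.write(f"<TEXT>\n{text}\n</TEXT>\n")
            f.write("</DOC>\n")


docs = [
    ("1", "He ran out of money, so he had to stop playing"),
    ("2", "The waves were crashing on the shore; it was a"),
]

# Write to a temporary directory so the sketch has no side effects elsewhere.
trec_path = os.path.join(tempfile.mkdtemp(), "doc-text.trec")
write_trec_file(trec_path, docs)
```

The resulting file path could then be passed as the second argument to `createTRECIndex`, as in the snippet above.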
Indexing text files
Indexing a pandas dataframe
First, let's create an example dataframe:
import pandas as pd

df = pd.DataFrame({'docno': ['1', '2', '3'],
                   'url': ['url1', 'url2', 'url3'],
                   'text': ['He ran out of money, so he had to stop playing',
                            'The waves were crashing on the shore; it was a',
                            'The body may perhaps compensates for the loss']
                   })
Then there are a number of options to index that dataframe:
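# Text column only
index = createDFIndex(index_path, df["text"])
# Text plus the docno column
index = createDFIndex(index_path, df["text"], df["docno"])
# Text plus the docno and url columns
index = createDFIndex(index_path, df["text"], df["docno"], df["url"])
# Text plus the whole dataframe
index = createDFIndex(index_path, df["text"], df)
# Text plus docno passed as a keyword argument
index = createDFIndex(index_path, df["text"], docno=["1","2","3"])
# Text plus several fields unpacked from a dict as keyword arguments
meta_fields={"docno":["1","2","3"],"url":["url1", "url2", "url3"]}
index = createDFIndex(index_path, df["text"], **meta_fields)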
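The last form relies on Python's `**` unpacking, which turns each dictionary key into a keyword argument. The sketch below illustrates only that language mechanism. `fake_create_index` is a hypothetical stand-in that records the arguments it receives; it does not reproduce what `createDFIndex` does internally.

```python
def fake_create_index(index_path, text, *args, **kwargs):
    """Hypothetical stand-in that just records how it was called.

    This is NOT the library's createDFIndex; it only shows how
    positional and keyword arguments arrive at a function.
    """
    return {"positional": list(args), "keyword": dict(kwargs)}


texts = ["first document", "second document", "third document"]
meta_fields = {"docno": ["1", "2", "3"], "url": ["url1", "url2", "url3"]}

# Passing the fields explicitly as keyword arguments...
explicit = fake_create_index("idx", texts,
                             docno=["1", "2", "3"],
                             url=["url1", "url2", "url3"])

# ...is equivalent to unpacking the dictionary with **.
unpacked = fake_create_index("idx", texts, **meta_fields)

print(explicit == unpacked)
```

Building the dictionary first can be convenient when the set of metadata fields is only known at run time.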
Retrieval
Evaluation