Terrier IR Python API
Pyterrier
Terrier Python API
Installation
pip install python-terrier
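After installing, it can be useful to confirm that the package is visible to the interpreter you plan to use. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and is not part of this package.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "python-terrier") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; not provided by python-terrier.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print(version if version else "python-terrier is not installed")
```

If this reports that the package is not installed, the `pip` used above probably belongs to a different Python environment from the one running the script.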
Windows
Linux
Colab notebooks
Indexing
Indexing TREC formatted collections
# Directory where the index will be written
index_path = "/home/alex/Documents/index"
# TREC-formatted collection file to index
path = "/home/alex/Downloads/books/doc-text.trec"
index_path = createTRECIndex(index_path, path)
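To try the call above without a real collection, you can generate a small test file. The sketch below writes documents in the conventional TREC layout (`<DOC>`, `<DOCNO>`, `<TEXT>`). That these are exactly the tags `createTRECIndex` expects is an assumption, and the helper `write_trec_file` and the sample documents are hypothetical.

```python
import os
import tempfile


def write_trec_file(path, docs):
    """Write (docno, text) pairs in the conventional TREC SGML-like layout.

    Hypothetical helper; the tag names are the common TREC convention and
    are assumed, not confirmed, to match what the indexer expects.
    """
    with open(path, "w", encoding="utf-8") as f:
        for docno, text in docs:
            f.write("<DOC>\n")
            f.write(f"<DOCNO>{docno}</DOCNO>\n")
            f.write(f"<TEXT>\n{text}\n</TEXT>\n")
            f.write("</DOC>\n")


# Illustrative documents only
sample_docs = [
    ("1", "He ran out of money, so he had to stop playing"),
    ("2", "A second short document for testing"),
]

# Write into a fresh temporary directory so nothing existing is overwritten
sample_path = os.path.join(tempfile.mkdtemp(), "doc-text.trec")
write_trec_file(sample_path, sample_docs)

with open(sample_path, encoding="utf-8") as f:
    trec_content = f.read()
print(trec_content)
```

The resulting `sample_path` could then be passed as the `path` argument in place of the hard-coded location.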
Indexing text files
Indexing a pandas dataframe
First, let's create an example dataframe:
import pandas as pd

df = pd.DataFrame({'docno': ['1', '2', '3'],
                   'url': ['url1', 'url2', 'url3'],
                   'text': ['He ran out of money, so he had to stop playing',
                            'The waves were crashing on the shore; it was a',
                            'The body may perhaps compensates for the loss']
                   })
Then there are a number of options to index that dataframe:
index = createDFIndex(index_path, df["text"])
index = createDFIndex(index_path, df["text"], df["docno"])
index = createDFIndex(index_path, df["text"], df["docno"], df["url"])
index = createDFIndex(index_path, df["text"], df)
index = createDFIndex(index_path, df["text"], docno=["1","2","3"])
meta_fields={"docno":["1","2","3"],"url":["url1", "url2", "url3"]}
index = createDFIndex(index_path, df["text"], **meta_fields)
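The last two forms rely on ordinary Python keyword-argument unpacking: passing `**meta_fields` is equivalent to writing each key as a named argument. The sketch below shows this with a hypothetical stand-in function, `collect_meta`, which is not the library's `createDFIndex` and only returns the keyword arguments it receives.

```python
def collect_meta(index_path, text, **meta_fields):
    """Hypothetical stand-in, NOT createDFIndex: return the metadata kwargs."""
    return meta_fields


texts = ["doc one", "doc two", "doc three"]  # illustrative texts
meta_fields = {"docno": ["1", "2", "3"], "url": ["url1", "url2", "url3"]}

# Metadata passed as explicit keyword arguments
explicit = collect_meta("/tmp/index", texts,
                        docno=["1", "2", "3"],
                        url=["url1", "url2", "url3"])

# The same metadata passed by unpacking a dictionary
unpacked = collect_meta("/tmp/index", texts, **meta_fields)

print(explicit == unpacked)  # prints True
```

Building the dictionary first is convenient when the set of metadata fields is only known at run time.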
Retrieval
Evaluation