A simple information retrieval system for pdf documents
irspdf
A simple textual information retrieval system for PDF documents.
Documents are ranked against a query with the BM25 ranking function.
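To make the ranking step concrete, here is a minimal sketch of BM25 scoring over already-tokenized documents. It is not taken from the irspdf source: the function name `bm25_scores`, the parameter values `k1=1.5` and `b=0.75`, and the smoothed IDF variant are assumptions chosen for illustration, and irspdf may use different settings or tokenization.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against `query_terms`.

    Hypothetical helper for illustration; k1 and b are common defaults,
    not values confirmed for irspdf.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturated by k1 and normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "bm25 ranks documents by term frequency".split(),
    "pdf files contain text and images".split(),
    "bm25 bm25 ranking for pdf text".split(),
]
scores = bm25_scores(["bm25", "pdf"], docs)
# Document indices ordered from most to least relevant.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

The third document matches both query terms and repeats one of them, so it receives the highest score; the other two match a single term each.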
Installation
Install with pip
pip install irspdf
Or install from GitHub
git clone https://github.com/Jibril-Frej/irspdf.git
cd irspdf && python setup.py install
Usage
Build a collection
from irspdf import build
build(folder_path, collection_path)
folder_path : path of the folder that contains all the PDF files to include in the collection.
collection_path : path of the file where the collection will be saved.
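As a sketch of how the call might be wrapped in a script, the helper below checks that the folder actually holds PDF files before building. The names `list_pdfs` and `build_if_not_empty` are hypothetical, and the check only looks at files directly inside the folder; whether `build` itself descends into subfolders is not stated here, so that is an assumption.

```python
from pathlib import Path


def list_pdfs(folder_path):
    """Return the PDF files found directly inside folder_path, sorted by path."""
    folder = Path(folder_path)
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def build_if_not_empty(folder_path, collection_path):
    """Call irspdf.build only when the folder holds at least one PDF."""
    pdfs = list_pdfs(folder_path)
    if not pdfs:
        raise ValueError(f"no PDF files found in {folder_path}")
    # Imported lazily so list_pdfs can be used without irspdf installed.
    from irspdf import build
    build(folder_path, collection_path)
    return len(pdfs)
```

A call such as `build_if_not_empty("papers", "papers.collection")` would then build the collection from a hypothetical `papers` folder, or fail early with a clear message if the folder contains no PDF files.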
Query the collection
from irspdf import query
query(collection_path)
collection_path : file where the collection is saved
Update the collection
from irspdf import update
update(folder_path, collection_path)
folder_path : path of the folder that contains all the PDF files to add to the collection.
collection_path : path of the file where the original collection is saved.
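Since build and update take the same two arguments, a script can choose between them depending on whether the collection file already exists. The wrapper below is a sketch under that assumption; `build_or_update` and its `build_fn` / `update_fn` parameters are hypothetical names, added so the logic can be exercised without irspdf installed.

```python
import os


def build_or_update(folder_path, collection_path, build_fn=None, update_fn=None):
    """Create the collection on first use, extend it on later calls."""
    if build_fn is None or update_fn is None:
        # Fall back to the irspdf functions described above.
        from irspdf import build, update
        build_fn = build_fn or build
        update_fn = update_fn or update

    # Assumption: an existing file at collection_path is a saved collection.
    if os.path.exists(collection_path):
        update_fn(folder_path, collection_path)
        return "updated"
    build_fn(folder_path, collection_path)
    return "built"
```

The returned string only reports which branch ran; it is not something irspdf itself returns.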