
FlexNeuART (flex-noo-art) is a Flexible classic and NeurAl Retrieval Toolkit

Project description

Join the chat at https://gitter.im/oaqa/FlexNeuART

FlexNeuART (flex-noo-art)

Flexible classic and NeurAl Retrieval Toolkit, or FlexNeuART for short (intended pronunciation: flex-noo-art), is a substantially reworked knn4qa package. An overview can be found in our EMNLP OSS workshop paper: Flexible retrieval with NMSLIB and FlexNeuART, 2020. Leonid Boytsov, Eric Nyberg.

In Aug-Dec 2020, we used this framework to generate the best traditional and/or neural runs in the MS MARCO Document ranking task. In fact, our best traditional (non-neural) run slightly outperformed a couple of neural submissions. Please see our write-up for details: Boytsov, Leonid. "Traditional IR rivals neural models on the MS MARCO Document Ranking Leaderboard." arXiv preprint arXiv:2012.08020 (2020).

In 2021, after being outstripped by a number of participants, we again advanced to a good position with the help of newly implemented models for ranking long documents. Please see our write-up for details: Boytsov, L., Lin, T., Gao, F., Zhao, Y., Huang, J., & Nyberg, E. (2022). Understanding Performance of Long-Document Ranking Models through Comprehensive Evaluation and Leaderboarding. At the time of writing (October 2022), we have competitive submissions on both MS MARCO leaderboards.

Code corresponding to Neural Model 1 is not included, as it may be subject to a third-party patent. This model (together with its non-contextualized variant) is described and evaluated in our ECIR 2021 paper: Boytsov, Leonid, and Zico Kolter. "Exploring Classic and Neural Lexical Translation Models for Information Retrieval: Interpretability, Effectiveness, and Efficiency Benefits." ECIR 2021.

In terms of pure effectiveness on long documents, other models (CEDR & PARADE) seem to perform equally well (or somewhat better), and they are available in our codebase. We are not aware of any patents inhibiting the use of the traditional (non-neural) Model 1.

Objectives

Develop & maintain a (relatively) light-weight modular middleware useful primarily for:

  • Research
  • Education
  • Evaluation & leaderboarding

Main features

  • Dense, sparse, or dense-sparse retrieval using Lucene and NMSLIB (dense embeddings can be created using any Sentence BERT model).
  • Multi-field multi-level forward indices (+parent-child field relations) that can store parsed and "raw" text input as well as sparse and dense vectors.
  • Forward indices can be created in append-only mode, which requires much less RAM.
  • Pluggable generic rankers (via a server).
  • SOTA neural models (PARADE, BERT FirstP/MaxP/Sum, Longformer, ColBERT (re-ranking), dot-product Sentence BERT models) and non-neural models (multi-field BM25, IBM Model 1).
  • Multi-GPU training and inference with out-of-the-box support for ensembling.
  • Basic experimentation framework (+LETOR).
  • Python API to use retrievers and rankers as well as to access indexed data (see the sketch below).
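
As a quick illustration of the last point, below is a minimal sketch of querying a Lucene candidate provider and accessing a forward index from Python. The module paths, function names, and index names in this sketch (configure_classpath, create_featextr_resource_manager, create_cand_provider, run_query, get_forward_index, 'lucene_index', 'text_raw') follow the project's demo notebooks but should be treated as assumptions; they may differ between releases, so consult the repository documentation for the exact API.

```python
# A hedged sketch, not tied to a specific FlexNeuART release: the imports and
# helper names below are assumptions based on the project's demo notebooks.
from flexneuart import configure_classpath
configure_classpath()  # expose the Java (Lucene) side to Python before other imports

from flexneuart.retrieval import create_featextr_resource_manager
from flexneuart.retrieval.cand_provider import create_cand_provider, run_query
from flexneuart.retrieval.fwd_index import get_forward_index

COLLECTION_ROOT = '/path/to/collections'  # hypothetical directory layout
COLLECTION = 'msmarco_doc'                # hypothetical collection name

# The resource manager locates the indices built for a given collection.
resource_manager = create_featextr_resource_manager(
    resource_root_dir=f'{COLLECTION_ROOT}/{COLLECTION}')

# First-stage (e.g., BM25) retrieval via a Lucene-based candidate provider.
cand_provider = create_cand_provider(resource_manager, 'lucene', 'lucene_index')
top_k = 20
query_res = run_query(cand_provider, top_k, 'what is information retrieval')
print(query_res)

# A forward index gives access to the stored ("raw" or parsed) document text.
fwd_index = get_forward_index(resource_manager, 'text_raw')
```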

Documentation

We support a number of neural BERT-based ranking models as well as strong traditional ranking models including IBM Model 1 (description of non-neural rankers to follow).
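
For readers unfamiliar with lexical translation models, the textbook (Berger & Lafferty-style) Model 1 retrieval score with linear smoothing looks as follows; this is the standard formulation from the literature, and the exact smoothing and normalization used in FlexNeuART may differ:

```latex
% Textbook IBM Model 1 ranking score with linear (Jelinek-Mercer) smoothing;
% the exact variant implemented in FlexNeuART may differ.
P(q \mid d) = \prod_{t \in q}
  \Bigl[ (1 - \lambda) \sum_{w \in d} T(t \mid w)\, P(w \mid d)
         \; + \; \lambda\, P(t \mid C) \Bigr]
```

Here T(t | w) is a translation probability estimated from parallel data (e.g., query-document or question-answer pairs), P(w | d) is the document language model, P(t | C) is the collection language model, and lambda is a smoothing parameter.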

The framework supports data in a generic JSONL format. We provide conversion (and in some cases download) scripts for a number of supported collections.
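
As an illustration, here is a minimal sketch of producing data in a JSONL format with one JSON object per line. The field names used below (DOCNO, text) are assumptions chosen for illustration and may not match the exact schema expected by your FlexNeuART version; the conversion scripts shipped with the toolkit produce the canonical format.

```python
import json

# Hypothetical mini-corpus converted to JSONL (one JSON object per line).
# The field names DOCNO and text are assumptions for illustration only;
# check the FlexNeuART documentation for the exact schema.
docs = [
    {"DOCNO": "doc0", "text": "neural ranking models for long documents"},
    {"DOCNO": "doc1", "text": "classic BM25 retrieval with Lucene"},
]

with open("data_docs.jsonl", "w") as out_f:
    for doc in docs:
        out_f.write(json.dumps(doc) + "\n")
```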

Acknowledgements

For neural network training, FlexNeuART incorporates a substantially reworked variant of CEDR (MacAvaney et al., 2019).

Project details


Download files

Download the file for your platform.

Source Distribution

flexneuart-1.2.6.tar.gz (63.2 MB)


Built Distribution


flexneuart-1.2.6-py2.py3-none-any.whl (63.3 MB)


File details

Details for the file flexneuart-1.2.6.tar.gz.

File metadata

  • Download URL: flexneuart-1.2.6.tar.gz
  • Upload date:
  • Size: 63.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.16

File hashes

Hashes for flexneuart-1.2.6.tar.gz
  • SHA256: 9cd2219aefdeeebfad568c8a798d392f12863e14ee2fb0912d6530a8fa8b9b40
  • MD5: 60f9e6ee651c93b64e31830f8b55e1c2
  • BLAKE2b-256: 2e76ae5c2421dffbc675cd6626e9f5e5351d2ad961e44105057b96385568c48e


File details

Details for the file flexneuart-1.2.6-py2.py3-none-any.whl.

File metadata

  • Download URL: flexneuart-1.2.6-py2.py3-none-any.whl
  • Upload date:
  • Size: 63.3 MB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.16

File hashes

Hashes for flexneuart-1.2.6-py2.py3-none-any.whl
  • SHA256: eac8d36f1c6a87f89ccfc32c89011c7ed4480d6047f6d7b2253c58ec40329739
  • MD5: c04096ffc6484a1988f530471ca6e74b
  • BLAKE2b-256: ea59a78add4a2e5a6bbcb2d3d86812cfcbe8a76edd2c2614aa674405e7f51ffa

