
State-of-the-art Retrieval/Search engine models, including ElasticSearch, ChromaDB, Milvus, and PrimeQA

Project description

primeqa

Repository for (almost) *all* your document search needs.

Part of the Prime Repository for State-of-the-Art Multilingual Question Answering Research and Development.

DocUVerse is a public open-source repository that enables researchers and developers to quickly experiment with various search engines (such as ElasticSearch, ChromaDB, Milvus, PrimeQA, and FAISS) in both direct search and reranking scenarios. With DocUVerse, a researcher can replicate the experiments outlined in a paper published at a recent NLP conference, and can also download pre-trained models from an online repository and run them on custom data. DocUVerse is built on top of the Transformers, PrimeQA, and Elasticsearch toolkits and uses datasets and models that are directly downloadable.

Design

The snippets below show how to run a query search against an existing engine, and how to ingest a corpus and then run an evaluation search over it.

from docuverse import SearchEngine, SearchQueries

# Test an existing engine
engine = SearchEngine(config="experiments/sap/elastic_v2/setup.yaml")
queries = SearchQueries(data="benchmark_v2.csv")

results = engine.search(queries)
scores = engine.compute_score(queries, results)
print(f"Results:\n{scores.to_string()}")
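
The README does not say which metrics `compute_score` reports. As a rough illustration of the kind of rank-based measure an evaluation step like this might compute, the sketch below implements a match-at-k score in plain Python. The function name `match_at_k` and the data layout (dictionaries of document ids keyed by query id) are hypothetical and are not part of the docuverse API.

```python
from typing import Dict, List, Sequence, Set


def match_at_k(
    ranked_ids: Dict[str, List[str]],
    relevant_ids: Dict[str, Set[str]],
    ks: Sequence[int] = (1, 3, 5),
) -> Dict[int, float]:
    """Fraction of queries with at least one relevant document in the top k.

    Hypothetical helper for illustration only; not the docuverse implementation.
    """
    scores: Dict[int, float] = {}
    for k in ks:
        # A query counts as a hit if any gold document appears in its top k.
        hits = sum(
            1
            for qid, ranking in ranked_ids.items()
            if relevant_ids.get(qid, set()) & set(ranking[:k])
        )
        scores[k] = hits / len(ranked_ids) if ranked_ids else 0.0
    return scores


if __name__ == "__main__":
    # Toy data: q1 has its gold document at rank 2, q2 has none retrieved.
    ranked = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d9", "d4"]}
    gold = {"q1": {"d1"}, "q2": {"d5"}}
    print(match_at_k(ranked, gold, ks=(1, 3)))
```

On the toy data, no query has a relevant document at rank 1 and one of the two queries has one within the top 3.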

Ingesting a new corpus (create an index for a specific engine) should be just as easy:

from docuverse import SearchEngine, SearchCorpus, SearchQueries

# Load the passages and build an index for the configured engine
corpus = SearchCorpus(filepaths="experiments/claspnq/passages.jsonl")
engine = SearchEngine(config="experiments/sap/elastic_v2/setup.yaml")
engine.ingest(corpus, max_doc_length=512, stride=100, title_handling="all",
              index="my_new_index")

# Search the new index and score the results
queries = SearchQueries(data="ClaspNQ.jsonl")
results = engine.search(queries)
scores = engine.compute_score(queries, results)
print(f"Results:\n{scores.to_string()}")
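
The `max_doc_length` and `stride` arguments suggest that long documents are split into overlapping windows before indexing. The README does not document the exact behavior, so the following is only a sketch of one common interpretation, in which each window holds at most `max_doc_length` tokens and consecutive windows overlap by `stride` tokens. The helper `sliding_windows` is hypothetical, and docuverse may define `stride` differently (for example, as the step between window starts).

```python
from typing import List


def sliding_windows(tokens: List[str], max_doc_length: int, stride: int) -> List[List[str]]:
    """Split tokens into windows of at most max_doc_length tokens.

    Assumption (not confirmed by the README): consecutive windows
    overlap by `stride` tokens.
    """
    if max_doc_length <= 0:
        raise ValueError("max_doc_length must be positive")
    if not 0 <= stride < max_doc_length:
        raise ValueError("stride must satisfy 0 <= stride < max_doc_length")
    if not tokens:
        return []

    step = max_doc_length - stride  # distance between window starts
    windows: List[List[str]] = []
    start = 0
    while True:
        windows.append(tokens[start:start + max_doc_length])
        # Stop once the current window reaches the end of the document.
        if start + max_doc_length >= len(tokens):
            break
        start += step
    return windows


if __name__ == "__main__":
    doc = [f"tok{i}" for i in range(1200)]
    pieces = sliding_windows(doc, max_doc_length=512, stride=100)
    print([len(p) for p in pieces])
```

Under this interpretation, a 1200-token document with the settings from the snippet above yields three windows starting at tokens 0, 412, and 824.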

✔️ Getting Started

Installation

Installation doc

# cd to project root

# If you want to run on GPU make sure to install torch appropriately

# E.g. for torch 1.11 + CUDA 11.3:
pip install 'torch~=1.11.0' --extra-index-url https://download.pytorch.org/whl/cu113

# Install as editable (-e) or non-editable using pip, with extras (e.g. tests) as desired
# Example installation commands:

# Minimal install (non-editable)
pip install .

# GPU support
pip install .[gpu]

# Full install (editable)
pip install -e .[all]

Please note that dependencies (specified in setup.py) are pinned to provide a stable experience. When installing from source, these pins can be modified, but doing so is not officially supported.

Note: in many environments, conda-forge based faiss libraries perform substantially better than the default ones installed with pip. To install faiss libraries from conda-forge, use the following steps:

  • Create and activate a conda environment
  • Install the faiss libraries with the following command:

conda install -c conda-forge faiss=1.7.0 faiss-gpu=1.7.0

  • In setup.py, remove the faiss-related lines:
"faiss-cpu~=1.7.2": ["install", "gpu"],
"faiss-gpu~=1.7.2": ["gpu"],
  • Continue with the pip install commands as described above.
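
After following the steps above, it can be useful to confirm that a faiss build is importable in the active environment before running anything that depends on it. The sketch below uses only the standard library and checks for the `faiss` module without importing it. The helper name `module_available` is hypothetical, and the check says nothing about whether the CPU or GPU build was installed.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_available("faiss"):
        print("faiss is importable in this environment")
    else:
        print("faiss was not found; check the conda or pip installation")
```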

💬 Blog Posts

Members of the open source community have written several blog posts on how they use PrimeQA for their needs. Read some of them:

  1. PrimeQA and GPT 3
  2. Enterprise search with PrimeQA
  3. A search engine for Trivia geeks

🧪 Unit Tests

Testing doc

To run the unit tests you first need to install PrimeQA. Make sure to install with the [tests] or [all] extras from pip.

From there you can run the tests via pytest, for example:

pytest --cov PrimeQA --cov-config .coveragerc tests/

For more information, see:

🔭 Learn more

| Section | Description |
| --- | --- |
| 📒 Documentation | Full API documentation and tutorials |
| 📓 Tutorials: Jupyter Notebooks | Notebooks to get started on QA tasks |
| 🤗 Model sharing and uploading | Upload and share your fine-tuned models with the community |
| Pull Request | PrimeQA Pull Request |
| 📄 Generate Documentation | How Documentation works |

❤️ PrimeQA collaborators include


Built Distribution

docuverse-0.0.8-py3-none-any.whl (105.0 kB view details)

Uploaded Python 3

File details

Details for the file docuverse-0.0.8-py3-none-any.whl.

File metadata

  • Download URL: docuverse-0.0.8-py3-none-any.whl
  • Upload date:
  • Size: 105.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.14

File hashes

Hashes for docuverse-0.0.8-py3-none-any.whl
Algorithm Hash digest
SHA256 7ac11af2b4152b245e3cd2b2979421fdb56e3be35a00dbc524dbc95667a8c2fe
MD5 d5c412a6f80ebb798ea4cdd058d1a77d
BLAKE2b-256 fbbbfc4cd4c61b9d7305639c6858efe1c81b0630903bb7e027deaae7e3c712f2
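
The SHA256 digest listed above can be used to check a downloaded copy of the wheel. The following is a minimal sketch using only the standard library; the helper names `sha256_of` and `verify_wheel` are hypothetical, and the script assumes the wheel has been saved in the current directory under its published file name.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed for docuverse-0.0.8-py3-none-any.whl
EXPECTED_SHA256 = "7ac11af2b4152b245e3cd2b2979421fdb56e3be35a00dbc524dbc95667a8c2fe"


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    wheel = Path("docuverse-0.0.8-py3-none-any.whl")  # assumed local path
    if wheel.exists():
        print("hash matches" if verify_wheel(wheel) else "hash MISMATCH")
    else:
        print(f"{wheel} not found; download it first")
```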
