State-of-the-art Retrieval/Search engine models, including ElasticSearch, ChromaDB, Milvus, and PrimeQA

Project description

primeqa

Repository for (almost) *all* your document search needs.

Part of the Prime Repository for State-of-the-Art Multilingual Question Answering Research and Development.

DocUVerse is a public open-source repository that enables researchers and developers to quickly experiment with various search engines (such as ElasticSearch, ChromaDB, Milvus, PrimeQA, and FAISS) in both direct search and reranking scenarios. With DocUVerse, a researcher can replicate the experiments outlined in a paper published at the latest NLP conference, download pre-trained models from an online repository, and run them on their own custom data. DocUVerse is built on top of the Transformers, PrimeQA, and Elasticsearch toolkits and uses datasets and models that are directly downloadable.
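To make the difference between direct search and reranking concrete, here is a minimal, self-contained sketch. It does not use DocUVerse or any real search engine: the two scoring functions are toy stand-ins (token overlap for the first-stage engine, length-normalized overlap for the reranker), and every function name in it is hypothetical.

```python
from collections import Counter


def first_stage_score(query: str, passage: str) -> float:
    """Toy lexical score: count of query tokens matched in the passage.

    Stand-in for a real engine (assumption, not DocUVerse code).
    """
    query_tokens = Counter(query.lower().split())
    passage_tokens = Counter(passage.lower().split())
    return float(sum((query_tokens & passage_tokens).values()))


def rerank_score(query: str, passage: str) -> float:
    """Toy reranker: overlap normalized by passage length.

    Stand-in for a neural reranker (assumption, not DocUVerse code).
    """
    tokens = passage.lower().split()
    if not tokens:
        return 0.0
    return first_stage_score(query, passage) / len(tokens)


def search(query: str, corpus: list[str], top_k: int = 3) -> list[str]:
    """Direct search: return the top_k passages by first-stage score."""
    ranked = sorted(corpus, key=lambda p: first_stage_score(query, p), reverse=True)
    return ranked[:top_k]


def search_and_rerank(query: str, corpus: list[str], top_k: int = 3, final_k: int = 1) -> list[str]:
    """Reranking: retrieve top_k candidates, then reorder them with a second scorer."""
    candidates = search(query, corpus, top_k)
    reranked = sorted(candidates, key=lambda p: rerank_score(query, p), reverse=True)
    return reranked[:final_k]
```

In a real setup the first stage would be one of the engines listed above and the second stage a stronger but slower model applied only to the retrieved candidates.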

Design

The following code snippet shows how to ingest a new corpus (create an index for a specific engine), read the query file, run the search, compute the evaluation scores, and print them:

from docuverse import SearchEngine
engine = SearchEngine(config_or_path="data/clapnq_small/milvus-test.yaml")

# Read the ClapNQ dataset
data = engine.read_data() # or engine.read_data(engine.config.input_passages)
# Ingest the data
engine.ingest(data)

# Read the queries
queries = engine.read_questions() # or engine.read_questions(engine.config.input_queries)
# Run the retrieval
results = engine.search(queries)
# Evaluate the results
scores = engine.compute_score(queries, results)

# Print the evaluation results in a human-readable format.
print(f"Results:\n{scores}")
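The metrics that `compute_score` reports are not listed here. As an illustration of what evaluating retrieval results typically involves, the sketch below computes recall@k, a common retrieval metric, in plain Python. It is not DocUVerse's implementation; the function names and the dictionary-based input format are assumptions made for this example.

```python
def recall_at_k(ranked_ids: list[str], gold_ids: list[str], k: int) -> float:
    """Fraction of gold passage ids found among the top-k ranked ids."""
    gold = set(gold_ids)
    if not gold:
        return 0.0
    top_k = set(ranked_ids[:k])
    return len(top_k & gold) / len(gold)


def mean_recall_at_k(all_ranked: dict[str, list[str]],
                     all_gold: dict[str, list[str]],
                     k: int) -> float:
    """Average recall@k over all queries that have gold annotations.

    Both arguments are dicts keyed by query id (hypothetical format).
    A query with no retrieved results contributes a recall of 0.
    """
    if not all_gold:
        return 0.0
    total = sum(
        recall_at_k(all_ranked.get(query_id, []), gold, k)
        for query_id, gold in all_gold.items()
    )
    return total / len(all_gold)
```

Reporting the metric at several cutoffs (for example k = 1, 3, 5) shows how quickly the relevant passages surface in the ranking.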

✔️ Getting Started

Installation

Installation doc

# cd to project root

# If you want to run on GPU make sure to install torch appropriately

# Install as editable (-e) or non-editable using pip, with extras (e.g. tests) as desired
# Example installation commands:

# Minimal install (non-editable)
pip install .

# Full install (editable)
pip install -e .

# Install milvus and/or elastic dependencies, and the pyizumo library (if you have access to it)
pip install -r requirements-milvus.txt
pip install -r requirements-elastic.txt
pip install -r requirements_extra.txt

Please note that dependencies (specified in setup.py) are pinned to provide a stable experience. When installing from source, these can be modified, but doing so is not officially supported.

🔭 Learn more (not yet working)

Section | Description
📒 Documentation | Start API documentation and tutorials
📓 Tutorials: Jupyter Notebooks | Notebooks to get started on QA tasks
🤗 Model sharing and uploading | Upload and share your fine-tuned models with the community
Pull Request | PrimeQA Pull Request
📄 Generate Documentation | How Documentation works

❤️ DocUVerse collaborators include: Sara Rosenthal, Parul Awasthy, Scott McCarley, Jatin Ganhotra, and Radu Florian.

Source Distributions

No source distribution files available for this release.

Built Distribution

docuverse-0.0.13-py3-none-any.whl (213.2 kB)

Uploaded Python 3

File details

Details for the file docuverse-0.0.13-py3-none-any.whl.

File metadata

  • Download URL: docuverse-0.0.13-py3-none-any.whl
  • Upload date:
  • Size: 213.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for docuverse-0.0.13-py3-none-any.whl
Algorithm | Hash digest
SHA256 | fddab8eed26afff1ccdbae05a47c9f0681cdca7a8a351e47d73478df6ea115c9
MD5 | 3047b712dcf1b0e73c58cf92f8c06b98
BLAKE2b-256 | 581047a0bd5bfeb3d89516702bf7a180ee6449e0e0434007b9b6f5877a7be37e
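If you download the wheel manually, you can compare its digest against the SHA256 value listed above before installing it. The following is a minimal sketch using the standard-library `hashlib` module; the helper names and the local file path in the usage note are assumptions for illustration.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_expected("docuverse-0.0.13-py3-none-any.whl", "<SHA256 from the list above>")` should return `True` for an intact download, assuming the wheel sits in the current directory.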
