
NBoost is a scalable, search-api-boosting platform for developing and deploying automated SOTA models to produce more relevant search results.


NBoost


What is it

NBoost is a scalable, search-api-boosting proxy for developing and deploying state-of-the-art models to improve the relevance of search results.

NBoost leverages fine-tuned models to produce domain-specific neural search engines. The platform can also improve other downstream tasks that require ranked input, such as question answering.

Contact us to request domain-specific models or to leave feedback.

Overview

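Fine-tuned Model                         | Domain       | Search Boost[4]        | Scoring Speed
---------------------------------------- | ------------ | ---------------------- | ---------------
bert-base-uncased-msmarco (default)[1]   | Bing queries | 0.301 vs 0.173 (1.74x) | ~5 ms/rank[3]
albert-tiny-msmarco (coming soon)        | -            | -                      | ~0.7 ms/rank[3]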

To download and run NBoost with one of these fine-tuned models, run:

nboost --model_dir=<model> --ext_host=<es_host>

[4] Mean Reciprocal Rank (MRR) compared to BM25, the default ranking function in Elasticsearch, when reranking the top 100 results.
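Footnote [4] reports Mean Reciprocal Rank: for each query, take 1/rank of the first relevant result (0 if no result is relevant), then average over all queries. The sketch below is a minimal, standard MRR computation, not NBoost's own evaluation code; the document IDs and the two toy runs are made up for illustration. The last line shows the arithmetic behind the relative boost in the table.

```python
from typing import Iterable, Sequence, Set, Tuple


def reciprocal_rank(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """Return 1/rank of the first relevant hit, or 0.0 if none is relevant."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average the reciprocal rank over (ranked_ids, relevant_ids) pairs, one per query."""
    scores = [reciprocal_rank(ranked, relevant) for ranked, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical rankings for two queries, before and after reranking.
bm25_runs = [(["d3", "d1", "d7"], {"d1"}), (["d5", "d2", "d9"], {"d9"})]
reranked_runs = [(["d1", "d3", "d7"], {"d1"}), (["d9", "d5", "d2"], {"d9"})]

print(mean_reciprocal_rank(bm25_runs))      # (1/2 + 1/3) / 2
print(mean_reciprocal_rank(reranked_runs))  # (1 + 1) / 2
print(round(0.301 / 0.173, 2))              # relative boost from the table: 1.74
```

Because MRR only credits the first relevant hit, moving a single relevant passage from rank 2 to rank 1 doubles that query's score, which is why reranking a larger candidate set can lift MRR noticeably.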

Install NBoost

There are two ways to get NBoost: as a Docker image or as a PyPI package. For cloud users, we highly recommend using NBoost via Docker.

🚸 TensorFlow and PyTorch are not part of the "barebone" NBoost installation. Depending on your model, you may have to install them in advance.

To install NBoost, follow the table below.

Dependency | 🐳 Docker                       | 📦 PyPI
---------- | ------------------------------ | -------------------------
None       | koursaros/nboost:latest-alpine | pip install nboost
PyTorch    | koursaros/nboost:latest-torch  | pip install nboost[torch]
TensorFlow | koursaros/nboost:latest-tf     | pip install nboost[tf]
All        | koursaros/nboost:latest-all    | pip install nboost[all]
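Which extra you need depends on the framework your model uses. A small sketch, assuming you already know whether your model needs PyTorch, TensorFlow, or both: it checks which backends are importable in the current environment and maps the need to one of the pip extras in the install table. The helper names are hypothetical and not part of NBoost.

```python
import importlib.util
from typing import Dict


def installed_backends() -> Dict[str, bool]:
    """Report which optional deep-learning backends can be found (without importing them)."""
    return {
        "torch": importlib.util.find_spec("torch") is not None,
        "tensorflow": importlib.util.find_spec("tensorflow") is not None,
    }


def suggested_extra(needs_torch: bool, needs_tf: bool) -> str:
    """Map the backends a model needs to the pip requirement from the install table."""
    if needs_torch and needs_tf:
        return "nboost[all]"
    if needs_torch:
        return "nboost[torch]"
    if needs_tf:
        return "nboost[tf]"
    return "nboost"


if __name__ == "__main__":
    print(installed_backends())
    print("pip install", suggested_extra(needs_torch=True, needs_tf=False))
```

Some shells (zsh, for example) treat square brackets as glob patterns, so quote the requirement when typing it by hand: pip install "nboost[torch]".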

However you install it, if you see the following message after running $ nboost --help or $ docker run koursaros/nboost --help, you are ready to go!

[Screenshot: successful installation of NBoost]

Getting Started

Preliminaries

Before we start, let me first introduce the most important concept, the Proxy.

📡 The Proxy

The proxy object is the core of NBoost. It is essentially a wrapper that serves the model, and it understands incoming messages from specific search APIs (e.g. Elasticsearch). When the proxy receives a search request, it increases the number of results the client asked for, so that the model can rerank a larger set and return the (hopefully) better results. For instance, if a client asks Elasticsearch for 10 results for the query "brown dogs", the proxy may increase the request to 100 results and return the best ten to the client.
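The widen-then-rerank flow described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not NBoost's implementation: the client request is modeled as a dict of URI query parameters, the widened size of 100 comes from the example, and overlap_score is a toy word-overlap function standing in for a fine-tuned neural model.

```python
from typing import Callable, Dict, List


def widen_request(params: Dict[str, str], topk: int = 100) -> Dict[str, str]:
    """Return a copy of the client's query params that asks the search API for `topk` hits."""
    widened = dict(params)
    widened["size"] = str(topk)
    return widened


def rerank(query: str, hits: List[str], score: Callable[[str, str], float], size: int) -> List[str]:
    """Score every (query, hit) pair and keep the `size` best hits for the client."""
    ranked = sorted(hits, key=lambda hit: score(query, hit), reverse=True)
    return ranked[:size]


def overlap_score(query: str, passage: str) -> float:
    """Toy stand-in for a neural model: count query words that appear in the passage."""
    words = set(passage.lower().split())
    return float(sum(word in words for word in query.lower().split()))


client_params = {"q": "brown dogs", "size": "10"}
upstream_params = widen_request(client_params)  # forwarded upstream with size=100
hits = ["cats sleep", "brown dogs bark", "dogs run", "brown bears"]
print(rerank("brown dogs", hits, overlap_score, size=2))  # ['brown dogs bark', 'dogs run']
```

Keeping the widening step separate from the scoring function makes the trade-off explicit: a larger candidate set gives the model more chances to surface a relevant hit, at the cost of scoring more pairs per request.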

Setting up a Neural Proxy for Elasticsearch in 3 minutes

In this example, we will set up a proxy that sits between the client and Elasticsearch and boosts the results!

Setting up an Elasticsearch Server

🔔 If you already have an Elasticsearch server, you can move on to the next step!

If you don't have Elasticsearch, not to worry! You can set up a local Elasticsearch cluster using Docker. First, get the Elasticsearch image by running:

docker pull elasticsearch:7.4.2

Once you have the image, you can run an Elasticsearch server via:

docker run -d -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:7.4.2
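The container can take a little while before it accepts requests, so it can help to wait for it before launching the proxy. A minimal sketch using only the Python standard library; the function name, retry count, and interval are illustrative choices, and http://localhost:9200 assumes the port mapping from the docker run command above.

```python
import http.client
import time
import urllib.request


def wait_for_http(url: str, attempts: int = 30, interval: float = 2.0) -> bool:
    """Poll `url` until it answers with HTTP 200, giving up after `attempts` tries."""
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, reset, timed out, or not yet a valid HTTP reply:
            # the server is probably still starting, so retry.
            pass
        if attempt < attempts - 1:
            time.sleep(interval)
    return False
```

For example, call wait_for_http("http://localhost:9200") from a setup script and start nboost only once it returns True.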

Deploying the proxy

Now we're ready to deploy our Neural Proxy! Simply run:

nboost --ext_host localhost --ext_port 9200

📢 The --ext_host and --ext_port values should match the host and port of the Elasticsearch server above!

If you see the message LISTENING: <host>:<port>, then we're good to go.

Indexing some data

From here on out, there is no need to talk to the Elasticsearch server directly: you can send index and stats requests through the proxy as well, and only search requests will be altered. For demonstration purposes, we will index a set of passages about open-source software through NBoost. You can add the index to your Elasticsearch server by running:

nboost-tutorial opensource --host localhost --port 8000
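Because only search requests are altered, a proxy has to decide per request whether to rerank or simply forward. The rule below is a hypothetical sketch of such a routing check, not NBoost's actual logic: it treats GET or POST requests whose path ends in _search (Elasticsearch's search endpoint) as search requests and lets everything else, such as index and stats requests, pass through unchanged.

```python
from urllib.parse import urlsplit


def should_rerank(method: str, url: str) -> bool:
    """Hypothetical routing rule: rerank search requests, forward all others untouched."""
    path = urlsplit(url).path.rstrip("/")
    return method.upper() in {"GET", "POST"} and path.split("/")[-1] == "_search"
```

With this rule, GET /opensource/_search?q=... is reranked, while PUT /opensource/_doc/1 and GET /_stats are forwarded as-is.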

Now let's test it out! Hit the proxy with:

curl "http://localhost:8000/opensource/_search?q=passage:what%20is%20mozilla%20firefox&pretty&size=2"

If the Elasticsearch result contains the _nboost tag, congratulations, it's working!
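The same check can be scripted. This sketch builds the URI search request from the curl example and looks for a _nboost key anywhere in the decoded JSON response. The helper names are hypothetical, and the code assumes only what the text states, namely that a boosted response carries a _nboost tag, not any particular response layout.

```python
import json
import urllib.request
from typing import Any
from urllib.parse import quote, urlencode


def search_url(host: str, port: int, index: str, query: str, size: int) -> str:
    """Build a URI search request like the curl example (spaces encoded as %20)."""
    params = urlencode({"q": query, "pretty": "true", "size": size}, quote_via=quote)
    return f"http://{host}:{port}/{index}/_search?{params}"


def has_nboost_tag(payload: Any) -> bool:
    """Return True if a `_nboost` key appears anywhere in the decoded JSON."""
    if isinstance(payload, dict):
        return "_nboost" in payload or any(has_nboost_tag(v) for v in payload.values())
    if isinstance(payload, list):
        return any(has_nboost_tag(item) for item in payload)
    return False


def check_proxy(host: str = "localhost", port: int = 8000) -> bool:
    """Query the opensource index through the proxy and report whether it was boosted."""
    url = search_url(host, port, "opensource", "passage:what is mozilla firefox", 2)
    with urllib.request.urlopen(url, timeout=10) as response:
        return has_nboost_tag(json.load(response))
```

Run check_proxy() while the proxy is listening on port 8000; it returns True when the response carries the tag.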


Elastic made easy

To increase the number of parallel proxies, simply increase --workers:

🚧 Under construction.

Deploying a proxy via Docker Swarm/Kubernetes

🚧 Under construction.

Take-home messages

Let's recap what we have learned.

  • NBoost is a result-boosting proxy with four fundamental components: model, server, db, and codex.
  • One can increase the number of concurrent proxies with --workers or by deploying more containers.
  • NBoost can be deployed with an orchestration engine that coordinates load balancing. It supports Kubernetes, Docker Swarm, or its built-in multi-process/multi-thread solution.

Documentation

ReadTheDocs

The official NBoost documentation is hosted on nboost.readthedocs.io. It is automatically built, updated and archived on every new release.

Tutorial

🚧 Under construction.

Benchmark

We have set up /benchmarks to track network and model latency across NBoost versions.

Contributing

Contributions are greatly appreciated! You can make corrections or updates and commit them to NBoost. Here are the steps:

  1. Create a new branch, say fix-nboost-typo-1
  2. Fix/improve the codebase
  3. Commit the changes. Note that the commit message must follow the naming style, e.g. Fix/model-bert: improve the readability and move sections
  4. Make a pull request. Note that the pull request must follow the same naming style. It can simply be one of your commit messages, copied verbatim, e.g. Fix/model-bert: improve the readability and move sections
  5. Submit your pull request and wait for all checks to pass (usually 10 minutes):
    • Coding style
    • Commit and PR styles check
    • All unit tests
  6. Request reviews from one of the developers from our core team.
  7. Merge!

More details can be found in the contributor guidelines.
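Step 5 runs a commit and PR style check. As a rough local pre-check, the regular expression below encodes one hypothetical reading of the naming style, inferred only from the Fix/model-bert example above (a capitalized type, a slash, a lowercase scope, a colon, then a description). The authoritative rules are in the contributor guidelines, so treat this pattern as an assumption.

```python
import re

# Hypothetical approximation inferred from the single "Fix/model-bert: ..." example;
# the real naming rules live in the contributor guidelines.
COMMIT_STYLE = re.compile(r"^[A-Z][a-z]+/[a-z0-9][a-z0-9-]*: \S.*$")


def follows_naming_style(message: str) -> bool:
    """Check the first line of a commit message or PR title against COMMIT_STYLE."""
    lines = message.strip().splitlines()
    first_line = lines[0] if lines else ""
    return COMMIT_STYLE.match(first_line) is not None
```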

Citing NBoost

If you use NBoost in an academic paper, we would love to be cited. Here are the two ways of citing NBoost:

  1. \footnote{https://github.com/koursaros-ai/nboost}
    
  2. @misc{koursaros2019NBoost,
      title={NBoost: Neural Boosting Search Results},
      author={Thienes, Cole and Pertschuk, Jack},
      howpublished={\url{https://github.com/koursaros-ai/nboost}},
      year={2019}
    }
    

Footnotes

[1] https://github.com/nyu-dl/dl4marco-bert
[2] https://github.com/huggingface/transformers
[3] Milliseconds to rerank each hit, measured on an NVIDIA T4 GPU.

License

If you have downloaded a copy of the NBoost binary or source code, please note that the NBoost binary and source code are both licensed under the Apache License, Version 2.0.

Koursaros AI is excited to bring this open source software to the community.
Copyright (C) 2019. All rights reserved.
