nboost

NBoost is a scalable, search-api-boosting platform for developing and deploying automated SOTA models for more relevant search results.

NBoost

What is it

⚡NBoost is a scalable, search-engine-boosting platform for developing and deploying state-of-the-art models to improve the relevance of search results.

NBoost leverages fine-tuned models to produce domain-specific neural search engines. The platform can also improve other downstream tasks requiring ranked input, such as question answering.

Overview

The workflow of NBoost is relatively simple. Suppose the search server in this case is Elasticsearch.

In a conventional search request, the user asks for 10 results from Elasticsearch and gets 10 back from the server.

In an NBoost search request, the user asks for 10 results from the proxy. The proxy then asks Elasticsearch for 100 results. When the server returns 100 results, the model picks the best 10 and returns them to the user.
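The over-fetch-then-rerank flow described above can be sketched in a few lines of Python. This is a minimal illustration, not NBoost's implementation: `rerank_search`, `toy_search`, and `toy_score` are hypothetical names, and the word-overlap scorer merely stands in for a fine-tuned model.

```python
from typing import Callable, List

def rerank_search(
    query: str,
    search: Callable[[str, int], List[str]],
    score: Callable[[str, str], float],
    topk: int = 10,
    multiplier: int = 10,
) -> List[str]:
    """Fetch topk * multiplier candidates, then keep the topk best by score."""
    candidates = search(query, topk * multiplier)
    # sorted() is stable, so ties keep the upstream order.
    ranked = sorted(candidates, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:topk]

# Hypothetical stand-ins for the upstream server and the model.
CORPUS = [
    "cats are small",
    "brown bears live in forests",
    "brown dogs love to play",
    "dogs need daily walks",
    "the weather is nice",
]

def toy_search(query: str, size: int) -> List[str]:
    """Pretend upstream search: return the first `size` documents."""
    return CORPUS[:size]

def toy_score(query: str, doc: str) -> float:
    """Count how many query words appear in the document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

results = rerank_search("brown dogs", toy_search, toy_score, topk=2, multiplier=2)
print(results)
```

Here the client asks for 2 results, 4 candidates are fetched, and the scorer moves the passage matching both query words to the top.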

Benchmarks

| Fine-tuned Models | Domain | Search Boost^[4] | Speed |
| --- | --- | --- | --- |
| `bert-base-uncased-msmarco` (default)^[1] | Bing queries | 0.302 vs 0.173 (1.8x) | 301 ms/query^[3] |
| `biobert-base-uncased-pubmed` (coming soon) | medicine | - | - |
| `bert-tiny-uncased` (coming soon) | - | - | - |
| `albert-tiny-uncased-msmarco` (coming soon) | - | - | ~50 ms/query^[3] |

^[4] MRR compared to BM25, the default for Elasticsearch. Reranking top 50.

Using pre-trained language understanding models, you can boost search relevance metrics by nearly 2x compared to text search alone, with little to no extra configuration. There is often a tradeoff between model accuracy and speed, so we benchmark both above. This leaderboard is a work in progress, and we intend to release more cutting-edge models!
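The Search Boost column reports mean reciprocal rank (MRR): for each query, take 1 divided by the rank of the first relevant result, then average over all queries. The sketch below computes it on made-up rankings. The document IDs and relevance judgments are invented for illustration and are unrelated to the benchmark numbers above.

```python
from typing import Dict, Sequence, Set

def reciprocal_rank(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """Return 1 / rank of the first relevant document, or 0.0 if none is found."""
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / position
    return 0.0

def mean_reciprocal_rank(
    rankings: Dict[str, Sequence[str]],
    relevant: Dict[str, Set[str]],
) -> float:
    """Average the reciprocal rank over every query in `rankings`."""
    if not rankings:
        return 0.0
    total = sum(
        reciprocal_rank(ranked, relevant.get(query_id, set()))
        for query_id, ranked in rankings.items()
    )
    return total / len(rankings)

# Toy data: one relevant document per query.
RELEVANT = {"q1": {"d3"}, "q2": {"d7"}}
BASELINE = {"q1": ["d3", "d1", "d2"], "q2": ["d4", "d5", "d6", "d7"]}
RERANKED = {"q1": ["d3", "d2", "d1"], "q2": ["d4", "d7", "d5", "d6"]}

baseline_mrr = mean_reciprocal_rank(BASELINE, RELEVANT)  # (1/1 + 1/4) / 2
reranked_mrr = mean_reciprocal_rank(RERANKED, RELEVANT)  # (1/1 + 1/2) / 2
print(baseline_mrr, reranked_mrr)
```

Moving the relevant document for the second query from rank 4 to rank 2 is what raises the score in this toy case.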

Install NBoost

There are two ways to get NBoost: as a Docker image or as a PyPI package. For cloud users, we highly recommend using NBoost via Docker.

🚸 Depending on your model, you should install the respective TensorFlow or PyTorch dependencies. We package them below.

For installing NBoost, follow the table below.

| Dependency | 🐳 Docker | 📦 PyPI |
| --- | --- | --- |
| - | `koursaros/nboost:latest-alpine` | `pip install nboost` |
| PyTorch | `koursaros/nboost:latest-torch` | `pip install nboost[torch]` |
| TensorFlow | `koursaros/nboost:latest-tf` | `pip install nboost[tf]` |
| All | `koursaros/nboost:latest-all` | `pip install nboost[all]` |

However you install it, if running $ nboost --help or $ docker run koursaros/nboost --help prints the help message, then you are ready to go!
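For a PyPI install, the same check can be made from Python. This is a small sketch using only the standard library (`importlib.metadata`, Python 3.8+). The helper name `installed_version` is hypothetical, and it only confirms that the distribution is present, not that the optional TensorFlow or PyTorch extras work.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None

found = installed_version("nboost")
print("nboost is not installed" if found is None else f"nboost {found} is installed")
```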

Getting Started

The Proxy
Setting up a Neural Proxy for Elasticsearch in 3 minutes
Elastic made easy
Deploying a distributed proxy via Docker Swarm/Kubernetes
Take-home messages

📡The Proxy

The Proxy is the core of NBoost. It is essentially a wrapper that serves the model. It understands incoming messages from specific search APIs (e.g., Elasticsearch). When the proxy receives a message, it increases the number of results the client is asking for so that the model can rerank a larger set and return the (hopefully) better results.

For instance, if a client asks for 10 results to do with the query "brown dogs" from Elasticsearch, then the proxy may increase the results request to 100 and filter down the best ten results for the client.
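How the proxy widens a request can be pictured with Elasticsearch's URI search, where the `size` query parameter sets the number of hits. The following is a sketch under assumptions: `expand_size` and its `multiplier` argument are hypothetical names, and NBoost's actual request handling may differ.

```python
from typing import Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def expand_size(url: str, multiplier: int = 10, default_size: int = 10) -> Tuple[str, int]:
    """Multiply the `size` parameter of a search URL.

    Returns the rewritten URL and the size the client originally asked for,
    so the caller knows how many reranked hits to send back.
    """
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    # Elasticsearch returns 10 hits when no size is given.
    requested = int(params.get("size", default_size))
    params["size"] = str(requested * multiplier)
    rewritten = urlunsplit(parts._replace(query=urlencode(params)))
    return rewritten, requested

new_url, requested = expand_size(
    "http://localhost:9200/dogs/_search?q=brown+dogs&size=10", multiplier=10
)
print(requested, new_url)
```

The original size is returned alongside the rewritten URL because the proxy still has to trim the reranked list back to what the client asked for.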

Setting up a Neural Proxy for Elasticsearch in 3 minutes

In this example we will set up a proxy to sit in between the client and Elasticsearch and boost the results!

Installing NBoost with TensorFlow

Make sure you have TensorFlow 1.14-1.15 (with CUDA to run on GPU) to support the modelling functionality.

pip3 install nboost[tf]

Setting up an Elasticsearch Server

🔔 If you already have an Elasticsearch server, you can move on to the next step!

If you don't have Elasticsearch, not to worry! You can set up a local Elasticsearch cluster using Docker. First, get the ES image by running:

docker pull elasticsearch:7.4.2

Once you have the image, you can run an Elasticsearch server via:

docker run -d -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:7.4.2
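A freshly started container may take a little while before it accepts connections, so it can help to poll the server before pointing the proxy at it. The snippet below is a sketch, not part of NBoost: `elasticsearch_is_up` and `wait_until` are hypothetical helpers, and the URL assumes the port mapping from the command above with no authentication enabled.

```python
import time
from typing import Callable
from urllib.error import URLError
from urllib.request import urlopen

def elasticsearch_is_up(url: str = "http://localhost:9200") -> bool:
    """Return True if a GET on the server root answers with HTTP 200."""
    try:
        with urlopen(url, timeout=2) as response:
            return response.status == 200
    except (URLError, OSError):
        return False

def wait_until(probe: Callable[[], bool], attempts: int = 30, delay: float = 1.0) -> bool:
    """Call `probe` up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay)
    return False

# Usage once the container has been started:
# ready = wait_until(elasticsearch_is_up)
```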

Deploying the proxy

Now we're ready to deploy our Neural Proxy! To do this, run:

nboost --uhost localhost --uport 9200 --field passage

📢 The --uhost and --uport values should match the Elasticsearch server above! They are short for upstream-host and upstream-port (referring to the upstream server).

If you get this message: Listening: <host>:<port>, then we're good to go!

Indexing some data

The proxy is set up so that there is no need to talk to the server directly from here on out. You can send index requests and stats requests through it, but only search requests will be altered. For demonstration purposes, we will index a set of travel passages through NBoost. You can add the index to your Elasticsearch server by running:

nboost-tutorial Travel --host localhost --port 8000

Now let's test it out! From a terminal, run:

curl "http://localhost:8000/travel/_search?pretty&q=passage:vegas&size=2"

If the Elasticsearch result has the _nboost tag in it, congratulations, it's working!

What just happened? You asked for two results from Elasticsearch having to do with "vegas". The proxy intercepted this request, asked Elasticsearch for 10 results, and the model picked the best two. Magic! 🔮 (statistics)
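The same request can be issued from Python with the standard library. This is a sketch under assumptions: the helper names are hypothetical, and because the exact position of the `_nboost` tag in the response is not specified here, `contains_key` simply searches the whole JSON document for it.

```python
import json
from typing import Any
from urllib.parse import urlencode
from urllib.request import urlopen

def build_search_url(host: str, port: int, index: str, field: str, term: str, size: int) -> str:
    """Build an Elasticsearch URI-search URL such as /travel/_search?q=passage:vegas."""
    query = urlencode({"q": f"{field}:{term}", "size": size})
    return f"http://{host}:{port}/{index}/_search?{query}"

def fetch_json(url: str) -> Any:
    """GET the URL and decode the JSON body."""
    with urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))

def contains_key(node: Any, key: str) -> bool:
    """Return True if `key` appears anywhere in a nested JSON structure."""
    if isinstance(node, dict):
        return key in node or any(contains_key(value, key) for value in node.values())
    if isinstance(node, list):
        return any(contains_key(item, key) for item in node)
    return False

url = build_search_url("localhost", 8000, "travel", "passage", "vegas", 2)
print(url)

# With the proxy running and the tutorial data indexed:
# print(contains_key(fetch_json(url), "_nboost"))
```

`urlencode` percent-encodes the colon in `passage:vegas` as `%3A`, which the server decodes back to the same query.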

Elastic made easy

To increase the number of parallel proxies, simply increase --workers. For a more robust deployment approach, you can distribute the proxy via Docker Swarm or Kubernetes.

Deploying a proxy via Docker Swarm/Kubernetes

🚧 Swarm yaml/Helm chart under construction...

Documentation

The official NBoost documentation is hosted on nboost.readthedocs.io. It is automatically built, updated and archived on every new release.

Tutorials

🚧 Under construction.

Contributing

Contributions are greatly appreciated! You can make corrections or updates and commit them to NBoost. Here are the steps:

1. Create a new branch, say fix-nboost-typo-1.
2. Fix/improve the codebase.
3. Commit the changes. Note that the commit message must follow the naming style, say Fix/model-bert: improve the readability and move sections.
4. Make a pull request. Note that the pull request must follow the naming style. It can simply be one of your commit messages, e.g. Fix/model-bert: improve the readability and move sections.
5. Submit your pull request and wait for all checks to pass (usually 10 minutes):
   - Coding style
   - Commit and PR styles check
   - All unit tests
6. Request reviews from one of the developers on our core team.
7. Merge!

More details can be found in the contributor guidelines.

Citing NBoost

If you use NBoost in an academic paper, we would love to be cited. Here are the two ways of citing NBoost:

\footnote{https://github.com/koursaros-ai/nboost}

@misc{koursaros2019NBoost,
  title={NBoost: Neural Boosting Search Results},
  author={Thienes, Cole and Pertschuk, Jack},
  howpublished={\url{https://github.com/koursaros-ai/nboost}},
  year={2019}
}

Footnotes

^[1] https://github.com/nyu-dl/dl4marco-bert
^[2] https://github.com/huggingface/transformers
^[3] ms for reranking each hit. On nvidia T4 GPU.

License

If you have downloaded a copy of the NBoost binary or source code, please note that the NBoost binary and source code are both licensed under the Apache License, Version 2.0.

Koursaros AI is excited to bring this open source software to the community.

Copyright (C) 2019. All rights reserved.
