Marqo

Neural search for humans.

A deep-learning powered, open-source search engine which seamlessly integrates with your applications, websites, and workflow.

Get started

  1. Marqo requires Docker. To install Docker, go to https://docs.docker.com/get-docker/
  2. Use Docker to run OpenSearch:
docker run -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" opensearchproject/opensearch:2.1.0
  3. Install the Marqo client:
pip install marqo
  4. Start indexing and searching! Let's look at a simple example below:
import marqo

mq = marqo.Client(url='https://localhost:9200', main_user="admin", main_password="admin")

mq.index("my-first-index").add_documents([
    {
        "Title": "The Travels of Marco Polo",
        "Description": "A 13th-century travelogue describing Polo's travels"
    }, 
    {
        "Title": "Extravehicular Mobility Unit (EMU)",
        "Description": "The EMU is a spacesuit that provides environmental protection, "
                       "mobility, life support, and communications for astronauts",
        "_id": "article_591"
    }]
)

results = mq.index("my-first-index").search(
    q="What is the best outfit to wear on the moon?"
)
  • mq is the client that wraps the Marqo API
  • add_documents() takes a list of documents, represented as Python dicts, for indexing
  • If the index does not already exist, add_documents() creates it with default settings. If it exists, the documents are added to it.
  • You can optionally set a document's ID with the special _id field. Otherwise, Marqo will generate one.
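If you need to know every document's ID before indexing (for example, to fetch a document later with get_document), one option is to assign IDs on the client instead of relying on generated ones. The sketch below is an illustration only: with_ids is a hypothetical helper, not part of the Marqo client, and the random UUID format is an assumption.

```python
import uuid


def with_ids(documents):
    """Return copies of the documents, adding a random _id where one is missing."""
    prepared = []
    for doc in documents:
        doc = dict(doc)  # shallow copy so the caller's dicts are not modified
        # setdefault keeps an explicit _id and only fills in missing ones
        doc.setdefault("_id", str(uuid.uuid4()))
        prepared.append(doc)
    return prepared


docs = with_ids([
    {"Title": "The Travels of Marco Polo"},
    {"Title": "Extravehicular Mobility Unit (EMU)", "_id": "article_591"},
])
ids = [doc["_id"] for doc in docs]
# With a running setup, the prepared documents can be indexed as usual:
# mq.index("my-first-index").add_documents(docs)
```

The explicit _id ("article_591") is left untouched, and the other document receives a fresh one that you can store alongside your own records.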

Printing the results should give output like this:

# let's print out the results:
import pprint
pprint.pprint(results)

{
    'hits': [
        {   
            'Title': 'Extravehicular Mobility Unit (EMU)',
            'Description': 'The EMU is a spacesuit that provides environmental protection, '
                           'mobility, life support, and communications for astronauts',
            '_highlights': {
                'Description': 'The EMU is a spacesuit that provides environmental protection, '
                               'mobility, life support, and communications for astronauts'
            },
            '_id': 'article_591',
            '_score': 1.2387788
        }, 
        {   
            'Title': 'The Travels of Marco Polo',
            'Description': "A 13th-century travelogue describing Polo's travels",
            '_highlights': {'Title': 'The Travels of Marco Polo'},
            '_id': 'e00d1a8d-894c-41a1-8e3b-d8b2a8fce12a',
            '_score': 1.2047464
        }
    ],
    'limit': 10,
    'processingTimeMs': 49,
    'query': 'What is the best outfit to wear on the moon?'
}
  • Each hit corresponds to a document that matched the search query
  • Hits are ordered from best match to worst match
  • limit is the maximum number of hits to be returned. This can be set as a parameter during search
  • Each hit has a _highlights field. This is the part of the document that best matched the query
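To make the response structure concrete, here is a minimal sketch that walks the hits and pulls out the ID, score, and highlighted field of each one. The function name summarize_hits is hypothetical, and the sample dict is a trimmed copy of the output shown above (the highlight text is abbreviated), so it runs without a server.

```python
def summarize_hits(results):
    """Return (id, score, highlighted field) for each hit, in ranked order."""
    summary = []
    for hit in results["hits"]:
        # _highlights maps a field name to the text that matched best
        highlights = hit.get("_highlights", {})
        field = next(iter(highlights), None)
        summary.append((hit["_id"], hit["_score"], field))
    return summary


# Trimmed copy of the example response
sample = {
    "hits": [
        {"_id": "article_591", "_score": 1.2387788,
         "_highlights": {"Description": "The EMU is a spacesuit"}},
        {"_id": "e00d1a8d-894c-41a1-8e3b-d8b2a8fce12a", "_score": 1.2047464,
         "_highlights": {"Title": "The Travels of Marco Polo"}},
    ],
    "limit": 10,
}

for doc_id, score, field in summarize_hits(sample):
    print(f"{doc_id}  score={score:.3f}  best field={field}")
```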

Other basic operations

Get document

Retrieve a document by ID.

result = mq.index("my-first-index").get_document(document_id="article_591")

Note that adding a document with add_documents again, using the same _id, will update the existing document.
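The update behaviour can be combined with get_document to change a few fields of a stored document. The sketch below is one possible approach, not an official API: update_fields is a hypothetical helper, it assumes get_document returns the stored fields as a plain dict, and InMemoryIndex is only a stand-in for mq.index(...) so the example runs without a server.

```python
def update_fields(index, doc_id, changes):
    """Merge `changes` into an existing document and re-index it under the same _id."""
    current = index.get_document(document_id=doc_id)
    # Assumption: get_document returns the stored fields as a plain dict
    updated = {**current, **changes, "_id": doc_id}
    index.add_documents([updated])
    return updated


class InMemoryIndex:
    """Stand-in for mq.index(...) so the sketch runs without a server."""

    def __init__(self):
        self.docs = {}

    def add_documents(self, documents):
        for doc in documents:
            # Re-adding a document with an existing _id overwrites it
            self.docs[doc["_id"]] = dict(doc)

    def get_document(self, document_id):
        return dict(self.docs[document_id])


index = InMemoryIndex()
index.add_documents([
    {"_id": "article_591", "Title": "Extravehicular Mobility Unit (EMU)"}
])
updated = update_fields(index, "article_591", {"Title": "EMU spacesuit"})
```

With a real client you would pass mq.index("my-first-index") instead of the stand-in.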

Get index

Get data about an index.

results = mq.get_index("my-first-index")

Delete index

Delete an index.

results = mq.index("my-first-index").delete()

Lexical search

Search using a BM25 query.

result = mq.index("my-first-index").search('marco polo', lexical=True)
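For readers unfamiliar with BM25, the sketch below scores documents with the standard Okapi BM25 formula in plain Python. It is only an illustration of the ranking idea: the real scores are computed by the search backend, and the whitespace tokenizer, the k1 and b values, and the function name bm25_scores are all assumptions made for this example.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: number of documents containing each term
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            n_t = doc_freq.get(term, 0)
            # Smoothed IDF variant that never goes negative
            idf = math.log(1 + (n_docs - n_t + 0.5) / (n_t + 0.5))
            f = term_freq.get(term, 0)
            # Length normalisation: longer documents need more matches
            norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * f * (k1 + 1) / (f + norm)
        scores.append(score)
    return scores


docs = [
    "The Travels of Marco Polo",
    "Extravehicular Mobility Unit (EMU)",
]
scores = bm25_scores("marco polo", docs)
```

Only the first title shares terms with the query, so it receives a positive score while the second scores zero. Lexical search rewards exact term overlap in this way and does not interpret the meaning of the query.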

Search specific fields

Restrict a search to specific fields with searchable_attributes.

result = mq.index("my-first-index").search('marco polo', searchable_attributes=['Title'])
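One way to see which field is responsible for a match is to run the same query once per field. The sketch below assumes only the search call and the 'hits' list shown earlier. search_each_field is a hypothetical helper, and StubIndex is a stand-in that matches by substring (unlike real Marqo) so the example runs without a server.

```python
def search_each_field(index, query, fields):
    """Run the query once per field and return {field: [hit ids]}."""
    ids_by_field = {}
    for field in fields:
        results = index.search(query, searchable_attributes=[field])
        ids_by_field[field] = [hit["_id"] for hit in results["hits"]]
    return ids_by_field


class StubIndex:
    """Stand-in for mq.index(...); matches by substring, unlike real Marqo."""

    def __init__(self, documents):
        self.documents = documents

    def search(self, q, searchable_attributes):
        hits = [
            doc for doc in self.documents
            if any(q.lower() in doc.get(field, "").lower()
                   for field in searchable_attributes)
        ]
        return {"hits": hits}


index = StubIndex([
    {"_id": "polo", "Title": "The Travels of Marco Polo",
     "Description": "A 13th-century travelogue describing Polo's travels"},
])
ids_by_field = search_each_field(index, "marco polo", ["Title", "Description"])
```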

Multi modal and cross modal search

To power image and text search, Marqo allows users to plug and play with CLIP models from HuggingFace. Note that if you do not configure multi modal search, image URLs will be treated as strings. To start indexing and searching with images, first create an index with a CLIP configuration, as below:

settings = {
    "treat_urls_and_pointers_as_images": True,  # allows us to find an image file and index it
    "model": "ViT-B/32"
}
response = mq.create_index("my-multimodal-index", **settings)

Images can then be added within documents as follows. You can use URLs from the internet (for example, S3) or paths on the local disk of the machine:

responses = mq.index("my-multimodal-index").add_documents([{
    "Image": "/mnt/images/spacesuit.png",
    "Description": "The EMU is a spacesuit that provides environmental protection, "
                   "mobility, life support, and communications for astronauts",
    "_id": "article_591"
}], batch_size=50, use_parallel=True)
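The batch_size argument suggests that documents are sent in groups of at most that many per request. The following sketch shows that chunking idea in isolation. It is an assumption about the behaviour, not Marqo's actual implementation, and batched is a hypothetical helper name.

```python
def batched(documents, batch_size):
    """Yield consecutive slices of at most batch_size documents."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(documents), batch_size):
        yield documents[start:start + batch_size]


docs = [{"_id": f"doc_{i}", "Title": f"Document {i}"} for i in range(120)]
batch_lengths = [len(batch) for batch in batched(docs, batch_size=50)]
print(batch_lengths)  # [50, 50, 20]
```

Smaller batches keep each request light, while larger ones reduce the number of round trips. A good value depends on document size and on whether images need to be downloaded.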

You can then search using text as usual. To search specifically against the image attribute, restrict the search with searchable_attributes:

results = mq.index("my-multimodal-index").search('spaceman', searchable_attributes=['Image'])

Searching using an image

You can search with an image by providing a link to it as the query. In this example, searchable_attributes restricts the search to the image data. You can use URLs from the internet (for example, S3) or paths on the local disk of the machine:

results = mq.index("my-multimodal-index").search('https://api.claylings.io/api/image/190', searchable_attributes=['Image'])

Warning

Do not run other applications on the OpenSearch cluster, because Marqo automatically changes and adapts the cluster's settings.

Contributors

Marqo is a community project with the goal of making neural search accessible to the wider developer community. We are glad that you are interested in helping out! Please read this to get started.

Dev set up

  1. Create a virtual environment: python -m venv ./venv
  2. Activate the virtual environment: source ./venv/bin/activate
  3. Install requirements from the requirements file: pip install -r requirements.txt
  4. Run the tests with tox: cd into this directory and then run tox
  5. If you update dependencies, make sure to delete the .tox directory and rerun tox

Merge instructions:

  1. Run the full test suite (by running the tox command in this directory).
  2. Create a pull request with an attached GitHub issue.

The large data test will build Marqo from the main branch and fill indices with data. Go through and test queries against this data. https://github.com/S2Search/NeuralSearchLargeDataTest

Support

  • Join our Slack community and chat with other community members about ideas.
  • Marqo community meetings (coming soon!)
