Marqo

Tensor search for humans.

An open-source tensor search engine that seamlessly integrates with your applications, websites, and workflow.

Marqo cloud ☁️ is in beta. If you're interested, apply here: https://q78175g1wwa.typeform.com/to/d0PEuRPC

What is tensor search?

Tensor search involves transforming documents, images and other data into collections of vectors called "tensors". Representing data as tensors allows us to match queries against documents with human-like understanding of the query and document's content. Tensor search can power a variety of use cases such as:

  • end user search and recommendations
  • multi-modal search (image-to-image, text-to-image, image-to-text)
  • chat bots and question and answer systems
  • text and image classification
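
To make the idea of matching by vectors concrete, here is a minimal sketch that ranks two documents against a query using cosine similarity, one common similarity measure. The three-dimensional vectors are hand-made for illustration and the function name is hypothetical; this is not Marqo's model or scoring code, only the general principle.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made 3-dimensional "embeddings" (illustrative values only; a real
# model produces vectors with many more dimensions).
documents = {
    "spacesuit": [0.9, 0.1, 0.2],
    "travelogue": [0.1, 0.8, 0.3],
}
query_vector = [0.8, 0.2, 0.1]  # stands in for an embedded query

# Rank document names from most to least similar to the query.
ranked = sorted(
    documents,
    key=lambda name: cosine_similarity(query_vector, documents[name]),
    reverse=True,
)
print(ranked)
```

The document whose vector points in nearly the same direction as the query vector is ranked first, even though no keywords are compared.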

Getting started

  1. Marqo requires Docker. To install it, go to the official Docker website.
  2. Use Docker to run Marqo (Mac users with M-series chips should follow the "M series Mac users" section below instead):
docker rm -f marqo;
docker pull marqoai/marqo:latest;
docker run --name marqo -it --privileged -p 8882:8882 --add-host host.docker.internal:host-gateway marqoai/marqo:latest
  3. Install the Marqo client:
pip install marqo
  4. Start indexing and searching! Let's look at a simple example below:
import marqo

mq = marqo.Client(url='http://localhost:8882')

mq.index("my-first-index").add_documents([
    {
        "Title": "The Travels of Marco Polo",
        "Description": "A 13th-century travelogue describing Polo's travels"
    }, 
    {
        "Title": "Extravehicular Mobility Unit (EMU)",
        "Description": "The EMU is a spacesuit that provides environmental protection, "
                       "mobility, life support, and communications for astronauts",
        "_id": "article_591"
    }]
)

results = mq.index("my-first-index").search(
    q="What is the best outfit to wear on the moon?"
)
  • mq is the client that wraps the Marqo API.
  • add_documents() takes a list of documents, represented as Python dicts, for indexing.
  • If the index does not exist, add_documents() creates it with default settings. If it already exists, Marqo adds the documents to it.
  • You can optionally set a document's ID with the special _id field. Otherwise, Marqo will generate one.
  • Running this code multiple times could result in duplicate documents, since a document without an _id is given a newly generated one on each run. To reset the index, you can delete it first using mq.index("my-first-index").delete().
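
One way to avoid duplicates without deleting the index is to give every document a stable _id before indexing. The sketch below derives an ID from the document's content; the helper name with_stable_ids and the hashing scheme are assumptions made for illustration, not part of the Marqo client.

```python
import hashlib
import json


def with_stable_ids(documents):
    """Return copies of the documents, adding a content-derived _id where missing.

    Hypothetical helper: not part of the marqo package.
    """
    result = []
    for doc in documents:
        doc = dict(doc)  # shallow copy so the caller's dict is not modified
        if "_id" not in doc:
            # Serialize with sorted keys so identical content always
            # produces the same hash, regardless of key order.
            payload = json.dumps(doc, sort_keys=True).encode("utf-8")
            doc["_id"] = hashlib.sha256(payload).hexdigest()[:32]
        result.append(doc)
    return result


docs = with_stable_ids([
    {
        "Title": "The Travels of Marco Polo",
        "Description": "A 13th-century travelogue describing Polo's travels",
    },
    {"Title": "Extravehicular Mobility Unit (EMU)", "_id": "article_591"},
])
# docs can now be passed to add_documents(); re-running yields the same IDs.
print([d["_id"] for d in docs])
```

Because the same content always maps to the same _id, indexing the list a second time should update the existing documents instead of adding new ones. Note that any change to a document's content also changes its derived ID.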

Let's have a look at the results:

# let's print out the results:
import pprint
pprint.pprint(results)

{
    'hits': [
        {   
            'Title': 'Extravehicular Mobility Unit (EMU)',
            'Description': 'The EMU is a spacesuit that provides environmental protection, mobility, life support, and '
                           'communications for astronauts',
            '_highlights': {
                'Description': 'The EMU is a spacesuit that provides environmental protection, '
                               'mobility, life support, and communications for astronauts'
            },
            '_id': 'article_591',
            '_score': 0.61938936
        }, 
        {   
            'Title': 'The Travels of Marco Polo',
            'Description': "A 13th-century travelogue describing Polo's travels",
            '_highlights': {'Title': 'The Travels of Marco Polo'},
            '_id': 'e00d1a8d-894c-41a1-8e3b-d8b2a8fce12a',
            '_score': 0.60237324
        }
    ],
    'limit': 10,
    'processingTimeMs': 49,
    'query': 'What is the best outfit to wear on the moon?'
}
  • Each hit corresponds to a document that matched the search query.
  • Hits are ordered from best match to worst.
  • limit is the maximum number of hits to be returned. This can be set as a parameter during search.
  • Each hit has a _highlights field. This is the part of the document that best matched the query.
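
As a rough sketch of how the response structure can be consumed, the helper below walks the hits in ranked order and reports which field produced the highlight. The function name summarize_hits is hypothetical, and sample_results is an abbreviated copy of the response shown above, so the example runs without a Marqo server.

```python
def summarize_hits(results):
    """Return one (id, score, highlighted field) tuple per hit, in ranked order."""
    summary = []
    for hit in results["hits"]:
        # _highlights maps a field name to the best-matching text from that field.
        highlighted_fields = list(hit.get("_highlights", {}))
        field = highlighted_fields[0] if highlighted_fields else None
        summary.append((hit["_id"], hit["_score"], field))
    return summary


# Abbreviated copy of the search response shown above.
sample_results = {
    "hits": [
        {
            "_id": "article_591",
            "_score": 0.61938936,
            "_highlights": {"Description": "The EMU is a spacesuit ..."},
        },
        {
            "_id": "e00d1a8d-894c-41a1-8e3b-d8b2a8fce12a",
            "_score": 0.60237324,
            "_highlights": {"Title": "The Travels of Marco Polo"},
        },
    ],
    "limit": 10,
}

for doc_id, score, field in summarize_hits(sample_results):
    print(f"{doc_id}  score={score:.3f}  best match in: {field}")
```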

Other basic operations

Get document

Retrieve a document by ID.

result = mq.index("my-first-index").get_document(document_id="article_591")

Note that adding a document with add_documents again, using the same _id, will cause the existing document to be updated.

Get index stats

Get information about an index.

results = mq.index("my-first-index").get_stats()

Lexical search

Perform a keyword search.

result = mq.index("my-first-index").search('marco polo', search_method=marqo.SearchMethods.LEXICAL)

Search specific fields

Restrict a search to specific fields, using the default tensor search method.

result = mq.index("my-first-index").search('adventure', searchable_attributes=['Title'])

Delete documents

Delete documents by ID.

results = mq.index("my-first-index").delete_documents(ids=["article_591", "article_602"])

Delete index

Delete an index.

results = mq.index("my-first-index").delete()

Multi modal and cross modal search

To power image and text search, Marqo allows users to plug and play with CLIP models from HuggingFace. Note that if you do not configure multi-modal search, image URLs will be treated as strings. To start indexing and searching with images, first create an index with a CLIP configuration, as below:

settings = {
  "treat_urls_and_pointers_as_images":True,   # allows us to find an image file and index it 
  "model":"ViT-L/14"
}
response = mq.create_index("my-multimodal-index", **settings)

Images can then be added within documents as follows. You can use URLs from the internet (for example, S3) or images on the machine's disk:

response = mq.index("my-multimodal-index").add_documents([{
    "My Image": "https://upload.wikimedia.org/wikipedia/commons/thumb/f/f2/Portrait_Hippopotamus_in_the_water.jpg/440px-Portrait_Hippopotamus_in_the_water.jpg",
    "Description": "The hippopotamus, also called the common hippopotamus or river hippopotamus, is a large semiaquatic mammal native to sub-Saharan Africa",
    "_id": "hippo-facts"
}])
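
When there are many images, the document list can be generated instead of written by hand. The following is a minimal sketch, assuming each image becomes one document whose _id is taken from the file name in the URL; the helper build_image_documents and the example.com URLs are hypothetical, and the field name "My Image" simply mirrors the example above.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def build_image_documents(image_urls, image_field="My Image"):
    """Build one document per image URL, using the file name (no extension) as _id.

    Hypothetical helper: not part of the marqo package.
    """
    docs = []
    for url in image_urls:
        # Take the last path segment of the URL and drop its extension.
        stem = PurePosixPath(urlparse(url).path).stem
        docs.append({image_field: url, "_id": stem})
    return docs


image_docs = build_image_documents([
    "https://example.com/images/hippo.jpg",  # placeholder URLs
    "https://example.com/images/rhino.png",
])
# image_docs can then be passed to add_documents() on the multimodal index.
print(image_docs)
```

File names are not guaranteed to be unique across folders, so two images with the same name would end up sharing one _id under this scheme.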

You can then search using text as usual. Both text and image fields will be searched:

results = mq.index("my-multimodal-index").search('animal')

Setting searchable_attributes to the image field ['My Image'] ensures only images are searched in this index:

results = mq.index("my-multimodal-index").search('animal', searchable_attributes=['My Image'])

Searching using an image

Searching using an image can be achieved by providing the image link.

results = mq.index("my-multimodal-index").search('https://upload.wikimedia.org/wikipedia/commons/thumb/9/96/Standing_Hippopotamus_MET_DP248993.jpg/440px-Standing_Hippopotamus_MET_DP248993.jpg')

Documentation

The full documentation for Marqo can be found at https://docs.marqo.ai/.

M series Mac users

Marqo does not yet support the docker-in-docker backend configuration for the arm64 architecture. This means that if you have an M series Mac, you will also need to run Marqo's backend, marqo-os, locally.

To run Marqo on an M series Mac, follow these steps.

  1. In one terminal, run the following command to start marqo-os (OpenSearch):
docker rm -f marqo-os; docker run -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" marqoai/marqo-os:0.0.3-arm
  2. In another terminal, run the following command to launch Marqo:
docker rm -f marqo; docker run --name marqo --privileged \
    -p 8882:8882 --add-host host.docker.internal:host-gateway \
    -e "OPENSEARCH_URL=https://localhost:9200" \
    marqoai/marqo:latest
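
If a setup script needs to decide which set of instructions applies, the machine type can be checked from Python. This is a small sketch, assuming a native arm64 Python interpreter on macOS; the function name needs_separate_backend is hypothetical, and a Python running under Rosetta may report x86_64 instead.

```python
import platform


def needs_separate_backend(system=None, machine=None):
    """Return True on an M series Mac, where marqo-os is started separately.

    Arguments default to the values reported for the current machine.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    return system == "Darwin" and machine == "arm64"


if needs_separate_backend():
    print("M series Mac detected: start marqo-os first, then Marqo.")
else:
    print("Use the single docker run command from Getting started.")
```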

Contributors

Marqo is a community project with the goal of making tensor search accessible to the wider developer community. We are glad that you are interested in helping out! Please read this to get started.

Dev set up

  1. Create a virtual environment: python -m venv ./venv
  2. Activate the virtual environment: source ./venv/bin/activate
  3. Install requirements from the requirements file: pip install -r requirements.txt
  4. Run the tests with tox: cd into this directory and run tox
  5. If you update dependencies, delete the .tox directory and rerun tox

Merge instructions:

  1. Run the full test suite (using the command tox in this directory).
  2. Create a pull request with an attached GitHub issue.

Support

  • Join our Slack community and chat with other community members about ideas.
  • Marqo community meetings (coming soon!)
