Mars Similarity Tools

A small tools library for getting vector similarity measurement working in no time.

Example

Here's a basic vectorization and similarity search example. We instantiate a VectorSimilarityService, which needs an Augmentor of some kind. The Augmentor is responsible for taking objects that inherit from the SimilarityObject class and returning them as vectorized, grouped objects (VectorGroup objects). Before a SimilarityObject can be transformed into a VectorGroup, it first passes through the GroupParser, which rearranges the object's properties into the groups defined in the parser. This is needed because several properties of an object should, in the end, be represented together by a single vector.
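To make the grouping step concrete, here is a minimal sketch in plain Python that does not use the library at all. The GROUPS mapping and the group_properties helper are hypothetical names introduced only for illustration; the real GroupParser may work differently. The sketch only shows the idea of collecting several flat properties under one group name so that each group can later be turned into a single vector.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Bicycle:
    # Stand-in for the library's SimilarityObject subclass (assumption:
    # a plain frozen dataclass is enough to illustrate the grouping).
    id: str
    color_name: str
    color_description: str
    wheel_size: int
    model: str


# Hypothetical grouping spec: group name -> properties that feed that group.
GROUPS = {
    "color": ["color_name", "color_description"],
    "wheel_size": ["wheel_size"],
    "model": ["model"],
}


def group_properties(obj, groups):
    """Rearrange an object's flat properties into named groups."""
    values = asdict(obj)
    return {name: [values[prop] for prop in props] for name, props in groups.items()}


bike = Bicycle(
    id="1",
    color_name="red",
    color_description="A red bicycle",
    wheel_size=26,
    model="mountain",
)
grouped = group_properties(bike, GROUPS)
print(grouped)
# {'color': ['red', 'A red bicycle'], 'wheel_size': [26], 'model': ['mountain']}
```

Both color properties end up in the same group, which is what allows them to be represented by one vector further down the pipeline.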

from dataclasses import dataclass

# First things first: create the model whose instances we want to compare.
# And yes! You could create a separate Color class that holds the name and description of a color.
@dataclass(frozen=True) # "frozen" is important!
class Bicycle(SimilarityObject):

    id: str
    color_name: str
    color_description: str
    wheel_size: int
    model: str

# Then create the parser, vectorizer, augmentor and service.
service = VectorSimilarityService(
    augmentor=ItemVectorizer(
        vectorizer=Vectorizer(),
        parser=GroupParser(
            name=Bicycle.__name__,
            children=[
                GroupParser(
                    name="color",
                    children=[
                        PropertyParser(
                            name="color name",
                            dtype=str,
                            path=["color_name"]
                        ),
                        PropertyParser(
                            name="color description",
                            dtype=str,
                            path=["color_description"]
                        ),
                    ],
                ),
                PropertyParser(
                    name="wheel_size",
                    dtype=int,
                    path=["wheel_size"]
                ),
                PropertyParser(
                    name="model",
                    dtype=str,
                    path=["model"]
                ),
            ]
        ),
    )
)

# Now we can create the objects we want to search among.
objects = [
    Bicycle(
        id="1",
        color_name="red",
        color_description="A red bicycle",
        wheel_size=26,
        model="mountain"
    ),
    Bicycle(
        id="2",
        color_name="blue",
        color_description="A blue bicycle",
        wheel_size=26,
        model="mountain"
    ),
    Bicycle(
        id="3",
        color_name="green",
        color_description="A green bicycle",
        wheel_size=28,
        model="racer"
    ),
]

# Now we can perform a similarity search.
similarity_result = service.similarity_search(
    objects, 
    Bicycle(
        id="4",
        color_name="yellow",
        color_description="A yellow bicycle",
        wheel_size=28,
        model="racer"
    ), 
)

# Sort by similarity score
sorted_similarity_result = sorted(
    similarity_result,
    key=lambda x: x.score
)

assert len(sorted_similarity_result) == 3
assert type(sorted_similarity_result[0].obj) == Bicycle
assert type(sorted_similarity_result[1].obj) == Bicycle

# We could also add a bias to the similarity search.
# For instance, we might want to find a similar bicycle while weighting
# the color group more heavily than the other groups.
biased_similarity_result = service.similarity_search(
    objects, 
    Bicycle(
        id="4",
        color_name="yellow",
        color_description="A yellow bicycle",
        wheel_size=28,
        model="racer"
    ), 
    bias={"color": 1.2, "wheel_size": 0.2, "model": 0.2}
)

# Sort by similarity score
sorted_biased_similarity_result = sorted(
    biased_similarity_result,
    key=lambda x: x.score
)

assert len(sorted_biased_similarity_result) == 3
assert type(sorted_biased_similarity_result[0].obj) == Bicycle
assert type(sorted_biased_similarity_result[1].obj) == Bicycle
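
The example above does not say how the bias values enter the score. As one plausible reading, the sketch below computes a cosine similarity per group and combines them as a weighted mean, using the bias as the weight. This is an assumption for illustration only, not the library's documented behavior; cosine and biased_score are hypothetical helpers, and the small hand-written vectors stand in for real vectorizer output.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def biased_score(query, candidate, bias=None):
    """Weighted mean of per-group cosine similarities.

    Groups missing from `bias` get a weight of 1.0 (an assumption).
    """
    bias = bias or {}
    total = 0.0
    weight_sum = 0.0
    for group, query_vector in query.items():
        weight = bias.get(group, 1.0)
        total += weight * cosine(query_vector, candidate[group])
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


# Toy per-group vectors: one candidate matches the query's color,
# the other matches the query's model.
query = {"color": [1.0, 0.0], "model": [0.0, 1.0]}
same_color = {"color": [1.0, 0.0], "model": [1.0, 0.0]}
same_model = {"color": [0.0, 1.0], "model": [0.0, 1.0]}

bias = {"color": 1.2, "model": 0.2}

unbiased = (biased_score(query, same_color), biased_score(query, same_model))
biased = (
    biased_score(query, same_color, bias),
    biased_score(query, same_model, bias),
)
print(unbiased)  # both candidates tie without a bias
print(biased)    # the color match now scores higher than the model match
```

Without a bias the two candidates tie; with a larger weight on "color", the candidate that matches on color moves ahead, which is the effect the biased search is meant to achieve.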
