Mars Similarity Tools

A small tools library for getting vector similarity measurement working in no time.

Example

Here's a basic similarity search and vectorization example. We instantiate a VectorSimilarityService, which needs an Augmentor of some kind. The Augmentor is responsible for taking objects that inherit from the SimilarityObject class and returning them as vectorized, grouped objects (VectorGroup objects). Before a SimilarityObject can be transformed into a VectorGroup, it first passes through the GroupParser, which rearranges the object's properties into the groups defined in the parser. This is necessary because several properties of an object should, in the end, be represented together by a single vector.
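To picture the grouping step on its own, here is a minimal standalone sketch in plain Python. It does not use the library: `Bike`, `GROUPS`, and `group_properties` are hypothetical names made up for illustration, and the real GroupParser may work differently. It only shows the idea that several flat properties end up under one group name, so that a later step could turn each group into a single vector.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Bike:
    # Hypothetical stand-in for a SimilarityObject subclass.
    id: str
    color_name: str
    color_description: str
    wheel_size: int


# Hypothetical grouping spec: group name -> properties that share one vector.
GROUPS = {
    "color": ["color_name", "color_description"],
    "wheel_size": ["wheel_size"],
}


def group_properties(obj, groups):
    """Rearrange a flat object's properties into named groups."""
    values = asdict(obj)
    return {
        name: [values[prop] for prop in props]
        for name, props in groups.items()
    }


grouped = group_properties(Bike("1", "red", "A red bicycle", 26), GROUPS)
print(grouped)
```

Both color properties now sit under the single "color" group, while `id` is left out because no group refers to it.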

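# The library names used below (SimilarityObject, VectorSimilarityService, ItemVectorizer,
# Vectorizer, GroupParser, PropertyParser) must be imported from the package as well.
from dataclasses import dataclass

# First things first: define the model whose instances we want to compare.
# And yes! You could create a separate Color class that holds the name and description of a color.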
@dataclass(frozen=True) # "frozen" is Important!
class Bicycle(SimilarityObject):

    id: str
    color_name: str
    color_description: str
    wheel_size: int
    model: str

# Then create the parser, vectorizer, augmentor and service.
service = VectorSimilarityService(
    augmentor=ItemVectorizer(
        vectorizer=Vectorizer(),
        parser=GroupParser(
            name=Bicycle.__name__,  # Bicycle.__class__.__name__ would give the metaclass name, not "Bicycle"
            children=[
                GroupParser(
                    name="color",
                    children=[
                        PropertyParser(
                            name="color name",
                            dtype=str,
                            path=["color_name"]
                        ),
                        PropertyParser(
                            name="color description",
                            dtype=str,
                            path=["color_description"]
                        ),
                    ],
                ),
                PropertyParser(
                    name="wheel_size",
                    dtype=int,
                    path=["wheel_size"]
                ),
                PropertyParser(
                    name="model",
                    dtype=str,
                    path=["model"]
                ),
            ]
        ),
    )
)

# Now we can create the objects we want to search among.
objects = [
    Bicycle(
        id="1",
        color_name="red",
        color_description="A red bicycle",
        wheel_size=26,
        model="mountain"
    ),
    Bicycle(
        id="2",
        color_name="blue",
        color_description="A blue bicycle",
        wheel_size=26,
        model="mountain"
    ),
    Bicycle(
        id="3",
        color_name="green",
        color_description="A green bicycle",
        wheel_size=28,
        model="racer"
    ),
]

# Now we can perform a similarity search.
similarity_result = service.similarity_search(
    objects, 
    Bicycle(
        id="4",
        color_name="yellow",
        color_description="A yellow bicycle",
        wheel_size=28,
        model="racer"
    ), 
)

# Sort by similarity score
sorted_similarity_result = sorted(
    similarity_result,
    key=lambda x: x.score
)

assert len(sorted_similarity_result) == 3
assert isinstance(sorted_similarity_result[0].obj, Bicycle)
assert isinstance(sorted_similarity_result[1].obj, Bicycle)

# We can also bias the similarity search.
# For instance, we might want to find a similar bicycle while weighting
# the color more heavily than the other groups.
biased_similarity_result = service.similarity_search(
    objects, 
    Bicycle(
        id="4",
        color_name="yellow",
        color_description="A yellow bicycle",
        wheel_size=28,
        model="racer"
    ), 
    bias={"color": 1.2, "wheel_size": 0.2, "model": 0.2}
)

# Sort by similarity score
sorted_biased_similarity_result = sorted(
    biased_similarity_result,
    key=lambda x: x.score
)

assert len(sorted_biased_similarity_result) == 3
assert isinstance(sorted_biased_similarity_result[0].obj, Bicycle)
assert isinstance(sorted_biased_similarity_result[1].obj, Bicycle)
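
One way to think about the bias argument is as a per-group weight on the similarity of each group's vector. The sketch below is an assumption for illustration only, not the library's implementation: `cosine` and `biased_score` are hypothetical helpers, the vectors are hand-written toy values, and the weighted mean of cosine similarities is just one plausible way to combine groups.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def biased_score(query, candidate, bias=None):
    """Weighted mean of per-group cosine similarities; missing weights default to 1.0."""
    bias = bias or {}
    total = 0.0
    weight_sum = 0.0
    for group, query_vec in query.items():
        weight = bias.get(group, 1.0)
        total += weight * cosine(query_vec, candidate[group])
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


# Toy vectors: one vector per group, as if produced by a vectorizer.
query = {"color": [1.0, 0.0], "model": [0.0, 1.0]}
same_color = {"color": [1.0, 0.0], "model": [1.0, 0.0]}  # matches on color only
same_model = {"color": [0.0, 1.0], "model": [0.0, 1.0]}  # matches on model only

bias = {"color": 1.2, "model": 0.2}
print(biased_score(query, same_color), biased_score(query, same_model))
print(biased_score(query, same_color, bias), biased_score(query, same_model, bias))
```

Without a bias the two candidates tie, since each matches the query on exactly one group. With the color group weighted more heavily, the candidate that shares the query's color scores higher in this sketch.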
