Semantic search of text and semantic matching of tables using FAISS

Project description

semantic-matcher

This library provides tools for semantic matching of text and tabular data.

In its current state, it has two main uses:

  • Find the closest matches to a user query in a text corpus, using sentence transformer encodings and FAISS to speed up the search.
  • Measure the semantic similarity between two tables and identify the columns they have in common. This is useful for detecting duplicates and for deciding which columns to join on.

1. Installation

After cloning the repository, install the package by running "make install". Alternatively, install it with pip:

pip install semanticmatcher

2. Usage

The package can search a text corpus for the entries closest to a query:

from semanticmatcher.search import semantic_search

# The query is passed as a list of strings.
query = ["boy jumping"]
corpus = [
    "There was a young man running around town",
    "The mayor is looking for a new house",
    "I had pasta for dinner",
]

# Returns the match scores and the corresponding corpus indices.
scores, indices = semantic_search(query, corpus)
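To make the idea behind this call concrete, here is a minimal sketch of a nearest-match search over embeddings. It is not the library's implementation: the real package encodes text with a sentence transformer and delegates the search to FAISS, whereas this sketch uses brute-force cosine similarity in plain Python. The helper names `cosine` and `top_k` and the 3-dimensional vectors are assumptions made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, corpus_vecs, k=2):
    """Return (scores, indices) of the k corpus vectors closest to the query."""
    scored = sorted(
        ((cosine(query_vec, vec), i) for i, vec in enumerate(corpus_vecs)),
        reverse=True,
    )[:k]
    scores = [score for score, _ in scored]
    indices = [index for _, index in scored]
    return scores, indices


# Hand-made toy vectors standing in for real sentence embeddings (illustrative only).
query_vec = [0.9, 0.1, 0.0]  # stands in for "boy jumping"
corpus_vecs = [
    [0.8, 0.2, 0.1],  # stands in for "There was a young man running around town"
    [0.1, 0.9, 0.2],  # stands in for "The mayor is looking for a new house"
    [0.0, 0.2, 0.9],  # stands in for "I had pasta for dinner"
]

scores, indices = top_k(query_vec, corpus_vecs, k=2)
print(indices)
```

A brute-force scan like this compares the query against every corpus entry; an index library such as FAISS exists to make the same lookup fast on large corpora.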

The package can also measure the similarity between two tables. It returns a matrix that compares the columns of the two tables.

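import pandas as pd

# NOTE: similarity_matrix must also be imported from semanticmatcher;
# its module path is not shown here.

df1 = pd.DataFrame(
    {
        "col1": ["hello", "world"],
        "col2": ["how", "are"],
        "col3": ["you", "doing"],
    }
)

df2 = pd.DataFrame(
    {
        "col1": ["hola", "mundo"],
        "col2": ["como", "estas"],
        "col3": ["tu", "haciendo"],
    }
)

# Compare the columns of df1 against the columns of df2.
similarity_matrix_df = similarity_matrix(df1, df2)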
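As a rough illustration of what a column-by-column similarity matrix can look like, the sketch below compares every column of one table with every column of another. It assumes each column has already been reduced to a single embedding vector (for example by averaging the encodings of its cells); whether the library does this is not stated, and the function name `column_similarity` and the 2-dimensional vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def column_similarity(cols_a, cols_b):
    """Nested dict: outer keys are columns of table A, inner keys columns of table B."""
    return {
        name_a: {name_b: cosine(vec_a, vec_b) for name_b, vec_b in cols_b.items()}
        for name_a, vec_a in cols_a.items()
    }


# Toy per-column vectors (illustrative only, not real embeddings).
table_a = {"greeting": [1.0, 0.0], "question": [0.0, 1.0]}
table_b = {"saludo": [0.9, 0.1], "pregunta": [0.2, 0.8]}

matrix = column_similarity(table_a, table_b)

# For each column of table A, pick the most similar column of table B.
best = {name_a: max(row, key=row.get) for name_a, row in matrix.items()}
print(best)
```

Reading the matrix row by row gives, for each column of the first table, its most likely counterpart in the second.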

By default, the following sentence transformer model is used to encode the text: "all-MiniLM-L6-v2".

3. Next Steps

Add functionality to allow users to join two tables on a column, depending on the similarity match between the columns.
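One possible shape for this planned feature, offered only as a sketch: given a similarity matrix, keep the column pairs whose score clears a threshold and treat them as join candidates. The function `join_candidates`, the 0.8 threshold, the column names, and the scores below are all hypothetical and not part of the package.

```python
def join_candidates(matrix, threshold=0.8):
    """Return (left_column, right_column, score) pairs at or above the threshold,
    sorted from highest to lowest score."""
    pairs = [
        (left, right, score)
        for left, row in matrix.items()
        for right, score in row.items()
        if score >= threshold
    ]
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


# Made-up similarity scores between the columns of two tables (illustrative only).
matrix = {
    "customer_name": {"client": 0.91, "order_date": 0.12},
    "purchase_date": {"client": 0.08, "order_date": 0.85},
}

candidates = join_candidates(matrix)
print(candidates)
```

The threshold trades recall for precision: a lower value proposes more candidate joins, at the cost of more false matches.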

Download files

Source Distribution

semanticmatcher-0.0.2.tar.gz (5.1 kB)

Uploaded Source

Built Distribution

semanticmatcher-0.0.2-py3-none-any.whl (4.9 kB)

Uploaded Python 3

File details

Details for the file semanticmatcher-0.0.2.tar.gz.

File metadata

  • Download URL: semanticmatcher-0.0.2.tar.gz
  • Upload date:
  • Size: 5.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for semanticmatcher-0.0.2.tar.gz

Algorithm    Hash digest
SHA256       fcb1235d168347d12b0cd45799a7e26f245c994a9aa9486ff0c7f2961a536ff9
MD5          c06337f8b9bd454679d12b10f1e24a83
BLAKE2b-256  d7ee1588534b53e02a983285d4d704411aa5ecdea9e89f7cf4e18af9ac4a3b97

File details

Details for the file semanticmatcher-0.0.2-py3-none-any.whl.

File hashes

Hashes for semanticmatcher-0.0.2-py3-none-any.whl

Algorithm    Hash digest
SHA256       30e9f157e2c19aed0f88592c4458e6ef31e1a6fab2e5e38547d77e8e653c9fcd
MD5          bda3ba407ac9913c9d00281ec3b4b47c
BLAKE2b-256  9890cf6ae2beba44a0793f55ce1465ccbfe9b4acd47a243d2bc7e12bf987f28f
