Semantic search of text and semantic matching of tables using FAISS
semantic-matcher
This library provides tools for semantic matching of text and tabular data.
It currently has two main uses:
- Find the closest matches to a user query in a text corpus, using sentence transformer encodings and FAISS to speed up the similarity search.
- Measure the semantic similarity between two tables and determine common columns. Useful for detecting duplicates and determining which columns to join on.
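The first use case boils down to a nearest-neighbor search over embedding vectors. The sketch below is not this package's implementation: it is a brute-force stand-in, written with the standard library only, that ranks toy vectors by cosine similarity. An exact inner-product FAISS index over normalized vectors computes the same ranking, only much faster on large corpora. The names `normalize` and `top_k`, and the three-dimensional vectors, are made up for illustration; a real sentence encoder would produce the vectors from text.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so the inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def top_k(query_vec, corpus_vecs, k=2):
    """Return (scores, indices) of the k corpus vectors most similar to the query."""
    q = normalize(query_vec)
    scored = []
    for i, vec in enumerate(corpus_vecs):
        v = normalize(vec)
        scored.append((sum(a * b for a, b in zip(q, v)), i))
    # Highest cosine similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    best = scored[:k]
    return [s for s, _ in best], [i for _, i in best]


# Toy "embeddings" (assumption: stand-ins for real encoder output).
corpus_vecs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
query_vec = [1.0, 0.2, 0.0]
scores, indices = top_k(query_vec, corpus_vecs, k=2)
```

Normalizing both sides first is what lets a plain inner product act as a cosine score, which is why the same trick is commonly paired with inner-product indexes.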
1. Installation
This package can be installed after cloning by running "make install". Alternatively it can be installed using pip:
pip install semanticmatcher
2. Usage
The simplest use of this package is searching for a query within a text corpus:
from semanticmatcher.search import semantic_search
query = ["boy jumping"]
corpus = ["There was a young man running around town", "The mayor is looking for a new house", "I had pasta for dinner"]
scores, indices = semantic_search(query, corpus)
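The README does not document the shape of `scores` and `indices`. A minimal sketch for reading the result, assuming both hold one row per query with parallel entries (the layout FAISS itself uses for search results), could look like this. The helper `pair_matches` is hypothetical, and the numbers below are placeholders, not real output of `semantic_search`.

```python
def pair_matches(scores, indices, corpus):
    """Turn parallel score/index rows into (text, score) pairs, one list per query."""
    results = []
    for score_row, index_row in zip(scores, indices):
        results.append([(corpus[i], s) for s, i in zip(score_row, index_row)])
    return results


corpus = [
    "There was a young man running around town",
    "The mayor is looking for a new house",
    "I had pasta for dinner",
]
# Placeholder values for illustration only (assumed layout: one row per query).
scores = [[0.61, 0.12, 0.05]]
indices = [[0, 1, 2]]

for text, score in pair_matches(scores, indices, corpus)[0]:
    print(f"{score:.2f}  {text}")
```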
The package can also measure the similarity between two tables. It returns a matrix that compares the columns of the first table with the columns of the second.
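import pandas as pd
# similarity_matrix must also be imported from semanticmatcher;
# its module path is not shown here.

# First table: English words
df1 = pd.DataFrame(
    {
        "col1": ["hello", "world"],
        "col2": ["how", "are"],
        "col3": ["you", "doing"],
    }
)
# Second table: the Spanish counterparts
df2 = pd.DataFrame(
    {
        "col1": ["hola", "mundo"],
        "col2": ["como", "estas"],
        "col3": ["tu", "haciendo"],
    }
)
# Compare the columns of df1 with the columns of df2
similarity_matrix_df = similarity_matrix(df1, df2)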
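How the package builds this matrix is not described here. One plausible approach, sketched below with the standard library only, is to embed every cell, average the embeddings per column, and take the cosine similarity between column vectors. Everything in the sketch is an assumption made for illustration: `embed` is a toy letter-count encoder standing in for a sentence transformer, and the tables are plain dicts of lists rather than DataFrames.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence encoder: a letter-count vector over a-z."""
    counts = Counter(ch for ch in text.lower() if "a" <= ch <= "z")
    return [counts.get(chr(ord("a") + i), 0) for i in range(26)]


def column_vector(values):
    """Average the embeddings of all cells in one column."""
    vecs = [embed(v) for v in values]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def column_similarity(table1, table2):
    """Return {column of table1: {column of table2: cosine similarity}}."""
    vecs2 = {name: column_vector(vals) for name, vals in table2.items()}
    return {
        name1: {name2: cosine(column_vector(vals1), vec2) for name2, vec2 in vecs2.items()}
        for name1, vals1 in table1.items()
    }


table1 = {"greeting": ["hello", "world"], "food": ["pasta", "pizza"]}
table2 = {"saludo": ["hello", "word"], "comida": ["pasta", "pizza pie"]}
matrix = column_similarity(table1, table2)
```

A letter-count encoder only captures surface overlap, so it cannot match "hello" with "hola" the way a multilingual sentence model might; the point here is the column-by-column structure of the matrix, not the quality of the scores.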
By default, the following sentence transformer model is used to encode the text: "all-MiniLM-L6-v2".
3. Next Steps
Add functionality to allow users to join two tables on a column, depending on the similarity match between the columns.
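Since this feature does not exist yet, the following is only one possible way to choose join columns from a similarity matrix: pair columns greedily by descending score and stop below a threshold. The function `best_column_pairs`, the `threshold` default, and the matrix values are all hypothetical.

```python
def best_column_pairs(matrix, threshold=0.8):
    """Greedily pair columns by descending similarity, using each column at most once."""
    candidates = sorted(
        ((score, c1, c2) for c1, row in matrix.items() for c2, score in row.items()),
        reverse=True,
    )
    used1, used2, pairs = set(), set(), []
    for score, c1, c2 in candidates:
        if score < threshold:
            break  # all remaining candidates score lower still
        if c1 not in used1 and c2 not in used2:
            pairs.append((c1, c2, score))
            used1.add(c1)
            used2.add(c2)
    return pairs


# Hypothetical similarity values, for illustration only.
matrix = {
    "name": {"full_name": 0.91, "city": 0.20},
    "town": {"full_name": 0.15, "city": 0.86},
    "notes": {"full_name": 0.30, "city": 0.35},
}
pairs = best_column_pairs(matrix)
```

Each resulting pair could then be passed to pandas as `df1.merge(df2, left_on=c1, right_on=c2)`, although a join on semantically similar columns only works when the cell values themselves line up.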