A package for similarity search for any type of data. Use your own feature extractor and data loader and use this package to perform similarity search.
Project description
DeepSearchLite
DeepSearchLite is a lightweight and versatile package for finding similar items in a large collection of data, initially inspired by DeepImageSearch. It supports a variety of data types, such as images, text, or any other type of data that can be represented by feature vectors. All you need is a suitable encoder to convert the data into a feature vector, and DeepSearchLite will handle the rest, enabling you to perform similarity searches efficiently and effectively.
Features
- Supports various data types (images, text, etc.) as long as they can be represented by feature vectors
- Easy integration with custom feature encoders
- Fast similarity search using FAISS indexing
- Flexible and extensible with support for custom feature extraction, dimensionality reduction functions, and item loading functions
- Add new items to the index without the need for re-indexing the entire dataset
- Retrieve item metadata and perform search by providing either the item itself or a file path
Installation
Install DeepSearchLite using pip:
pip install DeepSearchLite
Quickstart
To use DeepSearchLite, you need to provide a custom feature encoder that can convert your data into feature vectors. For example, let's say you have a collection of images, and you want to find similar images. You can use a pre-trained deep learning model as a feature encoder. Please refer to the Example folder for a complete example of setting up DeepSearchLite for image similarity search. For other types of data, you need to provide a suitable encoder that can convert the data into feature vectors and a compatible item loader function if necessary.
Customization
DeepSearchLite offers flexibility by allowing you to customize feature extraction, dimensionality reduction, and item loading functions to suit your specific use case. To incorporate your custom functions, provide them as arguments when initializing the SearchSetup
instance:
search_setup = SearchSetup(
item_list=item_list,
feature_extractor=custom_feature_extractor,
dim_reduction=custom_dim_reduction, # Set to None if not required
metadata_dir="metadata_dir",
feature_extractor_name="custom_extractor_name",
mode="search", # Choose between "index" or "search"
item_loader=custom_item_loader
)
In the example above:
custom_feature_extractor
is a function that extracts features from the given items according to your requirements.custom_dim_reduction
is an optional function that reduces the dimensionality of the feature vectors. Set it toNone
if you don't need dimensionality reduction.custom_item_loader
is a function that loads items from their file paths, ensuring compatibility with the feature extractor.
By providing your custom functions, you can tailor DeepSearchLite to work with various data types and feature extraction techniques, enabling a more personalized and efficient similarity search experience.
Usage
You can use the provided methods in the SearchSetup
class to index and search for similar items, add new items to the index, retrieve item metadata, and more. Check the class documentation for a detailed list of available methods and their usage.
MODE
There are two main modes available in the SearchSetup class: 'index' and 'search'. These modes are specified when initializing the SearchSetup
class.
-
Index Mode
In the index mode, the metadata is initialized, which contains the item paths along with the feature vectors of these items. This mode should be used when you are initializing your
SearchSetup
for the first time. During the indexing process, a directory is created to store this metadata, which will be used later for similarity search.When using the index mode, the feature extractor, dimensionality reducer (if any), and item loader are applied to extract features from the items in the dataset. The resulting feature vectors are then saved in the metadata along with the item paths, and an index is created to enable fast similarity search.
-
Search Mode
In the search mode, you can search for similar items by providing the item itself or the item path as input. When initializing the
SearchSetup
class in search mode, make sure to use the same feature extractor, dimensionality reducer (if any), and item loader as when the index was created. This is essential to ensure that the features extracted during the search are compatible with the indexed features.During the search process, the specified item is loaded, and its features are extracted using the provided feature extractor and dimensionality reducer (if any). The search is then performed using the pre-built index, and a list of similar items from the indexed dataset is returned.
In summary, the index mode is used to build the metadata and index required for similarity search, while the search mode enables you to search for similar items in the indexed dataset. Make sure to use the same feature extractor, dimensionality reducer, and item loader in both modes to ensure compatibility and accurate search results.
Contributing
Contributions to DeepSearchLite are welcome! If you have an idea, bug report, or feature request, please open an issue on the GitHub repository. If you'd like to contribute code, please fork the repository and submit a pull request.
License
DeepSearchLite is released under the MIT License.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file DeepSearchLite-0.2.1.tar.gz
.
File metadata
- Download URL: DeepSearchLite-0.2.1.tar.gz
- Upload date:
- Size: 8.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.16
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 34992ec2f4f780b20c2ccabc4768372b8016f4be27acfdeb32e69875f8db5c47 |
|
MD5 | 5e9bc75823e7c705a36dc2aa3c3a5186 |
|
BLAKE2b-256 | 965cb034a7338e8e07bed80e43ce8877690a6e80df6b8244bebb8f9f2e3c784f |
File details
Details for the file DeepSearchLite-0.2.1-py3-none-any.whl
.
File metadata
- Download URL: DeepSearchLite-0.2.1-py3-none-any.whl
- Upload date:
- Size: 8.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.16
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | ef488720ada2ed881cdd9eed7704e87d7e03d6cf9d70d4c0cbf0a552c225c095 |
|
MD5 | 1b76907d4f8941e416265e4d3ba68ff1 |
|
BLAKE2b-256 | 8be2db6670420a9915c03e6e338083298c4f140b9bc73108ad8d56b8ad5145e5 |