Enhanced Document Semantic Search with Cosine Similarity and Document Scanning
🚀 Enhanced Document Semantic Search
Overview
The "Enhanced Document Semantic Search" package, published as Vectrieve, searches documents by meaning rather than by exact keywords. It converts documents and queries into embeddings and ranks the matches by cosine similarity. With this package, you can:
- Search documents: Quickly find relevant information within your documents by running semantic searches based on natural language queries.
- Scan and search documents: Easily extract text from images and PDFs, then perform semantic searches on the extracted content.
- Use sentence embeddings: The package uses a Sentence Transformer model from Hugging Face to generate document and query embeddings.
- Rank results by relevance: The package scores document content against the query with cosine similarity and returns each match together with its score.
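The ranking idea in the last bullet can be illustrated with a minimal sketch. This is not the package's internal code: the three-dimensional vectors are made-up stand-ins for real Sentence Transformer embeddings, and `cosine_similarity` and `rank_by_similarity` are hypothetical helper names chosen for this example.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (assumption for illustration).
query = [1.0, 0.0, 1.0]
documents = [
    [1.0, 0.0, 1.0],  # same direction as the query
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [1.0, 1.0, 1.0],  # partially aligned with the query
]

ranking = rank_by_similarity(query, documents)
for index, score in ranking:
    print(f"document {index}: similarity {score:.4f}")
```

Because cosine similarity depends only on direction, not on vector length, a long and a short passage about the same topic can still score close to each other.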
Installation
You can install the package using pip. The distribution is published under the name Vectrieve, which matches the import name used in the examples below:
pip install vectrieve
Usage
Here's an example of how to use the package to search a document:
from vectrieve.functions import search_document
# Search a document
results = search_document("path/to/document.pdf", "query")
for doc, score in results:
    print(f"Text: {doc.page_content}")
    print(f"Source: {doc.metadata.get('source', 'Unknown')}")
    print(f"Similarity: {score:.4f}")
And here's an example of how to use the package to scan and search a document:
from vectrieve.functions import scan_and_search
from PIL import Image
# Scan and search a document
image = Image.open("path/to/image.jpg")
results = scan_and_search(image, "query")
for doc, score in results:
    print(f"Text: {doc.page_content}")
    print(f"Source: {doc.metadata.get('source', 'Unknown')}")
    print(f"Similarity: {score:.4f}")
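Both examples iterate over (document, score) pairs, so a small helper can trim the output to the strongest matches. The sketch below is an assumption-laden illustration, not part of the package: `FakeDocument` is a hypothetical stand-in for the returned document objects, `top_matches` is a hypothetical helper, and the scores are hand-written sample data. It also assumes that a higher score means a closer match.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Stand-in for the objects in `results`; not the package's real class."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def top_matches(results, min_score=0.5, limit=3):
    """Keep pairs scoring at least min_score, best first, at most `limit`."""
    kept = [(doc, score) for doc, score in results if score >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:limit]


# Hand-written sample data in the same (document, score) shape.
sample_results = [
    (FakeDocument("Invoice total: 120 EUR", {"source": "invoice.pdf"}), 0.82),
    (FakeDocument("Table of contents"), 0.31),
    (FakeDocument("Payment is due in 30 days", {"source": "invoice.pdf"}), 0.67),
]

best = top_matches(sample_results, min_score=0.5, limit=2)
for doc, score in best:
    print(f"{score:.4f}  {doc.metadata.get('source', 'Unknown')}  {doc.page_content}")
```

The threshold and limit are arbitrary defaults; suitable values depend on the embedding model and on the documents being searched.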
Features
- Document loading: The package supports a variety of file formats, including PDF, DOCX, TXT, XLS, and XLSX.
- Document chunking: The package automatically splits long documents into smaller chunks so that each chunk can be embedded and searched on its own.
- Similarity-ranked search: Results are ranked by the cosine similarity between the query embedding and the document embeddings.
- Document scanning: The package can extract text from images using Tesseract OCR, allowing you to search scanned documents.
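To make the chunking feature concrete, here is a minimal sketch of one common approach: fixed-size character chunks with overlap. The package's actual splitting strategy is not documented here, so `chunk_text` and its `chunk_size` and `overlap` parameters are hypothetical names used only for illustration.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size character chunks that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = chunk_text(sample, chunk_size=10, overlap=3)
for piece in pieces:
    print(piece)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.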
Contributing
We welcome contributions to the "Vectrieve" package. If you'd like to report a bug, request a feature, or contribute code, please visit the project's GitHub repository.
License
This project is licensed under the MIT License.
File details
Details for the file vectrieve-0.1.0.tar.gz.
File metadata
- Download URL: vectrieve-0.1.0.tar.gz
- Upload date:
- Size: 3.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 84d53db50358490ff74b16b0b2ee038d2807867393fd6b64b4aa7ca7da435232 |
| MD5 | d124147ea690d9259b215cd75d874bda |
| BLAKE2b-256 | 62fa2a502b94145f765710d43d188c522607b7f41e935d59134a2c9058109355 |
File details
Details for the file Vectrieve-0.1.0-py3-none-any.whl.
File metadata
- Download URL: Vectrieve-0.1.0-py3-none-any.whl
- Upload date:
- Size: 4.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6e242737d29a2a97714337fb7cba6bdd3b44eeb9088635b7fa93ae7a9f986c3d |
| MD5 | 3d61f7aafd7898af07466fe28cfc3dee |
| BLAKE2b-256 | 856d512ee426ed55536046664c311246516c05773f0c2cf3725cdb07be6e056f |