Enhanced Document Semantic Search with Cosine Similarity and Document Scanning

🚀 Enhanced Document Semantic Search

Overview

The "Vectrieve" package (Enhanced Document Semantic Search) provides tools for searching and analyzing documents by meaning rather than by exact keyword match. With this package, you can:

  • Search documents: Run semantic searches over your documents using natural language queries.
  • Scan and search documents: Extract text from images and PDFs, then run semantic searches on the extracted content.
  • Generate embeddings: The package uses a Sentence Transformer model from Hugging Face to embed both documents and queries.
  • Rank by relevance: The package scores each document against the query with cosine similarity and returns results together with their scores.
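
The ranking step can be illustrated with a minimal sketch. It is not the package's own implementation: the function names cosine_similarity and rank_by_similarity and the toy three-dimensional vectors are assumptions made for illustration, and real sentence embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, chunk_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings standing in for model output.
query = [1.0, 0.0, 1.0]
chunks = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
ranking = rank_by_similarity(query, chunks)

for index, score in ranking:
    print(f"chunk {index}: similarity {score:.4f}")
```

Cosine similarity depends only on the direction of the vectors, not their length, so a long chunk and a short chunk on the same topic can score equally well against a query.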

Installation

You can install the package using pip. The distribution is published under the name vectrieve, which is also the name used in the import statements below:

pip install vectrieve

Usage

Here's an example of how to use the package to search a document:

from vectrieve.functions import search_document

# Search a document with a natural language query
results = search_document("path/to/document.pdf", "query")

# Each result pairs a document object with its similarity score
for doc, score in results:
    print(f"Text: {doc.page_content}")
    print(f"Source: {doc.metadata.get('source', 'Unknown')}")
    print(f"Similarity: {score:.4f}")

To search a scanned document, pass a PIL image to scan_and_search, which extracts the text with Tesseract OCR before running the search:

from vectrieve.functions import scan_and_search
from PIL import Image

# Scan and search a document
image = Image.open("path/to/image.jpg")
results = scan_and_search(image, "query")
for doc, score in results:
    print(f"Text: {doc.page_content}")
    print(f"Source: {doc.metadata.get('source', 'Unknown')}")
    print(f"Similarity: {score:.4f}")

Features

  • Document loading: The package supports PDF, DOCX, TXT, XLS, and XLSX files.
  • Document chunking: The package automatically splits long documents into smaller chunks, so each chunk is embedded and searched on its own.
  • Similarity search: Results are ranked by the cosine similarity between the query embedding and each chunk embedding.
  • Document scanning: The package extracts text from images using Tesseract OCR, allowing you to search scanned documents.
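
As a rough illustration of the chunking step, the sketch below splits text into fixed-size, overlapping character windows. This README does not say how the package chunks documents, so the chunk_text function, its chunk_size and overlap parameters, and the character-based strategy are all assumptions rather than the package's actual behavior.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghij"
pieces = chunk_text(sample, chunk_size=4, overlap=1)
print(pieces)  # ['abcd', 'defg', 'ghij']
```

The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, at the cost of embedding some text twice.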

Contributing

We welcome contributions to the "Vectrieve" package. If you'd like to report a bug, request a feature, or contribute code, please visit the project's GitHub repository.

License

This project is licensed under the MIT License.


