PDF and Web Content Query Package
This package provides functionality to process PDF files and web pages, allowing users to query their content using natural language processing techniques.
Features
- Process PDF files and answer queries about their content
- Crawl web pages and answer queries about their content
- Uses HuggingFace BGE embeddings and cosine similarity to match text chunks against the query
Installation
To install this package, run:
pip install semanticbot
Usage
Processing a PDF
To process a PDF file and query its content:
# Replace your_package_name with the package's actual import name.
from your_package_name import process_pdf

pdf_path = "path/to/your/file.pdf"
query = "What is the main topic of this document?"

# process_pdf returns (chunk, similarity) pairs.
results = process_pdf(pdf_path, query)
for chunk, similarity in results:
    print(f"Similarity: {similarity}")
    print(f"Text chunk: {chunk}\n")
Crawling and Querying a Web Page
To crawl a web page and query its content:
# Replace your_package_name with the package's actual import name.
from your_package_name import crawl_and_query

url = "https://example.com"
query = "What are the key features of the product?"

# crawl_and_query returns (chunk, similarity) pairs.
results = crawl_and_query(url, query)
for chunk, similarity in results:
    print(f"Similarity: {similarity}")
    print(f"Text chunk: {chunk}\n")
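Before a web page can be queried, its HTML has to be reduced to plain text. The sketch below shows one way to do that with the standard library's html.parser. It is an illustration only, not the package's actual implementation: the names TextExtractor and html_to_text are hypothetical, and fetching the page itself is left out.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped tag
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self._parts.append(data.strip())

    def text(self):
        return " ".join(self._parts)


def html_to_text(html):
    """Return the visible text of an HTML string, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Title</h1><p>Hello world.</p>"
    "<script>var x = 1;</script></body></html>"
)
print(html_to_text(sample))
```

The resulting string can then be chunked and matched against a query in the same way as text extracted from a PDF.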
How It Works
- For PDFs: The package extracts text content from the file.
- For Web Pages: It crawls the specified URL and extracts the text content.
- The extracted text is split into manageable chunks.
- The package uses HuggingFace's BGE embeddings to convert text chunks and the query into vector representations.
- Cosine similarity is used to find the most relevant text chunks for the given query.
- The top 5 most relevant chunks are returned along with their similarity scores.
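The chunking, similarity, and ranking steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the package's code: toy_embed is a word-count stand-in for the BGE embedding model, word-based chunking with a chunk_size parameter is assumed because the README does not say how chunks are formed, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=50):
    """Split text into chunks of at most `chunk_size` words (assumed strategy)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]


def toy_embed(text):
    """Stand-in embedding: a sparse word-count vector. This is NOT BGE."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as dict-like counters."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_chunks(text, query, embed=toy_embed, k=5, chunk_size=50):
    """Return up to k (chunk, similarity) pairs, most similar first."""
    query_vec = embed(query)
    scored = [(chunk, cosine_similarity(embed(chunk), query_vec))
              for chunk in split_into_chunks(text, chunk_size)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


document = ("Cats sleep most of the day. Dogs enjoy long walks. "
            "Parrots can mimic human speech. Cats also purr when content.")
for chunk, similarity in top_chunks(document, "cats purr", k=2, chunk_size=6):
    print(f"{similarity:.3f}  {chunk}")
```

Passing a different embed callable is how a real embedding model would slot in; only the vector type handled by cosine_similarity would need to change.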
License
This project is licensed under the MIT License - see the LICENSE file for details.