SearchURL lets you perform keyword, fuzzy, and semantic search through the text on websites using their URLs.
Project description
SearchURL
Installation
Install SearchURL with pip
pip install SearchURL
Documentation
1. Getting all the text from a webpage by not passing in keywords:
from SearchURL.main import SearchURL
search = SearchURL(cache=True)
data = search.searchUrl(
url="https://en.wikipedia.org/wiki/Web_scraping"
)
print(data)
output: {'success': True, 'data': 'Web scraping - Wikipedia ...'}
2. Searching with keywords:
from SearchURL.main import SearchURL
search = SearchURL(cache=True)
data = search.searchUrl(
url="https://en.wikipedia.org/wiki/Web_scraping",
keywords=['legal'])
print(data)
output: {'success': True, 'data': 'Legal issues Toggle Legal issues subsection Legal issues [ edit ] The legality of web scraping varies across the world ...'}
3. Fuzzy Searching:
from SearchURL.main import SearchURL
search = SearchURL(cache=True)
data = search.searchUrlFuzz(
url="https://en.wikipedia.org/wiki/Web_scraping",
keywords=['legal'])
print(data)
output: {'success': True, 'data': 'Legal issues [ edit ] | In the United States, website owners can use three major legal claims to prevent undesired web scraping: (1) copyright ...'}
4. Semantic Search: Yes, this package supports Semantic Search!
from SearchURL.main import SearchURL
search = SearchURL(createVector=True) # creates an in-memory vector database using chromadb
data = search.createEmbededData("https://en.wikipedia.org/wiki/Web_scraping") # loads and embeds all the text from the webpage

if data.get('success'): # data = {'success': True, 'db': db}
    db = data.get('db')
    results = db.query(keywords=['benefits', 'what benefits can we get from web scraping'], limit=10)
    print(results)
else:
    print(data.get('detail')) # data = {'success': False, 'detail': 'ERROR'}
Errors
If this package runs into an error while fetching or searching, it returns an object like this: {'success': False, 'detail': 'The error that occurred'}
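Because every call returns this same {'success': ..., ...} shape, results can be handled with one small helper. The sketch below is a hypothetical pattern (the helper name and the sample error string are illustrative, not part of the package):

```python
def handle_result(data):
    """Return the payload on success, or a readable error message otherwise."""
    if data.get('success'):
        # successful calls carry their payload under 'data' (or 'db' for vector search)
        return data.get('data')
    # failed calls carry the error description under 'detail'
    return f"Search failed: {data.get('detail')}"

# Example with a simulated error object of the documented shape:
print(handle_result({'success': False, 'detail': 'Connection timed out'}))
```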
The URL used in this README links to the Wikipedia article on web scraping.
Project details
Release history
Download files
Source Distribution
Built Distribution
File details
Details for the file SearchURL-1.1.4.tar.gz
File metadata
- Download URL: SearchURL-1.1.4.tar.gz
- Upload date:
- Size: 5.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3e909e16dd75a734e00b6e358a2a663608c24a3ae288e289c55d18a065e48b4f
MD5 | af416fb8d90552fa7f6546b49d128971
BLAKE2b-256 | 53f594f0b2e9557d3b1a14242d4fbc39a3f6155a404c02e062b92cf72dfa863a
File details
Details for the file SearchURL-1.1.4-py3-none-any.whl
File metadata
- Download URL: SearchURL-1.1.4-py3-none-any.whl
- Upload date:
- Size: 5.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | c6b8d848459dac364978a40d1af6a8476e6643a049bb9818e79a24e59ffa3f4a
MD5 | 7b4f3843cc5f92bfd717d5eba4cef250
BLAKE2b-256 | 38185f5e4c07407a241fc6fcef37ca0b6f8c35949055ed094077f75c40b61580