SearchURL lets you perform Keyword, Fuzzy, and Semantic Search through the text on websites using their URLs.
SearchURL
Installation
Install SearchURL with pip
pip install SearchURL
Documentation
1. Getting all the text from a webpage by passing no keywords:
from SearchURL.main import SearchURL

search = SearchURL(cache=True)
data = search.searchUrl(
    url="https://en.wikipedia.org/wiki/Web_scraping"
)
print(data)
output: {'success': True, 'data': 'Web scraping - Wikipedia ...'}
2. Searching with keywords:
from SearchURL.main import SearchURL

search = SearchURL(cache=True)
data = search.searchUrl(
    url="https://en.wikipedia.org/wiki/Web_scraping",
    keywords=['legal']
)
print(data)
output: {'success': True, 'data': 'Legal issues Toggle Legal issues subsection Legal issues [ edit ] The legality of web scraping varies across the world ...'}
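The package's internals aren't shown here, but the behavior above suggests keyword search keeps only the sections of the page text that contain one of the given keywords. A minimal standard-library sketch of that idea (keyword_filter is a hypothetical helper, not part of SearchURL):

```python
def keyword_filter(chunks, keywords):
    """Keep only the text chunks that contain any keyword (case-insensitive)."""
    kws = [k.lower() for k in keywords]
    return [c for c in chunks if any(k in c.lower() for k in kws)]

# Section titles similar to the Wikipedia page used above:
chunks = ['History', 'Legal issues', 'Techniques']
print(keyword_filter(chunks, ['legal']))  # ['Legal issues']
```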
3. Fuzzy Searching:
from SearchURL.main import SearchURL

search = SearchURL(cache=True)
data = search.searchUrlFuzz(
    url="https://en.wikipedia.org/wiki/Web_scraping",
    keywords=['legal']
)
print(data)
output: {'success': True, 'data': 'Legal issues [ edit ] | In the United States, website owners can use three major legal claims to prevent undesired web scraping: (1) copyright ...'}
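Fuzzy search matches text that is only approximately similar to the keywords, so near-misses like "legality" still score against "legal". SearchURL's exact scoring isn't documented here; as an illustration of the underlying idea, here is approximate string matching with Python's standard-library difflib:

```python
from difflib import SequenceMatcher

def fuzzy_score(keyword, candidate):
    # Similarity ratio in [0, 1]; higher means a closer approximate match.
    return SequenceMatcher(None, keyword.lower(), candidate.lower()).ratio()

# "legality" is a much closer match for "legal" than "history" is:
print(fuzzy_score('legal', 'legality') > fuzzy_score('legal', 'history'))  # True
```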
4. Semantic Search: Yes, this package supports Semantic Search!
from SearchURL.main import SearchURL

search = SearchURL(createVector=True)  # creates an in-memory vector database using chromadb
data = search.createEmbededData("https://en.wikipedia.org/wiki/Web_scraping")  # loads and embeds all the text from the webpage

if data.get('success'):  # data = {'success': True, 'db': db}
    db = data.get('db')
    results = db.query(keywords=['benefits', 'what benefits can we get from web scraping'], limit=10)
    print(results)
else:
    print(data.get('detail'))  # data = {'success': False, 'detail': 'ERROR'}
Errors
If this package runs into an error while fetching or searching, it returns an object like this: {'success': False, 'detail': 'The error that occurred'}
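Because every result carries a 'success' flag, callers can branch on it once. A small sketch of that pattern (unwrap is a hypothetical helper, not part of SearchURL; the dicts mimic the documented result shapes):

```python
def unwrap(result):
    """Return the payload of a SearchURL-style result dict, or raise on failure."""
    if result.get('success'):
        return result.get('data')
    raise RuntimeError(result.get('detail'))

ok = {'success': True, 'data': 'Web scraping - Wikipedia ...'}
bad = {'success': False, 'detail': 'The error that occurred'}

print(unwrap(ok))  # Web scraping - Wikipedia ...
```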
The URL used in this README links to the Wikipedia article on web scraping.
Hashes for SearchURL-1.1.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | c6b8d848459dac364978a40d1af6a8476e6643a049bb9818e79a24e59ffa3f4a
MD5 | 7b4f3843cc5f92bfd717d5eba4cef250
BLAKE2b-256 | 38185f5e4c07407a241fc6fcef37ca0b6f8c35949055ed094077f75c40b61580