SearchURL
Search text on websites.
Installation
Install SearchURL with pip:
pip install SearchURL
Documentation
1. Getting all the text from a webpage by not passing any keywords:
from SearchURL.main import SearchURL
search = SearchURL(cache=True)
data = search.searchUrl(
    url="https://en.wikipedia.org/wiki/Web_scraping"
)
print(data)
output: {'success': True, 'data': 'Web scraping - Wikipedia ...'}
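The call returns a plain Python dictionary, so the extracted text can be read back with ordinary dict access. A minimal sketch based on the output format shown above (the 200-character preview is just an illustration):
from SearchURL.main import SearchURL
search = SearchURL(cache=True)
result = search.searchUrl(url="https://en.wikipedia.org/wiki/Web_scraping")
page_text = result.get('data', '')  # 'data' holds the extracted page text, as in the output above
print(len(page_text), "characters extracted")
print(page_text[:200])  # preview the first 200 characters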
2. Searching with keywords:
from SearchURL.main import SearchURL
search = SearchURL(cache=True)
data = search.searchUrl(
    url="https://en.wikipedia.org/wiki/Web_scraping",
    keywords=['legal'])
print(data)
output: {'success': True, 'data': 'Legal issues Toggle Legal issues subsection Legal issues [ edit ] The legality of web scraping varies across the world ...'}
3. Fuzzy Searching:
from SearchURL.main import SearchURL
search = SearchURL(cache=True)
data = search.searchUrlFuzz(
    url="https://en.wikipedia.org/wiki/Web_scraping",
    keywords=['legal'])
print(data)
output: {'success': True, 'data': 'Legal issues [ edit ] | In the United States, website owners can use three major legal claims to prevent undesired web scraping: (1) copyright ...'}
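Fuzzy searching is meant for approximate matching, so keywords that do not exactly match the page text can still surface the relevant sections. A hedged sketch (the misspelled keyword is only an illustration; how tolerant the matching is depends on the package's fuzzy matcher):
from SearchURL.main import SearchURL
search = SearchURL(cache=True)
data = search.searchUrlFuzz(
    url="https://en.wikipedia.org/wiki/Web_scraping",
    keywords=['leagal issues'])  # misspelled on purpose; fuzzy matching should still find the 'Legal issues' section
print(data)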
4. Semantic Search: Yes, this package supports Semantic Search!
from SearchURL.main import SearchURL
search = SearchURL(createVector=True)  # creates an in-memory vector database using chromadb
data = search.createEmbededData("https://en.wikipedia.org/wiki/Web_scraping")  # loads and embeds all the data from the webpage
if data.get('success'):  # data = {'success': True, 'db': db}
    db = data.get('db')
    results = db.query(keywords=['benefits', 'what benefits can we get from web scraping'], limit=10)
    print(results)
else:
    print(data.get('detail'))  # data = {'success': False, 'detail': 'ERROR'}
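Because createEmbededData hands back the vector database itself, the same db object can presumably be queried again without re-downloading or re-embedding the page. A hedged sketch of that reuse, assuming query accepts the same keywords/limit arguments shown in step 4:
# Reuse the in-memory database obtained in step 4 for a second query (assumption: db stays valid between calls).
results = db.query(keywords=['legal risks of scraping'], limit=5)
print(results)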
Errors
If this package runs into an error while fetching or searching, it returns an object like this: {'success': False, 'detail': 'The error that occurred'}
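Because errors are reported through the same 'success' and 'detail' keys, callers can branch on the returned dictionary. A minimal sketch of that pattern (the unreachable URL is only an illustration):
from SearchURL.main import SearchURL
search = SearchURL(cache=True)
data = search.searchUrl(url="https://example.invalid/no-such-page")  # hypothetical URL that will fail to fetch
if data.get('success'):
    print(data.get('data'))
else:
    print("Search failed:", data.get('detail'))  # 'detail' holds the error message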
The URL used in this README links to the Wikipedia article on web scraping.