This package extracts links related to a given keyword from a web page.
Project description
pip install keyword_extract_links
httpx and fake_useragent must be installed to run the example below, but keyword_extract_links itself does not depend on them.
from keyword_extract_links import extract_links
import httpx
from fake_useragent import UserAgent

ua = UserAgent()


def get_html(url_address):
    # Fetch the page with a random User-Agent header and return its HTML text
    headers = {'user-agent': ua.random}
    r = httpx.get(url_address, headers=headers)
    return r.text


url_list = [
    "https://tieba.baidu.com/f/search/res?ie=utf-8&qw=%E6%98%93%E6%AC%A1%E5%85%83",
    "https://s.weibo.com/weibo?q=%E6%98%93%E6%AC%A1%E5%85%83&Refer=STopic_history"
]

for url in url_list:
    html = get_html(url)
    extract = extract_links.Extract(url, html, "易次元")
    # Use a separate name so the list being iterated is not shadowed
    links = extract.get_url_list()
    print(links)

# Output:
# ['https://tieba.baidu.com/home/main?un=%E6%98%93%E6%AC%A1%E5%85%83&from=tieba']
# ['https://weibo.com/6509857538?refer_flag=1001030103_', 'https://app.weibo.com/t/feed/2nxWC7', 'https://k.sina.cn/article_5790946818_1592ad6020010112vs.html?from=animation&wm=3049_0032', 'https://c.m.163.com/news/a/EQ35KNJQ00318PFH.html?spss=newsapp']
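
To give a rough idea of what keyword-based link extraction can look like, here is a minimal sketch that uses only the Python standard library. It is not the implementation of keyword_extract_links; the matching rule (keep a link when its anchor text contains the keyword) and the names `KeywordLinkParser` and `find_keyword_links` are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class KeywordLinkParser(HTMLParser):
    """Hypothetical helper: collects hrefs of <a> tags whose text contains a keyword."""

    def __init__(self, base_url, keyword):
        super().__init__()
        self.base_url = base_url
        self.keyword = keyword
        self.links = []
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []    # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Assumed rule: a link matches when its visible text contains the keyword
            if self.keyword in "".join(self._text):
                url = urljoin(self.base_url, self._href)  # resolve relative links
                if url not in self.links:                 # skip duplicates
                    self.links.append(url)
            self._href = None


def find_keyword_links(base_url, html, keyword):
    """Return absolute URLs of links in html whose anchor text contains keyword."""
    parser = KeywordLinkParser(base_url, keyword)
    parser.feed(html)
    parser.close()
    return parser.links
```

The call shape mirrors the example above: pass the page URL, the HTML text, and the keyword, and receive a list of absolute URLs. The real package may apply different or additional matching rules.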
Project details
Download files
Source Distributions
No source distribution files are available for this release.
Built Distribution
File details
Details for the file keyword_extract_links-0.0.4-py3-none-any.whl.
File metadata
- Download URL: keyword_extract_links-0.0.4-py3-none-any.whl
- Upload date:
- Size: 3.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/51.0.0 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.6.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | f5c457f18632ac5ca0bb359fc23ceebf39272e26d29efca945231e9a20921e96
MD5 | dd6d403c97882e52d2f2299b1e6f2162
BLAKE2b-256 | 2fb25159c8e4465b31bf518f8253437b717e908484826900399832d15024e977
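
A downloaded wheel can be checked against the SHA256 digest listed above. The following is a small sketch using Python's standard `hashlib` module; the function names `sha256_of_file` and `matches_expected` are illustrative, and the file path is whatever location you saved the wheel to.

```python
import hashlib

# SHA256 digest listed for keyword_extract_links-0.0.4-py3-none-any.whl
EXPECTED_SHA256 = "f5c457f18632ac5ca0bb359fc23ceebf39272e26d29efca945231e9a20921e96"


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest equals the expected digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("keyword_extract_links-0.0.4-py3-none-any.whl")` should return `True` for an unmodified copy of the file, assuming it is in the current directory.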