Sohu scraper
Project description
Title
Sohu.com Scraper
Description
- Sohu.com Scraper lets you scrape Sohu.com search results and extract the content items they list.
- It scrapes the data present on the page and returns JSON data with the details of each content item found there.
- The output includes fields such as the profile link, publish time (feed_publish), title, etc.
JSON sample data
{
  "Scraper_\u9999\u6e2f\u9762\u5411\u5185\u5730\u5f15\u8fdb\u4eba\u624d_\u7b26\u5408\u6761\u4ef6\u53ef\u7533\u8bf7\u9999\u6e2f\u8eab\u4efd": [
    {
      "blank": "\u65b0\u4eac\u62a5",
      "feed_four_title_style_link": "https://www.sohu.com/a/499810521_114988?scm=1004.773955565398458368.0.0.672&spm=smpc.ch13.fd-news.1.1636356674692YG6AwTs",
      "feed_publish": "\u4eca\u5929 04:28",
      "feed_visited_theme_history_color_hover": "\u4eac\u534e\u7269\u8bed\u4e28\u57281920\u5e74\u4ee3\u7684\u5317\u4eac\uff0c\u4eba\u529b\u8f66\u771f\u53ef\u8c13\u516c\u5171\u5947\u666f",
      "profile_link": "http://mp.sohu.com/profile?xpt=c29odXptdDNqdHpnY0Bzb2h1LmNvbQ==&spm=smpc.ch13.fd-news.1.1636356674692YG6AwTs"
    }
  ]
}
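The sample above maps one top-level key to a list of records. As a minimal sketch, assuming the real output always has that shape (the README does not state this), the records can be flattened into one list. The helper name flatten_records and the added source_key field are hypothetical and not part of sohu_scraper.

```python
import json
from typing import Any

# One record copied from the JSON sample above (links omitted for brevity).
SAMPLE = r'''
{
  "Scraper_\u9999\u6e2f\u9762\u5411\u5185\u5730\u5f15\u8fdb\u4eba\u624d": [
    {
      "blank": "\u65b0\u4eac\u62a5",
      "feed_publish": "\u4eca\u5929 04:28"
    }
  ]
}
'''


def flatten_records(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect every record from every top-level key into one flat list.

    Hypothetical helper: assumes each top-level value is a list of dicts,
    as in the sample. Each record gets a "source_key" field naming the
    top-level key it came from.
    """
    records = []
    for key, items in payload.items():
        for item in items:
            record = dict(item)          # copy so the input is not mutated
            record["source_key"] = key
            records.append(record)
    return records


records = flatten_records(json.loads(SAMPLE))
for record in records:
    print(record["blank"], record["feed_publish"])
```

json.loads decodes the \u escapes, so the printed values are the original Chinese text.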
Run Scraper
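from sohu_scraper import run_sohu_scraper

# URL of the Sohu page to scrape
link = "http://history.sohu.com/?spm=smpc.home.history-nav.1.1633101794696TEciRMP"
# Scrape the page; the result is JSON data like the sample above
data = run_sohu_scraper(link)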
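The README does not say whether run_sohu_scraper returns a JSON string or an already parsed object. The following sketch accepts either and writes readable UTF-8 JSON to disk. The helpers to_json_text and save_result are hypothetical and not part of sohu_scraper.

```python
import json
from pathlib import Path
from typing import Any


def to_json_text(data: Any) -> str:
    """Return pretty-printed JSON text for scraper output.

    Assumption: data is either a JSON document (str/bytes) or an object
    that json.dumps can serialize.
    """
    if isinstance(data, (str, bytes)):
        data = json.loads(data)
    # ensure_ascii=False keeps Chinese text readable instead of \u escapes.
    return json.dumps(data, ensure_ascii=False, indent=2)


def save_result(data: Any, path: str) -> Path:
    """Write the scraper output to path as UTF-8 JSON and return the path."""
    out = Path(path)
    out.write_text(to_json_text(data), encoding="utf-8")
    return out
```

With the data variable from the snippet above, save_result(data, "sohu_history.json") would store the result; the file name is only an example.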
How does it work?
- It takes the URL of a Sohu page (for example, a search for a keyword) and scrapes the data on that page.
- It generates JSON data that contains the information from the Sohu search results.
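Since the scraper expects a Sohu URL, it can help to check the link before passing it in. This is a sketch of such a check using only the standard library; is_sohu_url is a hypothetical helper, and whether sohu_scraper performs any validation of its own is not documented here.

```python
from urllib.parse import urlparse


def is_sohu_url(link: str) -> bool:
    """Return True if link is an http(s) URL on sohu.com or a subdomain.

    Hypothetical pre-check, not part of sohu_scraper.
    """
    parsed = urlparse(link)
    host = parsed.hostname or ""   # hostname is lowercased, or None if absent
    return parsed.scheme in ("http", "https") and (
        host == "sohu.com" or host.endswith(".sohu.com")
    )


link = "http://history.sohu.com/?spm=smpc.home.history-nav.1.1633101794696TEciRMP"
if is_sohu_url(link):
    print("OK to pass to the scraper:", link)
```

Matching on the parsed hostname rather than on the raw string avoids accepting URLs that only mention sohu.com in their path or query.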
Examples
Below are some example URLs that you can scrape:
Queries / Feedback
If you have any queries or feedback, please contact us through the following:
Telegram
Email
Download files
Source Distribution
sohu_scraper-1.0.3.tar.gz (3.0 kB)
File details
Details for the file sohu_scraper-1.0.3.tar.gz.
File metadata
- Download URL: sohu_scraper-1.0.3.tar.gz
- Upload date:
- Size: 3.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.5.0 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2491b7bdb2220be94034a9c44fd5027c8d4b396d63da5a644574a2202b65ddaf
MD5 | facc19f72fc20fa2ada336034abb4261
BLAKE2b-256 | e0278f2b091e4f8df25d3481ae082af35502a2fdf50cb5c0aa7e658cdca0a8b8