Weibo Scraper
A simple Weibo tweet scraper that crawls Weibo tweets without authorization. The official API has many limitations, so this project instead relies on the mobile site, which has its own API that can be inspected with Chrome.
Why
- Crawl Weibo data for big-data research.
- Back up data in the face of Weibo's shameful blockade.
Installation
pip
$ pip install weibo-scraper
Or upgrade it:
$ pip install --upgrade weibo-scraper
pipenv
$ pipenv install weibo-scraper
Or upgrade it:
$ pipenv update --outdated # show packages which are outdated
$ pipenv update weibo-scraper # just update weibo-scraper
Only Python 3.6+ is supported
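Since only Python 3.6+ is supported, a script that depends on weibo-scraper may want to fail fast on older interpreters. The following is a minimal sketch; the helper name `is_supported` is hypothetical and not part of the package.

```python
import sys

MIN_VERSION = (3, 6)  # minimum version stated in this README


def is_supported(version_info=None):
    """Return True if the given (or current) interpreter meets the minimum version."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); patch level does not matter here.
    return tuple(version_info[:2]) >= MIN_VERSION
```

A caller could run `if not is_supported(): sys.exit("Python 3.6+ required")` before importing the package.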
Usage
CLI
$ weibo-scraper -h
usage: weibo-scraper [-h] [-u U] [-p P] [-o O] [-f FORMAT]
[-efn EXPORTED_FILE_NAME] [-s] [-d] [--more] [-v]
weibo-scraper-1.0.7-beta 🚀
optional arguments:
-h, --help show this help message and exit
-u U username [nickname] which want to exported
-p P pages which exported [ default 1 page ]
-o O output file path which expected [ default 'current
dir' ]
-f FORMAT, --format FORMAT
format which expected [ default 'txt' ]
-efn EXPORTED_FILE_NAME, --exported_file_name EXPORTED_FILE_NAME
file name which expected
-s, --simplify simplify available info
-d, --debug open debug mode
--more more
-v, --version weibo scraper version
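To call the CLI from another Python program, one option is to build the argument list from the flags shown in the help text and hand it to `subprocess`. This is only a sketch: `build_command` and `to_shell` are hypothetical helpers, and only the `-u`, `-p`, `-o`, and `-f` flags from the help output above are used.

```python
import shlex


def build_command(user, pages=1, output_dir=None, fmt="txt"):
    """Build an argument list for the weibo-scraper CLI (flags taken from its help text)."""
    args = ["weibo-scraper", "-u", user, "-p", str(pages), "-f", fmt]
    if output_dir is not None:
        # -o is optional; the help text says it defaults to the current directory.
        args += ["-o", output_dir]
    return args


def to_shell(args):
    """Render the argument list as a shell-quoted command string for logging."""
    return " ".join(shlex.quote(a) for a in args)
```

The list form can be passed directly to `subprocess.run(args, check=True)`, which avoids shell quoting problems with non-ASCII nicknames.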
API
- First, you can get a Weibo profile by name or uid.
>>> from weibo_scraper import get_weibo_profile
>>> weibo_profile = get_weibo_profile(name='来去之间')
>>> ....
The response is of type weibo_base.UserMeta and includes the fields below.
field | description | type | sample | ext |
---|---|---|---|---|
id | user ID | str | | |
screen_name | Weibo nickname | Option[str] | | |
avatar_hd | HD avatar | Option[str] | 'https://ww2.sinaimg.cn/orj480/4242e8adjw8elz58g3kyvj20c80c8myg.jpg' | |
cover_image_phone | mobile cover image | Option[str] | 'https://tva1.sinaimg.cn/crop.0.0.640.640.640/549d0121tw1egm1kjly3jj20hs0hsq4f.jpg' | |
description | description | Option[str] | | |
follow_count | following count | Option[int] | 3568 | |
follower_count | follower count | Option[int] | 794803 | |
gender | gender | Option[str] | 'm'/'f' | |
raw_user_response | raw response | Option[dict] | | |
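Most fields in the table are optional, so code that reads a profile should tolerate `None`. The sketch below assumes the fields are readable as attributes with the names listed above. It uses a `SimpleNamespace` stand-in with a placeholder id, not a real `UserMeta`, and `summarize_profile` is a hypothetical helper.

```python
from types import SimpleNamespace


def summarize_profile(profile):
    """Format a one-line summary, falling back to defaults for fields that are None."""
    name = profile.screen_name or "(unknown)"
    followers = profile.follower_count if profile.follower_count is not None else 0
    following = profile.follow_count if profile.follow_count is not None else 0
    return "{} (id={}): {} followers, {} following".format(
        name, profile.id, followers, following
    )


# Stand-in object for illustration only; a real profile would come from get_weibo_profile.
fake_profile = SimpleNamespace(
    id="123", screen_name=None, follower_count=794803, follow_count=3568
)
print(summarize_profile(fake_profile))
```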
- Second, you can get Weibo tweets via tweet_container_id. This approach is rarely needed, but it works well.
>>> from weibo_scraper import get_weibo_tweets
>>> for tweet in get_weibo_tweets(tweet_container_id='1076033637346297', pages=1):
...     print(tweet)
...
- Of course, you can also get raw Weibo tweets by the nickname of an existing user. The pages parameter is optional.
>>> from weibo_scraper import get_weibo_tweets_by_name
>>> for tweet in get_weibo_tweets_by_name(name='嘻红豆', pages=1):
...     print(tweet)
...
- To get all tweets, set the pages parameter to None.
>>> from weibo_scraper import get_weibo_tweets_by_name
>>> for tweet in get_weibo_tweets_by_name(name='嘻红豆', pages=None):
...     print(tweet)
...
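With pages=None the loop may run for a long time on an active account. If the function yields tweets lazily (an assumption; the README only shows it being iterated), the standard library's `itertools.islice` can cap how many are consumed. `take` is a hypothetical helper.

```python
from itertools import islice


def take(iterable, limit):
    """Return at most `limit` items, leaving the rest of the iterable unconsumed."""
    return list(islice(iterable, limit))


def fake_tweets():
    """Stand-in generator used here instead of a live crawl."""
    for i in range(100):
        yield {"index": i}


stream = fake_tweets()
first_three = take(stream, 3)
```

With the real API this would read `take(get_weibo_tweets_by_name(name=..., pages=None), 50)`.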
- You can also get formatted tweets via weibo_scraper.get_formatted_weibo_tweets_by_name:
>>> from weibo_scraper import get_formatted_weibo_tweets_by_name
>>> result_iterator = get_formatted_weibo_tweets_by_name(name='嘻红豆', pages=None)
>>> for user_meta in result_iterator:
...     if user_meta is not None:
...         for tweetMeta in user_meta.cards_node:
...             print(tweetMeta.mblog.text)
...
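The tweet text may contain HTML markup such as links and line breaks. That is an assumption about the data, not something this README states. If so, it can be reduced to plain text with the standard library's `html.parser`. `strip_html` is a hypothetical helper.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(text):
    """Return `text` with tags removed and character references decoded."""
    parser = _TextExtractor()
    parser.feed(text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)
```

In the loop above, `print(strip_html(tweetMeta.mblog.text))` would then print plain text.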
- Get realtime hot words
import weibo_scraper

# Fetch the current list of hot words and print each one.
hotwords = weibo_scraper.get_realtime_hotwords()
for hw in hotwords:
    print(str(hw))
- Get realtime hot words at a regular interval
wt = Timer(name="realtime_hotword_timer", fn=weibo_scraper.get_realtime_hotwords, interval=1)
wt.set_ignore_ex(True)
wt.scheduler()
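The README does not show where Timer is imported from or how it behaves internally. As a rough stand-in for the same idea (call a function repeatedly, optionally ignoring exceptions), here is a plain standard-library loop. `poll` is a hypothetical helper with its own semantics and is not the package's Timer.

```python
import time


def poll(fn, interval, iterations, ignore_ex=True):
    """Call fn `iterations` times, sleeping `interval` seconds between calls.

    Returns the results of the calls that succeeded. When ignore_ex is False,
    the first exception raised by fn is propagated to the caller.
    """
    results = []
    for i in range(iterations):
        try:
            results.append(fn())
        except Exception:
            if not ignore_ex:
                raise
        if i < iterations - 1:
            time.sleep(interval)
    return results
```

For example, `poll(weibo_scraper.get_realtime_hotwords, interval=60, iterations=10)` would collect ten snapshots one minute apart, skipping any call that raised.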
LICENSE
MIT
This project is powered by a JetBrains Open Source License.