Weibo Scraper
A simple Weibo tweet scraper that crawls Weibo tweets without authorization. The official API has many limitations, but the mobile site has its own API, which can be inspected with Chrome.
Why
- Crawl Weibo data for big-data research.
- Back up data in response to Weibo's shameful blockade.
Installation
pip
$ pip install weibo-scraper
Or upgrade it:
$ pip install --upgrade weibo-scraper
pipenv
$ pipenv install weibo-scraper
Or upgrade it:
$ pipenv update --outdated # show packages which are outdated
$ pipenv update weibo-scraper # just update weibo-scraper
Only Python 3.6+ is supported
Usage
CLI
$ weibo-scraper -h
usage: weibo-scraper [-h] [-u U] [-p P] [-o O] [-f FORMAT]
[-efn EXPORTED_FILE_NAME] [-s] [-d] [--more] [-v]
weibo-scraper-1.0.7-beta 🚀
optional arguments:
-h, --help show this help message and exit
-u U username [nickname] which want to exported
-p P pages which exported [ default 1 page ]
-o O output file path which expected [ default 'current
dir' ]
-f FORMAT, --format FORMAT
format which expected [ default 'txt' ]
-efn EXPORTED_FILE_NAME, --exported_file_name EXPORTED_FILE_NAME
file name which expected
-s, --simplify simplify available info
-d, --debug open debug mode
--more more
-v, --version weibo scraper version
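When the CLI is driven from a script, it can help to assemble the argument list in one place. The following is a minimal sketch that uses only the `-u`, `-p`, `-o` and `-f` flags from the help output above; the helper name `build_weibo_scraper_command` and the `./export` directory are hypothetical and not part of the package.

```python
import shlex
from typing import List, Optional


def build_weibo_scraper_command(
    username: str,
    pages: int = 1,
    output_dir: Optional[str] = None,
    fmt: str = "txt",
) -> List[str]:
    """Assemble an argv list from the flags shown in the help output."""
    command = ["weibo-scraper", "-u", username, "-p", str(pages), "-f", fmt]
    if output_dir is not None:
        # -o is added only when a directory is given; the help text says
        # the CLI falls back to the current directory otherwise.
        command += ["-o", output_dir]
    return command


# Hypothetical invocation: two pages for one nickname, written to ./export.
command = build_weibo_scraper_command("嘻红豆", pages=2, output_dir="./export")
default_command = build_weibo_scraper_command("嘻红豆")

# Print a shell-safe rendering of the command for inspection.
print(" ".join(shlex.quote(part) for part in command))
```

Once the package is installed, the resulting list could be passed to `subprocess.run(command, check=True)`.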
API
- First, you can get a Weibo profile by `name` or `uid`.
>>> from weibo_scraper import get_weibo_profile
>>> weibo_profile = get_weibo_profile(name='来去之间')
>>> ....
The response is of type `weibo_base.UserMeta` and includes the following fields:
field | Chinese (English) | type | sample | ext |
---|---|---|---|---|
id | 用户id (user ID) | str | | |
screen_name | 微博昵称 (Weibo nickname) | Option[str] | | |
avatar_hd | 高清头像 (HD avatar) | Option[str] | 'https://ww2.sinaimg.cn/orj480/4242e8adjw8elz58g3kyvj20c80c8myg.jpg' | |
cover_image_phone | 手机版封面 (mobile cover image) | Option[str] | 'https://tva1.sinaimg.cn/crop.0.0.640.640.640/549d0121tw1egm1kjly3jj20hs0hsq4f.jpg' | |
description | 描述 (description) | Option[str] | | |
follow_count | 关注数 (following count) | Option[int] | 3568 | |
follower_count | 被关注数 (follower count) | Option[int] | 794803 | |
gender | 性别 (gender) | Option[str] | 'm'/'f' | |
raw_user_response | 原始返回 (raw response) | Option[dict] | | |
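Because most of these fields are optional, code that reads a profile should tolerate `None`. The sketch below shows one way to do that. It uses a `SimpleNamespace` as a hypothetical stand-in for a `weibo_base.UserMeta` response, so it runs without the package. The helper name `summarize_profile` and the sample `id` value are illustrative assumptions.

```python
from types import SimpleNamespace


def summarize_profile(profile) -> str:
    """Build a one-line summary, tolerating fields that are None."""
    name = profile.screen_name or "<unknown>"
    gender = {"m": "male", "f": "female"}.get(profile.gender, "unspecified")
    follows = profile.follow_count if profile.follow_count is not None else 0
    followers = profile.follower_count if profile.follower_count is not None else 0
    return (
        f"{name} (id={profile.id}, {gender}): "
        f"following {follows}, followers {followers}"
    )


# Hypothetical stand-in for a UserMeta response; the id is made up and
# screen_name is deliberately None to exercise the fallback.
sample = SimpleNamespace(
    id="0000000000",
    screen_name=None,
    gender="m",
    follow_count=3568,
    follower_count=794803,
)
summary = summarize_profile(sample)
print(summary)
```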
- Second, you can fetch tweets via `tweet_container_id`. This approach is rarely needed, but it works well.
>>> from weibo_scraper import get_weibo_tweets
>>> for tweet in get_weibo_tweets(tweet_container_id='1076033637346297', pages=1):
...     print(tweet)
>>> ....
- Of course, you can also get raw Weibo tweets by the nickname of an existing user. The `pages` parameter is optional.
>>> from weibo_scraper import get_weibo_tweets_by_name
>>> for tweet in get_weibo_tweets_by_name(name='嘻红豆', pages=1):
...     print(tweet)
>>> ....
- To get all tweets, set the `pages` parameter to `None`.
>>> from weibo_scraper import get_weibo_tweets_by_name
>>> for tweet in get_weibo_tweets_by_name(name='嘻红豆', pages=None):
...     print(tweet)
>>> ....
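With `pages=None` the loop may run for a long time on an account with many tweets, so it can be useful to cap how many items are consumed. The sketch below assumes the function yields tweets lazily, which the `for` loop usage suggests but which is not confirmed here. `fake_tweet_stream` is a hypothetical stand-in for `get_weibo_tweets_by_name(name=..., pages=None)`, so the example runs offline.

```python
from itertools import islice
from typing import Iterable, Iterator


def take(tweets: Iterable, limit: int) -> Iterator:
    """Yield at most `limit` items without consuming the rest of the stream."""
    return islice(tweets, limit)


def fake_tweet_stream():
    """Hypothetical endless stand-in for a tweet iterator: 3 tweets per page."""
    page = 0
    while True:
        page += 1
        for index in range(3):
            yield {"page": page, "index": index}


# Stop after five tweets even though the underlying stream never ends.
first_five = list(take(fake_tweet_stream(), 5))
for tweet in first_five:
    print(tweet)
```

If the real iterator is lazy, wrapping it the same way (`take(get_weibo_tweets_by_name(...), 5)`) would stop requesting further pages once the cap is reached.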
- You can also get formatted tweets via `weibo_scraper.get_formatted_weibo_tweets_by_name`:
>>> from weibo_scraper import get_formatted_weibo_tweets_by_name
>>> result_iterator = get_formatted_weibo_tweets_by_name(name='嘻红豆', pages=None)
>>> for user_meta in result_iterator:
...     if user_meta is not None:
...         for tweet_meta in user_meta.cards_node:
...             print(tweet_meta.mblog.text)
>>> ....
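The nested loop above can be wrapped in a small helper that returns the tweet texts as a list. This is a sketch only: it relies on the `cards_node` and `mblog.text` attributes used in the example above, and it builds hypothetical stand-in objects with `SimpleNamespace` so it runs without the package or network access. The helper names are assumptions.

```python
from types import SimpleNamespace
from typing import Iterable, List


def collect_tweet_texts(result_iterator: Iterable) -> List[str]:
    """Flatten formatted results into a list of tweet texts, skipping None."""
    texts = []
    for user_meta in result_iterator:
        if user_meta is None:
            continue
        for tweet_meta in user_meta.cards_node:
            texts.append(tweet_meta.mblog.text)
    return texts


def _card(text: str) -> SimpleNamespace:
    """Hypothetical stand-in for one formatted tweet card."""
    return SimpleNamespace(mblog=SimpleNamespace(text=text))


# Stand-in for the iterator returned by the library, including a None entry.
fake_results = [
    SimpleNamespace(cards_node=[_card("first"), _card("second")]),
    None,
    SimpleNamespace(cards_node=[_card("third")]),
]
texts = collect_tweet_texts(fake_results)
print(texts)
```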
- Get realtime hot words
import weibo_scraper

hotwords = weibo_scraper.get_realtime_hotwords()
for hw in hotwords:
    print(str(hw))
- Poll realtime hot words at a fixed interval
wt = Timer(name="realtime_hotword_timer", fn=weibo_scraper.get_realtime_hotwords, interval=1)
wt.set_ignore_ex(True)
wt.scheduler()
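To illustrate the general idea of interval polling that ignores errors, here is a standard-library sketch. It is not the package's `Timer` implementation, whose internals are not described here. The `poll` helper and its bounded `iterations` parameter are assumptions made for the example, and `flaky_hotwords` is a hypothetical stand-in for `weibo_scraper.get_realtime_hotwords`.

```python
import time
from typing import Any, Callable, List


def poll(
    fn: Callable[[], Any],
    interval: float,
    iterations: int,
    ignore_ex: bool = True,
) -> List[Any]:
    """Call `fn` a fixed number of times, sleeping `interval` seconds between calls."""
    results = []
    for i in range(iterations):
        try:
            results.append(fn())
        except Exception:
            # With ignore_ex=True a failed call is skipped and polling continues.
            if not ignore_ex:
                raise
        if i < iterations - 1:
            time.sleep(interval)
    return results


calls = {"count": 0}


def flaky_hotwords():
    """Hypothetical stand-in that fails on its second call."""
    calls["count"] += 1
    if calls["count"] == 2:
        raise RuntimeError("simulated network failure")
    return [f"hotword-{calls['count']}"]


snapshots = poll(flaky_hotwords, interval=0.01, iterations=3)
print(snapshots)
```

The second call raises and is skipped, so only the first and third snapshots are collected.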
LICENSE
MIT
This project is powered by the JetBrains Open Source License.