
Weibo Scraper



A simple Weibo tweet scraper: crawl Weibo tweets without authorization. The official API has many limitations; the mobile site, however, has its own API, which you can inspect with Chrome's developer tools.
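To see the underlying idea, you can hit the mobile site's JSON API directly with requests. This is only a minimal sketch, not part of weibo-scraper itself: the endpoint and parameters reflect what m.weibo.cn exposed at the time of writing and may change, and the uid is a placeholder.

import requests

# The mobile site's JSON API, discoverable in Chrome DevTools (Network tab).
resp = requests.get(
    'https://m.weibo.cn/api/container/getIndex',
    params={'type': 'uid', 'value': '123456'},  # '123456' is a placeholder uid
    headers={'User-Agent': 'Mozilla/5.0'},      # look like a mobile browser
)
print(resp.json())  # the raw JSON that weibo-scraper builds on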


Why

  1. Crawl Weibo data for big-data research.

  2. Back up data against Weibo's shameful blockades.


Installation

pip

$ pip install weibo-scraper

Or upgrade it:

$ pip install --upgrade weibo-scraper

pipenv

$ pipenv install weibo-scraper

Or upgrade it:

$ pipenv update --outdated # show packages which are outdated

$ pipenv update weibo-scraper # just update weibo-scraper

Only Python 3.6+ is supported.


Usage

  1. First, you can get a Weibo profile by name or uid.
>>> from weibo_scraper import get_weibo_profile
>>> weibo_profile = get_weibo_profile(name='来去之间')

You will get a Weibo profile response of type weibo_base.UserMeta, which includes the fields below:

field               chinese      type          sample
id                  用户id       str
screen_name         微博昵称     Option[str]
avatar_hd           高清头像     Option[str]   'https://ww2.sinaimg.cn/orj480/4242e8adjw8elz58g3kyvj20c80c8myg.jpg'
cover_image_phone   手机版封面   Option[str]   'https://tva1.sinaimg.cn/crop.0.0.640.640.640/549d0121tw1egm1kjly3jj20hs0hsq4f.jpg'
description         描述         Option[str]
follow_count        关注数       Option[int]   3568
follower_count      被关注数     Option[int]   794803
gender              性别         Option[str]   'm' / 'f'
raw_user_response   原始返回     Option[dict]
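The fields are plain attributes on the returned UserMeta, so you can read them directly; for example (values are illustrative, matching the samples in the table above):

>>> weibo_profile.screen_name
'来去之间'
>>> weibo_profile.follower_count
794803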
  2. Second, fetching Weibo tweets via tweet_container_id is a rarely used approach, but it also works well.
>>> from weibo_scraper import get_weibo_tweets
>>> for tweet in get_weibo_tweets(tweet_container_id='1076033637346297', pages=1):
...     print(tweet)
  3. Of course, you can also get raw Weibo tweets by an existing nickname; the pages parameter is optional.
>>> from weibo_scraper import get_weibo_tweets_by_name
>>> for tweet in get_weibo_tweets_by_name(name='嘻红豆', pages=1):
...     print(tweet)
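Each raw tweet yielded here is a plain dict. Assuming it follows the mobile API's card layout, with the tweet body under an 'mblog' node (the same shape the formatted API below exposes as mblog on each card), the text can be picked out like this:

>>> for tweet in get_weibo_tweets_by_name(name='嘻红豆', pages=1):
...     mblog = tweet.get('mblog') or {}  # assumed card layout; hedge with .get
...     print(mblog.get('text'))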
  4. If you want to get all tweets, set the pages parameter to None.
>>> from weibo_scraper import get_weibo_tweets_by_name
>>> for tweet in get_weibo_tweets_by_name(name='嘻红豆', pages=None):
...     print(tweet)
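Because pages=None walks every page, this pairs naturally with the backup use case above. A minimal sketch that dumps each raw tweet as one JSON line, assuming the tweets are plain dicts as in the examples above; the output filename is just an example:

import json
from weibo_scraper import get_weibo_tweets_by_name

# One raw tweet per line; JSON lines keeps large backups streamable.
with open('xihongdou_tweets.jsonl', 'w', encoding='utf-8') as f:
    for tweet in get_weibo_tweets_by_name(name='嘻红豆', pages=None):
        f.write(json.dumps(tweet, ensure_ascii=False) + '\n')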
  5. There has been a giant update since 1.0.5 🍰!

You can also get formatted tweets via the weibo_scraper.get_formatted_weibo_tweets_by_name API:

>>> from weibo_scraper import get_formatted_weibo_tweets_by_name
>>> result_iterator = get_formatted_weibo_tweets_by_name(name='嘻红豆', pages=None)
>>> for user_meta in result_iterator:
...     for tweet_meta in user_meta.cards_node:
...         print(tweet_meta.mblog.text)
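Note that mblog.text usually carries HTML markup (links, emoji icons). If you want plain text, a rough regex strip is often enough, assuming the markup stays simple; swap in a real HTML parser otherwise:

import re
from weibo_scraper import get_formatted_weibo_tweets_by_name

def plain_text(html: str) -> str:
    # Crude tag removal; good enough for Weibo's simple inline markup.
    return re.sub(r'<[^>]+>', '', html or '')

for user_meta in get_formatted_weibo_tweets_by_name(name='嘻红豆', pages=1):
    for tweet_meta in user_meta.cards_node:
        print(plain_text(tweet_meta.mblog.text))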



Weibo Flasgger

Weibo Flasgger is a web API document for weibo-scraper, powered by Flasgger.


P.S

  1. Inspired by Twitter-Scraper.

  2. For 'XIHONGDOU'.

  3. Welcome to fork me.


LICENSE

MIT
