
Scraping Weibos

Project description

Setup

  1. Install the PostgreSQL database:
    brew install postgresql
    brew services start postgresql
    
  2. Create a database:
    createdb your_database_name
    
  3. Write the configuration:
    from sinaspider import config
    # write the configuration
    config(
       account_id='your account id',  # your Weibo account id
       database_name='your_database_name',  # weibos and user info are saved to this database
       write_xmp=True  # write weibo metadata into image files (optional; requires ExifTool)
    )
    # read the configuration back
    config()
    >>> ConfigObj({'database_name': 'sina_test', 'write_xmp': 'True', 'account_id': '6619193364'})
    
  4. Set the cookie:
    import keyring
    cookie = '...your cookie from m.weibo.cn...'  # a cookie for the m.weibo.cn site is required
    keyring.set_password('sinaspider', 'cookie', cookie)
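The cookie stored above is the raw `Cookie` header copied from the browser's developer tools. As an illustration (this helper is not part of sinaspider), such a header string can be split into a dict before being attached to HTTP requests:

```python
def parse_cookie_header(header: str) -> dict:
    """Split a raw Cookie header ("k1=v1; k2=v2") into a dict."""
    cookies = {}
    for pair in header.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        name, _, value = pair.partition("=")
        cookies[name] = value
    return cookies

# hypothetical cookie values, for illustration only
cookies = parse_cookie_header("SUB=abc123; SSOLoginState=1660000000")
print(cookies)  # {'SUB': 'abc123', 'SSOLoginState': '1660000000'}
```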
    

Quick Start

Add the accounts you follow to the config list:

owner = Owner()
for following in owner.following():
    UserConfig(following)

Read the configured users back:

>>> for user_config in UserConfig.yield_config_user():
...     print(user_config)
...     break
# enable all config options
>>> user_config.toggle_all()
Fetch Weibo: True
Fetch Retweet: True
Download Media: True
Fetch following: True
# fetch all weibos
>>> user_config.fetch_weibo()
Fetching Retweet: True
Media Saving: ~/Downloads/sinaspider
Update Config: True
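The `toggle_all` behaviour above can be pictured with a small stand-in class (hypothetical names, not sinaspider's actual implementation):

```python
from dataclasses import dataclass, fields

@dataclass
class FakeUserConfig:
    """Stand-in for sinaspider's per-user config flags (illustrative only)."""
    weibo_fetch: bool = False
    retweet_fetch: bool = False
    media_download: bool = False
    following_fetch: bool = False

    def toggle_all(self, value: bool = True) -> None:
        # flip every flag at once, as toggle_all() does in the session above
        for f in fields(self):
            setattr(self, f.name, value)

cfg = FakeUserConfig()
cfg.toggle_all()
print(cfg.weibo_fetch, cfg.media_download)  # True True
```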

Each user exposes the following config options:

  1. weibo_fetch: whether to download weibos
  2. weibo_since: only fetch weibos posted after this date (defaults to 1970-01-01, i.e. fetch everything)
  3. retweet_fetch: whether to download retweeted weibos
  4. media_download: whether to download images and videos
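The `weibo_since` option amounts to a simple date cutoff. A sketch with made-up post data (the dict keys are assumptions, not sinaspider's schema):

```python
from datetime import date

def filter_since(posts, since=date(1970, 1, 1)):
    """Keep only posts created on or after `since` (the weibo_since default)."""
    return [p for p in posts if p["created_at"] >= since]

posts = [
    {"id": 1, "created_at": date(2021, 5, 1)},
    {"id": 2, "created_at": date(2019, 3, 2)},
]
# with the default since, everything passes; a 2020 cutoff keeps only id 1
print(filter_since(posts, since=date(2020, 1, 1)))
```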

Saving and Downloading Weibos

User

Fetch user info:

>>> from sinaspider import User
>>> uid = 6619193364  # the user id
>>> user = User(uid)

Weibo pages can be fetched via user.weibos; see get_weibo_pages for the full parameter list.

# fetch all weibos from page 3 through page 10 and save files to `path/to/download`
weibos = user.weibos(retweet=True, start_page=3, end_page=10,
                     download_dir='path/to/download')
# return the next weibo
next(weibos)
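The `weibos` call returns a lazy generator, which is why `next(weibos)` yields one weibo at a time. Its page-range behaviour can be sketched like this, where `fetch_page` is a made-up stand-in for the network call:

```python
def iter_weibos(fetch_page, start_page=1, end_page=None):
    """Yield weibos one by one, page by page, until end_page or an empty page."""
    page = start_page
    while end_page is None or page <= end_page:
        items = fetch_page(page)
        if not items:
            break  # an empty page means we ran out of weibos
        yield from items
        page += 1

# fake two-item pages for pages 1-3, empty afterwards
def fetch_page(page):
    return [f"weibo-{page}-{i}" for i in range(2)] if page <= 3 else []

weibos = iter_weibos(fetch_page, start_page=2, end_page=3)
print(next(weibos))  # 'weibo-2-0'
```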

Owner

from sinaspider import Owner
from pathlib import Path
owner = Owner()
# get your own profile
owner.info
# get the accounts you follow
myfollow = owner.following()
# get your own weibos
myweibo = owner.weibos(download_dir='path/to/dir')
# get your favorites
>>> mycollection = owner.collections(download_dir='path/to/dir')
>>> next(mycollection)
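In the calls above, `download_dir` is an ordinary filesystem path; saving media boils down to creating the directory and writing bytes into it. A generic sketch (not sinaspider's internals), using dummy bytes:

```python
import tempfile
from pathlib import Path

def save_media(download_dir, filename, content: bytes) -> Path:
    """Write media bytes under download_dir, creating the directory if needed."""
    target = Path(download_dir) / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(content)
    return target

with tempfile.TemporaryDirectory() as tmp:
    path = save_media(tmp, "example.jpg", b"\xff\xd8fake-jpeg")
    print(path.exists())  # True
```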



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sinaspider-0.4.1.tar.gz (15.8 kB)

Uploaded: Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sinaspider-0.4.1-py2.py3-none-any.whl (16.9 kB)

Uploaded: Python 2, Python 3

File details

Details for the file sinaspider-0.4.1.tar.gz.

File metadata

  • Download URL: sinaspider-0.4.1.tar.gz
  • Upload date:
  • Size: 15.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-requests/2.27.1

File hashes

Hashes for sinaspider-0.4.1.tar.gz:

  • SHA256: 35666c72fc7f8e82d16c3c2a4296e385c9f8bcbc960edf0a326483ade30df589
  • MD5: 01fef47906e9de6033824eb6a57674c6
  • BLAKE2b-256: 4ca0d7c33652867ef20a082890d461f49e31badc418fc613ac60078c6cfcccaf


File details

Details for the file sinaspider-0.4.1-py2.py3-none-any.whl.

File metadata

  • Download URL: sinaspider-0.4.1-py2.py3-none-any.whl
  • Upload date:
  • Size: 16.9 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-requests/2.27.1

File hashes

Hashes for sinaspider-0.4.1-py2.py3-none-any.whl:

  • SHA256: e6d74abe0b79a0f7f6f402f2896bad67bc39d531f3af34bbc662cac5b2113051
  • MD5: c19eec474dd136053206d1adebf3fb0d
  • BLAKE2b-256: 847fca80b7b7f496a4c9e4ee6534a002c5a11c3101da8377b7f8ba71df709441

