Scraping Weibos
Project description
Preparation
- Install the PostgreSQL database:
brew install postgresql
brew services start postgresql
- Create a database:
createdb your_database_name
- Configure the settings:
from sinaspider import _config as config

# Write the configuration
config(
    account_id='your account id',  # your Weibo account ID
    database_name='your_database_name',  # weibos and user info are saved to this database
    write_xmp=True,  # whether to write weibo info into the images (optional; requires ExifTool)
)

# Read the configuration
config()
# returns: ConfigObj({'database_name': 'sina_test', 'write_xmp': 'True', 'account_id': '6619193364'})
- Set the cookie:
import keyring

# The cookie must come from the m.weibo.cn web page
cookie = '...your cookie obtained from www.m.weibo.cn ...'
keyring.set_password('sinaspider', 'cookie', cookie)
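A cookie copied from the browser is typically one `name=value; name=value` header string. Before storing it, it can help to split the string into pairs and confirm it was copied in full. The following is a minimal sketch using only the standard library; `parse_cookie_header` and the sample names are hypothetical and not part of sinaspider.

```python
def parse_cookie_header(cookie: str) -> dict:
    """Split a raw Cookie header string into a {name: value} dict."""
    pairs = {}
    for part in cookie.split(';'):
        # Split on the first '=' only, since values may contain '=' themselves.
        name, sep, value = part.strip().partition('=')
        if sep and name:  # skip empty or malformed fragments
            pairs[name] = value
    return pairs


# Placeholder cookie; real names and values come from your browser session.
sample = 'first=abc123; second=x=y; ; broken'
parsed = parse_cookie_header(sample)
print(parsed)
```

Fragments without an `=` are dropped, so a truncated paste shows up as missing names rather than as an error.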
Quick Start
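Add the accounts you follow to the configuration list:
# Assumes UserConfig is importable from the package top level, like Owner
from sinaspider import Owner, UserConfig

owner = Owner()
for following in owner.following():
    UserConfig(following)
Read the users in the configuration list:
>>> for user_config in UserConfig.yield_config_user():
...     print(user_config)
...     break
# Enable all configuration options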
>>> user_config.toggle_all()
Fetch Weibo: True
Fetch Retweet: True
Download Media: True
Fetch following: True
# Fetch all weibos
>>> user_config.fetch_weibo()
Fetching Retweet: True
Media Saving: ~/Downloads/sinaspider
Update Config: True
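Each user has the following configuration options:
- weibo_fetch: whether to fetch weibos
- weibo_since: only fetch weibos posted after this date (default 1970-01-01, i.e. fetch all weibos)
- retweet_fetch: whether to fetch retweets
- media_download: whether to download images and videos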
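To illustrate how these four options interact, here is a minimal sketch that models them as a dataclass and decides whether a given weibo should be fetched. `UserOptions`, its `wants` method, and the `False` defaults for the boolean flags are assumptions made for illustration; they do not describe how sinaspider stores or applies the options.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class UserOptions:
    """Hypothetical stand-in for one user's configuration entry."""
    weibo_fetch: bool = False
    weibo_since: date = date(1970, 1, 1)  # default: fetch everything
    retweet_fetch: bool = False
    media_download: bool = False

    def wants(self, posted_on: date, is_retweet: bool) -> bool:
        """Return True if a weibo with these properties should be fetched."""
        if not self.weibo_fetch:
            return False
        if is_retweet and not self.retweet_fetch:
            return False
        return posted_on > self.weibo_since


opts = UserOptions(weibo_fetch=True, weibo_since=date(2021, 1, 1))
print(opts.wants(date(2021, 6, 1), is_retweet=False))  # True
print(opts.wants(date(2020, 6, 1), is_retweet=False))  # False
print(opts.wants(date(2021, 6, 1), is_retweet=True))   # False
```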
Saving and Downloading Weibos
User
Get user information
>>> from sinaspider import User
>>> uid = 6619193364 # enter the user ID
>>> user = User(uid)
Weibo pages can be fetched through user.weibos; for its parameters, see get_weibo_pages.
# Fetch all weibos from page 3 through page 10 and save the files to `path/to/download`
weibos = user.weibos(retweet=True, start_page=3, end_page=10,
                     download_dir='path/to/download')
# Return the next weibo
next(weibos)
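Because the result is consumed with `next`, the page-range call above behaves like a lazy iterator. The sketch below shows one way such a paginated generator could work; `iter_pages` and `fake_fetch` are hypothetical names, and the stub stands in for the network request. It is not sinaspider's implementation.

```python
from typing import Callable, Iterator, List, Optional


def iter_pages(fetch_page: Callable[[int], List[str]],
               start_page: int = 1,
               end_page: Optional[int] = None) -> Iterator[str]:
    """Lazily yield items page by page, from start_page through end_page inclusive."""
    page = start_page
    while end_page is None or page <= end_page:
        items = fetch_page(page)
        if not items:  # an empty page means there is nothing left
            return
        yield from items
        page += 1


def fake_fetch(page: int) -> List[str]:
    """Stand-in for a network request: two items per page, five pages in total."""
    if page > 5:
        return []
    return [f'page{page}-item{i}' for i in (1, 2)]


weibos = iter_pages(fake_fetch, start_page=3, end_page=4)
first = next(weibos)   # only page 3 has been requested so far
rest = list(weibos)    # drains the remainder of page 3 and all of page 4
print(first, rest)
```

A page is requested only when the consumer asks for an item from it, which keeps the number of requests proportional to what is actually read.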
Owner
from sinaspider import Owner
from pathlib import Path
owner = Owner()
# Get your own profile
owner.info
# Get the accounts you follow
myfollow = owner.following()
# Get your own weibos
myweibo = owner.weibos(download_dir='path/to/dir')
# Get your favorites (collections) page
mycollection = owner.collections(download_dir='path/to/dir')
next(mycollection)
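Since the collection is read with `next`, it can also be consumed in small batches with the standard library. A minimal sketch, assuming only that the object is an iterator; `fake_collection` is a hypothetical stand-in for the real generator:

```python
from itertools import islice


def fake_collection():
    """Stand-in generator; a real one would yield weibos fetched page by page."""
    n = 0
    while True:
        n += 1
        yield f'weibo-{n}'


mycollection = fake_collection()
first = next(mycollection)                   # one item, as with next() above
next_three = list(islice(mycollection, 3))   # the following three; nothing further is consumed
print(first, next_three)
```

`islice` stops after the requested count, so an open-ended generator is never drained by accident.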
Download files
Source Distribution
sinaspider-0.4.1.tar.gz (15.8 kB)
Built Distribution
sinaspider-0.4.1-py2.py3-none-any.whl
Hashes for sinaspider-0.4.1-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | e6d74abe0b79a0f7f6f402f2896bad67bc39d531f3af34bbc662cac5b2113051
MD5 | c19eec474dd136053206d1adebf3fb0d
BLAKE2b-256 | 847fca80b7b7f496a4c9e4ee6534a002c5a11c3101da8377b7f8ba71df709441