Scraping Weibos
Preparation
- Install the PostgreSQL database:
brew install postgresql
brew services start postgresql
- Create a database:
createdb your_database_name
- Configure settings:
from sinaspider import config
# Write the configuration
config(
    account_id='your account id',        # your Weibo account ID
    database_name='your_database_name',  # Weibo posts and user info are saved in this database
    write_xmp=True,                      # whether to write Weibo info into downloaded images (optional, requires ExifTool)
)
# Read the configuration back
config()
# ConfigObj({'database_name': 'sina_test', 'write_xmp': 'True', 'account_id': '6619193364'})
- Set the cookie:
import keyring
cookie = '...your cookie from www.m.weibo.cn...'  # must be the cookie of the m.weibo.cn site
keyring.set_password('sinaspider', 'cookie', cookie)
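It can help to sanity-check the preparation steps before the first run. The sketch below is hypothetical and not part of sinaspider. It uses only the standard library. It reports which command-line tools are missing from `PATH`: `createdb` from PostgreSQL, and `exiftool`, which is needed only when `write_xmp=True`. It also lists the cookie names found in a pasted cookie string, assuming the cookie was copied in the browser's `name=value; name2=value2` header format.

```python
import shutil
from http.cookies import SimpleCookie


def missing_tools(write_xmp: bool = True) -> list[str]:
    """Return the required command-line tools that are not found on PATH."""
    tools = ["createdb"]  # installed together with PostgreSQL
    if write_xmp:
        tools.append("exiftool")  # only needed when write_xmp=True
    return [tool for tool in tools if shutil.which(tool) is None]


def cookie_names(cookie: str) -> list[str]:
    """Parse a 'name=value; name2=value2' cookie string and return its names in order."""
    jar = SimpleCookie()
    jar.load(cookie)
    return list(jar.keys())


if __name__ == "__main__":
    print("Missing tools:", missing_tools() or "none")
    # Placeholder values; paste your real m.weibo.cn cookie instead.
    print(cookie_names("SUB=abc; SUBP=xyz"))
```

An empty list from `cookie_names` usually means the string was not copied in header form.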
Quick Start
Add the accounts you follow to the config list:
from sinaspider import Owner, UserConfig
owner = Owner()
# Register every account you follow in the config list
for following in owner.following():
    UserConfig(following)
Read the users in the config list:
>>> for user_config in UserConfig.yield_config_user():
...     print(user_config)
...     break
# Turn on all config options
>>> user_config.toggle_all()
Fetch Weibo: True
Fetch Retweet: True
Download Media: True
Fetch following: True
# Fetch all Weibo posts
>>> user_config.fetch_weibo()
Fetching Retweet: True
Media Saving: ~/Downloads/sinaspider
Update Config: True
Each user has the following config options:
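- weibo_fetch: whether to download the user's Weibo posts
- weibo_since: only fetch posts published after this date (default 1970-01-01, i.e. fetch all posts)
- retweet_fetch: whether to download retweets
- media_download: whether to download images and videos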
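The options above can be pictured as a small per-user record plus a filter. The sketch is hypothetical. `UserOptions`, `select_posts` and the post keys `created_at` and `is_retweet` are illustrative names, not sinaspider's classes. Only the option names and the 1970-01-01 default for `weibo_since` come from the list. The `False` defaults for the other flags are assumptions, and `weibo_since` is treated as inclusive.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class UserOptions:
    """Hypothetical mirror of the per-user options listed above."""
    weibo_fetch: bool = False             # assumed default
    weibo_since: date = date(1970, 1, 1)  # README default: keep every post
    retweet_fetch: bool = False           # assumed default
    media_download: bool = False          # assumed default; not used by the filter below


def select_posts(posts: List[dict], opts: UserOptions) -> List[dict]:
    """Return the posts these options would keep.

    Each post is assumed to be a dict with a 'created_at' date and an
    'is_retweet' flag (illustrative keys). weibo_since is treated as inclusive.
    """
    if not opts.weibo_fetch:
        return []
    return [
        p for p in posts
        if p["created_at"] >= opts.weibo_since
        and (opts.retweet_fetch or not p["is_retweet"])
    ]
```

With the default `weibo_since`, no post is dropped for its date. Only `weibo_fetch` and `retweet_fetch` change the result.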
Saving and Downloading Weibos
User
Get user info
>>> from sinaspider import User
>>> uid = 6619193364  # the user's ID
>>> user = User(uid)
Use user.weibos to fetch pages of Weibo posts; see get_weibo_pages for its full list of parameters.
# Fetch all posts from page 3 to page 10 and save the files to `path/to/download`
weibos = user.weibos(retweet=True, start_page=3, end_page=10,
                     download_dir='path/to/download')
# Return the next Weibo post
next(weibos)
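Because the result of `user.weibos(...)` is consumed with `next()`, it appears to be a lazy iterator that requests pages on demand. The sketch below shows that pattern. The names are hypothetical: `fake_fetch` stands in for the real network request and `iter_posts` is not sinaspider's function. It assumes `end_page` is inclusive, which "page 3 to page 10" suggests but does not state.

```python
from itertools import islice
from typing import Callable, Iterator, List, Optional


def iter_posts(fetch_page: Callable[[int], List[str]], start_page: int = 1,
               end_page: Optional[int] = None) -> Iterator[str]:
    """Yield posts one at a time, requesting a page only when the previous one is used up.

    fetch_page is a hypothetical stand-in for the real request; an empty
    page is treated as the end of the timeline. end_page is inclusive.
    """
    page = start_page
    while end_page is None or page <= end_page:
        posts = fetch_page(page)
        if not posts:
            break
        yield from posts
        page += 1


def fake_fetch(page: int) -> List[str]:
    """Pretend timeline: 12 pages with two posts each."""
    return [f"post {page}-{i}" for i in range(2)] if page <= 12 else []


if __name__ == "__main__":
    weibos = iter_posts(fake_fetch, start_page=3, end_page=10)
    print(next(weibos))             # post 3-0
    print(list(islice(weibos, 3)))  # ['post 3-1', 'post 4-0', 'post 4-1']
```

`itertools.islice` takes a bounded number of posts from such an iterator without fetching the remaining pages.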
Owner
from sinaspider import Owner
from pathlib import Path
owner = Owner()
# Get your own profile
owner.info
# Get the accounts you follow
myfollow = owner.following()
# Get your own Weibo posts
myweibo = owner.weibos(download_dir='path/to/dir')
# Get your bookmarked (favorited) posts
>>> mycollection = owner.collections(download_dir='path/to/dir')
>>> next(mycollection)
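The Owner example imports `pathlib.Path` without using it. One plausible use is preparing the `download_dir` passed to `owner.weibos()` and `owner.collections()`. The helper `prepare_download_dir` below is hypothetical. It expands `~`, creates the directory if needed, and returns it as a string. Its default is `~/Downloads/sinaspider`, the media path printed in the Quick Start output. The README does not say whether `download_dir` also accepts a `Path` object, so the sketch returns a plain string.

```python
from pathlib import Path


def prepare_download_dir(path: str = "~/Downloads/sinaspider") -> str:
    """Expand '~', create the directory (and any parents) if missing, and return it as str."""
    directory = Path(path).expanduser()
    directory.mkdir(parents=True, exist_ok=True)
    return str(directory)


# Usage, assuming an Owner instance as above:
# myweibo = owner.weibos(download_dir=prepare_download_dir())
```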
Download files
Source Distribution
sinaspider-0.4.1.tar.gz (15.8 kB)
Built Distribution
sinaspider-0.4.1-py2.py3-none-any.whl (16.9 kB)
File details
Details for the file sinaspider-0.4.1.tar.gz.
File metadata
- Download URL: sinaspider-0.4.1.tar.gz
- Upload date:
- Size: 15.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: python-requests/2.27.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 35666c72fc7f8e82d16c3c2a4296e385c9f8bcbc960edf0a326483ade30df589 |
| MD5 | 01fef47906e9de6033824eb6a57674c6 |
| BLAKE2b-256 | 4ca0d7c33652867ef20a082890d461f49e31badc418fc613ac60078c6cfcccaf |
File details
Details for the file sinaspider-0.4.1-py2.py3-none-any.whl.
File metadata
- Download URL: sinaspider-0.4.1-py2.py3-none-any.whl
- Upload date:
- Size: 16.9 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: python-requests/2.27.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e6d74abe0b79a0f7f6f402f2896bad67bc39d531f3af34bbc662cac5b2113051 |
| MD5 | c19eec474dd136053206d1adebf3fb0d |
| BLAKE2b-256 | 847fca80b7b7f496a4c9e4ee6534a002c5a11c3101da8377b7f8ba71df709441 |
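The published digests can be checked locally before installing. The sketch below uses only the standard library: it recomputes a file's SHA256 and compares it with a published value. The helper names `sha256_of` and `verify` are illustrative. The downloaded file is assumed to sit in the current directory. The constant is the SHA256 listed for sinaspider-0.4.1.tar.gz.

```python
import hashlib
from pathlib import Path

# SHA256 published for sinaspider-0.4.1.tar.gz.
EXPECTED_SDIST_SHA256 = "35666c72fc7f8e82d16c3c2a4296e385c9f8bcbc960edf0a326483ade30df589"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str) -> bool:
    """Compare a file's SHA256 digest with a published one, ignoring hex case."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    sdist = Path("sinaspider-0.4.1.tar.gz")
    if sdist.exists():
        print("OK" if verify(str(sdist), EXPECTED_SDIST_SHA256) else "MISMATCH")
```

Reading in chunks keeps memory use flat for large files. On Python 3.11 and later, `hashlib.file_digest` does the same job.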