ruia_motor - a Ruia plugin that uses Motor to store data
Notice: Works on ruia >= 0.5.0
Installation
pip install -U ruia-motor
Usage
ruia-motor automatically stores scraped data in MongoDB:
from ruia import AttrField, Item, Spider, TextField
from ruia_motor import RuiaMotor


class DoubanItem(Item):
    target_item = TextField(css_select='div.item')
    title = TextField(css_select='span.title')
    cover = AttrField(css_select='div.pic>a>img', attr='src')
    abstract = TextField(css_select='span.inq', default='')

    async def clean_title(self, title):
        if isinstance(title, str):
            return title
        else:
            return ''.join([i.text.strip().replace('\xa0', '') for i in title])


class DoubanSpider(Spider):
    start_urls = ['https://movie.douban.com/top250']
    mongodb_config = {
        'host': '127.0.0.1',
        'port': 27017,
        'db': 'ruia_motor'
    }

    async def parse(self, response):
        etree = response.html_etree
        pages = ['?start=0&filter='] + [i.get('href') for i in etree.cssselect('.paginator>a')]
        for index, page in enumerate(pages):
            url = self.start_urls[0] + page
            yield self.request(
                url=url,
                metadata={'index': index},
                callback=self.parse_item
            )

    async def parse_item(self, response):
        async for item in DoubanItem.get_items(html=response.html):
            data = item.results
            yield RuiaMotor(collection='douban250', data=data)


async def init_plugins_after_start(spider_ins):
    RuiaMotor.init_spider(spider_ins=spider_ins)


if __name__ == '__main__':
    DoubanSpider.start(after_start=init_plugins_after_start)
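To check that the spider above actually wrote something, the stored documents can be read back with Motor directly. The following is a minimal sketch, not part of ruia-motor: `build_mongo_uri` and `fetch_titles` are hypothetical helper names, the config is assumed to have the same `host`, `port`, and `db` keys as `mongodb_config`, and each stored document is assumed to carry the `title` field produced by `DoubanItem`.

```python
from typing import Dict, List


def build_mongo_uri(config: Dict) -> str:
    """Build a MongoDB URI from a dict shaped like `mongodb_config` (hypothetical helper)."""
    host = config.get("host", "127.0.0.1")
    port = config.get("port", 27017)
    return f"mongodb://{host}:{port}"


async def fetch_titles(config: Dict, collection: str = "douban250", limit: int = 5) -> List[str]:
    """Return up to `limit` titles from the collection the spider wrote to."""
    # Imported lazily so build_mongo_uri stays usable without motor installed.
    from motor.motor_asyncio import AsyncIOMotorClient

    client = AsyncIOMotorClient(build_mongo_uri(config))
    try:
        # Assumption: documents are stored as the plain `item.results` dict,
        # so each one has a `title` key.
        cursor = client[config["db"]][collection].find({}, {"title": 1}).limit(limit)
        return [doc.get("title", "") async for doc in cursor]
    finally:
        client.close()
```

With a local MongoDB running and the spider finished, `asyncio.run(fetch_titles(DoubanSpider.mongodb_config))` should return a short list of movie titles; an empty list suggests the collection name or database does not match what the spider used.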
Enjoy it :)
Download files
Source Distribution
ruia_motor-0.0.3.tar.gz (3.5 kB)
Built Distribution
File details
Details for the file ruia_motor-0.0.3.tar.gz.
File metadata
- Download URL: ruia_motor-0.0.3.tar.gz
- Upload date:
- Size: 3.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ae306833e3d2c7b4849aa279c91474b5db1701297d82a92491aefb6c214e9f10 |
| MD5 | 28ab7de945df0b9854321aa7d112cf68 |
| BLAKE2b-256 | 344dbda652a119a299b4baa55c333411f42006c229c5a51760ffbcdb10ec176e |
File details
Details for the file ruia_motor-0.0.3-py3-none-any.whl.
File metadata
- Download URL: ruia_motor-0.0.3-py3-none-any.whl
- Upload date:
- Size: 5.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 93e2893ef4c4584a88d105062102fd2672ce935cebf166b39b8cc4becd99c1ea |
| MD5 | f421c9570a6195047c0487a82556f0fc |
| BLAKE2b-256 | 3bdefa804991bbdecf7445c846e654823584bc69d1a27dc826c675ebd01eb96e |