
ruia_motor - a Ruia plugin that uses Motor to store data in MongoDB


ruia-motor

A Ruia plugin that uses Motor to store data in MongoDB.

Notice: Works with ruia >= 0.5.0
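
The version requirement can be checked before running a spider. The following is a minimal sketch that uses only the standard library. The helper names `version_tuple` and `ruia_is_new_enough` are hypothetical and not part of ruia or ruia-motor, and the parser only handles plain dotted numeric versions.

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    """Turn '0.5.0' into (0, 5, 0); the first non-numeric part stops the parse."""
    parts = []
    for piece in text.split('.'):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def ruia_is_new_enough(minimum='0.5.0'):
    """Return True if an installed ruia meets the minimum version."""
    try:
        installed = version('ruia')
    except PackageNotFoundError:
        # ruia is not installed at all
        return False
    return version_tuple(installed) >= version_tuple(minimum)
```

Calling `ruia_is_new_enough()` returns `False` when ruia is missing or older than 0.5.0, which makes it usable as a guard at the top of a script.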

Installation

pip install -U ruia-motor

Usage

ruia-motor automatically stores data in MongoDB:

from ruia import AttrField, Item, Spider, TextField
from ruia_motor import RuiaMotor


class DoubanItem(Item):
    target_item = TextField(css_select='div.item')
    title = TextField(css_select='span.title')
    cover = AttrField(css_select='div.pic>a>img', attr='src')
    abstract = TextField(css_select='span.inq', default='')

    async def clean_title(self, title):
        if isinstance(title, str):
            return title
        else:
            return ''.join([i.text.strip().replace('\xa0', '') for i in title])


class DoubanSpider(Spider):
    start_urls = ['https://movie.douban.com/top250']

    # MongoDB connection settings used by ruia-motor
    mongodb_config = {
        'host': '127.0.0.1',
        'port': 27017,
        'db': 'ruia_motor'
    }

    async def parse(self, response):
        etree = response.html_etree
        pages = ['?start=0&filter='] + [i.get('href') for i in etree.cssselect('.paginator>a')]
        for index, page in enumerate(pages):
            url = self.start_urls[0] + page
            yield self.request(
                url=url,
                metadata={'index': index},
                callback=self.parse_item
            )

    async def parse_item(self, response):
        async for item in DoubanItem.get_items(html=response.html):
            data = item.results
            # Yield a RuiaMotor result so the data is stored in the 'douban250' collection
            yield RuiaMotor(collection='douban250', data=data)


async def init_plugins_after_start(spider_ins):
    # Attach ruia-motor to the spider once it has started
    RuiaMotor.init_spider(spider_ins=spider_ins)


if __name__ == '__main__':
    DoubanSpider.start(after_start=init_plugins_after_start)
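
To confirm that the spider stored something, the collection can be read back with Motor directly. The sketch below is not part of ruia-motor. It assumes a MongoDB server without authentication, reachable with the same `host`, `port`, and `db` values as `mongodb_config` above, and it assumes the stored documents keep the `DoubanItem` field names. The helper names `mongo_uri` and `show_stored` are made up for illustration.

```python
import asyncio

# Same values as DoubanSpider.mongodb_config in the example above.
MONGODB_CONFIG = {'host': '127.0.0.1', 'port': 27017, 'db': 'ruia_motor'}


def mongo_uri(config):
    """Build a plain mongodb:// URI (no authentication) from host and port."""
    return 'mongodb://{host}:{port}'.format(**config)


async def show_stored(collection='douban250', limit=5):
    """Print the document count and a few stored documents."""
    # Imported here so that mongo_uri() stays usable without motor installed.
    from motor.motor_asyncio import AsyncIOMotorClient

    client = AsyncIOMotorClient(mongo_uri(MONGODB_CONFIG))
    coll = client[MONGODB_CONFIG['db']][collection]
    print('documents:', await coll.count_documents({}))
    async for doc in coll.find().limit(limit):
        # .get() avoids a KeyError if a field is missing from a document
        print(doc.get('title'), doc.get('cover'))
    client.close()


# Run after the spider has finished:
# asyncio.run(show_stored())
```

Reading through Motor rather than a synchronous driver keeps the check in the same asyncio style as the spider itself.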

Enjoy it :)
