
ruia_motor - a Ruia plugin that uses Motor to store data in MongoDB


ruia-motor


Note: requires ruia >= 0.8.0

Installation

pip install -U ruia-motor

Usage

ruia-motor automatically stores the data your spider yields in MongoDB:

from ruia import AttrField, Item, Response, Spider, TextField

from ruia_motor import RuiaMotorInsert, RuiaMotorUpdate, init_spider


class HackerNewsItem(Item):
    target_item = TextField(css_select="tr.athing")
    title = TextField(css_select="a.storylink")
    url = AttrField(css_select="a.storylink", attr="href")

    async def clean_title(self, value):
        return value.strip()


class HackerNewsSpider(Spider):
    start_urls = ["https://news.ycombinator.com/news?p=1"]
    # Optional: route requests through a local proxy; remove this line if you do not use one
    aiohttp_kwargs = {"proxy": "http://0.0.0.0:1087"}

    async def parse(self, response: Response):
        async for item in HackerNewsItem.get_items(html=await response.text()):
            # Update data
            # https://motor.readthedocs.io/en/stable/api-asyncio/asyncio_motor_collection.html#motor.motor_asyncio.AsyncIOMotorCollection.update_one
            yield RuiaMotorUpdate(
                collection="hn_demo",
                filter={"title": item.title},
                update={"$set": item.results},
                upsert=True,
            )
            # Insert data
            # https://motor.readthedocs.io/en/stable/api-asyncio/asyncio_motor_collection.html#motor.motor_asyncio.AsyncIOMotorCollection.insert_one
            # yield RuiaMotorInsert(collection="hn_demo", data=item.results)


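async def init_plugins_after_start(spider_ins):
    # Tell the plugin which MongoDB instance and database to write to
    spider_ins.mongodb_config = {"host": "127.0.0.1", "port": 27017, "db": "ruia_motor"}
    # Attach ruia-motor to the spider so yielded RuiaMotorUpdate / RuiaMotorInsert objects are stored
    init_spider(spider_ins=spider_ins)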


if __name__ == "__main__":
    HackerNewsSpider.start(after_start=init_plugins_after_start)
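
The example above yields RuiaMotorUpdate with upsert=True and leaves RuiaMotorInsert commented out. The practical difference shows up when the spider runs more than once: an upsert keyed on the title keeps one document per story, while a plain insert adds a new document every time. The sketch below models this with a purely in-memory stand-in. FakeCollection, crawl, and the sample items are hypothetical names made up for illustration; this is not ruia-motor or Motor code, and it only imitates equality filters and the "$set" operator.

```python
from typing import Any, Dict, List


class FakeCollection:
    """Hypothetical in-memory stand-in for a MongoDB collection."""

    def __init__(self) -> None:
        self.docs: List[Dict[str, Any]] = []

    def insert_one(self, data: Dict[str, Any]) -> None:
        # Always appends, so re-crawling the same page creates duplicates.
        self.docs.append(dict(data))

    def update_one(
        self, filter: Dict[str, Any], update: Dict[str, Any], upsert: bool = False
    ) -> None:
        # Only equality filters and the "$set" operator are modelled here.
        for doc in self.docs:
            if all(doc.get(key) == value for key, value in filter.items()):
                doc.update(update["$set"])
                return  # like update_one, touch only the first match
        if upsert:
            # No match: create a document from the filter plus the "$set" fields.
            self.docs.append({**filter, **update["$set"]})


def crawl(collection: FakeCollection, items: List[Dict[str, Any]], use_update: bool) -> None:
    """Store each item the way the spider's parse() would request it."""
    for item in items:
        if use_update:
            collection.update_one({"title": item["title"]}, {"$set": item}, upsert=True)
        else:
            collection.insert_one(item)


# Made-up sample data standing in for item.results
items = [
    {"title": "Story A", "url": "https://example.com/a"},
    {"title": "Story B", "url": "https://example.com/b"},
]

updated = FakeCollection()
inserted = FakeCollection()
for _ in range(2):  # simulate running the spider twice
    crawl(updated, items, use_update=True)
    crawl(inserted, items, use_update=False)

print(len(updated.docs))   # 2: one document per title
print(len(inserted.docs))  # 4: every run adds both stories again
```

If the same page is crawled repeatedly, the update form is usually the safer default. The insert form fits cases where each crawl should be recorded as a separate document.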

Enjoy it :)


