
A Ruia plugin that uses peewee-async to store data in MySQL


ruia-peewee-async


A Ruia plugin that uses peewee-async to store data in MySQL, PostgreSQL, or both.

Installation

Install with pip, pipenv, or poetry.

pip install ruia-peewee-async[aiomysql]
pipenv install ruia-peewee-async[aiomysql]
poetry add ruia-peewee-async[aiomysql]

or

pip install ruia-peewee-async[aiopg]
pipenv install ruia-peewee-async[aiopg]
poetry add ruia-peewee-async[aiopg]

or

pip install ruia-peewee-async[all]
pipenv install ruia-peewee-async[all]
poetry add ruia-peewee-async[all]

The [all] extra installs both aiomysql and aiopg.
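To check which of the optional drivers are present in the current environment, here is a small standard-library sketch. The helper name installed_drivers is hypothetical and not part of ruia-peewee-async; it only asks Python's import system whether the aiomysql and aiopg modules can be found.

```python
import importlib.util


def installed_drivers():
    """Return the optional async drivers that are importable (hypothetical helper)."""
    # aiomysql backs MySQL and aiopg backs PostgreSQL; find_spec returns None
    # for a top-level module that is not installed.
    return [
        name
        for name in ("aiomysql", "aiopg")
        if importlib.util.find_spec(name) is not None
    ]


if __name__ == "__main__":
    # Lists whichever of the two drivers are installed.
    print(installed_drivers())
```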

Usage

A complete example looks like this:

# -*- coding: utf-8 -*-
from peewee import CharField
from ruia import AttrField, Item, Response, TextField

from ruia_peewee_async import (
    RuiaPeeweeInsert,
    RuiaPeeweeUpdate,
    Spider,
    TargetDB,
    after_start,
)


class DoubanItem(Item):
    target_item = TextField(css_select="tr.item")
    title = AttrField(css_select="a.nbg", attr="title")
    url = AttrField(css_select="a.nbg", attr="href")

    async def clean_title(self, value):
        return value.strip()


class DoubanSpider(Spider):
    start_urls = ["https://movie.douban.com/chart"]
    # aiohttp_kwargs = {"proxy": "http://127.0.0.1:7890"}

    async def parse(self, response: Response):
        async for item in DoubanItem.get_items(html=await response.text()):
            yield RuiaPeeweeInsert(item.results)  # default is MySQL
            # yield RuiaPeeweeInsert(item.results, database=TargetDB.POSTGRES)  # save to PostgreSQL
            # yield RuiaPeeweeInsert(item.results, database=TargetDB.BOTH)  # save to both MySQL and PostgreSQL


class DoubanUpdateSpider(Spider):
    start_urls = ["https://movie.douban.com/chart"]

    async def parse(self, response: Response):
        async for item in DoubanItem.get_items(html=await response.text()):
            res = {}
            res["title"] = item.results["title"]
            res["url"] = "http://whatever.youwanttoupdate.com"
            yield RuiaPeeweeUpdate(
                res,
                {"title": res["title"]},
                database=TargetDB.POSTGRES,  # default is MySQL
            )

            # Args for RuiaPeeweeUpdate
            # data: A dict of values to write to the database.
            # query: A peewee query or a dict used to find the target record in the database.
            # database: The target database type.
            # create_when_not_exists: Default is True. If True, create a record when the query finds none.
            # not_update_when_exists: Default is True. If True and the record exists, leave it unchanged.
            # only: A list or tuple of the only fields that should be updated.


mysql = {
    "host": "127.0.0.1",
    "port": 3306,
    "user": "ruiamysql",
    "password": "abc123",
    "database": "ruiamysql",
    "model": {
        "table_name": "ruia_mysql",
        "title": CharField(),
        "url": CharField(),
    },
}
postgres = {
    "host": "127.0.0.1",
    "port": 5432,
    "user": "ruiapostgres",
    "password": "abc123",
    "database": "ruiapostgres",
    "model": {
        "table_name": "ruia_postgres",
        "title": CharField(),
        "url": CharField(),
    },
}

if __name__ == "__main__":
    DoubanSpider.start(after_start=after_start(mysql=mysql))
    # DoubanSpider.start(after_start=after_start(postgres=postgres))
    # DoubanSpider.start(after_start=after_start(mysql=mysql, postgres=postgres))
    # DoubanUpdateSpider.start(after_start=after_start(mysql=mysql))
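The comments on RuiaPeeweeUpdate describe how create_when_not_exists, not_update_when_exists, and only interact. The sketch below models that decision logic over an in-memory list of dicts so it runs without a database. It illustrates the documented behavior only and is not the plugin's implementation; the function name apply_update, the list-of-dicts store, and the restriction of query to a dict of equality conditions are all assumptions.

```python
from typing import Dict, List, Optional, Sequence


def apply_update(
    store: List[Dict],
    data: Dict,
    query: Dict,
    create_when_not_exists: bool = True,
    not_update_when_exists: bool = True,
    only: Optional[Sequence[str]] = None,
) -> str:
    """Hypothetical model of RuiaPeeweeUpdate's documented options.

    Returns "created", "updated", or "skipped" so the outcome is easy to inspect.
    """
    # Find the first record whose fields match every key/value pair in query.
    record = next(
        (row for row in store if all(row.get(k) == v for k, v in query.items())),
        None,
    )
    if record is None:
        if create_when_not_exists:
            store.append(dict(data))
            return "created"
        return "skipped"
    if not_update_when_exists:
        # The record exists and updates are turned off: leave it untouched.
        return "skipped"
    # Restrict the update to the fields listed in `only`, if given.
    fields = only if only is not None else list(data.keys())
    for field in fields:
        if field in data:
            record[field] = data[field]
    return "updated"


if __name__ == "__main__":
    rows: List[Dict] = []
    new = {"title": "Movie", "url": "http://whatever.youwanttoupdate.com"}
    print(apply_update(rows, new, {"title": "Movie"}))  # created
    print(apply_update(rows, new, {"title": "Movie"}))  # skipped (defaults)
    print(apply_update(rows, {"url": "http://example.com"}, {"title": "Movie"},
                       not_update_when_exists=False, only=["url"]))  # updated
```

With the defaults, an existing record is never overwritten, so a spider that should refresh existing rows needs not_update_when_exists=False.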

The create_model function builds a Peewee model from a database configuration dict.

from ruia_peewee_async import create_model

# mysql and postgres are the configuration dicts from the example above.
model = create_model(mysql=mysql)  # or postgres=postgres, or both
# Create the table at the same time
model = create_model(postgres=postgres, create_table=True)
rows = model.select().count()
print(rows)
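To see roughly what create_table=True followed by model.select().count() asks of the database, here is a standard-library sketch that uses an in-memory SQLite database as a stand-in for MySQL or PostgreSQL. It assumes peewee's defaults: an auto-incrementing integer id primary key when none is declared, CharField() with max_length=255, and NOT NULL unless null=True. The helper create_table_sql and the string field names in MODEL_CONFIG are hypothetical, and the DDL peewee actually emits differs per backend.

```python
import sqlite3

# Mirrors the "model" section of the mysql config dict above, with the peewee
# field objects replaced by their type names (an assumption for this sketch).
MODEL_CONFIG = {"table_name": "ruia_mysql", "title": "CharField", "url": "CharField"}


def create_table_sql(config):
    """Approximate the CREATE TABLE that a peewee model like this would need."""
    # peewee adds an auto-incrementing integer "id" primary key by default.
    columns = ["id INTEGER PRIMARY KEY AUTOINCREMENT"]
    for name, field_type in config.items():
        if name == "table_name":
            continue
        if field_type != "CharField":
            raise ValueError(f"unsupported field type in this sketch: {field_type}")
        # CharField() defaults to max_length=255; fields are NOT NULL unless null=True.
        columns.append(f"{name} VARCHAR(255) NOT NULL")
    return f'CREATE TABLE IF NOT EXISTS "{config["table_name"]}" ({", ".join(columns)})'


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(create_table_sql(MODEL_CONFIG))  # like create_table=True
    conn.execute(
        'INSERT INTO "ruia_mysql" (title, url) VALUES (?, ?)',
        ("Movie", "https://movie.douban.com/chart"),
    )
    # Roughly what model.select().count() asks the database for.
    (rows,) = conn.execute('SELECT COUNT(*) FROM "ruia_mysql"').fetchone()
    print(rows)  # 1
    conn.close()
```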

The Spider class from ruia_peewee_async also exposes the following database-related attributes:

from typing import Dict, Union

from peewee import Model
from peewee_async import Manager, MySQLDatabase, PostgresqlDatabase
from ruia import Spider as RuiaSpider

class Spider(RuiaSpider):
    mysql_model: Union[Model, Dict]  # Becomes a Model after the spider has started.
    mysql_manager: Manager
    postgres_model: Union[Model, Dict]  # Same as above.
    postgres_manager: Manager
    mysql_db: MySQLDatabase
    postgres_db: PostgresqlDatabase

For more information, check out peewee's documentation and peewee-async's documentation.

Development

Use pyenv to install the Python version you need, for example:

pyenv install 3.7.9

Then go to the root of the project and run:

poetry install && poetry install -E aiomysql -E aiopg

to install all dependencies.

  • Use poetry shell to enter the virtual environment, or open your favorite editor and select the virtual environment to start coding.
  • Use pytest to run the unit tests in the tests folder.
  • Use pytest --cov . to run all tests and print a coverage report in the terminal.

Thanks



Download files

Download the file for your platform.

Source Distribution

ruia-peewee-async-1.2.1.tar.gz (10.4 kB)

Uploaded Source

Built Distribution


ruia_peewee_async-1.2.1-py3-none-any.whl (12.1 kB)

Uploaded Python 3

File details

Details for the file ruia-peewee-async-1.2.1.tar.gz.

File metadata

  • Download URL: ruia-peewee-async-1.2.1.tar.gz
  • Size: 10.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.3 readme-renderer/37.0 requests/2.28.1 requests-toolbelt/0.9.1 urllib3/1.26.12 tqdm/4.64.0 importlib-metadata/4.12.0 keyring/23.8.2 rfc3986/2.0.0 colorama/0.4.5 CPython/3.9.13

File hashes

Hashes for ruia-peewee-async-1.2.1.tar.gz
Algorithm Hash digest
SHA256 b647e69aa5124a0bcf6c2dc5b1c332be223cd19b12c9c67f6ecfc8a3860d83a3
MD5 6c28a2e8b62832506580eff43d34a47e
BLAKE2b-256 e1516cb5c2bdb4a485e3f4c74ea8e830e8b6112a60a4896deb6220633df23e04


File details

Details for the file ruia_peewee_async-1.2.1-py3-none-any.whl.

File metadata

  • Download URL: ruia_peewee_async-1.2.1-py3-none-any.whl
  • Size: 12.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.3 readme-renderer/37.0 requests/2.28.1 requests-toolbelt/0.9.1 urllib3/1.26.12 tqdm/4.64.0 importlib-metadata/4.12.0 keyring/23.8.2 rfc3986/2.0.0 colorama/0.4.5 CPython/3.9.13

File hashes

Hashes for ruia_peewee_async-1.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 9b2b91c556d633eb93111dae77a49976bb3a529752847e8edf7d4a7d0c0f5e58
MD5 b625e11872cda0f790f763c71b3f8ce6
BLAKE2b-256 a430dcdbca9f74d3dfa49903dcdc15a14ec221b8938074b0daf86790814c5ee9
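The digests above let you confirm that a downloaded file is intact. A minimal sketch with hashlib that computes a file's SHA256 in chunks and compares it with the published value for the source distribution; the assumption that the archive sits in the current directory is only for illustration. pip can enforce the same check at install time in hash-checking mode, using --hash=sha256:... entries in a requirements file together with --require-hashes.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for ruia-peewee-async-1.2.1.tar.gz.
EXPECTED_SDIST_SHA256 = "b647e69aa5124a0bcf6c2dc5b1c332be223cd19b12c9c67f6ecfc8a3860d83a3"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("ruia-peewee-async-1.2.1.tar.gz")  # assumed download location
    if archive.exists():
        print("OK" if verify(archive, EXPECTED_SDIST_SHA256) else "MISMATCH")
```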

