A Ruia plugin that uses peewee-async to store data in MySQL or PostgreSQL
ruia-peewee-async
A Ruia plugin that uses peewee-async to store data in MySQL, PostgreSQL, or both.
Installation
Install with pip, pipenv, or poetry.
pip install ruia-peewee-async[aiomysql]
pipenv install ruia-peewee-async[aiomysql]
poetry add ruia-peewee-async[aiomysql]
or
pip install ruia-peewee-async[aiopg]
pipenv install ruia-peewee-async[aiopg]
poetry add ruia-peewee-async[aiopg]
or
pip install ruia-peewee-async[all]
pipenv install ruia-peewee-async[all]
poetry add ruia-peewee-async[all]
The ruia-peewee-async[all] extra installs both aiomysql and aiopg.
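To confirm which optional drivers ended up in the environment, a quick check along these lines may help. It is a minimal sketch that uses only the standard library; the helper name installed_drivers is hypothetical and not part of ruia-peewee-async.

```python
from importlib.util import find_spec


def installed_drivers(candidates=("aiomysql", "aiopg")):
    """Return a dict mapping each top-level module name to whether it is importable."""
    # find_spec returns None when a top-level module cannot be located.
    return {name: find_spec(name) is not None for name in candidates}


if __name__ == "__main__":
    for name, available in installed_drivers().items():
        print(f"{name}: {'installed' if available else 'missing'}")
```

Passing a different tuple of module names checks other packages in the same way.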
Usage
A complete example follows.
# -*- coding: utf-8 -*-
from peewee import CharField
from ruia import AttrField, Item, Response, TextField
from ruia_peewee_async import (
    RuiaPeeweeInsert,
    RuiaPeeweeUpdate,
    Spider,
    TargetDB,
    after_start,
)


class DoubanItem(Item):
    target_item = TextField(css_select="tr.item")
    title = AttrField(css_select="a.nbg", attr="title")
    url = AttrField(css_select="a.nbg", attr="href")

    async def clean_title(self, value):
        return value.strip()


class DoubanSpider(Spider):
    start_urls = ["https://movie.douban.com/chart"]
    # aiohttp_kwargs = {"proxy": "http://127.0.0.1:7890"}

    async def parse(self, response: Response):
        async for item in DoubanItem.get_items(html=await response.text()):
            yield RuiaPeeweeInsert(item.results)  # default is MySQL
            # yield RuiaPeeweeInsert(item.results, database=TargetDB.POSTGRES)  # save to Postgresql
            # yield RuiaPeeweeInsert(item.results, database=TargetDB.BOTH)  # save to both MySQL and Postgresql


class DoubanUpdateSpider(Spider):
    start_urls = ["https://movie.douban.com/chart"]

    async def parse(self, response: Response):
        async for item in DoubanItem.get_items(html=await response.text()):
            res = {}
            res["title"] = item.results["title"]
            res["url"] = "http://whatever.youwanttoupdate.com"
            yield RuiaPeeweeUpdate(
                res,
                {"title": res["title"]},
                database=TargetDB.POSTGRES,  # default is MySQL
            )

    # Args for RuiaPeeweeUpdate
    # data: A dict of the values to write to the database.
    # query: A peewee query or a dict used to find the target record in the database.
    # database: The target database type.
    # create_when_not_exists: Default is True. If True, creates a record when the query finds none.
    # not_update_when_exists: Default is True. If True and the record exists, the record is not updated.
    # only: A list or tuple of the only fields that should be updated.


mysql = {
    "host": "127.0.0.1",
    "port": 3306,
    "user": "ruiamysql",
    "password": "abc123",
    "database": "ruiamysql",
    "model": {
        "table_name": "ruia_mysql",
        "title": CharField(),
        "url": CharField(),
    },
}
postgres = {
    "host": "127.0.0.1",
    "port": 5432,
    "user": "ruiapostgres",
    "password": "abc123",
    "database": "ruiapostgres",
    "model": {
        "table_name": "ruia_postgres",
        "title": CharField(),
        "url": CharField(),
    },
}

if __name__ == "__main__":
    DoubanSpider.start(after_start=after_start(mysql=mysql))
    # DoubanSpider.start(after_start=after_start(postgres=postgres))
    # DoubanSpider.start(after_start=after_start(mysql=mysql, postgres=postgres))
    # DoubanUpdateSpider.start(after_start=after_start(mysql=mysql))
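The RuiaPeeweeUpdate flags described in the comments above interact in a way that is easier to see in isolation. The following is a minimal sketch that models those documented semantics against an in-memory list of dicts; it is not the library's implementation. The function name apply_update, the returned status strings, and the behavior when no record matches and create_when_not_exists is False are all assumptions made for illustration.

```python
def apply_update(rows, data, query, create_when_not_exists=True,
                 not_update_when_exists=True, only=None):
    """Model the documented flags on a list of dict rows (hypothetical helper).

    Returns one of "created", "skipped", "updated", or "not_found".
    """
    # Find the first row whose fields equal every key/value pair in the query.
    match = next(
        (row for row in rows if all(row.get(k) == v for k, v in query.items())),
        None,
    )
    if match is None:
        if create_when_not_exists:
            rows.append(dict(data))  # no record found: insert a copy of data
            return "created"
        return "not_found"  # assumption: nothing happens in this case
    if not_update_when_exists:
        return "skipped"  # record exists and updates are disabled
    # Restrict the update to the fields listed in `only`, when given.
    fields = data if only is None else {k: data[k] for k in only if k in data}
    match.update(fields)
    return "updated"


if __name__ == "__main__":
    rows = []
    print(apply_update(rows, {"title": "A", "url": "u1"}, {"title": "A"}))
    print(apply_update(rows, {"title": "A", "url": "u2"}, {"title": "A"}))
    print(apply_update(rows, {"title": "A", "url": "u2"}, {"title": "A"},
                       not_update_when_exists=False, only=["url"]))
```

Under this model, with both defaults left at True an existing record is never modified, so an actual update requires passing not_update_when_exists=False.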
The create_model function builds the peewee model from a database configuration.
from ruia_peewee_async import create_model
model = create_model(mysql=mysql) # or postgres=postgres or both
# create the table at the same time
model = create_model(postgres=postgres, create_table=True)
rows = model.select().count()
print(rows)
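Each configuration dict mixes connection settings with a nested "model" entry that carries the table name and the field definitions. The sketch below shows one way such a dict could be taken apart; the helper split_config is hypothetical and does not claim to mirror how create_model works internally. Plain strings stand in for peewee fields so the example runs without any database driver.

```python
def split_config(config):
    """Separate connection settings from the model definition (hypothetical helper).

    Returns (connection_settings, table_name, fields).
    """
    # Everything except the "model" key is treated as a connection setting.
    connection = {key: value for key, value in config.items() if key != "model"}
    # Copy the model dict so the caller's configuration is left untouched.
    fields = dict(config.get("model", {}))
    table_name = fields.pop("table_name", None)
    return connection, table_name, fields


if __name__ == "__main__":
    example = {
        "host": "127.0.0.1",
        "port": 3306,
        "user": "ruiamysql",
        "password": "abc123",
        "database": "ruiamysql",
        "model": {
            "table_name": "ruia_mysql",
            "title": "CharField placeholder",
            "url": "CharField placeholder",
        },
    }
    connection, table_name, fields = split_config(example)
    print(table_name, sorted(fields), sorted(connection))
```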
The Spider class from ruia_peewee_async also exposes the following database-related attributes.
from typing import Dict, Union

from peewee import Model
from peewee_async import Manager, MySQLDatabase, PostgresqlDatabase
from ruia import Spider as RuiaSpider


class Spider(RuiaSpider):
    mysql_model: Union[Model, Dict]  # It will be a Model instance after spider started.
    mysql_manager: Manager
    postgres_model: Union[Model, Dict]  # same as above
    postgres_manager: Manager
    mysql_db: MySQLDatabase
    postgres_db: PostgresqlDatabase
For more information, check out peewee's documentation and peewee-async's documentation.
Development
Use pyenv to install the Python version you need, for example:
pyenv install 3.7.9
Then go to the root of the project and run:
poetry install && poetry install -E aiomysql -E aiopg
to install all dependencies.
- Use poetry shell to enter the virtual environment, or open your favorite editor and select the virtual environment to start coding.
- Use pytest to run the unit tests under the tests folder.
- Use pytest --cov . to run all tests and print a coverage report in the terminal.
Thanks
Download files
File details
Details for the file ruia-peewee-async-1.2.1.tar.gz.
File metadata
- Download URL: ruia-peewee-async-1.2.1.tar.gz
- Upload date:
- Size: 10.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.3 readme-renderer/37.0 requests/2.28.1 requests-toolbelt/0.9.1 urllib3/1.26.12 tqdm/4.64.0 importlib-metadata/4.12.0 keyring/23.8.2 rfc3986/2.0.0 colorama/0.4.5 CPython/3.9.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b647e69aa5124a0bcf6c2dc5b1c332be223cd19b12c9c67f6ecfc8a3860d83a3 |
| MD5 | 6c28a2e8b62832506580eff43d34a47e |
| BLAKE2b-256 | e1516cb5c2bdb4a485e3f4c74ea8e830e8b6112a60a4896deb6220633df23e04 |
File details
Details for the file ruia_peewee_async-1.2.1-py3-none-any.whl.
File metadata
- Download URL: ruia_peewee_async-1.2.1-py3-none-any.whl
- Upload date:
- Size: 12.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.3 readme-renderer/37.0 requests/2.28.1 requests-toolbelt/0.9.1 urllib3/1.26.12 tqdm/4.64.0 importlib-metadata/4.12.0 keyring/23.8.2 rfc3986/2.0.0 colorama/0.4.5 CPython/3.9.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9b2b91c556d633eb93111dae77a49976bb3a529752847e8edf7d4a7d0c0f5e58 |
| MD5 | b625e11872cda0f790f763c71b3f8ce6 |
| BLAKE2b-256 | a430dcdbca9f74d3dfa49903dcdc15a14ec221b8938074b0daf86790814c5ee9 |