quickplay
Overview
quickplay is a scraping utility library built on Playwright and selectolax.
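- PlayPage: a wrapper around Playwright's Page, intended for live scraping.
- LocalPage: a class for working with saved HTML files through a PlayPage-like interface. It is built on selectolax, so it is fast.
- Utility functions: browse / save_html / append_csv / html_filename / sleep_between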
Requirements
- Python 3.12 or higher
- Libraries: playwright, selectolax, pandas (installed automatically)
- Browser binary (must be installed separately)
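Before installing, it can help to confirm that the environment meets the requirements listed above. The following is a minimal sketch using only the standard library; `python_ok` and `missing_packages` are hypothetical helper names for illustration and are not part of quickplay. It does not check for the browser binary.

```python
import importlib.util
import sys


def python_ok(minimum: tuple[int, int] = (3, 12)) -> bool:
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= minimum


def missing_packages(names: list[str]) -> list[str]:
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
    print('Python 3.12+:', python_ok())
    print('Missing:', missing_packages(['playwright', 'selectolax', 'pandas']))
```

An empty "Missing" list only shows that the packages are importable; the Chromium binary still has to be installed as described in the installation steps.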
Installation
pip
pip install quickplay
uv (recommended)
uv add quickplay
Install the browser binary separately.
pip
python -m playwright install chromium
uv
uv run playwright install chromium
Basic Usage
from playwright.sync_api import Page
from quickplay import PlayPage, BasePaths, browse, append_csv, sleep_between

paths = BasePaths(__file__)


def scrape(page: Page) -> None:
    p = PlayPage(page)
    p.goto('https://www.foobarbaz1.jp')
    # Collect the links from the top page
    pref_urls = [p.attr('href', e) for e in p.ss('li.item > ul > li > a')]

    # Visit each of those pages and gather the classroom page links
    classroom_urls = []
    for i, url in enumerate(pref_urls, 1):
        print(f'{i}/{len(pref_urls)} pref_urls')
        if not p.goto(url):
            continue
        sleep_between(1, 2)
        links = [p.attr('href', e) for e in p.ss('.school-area h4 a')]
        classroom_urls.extend(links)

    # Visit each classroom page and append one CSV row per page
    for i, url in enumerate(classroom_urls, 1):
        print(f'{i}/{len(classroom_urls)} classroom_urls')
        if not p.goto(url):
            continue
        sleep_between(1, 2)
        row = {
            'URL': page.url,
            '教室名': p.text(p.s('h1 .text01')),  # classroom name
            '住所': p.text(p.s('.item .mapText')),  # address
            '電話番号': p.text(p.s('.item .phoneNumber')),  # phone number
            # 'ホームページ' ("homepage") is the label text of the table header cell
            'HP': p.attr('href', p.s_in('a', p.next(p.s_re('th', 'ホームページ')))),
        }
        append_csv(paths.from_here('out.csv'), row)


if __name__ == '__main__':
    browse(
        scrape,
        user_agent='Mozilla/5.0 ...',
        block_resources={'image', 'font'},
    )
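The example above relies on two helpers, `sleep_between` and `append_csv`, whose behaviour is only implied by their names. As a rough illustration of the idea, and not quickplay's actual implementation (which is not shown here), a randomized delay and an append-to-CSV step could be sketched with the standard library alone. `random_delay` and `append_row` are hypothetical names, and writing the header only when the file is new or empty is an assumption.

```python
import csv
import random
import time
from pathlib import Path


def random_delay(min_s: float, max_s: float) -> float:
    """Sleep for a random duration between min_s and max_s seconds and return it."""
    delay = random.uniform(min_s, max_s)
    time.sleep(delay)
    return delay


def append_row(csv_path: Path, row: dict[str, str]) -> None:
    """Append one row to a CSV file, writing the header first if the file is new or empty."""
    is_new = not csv_path.exists() or csv_path.stat().st_size == 0
    # newline='' lets the csv module control line endings itself
    with csv_path.open('a', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if is_new:
            writer.writeheader()
        writer.writerow(row)
```

Appending row by row means a crash partway through a long crawl still leaves the rows collected so far on disk. The sketch assumes every row has the same keys in the same order.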
Save HTML while scraping
from playwright.sync_api import Page
from quickplay import PlayPage, BasePaths, browse, save_html, html_filename, sleep_between

paths = BasePaths(__file__)


def scrape(page: Page) -> None:
    p = PlayPage(page)
    p.goto('https://www.foobarbaz1.jp')
    item_urls = [p.attr('href', e) for e in p.ss('ul.items > li > a')]

    for i, url in enumerate(item_urls, 1):
        print(f'{i}/{len(item_urls)} item_urls')
        if not p.goto(url):
            continue
        sleep_between(1, 2)
        # Save the rendered page into the 'html' directory, named after its URL
        save_html(paths.from_here('html'), html_filename(page.url), page.content())


if __name__ == '__main__':
    browse(scrape, block_resources={'image', 'font'})
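Saving pages requires a stable, filesystem-safe name for each URL. One way such a mapping could work is sketched below. This is an assumption for illustration, not the behaviour of quickplay's `html_filename` or `save_html`; `url_to_filename` and `write_html` are hypothetical names.

```python
import re
from pathlib import Path
from urllib.parse import urlsplit


def url_to_filename(url: str) -> str:
    """Derive a filesystem-safe .html file name from a URL."""
    parts = urlsplit(url)
    raw = parts.netloc + parts.path
    if parts.query:
        raw += '?' + parts.query
    # Collapse every run of characters outside [A-Za-z0-9.-] into one underscore
    safe = re.sub(r'[^A-Za-z0-9.-]+', '_', raw).strip('_')
    return f'{safe}.html'


def write_html(directory: Path, filename: str, html: str) -> Path:
    """Write `html` to directory/filename, creating the directory if needed."""
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / filename
    target.write_text(html, encoding='utf-8')
    return target
```

With this scheme, `url_to_filename('https://www.foobarbaz1.jp/items/123?page=2')` yields `www.foobarbaz1.jp_items_123_page_2.html`. Distinct URLs can map to the same name (for example `/a/b` and `/a_b`), so a real implementation may want to append a hash of the URL.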
Scrape from local HTML files
from quickplay import LocalPage, BasePaths, append_csv

paths = BasePaths(__file__)
p = LocalPage()

# Read each saved HTML file and append one CSV row per file
for path in paths.from_here('html').glob('*.html'):
    if not p.goto(path):
        continue
    row = {
        '商品名': p.text(p.s('h1.product-name')),  # product name
        '価格': p.text(p.s('span.price')),  # price
        # 'ホームページ' ("homepage") is the label text of the table header cell
        'HP': p.attr('href', p.s_in('a', p.next(p.s_re('th', 'ホームページ')))),
    }
    append_csv(paths.from_here('out.csv'), row)
License
Download files
Source Distribution
quickplay-1.1.2.tar.gz (6.4 kB)
Built Distribution
File details
Details for the file quickplay-1.1.2.tar.gz.
File metadata
- Download URL: quickplay-1.1.2.tar.gz
- Upload date:
- Size: 6.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: python-requests/2.32.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 79bfaf57615ad5314ba66e5f11add2589ce7e6d2d40f4672e0db25172a340bd9 |
| MD5 | b0445a437be530e001403ac71f2d9f42 |
| BLAKE2b-256 | 979c1d0508991d639188e28b6d61ac3dfb3bcdc4ab94c5049e74f6300b16b690 |
File details
Details for the file quickplay-1.1.2-py3-none-any.whl.
File metadata
- Download URL: quickplay-1.1.2-py3-none-any.whl
- Upload date:
- Size: 6.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: python-requests/2.32.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2a62d107ac5d001a744d6acd6d7cce88c2013eb89206dbd810823de66cb68f14 |
| MD5 | 738bd1dbdd8207c177e98fc9222c41bd |
| BLAKE2b-256 | 93ae2fea6dcdd34bbb0e9a5afbfb39f664d9cb4934ff667537bcee7f13acd6f4 |