quickplay
Overview
quickplay is a scraping utility library built on Playwright and selectolax.
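- PlayPage: a wrapper around Playwright's Page, for live scraping.
- LocalPage: a class for working with saved HTML files through a PlayPage-like interface. It is built on selectolax and is fast.
- Utility functions: browse / save_html / append_csv / html_filename / sleep_between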
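The examples below call `sleep_between(1, 2)` between page loads. Judging from its name and arguments, it appears to pause for a random duration between the two bounds so that requests are not sent at a fixed rhythm. The sketch below illustrates that assumed behavior with the standard library only; `random_pause` is a hypothetical name, not part of quickplay, and the real `sleep_between` may differ.

```python
import random
import time


def random_pause(min_seconds: float, max_seconds: float) -> float:
    """Sleep for a random duration between min_seconds and max_seconds.

    Hypothetical stand-in for quickplay's sleep_between (assumed behavior).
    Returns the duration actually slept, in seconds.
    """
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError('expected 0 <= min_seconds <= max_seconds')
    # random.uniform picks a float between the two bounds
    duration = random.uniform(min_seconds, max_seconds)
    time.sleep(duration)
    return duration


if __name__ == '__main__':
    slept = random_pause(0.1, 0.2)
    print(f'slept {slept:.3f}s')
```

Randomizing the delay rather than sleeping a constant amount is a common politeness measure when crawling many pages on the same site.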
Requirements
- Python 3.12 or higher
- Libraries: playwright, selectolax, pandas (installed automatically)
- A browser binary (must be installed separately)
Installation
pip
pip install quickplay
uv (recommended)
uv add quickplay
Then install the browser binary separately:
pip
python -m playwright install chromium
uv
uv run playwright install chromium
Basic Usage
from playwright.sync_api import Page
from quickplay import PlayPage, BasePaths, browse, append_csv, sleep_between

paths = BasePaths(__file__)


def scrape(page: Page) -> None:
    p = PlayPage(page)
    p.goto('https://www.foobarbaz1.jp')

    # Collect the link to each prefecture page
    pref_urls = [p.attr('href', e) for e in p.ss('li.item > ul > li > a')]

    # Visit each prefecture page and collect the classroom links on it
    classroom_urls = []
    for i, url in enumerate(pref_urls, 1):
        print(f'{i}/{len(pref_urls)} pref_urls')
        if not p.goto(url):  # skip URLs that goto() reports as failed
            continue
        sleep_between(1, 2)  # pause between requests
        links = [p.attr('href', e) for e in p.ss('.school-area h4 a')]
        classroom_urls.extend(links)

    # Visit each classroom page and append one CSV row per classroom
    for i, url in enumerate(classroom_urls, 1):
        print(f'{i}/{len(classroom_urls)} classroom_urls')
        if not p.goto(url):
            continue
        sleep_between(1, 2)
        row = {
            'URL': page.url,
            'Classroom': p.text(p.s('h1 .text01')),
            'Address': p.text(p.s('.item .mapText')),
            'Phone': p.text(p.s('.item .phoneNumber')),
            # <th> whose text matches 'ホームページ' ("Homepage") -> next sibling -> <a> inside it
            'HP': p.attr('href', p.s_in('a', p.next(p.s_re('th', 'ホームページ')))),
        }
        append_csv(paths.from_here('out.csv'), row)


if __name__ == '__main__':
    browse(
        scrape,
        user_agent='Mozilla/5.0 ...',
        block_resources={'image', 'font'},  # skip loading images and fonts
    )
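`append_csv` is called once per scraped row with an output path and a dict. A plausible reading is that it appends the row to the CSV file and writes the header only when the file is new. The sketch below implements that assumed behavior with the standard library `csv` module; `append_row` is a hypothetical name, and since quickplay lists pandas as a dependency, its actual implementation may work differently.

```python
import csv
from pathlib import Path


def append_row(path: Path, row: dict[str, str]) -> None:
    """Append one dict as a CSV row, writing a header if the file is new or empty.

    Hypothetical stand-in for quickplay's append_csv (assumed behavior).
    Assumes every row has the same keys, in the same order, as the first one.
    """
    path = Path(path)
    write_header = not path.exists() or path.stat().st_size == 0
    # newline='' lets the csv module control line endings itself
    with path.open('a', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=list(row.keys()))
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```

Appending row by row, instead of collecting everything and writing once at the end, means rows already scraped survive if a long crawl is interrupted.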
Save HTML while scraping
from playwright.sync_api import Page
from quickplay import PlayPage, BasePaths, browse, save_html, html_filename, sleep_between

paths = BasePaths(__file__)


def scrape(page: Page) -> None:
    p = PlayPage(page)
    p.goto('https://www.foobarbaz1.jp')
    item_urls = [p.attr('href', e) for e in p.ss('ul.items > li > a')]
    for i, url in enumerate(item_urls, 1):
        print(f'{i}/{len(item_urls)} item_urls')
        if not p.goto(url):
            continue
        sleep_between(1, 2)
        # Save the rendered page into the html/ directory, named after its URL
        save_html(paths.from_here('html'), html_filename(page.url), page.content())


if __name__ == '__main__':
    browse(scrape, block_resources={'image', 'font'})
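`html_filename(page.url)` turns the current URL into a file name for `save_html`. The exact naming scheme quickplay uses is not documented here. The sketch below shows one hypothetical approach (`url_to_filename` is an illustrative name, not quickplay's API): keep a readable slug of the host and path, and append a short hash of the full URL so that pages differing only in their query string do not overwrite each other.

```python
import hashlib
import re
from urllib.parse import urlsplit


def url_to_filename(url: str, max_slug: int = 80) -> str:
    """Map a URL to a filesystem-safe .html file name.

    Hypothetical illustration; quickplay's html_filename may use another scheme.
    """
    parts = urlsplit(url)
    # Readable part: host + path, with each run of unsafe characters turned into '_'
    slug = re.sub(r'[^A-Za-z0-9._-]+', '_', parts.netloc + parts.path).strip('_')
    slug = slug[:max_slug] or 'index'
    # Hash the full URL (query included) so distinct URLs get distinct names
    digest = hashlib.sha256(url.encode('utf-8')).hexdigest()[:10]
    return f'{slug}_{digest}.html'
```

For example, `https://www.foobarbaz1.jp/items/1?page=2` becomes a name starting with `www.foobarbaz1.jp_items_1_`, followed by ten hex digits and `.html`.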
Scrape saved HTML files and write the results to CSV
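from quickplay import LocalPage, BasePaths, append_csv

paths = BasePaths(__file__)
p = LocalPage()

# Load each saved HTML file from the html/ directory next to this script
for path in paths.from_here('html').glob('*.html'):
    if not p.goto(path):  # skip files that goto() reports as failed
        continue
    row = {
        'Product name': p.text(p.s('h1.product-name')),
        'Price': p.text(p.s('span.price')),
        # <th> whose text matches 'ホームページ' ("Homepage") -> next sibling -> <a> inside it
        'HP': p.attr('href', p.s_in('a', p.next(p.s_re('th', 'ホームページ')))),
    }
    append_csv(paths.from_here('out.csv'), row)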
License
Download files
Source Distribution
quickplay-1.1.1.tar.gz (6.4 kB)
Built Distribution
quickplay-1.1.1-py3-none-any.whl (6.9 kB)
File details
Details for the file quickplay-1.1.1.tar.gz.
File metadata
- Download URL: quickplay-1.1.1.tar.gz
- Upload date:
- Size: 6.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: python-requests/2.32.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 83a1fa7ecdaedb10de99290742175d24e15e4d389835b71c9a322a02ab7901b9 |
| MD5 | 115cd8f323143517e8fdbc9c57297d1c |
| BLAKE2b-256 | dab9fb22ae6e5ea618e03a53beab4512eefe40dcc57264f8f047832383b3fc5d |
File details
Details for the file quickplay-1.1.1-py3-none-any.whl.
File metadata
- Download URL: quickplay-1.1.1-py3-none-any.whl
- Upload date:
- Size: 6.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: python-requests/2.32.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 767e1a5e4a74ba484b1ad4e20b78a6f457fccd8a1ec3f0877bd10650d6321764 |
| MD5 | 442fa2b72f5639aeb597d6fe4141e09d |
| BLAKE2b-256 | 16170f03fcd4f3eb9b53a2c8fedde80ed6857a2de72adb3a1f89c882d8e531a6 |