quickplay
Overview
quickplay is a scraping utility library built on Playwright and selectolax.
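- PlayPage: a wrapper around Playwright's Page, for live scraping.
- LocalPage: a class for working with saved HTML files the same way you would with PlayPage. It is built on selectolax, so it is fast.
- Utility functions: browse, save_html, append_csv, html_filename, sleep_between.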
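This README does not document exactly what sleep_between does. The examples call it as sleep_between(1, 2), so it probably pauses for a random interval between the two bounds, in seconds, to space out requests. The block below is a minimal standard-library sketch of that idea under that assumption. random_pause is a hypothetical name and is not part of quickplay.

```python
import random
import time


def random_pause(min_seconds: float, max_seconds: float) -> float:
    """Sleep for a random duration between min_seconds and max_seconds, and return it.

    Hypothetical stand-in for quickplay's sleep_between; the real function may differ.
    """
    if min_seconds > max_seconds:
        raise ValueError("min_seconds must not exceed max_seconds")
    delay = random.uniform(min_seconds, max_seconds)  # pick a random delay in the range
    time.sleep(delay)
    return delay
```

A randomized delay makes the request pattern less regular than a fixed sleep would, and it keeps the load on the target site low.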
Requirements
- Python 3.12 or higher
- Libraries: playwright, selectolax, pandas (installed automatically)
- A browser binary (must be installed separately; see Installation)
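The sketch below checks whether the Python-side requirements are in place. It uses only the standard library to compare the interpreter version and to test whether the three libraries can be imported. It cannot detect the browser binary. missing_requirements is a hypothetical helper and is not part of quickplay.

```python
import importlib.util
import sys

# Import names of the libraries listed above.
REQUIRED_MODULES = ("playwright", "selectolax", "pandas")


def missing_requirements() -> list[str]:
    """Return a description of each unmet Python-side requirement (empty if all are met)."""
    problems: list[str] = []
    if sys.version_info < (3, 12):
        problems.append(f"Python 3.12+ required, found {sys.version.split()[0]}")
    for name in REQUIRED_MODULES:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing library: {name}")
    return problems


if __name__ == "__main__":
    issues = missing_requirements()
    print("\n".join(issues) if issues else "All Python-side requirements are met.")
```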
Installation
With pip:

```shell
pip install quickplay
```

With uv (recommended):

```shell
uv add quickplay
```

Then install the browser binary separately.
With pip:

```shell
python -m playwright install chromium
```

With uv:

```shell
uv run playwright install chromium
```
Basic Usage
```python
from playwright.sync_api import Page

from quickplay import PlayPage, BasePaths, browse, append_csv, sleep_between

paths = BasePaths(__file__)


def scrape(page: Page) -> None:
    p = PlayPage(page)
    p.goto('https://www.foobarbaz1.jp')

    # Collect the link to each prefecture page.
    pref_urls = [p.url(e) for e in p.ss('li.item > ul > li > a')]

    # Visit each prefecture page and collect the classroom links on it.
    classroom_urls = []
    for i, url in enumerate(pref_urls, 1):
        print(f'{i}/{len(pref_urls)} pref_urls')
        if not p.goto(url):  # skip pages that fail to load
            continue
        sleep_between(1, 2)  # wait 1-2 seconds between requests
        links = [p.url(e) for e in p.ss('.school-area h4 a')]
        classroom_urls.extend(links)

    # Visit each classroom page and append one CSV row per classroom.
    for i, url in enumerate(classroom_urls, 1):
        print(f'{i}/{len(classroom_urls)} classroom_urls')
        if not p.goto(url):
            continue
        sleep_between(1, 2)
        row = {
            'URL': page.url,
            '教室名': p.text(p.s('h1 .text01')),  # classroom name
            '住所': p.text(p.s('.item .mapText')),  # address
            '電話番号': p.text(p.s('.item .phoneNumber')),  # phone number
            # Homepage link: the <a> under the element following the <th> that matches 'ホームページ' ("homepage").
            'HP': p.attr('href', p.s_in('a', p.next(p.s_re('th', 'ホームページ')))),
        }
        append_csv(paths.from_here('out.csv'), row)


if __name__ == '__main__':
    browse(
        scrape,
        user_agent='Mozilla/5.0 ...',
        block_resources={'image', 'font'},
    )
```
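The browse call above passes block_resources={'image', 'font'}. This README does not describe how quickplay applies that option. In plain Playwright, this kind of filtering is usually done by registering a handler with page.route and checking each request's resource_type. The block below is a sketch under that assumption. make_blocking_handler is a hypothetical helper and is not part of quickplay.

```python
from typing import Any, Callable


def make_blocking_handler(blocked_types: set[str]) -> Callable[[Any], None]:
    """Build a Playwright route handler that aborts requests of the given resource types.

    Playwright resource types include 'document', 'stylesheet', 'image',
    'media', 'font', 'script', 'xhr' and 'fetch'.
    """
    def handle(route: Any) -> None:
        if route.request.resource_type in blocked_types:
            route.abort()      # drop the request before anything is downloaded
        else:
            route.continue_()  # let every other request through unchanged
    return handle


# Usage with a Playwright sync Page (requires playwright and a browser binary):
#     page.route("**/*", make_blocking_handler({"image", "font"}))
```

Skipping images and fonts saves bandwidth and usually speeds up page loads. The HTML that the selectors read is still loaded.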
Save HTML while scraping
```python
from playwright.sync_api import Page

from quickplay import PlayPage, BasePaths, browse, save_html, html_filename, sleep_between

paths = BasePaths(__file__)


def scrape(page: Page) -> None:
    p = PlayPage(page)
    p.goto('https://www.foobarbaz1.jp')
    item_urls = [p.url(e) for e in p.ss('ul.items > li > a')]
    for i, url in enumerate(item_urls, 1):
        print(f'{i}/{len(item_urls)} item_urls')
        if not p.goto(url):  # skip pages that fail to load
            continue
        sleep_between(1, 2)
        # Save the current page HTML into the html directory, named after the page URL.
        save_html(paths.from_here('html'), html_filename(page.url), page.content())


if __name__ == '__main__':
    browse(scrape, block_resources={'image', 'font'})
```
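In the example above, html_filename(page.url) turns the page URL into a file name, and save_html writes the page source into the html directory. This README does not specify the naming scheme quickplay uses. The block below shows one way to build similar helpers with the standard library, assuming UTF-8 output. url_to_filename and write_html are hypothetical names.

```python
import hashlib
import re
from pathlib import Path


def url_to_filename(url: str, max_stem: int = 100) -> str:
    """Map a URL to a deterministic, filesystem-safe .html file name."""
    stem = re.sub(r"^https?://", "", url)
    # Replace every run of characters outside a conservative set with '_'.
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", stem).strip("_")[:max_stem]
    # A short hash keeps names unique when sanitizing or truncation would make two URLs collide.
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()[:10]
    return f"{stem}_{digest}.html"


def write_html(directory: Path, filename: str, html: str) -> Path:
    """Write html to directory/filename, creating the directory if needed."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / filename
    path.write_text(html, encoding="utf-8")
    return path
```

The same URL always yields the same name, so a rerun overwrites a page instead of duplicating it.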
Scrape saved HTML files and write the results to CSV
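```python
from quickplay import LocalPage, BasePaths, append_csv

paths = BasePaths(__file__)
p = LocalPage()

# Parse every saved HTML file in the html directory.
for path in paths.from_here('html').glob('*.html'):
    if not p.goto(path):  # skip files that cannot be loaded
        continue
    row = {
        '商品名': p.text(p.s('h1.product-name')),  # product name
        '価格': p.text(p.s('span.price')),  # price
        # Homepage link: the <a> under the element following the <th> that matches 'ホームページ' ("homepage").
        'HP': p.attr('href', p.s_in('a', p.next(p.s_re('th', 'ホームページ')))),
    }
    append_csv(paths.from_here('out.csv'), row)
```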
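append_csv is called once per page with a dict, so each call presumably adds one row to out.csv. quickplay lists pandas as a dependency and may implement this differently. The block below is a standard-library sketch of that behavior. It assumes the header is written only when the file is new or empty, and that every row has the same keys. append_row is a hypothetical name.

```python
import csv
from collections.abc import Mapping
from pathlib import Path


def append_row(path: Path, row: Mapping[str, object]) -> None:
    """Append one row to a CSV file, writing the header first if the file is new or empty."""
    path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not path.exists() or path.stat().st_size == 0
    # newline="" lets the csv module control line endings, as its documentation recommends.
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```

Appending one row at a time means rows already scraped stay on disk if a long run stops partway.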
License