A wrapper for Playwright Page that simplifies browser automation and scraping.
quickpage
Overview
QuickPage wraps a Playwright Page to simplify browser automation and scraping.
Requirements
To run quickpage, you need the following environment:
- Python 3.10 or higher
- Libraries:
  - playwright (version 1.49.0 or higher)
- Browser binaries (installed separately; see Installation)
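The requirements above can be checked programmatically before running a scraper. The following is a minimal sketch using only the standard library; `parse_version` and `check_requirements` are hypothetical helper names and are not part of quickpage.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions taken from the requirements list.
MIN_PYTHON = (3, 10)
MIN_PLAYWRIGHT = (1, 49, 0)


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.49.0' into (1, 49, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_requirements() -> list[str]:
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python 3.10 or higher is required")
    try:
        installed = parse_version(version("playwright"))
    except PackageNotFoundError:
        problems.append("playwright is not installed")
    else:
        if installed < MIN_PLAYWRIGHT:
            problems.append("playwright 1.49.0 or higher is required")
    return problems
```

Calling `check_requirements()` at startup and printing any returned messages gives a clearer failure than an import error deep inside a scraping run. It does not check the browser binaries, which are installed separately.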
Installation
You can install quickpage and its required dependencies from PyPI.
With pip:
    pip install quickpage
With uv (recommended):
    uv add quickpage
After installation, install the browser binary separately.
With pip:
    python -m playwright install chromium
With uv:
    uv run playwright install chromium
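To confirm that the browser binary was installed correctly, one option is to launch Chromium once and read its version. This is a minimal sketch using Playwright's own sync API directly (not quickpage); `chromium_version` is a hypothetical helper name.

```python
def chromium_version() -> str:
    """Launch headless Chromium once and return its version string."""
    # Imported lazily so that merely defining this helper does not
    # require Playwright to be installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as pw:
        browser = pw.chromium.launch()  # headless by default
        try:
            return browser.version
        finally:
            browser.close()
```

If the binary is missing, `launch()` raises an error whose message typically suggests running the `playwright install` command shown above.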
Basic Usage
import random
import time
from pathlib import Path

import pandas as pd
from playwright.sync_api import Page, sync_playwright
from quickpage import QuickPage

BASE_DIR = Path(__file__).parent
CSV_PATH = BASE_DIR / 'classroom_info.csv'


def scrape(page: Page) -> None:
    p = QuickPage(page)
    p.goto('https://www.foobarbaz1.jp')

    # Collect the first-level listing URLs (pref_urls) from the top page.
    pref_urls = [p.attr('href', e) for e in p.ss('li.item > ul > li > a')]

    # Visit each listing page and gather the classroom detail URLs.
    classroom_urls = []
    for i, url in enumerate(pref_urls, 1):
        print(f'{i}/{len(pref_urls)} pref_urls')
        if not p.goto(url):
            continue
        time.sleep(random.uniform(1, 2))  # randomized delay between requests
        links = [p.attr('href', e) for e in p.ss('.school-area h4 a')]
        classroom_urls.extend(links)

    # Visit each classroom page and append one CSV row per page.
    for i, url in enumerate(classroom_urls, 1):
        print(f'{i}/{len(classroom_urls)} classroom_urls')
        if not p.goto(url):
            continue
        time.sleep(random.uniform(1, 2))
        row = {
            'URL': page.url,
            '教室名': p.text_c(p.s('h1 .text01')),  # classroom name
            '住所': p.i_text(p.s('.item .mapText')),  # address
            '電話番号': p.text_c(p.s('.item .phoneNumber')),  # phone number
            # 'ホームページ' ("homepage") is the text matched in the th cell.
            'HP': p.attr('href', p.s_in('a', p.next(p.s_re('th', 'ホームページ')))),
        }
        # Write the header only when the file does not exist yet.
        pd.DataFrame([row]).to_csv(
            CSV_PATH,
            mode='a',
            index=False,
            header=not CSV_PATH.exists(),
            encoding='utf-8-sig',
        )


def main() -> None:
    with sync_playwright() as pw:
        with pw.chromium.launch(headless=False, channel="chrome") as browser:
            with browser.new_context(
                viewport={'width': 1920, 'height': 1080},
                # You can check Chrome's User-Agent at chrome://version.
                # Copying that string as-is is the most natural choice.
                user_agent='Mozilla/5.0 ...',
                extra_http_headers={'Accept-Language': 'ja-JP,ja;q=0.9'}
            ) as context:
                page = context.new_page()
                page.set_default_timeout(15000)

                # Skip heavy resources that are not needed for scraping.
                blocked = {'image', 'font', 'media'}

                def handler(route):
                    if route.request.resource_type in blocked:
                        route.abort()
                    else:
                        route.continue_()

                page.route('**/*', handler)
                scrape(page)


if __name__ == '__main__':
    main()
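The usage example appends each scraped row to the CSV as soon as it is collected and writes the header only when the file does not yet exist, so partial results survive an interrupted run. The same pattern can be sketched without pandas using the standard `csv` module; `append_row` is a hypothetical helper name and is not part of quickpage.

```python
import csv
from pathlib import Path


def append_row(csv_path: Path, row: dict[str, str]) -> None:
    """Append one row, writing the header only when the file is new."""
    write_header = not csv_path.exists()
    # utf-8-sig writes a BOM (helps Excel detect UTF-8); request it only
    # when creating the file, and use plain utf-8 for later appends.
    encoding = "utf-8-sig" if write_header else "utf-8"
    with csv_path.open("a", newline="", encoding=encoding) as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```

This sketch assumes every row has the same keys in the same order, as in the example above. Appending row by row trades some I/O overhead for robustness: nothing already written is lost if a later page fails.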
License