A wrapper for Playwright Page that simplifies browser automation and scraping.

quickpage

Overview

QuickPage is a wrapper for Playwright.

Requirements

To run quickpage, you need the following environment:

  • Python 3.10 or higher
  • Libraries:
    • playwright (version 1.49.0 or higher)
  • Browser binaries (installed separately; see Installation)
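
The version requirements above can be checked from Python before running a scraper. The following is a minimal sketch using only the standard library; it is not part of quickpage, and the names `parse_version` and `check_requirements` are hypothetical helpers introduced for illustration. It does not check whether browser binaries are installed.

```python
from __future__ import annotations

import sys
from importlib import metadata

# Minimum versions taken from the Requirements list.
MIN_PYTHON = (3, 10)
MIN_PLAYWRIGHT = (1, 49, 0)


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.49.0' into (1, 49, 0), ignoring suffixes such as 'rc1'."""
    parts: list[int] = []
    for piece in text.split('.'):
        digits = ''
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '1.49' compares equal to '1.49.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_requirements() -> list[str]:
    """Return a list of human-readable problems (empty if all is well)."""
    problems: list[str] = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append('Python 3.10 or higher is required')
    try:
        installed = metadata.version('playwright')
    except metadata.PackageNotFoundError:
        problems.append('playwright is not installed')
    else:
        if parse_version(installed) < MIN_PLAYWRIGHT:
            problems.append(f'playwright {installed} is older than 1.49.0')
    return problems


if __name__ == '__main__':
    for problem in check_requirements():
        print(problem)
```

An empty list from `check_requirements()` means the Python and playwright versions satisfy the requirements listed above.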

Installation

You can install quickpage and all required dependencies from PyPI:

pip

pip install quickpage

uv (recommended)

uv add quickpage

After installation, you need to install the browser binary separately:

pip

python -m playwright install chromium

uv

uv run playwright install chromium

Basic Usage

import random
import time
import pandas as pd
from pathlib import Path
from playwright.sync_api import sync_playwright, Page
from quickpage import QuickPage

BASE_DIR = Path(__file__).parent

CSV_PATH = BASE_DIR / 'classroom_info.csv'

def scrape(page: Page) -> None:
    p = QuickPage(page)

    p.goto('https://www.foobarbaz1.jp')
    pref_urls = [p.attr('href', e) for e in p.ss('li.item > ul > li > a')]

    classroom_urls = []
    for i, url in enumerate(pref_urls, 1):
        print(f'{i}/{len(pref_urls)} pref_urls')
        if not p.goto(url):
            continue
        time.sleep(random.uniform(1, 2))
        links = [p.attr('href', e) for e in p.ss('.school-area h4 a')]
        classroom_urls.extend(links)

    for i, url in enumerate(classroom_urls, 1):
        print(f'{i}/{len(classroom_urls)} classroom_urls')
        if not p.goto(url):
            continue
        time.sleep(random.uniform(1, 2))
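        row = {
            'URL': page.url,
            '教室名': p.text_c(p.s('h1 .text01')),  # classroom name
            '住所': p.i_text(p.s('.item .mapText')),  # address
            '電話番号': p.text_c(p.s('.item .phoneNumber')),  # phone number
            # 'ホームページ' is the Japanese label for "homepage" on the target site.
            'HP': p.attr('href', p.s_in('a', p.next(p.s_re('th', 'ホームページ')))),
        }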
        pd.DataFrame([row]).to_csv(
            CSV_PATH,
            mode='a',
            index=False,
            header=not CSV_PATH.exists(),
            encoding='utf-8-sig',
        )


def main() -> None:
    with sync_playwright() as pw:
        with pw.chromium.launch(headless=False, channel="chrome") as browser:
            with browser.new_context(
                viewport={'width': 1920, 'height': 1080},
                # You can check Chrome's User-Agent at chrome://version. Copying that string as-is is the most natural choice.
                user_agent='Mozilla/5.0 ...',
                extra_http_headers={'Accept-Language': 'ja-JP,ja;q=0.9'}
            ) as context:
                page = context.new_page()
                page.set_default_timeout(15000) 
                blocked = {'image', 'font', 'media'}
                def handler(route):
                    if route.request.resource_type in blocked:
                        route.abort()
                    else:
                        route.continue_()
                page.route('**/*', handler)
                scrape(page)

if __name__ == '__main__':
    main()
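
The example above appends one row per page to the CSV and writes the header only when the file does not exist yet, so rows already scraped survive an interrupted run. The same pattern can be sketched with the standard library alone; this is an illustration under stated assumptions, not part of quickpage, and `append_row` is a hypothetical helper name.

```python
from __future__ import annotations

import csv
from pathlib import Path


def append_row(csv_path: Path, row: dict[str, str]) -> None:
    """Append one row to csv_path, writing the header only for a new file."""
    is_new = not csv_path.exists()
    # Use a BOM (utf-8-sig) only when creating the file so that it appears
    # exactly once at the start; later appends use plain utf-8.
    encoding = 'utf-8-sig' if is_new else 'utf-8'
    # newline='' lets the csv module control line endings itself.
    with csv_path.open('a', newline='', encoding=encoding) as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if is_new:
            writer.writeheader()
        writer.writerow(row)


if __name__ == '__main__':
    out = Path('example_output.csv')  # hypothetical output path
    append_row(out, {'URL': 'https://example.com/a', 'name': 'A'})
    append_row(out, {'URL': 'https://example.com/b', 'name': 'B'})
```

This sketch assumes every row has the same keys in the same order, as the `row` dictionary in the example above does.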

License

MIT
