Skip to main content

Add your description here

Project description

quickplay

Overview - 概要

quickplay is a scraping utility library built on Playwright and selectolax. quickplayはPlaywrightとselectolaxをベースにしたスクレイピングユーティリティライブラリです。

  • PlayPage — Playwright Page のラッパー。ライブスクレイピング用。
  • LocalPage — 保存済みHTMLファイルをPlayPage風に操作するクラス。selectolaxベースで高速。
  • ユーティリティ関数群 — browse / save_html / append_csv / html_filename / sleep_between

Requirements - 必要条件

  • Python 3.12 or higher
  • Libraries: playwright, selectolax, pandas(自動インストール)
  • Browser binary(別途インストールが必要)

Installation - インストール

pip

pip install quickplay

uv (recommended)

uv add quickplay

ブラウザバイナリを別途インストールしてください。

pip

python -m playwright install chromium

uv

uv run playwright install chromium

Basic Usage - 基本的な使い方

from playwright.sync_api import Page
from quickplay import PlayPage, BasePaths, browse, append_csv, sleep_between

paths = BasePaths(__file__)

def scrape(page: Page) -> None:
    p = PlayPage(page)
    p.goto('https://www.foobarbaz1.jp')

    pref_urls = [p.attr('href', e) for e in p.ss('li.item > ul > li > a')]

    classroom_urls = []
    for i, url in enumerate(pref_urls, 1):
        print(f'{i}/{len(pref_urls)} pref_urls')
        if not p.goto(url):
            continue
        sleep_between(1, 2)
        links = [p.attr('href', e) for e in p.ss('.school-area h4 a')]
        classroom_urls.extend(links)

    for i, url in enumerate(classroom_urls, 1):
        print(f'{i}/{len(classroom_urls)} classroom_urls')
        if not p.goto(url):
            continue
        sleep_between(1, 2)
        row = {
            'URL': page.url,
            '教室名': p.text(p.s('h1 .text01')),
            '住所': p.text(p.s('.item .mapText')),
            '電話番号': p.text(p.s('.item .phoneNumber')),
            'HP': p.attr('href', p.s_in('a', p.next(p.s_re('th', 'ホームページ')))),
        }
        append_csv(paths.from_here('out.csv'), row)

if __name__ == '__main__':
    browse(
        scrape,
        user_agent='Mozilla/5.0 ...',
        block_resources={'image', 'font'},
    )

Save HTML while scraping - スクレイピングしながらHTMLを保存する

from playwright.sync_api import Page
from quickplay import PlayPage, BasePaths, browse, save_html, html_filename, sleep_between

paths = BasePaths(__file__)

def scrape(page: Page) -> None:
    p = PlayPage(page)
    p.goto('https://www.foobarbaz1.jp')

    item_urls = [p.attr('href', e) for e in p.ss('ul.items > li > a')]

    for i, url in enumerate(item_urls, 1):
        print(f'{i}/{len(item_urls)} item_urls')
        if not p.goto(url):
            continue
        sleep_between(1, 2)
        save_html(paths.from_here('html'), html_filename(page.url), page.content())

if __name__ == '__main__':
    browse(scrape, block_resources={'image', 'font'})

Scrape from local HTML files - 保存済みHTMLからスクレイピングしてCSVに出力する

from quickplay import LocalPage, BasePaths, append_csv

paths = BasePaths(__file__)
p = LocalPage()

for path in paths.from_here('html').glob('*.html'):
    if not p.goto(path):
        continue
    row = {
        '商品名': p.text(p.s('h1.product-name')),
        '価格':   p.text(p.s('span.price')),
        'HP':     p.attr('href', p.s_in('a', p.next(p.s_re('th', 'ホームページ')))),
    }
    append_csv(paths.from_here('out.csv'), row)

License - ライセンス

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

quickplay-1.1.1.tar.gz (6.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

quickplay-1.1.1-py3-none-any.whl (6.9 kB view details)

Uploaded Python 3

File details

Details for the file quickplay-1.1.1.tar.gz.

File metadata

  • Download URL: quickplay-1.1.1.tar.gz
  • Upload date:
  • Size: 6.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-requests/2.32.5

File hashes

Hashes for quickplay-1.1.1.tar.gz
Algorithm Hash digest
SHA256 83a1fa7ecdaedb10de99290742175d24e15e4d389835b71c9a322a02ab7901b9
MD5 115cd8f323143517e8fdbc9c57297d1c
BLAKE2b-256 dab9fb22ae6e5ea618e03a53beab4512eefe40dcc57264f8f047832383b3fc5d

See more details on using hashes here.

File details

Details for the file quickplay-1.1.1-py3-none-any.whl.

File metadata

  • Download URL: quickplay-1.1.1-py3-none-any.whl
  • Upload date:
  • Size: 6.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-requests/2.32.5

File hashes

Hashes for quickplay-1.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 767e1a5e4a74ba484b1ad4e20b78a6f457fccd8a1ec3f0877bd10650d6321764
MD5 442fa2b72f5639aeb597d6fe4141e09d
BLAKE2b-256 16170f03fcd4f3eb9b53a2c8fedde80ed6857a2de72adb3a1f89c882d8e531a6

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page