quickplay

Overview

quickplay is a scraping utility library built on Playwright and selectolax.

  • PlayPage: a wrapper around the Playwright Page, intended for live scraping.
  • LocalPage: a class for working with saved HTML files through a PlayPage-like interface. It is built on selectolax, so it is fast.
  • Utility functions: browse / save_html / append_csv / html_filename / sleep_between

Requirements

  • Python 3.12 or higher
  • Libraries: playwright, selectolax, pandas (installed automatically)
  • Browser binary (must be installed separately)

Installation

pip

pip install quickplay

uv (recommended)

uv add quickplay

The browser binary must be installed separately.

pip

python -m playwright install chromium

uv

uv run playwright install chromium
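
After installing, it can be useful to confirm that the Python dependencies listed under Requirements are importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of quickplay. It does not check whether the Chromium binary itself is installed.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the entries of `names` that cannot be found as top-level modules."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Package names taken from the Requirements section above.
    missing = missing_packages(["quickplay", "playwright", "selectolax", "pandas"])
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All Python dependencies are importable.")
```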

Basic Usage

from playwright.sync_api import Page
from quickplay import PlayPage, BasePaths, browse, append_csv, sleep_between

paths = BasePaths(__file__)

def scrape(page: Page) -> None:
    p = PlayPage(page)
    p.goto('https://www.foobarbaz1.jp')

    # Collect the first level of links (pref_urls) from the top page.
    pref_urls = [p.attr('href', e) for e in p.ss('li.item > ul > li > a')]

    # Visit each of those pages and gather the classroom links.
    classroom_urls = []
    for i, url in enumerate(pref_urls, 1):
        print(f'{i}/{len(pref_urls)} pref_urls')
        if not p.goto(url):
            continue  # skip this URL when goto() returns a falsy value
        sleep_between(1, 2)
        links = [p.attr('href', e) for e in p.ss('.school-area h4 a')]
        classroom_urls.extend(links)

    # Visit each classroom page and append one CSV row per page.
    for i, url in enumerate(classroom_urls, 1):
        print(f'{i}/{len(classroom_urls)} classroom_urls')
        if not p.goto(url):
            continue
        sleep_between(1, 2)
        # Column names: 教室名 = classroom name, 住所 = address,
        # 電話番号 = phone number, HP = homepage URL.
        row = {
            'URL': page.url,
            '教室名': p.text(p.s('h1 .text01')),
            '住所': p.text(p.s('.item .mapText')),
            '電話番号': p.text(p.s('.item .phoneNumber')),
            # Judging by the helper names: find the <th> matching 'ホームページ'
            # ("homepage"), step to the next element, and read the <a> href in it.
            'HP': p.attr('href', p.s_in('a', p.next(p.s_re('th', 'ホームページ')))),
        }
        append_csv(paths.from_here('out.csv'), row)

if __name__ == '__main__':
    browse(
        scrape,
        user_agent='Mozilla/5.0 ...',
        block_resources={'image', 'font'},
    )
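
The example above pauses between requests with `sleep_between(1, 2)`. This README does not show how that helper is implemented. A minimal sketch of a randomized delay of this kind, assuming it sleeps for a uniformly random duration between the two bounds, could look like the following. The name `polite_sleep` is hypothetical and is not part of quickplay.

```python
import random
import time


def polite_sleep(min_seconds: float, max_seconds: float) -> float:
    """Sleep for a random duration in [min_seconds, max_seconds] and return it.

    Assumption: this mirrors what a helper like sleep_between(1, 2) might do;
    it is not taken from quickplay's source.
    """
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("expected 0 <= min_seconds <= max_seconds")
    delay = random.uniform(min_seconds, max_seconds)
    time.sleep(delay)
    return delay


if __name__ == "__main__":
    waited = polite_sleep(0.1, 0.2)
    print(f"waited {waited:.3f} s")
```

Randomizing the delay avoids hitting the target site at a fixed rhythm, which is generally gentler on the server than a constant interval.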

Save HTML while scraping

from playwright.sync_api import Page
from quickplay import PlayPage, BasePaths, browse, save_html, html_filename, sleep_between

paths = BasePaths(__file__)

def scrape(page: Page) -> None:
    p = PlayPage(page)
    p.goto('https://www.foobarbaz1.jp')

    item_urls = [p.attr('href', e) for e in p.ss('ul.items > li > a')]

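    for i, url in enumerate(item_urls, 1):
        print(f'{i}/{len(item_urls)} item_urls')
        if not p.goto(url):
            continue  # skip this URL when goto() returns a falsy value
        sleep_between(1, 2)
        # Store the rendered HTML under ./html, named after the page URL.
        save_html(paths.from_here('html'), html_filename(page.url), page.content())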

if __name__ == '__main__':
    browse(scrape, block_resources={'image', 'font'})
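
The example above relies on `html_filename(page.url)` to turn a URL into a file name. How quickplay does this is not documented here. As an illustration only, one common approach is to keep the host, path, and query and replace every unsafe character with an underscore. The function name `url_to_filename` is hypothetical and is not part of quickplay.

```python
import re
from urllib.parse import urlsplit


def url_to_filename(url: str) -> str:
    """Map a URL to a filesystem-safe .html file name (illustrative sketch)."""
    parts = urlsplit(url)
    raw = parts.netloc + parts.path
    if parts.query:
        raw += "_" + parts.query
    # Replace anything outside a conservative safe set with underscores.
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", raw).strip("_")
    return (safe or "index") + ".html"


if __name__ == "__main__":
    print(url_to_filename("https://www.foobarbaz1.jp/items/42?page=2"))
    # prints: www.foobarbaz1.jp_items_42_page_2.html
```

Note that this scheme can map two different URLs to the same name (for example, `/a/b` and `/a_b`). Appending a short hash of the full URL is one way to avoid such collisions.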

Scrape from local HTML files and write to CSV

from quickplay import LocalPage, BasePaths, append_csv

paths = BasePaths(__file__)
p = LocalPage()

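# Iterate over the HTML files saved earlier and extract one row per file.
for path in paths.from_here('html').glob('*.html'):
    if not p.goto(path):
        continue  # skip this file when goto() returns a falsy value
    # Column names: 商品名 = product name, 価格 = price, HP = homepage URL.
    row = {
        '商品名': p.text(p.s('h1.product-name')),
        '価格':   p.text(p.s('span.price')),
        'HP':     p.attr('href', p.s_in('a', p.next(p.s_re('th', 'ホームページ')))),
    }
    append_csv(paths.from_here('out.csv'), row)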
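
Both CSV examples call `append_csv(path, row)` once per page, so rows are written incrementally rather than collected in memory. quickplay lists pandas as a dependency and may well use it for this; the sketch below swaps in the standard-library `csv` module to show the general idea: write the header only when the file is new, then append the row. The name `append_row` is hypothetical and is not part of quickplay.

```python
import csv
from pathlib import Path


def append_row(csv_path, row: dict) -> None:
    """Append one dict as a CSV row, writing the header if the file is new or empty."""
    path = Path(csv_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(row.keys()))
        if write_header:
            writer.writeheader()
        writer.writerow(row)


if __name__ == "__main__":
    append_row("out_demo.csv", {"name": "sample", "price": "100"})
```

Appending row by row means a crash partway through a long run loses at most the current page. This sketch assumes every row has the same keys in the same order.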

License

MIT
