
nuki

Overview

nuki is a scraping utility library built on Patchright and selectolax.

Import the DOM/parser wrappers from nuki, browser launchers from nuki.browser, and peripheral helpers (CSV, logging, etc.) from nuki.utils.

Requirements

  • Python 3.12 or higher
  • Libraries: patchright, selectolax, pandas, camoufox (installed automatically)
  • write_parquet requires pyarrow (or fastparquet) as the pandas Parquet engine.
  • Browser binaries (must be installed separately)
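If you plan to use write_parquet, you also need a Parquet engine for pandas; for example, pyarrow can be added like this:

```shell
# pip
pip install pyarrow

# uv
uv add pyarrow
```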

Installation

pip

pip install nuki

uv (recommended)

uv add nuki

Install the browser binaries separately.

Patchright(Chromium)

pip

python -m patchright install chromium

uv (recommended)

uv run patchright install chromium

Camoufox(Firefox)

pip

camoufox fetch

uv (recommended)

uv run camoufox fetch

Methods

nuki.browser

  • patchright_page(user_data_dir) … a context manager. Opens a Page with Patchright (Chrome channel, persistent context) and yields it inside the with block.
    user_data_dir is a string such as 'C:\Users\you\...\User Data' (you can find it at chrome://version/).

  • camoufox_page(locale=...) … likewise opens a Page with Camoufox (Firefox). Intended for sites with strict bot detection.
    Example: with camoufox_page(locale='en-US,en') as page:
    The locale defaults to 'ja-JP,ja'.

nuki.utils

Logging, relative paths, CSV, Parquet, HTML saving, and more (see each function's docstring / the source).
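The examples below lean on from_here; judging from its usage (fh('log/scraping.log'), fh('html') / file_name), it returns a callable that resolves paths against the calling script's directory. A minimal stdlib sketch of that behavior (an inference from usage, not nuki's actual implementation):

```python
from pathlib import Path


def from_here(file: str):
    """Return a resolver anchored at *file*'s directory.

    Sketch of what nuki's from_here appears to do, inferred from usage;
    the real implementation may differ.
    """
    base = Path(file).resolve().parent

    def fh(relative: str) -> Path:
        # fh('log/x.log') -> <script dir>/log/x.log (a Path, so `/` joins work)
        return base / relative

    return fh


fh = from_here(__file__)
print(fh('html') / 'page.html')  # a path next to this script
```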

Basic Usage

from nuki import npage
from nuki.browser import patchright_page
from nuki.utils import add_log_file, append_csv, from_here, random_sleep

fh = from_here(__file__)
add_log_file(fh('log/scraping.log'))

user_data_dir = r'C:\Users\your_username\AppData\Local\Google\Chrome\User Data'
with patchright_page(user_data_dir) as page:
    p = npage(page)
    p.goto('https://www.foobarbaz1.jp')

    pref_urls = p.ss('li.item > ul > li > a').abs_urls()

    classroom_urls = []
    for i, url in enumerate(pref_urls, 1):
        print(f'{i}/{len(pref_urls)} pref_urls')
        if not p.goto(url):
            continue
        random_sleep(1, 2)
        classroom_urls.extend(p.ss('.school-area h4 a').abs_urls())

    for i, url in enumerate(classroom_urls, 1):
        print(f'{i}/{len(classroom_urls)} classroom_urls')
        if not p.goto(url):
            continue
        random_sleep(1, 2)
        append_csv(fh('csv/out.csv'), {
            'URL': page.url,
            '教室名': p.s('h1 .text01').text_content(),
            '住所': p.s('.item .mapText').text_content(),
            '電話番号': p.s('.item .phoneNumber').text_content(),
            'HP': p.ss('th').re('ホームページ').first().next().s('a').url(),
        })

Save HTML while scraping

from nuki import npage
from nuki.browser import camoufox_page
from nuki.utils import add_log_file, append_csv, from_here, hash_name, random_sleep, save_html

fh = from_here(__file__)
add_log_file(fh('log/scraping.log'))

with camoufox_page() as page:
    ctx = {}
    p = npage(page)
    p.goto('https://www.foobarbaz1.jp')

    ctx['アイテムURLs'] = p.ss('ul.items > li > a').abs_urls()

    for i, url in enumerate(ctx['アイテムURLs'], 1):
        print(f"{i}/{len(ctx['アイテムURLs'])} アイテムURLs")
        if not p.goto(url):
            continue
        random_sleep(1, 2)
        if p.wait('#logo', timeout=10000).unwrap() is None:
            continue
        file_name = f'{hash_name(url)}.html'
        if not save_html(fh('html') / file_name, page.content()):
            continue
        append_csv(fh('outurlhtml.csv'), {
            'URL': url,
            'HTML': file_name,
        })
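hash_name turns each URL into a stable file name so the same page always maps to the same HTML file. One common way to get that behavior (an assumed implementation; nuki's actual hash scheme may differ):

```python
import hashlib


def hash_name(url: str) -> str:
    """Derive a stable, filesystem-safe stem from a URL.

    Sketch only: nuki's real hash function may use a different
    algorithm or length.
    """
    return hashlib.sha256(url.encode('utf-8')).hexdigest()[:16]


print(hash_name('https://example.com/page'))  # same URL -> same name
```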

Scrape from local HTML files and write to Parquet

import pandas as pd

from nuki import nparser
from nuki.utils import add_log_file, from_here, parse_html, write_parquet

fh = from_here(__file__)
add_log_file(fh('log/scraping.log'))

df = pd.read_csv(fh('outurlhtml.csv'))
results = []
for i, (url, path) in enumerate(zip(df['URL'], df['HTML']), 1):
    print(i)
    if not (parser := parse_html(fh('html') / path)):
        continue
    p = nparser(parser)
    results.append({
        'URL': url,
        '教室名': p.s('h1 .text02').text(),
        '住所': p.s('.item .mapText').text(),
        '所在地': p.ss('dt').re(r'所在地').first().next('dd').text(),
    })
write_parquet(fh('outhtml.parquet'), results)

License

MIT
