A small importable Python module.

Project description

nscraper

nscraper is a small Python package scaffolded for two use cases:

  • import it from other projects
  • run it directly with python -m nscraper

License

MIT. You can fork, modify, and reuse it with minimal restrictions as long as the license notice is kept with the software.

Install

pip install nscraper

For development:

uv sync --dev

Use as a module

from nscraper import HttpScraper, ScrapeOptions

options = ScrapeOptions(
    url="https://example.com",
    headers={"Accept": "text/html"},
)

content = HttpScraper(options).scrape()
print(content)

Run the Module

Fetch a URL:

python -m nscraper -u https://example.com -H default
python -m nscraper -u https://example.com -H '{"Accept": "text/html"}'
python -m nscraper -u https://example.com -H default -c cookies.json

Current API

  • nscraper.ScrapeOptions
  • nscraper.BaseScraper
  • nscraper.HttpScraper
  • nscraper.SeleniumBaseScraper
  • nscraper.get_scraper(options: ScrapeOptions) -> BaseScraper
  • nscraper.validate_url(url: str) -> str
  • nscraper.parse_headers(raw_headers: str | None) -> dict[str, str]
  • nscraper.load_cookies_file(path: Path | str | None) -> dict[str, str] | None
  • nscraper.basic_html_transform(content: str) -> str
  • runtime dependencies: niquests==3.18.4, justhtml==1.14.0
  • development dependency: pytest
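
The header-parsing behavior described by parse_headers can be sketched in plain Python. This is an illustrative reimplementation, not the package's actual code: the built-in Accept value here is a guess, while the User-Agent string matches the default listed later in this README.

```python
import json

# Sketch of the built-in headers applied for "default".
# The Accept value is an assumption; the User-Agent is the
# package's documented default.
DEFAULT_HEADERS = {
    "Accept": "text/html",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/146.0.0.0 Safari/537.36"
    ),
}


class InvalidHeadersError(ValueError):
    """Raised when the raw header string is missing or malformed."""


def parse_headers(raw_headers):
    """Sketch: 'default' -> built-in dict, otherwise a JSON object string."""
    if raw_headers is None:
        raise InvalidHeadersError("headers are required")
    if raw_headers == "default":
        return dict(DEFAULT_HEADERS)
    try:
        headers = json.loads(raw_headers)
    except json.JSONDecodeError as exc:
        raise InvalidHeadersError(f"malformed headers: {raw_headers!r}") from exc
    if not isinstance(headers, dict):
        raise InvalidHeadersError("headers must be a JSON object")
    return {str(k): str(v) for k, v in headers.items()}
```

The same parsing rules explain the CLI examples below: -H accepts either the literal word default or a JSON object string.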

Module Flags

  • -u / --url target URL (required)
  • -H / --headers required; a JSON object string, or default for the built-in headers
  • -e / --engine http or seleniumbase
  • -p / --proxy optional proxy
  • --timeout default 3
  • -o / --output optional output file
  • -c / --cookies-file optional JSON cookies file
  • -t / --transform raw or basic_html (default raw)

Behavior:

  • invalid or malformed URLs raise InvalidUrlError
  • missing or malformed headers raise InvalidHeadersError
  • missing or malformed cookie files raise InvalidCookiesError
  • use -H default to apply the built-in Accept and User-Agent header dict
  • use -c only when you want to send cookies; omitting it sends the request without cookies
  • output files are always overwritten
  • basic_html removes non-content elements and writes cleaned HTML output
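
A plausible shape for the cookies file, assuming the flat name-to-value mapping implied by the load_cookies_file signature (dict[str, str]), is:

```python
import json
from pathlib import Path

# Write a flat name -> value cookie mapping, matching the
# dict[str, str] return type of load_cookies_file.
cookies = {"sessionid": "abc123", "csrftoken": "xyz789"}
path = Path("cookies.json")
path.write_text(json.dumps(cookies))

# The module would then be invoked with:
#   python -m nscraper -u https://example.com -H default -c cookies.json
loaded = json.loads(path.read_text())
print(loaded["sessionid"])  # abc123
```

A nested or non-string structure in this file would be malformed input and, per the behavior above, raise InvalidCookiesError.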

Default User-Agent:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/146.0.0.0 Safari/537.36

The package is intentionally minimal so you can extend it into a reusable library and publish it to PyPI.

Project details


Download files

Download the file for your platform.

Source Distribution

nscraper-0.1.3.tar.gz (46.6 kB)

Uploaded Source

Built Distribution

nscraper-0.1.3-py3-none-any.whl (9.4 kB)

Uploaded Python 3

File details

Details for the file nscraper-0.1.3.tar.gz.

File metadata

  • Download URL: nscraper-0.1.3.tar.gz
  • Upload date:
  • Size: 46.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nscraper-0.1.3.tar.gz:

  • SHA256: 26196dd6609803259139dcc8ca6c16a254c33cac91529ab9b70d306ed227844d
  • MD5: 31f89b83793abb5b7172e23bcdc3b1a1
  • BLAKE2b-256: ae2bbfdda1da99d3c4d798caf2c64f9ae6782e64143a6318f36da1ad0f17a9f4

Provenance

The following attestation bundles were made for nscraper-0.1.3.tar.gz:

Publisher: release.yml on mikerr1/nscraper

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file nscraper-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: nscraper-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 9.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nscraper-0.1.3-py3-none-any.whl:

  • SHA256: 01a7ffe4138138c258401ba9c04895deb0cd1685426cfde460d02b1aa6c85f26
  • MD5: da009eda43490cbd5ce2b2d74c96ac7a
  • BLAKE2b-256: c84d8e4e026b297e54ceb3822665c28907764432da1209fb6bf7c5e93cb4dd5d

Provenance

The following attestation bundles were made for nscraper-0.1.3-py3-none-any.whl:

Publisher: release.yml on mikerr1/nscraper

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
