A small importable Python module.

Project description

nscraper

nscraper is a small Python package scaffolded for two use cases:

  • import it from other projects
  • run it directly with python -m nscraper

License

MIT. You can fork, modify, and reuse it with minimal restrictions as long as the license notice is kept with the software.

Install

pip install nscraper

For development:

uv sync --dev

Use as a module

from nscraper import HttpScraper, ScrapeOptions

options = ScrapeOptions(
    url="https://example.com",
    headers={"Accept": "text/html"},
)

content = HttpScraper(options).scrape()
print(content)
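
If you would rather not hard-code HttpScraper, the get_scraper factory listed under Current API picks the scraper from the options. A minimal sketch, assuming every scraper returned by the factory exposes the same scrape() method shown above:

from nscraper import ScrapeOptions, get_scraper

options = ScrapeOptions(
    url="https://example.com",
    headers={"Accept": "text/html"},
)

scraper = get_scraper(options)  # returns a BaseScraper subclass for the configured engine
print(scraper.scrape())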

Run the Module

Fetch a URL:

python -m nscraper -u https://example.com -H default                                         # built-in default headers
python -m nscraper -u https://example.com -H '{"Accept": "text/html"}'                       # custom headers as a JSON string
python -m nscraper -u https://example.com -H default -c cookies.json                         # send cookies from a JSON file
python -m nscraper -u https://example.com -H default -t fast -o ~/scraped_data/example.html  # fast transform, write to an explicit file
python -m nscraper -u https://example.com -H default -o                                      # bare -o: automatic output path
python -m nscraper -u https://example.com -H default --pretty --print                        # pretty-print the result to stdout
python -m nscraper -u https://httpbin.org/get -H default -o --pretty --print                 # JSON response, saved and pretty-printed
python -m nscraper -u https://example.com -H default -t basic                                # heavier basic transform
python -m nscraper -u https://example.com -H default --print                                 # print the untransformed result to stdout
python -m nscraper -u https://example.com -H default -o ~/scraped_data/example.html --print  # write the file and print its content

Current API

  • nscraper.ScrapeOptions
  • nscraper.BaseScraper
  • nscraper.HttpScraper
  • nscraper.SeleniumBaseScraper
  • nscraper.get_scraper(options: ScrapeOptions) -> BaseScraper
  • nscraper.validate_url(url: str) -> str
  • nscraper.parse_headers(raw_headers: str | None) -> dict[str, str]
  • nscraper.load_cookies_file(path: Path | str | None) -> dict[str, str] | None
  • nscraper.fast_html_transform(content: str) -> str
  • nscraper.basic_html_transform(content: str) -> str

Dependencies

  • runtime: niquests==3.18.4
  • runtime: justhtml==1.14.0
  • development: pytest
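
The standalone helpers above can also be called directly when embedding nscraper in a larger pipeline. A minimal sketch built from the published signatures; the error comments restate the behavior notes further down:

from nscraper import (
    validate_url,
    parse_headers,
    load_cookies_file,
    fast_html_transform,
)

url = validate_url("https://example.com")            # raises InvalidUrlError on malformed URLs
headers = parse_headers('{"Accept": "text/html"}')   # raises InvalidHeadersError on malformed input
cookies = load_cookies_file("cookies.json")          # raises InvalidCookiesError if missing or malformed
print(fast_html_transform("<html><body><script>x()</script><p>hi</p></body></html>"))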

Module Flags

  • -u / --url required
  • -H / --headers required; pass a JSON object string or the literal default
  • -e / --engine selects http or seleniumbase (combined example after this list)
  • -p / --proxy
  • --timeout default 3
  • -o / --output writes to a file; bare -o derives the path automatically, while explicit paths must be absolute
  • --print prints the result to stdout
  • --pretty pretty-prints the final HTML output
  • -c / --cookies-file optional JSON file
  • -t / --transform optional; one of raw, basic, or fast
  • -d / --debug compatibility flag; runtime status lines are printed by default
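
Flags that the examples above do not cover combine the same way. An illustrative invocation, assuming a local proxy is listening on port 8080:

python -m nscraper -u https://example.com -H default -e seleniumbase -p http://localhost:8080 --timeout 10 --print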

Behavior:

  • invalid or malformed URLs raise InvalidUrlError
  • missing or malformed headers raise InvalidHeadersError
  • missing or malformed cookie files raise InvalidCookiesError
  • use -H default to apply the built-in Accept and User-Agent header dict
  • use -c only when you want to send cookies; omit it and no cookies are sent
  • no transform runs unless -t / --transform is explicitly provided
  • no HTML is printed unless --print is provided
  • when --output and --print are both provided, stdout prints the written file content
  • output files are always overwritten
  • missing parent directories for output files are created automatically
  • bare -o writes to .nscraper/<netloc>/<path>.<ext> (see the sketch after this list)
  • bare -o uses index for root URLs such as /
  • bare -o preserves nested URL path segments as directories
  • bare -o appends a short query hash when the URL contains a query string
  • explicit output paths must be absolute; relative paths fail immediately
  • auto-generated output extensions are content-aware: HTML-like responses use .html, JSON responses use .json
  • --pretty formats the final response after the selected transform mode is applied; JSON responses are pretty-printed as JSON
  • raw returns the fetched response with no cleanup
  • fast removes a small set of noisy elements such as script, style, noscript, iframe, and template for HTML responses
  • basic performs heavier cleanup, including hidden elements, head cleanup, and ad-like selectors for HTML responses
  • response handling is classified by content type; only HTML and JSON responses are supported
  • unsupported content types fail immediately before transform or output is written
  • runtime status lines include per-step timings for request, transform, pretty-formatting, and file write operations
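
The bare -o rules above amount to a small path-derivation function. A minimal sketch of the documented naming scheme, not the package's actual implementation; the hash algorithm, hash length, and separator are assumptions:

from hashlib import sha256
from pathlib import Path
from urllib.parse import urlsplit

def auto_output_path(url: str, ext: str = "html") -> Path:
    # Mirror the documented bare -o naming scheme (illustrative only).
    parts = urlsplit(url)
    path = parts.path.strip("/") or "index"           # root URLs fall back to index
    target = Path(".nscraper") / parts.netloc / path  # nested path segments become directories
    if parts.query:                                   # append a short query hash
        digest = sha256(parts.query.encode()).hexdigest()[:8]
        target = target.with_name(f"{target.name}-{digest}")
    return target.with_suffix(f".{ext}")

print(auto_output_path("https://example.com/"))                # .nscraper/example.com/index.html
print(auto_output_path("https://httpbin.org/get?x=1", "json")) # .nscraper/httpbin.org/get-<hash>.json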

Default User-Agent:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/146.0.0.0 Safari/537.36

The package is intentionally minimal so you can extend it into a reusable library and publish it to PyPI.

Download files

Download the file for your platform.

Source Distribution

nscraper-0.1.4.tar.gz (54.0 kB)

Built Distribution

nscraper-0.1.4-py3-none-any.whl (16.4 kB)

File details

Details for the file nscraper-0.1.4.tar.gz.

File metadata

  • Download URL: nscraper-0.1.4.tar.gz
  • Size: 54.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nscraper-0.1.4.tar.gz:

  • SHA256: fe5fb466babcf503fe65a314761da1f2506d4df7c6bb0dc58e6198a599dda32b
  • MD5: 324a54b91b6e2f642b62eaee2be08729
  • BLAKE2b-256: f782d22222ec5781a676a7af2e6d9d9e43f3cab735c034cf1fda72008d2f1140

Provenance

The following attestation bundles were made for nscraper-0.1.4.tar.gz:

Publisher: release.yml on mikerr1/nscraper

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file nscraper-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: nscraper-0.1.4-py3-none-any.whl
  • Size: 16.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nscraper-0.1.4-py3-none-any.whl:

  • SHA256: 24b3d11bcfe910c1fcd34af2270c192cc9e3c50c14ad3ca2963e570e69efe4dc
  • MD5: e342cd77d2501613502c98aaddebf4c5
  • BLAKE2b-256: 109355990183ae14411eecde3f09b3c4f657146d3cfe422851659be09bc428eb

Provenance

The following attestation bundles were made for nscraper-0.1.4-py3-none-any.whl:

Publisher: release.yml on mikerr1/nscraper

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
