A small importable Python module.

nscraper

nscraper is a small Python package scaffolded for two use cases:

  • import it from other projects
  • run it directly with python -m nscraper

License

MIT. You may fork, modify, and reuse it with minimal restrictions, provided the license notice stays with the software.

Install

pip install nscraper

For development:

uv sync --dev

Use as a module

from nscraper import HttpScraper, ScrapeOptions

options = ScrapeOptions(
    url="https://example.com",
    headers={"Accept": "text/html"},
)

content = HttpScraper(options).scrape()
print(content)
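
The public API also includes get_scraper, which appears to pick a scraper class based on the requested engine. Below is a self-contained sketch of that dispatch pattern; the class names mirror nscraper's API list, but the engine field and the dispatch logic are assumptions for illustration, not the library's actual code:

```python
# Hypothetical sketch of an engine -> scraper dispatch, modeled on the
# nscraper API list. Names and behavior are assumptions, not library code.
from dataclasses import dataclass, field


@dataclass
class ScrapeOptions:
    url: str
    headers: dict[str, str] = field(default_factory=dict)
    engine: str = "http"  # assumed default; matches the -e flag values


class BaseScraper:
    def __init__(self, options: ScrapeOptions) -> None:
        self.options = options


class HttpScraper(BaseScraper):
    pass


class SeleniumBaseScraper(BaseScraper):
    pass


def get_scraper(options: ScrapeOptions) -> BaseScraper:
    # Map the engine name to a scraper class; unknown engines are an error.
    engines = {"http": HttpScraper, "seleniumbase": SeleniumBaseScraper}
    try:
        return engines[options.engine](options)
    except KeyError:
        raise ValueError(f"unknown engine: {options.engine!r}") from None


scraper = get_scraper(ScrapeOptions(url="https://example.com"))
print(type(scraper).__name__)  # HttpScraper
```

Keeping the dispatch table in one function makes adding a new engine a one-line change.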

Run the Module

Fetch a URL:

python -m nscraper -u https://example.com -H default
python -m nscraper -u https://example.com -H '{"Accept": "text/html"}'
python -m nscraper -u https://example.com -H default -c cookies.json

Current API

  • nscraper.ScrapeOptions
  • nscraper.BaseScraper
  • nscraper.HttpScraper
  • nscraper.SeleniumBaseScraper
  • nscraper.get_scraper(options: ScrapeOptions) -> BaseScraper
  • nscraper.validate_url(url: str) -> str
  • nscraper.parse_headers(raw_headers: str | None) -> dict[str, str]
  • nscraper.load_cookies_file(path: Path | str | None) -> dict[str, str] | None
  • nscraper.basic_html_transform(content: str) -> str
  • runtime dependency: niquests==3.18.4
  • runtime dependency: justhtml==1.14.0
  • development dependency: pytest

Module Flags

  • -u / --url: required
  • -H / --headers: required; a JSON object string, or the literal default for the built-in headers
  • -e / --engine: http or seleniumbase
  • -p / --proxy: optional proxy
  • --timeout: default 3
  • -o / --output: optional output file
  • -c / --cookies-file: optional JSON cookies file
  • -t / --transform: default raw

Behavior:

  • invalid or malformed URLs raise InvalidUrlError
  • missing or malformed headers raise InvalidHeadersError
  • missing or malformed cookie files raise InvalidCookiesError
  • use -H default to apply the built-in Accept and User-Agent header dict
  • use -c only when you want to send cookies; omit it to send none
  • output files are always overwritten
  • basic_html removes non-content elements and writes cleaned HTML output

Default User-Agent:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/146.0.0.0 Safari/537.36

The package is intentionally minimal so you can extend it into a reusable library and publish it to PyPI.

GitHub and PyPI Release Flow

  • pull requests to master run tests in GitHub Actions
  • published GitHub releases run tests, build sdist and wheel, then publish to PyPI
  • the publish workflow is in .github/workflows/release.yml

Before the release workflow can publish, configure Trusted Publishing on PyPI:

  1. create the project on PyPI if it does not exist yet
  2. in PyPI, open the project publishing settings
  3. add a trusted publisher for this GitHub repository
  4. use the release workflow on the master branch

After that, the normal flow is:

  1. push code to GitHub
  2. merge to master
  3. create a GitHub release for the version tag
  4. let GitHub Actions test, build, and publish the package

Download files

Download the file for your platform.

Source Distribution

nscraper-0.1.0.tar.gz (11.0 kB)

Built Distribution

nscraper-0.1.0-py3-none-any.whl (8.6 kB)

File details

Details for the file nscraper-0.1.0.tar.gz.

File metadata

  • File: nscraper-0.1.0.tar.gz
  • Size: 11.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nscraper-0.1.0.tar.gz:

  • SHA256: bd16ae699118f1c115fdd41ae3aa7465a4fb1bb2bc5f270b6ecd2b11da772cfc
  • MD5: 21a9a63ae834f01c9c9f8545351b1988
  • BLAKE2b-256: 4637c350dcc902bab07483e0dc1e342b3b961423574e1cbe1e17b434248673bc

Provenance

The following attestation bundles were made for nscraper-0.1.0.tar.gz:

Publisher: release.yml on mikerr1/nscraper

File details

Details for the file nscraper-0.1.0-py3-none-any.whl.

File metadata

  • File: nscraper-0.1.0-py3-none-any.whl
  • Size: 8.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nscraper-0.1.0-py3-none-any.whl:

  • SHA256: d83169d9a805dbeac8f609bf34aa837cad12da22ae02aeebcbfdd605f1721a88
  • MD5: 6c6f79180a6d1fc6d2beef66abe934d7
  • BLAKE2b-256: 47b16e5dd9a5aec002207bbabc7e0864463ccab735157136029d7a2fb756335c

Provenance

The following attestation bundles were made for nscraper-0.1.0-py3-none-any.whl:

Publisher: release.yml on mikerr1/nscraper
