A small importable Python module.
nscraper
nscraper is a small Python package scaffolded for two use cases:
- import it from other projects
- run it directly with python -m nscraper
License
MIT. You can fork, modify, and reuse it with minimal restrictions as long as the license notice is kept with the software.
Install
pip install nscraper
For development:
uv sync --dev
Use as a module
from nscraper import HttpScraper, ScrapeOptions

options = ScrapeOptions(
    url="https://example.com",
    headers={"Accept": "text/html"},
)
content = HttpScraper(options).scrape()
print(content)
Run the Module
Fetch a URL:
python -m nscraper -u https://example.com -H default
python -m nscraper -u https://example.com -H '{"Accept": "text/html"}'
python -m nscraper -u https://example.com -H default -c cookies.json
Current API
- nscraper.ScrapeOptions
- nscraper.BaseScraper
- nscraper.HttpScraper
- nscraper.SeleniumBaseScraper
- nscraper.get_scraper(options: ScrapeOptions) -> BaseScraper
- nscraper.validate_url(url: str) -> str
- nscraper.parse_headers(raw_headers: str | None) -> dict[str, str]
- nscraper.load_cookies_file(path: Path | str | None) -> dict[str, str] | None
- nscraper.basic_html_transform(content: str) -> str
- runtime dependency: niquests==3.18.4
- runtime dependency: justhtml==1.14.0
- development dependency: pytest
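The signature get_scraper(options: ScrapeOptions) -> BaseScraper suggests a factory that picks a scraper class from the options. The sketch below shows one way such a factory could work. It does not import nscraper: every class and function name in it is a hypothetical stand-in, and the engine field on the options object is an assumption modeled on the -e/--engine flag.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class SketchOptions:
    """Hypothetical stand-in for ScrapeOptions."""

    url: str
    headers: dict[str, str] = field(default_factory=dict)
    engine: str = "http"  # assumed field, mirroring the -e/--engine flag


class SketchBaseScraper(ABC):
    """Hypothetical stand-in for BaseScraper."""

    def __init__(self, options: SketchOptions) -> None:
        self.options = options

    @abstractmethod
    def scrape(self) -> str:
        """Return the fetched content as text."""


class SketchHttpScraper(SketchBaseScraper):
    def scrape(self) -> str:
        # A real engine would issue an HTTP request here.
        return f"http:{self.options.url}"


class SketchBrowserScraper(SketchBaseScraper):
    def scrape(self) -> str:
        # A real engine would drive a browser here.
        return f"seleniumbase:{self.options.url}"


# Map engine names to scraper classes.
ENGINES = {"http": SketchHttpScraper, "seleniumbase": SketchBrowserScraper}


def get_scraper_sketch(options: SketchOptions) -> SketchBaseScraper:
    """Return the scraper registered for options.engine."""
    try:
        return ENGINES[options.engine](options)
    except KeyError:
        raise ValueError(f"unknown engine: {options.engine!r}") from None
```

Callers depend only on the base class, so adding an engine means adding one entry to the mapping.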
Module Flags
- -u/--url required
- -H/--headers required, or default
- -e/--engine with http or seleniumbase
- -p/--proxy
- --timeout default 3
- -o/--output
- -c/--cookies-file optional JSON file
- -t/--transform default raw
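For reference, the flags above could be declared with argparse roughly as follows. This is a sketch, not the package's actual parser: build_parser is a hypothetical name, and the default engine (http) and the numeric type of --timeout are assumptions that the flag list does not state.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="nscraper")
    parser.add_argument("-u", "--url", required=True)
    # Either the literal word "default" or a JSON object string.
    parser.add_argument("-H", "--headers", required=True)
    # Assumption: http is the engine used when -e is omitted.
    parser.add_argument(
        "-e", "--engine", choices=["http", "seleniumbase"], default="http"
    )
    parser.add_argument("-p", "--proxy")
    # Assumption: the timeout is numeric; the README only states the default 3.
    parser.add_argument("--timeout", type=float, default=3)
    parser.add_argument("-o", "--output")
    parser.add_argument("-c", "--cookies-file")
    parser.add_argument("-t", "--transform", default="raw")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["-u", "https://example.com", "-H", "default"]
    )
    print(args)
```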
Behavior:
- invalid or malformed URLs raise InvalidUrlError
- missing or malformed headers raise InvalidHeadersError
- missing or malformed cookie files raise InvalidCookiesError
- use -H default to apply the built-in Accept and User-Agent header dict
- use -c only when you want to send cookies; omit it to keep current behavior
- output files are always overwritten
- basic_html removes non-content elements and writes cleaned HTML output
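The URL and cookie-file rules above can be illustrated with a standard-library sketch. It is not the package's implementation: the exception classes are local stand-ins that reuse the documented names, the two functions are hypothetical, and the accepted URL schemes (http and https) and the cookie file layout (a flat JSON object) are assumptions.

```python
import json
from pathlib import Path
from typing import Optional, Union
from urllib.parse import urlparse


class InvalidUrlError(ValueError):
    """Local stand-in for the package's exception of the same name."""


class InvalidCookiesError(ValueError):
    """Local stand-in for the package's exception of the same name."""


def validate_url_sketch(url: str) -> str:
    """Return url unchanged if it looks like an absolute http(s) URL."""
    try:
        parsed = urlparse(url)
    except ValueError as exc:
        raise InvalidUrlError(f"invalid URL: {url!r}") from exc
    # Assumption: only http and https are accepted.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise InvalidUrlError(f"invalid URL: {url!r}")
    return url


def load_cookies_sketch(
    path: Union[Path, str, None],
) -> Optional[dict[str, str]]:
    """Load cookies from a JSON file, or return None when no path is given."""
    if path is None:
        return None  # no -c flag: nothing to send
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError) as exc:
        # Covers missing or unreadable files and malformed JSON.
        raise InvalidCookiesError(f"cannot load cookies from {path}") from exc
    # Assumption: the file holds one flat JSON object of name/value pairs.
    if not isinstance(data, dict):
        raise InvalidCookiesError("cookies file must contain a JSON object")
    return {str(name): str(value) for name, value in data.items()}
```

Returning None for a missing path matches the rule that omitting -c leaves requests unchanged.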
Default User-Agent:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/146.0.0.0 Safari/537.36
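The -H handling described above (the word default versus a JSON object) might look like the sketch below. parse_headers_sketch and DEFAULT_HEADERS are hypothetical names, the exception class is a local stand-in, and the default Accept value is an assumption, since only the default User-Agent is documented.

```python
import json
from typing import Optional

DEFAULT_HEADERS = {
    # Assumed value: the README does not state the built-in Accept header.
    "Accept": "text/html",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/146.0.0.0 Safari/537.36"
    ),
}


class InvalidHeadersError(ValueError):
    """Local stand-in for the package's exception of the same name."""


def parse_headers_sketch(raw_headers: Optional[str]) -> dict[str, str]:
    """Turn the -H value into a header dict."""
    if not raw_headers:
        raise InvalidHeadersError("headers are required: pass 'default' or JSON")
    if raw_headers == "default":
        # Copy so callers cannot mutate the shared defaults.
        return dict(DEFAULT_HEADERS)
    try:
        parsed = json.loads(raw_headers)
    except json.JSONDecodeError as exc:
        raise InvalidHeadersError("headers must be valid JSON") from exc
    if not isinstance(parsed, dict):
        raise InvalidHeadersError("headers must be a JSON object")
    return {str(name): str(value) for name, value in parsed.items()}
```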
The package is intentionally minimal so you can extend it into a reusable library and publish it to PyPI.
GitHub And PyPI Release Flow
- pull requests to master run tests in GitHub Actions
- published GitHub releases run tests, build sdist and wheel, then publish to PyPI
- the publish workflow is in .github/workflows/release.yml
Before the release workflow can publish, configure Trusted Publishing in PyPI:
- create the project on PyPI if it does not exist yet
- in PyPI, open the project publishing settings
- add a trusted publisher for this GitHub repository
- use the release workflow on the master branch
After that, the normal flow is:
- push code to GitHub
- merge to master
- create a GitHub release for the version tag
- let GitHub Actions test, build, and publish the package