NitroWebfetch
Extract web content, cleanly.
NitroWebfetch – the developer‑friendly web content extractor with CSS selectors.
This project is in the alpha phase.
Features
- Extracts content from web pages using CSS selectors
- Converts HTML to clean Markdown format
- Fallback selectors for maximum compatibility
- Command-line interface with various options
- Built on Playwright for reliable web scraping
- Completely free (open source, MIT license)
Ideas for next steps
- Add support for multiple output formats (JSON, plain text)
- Batch processing for multiple URLs
- Custom user-agent and headers configuration
- Integration with NitroDigest for web page summarization
- Support for authentication and cookies
- Content filtering and cleaning options
Usage
Prerequisites
To run this tool, you need Python 3 installed on your local machine.
Installation
Install NitroWebfetch via pip:
pip install nitrowebfetch-cli
playwright install firefox
For development installation:
cd Projects/Nitrowebfetch
pip install -e .
playwright install firefox
Basic Usage
Run NitroWebfetch to extract content from web pages:
nitrowebfetch <url> > <output_file>
Examples
Extract article content from a webpage and save it to a file:
nitrowebfetch https://example.com/article > article.md
Extract content using a custom CSS selector:
nitrowebfetch https://example.com --selector ".main-content" > content.md
Get HTML output instead of Markdown:
nitrowebfetch https://example.com --format html > content.html
Command Line Arguments
You can customize the extraction process using command line arguments:
nitrowebfetch \
--selector ".article-body" \
--format md \
https://example.com
Available arguments:
- url: URL to fetch content from (required)
- --selector: CSS selector to use for content extraction (default: article)
- --format: Format of output content, 'md' for Markdown or 'html' for raw HTML (default: md)
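To make the argument semantics concrete, here is a minimal sketch of a parser that mirrors the arguments and defaults documented above. It assumes an argparse-style interface; `build_parser` is a hypothetical helper written for illustration and is not taken from the NitroWebfetch source.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented arguments (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="nitrowebfetch")
    # Required positional argument.
    parser.add_argument("url", help="URL to fetch content from")
    # Documented default: article
    parser.add_argument(
        "--selector",
        default="article",
        help="CSS selector to use for content extraction",
    )
    # Documented values: md (default) or html
    parser.add_argument(
        "--format",
        choices=["md", "html"],
        default="md",
        help="Output format: 'md' for Markdown or 'html' for raw HTML",
    )
    return parser


# Same arguments as the command shown above.
args = build_parser().parse_args(
    ["--selector", ".article-body", "--format", "md", "https://example.com"]
)
print(args.url, args.selector, args.format)
```

Restricting `--format` with `choices` means a value other than `md` or `html` is rejected at parse time; whether the real CLI behaves this way is an assumption.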
Fallback Selectors
If the primary selector doesn't match any elements, NitroWebfetch automatically tries these alternatives:
- article
- main
- .article
- .content
- #content
- .post
- .entry-content
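The fallback behaviour described above can be sketched as a simple "first selector that matches wins" loop. This is an illustration under assumptions, not the project's actual code: `first_match` and the `query` callable are hypothetical names, and the selector order is taken from the list above. With Playwright, `query` could wrap something like `page.query_selector`, which returns `None` when nothing matches.

```python
from typing import Callable, Optional, Sequence, Tuple

# Fallback order as listed above.
FALLBACK_SELECTORS = [
    "article",
    "main",
    ".article",
    ".content",
    "#content",
    ".post",
    ".entry-content",
]


def first_match(
    query: Callable[[str], Optional[str]],
    primary: str,
    fallbacks: Sequence[str] = FALLBACK_SELECTORS,
) -> Optional[Tuple[str, str]]:
    """Return (selector, content) for the first selector that matches.

    `query` is a hypothetical callable that returns the matched content,
    or None when the selector matches nothing.
    """
    # dict.fromkeys keeps order and avoids trying the same selector twice.
    for selector in dict.fromkeys([primary, *fallbacks]):
        content = query(selector)
        if content is not None:
            return selector, content
    return None


# Stand-in for a real page: only "#content" matches anything.
fake_page = {"#content": "<p>Hello</p>"}
result = first_match(fake_page.get, ".main-content")
print(result)
```

Here the primary selector `.main-content` finds nothing, so the loop walks the fallback list until `#content` matches.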
Contributing
Do you want to contribute to this tool? Check the Contributing page.
Report an issue
Found an issue? You can easily report it here:
https://github.com/Frodigo/garage/issues/new
License
This project is licensed under the MIT License - see the LICENSE file for details.
File details
Details for the file nitrowebfetch_cli-0.1.0.tar.gz.
File metadata
- Download URL: nitrowebfetch_cli-0.1.0.tar.gz
- Upload date:
- Size: 3.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5248c0cb19a36c24272cbb81e573df63c615971a4f70c9ed747f7c51acfa3764 |
| MD5 | 454b90bc7cba0555aa37e0ef944ebd23 |
| BLAKE2b-256 | 03f5f9a7ef65b86fd3945363f00c68b77e7d95d33b09673a9147043d617a36b9 |
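A published digest is only useful if it is compared against the file you actually downloaded. The following sketch shows one way to check the SHA256 value listed above using the standard library. The helper names are hypothetical, and the file path is whatever location you saved the source distribution to.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for nitrowebfetch_cli-0.1.0.tar.gz.
EXPECTED_SHA256 = "5248c0cb19a36c24272cbb81e573df63c615971a4f70c9ed747f7c51acfa3764"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True when the file's SHA256 equals the expected digest."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_published_digest(Path("nitrowebfetch_cli-0.1.0.tar.gz"))` should return `True` for an unmodified copy of the source distribution saved in the current directory.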
Provenance
The following attestation bundles were made for nitrowebfetch_cli-0.1.0.tar.gz:
Publisher: publish_package.yml on Frodigo/garage
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: nitrowebfetch_cli-0.1.0.tar.gz
- Subject digest: 5248c0cb19a36c24272cbb81e573df63c615971a4f70c9ed747f7c51acfa3764
- Sigstore transparency entry: 517424288
- Sigstore integration time:
- Permalink: Frodigo/garage@f305434954426745edf627dc8e334f23c3ad0e49
- Branch / Tag: refs/heads/main
- Owner: https://github.com/Frodigo
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish_package.yml@f305434954426745edf627dc8e334f23c3ad0e49
- Trigger Event: workflow_dispatch
File details
Details for the file nitrowebfetch_cli-0.1.0-py3-none-any.whl.
File metadata
- Download URL: nitrowebfetch_cli-0.1.0-py3-none-any.whl
- Upload date:
- Size: 4.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1bdcc1f4cafa3f9f89527367f870053555e02411da8830a067ad62d1c5cf1794 |
| MD5 | 36426abc2e0a04505d0e3884e8097fcf |
| BLAKE2b-256 | 19fa5007a90186892f190b9e83ffa35e6d983e31a47d8311571687cb1098f615 |
Provenance
The following attestation bundles were made for nitrowebfetch_cli-0.1.0-py3-none-any.whl:
Publisher: publish_package.yml on Frodigo/garage
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: nitrowebfetch_cli-0.1.0-py3-none-any.whl
- Subject digest: 1bdcc1f4cafa3f9f89527367f870053555e02411da8830a067ad62d1c5cf1794
- Sigstore transparency entry: 517424299
- Sigstore integration time:
- Permalink: Frodigo/garage@f305434954426745edf627dc8e334f23c3ad0e49
- Branch / Tag: refs/heads/main
- Owner: https://github.com/Frodigo
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish_package.yml@f305434954426745edf627dc8e334f23c3ad0e49
- Trigger Event: workflow_dispatch