The developer‑friendly web content extractor with CSS selectors.

NitroWebfetch

Extract web content, cleanly.

NitroWebfetch – the developer‑friendly web content extractor with CSS selectors.

This project is in the alpha phase.

Features

  • Extracts content from web pages using CSS selectors
  • Converts HTML to clean Markdown format
  • Fallback selectors for maximum compatibility
  • Command-line interface with options for the CSS selector and the output format
  • Built on Playwright for reliable web scraping
  • Completely free (open source, MIT license)
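The README does not say how NitroWebfetch turns HTML into Markdown. To illustrate the idea only, here is a minimal, hypothetical converter built on Python's standard `html.parser`. It handles just headings, paragraphs, list items, bold, italics, and links. The names `MiniMarkdown` and `html_to_markdown` are invented for this sketch and are not part of the package.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional


class MiniMarkdown(HTMLParser):
    """Hypothetical, minimal HTML-to-Markdown converter (illustration only)."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}

    def __init__(self) -> None:
        super().__init__()
        self.out: List[str] = []
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.out.append("\n\n" + self.HEADINGS[tag])  # heading starts a new block
        elif tag == "p":
            self.out.append("\n\n")  # blank line before each paragraph
        elif tag == "li":
            self.out.append("\n- ")  # one list item per line
        elif tag in ("strong", "b"):
            self.out.append("**")
        elif tag in ("em", "i"):
            self.out.append("*")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in ("strong", "b"):
            self.out.append("**")
        elif tag in ("em", "i"):
            self.out.append("*")
        elif tag == "a":
            self.out.append(f"]({self.href or ''})")
            self.href = None

    def handle_data(self, data):
        # Collapse runs of whitespace, keeping single spaces between words and tags.
        self.out.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html: str) -> str:
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()
    # Trim each line, then squeeze runs of blank lines down to one.
    text = "\n".join(line.strip() for line in "".join(parser.out).splitlines())
    return re.sub(r"\n{3,}", "\n\n", text).strip()
```

A real converter has to cope with far more (tables, code blocks, nested lists, images), which is why a dedicated library is the usual choice in practice.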

Ideas for next steps

  • Add support for multiple output formats (JSON, plain text)
  • Batch processing for multiple URLs
  • Custom user-agent and headers configuration
  • Integration with NitroDigest for web page summarization
  • Support for authentication and cookies
  • Content filtering and cleaning options

Usage

Prerequisites

To run this tool, you need Python installed on your local machine, plus the Playwright Firefox browser, which is installed in the Installation step.

Installation

Install NitroWebfetch via pip:

pip install nitrowebfetch-cli
playwright install firefox

For a development (editable) installation:

cd Projects/Nitrowebfetch
pip install -e .
playwright install firefox
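The `playwright install firefox` step is needed because the Playwright Python package does not ship browser binaries; they are downloaded separately. As a rough illustration of the kind of call a Playwright-based extractor makes, the sketch below opens a page in headless Firefox and returns the inner HTML of the first element matching a CSS selector. It uses Playwright's synchronous API. The function name `fetch_element_html` is hypothetical, and this is not NitroWebfetch's own code.

```python
from typing import Optional


def fetch_element_html(url: str, selector: str = "article") -> Optional[str]:
    """Return the inner HTML of the first element matching `selector`, or None."""
    # Imported inside the function so this module can be loaded without Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # Needs the browser downloaded by `playwright install firefox`.
        browser = p.firefox.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url)
            element = page.query_selector(selector)
            return element.inner_html() if element is not None else None
        finally:
            browser.close()
```

For example, `fetch_element_html("https://example.com/article")` would use `article`, the same default that the README lists for `--selector`.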

Basic Usage

Run NitroWebfetch on a URL. It writes the extracted content to standard output, so redirect it to save the result to a file:

nitrowebfetch <url> > <output_file>

Examples

Extract article content from a webpage and save it to a file:

nitrowebfetch https://example.com/article > article.md

Extract content using a custom CSS selector:

nitrowebfetch https://example.com --selector ".main-content" > content.md

Get HTML output instead of Markdown:

nitrowebfetch https://example.com --format html > content.html
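These examples all redirect standard output to a file. To drive NitroWebfetch from a Python script instead, one option is to call the CLI with `subprocess` and write what it prints yourself. This sketch assumes the `nitrowebfetch` command is on your `PATH` and prints the extracted content to standard output, as the redirection examples suggest. `build_command` and `fetch_to_file` are hypothetical helper names.

```python
import shutil
import subprocess
from typing import List, Optional


def build_command(url: str, selector: Optional[str] = None, fmt: Optional[str] = None) -> List[str]:
    """Assemble the argument list using only the documented options."""
    cmd = ["nitrowebfetch", url]
    if selector is not None:
        cmd += ["--selector", selector]
    if fmt is not None:
        cmd += ["--format", fmt]
    return cmd


def fetch_to_file(url: str, path: str, **options) -> None:
    """Run the CLI and write its stdout to `path`, like `> path` in a shell.

    `options` may contain `selector` and/or `fmt`, passed to build_command.
    """
    if shutil.which("nitrowebfetch") is None:
        raise FileNotFoundError("nitrowebfetch is not on PATH; install it with pip first")
    result = subprocess.run(build_command(url, **options), capture_output=True, text=True, check=True)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(result.stdout)
```

Under these assumptions, `fetch_to_file("https://example.com", "content.html", fmt="html")` is roughly equivalent to the last example above.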

Command-Line Arguments

You can customize the extraction process using command-line arguments:

nitrowebfetch \
    --selector ".article-body" \
    --format md \
    https://example.com

Available arguments:

  • url: URL to fetch content from (required)
  • --selector: CSS selector to use for content extraction (default: article)
  • --format: Output format, either 'md' for Markdown or 'html' for raw HTML (default: md)
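If you want to mirror this interface in your own scripts, the documented arguments map naturally onto Python's `argparse`. This is a hypothetical reconstruction from the list above, not NitroWebfetch's actual source. In particular, restricting `--format` to `md` and `html` with `choices` is an assumption.

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented NitroWebfetch arguments."""
    parser = argparse.ArgumentParser(
        prog="nitrowebfetch",
        description="Extract web content using CSS selectors.",
    )
    parser.add_argument("url", help="URL to fetch content from")
    parser.add_argument(
        "--selector",
        default="article",
        help="CSS selector to use for content extraction (default: article)",
    )
    parser.add_argument(
        "--format",
        choices=["md", "html"],  # assumed: only the two documented formats are accepted
        default="md",
        help="output format: 'md' for Markdown or 'html' for raw HTML (default: md)",
    )
    return parser
```

With this parser, `make_parser().parse_args(["--selector", ".article-body", "--format", "md", "https://example.com"])` accepts the same arguments as the command-line example above, with options before or after the URL.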

Fallback Selectors

If the primary selector doesn't match any elements, NitroWebfetch automatically tries these alternatives:

  • article
  • main
  • .article
  • .content
  • #content
  • .post
  • .entry-content

Contributing

Do you want to contribute to this tool? Check the Contributing page:

Getting started

Report an issue

Found an issue? You can easily report it here:

https://github.com/Frodigo/garage/issues/new

License

This project is licensed under the MIT License - see the LICENSE file for details.

Download files

Source Distribution

nitrowebfetch_cli-0.1.0.tar.gz (3.7 kB)

Built Distribution

nitrowebfetch_cli-0.1.0-py3-none-any.whl (4.1 kB)

File details

Details for the file nitrowebfetch_cli-0.1.0.tar.gz.

File metadata

  • Download URL: nitrowebfetch_cli-0.1.0.tar.gz
  • Size: 3.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for nitrowebfetch_cli-0.1.0.tar.gz
  • SHA256: 5248c0cb19a36c24272cbb81e573df63c615971a4f70c9ed747f7c51acfa3764
  • MD5: 454b90bc7cba0555aa37e0ef944ebd23
  • BLAKE2b-256: 03f5f9a7ef65b86fd3945363f00c68b77e7d95d33b09673a9147043d617a36b9
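To check that a downloaded archive matches the published SHA256 digest, you can hash it locally. Below is a minimal sketch using Python's `hashlib`, assuming the source distribution has been saved under its original file name. pip can also enforce digests itself when you use `--require-hashes` together with `--hash` entries in a requirements file.

```python
import hashlib
from pathlib import Path

# Published SHA256 digest of nitrowebfetch_cli-0.1.0.tar.gz, copied from the list above.
EXPECTED_SHA256 = "5248c0cb19a36c24272cbb81e573df63c615971a4f70c9ed747f7c51acfa3764"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True only if the file's SHA256 equals the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For example, `verify(Path("nitrowebfetch_cli-0.1.0.tar.gz"))` returns True only if the local file is byte-for-byte identical to the published source distribution.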

Provenance

The following attestation bundles were made for nitrowebfetch_cli-0.1.0.tar.gz:

Publisher: publish_package.yml on Frodigo/garage

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file nitrowebfetch_cli-0.1.0-py3-none-any.whl.

File hashes

Hashes for nitrowebfetch_cli-0.1.0-py3-none-any.whl
  • SHA256: 1bdcc1f4cafa3f9f89527367f870053555e02411da8830a067ad62d1c5cf1794
  • MD5: 36426abc2e0a04505d0e3884e8097fcf
  • BLAKE2b-256: 19fa5007a90186892f190b9e83ffa35e6d983e31a47d8311571687cb1098f615

Provenance

The following attestation bundles were made for nitrowebfetch_cli-0.1.0-py3-none-any.whl:

Publisher: publish_package.yml on Frodigo/garage

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
