A tool to fetch the main content of a webpage and convert it to Markdown or plain text.

Project description

webclipper

webclipper is a simple Python tool to fetch the main content of a webpage and convert it into clean, readable Markdown or plain text. It removes clutter like ads, headers, and navigation bars, letting you focus on the article's text.

It can be used as a command-line application for quick conversions in your terminal or as a library in your own Python projects.

Features

  • Content Extraction: Uses readability to identify and extract the primary article or content from a URL.
  • Dual Output: Converts cleaned HTML to either Markdown or plain text.
  • Flexible Usage: Works as both a standalone command-line tool and an importable Python library.
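
To make the "cleaned HTML to plain text" step concrete, here is a minimal sketch using only the standard library. It is not webclipper's implementation (which relies on readability for extraction); the `TextExtractor` class, the `html_to_text` function, and the `SKIPPED_TAGS` and `BLOCK_TAGS` sets are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser

# Tags treated as clutter in this sketch (an assumption, not webclipper's rules).
SKIPPED_TAGS = {"script", "style", "nav", "header", "footer", "aside"}
# Tags that start or end a block, so a line break is inserted around them.
BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring anything nested inside SKIPPED_TAGS."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS and self._skip_depth == 0:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in BLOCK_TAGS and self._skip_depth == 0:
            self._chunks.append("\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Strip each line and drop the empty ones left by block boundaries.
        lines = (line.strip() for line in "".join(self._chunks).splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html):
    """Return the visible text of an HTML string, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = "<nav>Menu</nav><h1>Title</h1><p>Hello <b>world</b>.</p>"
print(html_to_text(sample))
```

A real extractor also has to decide which part of the page is the article, which is the job readability does in webclipper; this sketch only drops a fixed set of tags.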

Installation

To install webclipper, you can clone the repository and install it using pip.


# Clone the repository (if you haven't already)
git clone https://github.com/your-username/webclipper.git
cd webclipper

# Install the package in editable mode
# (Your changes to the source code will be reflected immediately)
pip install -e .

This will install the package and its dependencies, and also make the webclipper command available in your terminal.

How to Use

As a Command-Line App

Once installed, you can use the webclipper command directly from your terminal. The output is sent to standard output, so you can easily redirect it to a file.

Basic Usage (get plain text):


webclipper "https://en.wikipedia.org/wiki/Python_(programming_language)"

Get Markdown Output:

Use the -m or --markdown flag.


webclipper "https://www.some-article-url.com" --markdown

Include the Source URL:

Use the -i or --include-url flag to append the source URL at the end of the output.


webclipper "https://www.some-article-url.com" -m -i
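
For readers curious how flags like these are typically wired up, the following is a sketch of an argparse parser that accepts the same options. The flag names come from this README; the `build_parser` function, the help strings, and the mapping to an `output_format` value are assumptions, and the actual webclipper source may be organized differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented webclipper flags (hypothetical)."""
    parser = argparse.ArgumentParser(
        prog="webclipper",
        description="Fetch the main content of a webpage as Markdown or plain text.",
    )
    parser.add_argument("url", help="URL of the page to clip")
    parser.add_argument(
        "-m", "--markdown", action="store_true",
        help="output Markdown instead of plain text",
    )
    parser.add_argument(
        "-i", "--include-url", action="store_true",
        help="append the source URL at the end of the output",
    )
    return parser


# Parse the same arguments as the command shown above.
args = build_parser().parse_args(["https://www.some-article-url.com", "-m", "-i"])
# Plain text is the default; --markdown switches the format.
output_format = "markdown" if args.markdown else "text"
print(output_format, args.include_url)
```

Because both options are `store_true` flags, omitting them yields plain text without the source URL, which matches the basic usage described earlier.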

Redirect to a File:

You can save the output using standard shell redirection.


webclipper "https://www.some-article-url.com" > my_article.txt

As a Library

You can also import webclipper into your own Python scripts to integrate its functionality. The get_url_content function is all you need.

from webclipper import get_url_content

# The URL of the article you want to clip
article_url = "https://en.wikipedia.org/wiki/Web_scraping"

try:
    # Get the content as Markdown
    markdown_content = get_url_content(article_url, output_format='markdown')
    print("--- MARKDOWN ---")
    print(markdown_content)

    # Get the content as plain text
    text_content = get_url_content(article_url, output_format='text')
    print("\n--- PLAIN TEXT ---")
    print(text_content)

except Exception as e:
    print(f"An error occurred: {e}")
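
If you want the library equivalent of redirecting CLI output to a file, a small wrapper like the one below may help. It is a sketch, not part of webclipper: `clip_to_file` and `fake_fetch` are hypothetical names, the `Source:` footer format is an assumption (the README only says the URL is appended at the end), and the fetcher is passed in as a parameter so the example runs without network access.

```python
from pathlib import Path


def clip_to_file(url, path, fetch, output_format="text", include_url=False):
    """Fetch `url` with `fetch` and write the result to `path`.

    `fetch` is expected to be callable like
    get_url_content(url, output_format=...), as shown in the example above.
    """
    content = fetch(url, output_format=output_format)
    if include_url:
        # Footer format is an assumption made for this sketch.
        content = f"{content}\n\nSource: {url}\n"
    Path(path).write_text(content, encoding="utf-8")
    return content


def fake_fetch(url, output_format="text"):
    """Stand-in for get_url_content so the sketch runs offline."""
    return f"[{output_format}] content of {url}"


clip_to_file(
    "https://en.wikipedia.org/wiki/Web_scraping",
    "my_article.md",
    fake_fetch,
    output_format="markdown",
    include_url=True,
)
```

In real use you would pass `get_url_content` in place of `fake_fetch`, and keep the try/except from the example above around the call.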

License

This project is licensed under the MIT License. See the LICENSE file for details.


Download files

Source Distribution

webclipper-0.1.2.tar.gz (4.0 kB)

Uploaded Source

Built Distribution

webclipper-0.1.2-py3-none-any.whl (4.4 kB)

Uploaded Python 3

File details

Details for the file webclipper-0.1.2.tar.gz.

File metadata

  • Download URL: webclipper-0.1.2.tar.gz
  • Upload date:
  • Size: 4.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for webclipper-0.1.2.tar.gz
Algorithm Hash digest
SHA256 641af02a8595e9d7c1392b059e23958e1eb38d55365ede9dba78dc1fc518f776
MD5 d9ae68f702b5b9c68164d5d7e55df430
BLAKE2b-256 5e000aeb8a428aab2b0df3b6278d3ae71c725029f84b19e8d91cafcd935d9ad0

File details

Details for the file webclipper-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: webclipper-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 4.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for webclipper-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 785c6b5b814eff3dd0fca7aa6ef1eef163ec75e6cdd6c78ce8730ae92293bf22
MD5 0a8f1e5f7a72b7f0ae8812db24226f70
BLAKE2b-256 05fd7ffce109047a3bf2da9083b0b5247c3d9c833d0c54e9fcac759201ab4a4f
