A package for pulling news articles directly from a website given the URL

Project description

Website Reader Library

The Website Reader Library is a Python library that reads the HTML content of a website and extracts the core text content (usually the article body on news pages). It uses the third-party libraries requests and BeautifulSoup together with Python's built-in re module.

Installation

You can install the Website Reader Library using pip:

pip install artipull

Usage

from ArtiPull import read_website

# Provide the URL of the website you want to read
url = "https://example.com"
text = read_website(url)

# Extracted text content from the inner-most HTML tags
print(text)

Functionality

The read_website(url) function takes a URL as input and returns the text content of the inner-most HTML tags on the page. It uses the requests library to make a GET request to the URL, the BeautifulSoup library to parse the returned HTML, and regular expressions (re) to extract the text from the inner-most tags.
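
The idea of collecting text from inner-most tags can be illustrated with a minimal sketch. This is not the library's actual implementation: to stay dependency-free it swaps in the standard library's html.parser and urllib.request for BeautifulSoup and requests, and the names InnermostTextParser, extract_innermost_text, and read_website_sketch are hypothetical. It assumes that "inner-most" means an element containing no child elements.

```python
import re
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Elements that never have a closing tag and never contain text.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
# Elements whose content is code, not readable text.
SKIP_TAGS = {"script", "style"}


class InnermostTextParser(HTMLParser):
    """Hypothetical helper: collects text of elements with no child elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []   # open elements as [tag, has_child_element, text_parts]
        self.blocks = []  # text of each inner-most element, in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.stack:
            self.stack[-1][1] = True  # the parent now has a child element
        self.stack.append([tag, False, []])

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> carry no text; ignore them.
        pass

    def handle_data(self, data):
        if self.stack:
            self.stack[-1][2].append(data)

    def handle_endtag(self, tag):
        # Ignore stray closing tags that match nothing currently open.
        if not any(entry[0] == tag for entry in self.stack):
            return
        while self.stack:
            name, has_child, parts = self.stack.pop()
            if not has_child and name not in SKIP_TAGS:
                # Collapse runs of whitespace with a regular expression.
                text = re.sub(r"\s+", " ", "".join(parts)).strip()
                if text:
                    self.blocks.append(text)
            if name == tag:
                break


def extract_innermost_text(html):
    """Return the text of all inner-most elements, one per line."""
    parser = InnermostTextParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.blocks)


def read_website_sketch(url, timeout=10):
    """Fetch a page with a GET request and extract its inner-most text."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_innermost_text(html)
```

Under this definition, text that sits directly next to child tags is dropped: in `<p>Second <b>bold</b> tail</p>`, only "bold" is collected. The real library may handle such cases differently.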

License

This library is released under the MIT License. See LICENSE for more information.

Contributing

If you find any issues or have suggestions for improvements, please feel free to contribute to this project by opening an issue or submitting a pull request. Contributions are welcome!

Authors

This library is developed and maintained by Nick Kraftor.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

artipull-0.0.1.tar.gz (2.7 kB view details)

Uploaded Source

Built Distribution

artipull-0.0.1-py3-none-any.whl (2.9 kB view details)

Uploaded Python 3

File details

Details for the file artipull-0.0.1.tar.gz.

File metadata

  • Download URL: artipull-0.0.1.tar.gz
  • Upload date:
  • Size: 2.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.11

File hashes

Hashes for artipull-0.0.1.tar.gz

Algorithm    Hash digest
SHA256       42d8bd018d3ac9abf2cad18ef45938214780d670419f14fb675c555c2c0a7da4
MD5          a3261f12192b53c496c05810aaf70bed
BLAKE2b-256  9cca2e073929cde6b1d0f10e8872587aaff2370d2511e23dbb8cb8af2ecdc6ac

File details

Details for the file artipull-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: artipull-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 2.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.11

File hashes

Hashes for artipull-0.0.1-py3-none-any.whl

Algorithm    Hash digest
SHA256       32aa6c5b700deec80467a6451f04b8dcd2e75e991cf7635947bb8ef51f3f004c
MD5          58e0d055e4a722b7ea95230de122df71
BLAKE2b-256  9c3e5787309cbb85866918f57ce1cfa15574523a8223a003534653d8e2939f97
