A package for pulling news articles directly from a website given the URL
Website Reader Library
The Website Reader Library is a Python library that allows you to read the HTML content of a website and extract the core text contents (usually the article for news pages). It uses the popular Python libraries requests, re, and BeautifulSoup.
Installation
You can install the Website Reader Library using pip:
pip install artipull
Usage
from ArtiPull import read_website
# Provide the URL of the website you want to read
url = "https://example.com"
text = read_website(url)
# Extracted text content from the inner-most HTML tags
print(text)
Functionality
The read_website(url) function takes a URL as input and returns the text found in the innermost HTML tags of the page. It uses the requests library to make a GET request to the URL, BeautifulSoup to parse the HTML content, and regular expressions (re) to extract the text from those innermost tags.
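The pipeline described above can be sketched roughly as follows. This is not the library's actual source, only an illustration of the fetch, parse, and extract steps under stated assumptions: the names `extract_innermost_text` and `read_website_sketch`, the `timeout` parameter, and the choice to drop `script`/`style`/`noscript` tags are all hypothetical, and "innermost" is taken to mean a tag with no child tags.

```python
import re

import requests
from bs4 import BeautifulSoup


def extract_innermost_text(html: str) -> str:
    """Collect text from tags that contain no child tags (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")

    # Assumption: script/style/noscript text is never article content, so drop it.
    for tag in soup(["script", "style", "noscript"]):
        tag.decompose()

    pieces = []
    for tag in soup.find_all(True):
        # An innermost tag has no descendant tags, only text nodes.
        if tag.find(True) is None:
            # Collapse runs of whitespace with a regular expression.
            text = re.sub(r"\s+", " ", tag.get_text()).strip()
            if text:
                pieces.append(text)
    return "\n".join(pieces)


def read_website_sketch(url: str, timeout: float = 10.0) -> str:
    """Fetch a page and return its innermost-tag text (illustrative only)."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return extract_innermost_text(response.text)
```

One consequence of this reading of "innermost": text that sits next to a child tag is skipped. For `<p>Second <b>bold</b> text</p>`, the sketch keeps only `bold`, because the `<p>` element itself contains a child tag.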
License
This library is released under the MIT License. See LICENSE for more information.
Contributing
If you find any issues or have suggestions for improvements, please feel free to contribute to this project by opening an issue or submitting a pull request. Contributions are welcome!
Authors
This library is developed and maintained by Nick Kraftor.
File details
Details for the file artipull-0.0.1.tar.gz.
File metadata
- Download URL: artipull-0.0.1.tar.gz
- Upload date:
- Size: 2.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | 42d8bd018d3ac9abf2cad18ef45938214780d670419f14fb675c555c2c0a7da4
MD5 | a3261f12192b53c496c05810aaf70bed
BLAKE2b-256 | 9cca2e073929cde6b1d0f10e8872587aaff2370d2511e23dbb8cb8af2ecdc6ac
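If you download a file manually, you may want to compare it against the published SHA256 digest before installing. A minimal sketch using only the standard library is shown here; the helper names `sha256_of_file` and `matches_published_digest` are hypothetical and not part of this package.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published value, ignoring case and padding."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_digest("artipull-0.0.1.tar.gz", "<SHA256 value from the table>")` should return `True` for an intact download, assuming the file sits in the current directory.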
File details
Details for the file artipull-0.0.1-py3-none-any.whl.
File metadata
- Download URL: artipull-0.0.1-py3-none-any.whl
- Upload date:
- Size: 2.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.11
File hashes
Algorithm | Hash digest
---|---
SHA256 | 32aa6c5b700deec80467a6451f04b8dcd2e75e991cf7635947bb8ef51f3f004c
MD5 | 58e0d055e4a722b7ea95230de122df71
BLAKE2b-256 | 9c3e5787309cbb85866918f57ce1cfa15574523a8223a003534653d8e2939f97