Get title and main body text from an article in a web page
Project description
Bug Patched
Home-page: https://github.com/RobinZhangWhyCoding/htmltext
Author: Robin Zhang
Author-email: whycoding@outlook.com
License: UNKNOWN
Description: HTMLText
=========
HTMLText is a simple tool for extracting the main body text of articles from HTML web pages, such as news stories and blog posts.
Installation:
-------------
pip install htmltext
Usage:
------
from htmltext import HTMLText
title, text = HTMLText(html_data)
Example:
--------
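import requests
from htmltext import HTMLText

# Placeholder URL; replace it with the address of a real article.
url_of_the_article = "https://example.com/article"

# Download the page and pass the raw HTML bytes to HTMLText.
r = requests.get(url_of_the_article)
title, text = HTMLText(r.content)
print(title)
print(text)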
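To make the idea of "title plus main body text" concrete, here is a minimal standard-library sketch of a naive extractor. It is not HTMLText's algorithm, which this page does not describe; the names `NaiveArticleParser`, `naive_extract`, and the `min_chars` threshold are hypothetical and exist only for illustration. It assumes the article text lives in `<p>` elements and treats short paragraphs as navigation or boilerplate.

```python
from html.parser import HTMLParser


class NaiveArticleParser(HTMLParser):
    """Collects the <title> text and the text of each <p> element."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title_parts = []
        self.paragraphs = []
        self._in_title = False
        self._in_p = False
        self._skip_depth = 0
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "p":
            # A new <p> implicitly closes any paragraph still open.
            self._flush()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False
        elif tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore script and style content
        if self._in_title:
            self.title_parts.append(data)
        elif self._in_p:
            self._current.append(data)

    def _flush(self):
        # Normalize whitespace and store the paragraph if it is non-empty.
        text = " ".join("".join(self._current).split())
        if text:
            self.paragraphs.append(text)
        self._current = []
        self._in_p = False


def naive_extract(html_data, min_chars=40):
    """Return (title, text); min_chars is an assumed cutoff for boilerplate."""
    if isinstance(html_data, bytes):
        # Assumption: the page is UTF-8; undecodable bytes are replaced.
        html_data = html_data.decode("utf-8", errors="replace")
    parser = NaiveArticleParser()
    parser.feed(html_data)
    parser.close()
    title = " ".join("".join(parser.title_parts).split())
    body = [p for p in parser.paragraphs if len(p) >= min_chars]
    return title, "\n".join(body)
```

A heuristic this simple misses articles that are not marked up with `<p>` tags and keeps long boilerplate paragraphs, which is the kind of work a dedicated tool is meant to handle.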
Platform: UNKNOWN
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
Download files
Source Distribution
htmltext-0.0.7.tar.gz (2.5 kB)
Built Distribution
htmltext-0.0.7-py3-none-any.whl (4.0 kB)
File details
Details for the file htmltext-0.0.7.tar.gz.
File metadata
- Download URL: htmltext-0.0.7.tar.gz
- Upload date:
- Size: 2.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.19.1 setuptools/40.4.3 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.7.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | d42a210fd7da844275a1a24ea96d992a32c385ed5d8a62daf6345c967277e742
MD5 | 352fa5debfcbaab48bbce5e7450f75a8
BLAKE2b-256 | fe4673c52d6d609da65d7dcd07f2418a4712dd6f1969902b88242ec261217b9a
File details
Details for the file htmltext-0.0.7-py3-none-any.whl.
File metadata
- Download URL: htmltext-0.0.7-py3-none-any.whl
- Upload date:
- Size: 4.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.19.1 setuptools/40.4.3 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.7.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7dabcfc30125bd96eae7e2ef49944340e33f8a27430393fc691097189721d35d
MD5 | 342e78c093e5c590f73665dc6aabc22c
BLAKE2b-256 | d71cf6d5062c3667c8ab398ab8d016424bc2d88fe2a48fee2cd61e52d3bad61b
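The SHA256 digests listed for each file can be compared against a downloaded copy. The following is a minimal sketch using only the standard library `hashlib` module; the helper names `sha256_of_file` and `matches_published_digest` are hypothetical, and the file is assumed to have been downloaded already to a local path.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's digest with a published one, ignoring case and padding."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_digest("htmltext-0.0.7.tar.gz", "d42a210fd7da844275a1a24ea96d992a32c385ed5d8a62daf6345c967277e742")` should return `True` for an unmodified copy of the source distribution saved under that name in the current directory.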