
Get title and main body text from an article in a web page

Project description

Bug Patched
Home-page: https://github.com/RobinZhangWhyCoding/htmltext
Author: Robin Zhang
Author-email: whycoding@outlook.com
License: UNKNOWN
Description: HTMLText
=========
HTMLText is a simple tool that extracts the title and main body text of articles in HTML web pages, such as news stories, blog posts, etc.
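The description does not say how HTMLText decides which part of a page is the article. As a rough illustration of the general idea only, the sketch below uses Python's standard-library `html.parser` to read the `<title>`, ignore text inside tags that rarely hold article prose, split the remaining text into blocks at block-level tags, and keep blocks with at least `min_words` words. This block-length heuristic, the tag lists, and the names `extract_main_text` and `min_words` are assumptions made for illustration; they are not htmltext's implementation or API.

```python
from html.parser import HTMLParser

# Assumption: text inside these tags is almost never article prose.
SKIP_TAGS = {"script", "style", "noscript", "nav", "header", "footer", "aside"}
# Assumption: these tags start or end a separate block of text.
BLOCK_TAGS = {"p", "div", "article", "section", "li", "h1", "h2", "h3", "h4", "td", "br"}


class _BlockCollector(HTMLParser):
    """Collects the <title> text and the visible text, split into blocks."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title = ""
        self.blocks = []      # text blocks in document order
        self._buf = []        # text fragments of the block being built
        self._skip_depth = 0  # > 0 while inside a SKIP_TAGS element
        self._in_title = False

    def flush(self):
        # Collapse whitespace and close the current block, if it has text.
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "title":
            self._in_title = False
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._skip_depth == 0:
            self._buf.append(data)


def extract_main_text(markup, min_words=8):
    """Return (title, text) using a simple block-length heuristic.

    Hypothetical helper, not htmltext's API. Bytes are decoded as UTF-8,
    which is an assumption: real pages may declare another charset.
    """
    if isinstance(markup, bytes):
        markup = markup.decode("utf-8", errors="replace")
    parser = _BlockCollector()
    parser.feed(markup)
    parser.close()
    parser.flush()  # keep any trailing text after the last block tag
    # Keep only blocks long enough to look like prose.
    body = [block for block in parser.blocks if len(block.split()) >= min_words]
    return parser.title.strip(), "\n".join(body)
```

A fixed word threshold is crude: it drops genuinely short paragraphs and can keep long sidebars, which is why practical extractors usually also weigh signals such as link density or position in the page.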

Installation:
-------------
pip install htmltext

Usage:
------
from htmltext import HTMLText

title, text = HTMLText(html_data)

Example:
--------
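import requests
from htmltext import HTMLText

url_of_the_article = "https://example.com/some-article"  # placeholder: replace with a real article URL
r = requests.get(url_of_the_article)
r.raise_for_status()  # stop on HTTP errors instead of parsing an error page
title, text = HTMLText(r.content)  # r.content is the raw bytes of the page
print(title)
print(text)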



Platform: UNKNOWN
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
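The `Key: value` fields above are the package's core metadata, which uses email-header syntax, so Python's standard `email.parser.HeaderParser` can read them. A minimal sketch, where `parse_core_metadata` and the keys of the returned dict are hypothetical names chosen for illustration:

```python
from email.parser import HeaderParser


def parse_core_metadata(pkg_info):
    """Parse 'Key: value' core-metadata headers (hypothetical helper)."""
    msg = HeaderParser().parsestr(pkg_info)
    # A field such as Classifier may repeat; get_all returns every value.
    classifiers = msg.get_all("Classifier") or []
    return {
        "home_page": msg.get("Home-page"),
        "license": msg.get("License"),
        "classifiers": classifiers,
        # License names declared through trove classifiers, e.g. "MIT License".
        "license_classifiers": [
            c.split(" :: ")[-1] for c in classifiers if c.startswith("License ::")
        ],
        "content_type": msg.get("Description-Content-Type"),
    }
```

Applied to the fields above, this would report `UNKNOWN` for `License` but `MIT License` from the classifiers, a mismatch worth noticing when checking licensing.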

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

htmltext-0.0.7.tar.gz (2.5 kB)

Uploaded Source

Built Distribution

htmltext-0.0.7-py3-none-any.whl (4.0 kB)

Uploaded Python 3
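The wheel's filename encodes its compatibility tags. Wheel filenames follow the pattern `{distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl`, so `py3-none-any` means any Python 3 interpreter, no ABI dependency, and any platform, i.e. a pure-Python wheel. The sketch below splits such a name; `parse_wheel_filename` and `WheelName` are illustrative names here, and for real tooling the `packaging` library's `packaging.utils.parse_wheel_filename` is the maintained option.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename):
    """Split a wheel filename into its fields (illustrative, minimal checks)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    # Wheel filenames replace '-' inside names with '_', so '-' only separates fields.
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py_tag, abi_tag, plat_tag = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py_tag, abi_tag, plat_tag = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelName(dist, version, build, py_tag, abi_tag, plat_tag)
```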

File details

Details for the file htmltext-0.0.7.tar.gz.

File metadata

  • Download URL: htmltext-0.0.7.tar.gz
  • Upload date:
  • Size: 2.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.19.1 setuptools/40.4.3 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.7.0

File hashes

Hashes for htmltext-0.0.7.tar.gz
Algorithm    Hash digest
SHA256       d42a210fd7da844275a1a24ea96d992a32c385ed5d8a62daf6345c967277e742
MD5          352fa5debfcbaab48bbce5e7450f75a8
BLAKE2b-256  fe4673c52d6d609da65d7dcd07f2418a4712dd6f1969902b88242ec261217b9a

See more details on using hashes here.

File details

Details for the file htmltext-0.0.7-py3-none-any.whl.

File metadata

  • Download URL: htmltext-0.0.7-py3-none-any.whl
  • Upload date:
  • Size: 4.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.19.1 setuptools/40.4.3 requests-toolbelt/0.8.0 tqdm/4.26.0 CPython/3.7.0

File hashes

Hashes for htmltext-0.0.7-py3-none-any.whl
Algorithm    Hash digest
SHA256       7dabcfc30125bd96eae7e2ef49944340e33f8a27430393fc691097189721d35d
MD5          342e78c093e5c590f73665dc6aabc22c
BLAKE2b-256  d71cf6d5062c3667c8ab398ab8d016424bc2d88fe2a48fee2cd61e52d3bad61b
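To check a downloaded file against the digests listed on this page, compute the same hashes locally and compare them. This sketch uses only the standard `hashlib` module; the `EXPECTED` table copies the digests from this page, while `file_digest` and `verify` are hypothetical helper names. BLAKE2b-256 is computed as `hashlib.blake2b` with a 32-byte digest.

```python
import hashlib

# Digests copied from the "File hashes" tables on this page.
EXPECTED = {
    "htmltext-0.0.7.tar.gz": {
        "sha256": "d42a210fd7da844275a1a24ea96d992a32c385ed5d8a62daf6345c967277e742",
        "md5": "352fa5debfcbaab48bbce5e7450f75a8",
        "blake2b-256": "fe4673c52d6d609da65d7dcd07f2418a4712dd6f1969902b88242ec261217b9a",
    },
    "htmltext-0.0.7-py3-none-any.whl": {
        "sha256": "7dabcfc30125bd96eae7e2ef49944340e33f8a27430393fc691097189721d35d",
        "md5": "342e78c093e5c590f73665dc6aabc22c",
        "blake2b-256": "d71cf6d5062c3667c8ab398ab8d016424bc2d88fe2a48fee2cd61e52d3bad61b",
    },
}


def _new_hasher(algorithm):
    # BLAKE2b-256 means BLAKE2b with a 32-byte (256-bit) digest.
    if algorithm == "blake2b-256":
        return hashlib.blake2b(digest_size=32)
    return hashlib.new(algorithm)


def file_digest(path, algorithm, chunk_size=65536):
    """Hex digest of a file, read in chunks so large files need little memory."""
    hasher = _new_hasher(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify(path, expected):
    """Map each algorithm in `expected` to True if the file's digest matches."""
    return {
        algorithm: file_digest(path, algorithm) == digest.lower()
        for algorithm, digest in expected.items()
    }
```

For example, `verify("htmltext-0.0.7.tar.gz", EXPECTED["htmltext-0.0.7.tar.gz"])` should report `True` for every algorithm if the download is intact. Prefer the SHA256 result, since MD5 is no longer collision-resistant. pip can also enforce a digest at install time in hash-checking mode (`--require-hashes`, with `--hash=sha256:<digest>` entries in a requirements file).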

