Multilingual Web Page Content Extractor
Project description
pyce3: Multilingual Web Page Content Extractor for Python3
Introduction
pyce3 is a Python 3 package for multilingual web page content extraction. It extracts the main content of article-type web pages, such as news stories and blog posts.
Usage
import pyce3
import requests

# Fetch the raw bytes of an article page.
url = "http://caijing.chinadaily.com.cn/a/201911/21/WS5dd62455a31099ab995ed438.html"
html = requests.get(url).content

# parse() returns the encoding, time, title, text, and next-page link.
encoding, time, title, text, next_link = pyce3.parse(url, html)

print("Encoding: " + encoding)
print('=' * 10)
print("Title: " + title)
print("Time: " + time)
print('=' * 10)
print("Content: " + text)
print("NextPageLink: ", next_link)
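The usage example unpacks five positional values, which is easy to get out of order. As a minimal sketch, the values could be wrapped in a named tuple and printed through a helper that tolerates missing fields. The names `Article`, `to_article`, `summarize`, and `fetch_article` are hypothetical and not part of pyce3. That any field may be `None` or empty is an assumption, not documented behavior. The only pyce3 call used is `pyce3.parse(url, html)`, as shown in the usage example.

```python
from typing import NamedTuple, Optional


class Article(NamedTuple):
    """Named view over the five values unpacked in the usage example."""
    encoding: Optional[str]
    time: Optional[str]
    title: Optional[str]
    text: Optional[str]
    next_link: Optional[str]


def to_article(parsed) -> Article:
    """Wrap the 5-tuple returned by pyce3.parse(url, html) in an Article."""
    encoding, time, title, text, next_link = parsed
    return Article(encoding, time, title, text, next_link)


def summarize(article: Article, width: int = 60) -> str:
    """Build a short printable summary of an Article."""
    # Assumption: any field may be None or empty if extraction fails,
    # so each one falls back to an empty string.
    text = (article.text or "").strip()
    preview = text[:width] + ("..." if len(text) > width else "")
    return "\n".join([
        f"Encoding: {article.encoding or ''}",
        f"Title: {article.title or ''}",
        f"Time: {article.time or ''}",
        f"Content: {preview}",
        f"NextPageLink: {article.next_link or ''}",
    ])


def fetch_article(url: str, timeout: float = 10.0) -> Article:
    """Download a page and parse it (needs network access, pyce3, and requests)."""
    # Imported lazily so the helpers above work without these packages.
    import pyce3
    import requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # stop early on HTTP errors
    return to_article(pyce3.parse(url, response.content))
```

With the packages installed, `print(summarize(fetch_article(url)))` would print the same fields as the usage example, with the content truncated to a preview.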
Download files
Source Distribution
pyce3-1.0.0.tar.gz (4.2 kB)
Built Distribution
pyce3-1.0.0-py3-none-any.whl (8.6 kB)
File details
Details for the file pyce3-1.0.0.tar.gz.
File metadata
- Download URL: pyce3-1.0.0.tar.gz
- Upload date:
- Size: 4.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.0.0 pkginfo/1.4.2 requests/2.21.0 setuptools/40.6.3 requests-toolbelt/0.9.1 tqdm/4.28.1 CPython/3.7.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 13dcd5738bfb65b80fd8f7d4bad29520655b80b07bd433978c28cd72c038e5d3
MD5 | ccfdbe45d58f20e43d5af29dc0b8aac6
BLAKE2b-256 | 89aff13ad2c9ad0ed1c9dc6c736119d8187887c22dd241a867f4990402f7e732
File details
Details for the file pyce3-1.0.0-py3-none-any.whl.
File metadata
- Download URL: pyce3-1.0.0-py3-none-any.whl
- Upload date:
- Size: 8.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.0.0 pkginfo/1.4.2 requests/2.21.0 setuptools/40.6.3 requests-toolbelt/0.9.1 tqdm/4.28.1 CPython/3.7.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3923560efdd8fc13dc9c4e3aa7215c6eb8c082e996c525c36bf1e1d7e2775ea4
MD5 | 2535aa3bea1820dee69cb59664259ea9
BLAKE2b-256 | 02c384374e67d368af3e1f81936e73fa703e520f6de8212f6b390e8a32e85fb1
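A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using only the standard library. The helper names `sha256_of` and `matches` are hypothetical, and the local path `pyce3-1.0.0.tar.gz` is an assumed download location.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for pyce3-1.0.0.tar.gz.
EXPECTED_SHA256 = "13dcd5738bfb65b80fd8f7d4bad29520655b80b07bd433978c28cd72c038e5d3"


def sha256_of(path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a file's SHA256 digest with the expected hex string."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("pyce3-1.0.0.tar.gz")  # assumed local download location
    if archive.exists():
        print("OK" if matches(archive) else "MISMATCH")
```

To check the wheel instead, pass its SHA256 digest from the second table as `expected`.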