An enhanced parser that extracts the title, content, images, and forms from HTML pages, inspired by jparser.
Project description
from __future__ import print_function

import requests
from jableparser import PageModel

# Fetch the page; verify=False disables TLS certificate verification
html = requests.get("https://hollywoodmask.com/entertainment/jason-carroll-cnn-age-gay.html", verify=False).text
pm = PageModel(html)
result = pm.extract()

print("==title==")
print(result['title'])
print("==content==")
# Each content item is a dict with a 'type' and a 'data' field
for x in result['content']:
    if x['type'] == 'text':
        print(x['data'])
    if x['type'] == 'image':
        print("[IMAGE]", x['data']['src'])
    if x['type'] == 'html':
        print("Raw table string: ")
        print(x['data'])
        print("Processed table data if two columns: ")
        print(pm.processtable(x['data']))
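The loop in the example above suggests that `result['content']` is a list of dicts with `type` and `data` keys. Under that assumption, a small helper can flatten the list into plain text. This is only a sketch: `content_to_text` is a hypothetical name, not part of jableparser, and the `sample` list is hand-written stand-in data rather than real parser output.

```python
def content_to_text(content):
    """Join extracted content items into a single string.

    Assumes each item is a dict with 'type' and 'data' keys, as in the
    usage example; this helper is illustrative and not part of jableparser.
    """
    lines = []
    for item in content:
        kind = item.get("type")
        if kind == "text":
            lines.append(item["data"])
        elif kind == "image":
            # Image data is assumed to be a dict carrying a 'src' URL
            lines.append("[IMAGE] " + item["data"].get("src", ""))
        # 'html' items (raw table markup) are skipped in this sketch
    return "\n".join(lines)


# Hand-written stand-in for result['content']
sample = [
    {"type": "text", "data": "First paragraph."},
    {"type": "image", "data": {"src": "https://example.com/a.png"}},
    {"type": "html", "data": "<table></table>"},
    {"type": "text", "data": "Second paragraph."},
]
print(content_to_text(sample))
```

With real output, the call would presumably be `content_to_text(result['content'])`.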
Download files
Source Distribution
jableparser-0.0.1.tar.gz
Built Distribution
jableparser-0.0.1-py3-none-any.whl
File details
Details for the file jableparser-0.0.1.tar.gz.
File metadata
- Download URL: jableparser-0.0.1.tar.gz
- Upload date:
- Size: 5.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.6.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f35847b0d2f895e9f81fed46911aa0481282b675b0a6f290fbbd14670d551912 |
| MD5 | 4c7d5d1948f22e08110b9a5043b64d80 |
| BLAKE2b-256 | bce3170ea70bcfb8537941d84fa4216ddb1395ba455d8b721e2dd4a1806bed7a |
File details
Details for the file jableparser-0.0.1-py3-none-any.whl.
File metadata
- Download URL: jableparser-0.0.1-py3-none-any.whl
- Upload date:
- Size: 7.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.6.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a86adfdd1511cf39ace173e14eb21b40c72ab9182e624ff12bec57a955feb529 |
| MD5 | 44d9972f81646a8675bb23b5f0efb8d1 |
| BLAKE2b-256 | 20300f57a4833fd046a86eacd2fcf8aa12684f5f62eaf57307bcac85948a25e3 |
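
The SHA256 digests listed for the two files can be checked against a downloaded copy. The following is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the local file path is whatever location the file was saved to.

```python
import hashlib

# SHA256 digest listed above for jableparser-0.0.1.tar.gz
EXPECTED_SDIST_SHA256 = "f35847b0d2f895e9f81fed46911aa0481282b675b0a6f290fbbd14670d551912"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns empty bytes
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("jableparser-0.0.1.tar.gz", EXPECTED_SDIST_SHA256)` should return `True` for an unmodified download saved under that name in the current directory.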