An enhanced parser that extracts the title, content, images, and forms from HTML pages, inspired by jparser.
Project description
    from __future__ import print_function

    import requests
    from jableparser import PageModel

    # Fetch the page; TLS certificate verification is disabled, as in the original example.
    html = requests.get("https://hollywoodmask.com/entertainment/jason-carroll-cnn-age-gay.html", verify=False).text

    pm = PageModel(html)
    result = pm.extract()

    print("==title==")
    print(result['title'])

    print("==content==")
    for x in result['content']:
        # Each content item carries a 'type' and a matching 'data' payload.
        if x['type'] == 'text':
            print(x['data'])
        if x['type'] == 'image':
            print("[IMAGE]", x['data']['src'])
        if x['type'] == 'html':
            print("Raw table string: ")
            print(x['data'])
            print("Processed table data if two columns: ")
            print(pm.processtable(x['data']))
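The loop above implies that `extract()` returns a dictionary with a `title` string and a `content` list whose items have a `type` of `text`, `image`, or `html`. The following is a minimal sketch of how such a result could be grouped by type for later processing. The helper name `summarize_content` and the sample data are hypothetical and not part of jableparser; the sample is hand-written so the sketch runs without network access or the package installed.

```python
def summarize_content(result):
    """Group an extract()-style result into texts, image URLs, and raw HTML.

    Hypothetical helper: it assumes only the result shape shown in the
    usage example (a 'title' plus a 'content' list of typed items).
    """
    texts, images, tables = [], [], []
    for item in result.get("content", []):
        kind = item.get("type")
        if kind == "text":
            texts.append(item["data"])
        elif kind == "image":
            images.append(item["data"]["src"])
        elif kind == "html":
            tables.append(item["data"])
    return {
        "title": result.get("title", ""),
        "texts": texts,
        "images": images,
        "tables": tables,
    }


# Hand-written sample that mimics the assumed result shape.
sample = {
    "title": "Example page",
    "content": [
        {"type": "text", "data": "First paragraph."},
        {"type": "image", "data": {"src": "https://example.com/a.png"}},
        {"type": "html", "data": "<table><tr><td>k</td><td>v</td></tr></table>"},
        {"type": "text", "data": "Second paragraph."},
    ],
}

summary = summarize_content(sample)
print(summary["title"])
print(len(summary["texts"]), "text blocks,", len(summary["images"]), "images")
```

Grouping by type keeps the raw table strings separate, so they can be passed to `pm.processtable` one at a time as the usage example does.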
Download files
Source Distribution
jableparser-0.0.1.tar.gz (5.5 kB)
Built Distribution
Hashes for jableparser-0.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | a86adfdd1511cf39ace173e14eb21b40c72ab9182e624ff12bec57a955feb529
MD5 | 44d9972f81646a8675bb23b5f0efb8d1
BLAKE2b-256 | 20300f57a4833fd046a86eacd2fcf8aa12684f5f62eaf57307bcac85948a25e3
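A downloaded wheel can be checked against the published SHA256 digest before installing it. The following is a minimal sketch using only the standard library; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the local file path in the usage comment is an assumption about where the download was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage (assumes the wheel was saved to the current directory):
# EXPECTED = "a86adfdd1511cf39ace173e14eb21b40c72ab9182e624ff12bec57a955feb529"
# print(matches_published_hash("jableparser-0.0.1-py3-none-any.whl", EXPECTED))
```

pip can enforce the same check at install time through hash-checking mode, where each requirement line carries a `--hash=sha256:...` option.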