An enhanced parser that extracts the title, content, images, and forms from HTML pages, inspired by jparser.

Project description

from __future__ import print_function
import requests
from jableparser import PageModel

# Fetch the page; verify=False skips TLS certificate verification.
html = requests.get("https://hollywoodmask.com/entertainment/jason-carroll-cnn-age-gay.html", verify=False).text
pm = PageModel(html)
result = pm.extract()

print("==title==")
print(result['title'])
print("==content==")
# Each content item is a dict with a 'type' and a 'data' field.
for x in result['content']:
    if x['type'] == 'text':
        print(x['data'])
    elif x['type'] == 'image':
        print("[IMAGE]", x['data']['src'])
    elif x['type'] == 'html':
        print("Raw table string: ")
        print(x['data'])
        print("Processed table data if two columns: ")
        print(pm.processtable(x['data']))
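
The loop above dispatches on each item's 'type' field. As a minimal sketch of the same pattern without the network call, the helper below flattens a content list into printable lines. The function name render_content and the sample data are hypothetical and not part of jableparser; the item shape ('type' plus 'data', with 'src' on images) is assumed from the example above.

```python
def render_content(items):
    """Flatten content items into a list of printable lines.

    Hypothetical helper: assumes each item is a dict shaped like the
    entries of result['content'] in the example above.
    """
    lines = []
    for item in items:
        kind = item.get("type")
        data = item.get("data")
        if kind == "text":
            lines.append(data)
        elif kind == "image":
            # Image items are assumed to carry a dict with a 'src' key.
            lines.append("[IMAGE] " + data["src"])
        elif kind == "html":
            # Raw HTML (for example a table) is passed through unchanged.
            lines.append(data)
        # Unknown types are skipped rather than raising.
    return lines


# Made-up sample data standing in for result['content'].
sample = [
    {"type": "text", "data": "First paragraph."},
    {"type": "image", "data": {"src": "https://example.com/a.png"}},
    {"type": "html", "data": "<table><tr><td>k</td><td>v</td></tr></table>"},
]

for line in render_content(sample):
    print(line)
```

Returning a list instead of printing directly makes the output easier to test or to write to a file.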

Download files

Source Distribution

jableparser-0.0.1.tar.gz (5.5 kB)

Uploaded Source

Built Distribution

jableparser-0.0.1-py3-none-any.whl (7.6 kB)

Uploaded Python 3

File details

Details for the file jableparser-0.0.1.tar.gz.

File metadata

  • Download URL: jableparser-0.0.1.tar.gz
  • Upload date:
  • Size: 5.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.6.3

File hashes

Hashes for jableparser-0.0.1.tar.gz
Algorithm Hash digest
SHA256 f35847b0d2f895e9f81fed46911aa0481282b675b0a6f290fbbd14670d551912
MD5 4c7d5d1948f22e08110b9a5043b64d80
BLAKE2b-256 bce3170ea70bcfb8537941d84fa4216ddb1395ba455d8b721e2dd4a1806bed7a
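
A downloaded file can be checked against the digests listed here. The following is a minimal sketch using Python's standard hashlib module; the function names sha256_of_file and matches_published_hash are hypothetical, and the expected value is the SHA256 digest listed above for jableparser-0.0.1.tar.gz.

```python
import hashlib

# SHA256 digest copied from the listing above for jableparser-0.0.1.tar.gz.
EXPECTED_SHA256 = "f35847b0d2f895e9f81fed46911aa0481282b675b0a6f290fbbd14670d551912"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of_file(path) == expected.lower()
```

Calling matches_published_hash("jableparser-0.0.1.tar.gz") on a local copy of the source distribution should return True if the file is intact. The same check works for the wheel by passing its listed SHA256 digest as expected.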

File details

Details for the file jableparser-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: jableparser-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 7.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.6.3

File hashes

Hashes for jableparser-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 a86adfdd1511cf39ace173e14eb21b40c72ab9182e624ff12bec57a955feb529
MD5 44d9972f81646a8675bb23b5f0efb8d1
BLAKE2b-256 20300f57a4833fd046a86eacd2fcf8aa12684f5f62eaf57307bcac85948a25e3
