A simple way to transform an HTML file or URL into structured data.
Project description
Given a map of field names to XPath expressions, it extracts the matching values from an HTML file or URL. For example:
>>> ## start the console
>>> from html2data import html2data
>>> html = """<!DOCTYPE html><html lang="en"><head></head> <body> <h1><b>Title</b></h1> <div class="description">This is not a valid HTML </body> </html>"""
>>> config = {
...     'map': [
...         ['body_title', u'//h1/b/text()'],
...         ['description', u'//div[@class="description"]/text()'],
...     ]
... }
>>> handler = html2data()
>>> received_obj = handler.load(html=html, config=config)
>>> print received_obj
{ 'body_title': 'Title', 'description': 'This is not a valid HTML'}
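To make the mapping idea concrete, here is a minimal Python 3 sketch that applies the same kind of field-to-XPath map using lxml directly. It is not html2data's implementation: the `extract` helper, the variable names, and the choice to strip and join the matched text are assumptions made for illustration only.

```python
from lxml import html as lxml_html  # third-party dependency: lxml


def extract(html_text, field_map):
    """Apply each XPath in field_map to html_text and collect the results.

    Hypothetical helper for illustration; it is not part of html2data.
    """
    # lxml's HTML parser is lenient, so the unclosed <div> below still parses.
    tree = lxml_html.fromstring(html_text)
    result = {}
    for name, xpath in field_map:
        matches = tree.xpath(xpath)  # a text() query returns a list of strings
        # Assumption: trim whitespace and join multiple text nodes with a space.
        result[name] = " ".join(m.strip() for m in matches if m.strip())
    return result


page = """<!DOCTYPE html><html lang="en"><head></head> <body> <h1><b>Title</b></h1> <div class="description">This is not a valid HTML </body> </html>"""
field_map = [
    ["body_title", "//h1/b/text()"],
    ["description", '//div[@class="description"]/text()'],
]
data = extract(page, field_map)
print(data)
```

Each entry in the map pairs an output key with an XPath query, so adding a field is a matter of adding one more pair.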
To use it you will need:
- lxml 2.0+
- httplib2