Skip to main content

Naive parser for the Drudge Report

Project description

.. image:: https://travis-ci.org/mrasband/drudge_parser.svg?branch=master

A pretty simple parser for [Drudge Report](http://drudgereport.com). I find the site impossible to look at and wanted a way to more easily digest the information, as I like to keep tabs on lots of differing news outlets.

This library has no external dependencies and supports Python 2.7+ (targeted for Python 3+)


Installation
============

PyPI

pip install drudge_parser


Usage
=====

import drudge_parser

# You can use and feed the parser directly if you would like:
parser = drudge_parser.DrudgeParser()
parser.feed('<html string>')
print(parser.articles)

# Or just use the helper to scrape the current site:
articles = drudge_parser.scrape_page()
print(articles)

Articles is a list of article groupings. These are ordered down the page, so they will always be TOP\_STORY, MAIN\_HEADLINE, followed by COLUMN (note that the column number is not included, there does not appear to be any real distinguishing reason for the separation).

An article grouping looks like:

{
"images": [str], # This often is just empty, never None
"articles": [
# These will be ordered by appearance, in some cases drudge
# builds related titles on each other
{
"title": str,
"href": str
}
], # Never None
"location": str # One of the drudge_parser.Location 'enumeration'
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

drudge_parser-3.0.1.tar.gz (4.3 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page