TV Grab Brazil (source - http://tudonoar.uol.com.br/)

This grabber requires the Python TV grabber library pytvgrab-lib
(see http://pytvgrab.sourceforge.net). It extracts information
from the source web page and outputs it in the XMLTV format
(version 0.5.15; see http://xmltv.sourceforge.net).



The guide provider (http://tudonoar.uol.com.br) publishes the
guide on three levels:

Channel Listing: here we get each channel's name and the URL of
its program list
  +-> Channel Programs: here we get the program names and start
      times for a channel
        +-> Program Information: here we get the detailed
            program info


My approach to grab the guide
=============================
(For new grabber developers)

To parse the web guide I use a customized HTMLParser
(customizedparser), a parser that ignores most tags, parsing
only the wanted ones, such as <table>, <tr>, <td> and <a>, and
only some attributes, such as href. This class maps the HTML to
an equivalent Python structure based on the Tag class, which has
a name, attributes, content data and children.
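The idea above can be sketched as follows. This is a minimal,
hypothetical reconstruction built on the standard library's
html.parser; the real Tag and customizedparser classes in the
project may differ in names and details:

```python
from html.parser import HTMLParser

WANTED_TAGS = {"table", "tr", "td", "a"}   # tags we keep
WANTED_ATTRS = {"href"}                    # attributes we keep

class Tag:
    """An HTML element mapped to Python: name, attributes,
    text content and child tags."""
    def __init__(self, name, attrs=None):
        self.name = name
        self.attrs = {k: v for k, v in (attrs or []) if k in WANTED_ATTRS}
        self.data = ""
        self.children = []

class CustomizedParser(HTMLParser):
    """Ignores every tag except the wanted ones and builds a
    tree of Tag objects rooted at self.root."""
    def __init__(self):
        super().__init__()
        self.root = Tag("root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        if tag in WANTED_TAGS:
            node = Tag(tag, attrs)
            self.stack[-1].children.append(node)
            self.stack.append(node)

    def handle_endtag(self, tag):
        if tag in WANTED_TAGS and len(self.stack) > 1 \
                and self.stack[-1].name == tag:
            self.stack.pop()

    def handle_data(self, data):
        # Text lands on whatever wanted tag is currently open.
        self.stack[-1].data += data.strip()
```

Unwanted tags simply never enter the stack, so their text is
attributed to the nearest wanted ancestor.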

Then I take the resulting structure and call one of the three
functions that 'knows' how to get the data we want from each
kind of document. They are:

- get_channels(): this function knows how to get each channel's
name and URL and returns a list of (name, url) tuples;

- get_programs(): this one knows how to get the programs of a
given channel and returns a list of tuples;

- get_program_info(): this knows how to get the program
details and returns a dict with the parsed XMLTV data.
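As an illustration, a hypothetical get_channels() could walk the
Tag tree produced by the parser and collect every link. The Tag
stand-in and the page layout assumed here (channel links inside
table cells) are guesses for the sketch, not the project's
actual code:

```python
from dataclasses import dataclass, field

@dataclass
class Tag:
    # Minimal stand-in for the parser's Tag node.
    name: str
    attrs: dict = field(default_factory=dict)
    data: str = ""
    children: list = field(default_factory=list)

def get_channels(root):
    """Return a list of (name, url) tuples, one per channel link
    found anywhere under the parsed tree."""
    channels = []
    def walk(node):
        if node.name == "a" and "href" in node.attrs and node.data:
            channels.append((node.data, node.attrs["href"]))
        for child in node.children:
            walk(child)
    walk(root)
    return channels
```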

Getting the whole guide is then easy: first I grab the front
page and use get_channels() to parse it, obtaining the channel
names and URLs. Then, for each channel, I fetch its URL and use
get_programs() to get the program names, start times and URLs.
If a program has a URL, I fetch it and use get_program_info()
to get its XMLTV information.
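The three-step flow just described can be sketched as a driver
loop. The download() and get_*() callables here are injected
stand-ins for the real implementations, so this is only a shape
of the algorithm, not the grabber's actual code:

```python
def grab_guide(download, get_channels, get_programs,
               get_program_info, start_url):
    """Fetch the channel list, then each channel's programme
    list, then each programme's detail page (if it has one)."""
    guide = {}
    for name, url in get_channels(download(start_url)):
        programs = []
        for title, start, prog_url in get_programs(download(url)):
            # Only some programs link to a detail page.
            info = get_program_info(download(prog_url)) if prog_url else {}
            programs.append({"title": title, "start": start, **info})
        guide[name] = programs
    return guide
```

Injecting the parsing callbacks keeps the traversal generic and
makes each level testable on its own.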

Each get_{channels,programs,program_info}() has its own get_url_
and process_ functions. This low coupling makes it easier to
print information and to debug: the main functions build the
URL, download the data, process it and return the results, and
all of these parts are kept separate.
When an error happens, the grabber dumps the offending file and
exits with status -1.
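For one level, that decomposition might look like the sketch
below. The function names follow the get_url_/process_ pattern
from the text, but the bodies, the dump filename and the URL
are illustrative, not taken from the actual source:

```python
import sys

BASE_URL = "http://tudonoar.uol.com.br"  # the guide provider

def get_url_channels():
    # Builds the URL of the channel-listing page.
    return BASE_URL + "/"

def process_channels(html):
    # Parses the downloaded page; raising on unexpected markup
    # lets the caller dump the offending file.
    if "<table" not in html:
        raise ValueError("unexpected page layout")
    return [("Globo", "/globo")]   # real parsing elided here

def get_channels(download):
    """Main function: build the URL, download, process, return."""
    url = get_url_channels()
    data = download(url)
    try:
        return process_channels(data)
    except ValueError:
        # Dump the page for later inspection, then exit with -1.
        with open("dump.html", "w") as f:
            f.write(data)
        sys.exit(-1)
```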


NOTICE: You may define your own functions!

NOTICE: re_clean and clear_html() are entirely optional! They
are just something I use to get rid of junk and maybe fix some
HTML errors.
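A cleanup helper in the spirit of re_clean / clear_html() might
strip comments and script blocks and normalize whitespace before
parsing. The patterns below are my own illustration; the
project's actual regexes differ:

```python
import re

# Illustrative cleanup pattern: HTML comments and script blocks.
re_clean = re.compile(r"<!--.*?-->|<script\b.*?</script>", re.S | re.I)

def clear_html(html):
    """Remove junk the parser would choke on, then collapse all
    runs of whitespace into single spaces."""
    html = re_clean.sub("", html)
    html = html.replace("&nbsp;", " ")
    return re.sub(r"\s+", " ", html).strip()
```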
Release History
===============

- 0.6.0
- 0.5.0