pageone
=======

A module for polling URLs and stats from homepages.
Install
mkvirtualenv pageone
git clone https://github.com/newslnyx/pageone.git
cd pageone
pip install -r requirements.txt
pip install .
Test
Requires nose
nosetests
Usage
pageone does two things: it extracts article URLs from a site's homepage, and it uses Selenium and PhantomJS to find the relative positions of those URLs on the page.
To get stats about the positions of links, use link_stats:
from pageone import PageOne
p = PageOne(url='http://www.propublica.org/')
# get stats about link positions
for link in p.link_stats():
    print(link)
This will return a list of dictionaries that look like this:
{'bucket': 4,
'datetime': datetime.datetime(2014, 6, 7, 16, 6, 3, 533818),
'font_size': 13,
'has_img': 1,
'headline': u'',
'homepage': 'http://www.propublica.org/',
'img_area': 3969,
'img_height': 63,
'img_src': u'http://www.propublica.org/images/ngen/gypsy_image_medium/mpmh_victory_drive_140x140_130514_1.jpg',
'img_width': 63,
'url': u'http://www.propublica.org/article/protect-service-members-defense-department-plans-broad-ban-high-cost-loans',
'x': 61,
'x_bucket': 1,
'y': 730,
'y_bucket': 4}
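Once you have dictionaries shaped like the one above, ordinary Python is enough to summarize them. The following is a minimal sketch, not part of pageone: `group_by_bucket` is a hypothetical helper, the `sample` rows are hand-written stand-ins for `p.link_stats()` output (only the first row borrows values from the example above, the rest are made up), and it assumes that `x` and `y` are page coordinates where a smaller `y` is closer to the top.

```python
from collections import defaultdict


def group_by_bucket(stats):
    """Group link_stats-style dicts by 'bucket', ordered top-to-bottom within each bucket."""
    grouped = defaultdict(list)
    for row in stats:
        grouped[row['bucket']].append(row)
    for rows in grouped.values():
        # Assumption: smaller y means higher on the page; x breaks ties left-to-right.
        rows.sort(key=lambda r: (r['y'], r['x']))
    return dict(grouped)


# Hand-written stand-in data; the example.com rows are invented for illustration.
sample = [
    {'url': 'http://www.propublica.org/article/protect-service-members-defense-department-plans-broad-ban-high-cost-loans',
     'bucket': 4, 'x': 61, 'y': 730, 'has_img': 1},
    {'url': 'http://example.com/a', 'bucket': 1, 'x': 10, 'y': 40, 'has_img': 0},
    {'url': 'http://example.com/b', 'bucket': 4, 'x': 61, 'y': 600, 'has_img': 0},
]

grouped = group_by_bucket(sample)
for bucket in sorted(grouped):
    print(bucket, [row['url'] for row in grouped[bucket]])
```

In real use you would pass `list(p.link_stats())` instead of `sample`.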
To simply get all of the article URLs on a homepage, use articles:
from pageone import PageOne
p = PageOne(url='http://www.propublica.org/')
for article in p.articles():
    print(article)
If you also want article URLs that point to other sites, use incl_external:
from pageone import PageOne
p = PageOne(url='http://www.propublica.org/')
for article in p.articles(incl_external=True):
    print(article)
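To make the internal/external distinction concrete, here is a rough sketch of one way to tell the two apart after the fact. It is an illustration only and does not claim to match how pageone decides internally: `is_external` is a hypothetical helper, the `example.com` URL is invented, and the rule used (compare hostnames, ignoring a leading `www.`) is an assumption.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def is_external(url, homepage):
    """Return True when url's host differs from the homepage's host.

    Assumption: a leading 'www.' is ignored, so propublica.org and
    www.propublica.org count as the same site.
    """
    def host(u):
        netloc = urlparse(u).netloc.lower()
        return netloc[4:] if netloc.startswith('www.') else netloc
    return host(url) != host(homepage)


homepage = 'http://www.propublica.org/'
urls = [
    'http://www.propublica.org/article/protect-service-members-defense-department-plans-broad-ban-high-cost-loans',
    'http://example.com/story',  # invented URL for illustration
]

external = [u for u in urls if is_external(u, homepage)]
print(external)
```

Note that this simple rule treats any other subdomain as external, which may or may not be what you want.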