

============
Link Grabber
============

Link Grabber provides a quick and easy way to grab links from
a single web page. This Python package is a simple wrapper
around BeautifulSoup_, focused on grabbing the HTML
hyperlink tag, ``<a>``.

.. _BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/

.. _find_all: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#find-all

pypi_

.. _pypi: https://pypi.python.org/pypi/linkGrabber/

GitHub_

.. _GitHub: https://github.com/detroit-media-partnership/link-grabber

Dependencies:

* BeautifulSoup
* Requests

How-To
======

.. code:: bash

    $ python setup.py install

OR

.. code:: bash

    $ pip install linkGrabber

Quickie
=======

.. code:: python

    import re
    import linkGrabber

    seek = linkGrabber.Links("http://www.google.com")
    seek.find()
    # limit the number of "a" tags to 5
    seek.find(limit=5)
    # filter the "a" tag href attribute
    seek.find({ "href": re.compile("plus.google.com") })

Documentation
=============

find
----------

Parameters:

* filters (dict): Beautiful Soup filters (see find_all_), passed as a dictionary
* limit (int): limit the number of links returned, in sequential order
* reverse (bool): reverse how the list of ``<a>`` tags is sorted
* sort (function): a key function that selects which key of each link to sort the list on
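
All four parameters can be combined in a single call. A minimal sketch based
on the parameter list above; the filter pattern and sort key below are
illustrative assumptions, not library defaults:

.. code:: python

    import re
    from linkGrabber import Links

    seek = Links("http://www.google.com")
    # filter on the href attribute, keep at most 3 links,
    # sort them by their text, and reverse the resulting order
    links = seek.find(
        { "href": re.compile("google") },
        limit=3,
        reverse=True,
        sort=lambda key: key["text"],
    )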

Find all links whose style attribute contains "11px", limited to five results:

.. code:: python

    import re
    from linkGrabber import Links

    seek = Links("http://www.google.com")
    seek.find({ "style": re.compile("11px") }, 5)

Reverse the sort before limiting links:

.. code:: python

    from linkGrabber import Links

    seek = Links("http://www.google.com")
    seek.find(limit=2, reverse=True)

Sort by a link's attribute:

.. code:: python

    from linkGrabber import Links

    seek = Links("http://www.google.com")
    seek.find(limit=3, sort=lambda key: key['text'])

Link Dictionary
---------------

All attrs from BeautifulSoup's Tag object are available in the dictionary,
as well as a few extras:

* text (the text between the <a></a> tags)
* seo (the text after the last "/" in the URL, parsed to make it human readable)
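
A minimal sketch of reading those keys, assuming ``find()`` returns an
iterable of these dictionaries (as the v0.2.2 changelog entry below
suggests); the limit of 5 is arbitrary:

.. code:: python

    from linkGrabber import Links

    seek = Links("http://www.google.com")
    for link in seek.find(limit=5):
        # every Tag attribute plus the "text" and "seo" extras
        print(link.get("href"), link.get("text"), link.get("seo"))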


=========
Changelog
=========

v0.2.4 (06/10/2014)
-------------------

* Updated documentation so it reads better on PyPI
* Removed scrape.py and moved it to __init__.py
* Now using nose for unit testing

v0.2.3 (05/22/2014)
-------------------

* Updated setup.py file and some verbiage

v0.2.2 (05/19/2014)
-------------------

* linkGrabber.Links.find() now responds with all Tag.attrs
  from BeautifulSoup4 as well as 'text' and 'seo' keys

v0.2.1 (05/18/2014)
-------------------

* Added more tests

v0.2.0 (05/17/2014)
-------------------

* Modified naming convention, reduced codebase, more readable structure

v0.1.9 (05/17/2014)
-------------------

* Python 3.4 compatibility

v0.1.8 (05/16/2014)
-------------------

* Changed parameter names to better reflect functionality

v0.1.7 (05/16/2014)
-------------------

* Update README

v0.1.6 (05/16/2014)
-------------------

* Update README with more examples

v0.1.5 (05/16/2014)
-------------------

* Updated find_links to accept link_reverse=(bool) and link_sort=(function)

v0.1.0 (05/16/2014)
-------------------

* Initial release.
