This is a pre-production deployment of Warehouse, however changes made here WILL affect the production instance of PyPI.
Latest Version Dependencies status unknown Test status unknown Test coverage unknown
Project Description
============
Link Grabber
============

.. image:: https://travis-ci.org/michigan-com/linkGrabber.svg?branch=master
:target: https://travis-ci.org/michigan-com/linkGrabber

`Documentation <http: linkgrabber.neurosnap.net="">`_

Link Grabber provides a quick and easy way to grab links from
a single web page. This python package is a simple wrapper
around `BeautifulSoup <http: www.crummy.com="" software="" beautifulsoup=""/>`_, focusing on grabbing HTML's
hyperlink tag, "a."

Dependecies:

* Python 2.7, 3.3, 3.4
* BeautifulSoup
* Requests
* Six

How-To
------

.. code:: bash

$ python setup.py install

OR

.. code:: bash

$ pip install linkGrabber

Quickie
-------

.. code:: python

import re
import linkGrabber

links = linkGrabber.Links("http://www.google.com")
links.find()
# limit the number of "a" tags to 5
links.find(limit=5)
# filter the "a" tag href attribute
links.find(href=re.compile("plus.google.com"))

Documentation
-------------

http://linkgrabber.neurosnap.net/

find
````

Parameters:
* filters (dict): Beautiful Soup's filters as a dictionary
* limit (int): Limit the number of links in sequential order
* reverse (bool): Reverses how the list of <a> tags are sorted
* sort (function): Accepts a function that accepts which key to sort upon
within the List class

Find all links that have a style containing "11px"

.. code:: python

import re
from linkGrabber import Links

links = Links("http://www.google.com")
links.find(style=re.compile("11px"), limit=5)

Reverse the sort before limiting links:

.. code:: python

from linkGrabber import Links

links = Links("http://www.google.com")
links.find(limit=2, reverse=True)

Sort by a link's attribute:

.. code:: python

from linkGrabber import Links

links = Links("http://www.google.com")
links.find(limit=3, sort=lambda key: key['text'])

Exclude text:

.. code:: python

import re

from linkGrabber import Links

links = Links("http://www.google.com")
links.find(exclude=[{ "text": re.compile("Read More") }])

Remove duplicate URLs and make the output pretty:

.. code:: python

from linkGrabber import Links

links = Links("http://www.google.com")
links.find(duplicates=False, pretty=True)

Link Dictionary
```````````````

All attrs from BeautifulSoup's Tag object are available in the dictionary
as well as a few extras:

* text (text inbetween the <a></a> tag)
* seo (parse all text after last "/" in URL and attempt to make it human readable)


=========
Changelog
=========

v.0.3.0 (7/09/2015)
-------------------

* Added parser parameter to Links class
* Default parser set to lxml

v.0.2.10 (7/09/2015
-------------------

* Added six as a dependency

v0.2.9 (1/24/2014)
------------------

* Updated documentation

v0.2.8 (10/23/2014)
-------------------

* Added better documentation

v0.2.7 (06/25/2014)
-------------------

* Fixed exclude for non-iterable strings

v0.2.6 (06/25/2014)
-------------------

* Exclude parameter is now a list of dictionaries
* Added pretty property
* Added duplicates property which will remove any identical URLs
* Added more tests
* Added better docs

v0.2.5 (06/23/2014)
-------------------

* Added exclude parameter to Links.find() which removes
links that match certain criteria

v0.2.4 (06/10/2014)
-------------------

* Updated documentation to be better read on pypi
* Removed scrape.py and moved it to __init__.py
* Now using nose for unit testing

v0.2.3 (05/22/2014)
-------------------

* Updated setup py file and some verbage

v0.2.2 (05/19/2014)
-------------------

* linkGrabber.Links.find() now responds with all Tag.attrs
from BeautifulSoup4 as well as 'text' and 'seo' keys

v0.2.1 (05/18/2014)
-------------------

* Added more tests

v0.2.0 (05/17/2014)
-------------------

* Modified naming convention, reduced codebase, more readable structure

v0.1.9 (05/17/2014)
-------------------

* Python 3.4 compatability

v0.1.8 (05/16/2014)
-------------------

* Changed paramerter names to better reflect functionality

v0.1.7 (05/16/2014)
-------------------

* Update README

v0.1.6 (05/16/2014)
-------------------

* Update README with more examples

v0.1.5 (05/16/2014)
-------------------

* Updated find_links to accept link_reverse=(bool) and link_sort=(function)

v0.1.0 (05/16/2014)
-------------------

* Initial release.
Release History

Release History

0.3.0

This version

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.2.9

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.2.8

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.2.7

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.2.6

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.2.5

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.2.4

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.2.3

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.2.2

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.2.1

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.2.0

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.1.9

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.1.8

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.1.7

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.1.6

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.1.5

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.1.0

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

Download Files

Download Files

TODO: Brief introduction on what you do with files - including link to relevant help section.

File Name & Checksum SHA256 Checksum Help Version File Type Upload Date
linkGrabber-0.3.0.tar.gz (9.1 kB) Copy SHA256 Checksum SHA256 Source Jul 9, 2015

Supported By

WebFaction WebFaction Technical Writing Elastic Elastic Search Pingdom Pingdom Monitoring Dyn Dyn DNS Sentry Sentry Error Logging CloudAMQP CloudAMQP RabbitMQ Heroku Heroku PaaS Kabu Creative Kabu Creative UX & Design Fastly Fastly CDN DigiCert DigiCert EV Certificate Rackspace Rackspace Cloud Servers DreamHost DreamHost Log Hosting