Scrape links from a single web site

============
Link Grabber
============

Link Grabber provides a quick and easy way to grab links from
a single web page. This Python package is a simple wrapper
around BeautifulSoup_ that focuses on HTML's hyperlink tag, "a".
It essentially wraps find_all_ for the "a" tag and exposes all the
filters that Beautiful Soup supports through linkGrabber's ``filters`` parameter.

.. _BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/

.. _find_all: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#find-all
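As a rough illustration of what this kind of wrapping amounts to, the sketch below collects "a" tags from an HTML string and keeps those whose attributes match a dictionary of regular-expression filters. It deliberately uses Python's standard-library ``html.parser`` as a stand-in for Beautiful Soup so that it runs without extra installs. The names ``AnchorCollector`` and ``find_anchors`` are hypothetical, and this is not linkGrabber's actual implementation.

```python
import re
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect the attributes of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs arrives as a list of (name, value) pairs
            self.anchors.append(dict(attrs))


def find_anchors(html, filters=None, limit=None):
    """Return attribute dicts of <a> tags that match every filter.

    filters maps an attribute name to a compiled regular expression,
    loosely mirroring the dictionary passed to find_links.
    """
    parser = AnchorCollector()
    parser.feed(html)
    matches = [
        a for a in parser.anchors
        if all(
            a.get(name) is not None and pattern.search(a[name])
            for name, pattern in (filters or {}).items()
        )
    ]
    return matches[:limit]  # limit=None keeps every match


page = """
<a href="http://plus.google.com/me">Plus</a>
<a href="http://example.com/">Example</a>
<a href="http://plus.google.com/you" style="font-size:11px">Plus 2</a>
"""

links = find_anchors(page, {"href": re.compile(r"plus\.google\.com")})
print([a["href"] for a in links])
```

Beautiful Soup's own filters are richer than this (they also accept strings, lists, and functions), so treat the sketch only as a picture of the attribute-matching idea.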

pypi_

.. _pypi: https://pypi.python.org/pypi/linkGrabber/

Dependencies:

* BeautifulSoup

How-To
======

.. code:: bash

    $ python setup.py install

OR

.. code:: bash

    $ pip install linkGrabber

Quickie
=======

.. code:: python

    import re
    import linkGrabber

    seek = linkGrabber.ScrapeLinks("http://www.google.com")
    # grab every "a" tag on the page
    seek.find_links()
    # limit the number of "a" tags to 5
    seek.find_links(limit=5)
    # filter on the "a" tag's href attribute (dots escaped to match literally)
    seek.find_links({"href": re.compile(r"plus\.google\.com")})

Documentation
=============

find_links
----------

Parameters:

* filters: Beautiful Soup's filters, passed as a dictionary
* limit: Maximum number of links to return, taken in sequential order
* limit_reverse: Reverses the order in which the list of <a> tags is sorted
* limit_sort: A function that selects the key to sort on
  within the List class
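To make the interplay of the three ``limit*`` parameters more concrete, here is a minimal sketch on a plain list of dictionaries. It assumes that ordering is applied first and the limit is applied afterwards; the source does not confirm that order, and the helper ``limit_links`` is hypothetical rather than part of linkGrabber.

```python
def limit_links(links, limit=None, limit_reverse=False, limit_sort=None):
    """Hypothetical sketch: order a list of links, then keep the first `limit`.

    Assumption: sorting/reversing happens before the limit is applied.
    """
    ordered = list(links)  # copy so the caller's list is left untouched
    if limit_sort is not None:
        # limit_sort plays the role of a key function
        ordered.sort(key=limit_sort, reverse=limit_reverse)
    elif limit_reverse:
        ordered.reverse()
    return ordered[:limit]  # limit=None keeps everything


links = [
    {"text": "News", "href": "http://example.com/news"},
    {"text": "About", "href": "http://example.com/about"},
    {"text": "Mail", "href": "http://example.com/mail"},
]

first_two = limit_links(links, limit=2)
last_two = limit_links(links, limit=2, limit_reverse=True)
by_text = limit_links(links, limit=2, limit_sort=lambda link: link["text"])
```

If the library applies the limit before sorting instead, the results for ``limit_reverse`` and ``limit_sort`` would differ, so check the behaviour against the installed version.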

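.. code:: python

    import re
    from linkGrabber import ScrapeLinks

    # ScrapeLinks is imported directly, so call it without the module prefix
    seek = ScrapeLinks("http://www.google.com")
    # match "a" tags whose style attribute contains "11px", at most 5 results
    seek.find_links({"style": re.compile("11px")}, 5)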
