Web Scraping Framework

Project description

Grab is a Python web scraping framework. It provides many helper methods for scraping web sites and processing the scraped content:

  • Automatic cookies (session) support
  • HTTP and SOCKS proxy with and without authorization
  • Keep-Alive support
  • IDN support
  • Tools to work with web forms
  • Easy multipart file uploading
  • Flexible customization of HTTP requests
  • Automatic charset detection
  • Powerful API for extracting information from HTML documents with XPath queries
  • Asynchronous API to make thousands of simultaneous queries. This part of the library is called Spider, and it is too big to list its features in this README.
  • Python 3 ready
  • And much, much more
  • Grab was written by a developer who has been scraping sites since 2005

Check out docs (RU): https://github.com/lorien/grab/tree/master/docs
Check out docs (EN): https://github.com/lorien/grab/tree/master/docs2/source

Example of Grab usage:

from grab import Grab

g = Grab()
g.go('https://github.com/login')
g.set_input('login', 'lorien')
g.set_input('password', '***')
g.submit()
for elem in g.doc.select('//ul[@id="repo_listing"]/li/a'):
    print('%s: %s' % (elem.text(), elem.attr('href')))
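
To see what the XPath query in the example selects without making a network request, here is a minimal sketch that runs a similar query against a small inline document. It uses the standard library's xml.etree.ElementTree rather than Grab, so it supports only a subset of XPath; the sample markup, the link targets, and the list_links helper are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a fetched page (not real GitHub markup).
SAMPLE_HTML = """
<html><body>
  <ul id="repo_listing">
    <li><a href="/example/one">one</a></li>
    <li><a href="/example/two">two</a></li>
  </ul>
  <ul id="other"><li><a href="/ignored">ignored</a></li></ul>
</body></html>
"""


def list_links(markup):
    """Return (text, href) pairs for links inside the repo_listing list."""
    root = ET.fromstring(markup)
    # ElementTree needs a relative path, so the query starts with './/'
    # instead of the '//' used in the Grab example.
    links = root.findall('.//ul[@id="repo_listing"]/li/a')
    return [(a.text, a.get('href')) for a in links]


for text, href in list_links(SAMPLE_HTML):
    print('%s: %s' % (text, href))
```

Only the links under the list with id "repo_listing" are returned; the link in the second list is skipped because the attribute predicate does not match it.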

Example of Grab::Spider usage:

from grab.spider import Spider, Task
import logging

class ExampleSpider(Spider):
    def task_generator(self):
        for lang in ('python', 'ruby', 'perl'):
            url = 'https://www.google.com/search?q=%s' % lang
            yield Task('search', url=url)

    def task_search(self, grab, task):
        print(grab.doc.select('//div[@class="s"]//cite').text())


logging.basicConfig(level=logging.DEBUG)
bot = ExampleSpider()
bot.run()
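
The Spider example relies on a naming convention: a Task created with the name 'search' is handled by the method task_search. The following toy sketch illustrates only that name-based dispatch. ToyTask and ToySpider are hypothetical stand-ins, not Grab classes; they perform no network requests, and the handler receives just the task rather than a grab object.

```python
class ToyTask(object):
    """Hypothetical stand-in for a task: a handler name plus a URL."""

    def __init__(self, name, url):
        self.name = name
        self.url = url


class ToySpider(object):
    """Toy dispatcher; it only shows how a task name maps to a method."""

    def task_generator(self):
        for lang in ('python', 'ruby', 'perl'):
            url = 'https://www.google.com/search?q=%s' % lang
            yield ToyTask('search', url=url)

    def task_search(self, task):
        # A real handler would process the downloaded document here.
        return 'search handled %s' % task.url

    def run(self):
        results = []
        for task in self.task_generator():
            # Look up the handler by prefixing the task name with 'task_'.
            handler = getattr(self, 'task_%s' % task.name)
            results.append(handler(task))
        return results


for line in ToySpider().run():
    print(line)
```

The real Spider processes tasks asynchronously; this sketch runs them one by one so that the mapping from task name to handler is easy to follow.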

Installation

Pip is the recommended way to install Grab and its dependencies:

$ pip install grab

See details here: https://github.com/lorien/grab/blob/master/docs2/source/grab_installation.rst
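
After installing, one quick way to confirm that the package is importable is to ask Python whether it can locate the module. This is a minimal sketch for Python 3.4 or newer; the is_installed helper is hypothetical and not part of Grab.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


print('grab installed:', is_installed('grab'))
```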

Documentation

Russian docs: http://docs.grablib.org

English docs (in progress): https://github.com/lorien/grab/tree/master/docs2/source

Mailing list (Russian and English): http://groups.google.com/group/python-grab/

Contribution

If you have found a bug or want a new feature, please open a new issue on GitHub.

Release history

0.6.40, 0.6.39, 0.6.38, 0.6.37, 0.6.36, 0.6.35, 0.6.34, 0.6.33, 0.6.32, 0.6.31, 0.6.30, 0.6.29, 0.6.28, 0.6.27, 0.6.26, 0.6.25, 0.6.24, 0.6.23, 0.6.22, 0.6.21, 0.6.20, 0.6.19, 0.6.18, 0.6.17, 0.6.16, 0.6.15, 0.6.14, 0.6.13, 0.6.12, 0.6.11, 0.6.10, 0.6.9, 0.6.8, 0.6.7, 0.6.6, 0.6.5, 0.6.4, 0.6.3, 0.6.2, 0.6.1, 0.6.0

0.5.5, 0.5.4, 0.5.3 (this version), 0.5.2, 0.5.1, 0.5.0

0.4.13, 0.4.12, 0.4.11, 0.4.10, 0.4.9, 0.4.8, 0.4.7, 0.4.5, 0.4.4, 0.4.3, 0.4.2, 0.4.1, 0.4.0

0.3.33, 0.3.32, 0.3.31, 0.3.30, 0.3.29, 0.3.28, 0.3.27, 0.3.26, 0.3.25, 0.3.24, 0.3.23, 0.3.22, 0.3.21, 0.3.20, 0.3.19, 0.3.18, 0.3.17, 0.3.16, 0.3.15, 0.3.14, 0.3.13, 0.3.12, 0.3.11, 0.3.10, 0.3.9, 0.3.8, 0.3.7, 0.3.6, 0.3.4, 0.3.3, 0.3.2, 0.3.1, 0.3

0.2.20, 0.2.19, 0.2.18, 0.2.17, 0.2.16, 0.2.15, 0.2.12, 0.2.11, 0.2.10, 0.2.9, 0.2.8, 0.2.7, 0.2.6, 0.2.5, 0.2.4, 0.2.3, 0.2.2, 0.2.1, 0.2.0

0.1.7, 0.1.6, 0.1.5, 0.1.4, 0.1.3, 0.1.2, 0.1.1

Download files

Filename: grab-0.5.3.tar.gz (161.5 kB)
File type: Source
Python version: None
Upload date: Mar 7, 2015
