
A lightweight, JavaScript-aware, headless web scraping library for Python

# Overview

Author: Niklas Baumstark

dryscrape is a lightweight web scraping library for Python. It uses a headless WebKit instance to evaluate JavaScript on the visited pages. This makes it possible to scrape plain web pages as well as JavaScript-heavy “Web 2.0” applications like Facebook.
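As a rough illustration of the workflow described above, the following sketch visits a page and collects the links found after JavaScript has run. It assumes that `dryscrape.Session`, `visit`, `xpath` and `set_attribute` behave as described in the project's documentation. The `collect_links` helper, the `main` function and the `example.com` base URL are hypothetical names chosen for this example.

```python
def collect_links(session, path="/"):
    """Visit `path` and return the href attribute of every link on the page."""
    session.visit(path)
    # xpath() yields node objects; node["href"] reads an attribute value.
    return [node["href"] for node in session.xpath("//a[@href]")]


def main():
    # Imported lazily so collect_links can be tried without dryscrape installed.
    import dryscrape

    # example.com is a placeholder base URL, not taken from the project docs.
    session = dryscrape.Session(base_url="http://example.com")
    # Images are not needed for link extraction, so skip loading them.
    session.set_attribute("auto_load_images", False)
    for href in collect_links(session):
        print(href)
```

Calling `main()` on a machine with dryscrape installed prints one link per line. Keeping the session as a parameter means the helper only relies on the two methods it calls.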

It is built on the shoulders of [capybara-webkit](https://github.com/thoughtbot/capybara-webkit)’s [webkit-server](https://github.com/niklasb/webkit-server). A big thanks goes to thoughtbot, inc. for building this excellent piece of software!

# Changelog

  • 1.0: Added Python 3 support and small performance fixes; header names are now properly normalized. Also added the function dryscrape.start_xvfb() to easily start Xvfb.

  • 0.9.1: Changed semantics of the headers function in a backwards-incompatible way: It now returns a list of (key, value) pairs instead of a dictionary.
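Code written against the old dictionary return value needs a small adjustment after the 0.9.1 change. One possible approach is to group the (key, value) pairs by name, which also keeps repeated headers that a plain dictionary would overwrite. The `headers_to_dict` helper and the sample pairs below are made up for illustration and are not part of dryscrape. Lowercasing the keys is a choice made in this sketch, independent of whatever normalization the library applies.

```python
from collections import defaultdict


def headers_to_dict(pairs):
    """Group (key, value) pairs into {lowercased key: [values, ...]}."""
    grouped = defaultdict(list)
    for key, value in pairs:
        # Repeated names (e.g. several Set-Cookie headers) are all kept.
        grouped[key.lower()].append(value)
    return dict(grouped)


# Hypothetical data shaped like a list of (key, value) pairs; not real output.
sample = [
    ("Content-Type", "text/html"),
    ("Set-Cookie", "a=1"),
    ("Set-Cookie", "b=2"),
]
print(headers_to_dict(sample))
```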

# Supported Platforms

The library has been confirmed to work on the following platforms:

  • Mac OS X 10.9 Mavericks and 10.10 Yosemite

  • Ubuntu Linux

  • Arch Linux

Other Unix-like systems should work as well.

Windows is not officially supported, although dryscrape should work with [Cygwin](https://www.cygwin.com/).
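On Linux machines without a running X server, the `dryscrape.start_xvfb()` function mentioned in the changelog can be called before creating a session. The sketch below only calls it when the platform string indicates Linux. It assumes dryscrape 1.0 or later and an installed Xvfb. The `start_display_if_linux` helper and `main` are hypothetical names, and the start function is passed in as a parameter so the platform check can be tried without dryscrape.

```python
import sys


def start_display_if_linux(start_xvfb, platform=sys.platform):
    """Call `start_xvfb` on Linux; return True if it was called."""
    if platform.startswith("linux"):
        start_xvfb()
        return True
    return False


def main():
    # Imported lazily; needs dryscrape >= 1.0 and the Xvfb binary.
    import dryscrape

    start_display_if_linux(dryscrape.start_xvfb)
```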

# Installation, Usage, API Docs

Documentation can be found at [dryscrape’s ReadTheDocs page](http://readthedocs.org/docs/dryscrape/).

Quick installation instructions:

    pip install dryscrape

# Contact, Bugs, Contributions

If you have any problems with this software, don’t hesitate to open an issue on [GitHub](https://github.com/niklasb/dryscrape), open a pull request, or send an email to niklas baumstark at Gmail.


