A lightweight, JavaScript-aware, headless web scraping library for Python
# Overview
Author: Niklas Baumstark
dryscrape is a lightweight web scraping library for Python. It uses a headless WebKit instance to evaluate JavaScript on the visited pages. This enables painless scraping of plain web pages as well as JavaScript-heavy “Web 2.0” applications like Facebook.
It is built on top of [capybara-webkit](https://github.com/thoughtbot/capybara-webkit)’s [webkit-server](https://github.com/niklasb/webkit-server). A big thanks goes to thoughtbot, inc. for building this excellent piece of software!
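The workflow described in this overview can be sketched roughly as follows. This is a minimal illustration, not the canonical usage: `collect_links` is a hypothetical helper name, and the `Session`, `set_attribute`, `visit`, and `xpath` calls follow the usage shown in dryscrape's documentation, so check them against the version you have installed. `dryscrape.start_xvfb()` is the helper listed in the changelog.

```python
import sys


def collect_links(base_url, path="/"):
    """Load base_url + path in headless WebKit and return every link target.

    Hypothetical helper for illustration; not part of dryscrape itself.
    """
    # Imported lazily so this module can be loaded without dryscrape installed.
    import dryscrape

    if "linux" in sys.platform:
        # Start Xvfb in case no X server is running (Xvfb must be installed).
        dryscrape.start_xvfb()

    session = dryscrape.Session(base_url=base_url)
    # Images are not needed to read link targets, so skip loading them.
    session.set_attribute("auto_load_images", False)
    session.visit(path)

    # The XPath query runs against the DOM of the page loaded in WebKit.
    return [link["href"] for link in session.xpath("//a[@href]")]
```

For instance, `collect_links("http://example.com")` (a placeholder URL) would return the `href` attribute of every anchor found on that page once WebKit has loaded it.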
# Changelog
- 1.0: Added Python 3 support and small performance fixes; header names are now properly normalized. Also added the function `dryscrape.start_xvfb()` to easily start Xvfb.
- 0.9.1: Changed the semantics of the `headers` function in a backwards-incompatible way: it now returns a list of (key, value) pairs instead of a dictionary.
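Code written against the pre-0.9.1 dictionary return value needs a small adapter. The sketch below shows one possible way to turn a list of (key, value) pairs back into a mapping. `headers_to_dict` is a hypothetical helper, the sample pairs are made up, and lowercasing the names is a choice made here for lookup convenience, not something dryscrape prescribes. Values are collected into lists because a list of pairs can carry repeated header names, which a plain dictionary cannot.

```python
def headers_to_dict(pairs):
    """Group (key, value) header pairs into {lowercased name: [values]}."""
    grouped = {}
    for name, value in pairs:
        # Header names are case-insensitive, so use one canonical spelling.
        grouped.setdefault(name.lower(), []).append(value)
    return grouped


# Made-up sample data in the (key, value) shape described in the changelog.
sample_pairs = [
    ("Content-Type", "text/html"),
    ("Set-Cookie", "a=1"),
    ("Set-Cookie", "b=2"),
]
grouped_headers = headers_to_dict(sample_pairs)
print(grouped_headers["set-cookie"])  # prints ['a=1', 'b=2']
```

Keeping every value in a list avoids silently dropping repeated headers such as `Set-Cookie`, at the cost of always indexing into a list.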
# Supported Platforms
The library has been confirmed to work on the following platforms:
- Mac OS X 10.9 Mavericks and 10.10 Yosemite
- Ubuntu Linux
- Arch Linux
Other Unix-like systems should work just fine.
Windows is not officially supported, although dryscrape should work with [Cygwin](https://www.cygwin.com/).
# Installation, Usage, API Docs
Documentation can be found at [dryscrape’s ReadTheDocs page](http://readthedocs.org/docs/dryscrape/).
Quick installation, from a root shell: `pip install dryscrape`
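After installing, a quick way to confirm that the package is visible to the current interpreter is to look it up without importing it. The snippet below is a small sketch using only the standard library; `is_installed` is a hypothetical helper name. It checks importability only and says nothing about whether the underlying WebKit server works.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found by the import system."""
    return importlib.util.find_spec(module_name) is not None


if not is_installed("dryscrape"):
    print("dryscrape was not found; try: pip install dryscrape")
```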
# Contact, Bugs, Contributions
If you have any problems with this software, don’t hesitate to open an issue on [GitHub](https://github.com/niklasb/dryscrape), open a pull request, or send an email to niklas baumstark at Gmail.
Source Distribution
File details
Details for the file dryscrape-1.0.tar.gz.
File metadata
- Download URL: dryscrape-1.0.tar.gz
- Upload date:
- Size: 5.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | a99858786434947266cb81d5634cb1722de48aaf6b9cdffda15b7cd4a8e07340
MD5 | 267e380a8efaf9cd8fd94de1639d3198
BLAKE2b-256 | b575c45f796ec5bc7f98c38b9ae425390ef5f4a76153c8b5af946adb97e7e622