MechanicalSoup
==============
A Python library for automating interaction with websites. MechanicalSoup automatically stores and sends cookies, follows redirects, and can follow links and submit forms. It does not execute JavaScript.
I was a fond user of the [Mechanize](https://github.com/jjlee/mechanize) library, but unfortunately it's [incompatible with Python 3](https://github.com/jjlee/mechanize/issues/96) and development is inactive. MechanicalSoup provides a similar API, built on Python giants [Requests](http://docs.python-requests.org/en/latest/) (for HTTP sessions) and [BeautifulSoup](http://www.crummy.com/software/BeautifulSoup/) (for document navigation).
Installation
------
[![Latest Version](https://img.shields.io/pypi/v/MechanicalSoup.svg)](https://pypi.python.org/pypi/MechanicalSoup/)
From [PyPI](https://pypi.python.org/pypi/MechanicalSoup/):

    pip install MechanicalSoup

Python versions 2.6-2.7, 3.3-3.6, PyPy and PyPy3 are supported (and tested against).
Example
------
From [`example.py`](example.py), code to log in to the GitHub website:
```python
"""Example app to login to GitHub using the StatefulBrowser class."""
from __future__ import print_function
import argparse
import mechanicalsoup
from getpass import getpass
parser = argparse.ArgumentParser(description="Login to GitHub.")
parser.add_argument("username")
args = parser.parse_args()
args.password = getpass("Please enter your GitHub password: ")
browser = mechanicalsoup.StatefulBrowser()
# Uncomment for a more verbose output:
# browser.set_verbose(2)
browser.open("https://github.com")
browser.follow_link("login")
browser.select_form('#login form')
browser["login"] = args.username
browser["password"] = args.password
resp = browser.submit_selected()
# Uncomment to launch a web browser on the current page:
# browser.launch_browser()
# verify we are now logged in
page = browser.get_current_page()
messages = page.find("div", class_="flash-messages")
if messages:
    print(messages.text)
assert page.select(".logout-form")
print(page.title.text)
# verify we remain logged in (thanks to cookies) as we browse the rest of
# the site
page3 = browser.open("https://github.com/hickford/MechanicalSoup")
assert page3.soup.select(".logout-form")
```
For an example with a more complex form (checkboxes, radio buttons and textareas), read [`tests/test_browser.py`](tests/test_browser.py) and [`tests/test_form.py`](tests/test_form.py).
Common problems
---
### "No parser was explicitly specified"
> UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.
Recent versions of BeautifulSoup show a harmless warning to encourage you to specify which HTML parser to use. You can do this in MechanicalSoup:

    mechanicalsoup.Browser(soup_config={'features': 'html.parser'})

Or if you have the parser [lxml](http://lxml.de/installation.html) installed:

    mechanicalsoup.Browser(soup_config={'features': 'lxml'})

See also https://www.crummy.com/software/BeautifulSoup/bs4/doc/#you-need-a-parser
Development
---------
[![Build Status](https://travis-ci.org/hickford/MechanicalSoup.svg?branch=master)](https://travis-ci.org/hickford/MechanicalSoup)
### Tests

    py.test

### Roadmap
* Draw [Substack-style](http://substack.net/art) readme art (imagine a steaming bowl of cogs and noodles)
* [Write docs and publish website](https://github.com/hickford/MechanicalSoup/issues/6)
See also
------
* [RoboBrowser](https://github.com/jmcarp/robobrowser): a similar library, also based on Requests and BeautifulSoup.
* [Hacker News post](https://news.ycombinator.com/item?id=8012103)
* [Reddit discussion](http://www.reddit.com/r/programming/comments/2aa13s/mechanicalsoup_a_python_library_for_automating/)