Headless programmatic web browser on top of Requests and Beautiful Soup

Pynav2

Requirements

Python 3.4+

Unit-tested on Python 3.4 through 3.7

Installation

If python3 is the default python binary

pip install pynav2

If python2 is the default python binary

pip3 install pynav2

Licence

GNU LGPLv3 (GNU Lesser General Public License Version 3)

Interactive mode examples

Required for all examples

from pynav2 import Browser
b = Browser()

HTTP GET request and print the response

Get http://example.com (HTTPS is used if the server supports it)

>>> b.get('example.com')
<Response [200]>
>>> b.text  # alias for b.response.text
'<!DOCTYPE html>\n<html lang="mul" class="no-js">\n<head>\n<meta charset="utf-8">\n<title>example.com</title>...'
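The examples pass a bare host name such as 'example.com' rather than a full URL. As a rough illustration of what that implies, the sketch below adds a default scheme when none is given. `with_default_scheme` is a hypothetical helper, not part of pynav2, and it does not attempt to detect whether the server supports HTTPS.

```python
def with_default_scheme(url, scheme='http'):
    """Hypothetical helper: prefix a scheme only when the URL has none."""
    if '://' in url:
        return url
    return '{}://{}'.format(scheme, url)


print(with_default_scheme('example.com'))
print(with_default_scheme('https://example.com'))
```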

HTTP GET request and print the json response

Get http://example.com/user-agent/json, which returns the JSON-encoded content of the response, if any

>>> b.get('example.com/user-agent/json')
<Response [200]>
>>> b.json  # alias for b.response.json()
{'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:56.0) Gecko/20100101 Firefox/56.0'}

HTTP POST request and print the response

>>> data = {'q': 'python'}
>>> b.post('example.com/search', data=data)
<Response [200]>
>>> b.text
'<!DOCTYPE html>\n<html lang="mul" class="no-js">\n<head>\n<meta charset="utf-8">\n<title>example.com</title>...'

HTTP POST json request and print the json response

>>> import json
>>> data = {'login': 'user', 'password': 'pass'}
>>> b.post('example.com/login', json=json.dumps(data))  # json to send in the body of the request
<Response [200]>
>>> b.json
{'login': 'success'}
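In Requests, the json= keyword accepts a Python object and serializes it itself. Assuming pynav2 forwards json= to Requests unchanged (this page does not confirm that), passing an already serialized string would encode the payload twice, and json=data would be the usual form. The standard-library sketch below only shows what double encoding does to the payload; it makes no request.

```python
import json

data = {'login': 'user', 'password': 'pass'}

once = json.dumps(data)    # a JSON object
twice = json.dumps(once)   # a JSON string that contains escaped JSON

# Decoding a singly encoded body gives the dict back.
decoded_once = json.loads(once)
# Decoding a doubly encoded body gives a str, not a dict.
decoded_twice = json.loads(twice)

print(type(decoded_once).__name__)
print(type(decoded_twice).__name__)
```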

HTTP HEAD request and print response headers

>>> b.head('example.com')
<Response [200]>
>>> b.response.headers
{'Server': 'nginx', 'Content-Type': 'text/html; charset=utf-8', 'Content-Length': '48842', 'Age': '3154', 'Connection': 'keep-alive'}

HTTP PUT request and print the json response

>>> data = {'version': '2.1', 'licence': 'LGPL'}
>>> b.put('example.com/api/about/', data=data)
<Response [200]>
>>> b.json
{'update': 'success'}

HTTP PATCH request and print the json response

>>> data = {'version': '2.1'}
>>> b.patch('example.com/api/about/', data=data)
<Response [200]>
>>> b.json
{'patch': 'success'}

HTTP DELETE request and print the json response

>>> b.delete('example.com/api/user/102')
<Response [200]>
>>> b.json
{'delete': 'success'}

HTTP OPTIONS request and print the json response

>>> b.options('example.com/api/user')
<Response [200]>
>>> b.json
{'options': '...'}

Get all links

>>> b.get('example.com')
<Response [200]>
>>> b.links
['http://example.com/news', 'http://example.com/forum', 'http://example.com/contact']
>>> for link in b.links:
...   print(link)
...
http://example.com/news
http://example.com/forum
http://example.com/contact
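The links in the output above are absolute URLs. A minimal sketch of that idea, collecting href values and resolving them against the page URL, follows. It uses the standard library's html.parser instead of Beautiful Soup so that it runs without dependencies; `LinkCollector` and the sample HTML are made up for illustration and are not pynav2's implementation.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href')
        if href:
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


html = '<a href="/news">News</a> <a href="forum">Forum</a> <a>no href</a>'
collector = LinkCollector('http://example.com/')
collector.feed(html)
print(collector.links)
```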

Filter links

Any BeautifulSoup find_all() parameter can be passed; see the Beautiful Soup documentation

>>> import re
>>> b.get('example.com')
<Response [200]>
>>> b.get_links(text='Python Events')  # regular expression
>>> b.get_links(class_="jump-link")  # no regular expression for class attribute
>>> b.get_links(href="windows")   # regular expression
>>> b.get_links(title=re.compile('success'))  # manual regular expression
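The comments above describe string filters as regular expressions. To illustrate what that means in practice, the sketch below filters a list of URLs with re.search, which matches anywhere in the string. `filter_urls` and the sample list are hypothetical, and whether pynav2 applies search or match semantics is an assumption here.

```python
import re


def filter_urls(urls, pattern):
    """Hypothetical helper: keep URLs in which the pattern matches anywhere."""
    regex = re.compile(pattern)
    return [u for u in urls if regex.search(u)]


urls = [
    'http://example.com/downloads/windows/',
    'http://example.com/downloads/source/',
    'http://example.com/about',
]
print(filter_urls(urls, 'windows'))
print(filter_urls(urls, r'/downloads/\w+/$'))
```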

Get all images

>>> b.get('example.com')
<Response [200]>
>>> b.images
['http://example.com/img/logo.png', 'http://example.com/img/picture.jpg', 'http://there.com/news.gif']

Filter images

Any BeautifulSoup find_all() parameter can be passed; see the Beautiful Soup documentation

>>> b.get('example.com')
<Response [200]>
>>> b.get_images(src='logo')  # regular expression
>>> b.get_images(class_='python-logo')  # no regular expression for class attribute
>>> b.get_images(alt='yth')  # regular expression

Download file

>>> b.verbose=True
>>> b.download('http://example.com/ubuntu-amd64', '/tmp')  # follows redirects and reads the Content-Disposition header to find the filename
downloading ubuntu-18.04.1-desktop-amd64.iso (1.8 GB) to: /tmp/ubuntu-18.04.1-desktop-amd64.iso
download completed in 12 minutes 5 seconds (1.8 GB)
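The download example takes its file name from the Content-Disposition response header. One way to do that with the standard library is sketched below. `guess_filename` is a hypothetical helper, not pynav2's code, and the fallback to the last URL path segment is an assumption. The headers argument is a plain dict here, so the key is case-sensitive in this sketch.

```python
import posixpath
from email.message import Message
from urllib.parse import urlsplit


def guess_filename(headers, url):
    """Hypothetical helper: prefer the Content-Disposition filename,
    else fall back to the last segment of the URL path."""
    msg = Message()
    if 'Content-Disposition' in headers:
        msg['Content-Disposition'] = headers['Content-Disposition']
    name = msg.get_filename()  # None when the header or parameter is absent
    if name:
        return name
    return posixpath.basename(urlsplit(url).path)


headers = {
    'Content-Disposition': 'attachment; filename="ubuntu-18.04.1-desktop-amd64.iso"'
}
print(guess_filename(headers, 'http://example.com/ubuntu-amd64'))
print(guess_filename({}, 'http://example.com/ubuntu-amd64'))
```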

Handle referer

>>> b.handle_referer = True
>>> b.get('somewhere.com')
>>> b.get('example.com')  # request headers will have http://somewhere.com as referer
>>> b.get('there.com')  # request headers will have http://example.com as referer
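The behaviour described in the comments above, where each request carries the previous URL as its referer, can be sketched as follows. `RefererTracker` is a hypothetical class written for illustration; it is not how pynav2 is known to implement handle_referer.

```python
class RefererTracker:
    """Hypothetical helper: remember the last URL and offer it as Referer."""

    def __init__(self):
        self.last_url = None

    def headers_for(self, url):
        headers = {}
        if self.last_url is not None:
            headers['Referer'] = self.last_url
        # The current URL becomes the referer of the next request.
        self.last_url = url
        return headers


tracker = RefererTracker()
print(tracker.headers_for('http://somewhere.com'))
print(tracker.headers_for('http://example.com'))
print(tracker.headers_for('http://there.com'))
```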

Set referer manually

>>> b.referer = 'http://www.here.com'
>>> b.get('example.com')  # request headers will have http://www.here.com as referer

Set user-agent

The useragent module includes a list of user agents:

firefox_windows, chrome_windows, edge_windows, ie_windows, firefox_linux, chrome_linux, safari_mac

Default user-agent is firefox_windows

>>> from pynav2 import useragent
>>> b.user_agent = useragent.firefox_linux
>>> b.get('example.com')  # request headers will have 'Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0' as User-Agent
>>> b.user_agent = 'my_app/v1.0'
>>> b.get('example.com')  # request headers will have my_app/v1.0 as User-Agent 

Set sleep time before a request

>>> b.set_sleep_time(0.5, 1.5)  # before each request, wait a random x seconds, with x between 0.5 and 1.5
>>> b.get('example.com') # wait x seconds before request
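A random delay of this kind can be sketched with random.uniform. `wait_before_request` is a hypothetical function, not part of pynav2; the sleep function is passed in as a parameter so the demonstration below does not actually wait.

```python
import random
import time


def wait_before_request(min_seconds, max_seconds, sleep=time.sleep):
    """Hypothetical helper: sleep for a random duration and return it."""
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


# Pass a no-op sleep so this demonstration returns immediately.
delay = wait_before_request(0.5, 1.5, sleep=lambda seconds: None)
print(0.5 <= delay <= 1.5)
```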

Define request timeout

10 seconds timeout

>>> b.timeout = 10

Close all open TCP sessions

>>> b.get('example1.com')
>>> b.get('example2.com')
>>> b.get('example3.com')
>>> b.session.close()

Set an HTTP proxy for HTTPS requests, for a single request

For SOCKS proxies see Requests documentation

>>> b.get('https://httpbin.org/ip').json()['origin']
111.111.111.111
>>> proxies = {'https':'10.0.0.0:1234'}
>>> b.timeout = 10  # could be useful to wait 10 seconds if proxies are slow
>>> b.get('https://httpbin.org/ip', proxies=proxies).json()['origin']
10.0.0.0

Set an HTTP proxy for HTTPS requests, for all requests

For SOCKS proxies see Requests documentation

>>> b.get('https://httpbin.org/ip').json()['origin']
111.111.111.111
>>> b.proxies = {'https':'10.0.0.0:1234'}
>>> b.timeout = 10  # could be useful to wait 10 seconds if proxies are slow
>>> b.get('https://httpbin.org/ip').json()['origin']
10.0.0.0

Set an HTTP proxy for HTTPS requests, for all requests, and another proxy for a specific domain

For SOCKS proxies see Requests documentation

>>> b.get('https://httpbin.org/ip').json()['origin']
111.111.111.111
>>> b.proxies = {'https':'10.0.0.0:1234', 'https://specific-domain.com' : '10.11.12.13:1234'}
>>> b.timeout = 10  # could be useful to wait 10 seconds if proxies are slow
>>> b.get('https://httpbin.org/ip').json()['origin']
10.0.0.0
>>> b.get('https://specific-domain.com/ip').json()['origin']
10.11.12.13
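When a proxies dictionary has both a scheme key and a scheme://host key, Requests prefers the more specific one. The sketch below is a simplified, standard-library rendering of that lookup order, assuming pynav2 hands the dictionary to Requests unchanged. `pick_proxy` is a hypothetical name and is not a Requests or pynav2 function.

```python
from urllib.parse import urlsplit


def pick_proxy(url, proxies):
    """Hypothetical helper: return the most specific proxy for url, or None."""
    parts = urlsplit(url)
    candidates = [
        '{}://{}'.format(parts.scheme, parts.hostname),  # scheme and host
        parts.scheme,                                    # scheme only
        'all://{}'.format(parts.hostname),               # any scheme, this host
        'all',                                           # catch-all
    ]
    for key in candidates:
        if key in proxies:
            return proxies[key]
    return None


proxies = {
    'https': '10.0.0.0:1234',
    'https://specific-domain.com': '10.11.12.13:1234',
}
print(pick_proxy('https://httpbin.org/ip', proxies))
print(pick_proxy('https://specific-domain.com/ip', proxies))
print(pick_proxy('http://httpbin.org/ip', proxies))
```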

Get the BeautifulSoup instance

After a GET or POST request, Browser.bs (a BeautifulSoup instance) is automatically initialized from b.response.text

See the Beautiful Soup documentation

>>> b.get('example.com')
>>> b.bs.find_all('a')

Get requests objects instances

See Requests documentation

>>> b.get('example.com')
>>> b.session
>>> b.request
>>> b.response

Get browser history

>>> b.get('example1.com')
>>> b.get('example2.com')
>>> b.get('example3.com')
>>> print(b.history)
['example1.com', 'example2.com', 'example3.com']

Disable "InsecureRequestWarning: Unverified HTTPS request is being made"

>>> import urllib3
>>> urllib3.disable_warnings()
>>> b.get('example.com')  # no warnings 
