Abstractions of web interactions
Web Traversal Library
The Web Traversal Library (WTL) is a Python library for abstracting web interactions on top of a base execution layer such as Selenium.
Installation
Run `pip install webtraversallibrary`. That's it.
Usage example
Glossary
You will find more information in the API docs. As a high-level overview, common terms in the documentation are:
- Workflow: The main orchestrating class, which handles the main "event loop". The term "schema" is sometimes used for the specification of a particular workflow.
- View: A static snapshot of the current website in a tab, with metadata about the page and its elements, possibly augmented by ML classifiers.
- Policy: WTL is based on principles of reinforcement learning, where a policy is a function from the current state (here, the snapshots of the currently open tabs) to a set of actions.
- Classifier: Classifiers, along with `preload_callbacks` and `postload_callbacks`, are arbitrary code executed on each workflow iteration. A classifier takes a set of elements and returns either a subset or a mapping from elements to numeric scores.
- Config: A helper class containing string, numeric, or boolean values for a number of configurations related to WTL. Some are pregrouped under umbrella names, such as `desktop` (run as a desktop browser; the default is mobile emulation), but all values can be set arbitrarily. See the documentation for the `Config` class for more information.
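To make the classifier and policy terms concrete, here is a minimal sketch in plain Python. It deliberately does not use the WTL API: `Element`, `is_link`, `text_length_score`, and `click_first_link_policy` are hypothetical names invented for this illustration, and an action is represented as a simple tuple. See the API docs for the real classes and signatures.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Element:
    """Hypothetical stand-in for a page element in a snapshot (not a WTL class)."""
    tag: str
    text: str


def is_link(elements: List[Element]) -> List[Element]:
    """Classifier that returns a subset: keep only anchor elements."""
    return [e for e in elements if e.tag == "a"]


def text_length_score(elements: List[Element]) -> Dict[Element, float]:
    """Classifier that returns a mapping from each element to a numeric score."""
    return {e: float(len(e.text)) for e in elements}


def click_first_link_policy(elements: List[Element]) -> Optional[Tuple[str, Element]]:
    """Policy: map the current state to an action, here a ("click", element) tuple."""
    links = is_link(elements)
    return ("click", links[0]) if links else None


# A toy "snapshot" of a page with three elements.
page = [Element("h1", "Welcome"), Element("a", "Next page"), Element("a", "Help")]
action = click_first_link_policy(page)
scores = text_length_score(page)
```

The two classifiers show both return shapes mentioned in the glossary (a subset and a score mapping), and the policy consumes classifier output to choose an action.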
Getting started
See the documentation at webtraversallibrary.readthedocs.io!
Also watch "Machine Learning to Auto-Navigate Websites" given at PyCon SE 2020 for an introduction and examples.
General architecture
The flow in a workflow is as follows:
1. Initialize the workflow with the given config
2. Navigate to the given URLs
3. Snapshot the pages
4. Run all classifiers
5. Check if the goal is fulfilled; if so, exit
6. Call the policy with the current view(s)
7. Execute the returned action(s)
8. Go to step 3
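The flow above can be sketched as a plain Python loop. This is an illustration of the control flow only, under stated assumptions: `run_workflow`, `snapshot`, `mark_checkout`, and the toy `SITE` dictionary are hypothetical and are not part of WTL, and an action is simplified to "the next URL to visit".

```python
from typing import Callable, Dict, List, Optional

# All names below are hypothetical stand-ins, not the WTL API.
View = Dict[str, object]   # a snapshot: URL plus whatever classifiers add
Action = str               # simplified: an action is the next URL to visit


def run_workflow(
    start_url: str,
    snapshot: Callable[[str], View],
    classifiers: List[Callable[[View], None]],
    goal: Callable[[View], bool],
    policy: Callable[[View], Optional[Action]],
    max_iterations: int = 10,
) -> List[str]:
    """Run the snapshot / classify / goal check / policy / execute loop."""
    visited: List[str] = []
    url = start_url                      # initialize and navigate
    for _ in range(max_iterations):
        view = snapshot(url)             # snapshot the page
        visited.append(url)
        for classify in classifiers:     # run all classifiers
            classify(view)
        if goal(view):                   # exit once the goal is fulfilled
            break
        action = policy(view)            # call the policy with the current view
        if action is None:               # the policy has nothing left to do
            break
        url = action                     # execute the action, then loop again
    return visited


# A toy "website": each page links to at most one other page.
SITE: Dict[str, Optional[str]] = {
    "/home": "/products",
    "/products": "/checkout",
    "/checkout": None,
}


def snapshot(url: str) -> View:
    return {"url": url, "next": SITE[url]}


def mark_checkout(view: View) -> None:
    # A classifier here annotates the view in place.
    view["is_checkout"] = view["url"] == "/checkout"


visited = run_workflow(
    "/home",
    snapshot,
    [mark_checkout],
    goal=lambda v: bool(v["is_checkout"]),
    policy=lambda v: v["next"],
)
```

The `max_iterations` guard is an addition of this sketch so that a policy that never reaches the goal cannot loop forever.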
For more examples and usage, please run `make docs` and look at `docs/build/html/index.html`.
Development setup
All development requirements are in requirements.txt. Install the packages with pip. Helper commands are available to create and update a virtual environment: `make env-create` and `make env-update`.
To lint the JavaScript files (not required unless you're editing them), you need `jshint`, which is available from npm.
How to contribute
See our guide on contributing.
Release History
See our changelog.
License
Copyright © 2020 Klarna Bank AB
For license details, see the LICENSE file in the root of this project.