🍌 Open source AI Agent evaluations for web tasks 🍌

Banana-lyzer

Introduction

Banana-lyzer is an open source AI Agent evaluation framework and dataset for web tasks with Playwright. We've created our own evals repo because:

Websites change over time, are affected by latency, and may have anti-bot protections. We need a system that can reliably save and deploy historic/static snapshots of websites.
Standard web practices are loose, and there are many different underlying ways to represent a single website. For an agent to generalize well, we need a diverse dataset of websites across industries and use cases.
We have specific evaluation criteria and agent use cases focusing on structured and direct information retrieval across websites.
There are valuable web task datasets and evaluations that we'd like to unify in a single repo (Mind2Web, WebArena, etc.).
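The README does not say how retrieval answers are scored. As a purely illustrative sketch of a structured, exact-match check, each expected field could be compared against the agent's output. The functions `evaluate_retrieval` and `passed` are hypothetical and not part of Banana-lyzer.

```python
from typing import Any


def evaluate_retrieval(expected: dict[str, Any], actual: dict[str, Any]) -> dict[str, bool]:
    """Hypothetical field-level check: every expected field must be present
    in the agent's output with an exactly equal value."""
    return {key: key in actual and actual[key] == value for key, value in expected.items()}


def passed(expected: dict[str, Any], actual: dict[str, Any]) -> bool:
    # An example passes only if every expected field matches exactly;
    # extra fields in the agent's output are ignored.
    return all(evaluate_retrieval(expected, actual).values())


if __name__ == "__main__":
    expected = {"name": "Banana", "price": "$0.25"}
    actual = {"name": "Banana", "price": "$0.30", "extra": "ignored"}
    print(evaluate_retrieval(expected, actual))  # {'name': True, 'price': False}
    print(passed(expected, actual))  # False
```

Exact matching is strict on purpose; a real evaluator might normalize whitespace or currency formats first, which this sketch leaves out.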

How does it work?

Banana-lyzer is a CLI tool that runs a set of evaluations against a set of websites. It runs each evaluation multiple times and outputs the results to a JSON file. The results can then be used to train an AI agent. The package is separated into two parts, the first being a web server that serves websites. We currently support the following types of websites:

Local static sites:
Remote static sites:
Remote dynamic sites: Typical consumer facing websites today.
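For local static sites, the idea of serving a saved snapshot can be sketched with Python's standard library alone. This assumes snapshots are plain HTML files in a directory; `serve_snapshots` is a hypothetical helper, not the package's own web server.

```python
import functools
import http.server
import threading
from pathlib import Path


def serve_snapshots(directory: Path, port: int = 0) -> http.server.ThreadingHTTPServer:
    """Hypothetical helper: serve saved static snapshots from `directory`
    on localhost in a background thread. Port 0 lets the OS pick a free port."""
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=str(directory))
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    # Daemon thread so the server does not keep the process alive on exit.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    import tempfile
    import urllib.request

    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "index.html").write_text("<h1>snapshot</h1>")
        server = serve_snapshots(Path(tmp))
        port = server.server_address[1]
        with urllib.request.urlopen(f"http://127.0.0.1:{port}/index.html") as resp:
            print(resp.read().decode())
        server.shutdown()
        server.server_close()
```

Binding to port 0 avoids collisions when several evaluations run in parallel; the chosen port is available as `server.server_address[1]`.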

Note that this repo is very much a work in progress.
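The evaluation loop described under "How does it work?" (repeat each evaluation, then write the results to JSON) might look roughly like this. The shape of an evaluation (a zero-argument callable returning pass/fail), the function name `run_repeated` and the JSON layout are all assumptions made for illustration.

```python
import json
from collections.abc import Callable
from pathlib import Path


def run_repeated(
    evals: dict[str, Callable[[], bool]], runs: int, out_path: Path
) -> dict[str, dict[str, float]]:
    """Hypothetical loop: run each evaluation `runs` times and write
    pass counts and pass rates to a JSON file."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    results: dict[str, dict[str, float]] = {}
    for name, run_once in evals.items():
        # Repeating each eval exposes flakiness from latency or page changes.
        passes = sum(1 for _ in range(runs) if run_once())
        results[name] = {"runs": runs, "passes": passes, "pass_rate": passes / runs}
    out_path.write_text(json.dumps(results, indent=2))
    return results


if __name__ == "__main__":
    import itertools

    flaky = itertools.cycle([True, False])
    summary = run_repeated(
        {"always": lambda: True, "flaky": lambda: next(flaky)},
        runs=4,
        out_path=Path("results.json"),
    )
    print(summary["flaky"]["pass_rate"])  # 0.5
```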

Getting Started

Local testing installation

1. pip install ___
2. Implement the agent_runner.py interface and make a banalyzer.py test file.
3. Run bananalyze ./tests/banalyzer.py to run the test suite.
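The agent_runner.py interface itself is not reproduced in this README, so the following is only a hypothetical sketch of the kind of async runner an agent might implement. `AgentRunnerSketch`, `EvalExample` and `EchoAgent` are invented names, and `page` would in practice be a Playwright page object.

```python
import abc
import asyncio
from dataclasses import dataclass
from typing import Any


@dataclass
class EvalExample:
    """Hypothetical description of one evaluation task."""
    url: str
    goal: str


class AgentRunnerSketch(abc.ABC):
    """Hypothetical stand-in for the agent_runner.py interface, not the package's real API."""

    @abc.abstractmethod
    async def run(self, page: Any, example: EvalExample) -> dict[str, Any]:
        """Drive `page` toward `example.goal` and return the extracted data."""


class EchoAgent(AgentRunnerSketch):
    # Toy agent: ignores the page and echoes the task back as its "result".
    async def run(self, page: Any, example: EvalExample) -> dict[str, Any]:
        return {"goal": example.goal, "url": example.url}


if __name__ == "__main__":
    task = EvalExample("http://127.0.0.1:8000", "Find the price")
    print(asyncio.run(EchoAgent().run(page=None, example=task)))
```

Making `run` async matches Playwright's async API, where page interactions are awaited.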

Arguments

-h or --headless: Run Playwright in headless mode
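Because argparse reserves `-h` for `--help` by default, a CLI that uses `-h` for headless mode has to disable the built-in help flag. A minimal sketch, assuming an argparse-based parser (the real tool may parse its options differently, for example through pytest); `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser. add_help=False frees "-h", which argparse
    # otherwise binds to --help.
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--help", action="help", help="show this help message and exit")
    parser.add_argument("-h", "--headless", action="store_true", help="run Playwright in headless mode")
    parser.add_argument("path", help="test file to run, e.g. ./tests/banalyzer.py")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-h", "./tests/banalyzer.py"])
    print(args.headless, args.path)
```

The parsed flag would then typically be passed to Playwright, e.g. `p.chromium.launch(headless=args.headless)`.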

Adding evaluations

To add a snaps

Roadmap

Launch

Functions to serve local MHTML sites
Agent interface required for running the tool
Pytest wrapper to enable CLI testing with additional arguments
Document a majority of the repo

Features

Ability to save
Translate WebArena evals
Translate Mind2Web evals
Lag and bot detection emulation
Updated test visualization with separation of categories and outputs

Dataset updates

15 additional data retrieval evals
15 click evals
15 navigation evals
Tests requiring multi-step navigation
Tests requiring both navigation and data retrieval
Tests requiring pop-up closing
Tests requiring sign-in
Tests requiring captcha solving

Citations

```bibtex
@misc{reworkd2023bananalyzer,
  title        = {Bananalyzer},
  author       = {Asim Shrestha and Adam Watkins and Rohan Pandey and Srijan Subedi},
  year         = {2023},
  howpublished = {GitHub},
  url          = {https://github.com/reworkd/bananalyzer}
}
```

Release history

Version	Release date
0.8.74	Apr 5, 2024
0.8.73	Apr 3, 2024
0.8.72	Mar 19, 2024
0.8.70	Feb 20, 2024
0.8.69	Feb 20, 2024
0.8.68	Feb 19, 2024
0.8.67	Feb 19, 2024
0.8.66	Feb 19, 2024
0.8.65	Feb 16, 2024
0.8.64	Feb 16, 2024
0.8.63	Feb 14, 2024
0.8.62	Feb 11, 2024
0.8.61	Feb 6, 2024
0.8.6	Jan 30, 2024
0.8.5	Jan 19, 2024
0.8.3	Jan 17, 2024
0.8.2	Jan 17, 2024
0.8.1	Jan 17, 2024
0.8.0	Jan 17, 2024
0.7.5	Jan 15, 2024
0.7.4	Jan 15, 2024
0.7.3	Jan 15, 2024
0.7.2	Jan 14, 2024
0.7.1	Jan 13, 2024
0.7.0	Dec 14, 2023
0.6.20	Dec 13, 2023
0.6.19	Dec 11, 2023
0.6.18	Dec 5, 2023
0.6.17	Dec 5, 2023
0.6.16	Dec 5, 2023
0.6.15	Dec 5, 2023
0.6.13	Dec 4, 2023
0.6.12	Dec 4, 2023
0.6.11	Dec 2, 2023
0.6.10	Dec 1, 2023
0.6.9	Nov 30, 2023
0.6.8	Nov 29, 2023
0.6.7	Nov 29, 2023
0.6.6	Nov 29, 2023
0.6.5	Nov 29, 2023
0.6.4	Nov 28, 2023
0.6.3	Nov 28, 2023
0.6.2	Nov 28, 2023
0.6.1	Nov 28, 2023
0.5.7	Nov 28, 2023
0.5.6	Nov 27, 2023
0.5.5	Nov 27, 2023
0.5.4	Nov 24, 2023
0.5.3	Nov 23, 2023
0.5.2	Nov 23, 2023
0.5.1	Nov 22, 2023
0.5.0	Nov 22, 2023
0.3.5	Nov 16, 2023
0.3.4	Nov 16, 2023
0.3.3	Nov 16, 2023
0.3.2	Nov 16, 2023
0.3.1	Nov 15, 2023
0.3.0	Nov 15, 2023
0.2.9	Nov 15, 2023
0.2.8	Nov 15, 2023
0.2.7	Nov 15, 2023
0.2.6	Nov 15, 2023
0.2.5	Nov 15, 2023
0.2.4	Nov 14, 2023
0.2.3	Nov 14, 2023
0.2.2	Nov 13, 2023
0.2.1	Nov 14, 2023
0.2.0	Nov 13, 2023
0.1.9	Nov 13, 2023
0.1.8	Nov 10, 2023
0.1.7	Nov 10, 2023
0.1.6	Nov 10, 2023
0.1.5	Nov 9, 2023
0.1.4	Nov 8, 2023
0.1.3	Nov 8, 2023
0.1.2	Nov 8, 2023
0.1.1	Nov 7, 2023 (this version)
0.1.0	Nov 7, 2023

Download files

Source Distribution

bananalyzer-0.1.1.tar.gz (5.8 MB)

Uploaded Nov 7, 2023 Source

Built Distribution

bananalyzer-0.1.1-py3-none-any.whl (5.9 MB)

Uploaded Nov 7, 2023 Python 3

Hashes for bananalyzer-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`a5cd5e6f199cfef19fa631a5a6f559778c0b77f214a800a3f996cd5bfd49ad29`
MD5	`b72f6e15225a3b360355a35363008cde`
BLAKE2b-256	`cd4830c7884f2dd15be6bba9bc2a20ef2b33a65e647d891f58b642a815ee78a8`

Hashes for bananalyzer-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`16cbf8e98a0b4dbf45e8a284b1da09149c9f96252b089812ca99c5f84e9e62a5`
MD5	`22e0f6b2c888e8a358d10fd335698cb3`
BLAKE2b-256	`209087ba2051d2887e429164c60e7e36cac8d8afc396bc880cfcc9ad74672f5d`
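To check a downloaded file against the SHA256, MD5 and BLAKE2b-256 digests listed in the hash tables, a small standard-library sketch could compute all three in one pass. `file_digests` and `verify_sha256` are hypothetical helper names; BLAKE2b-256 corresponds to `hashlib.blake2b` with a 32-byte digest.

```python
import hashlib
from pathlib import Path


def file_digests(path: Path, chunk_size: int = 1 << 20) -> dict[str, str]:
    """Hypothetical helper: compute SHA256, MD5 and BLAKE2b-256 hex digests
    of a file, reading it in chunks so large archives are not loaded at once."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),  # 32 bytes = 256 bits
    }
    with path.open("rb") as fh:
        while chunk := fh.read(chunk_size):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify_sha256(path: Path, expected: str) -> bool:
    # Compare against the published SHA256 digest (case-insensitive).
    return file_digests(path)["SHA256"] == expected.lower()
```

For example, `verify_sha256(Path("bananalyzer-0.1.1.tar.gz"), "a5cd5e6f199cfef19fa631a5a6f559778c0b77f214a800a3f996cd5bfd49ad29")` should return `True` for an intact source archive, assuming it was downloaded to the current directory. pip can also enforce this at install time with `--hash=sha256:...` entries in a requirements file.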