
The official weblinx library

Project description

Intro

Welcome to WebLINX's official repository! In addition to providing code used to train the models reported in our WebLINX paper, we also provide a comprehensive Python library (aka API) to help you work with the WebLINX dataset.

If you want to get started with weblinx, please check out the following places:

🌐 Website If you want a quick overview of the project, this is the best place to start.
📓 Colab Eager to try it out? Start by running this colab notebook!
🗄️ Docs You can find quickstart instructions, the official user guide, and all relevant API specifications in the docs.
📄 Paper If you want to go more in-depth, please read our paper, which provides a comprehensive description of the project and reports the relevant results.
🤗 Dataset The official dataset page, where you can download the preprocessed dataset and follow the instructions to get started.

If you want to learn more about the codebase itself, please keep on reading!

Installation

# Install the base package
pip install weblinx

# Install all dependencies
pip install weblinx[all]

# Install specific dependencies for...
# ...processing HTML 🖥️
pip install weblinx[processing]
# ...video processing 📽️
pip install weblinx[video]
# ...evaluating models 🔬
pip install weblinx[eval]
# ...development of this library 🛠️
pip install weblinx[dev]
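
After running one of the commands above, you may want to confirm from Python that the package is actually visible to your interpreter. The following is a minimal sketch that uses only the standard library; the helper name `installed_version` is hypothetical and is not part of weblinx.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Look up the distribution named "weblinx" in the current environment.
version = installed_version("weblinx")
if version is None:
    print("weblinx is not installed; run `pip install weblinx` first")
else:
    print(f"weblinx {version} is installed")
```

This only checks the base distribution; it does not tell you which optional extras (processing, video, eval, dev) were installed alongside it.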

Structure

This repository is structured in the following way:

Module Description
weblinx The __init__.py provides abstractions that give a Pythonic experience when working with the dataset. For example, you can use weblinx.Demonstration to manipulate a demonstration at a high level, weblinx.Replay to focus on the fine-grained details of a demonstration, including iterating over turns, or weblinx.Turn to focus on a specific turn. All relevant information is included in the documentation!
weblinx.eval Code for evaluating action models trained with WebLINX. It provides importable functions and metrics, and can also be run from the command line.
weblinx.processing Code for processing the various inputs and outputs used by the models. It is used extensively in the models' processing code.
weblinx.utils Miscellaneous utility functions used across the codebase.
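
To make the relationship between weblinx.Demonstration, weblinx.Replay and weblinx.Turn more concrete, here is a rough sketch of loading one demonstration and collecting its turns. The class names come from the table above; the constructor arguments (`base_dir=`), the `Replay.from_demonstration` factory, the helper name `load_turns` and the directory layout are assumptions, so please check them against the docs before relying on them.

```python
from pathlib import Path
from typing import Any, List


def load_turns(demo_name: str, base_dir: Path) -> List[Any]:
    """Load one demonstration and return its turns as a list.

    Assumes the dataset has already been downloaded and that `base_dir`
    is the directory containing the demonstration folders.
    """
    # Imported lazily so this sketch can be defined without weblinx installed.
    import weblinx as wl

    # High-level handle on a single demonstration (assumed constructor arguments).
    demo = wl.Demonstration(demo_name, base_dir=base_dir)

    # Finer-grained view of the same demonstration (assumed factory method).
    replay = wl.Replay.from_demonstration(demo)

    # Iterating over the replay is assumed to yield one weblinx.Turn per turn.
    return [turn for turn in replay]
```

With a local copy of the dataset, this could be called as `load_turns("<demo name>", Path("./wl_data/demonstrations"))`, where both the demonstration name and the path are placeholders for your own setup.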

Modeling

Our modeling/ repo-level directory has code for processing, training and evaluating the models reported in the paper (DMR, LLaMA, MindAct, Pix2Act, Flan-T5). It is separate from the weblinx library, which focuses on data processing and evaluation, but it requires the library as a dependency. To use it, clone this repository; we recommend editing the files in modeling/ directly for your own needs. You can set up the modeling code by running:

# First, install the base package
pip install weblinx

# Then, clone this repo
git clone https://github.com/McGill-NLP/weblinx
cd weblinx/modeling

For the rest of the instructions, please take a look at the modeling README.

Evaluation

To install packages necessary for evaluation, run:

pip install weblinx[eval]

You can now access the evaluation module by importing in Python:

import weblinx.eval

Use weblinx.eval.metrics for evaluation metrics and weblinx.eval.__init__ for useful evaluation-related functions. You may also find it useful to look at weblinx.processing.outputs to get an idea of how to use the outputs of the model for evaluation.

To see the available options for running the automatic evaluation from the command line, use the following command:

python -m weblinx.eval --help
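
If you drive the evaluation from a Python script rather than a shell, the same command can be launched with the standard library. The sketch below only wraps the `--help` invocation shown above and does not assume any other command-line flags; the helper names `eval_help_command` and `show_eval_help` are hypothetical.

```python
import importlib.util
import subprocess
import sys
from typing import List


def eval_help_command() -> List[str]:
    """Build the argv for `python -m weblinx.eval --help` using the current interpreter."""
    return [sys.executable, "-m", "weblinx.eval", "--help"]


def show_eval_help() -> int:
    """Run the help command; return its exit code, or -1 if weblinx is not installed."""
    # find_spec returns None when the top-level package cannot be found.
    if importlib.util.find_spec("weblinx") is None:
        print("weblinx is not installed; run `pip install weblinx[eval]` first")
        return -1
    completed = subprocess.run(eval_help_command(), check=False)
    return completed.returncode
```

Calling `show_eval_help()` prints the help text of the evaluation module when the package is installed. Using `sys.executable` keeps the subprocess in the same environment as the calling script.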

For more examples on how to use weblinx.eval, take a look at the modeling README.

Note: We are still working on the code for weblinx.eval and weblinx.processing.outputs. If you have any questions or would like to contribute docs, please feel free to open an issue or a pull request.

Citations

If you use this library, please cite our work using the following:

@misc{lù2024weblinx,
      title={WebLINX: Real-World Website Navigation with Multi-Turn Dialogue}, 
      author={Xing Han Lù and Zdeněk Kasner and Siva Reddy},
      year={2024},
      eprint={2402.05930},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

License

This project's license can be found at LICENSE. Please note that the data in tests/data follows the license of the official dataset, not the license of this repository. The official dataset's license can be found on the official dataset page. The licenses of models trained using this repo might also differ; please find them in the respective model cards.
