9 projects
llm2vec-gen
LLM2Vec-Gen: Generative Embeddings from Large Language Models
agent-as-annotators
Agent-as-Annotators: Structured Distillation of Web Agent Capabilities
instruct-qa
Empirical evaluation of retrieval-augmented instruction-following models
weblinx
The official WebLINX library
webllama
Llama-powered agents for automatic web browsing
statcan-dialogue-dataset
The Statcan Dialogue Dataset
weblinx-browsergym
BrowserGym integration for the WebLINX benchmark
safearena
SafeArena is a benchmark for agent safety
agent-reward-bench
Official library for AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories