A Toolkit for automatic LLM Evaluation.

TAIL

📄 Documentation

See our full documentation at https://yale-nlp.github.io/TAIL/.

💡 Introduction

TAIL is an automatic toolkit for creating realistic evaluation benchmarks and assessing the performance of long-context LLMs. With TAIL, users can customize the construction of a long-context, document-grounded QA benchmark and obtain visualized performance metrics for the evaluated models.

🚀 Quickstart

  1. Install the package from PyPI:

    # (Recommended) Create a new conda environment.
    conda create -n tail python=3.10 -y
    conda activate tail
    
    # Install tailtest
    pip install tailtest
    

    Set your OPENAI_API_KEY:

    export OPENAI_API_KEY="..."
    
  2. Prepare the source document you want to generate the benchmark from, and format it as JSON: [{"text": "Content of your document"}]

  3. Benchmark Generation:

    tail-cli.build --raw_document_path "/data/raw.json" --QA_save_path "/data/QA.json" --document_length 8000 32000 64000 --depth_list 25 50 75
    
  4. Model Evaluation & Testing:

    tail-cli.eval --QA_save_path "/data/QA.json" --test_model_name "gpt-4o" --test_depth_list 25 75 --test_doc_length 8000 32000 --test_result_save_dir "/data/result/"
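
The input file described in step 2 can be produced with a few lines of Python. The following is a minimal sketch, not part of TAIL itself: the helper name `write_raw_documents` and the choice to skip blank texts are assumptions made for illustration. Only the `[{"text": ...}]` layout comes from the quickstart above.

```python
import json
import tempfile
from pathlib import Path


def write_raw_documents(texts, out_path):
    """Write documents as a JSON list of {"text": ...} objects.

    Hypothetical helper (not a TAIL API). Blank or whitespace-only
    texts are skipped, which is an assumption of this sketch.
    """
    records = [{"text": t} for t in texts if t.strip()]
    out_path = Path(out_path)
    # Create the target directory if it does not exist yet.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII characters readable in the file.
        json.dump(records, f, ensure_ascii=False, indent=2)
    return records


# Example: write one document to a temporary location.
demo_path = Path(tempfile.mkdtemp()) / "raw.json"
demo_records = write_raw_documents(["Content of your document"], demo_path)
```

The resulting file path is what you would pass as --raw_document_path in step 3.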
    
