A Toolkit for automatic LLM Evaluation.

Project description

TAIL

📄 Documentation

See our full documentation at https://yale-nlp.github.io/TAIL/.

💡 Introduction

TAIL is an automatic toolkit for creating realistic evaluation benchmarks and assessing the performance of long-context LLMs. With TAIL, users can customize the building of a long-context, document-grounded QA benchmark and obtain visualized performance metrics of evaluated models.

🚀 Quickstart

  1. Install the package from PyPI:

    # (Recommended) Create a new conda environment.
    conda create -n tail python=3.10 -y
    conda activate tail
    
    # Install tailtest
    pip install tailtest
    

    Set your OPENAI_API_KEY:

    export OPENAI_API_KEY="..."
    
  2. Prepare the source document you want to build the benchmark from and save it as JSON in the following format: [{"text": "Content of your document"}]

  3. Benchmark Generation:

    tail-cli.build --raw_document_path "/data/raw.json" --QA_save_path "/data/QA.json" --document_length 8000 32000 64000 --depth_list 25 50 75
    
  4. Model Evaluation & Testing:

    tail-cli.eval --QA_save_path "/data/QA.json" --test_model_name "gpt-4o" --test_depth_list 25 75 --test_doc_length 8000 32000 --test_result_save_dir "/data/result/"
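
Step 2 expects a JSON list of objects with a `text` field. A minimal Python sketch for producing such a file follows; the helper name `write_raw_document` and the sample path are illustrative assumptions, not part of TAIL, and whether TAIL accepts more than one entry in the list should be checked against the documentation.

```python
import json
from pathlib import Path


def write_raw_document(texts, path):
    """Write documents in the [{"text": ...}] layout shown in step 2.

    Hypothetical helper for illustration; not part of the tailtest package.
    """
    records = [{"text": text} for text in texts]
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps non-ASCII characters readable in the file.
    out.write_text(json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8")
    return out


if __name__ == "__main__":
    # Illustrative path; pass the same file to --raw_document_path.
    print(write_raw_document(["Content of your document"], "data/raw.json"))
```

Writing the file through `json.dumps` rather than by hand avoids quoting and escaping mistakes when the document text itself contains quotes or newlines.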
    

Download files

Source Distribution

tail_test-0.0.5.tar.gz (9.1 kB)

Uploaded Source

Built Distribution

tail_test-0.0.5-py3-none-any.whl (10.0 kB)

Uploaded Python 3

File details

Details for the file tail_test-0.0.5.tar.gz.

File metadata

  • Download URL: tail_test-0.0.5.tar.gz
  • Size: 9.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.2

File hashes

Hashes for tail_test-0.0.5.tar.gz
Algorithm Hash digest
SHA256 7622c75e272d739c453086d7b029cbc803d9912f856a1390fa565d73b8b84fe3
MD5 0844346ca4b91b218a230bf0bce97ad7
BLAKE2b-256 b7db58ba8e697d0fcddcdaf877052966234030b870866f82c04bae7d34353c53
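
The SHA256 digest listed above can be used to check a downloaded archive before installing it. The sketch below uses only Python's standard `hashlib`; the function name `sha256_of`, the script name in the usage comment, and the local file path are assumptions for illustration.

```python
import hashlib
import sys

# SHA256 digest listed above for tail_test-0.0.5.tar.gz.
EXPECTED_SHA256 = "7622c75e272d739c453086d7b029cbc803d9912f856a1390fa565d73b8b84fe3"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Usage (hypothetical script name): python verify_hash.py tail_test-0.0.5.tar.gz
    actual = sha256_of(sys.argv[1])
    print("OK" if actual == EXPECTED_SHA256 else f"MISMATCH: {actual}")
```

The same function can be used for the wheel by swapping in its listed SHA256 digest.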


File details

Details for the file tail_test-0.0.5-py3-none-any.whl.

File metadata

  • Download URL: tail_test-0.0.5-py3-none-any.whl
  • Size: 10.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.2

File hashes

Hashes for tail_test-0.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 02e96793a3231bc520786d2572ebef2bb88c154bdb40e388a32fa83fe0206ab8
MD5 807a1ce383a05fe95f67131bee978044
BLAKE2b-256 b0db6ebaf1451ca6ee8f19fa88dbe5680c49e85ccf11731f3da690357fcec88e

