
A toolkit for automatic LLM evaluation.


TAIL

📄 Documentation

See our full documentation at https://yale-nlp.github.io/TAIL/.

💡 Introduction

TAIL is an automatic toolkit for creating realistic evaluation benchmarks and assessing the performance of long-context LLMs. With TAIL, users can build a customized long-context, document-grounded QA benchmark from their own documents and obtain visualized performance metrics for the models they evaluate.

🚀 Quickstart

  1. Install the package from PyPI:

    # (Recommended) Create a new conda environment.
    conda create -n tail python=3.10 -y
    conda activate tail
    
    # Install tailtest
    pip install tailtest
    

    Set your OPENAI_API_KEY:

    export OPENAI_API_KEY="..."
    
  2. Prepare the source document you want to use to generate the benchmark, organized as a JSON file in this format: [{"text": "Content of your document"}]

  3. Benchmark Generation:

    tail-cli.build --raw_document_path "/data/raw.json" --QA_save_path "/data/QA.json" --document_length 8000 32000 64000 --depth_list 25 50 75
    
  4. Model Evaluation & Testing:

    tail-cli.eval --QA_save_path "/data/QA.json" --test_model_name "gpt-4o" --test_depth_list 25 75 --test_doc_length 8000 32000 --test_result_save_dir /data/result/
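Step 2 does not say how to produce the JSON file. One option, sketched here under our own assumptions, is a small shell function that wraps plain-text files into that list using Python's standard `json` module (Python is already available in the conda environment from step 1). `make_raw_json` is a hypothetical helper, not part of tailtest. Writing one `{"text": ...}` entry per input file assumes the toolkit accepts a list with several documents; pass a single file if in doubt.

```shell
# Hypothetical helper, not part of tailtest: wraps one or more plain-text
# files into the [{"text": "..."}] list described in step 2.
# Usage: make_raw_json OUTPUT.json INPUT1.txt [INPUT2.txt ...]
make_raw_json() {
  if [ "$#" -lt 2 ]; then
    echo "usage: make_raw_json OUTPUT.json INPUT.txt [INPUT.txt ...]" >&2
    return 1
  fi
  # Python's json module handles escaping of quotes, newlines and
  # non-ASCII text; each input file becomes one {"text": ...} entry.
  python3 - "$@" <<'PY'
import json
import sys

out_path, *inputs = sys.argv[1:]
docs = []
for path in inputs:
    with open(path, encoding="utf-8") as f:
        docs.append({"text": f.read()})

with open(out_path, "w", encoding="utf-8") as f:
    json.dump(docs, f, ensure_ascii=False)
PY
}
```

For example, `make_raw_json /data/raw.json my_document.txt` writes a file that can then be passed as `--raw_document_path "/data/raw.json"` in step 3 (`my_document.txt` is a placeholder name). Letting `json.dump` do the escaping avoids hand-escaping quotes and newlines inside long documents.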
    

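Both CLI commands in steps 3 and 4 rely on the API key from step 1, and the build in step 3 also relies on the JSON file from step 2, so a quick check before launching a long build can save a failed run. The function below is a hypothetical sketch, not a tailtest command. It only checks that `OPENAI_API_KEY` is non-empty and that the file is a non-empty JSON list of objects with a string `"text"` field, which is our reading of the format in step 2; requiring at least one entry is our own assumption.

```shell
# Hypothetical pre-flight check, not part of tailtest: run it before
# tail-cli.build to confirm OPENAI_API_KEY is set and the raw document
# file has the [{"text": "..."}] shape from step 2.
# Usage: tail_preflight /data/raw.json
tail_preflight() {
  raw="${1:-}"
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set (see step 1)" >&2
    return 1
  fi
  if [ ! -f "$raw" ]; then
    echo "raw document file not found: $raw" >&2
    return 1
  fi
  # Exit status is 0 only if the file parses as a non-empty JSON list
  # whose items are objects with a string "text" field.
  python3 - "$raw" <<'PY'
import json
import sys

try:
    with open(sys.argv[1], encoding="utf-8") as f:
        data = json.load(f)
except (OSError, ValueError) as exc:
    sys.exit(f"invalid JSON in {sys.argv[1]}: {exc}")

ok = isinstance(data, list) and len(data) > 0 and all(
    isinstance(item, dict) and isinstance(item.get("text"), str)
    for item in data
)
if not ok:
    sys.exit('expected a non-empty list like [{"text": "..."}]')
PY
}
```

A typical use is to put `tail_preflight /data/raw.json &&` in front of the `tail-cli.build` command from step 3, so the build only starts when the check passes.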


Download files


Source Distribution

tail_test-0.0.2.tar.gz (8.0 kB)

Uploaded Source

Built Distribution

tail_test-0.0.2-py3-none-any.whl (9.0 kB)

Uploaded Python 3

File details

Details for the file tail_test-0.0.2.tar.gz.

File metadata

  • Download URL: tail_test-0.0.2.tar.gz
  • Upload date:
  • Size: 8.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.14

File hashes

Hashes for tail_test-0.0.2.tar.gz
Algorithm    Hash digest
SHA256       761a01e7e55a77068f758ffa786d2f1d218965608de40abcc26e6a90afdccf15
MD5          c8da67b3951bfea99e946c4ba6051949
BLAKE2b-256  51f5b76eeda14b34721328c1a254057c5e67201b13f4da5837d70f33c3039883


File details

Details for the file tail_test-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: tail_test-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 9.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.14

File hashes

Hashes for tail_test-0.0.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       7482a47dc0f3f59a8c4713d02dc62029230b5e959ee3e55701db1fae5e279c46
MD5          8445c2e4b9adf73f121cb197a9057e5a
BLAKE2b-256  b8ba9d89ed2c87e5bb5242b545df10c0a1a4de3c8e2df81de923e21e957e3b7a

