A Toolkit for automatic LLM Evaluation.

TAIL

📄 Documentation

See our full documentation at https://yale-nlp.github.io/TAIL/.

💡 Introduction

TAIL is a toolkit that automatically creates realistic evaluation benchmarks and assesses the performance of long-context LLMs. With TAIL, users can customize how a long-context, document-grounded QA benchmark is built and obtain visualized performance metrics for the evaluated models.

🚀 Quickstart

  1. Install the package from PyPI:

    # (Recommended) Create a new conda environment.
    conda create -n tail python=3.10 -y
    conda activate tail
    
    # Install tailtest
    pip install tailtest
    

    Set your OPENAI_API_KEY:

    export OPENAI_API_KEY="..."
    
  2. Prepare the source document you want to generate the benchmark from, and format it as JSON: [{"text": "Content of your document"}]

  3. Benchmark Generation:

    tail-cli.build --raw_document_path "/data/raw.json" --QA_save_path "/data/QA.json" --document_length 8000 32000 64000 --depth_list 25 50 75
    
  4. Model Evaluation & Testing:

    tail-cli.eval --QA_save_path "/data/QA.json" --test_model_name "gpt-4o" --test_depth_list 25 75 --test_doc_length 8000 32000 --test_result_save_dir "/data/result/"
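
The source file described in step 2 can be produced with a short script. The following is a minimal sketch and not part of TAIL: the helper name `write_source_documents` and the choice to skip empty strings are illustrative assumptions. Only the `[{"text": ...}]` layout comes from the steps above.

```python
import json
from pathlib import Path


def write_source_documents(texts, out_path):
    """Write texts as a JSON list of {"text": ...} objects (hypothetical helper)."""
    # Assumption: blank or whitespace-only documents are not useful, so skip them.
    records = [{"text": t} for t in texts if t.strip()]
    if not records:
        raise ValueError("no non-empty documents to write")

    out = Path(out_path)
    # Create the parent directory if it does not exist yet.
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(
        json.dumps(records, ensure_ascii=False, indent=2),
        encoding="utf-8",
    )
    return records
```

For example, `write_source_documents(["Content of your document"], "/data/raw.json")` would write a file that can then be passed to `--raw_document_path` in step 3.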
    
