A toolkit for automatic LLM evaluation.

TAIL

📄 Documentation

See our full documentation at https://yale-nlp.github.io/TAIL/.

💡 Introduction

TAIL is an automatic toolkit for creating realistic evaluation benchmarks and assessing the performance of long-context LLMs. With TAIL, users can build a customized long-context, document-grounded QA benchmark and obtain visualized performance metrics for the models they evaluate.

🚀 Quickstart

  1. Install the package from PyPI:

    # (Recommended) Create a new conda environment.
    conda create -n tail python=3.10 -y
    conda activate tail
    
    # Install tailtest
    pip install tailtest
    

    Set your OPENAI_API_KEY:

    export OPENAI_API_KEY="..."
    
  2. Prepare the source document you want to generate the benchmark from, and save it as a JSON file in this format: [{"text": "Content of your document"}]

  3. Benchmark Generation:

    tail-cli.build --raw_document_path "/data/raw.json" --QA_save_path "/data/QA.json" --document_length 8000 32000 64000 --depth_list 25 50 75
    
  4. Model Evaluation & Testing:

    tail-cli.eval --QA_save_path "/data/QA.json" --test_model_name "gpt-4o" --test_depth_list 25 75 --test_doc_length 8000 32000 --test_result_save_dir "/data/result/"
    
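Step 2 only shows the target shape of the input file. Writing that JSON by hand is error-prone for real documents, because quotes, backslashes and line breaks inside the text must be escaped. The following is a minimal sketch, not part of tailtest, that wraps a plain-text file into the one-element list shown in step 2. It assumes python3 is on PATH (it is inside the conda environment from step 1); the function name make_tail_input and any file paths are hypothetical.

```shell
# Hypothetical helper (not part of tailtest): wrap a plain-text file into the
# input shape shown in step 2, i.e. [{"text": "<file contents>"}].
# Assumes python3 is available, e.g. inside the "tail" conda environment.
make_tail_input() {
  local src="$1"   # plain-text source document
  local out="$2"   # JSON file to pass to tail-cli.build via --raw_document_path

  python3 - "$src" "$out" <<'PY'
import json
import sys

src, out = sys.argv[1], sys.argv[2]
with open(src, encoding="utf-8") as f:
    text = f.read()

# json.dump escapes quotes, backslashes and newlines inside the document text.
with open(out, "w", encoding="utf-8") as f:
    json.dump([{"text": text}], f, ensure_ascii=False)
PY
}
```

For example, make_tail_input report.txt /data/raw.json writes a file that can be passed as --raw_document_path in step 3.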

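Steps 3 and 4 presumably call the OpenAI API (the quickstart asks for OPENAI_API_KEY) and may take a while, so it can be worth failing fast on simple setup mistakes. The function below is a hypothetical pre-flight check, not something tailtest provides. It assumes the requirements are exactly those listed in this quickstart: an exported OPENAI_API_KEY and a JSON list of objects with a string "text" field. It also creates the result directory in advance; TAIL's own validation may differ.

```shell
# Hypothetical pre-flight check (not part of tailtest) to run before
# tail-cli.build / tail-cli.eval. Assumes python3 is on PATH.
#   $1: raw document JSON (the --raw_document_path value)
#   $2: result directory   (the --test_result_save_dir value)
tail_preflight() {
  local raw_json="$1"
  local result_dir="$2"

  # The quickstart asks for OPENAI_API_KEY to be exported.
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "tail_preflight: OPENAI_API_KEY is not set" >&2
    return 1
  fi

  # Expect a non-empty JSON list whose items are objects with a string "text".
  # A missing file or malformed JSON also makes python3 exit non-zero.
  if ! python3 - "$raw_json" <<'PY'
import json
import sys

with open(sys.argv[1], encoding="utf-8") as f:
    docs = json.load(f)

ok = isinstance(docs, list) and len(docs) > 0 and all(
    isinstance(d, dict) and isinstance(d.get("text"), str) for d in docs
)
sys.exit(0 if ok else 1)
PY
  then
    echo "tail_preflight: unexpected format in $raw_json" >&2
    return 1
  fi

  # Create the output directory up front.
  mkdir -p "$result_dir"
}
```

Running tail_preflight /data/raw.json /data/result/ and chaining the step 3 and step 4 commands after it with && stops the run before any API calls if either check fails.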
Download files

Source Distribution

tail_test-0.1.2.tar.gz (9.0 kB)

Built Distribution

tail_test-0.1.2-py3-none-any.whl (9.8 kB)

File details

Details for the file tail_test-0.1.2.tar.gz.

File metadata

  • Download URL: tail_test-0.1.2.tar.gz
  • Upload date:
  • Size: 9.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.2

File hashes

Hashes for tail_test-0.1.2.tar.gz
Algorithm    Hash digest
SHA256       e1ea54782c5d6382bf4aa4d59d98ac57018efe5c23fb14c013dfad1811c1ffd3
MD5          2282a0da27edd29ef2232bde70c5be47
BLAKE2b-256  524e764ab41a0674c6de2730887832107faad631613d175f6fcb7cfef594f670

File details

Details for the file tail_test-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: tail_test-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 9.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.2

File hashes

Hashes for tail_test-0.1.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       1a59b550fc4a2d7e751677c41ce66366d9046ccf5bb2cd48d17438c262d1a9a4
MD5          22e44c47b70776e176488d14db13b534
BLAKE2b-256  560a2586ca6d29d316fa1461d738061070ad68d4df1146bdac27b5f1353501f6
