Find the Best Generation Parameters for your LLM & Dataset

Project description

llmsearch

Conduct a hyperparameter search over the generation parameters of large language models (LLMs). This tool is designed for ML practitioners looking to optimize their sampling strategies to improve model performance. Simply provide a model, dataset, and performance metric; llmsearch handles the rest.

Release blog

Installation

The package works best with python>=3.8.1, torch>=1.1, and transformers>=4.27.4. Hardware stats from pynvml are used to cache the batch size while running the search.

pip install llmsearch[pynvml]

If you are not running in a CUDA environment:

pip install llmsearch
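
If you installed the pynvml extra, you can quickly verify that GPU stats are readable on your machine. This is only a sanity check against the standard pynvml API on an NVIDIA GPU; it is not part of llmsearch's own API, which uses such stats internally to cache a batch size during the search.

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)   # first visible GPU
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)    # total/free/used bytes
print(f"total={mem.total} free={mem.free} used={mem.used}")
pynvml.nvmlShutdown()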

Getting Started

QuickStart

  • llama-3-8b Example (Open in Colab) - A quickstart notebook that shows the basic functionality of llmsearch and walks through how to quickly set up and run a hyperparameter search.

End-to-End Model Examples

  1. GSM8K Example - Shows a GridSearchCV run on the GSM8K dataset using the TheBloke/CapybaraHermes-2.5-Mistral-7B-AWQ model.
  2. Samsum Example - Shows a GridSearchCV run on the samsum dataset using a version of cognitivecomputations/dolphin-2.2.1-mistral-7b fine-tuned on the same dataset. A rough sketch of what such a search looks like follows below.
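
The examples above drive the search through scikit-learn's GridSearchCV, with generation parameters exposed as estimator parameters. The sketch below illustrates that shape with a dummy estimator whose score method is faked; the real wrapper classes and scoring live in llmsearch and its notebooks, so treat everything here as an assumption about the workflow, not the actual API.

import numpy as np
from sklearn.base import BaseEstimator
from sklearn.model_selection import GridSearchCV

class DummyLLMEstimator(BaseEstimator):
    # Stand-in for an llmsearch-style wrapper: generation parameters are
    # estimator parameters, fit() is a no-op, and score() would normally run
    # generation and compute the chosen metric (here it is faked).
    def __init__(self, temperature=1.0, top_k=50, top_p=1.0, max_new_tokens=100):
        self.temperature = temperature
        self.top_k = top_k
        self.top_p = top_p
        self.max_new_tokens = max_new_tokens

    def fit(self, X, y=None):
        return self

    def score(self, X, y=None):
        # Placeholder metric so the example runs end to end; a real wrapper
        # would generate outputs with these parameters and score them.
        return 1.0 / (1.0 + abs(self.temperature - 0.7))

param_grid = {"temperature": [0.1, 0.7, 1.0], "top_k": [10, 50], "top_p": [0.7, 0.95]}
X = np.arange(8).reshape(-1, 1)  # stand-in for prompts
y = np.zeros(8)                  # stand-in for references

search = GridSearchCV(DummyLLMEstimator(), param_grid, cv=2)
search.fit(X, y)
print(search.best_params_, search.best_score_)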

The table below shows metrics resulting from the searches in the e2e examples above. Before / After shows the metric value before and after running the hyperparameter search.

Model: TheBloke/CapybaraHermes-2.5-Mistral-7B-AWQ
Dataset: gsm8k
Before / After: 54.5 / 55.25
Samples: 500
Metric: accuracy
Best Parameters: {'do_sample': True, 'generation_seed': 42, 'max_new_tokens': 500, 'no_repeat_ngram_size': 0, 'stopping_criteria': [<llmsearch.scripts.stopping_criteria.MultiTokenStoppingCriteria object at 0x7f8f9e357c40>], 'top_k': 10, 'top_p': 0.7}
Metric File: metric_file

Model: Praful932/dolphin-2.2.1-mistral-7b-samsum-ft-v1-awq
Dataset: samsum
Before / After: 0.2543 / 0.2564
Samples: 500
Metric: rouge_2
Best Parameters: {'do_sample': True, 'generation_seed': 42, 'max_new_tokens': 70, 'no_repeat_ngram_size': 0, 'stopping_criteria': [<llmsearch.scripts.stopping_criteria.MultiTokenStoppingCriteria object at 0x7f3b38303610>], 'temperature': 0.1, 'top_k': 50}
Metric File: metric_file

Recommendations

  1. Generative Tasks: Searching for generation parameters is generally useful for tasks with variable-length text outputs rather than tasks with constrained, discrete outputs, where the impact of generation parameters is very limited.
  2. Batch Size: The right batch size can significantly influence model performance. Experiment with different sizes to find the setting that balances speed and accuracy without excessive padding.
  3. Stopping Criteria: Use stopping criteria while evaluating models so that the model does not generate tokens endlessly until max_new_tokens is reached. All of the examples in the repo use stopping criteria; a minimal sketch of a custom criterion follows this list.
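
The sketch below shows one way to build such a criterion with the standard transformers StoppingCriteria interface, stopping once a given token-id sequence appears at the end of the output. It assumes single-sequence (batch size 1) generation; llmsearch ships its own MultiTokenStoppingCriteria, which this only approximates.

import torch
from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnTokenSequence(StoppingCriteria):
    # Stop generation once `stop_ids` appears at the end of the generated ids.
    def __init__(self, stop_ids):
        self.stop_ids = torch.tensor(stop_ids)

    def __call__(self, input_ids, scores, **kwargs):
        if input_ids.shape[1] < len(self.stop_ids):
            return False
        tail = input_ids[0, -len(self.stop_ids):]
        return bool(torch.equal(tail.cpu(), self.stop_ids))

# Usage: the stop ids would typically come from tokenizing a stop string,
# e.g. stop_ids = tokenizer("\n\n", add_special_tokens=False)["input_ids"].
# criteria = StoppingCriteriaList([StopOnTokenSequence(stop_ids=[13, 13])])
# outputs = model.generate(**inputs, stopping_criteria=criteria, max_new_tokens=500)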

Reproducibility

  • Achieving consistent results is crucial, especially when comparing different hyperparameter settings.
  • Quantization frameworks such as exllama are designed to significantly enhance inference speed, at the cost of reduced reproducibility. AWQ is another quantization method that delivers high inference speed while maintaining a high degree of reproducibility; the e2e examples use AWQ. For applications where specific generation parameters matter more than absolute reproducibility, exllama may still be preferable.
  • To ensure reproducibility during generation, we've introduced a generation_seed parameter in the transformers model.generate method via monkey patching. It seeds the model's output generation so that results can be consistently replicated across runs. Treating generation_seed as a hyperparameter also lets you tune the stochastic nature of the model's output, providing another layer of control during experiments (see the sketch after this list).
  • Batch size can affect performance during evaluation. Most decoder models use left padding, and the presence of pad tokens can subtly affect the next tokens that are sampled; the effect becomes more pronounced over long sequences. Be sure to evaluate your model at the batch size you intend to use.
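
The snippet below illustrates the idea behind generation_seed using plain transformers and torch.manual_seed: seeding the RNG immediately before a stochastic decode makes it repeatable. llmsearch's actual patch threads the seed through model.generate; this stand-alone version only approximates that behaviour, and gpt2 is just a small stand-in model.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; the e2e examples use larger AWQ-quantized models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("The quick brown fox", return_tensors="pt")

def generate_with_seed(seed, **gen_kwargs):
    # Seeding right before generation makes repeated sampled decodes identical.
    torch.manual_seed(seed)
    return model.generate(**inputs, do_sample=True, max_new_tokens=20, **gen_kwargs)

out_a = generate_with_seed(42, top_k=10, top_p=0.7)
out_b = generate_with_seed(42, top_k=10, top_p=0.7)
assert torch.equal(out_a, out_b)  # same seed -> same sampled tokens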

Monkey Patches

  • llmsearch modifies certain modules of the transformers library to work with scikit-learn, including:
    • Generation Modules - To support additional generation strategies (tfs, top_a) and the generation_seed parameter for reproducibility.
      • transformers.GenerationMixin._get_logits_warper - Older module available at transformers.GenerationMixin._get_logits_warper_old
      • transformers.GenerationConfig.__init__ - Older constructor preserved (analogous to the _old attribute above)
    • Stopping Criteria - Added an attribute to avoid cloning issues during searches (illustrated in the sketch below).
      • StoppingCriteriaList.__sklearn_clone__
    • Generation - Added tfs, top_a, and generation_seed support in model.generate
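
The sketch below shows roughly why a __sklearn_clone__ hook helps: since scikit-learn 1.3, clone() defers to this hook when present, so a StoppingCriteriaList can be returned as-is instead of being deep-copied during a search. This is an illustration of the mechanism under that assumption, not llmsearch's exact patch.

from sklearn.base import clone          # requires scikit-learn >= 1.3
from transformers import StoppingCriteriaList

def _sklearn_clone(self):
    # Return the list itself rather than a deep copy, so criteria holding
    # non-copyable state survive GridSearchCV's estimator cloning.
    return self

StoppingCriteriaList.__sklearn_clone__ = _sklearn_clone

criteria = StoppingCriteriaList()
assert clone(criteria, safe=False) is criteria  # no deep copy performed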

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llmsearch-0.1.0.tar.gz (33.4 kB)

Uploaded Source

Built Distribution

llmsearch-0.1.0-py3-none-any.whl (36.5 kB)

Uploaded Python 3

File details

Details for the file llmsearch-0.1.0.tar.gz.

File metadata

  • Download URL: llmsearch-0.1.0.tar.gz
  • Upload date:
  • Size: 33.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.9.19 Linux/6.5.0-1021-azure

File hashes

Hashes for llmsearch-0.1.0.tar.gz
Algorithm Hash digest
SHA256 bfb9474a307ffbf0154b03cb4e864ac420ffacf9c06cae8314675a8f4d966232
MD5 0ca970b34a8885dfbbad8c353744071c
BLAKE2b-256 c728267ab85983b1babe97c85d7d615d8fe215b324cf0501f976c2250911b857

See more details on using hashes here.

File details

Details for the file llmsearch-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: llmsearch-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 36.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.9.19 Linux/6.5.0-1021-azure

File hashes

Hashes for llmsearch-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 49321160c95a606bacd105d5d004fcdf858f9f46d2584fcdcb8050a9efad8c6d
MD5 b1d820e73f8aa1b1d2cfd80643fa4483
BLAKE2b-256 a904bca18fbf6be5adc9ee1d9258f6870fc721f4770143fd553bd25ac8057f81

See more details on using hashes here.
