Standalone HFLM

This is a standalone version of the `HFLM` class from EleutherAI's Language Model Evaluation Harness. The goal is to support the Evaluation Harness's `LM` interface with minimal dependencies.
The Interface

```python
import abc
from typing import List, Tuple


class LM(abc.ABC):
    @abc.abstractmethod
    def loglikelihood(self, requests) -> List[Tuple[float, bool]]:
        pass

    @abc.abstractmethod
    def loglikelihood_rolling(self, requests) -> List[Tuple[float]]:
        pass

    @abc.abstractmethod
    def generate_until(self, requests) -> List[str]:
        pass
```

The type of `requests` depends on which of these methods you call.
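To make the contract concrete, here is a minimal sketch of a subclass, using the request shapes visible in the usage examples below: `(context, continuation)` pairs for `loglikelihood`, plain strings for `loglikelihood_rolling`, and `(context, generation_kwargs)` pairs for `generate_until`. `DummyLM` and its constant return values are illustrative placeholders, not real model output.

```python
import abc
from typing import List, Tuple


# The LM interface, repeated here so the sketch is self-contained.
class LM(abc.ABC):
    @abc.abstractmethod
    def loglikelihood(self, requests) -> List[Tuple[float, bool]]:
        pass

    @abc.abstractmethod
    def loglikelihood_rolling(self, requests) -> List[Tuple[float]]:
        pass

    @abc.abstractmethod
    def generate_until(self, requests) -> List[str]:
        pass


class DummyLM(LM):
    """Illustrative only: returns fixed scores instead of running a model."""

    def loglikelihood(self, requests):
        # requests: [(context, continuation), ...]
        # -> one (logprob, is_greedy) pair per request
        return [(-1.0, False) for _context, _continuation in requests]

    def loglikelihood_rolling(self, requests):
        # requests: [string, ...] -> one summed logprob per string
        return [-1.0 for _s in requests]

    def generate_until(self, requests):
        # requests: [(context, generation_kwargs), ...] -> one string each
        return ["" for _context, _kwargs in requests]


lm = DummyLM()
print(lm.loglikelihood([("a", "b")]))
print(lm.loglikelihood_rolling(["hello"]))
print(lm.generate_until([("Hi", {"max_new_tokens": 16})]))
```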
Using the Model

Example usage. All three `LM` methods accept a `disable_tqdm` argument.
In [1]: from huggingface_model import HFLM
In [2]: m = HFLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct", device="cuda", batch_size=16)
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:02<00:00, 1.69it/s]
In [3]: m.loglikelihood_rolling(["This is a test."])
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.52it/s]
Out[3]: [-35.796539306640625]
In [4]: m.loglikelihood_rolling(["This is a test."], disable_tqdm=True)
Out[4]: [-35.796539306640625]
In [5]: m.loglikelihood([("a"*n, "b"*n) for n in range(2,4)], disable_tqdm=False)
Running loglikelihood requests: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 67.60it/s]
Out[5]: [(-7.311624526977539, False), (-8.69050407409668, False)]
In [6]: m.generate_until([("Who are you?", {'max_new_tokens':16}), ("What do you want?", {})])
Running generate_until requests: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00, 1.21it/s]
Out[6]:
[' What do you want?"\n"I am a messenger from the Lord of the Elements,"',
"'\n'You know what I want,' he said, his voice low and menacing"]
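Since `loglikelihood_rolling` returns the summed log-probability of the whole string, a common downstream use is converting it to perplexity: negate, divide by the token count, and exponentiate. A small sketch, where the token count of 6 is a made-up figure for illustration, not the actual tokenization of "This is a test.":

```python
import math


def perplexity(total_logprob: float, num_tokens: int) -> float:
    """Perplexity from a summed (rolling) log-likelihood."""
    return math.exp(-total_logprob / num_tokens)


# Summed logprob from the session above; the token count is hypothetical.
print(perplexity(-35.796539306640625, 6))
```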
Download files

Download the file for your platform.

Source Distribution
- hflm-0.0.1.tar.gz (16.6 kB)

Built Distribution
- hflm-0.0.1-py3-none-any.whl (16.2 kB)
File details

Details for the file hflm-0.0.1.tar.gz.

File metadata
- Download URL: hflm-0.0.1.tar.gz
- Upload date:
- Size: 16.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.14

File hashes

Algorithm | Hash digest
---|---
SHA256 | b339423904003e0ac134f02b0330c606007d31aaecc5de7be17698b113e87db8
MD5 | 5c955badae8758f9a101aa768ce6cf00
BLAKE2b-256 | 61bb29a55ab0a4490eaa1cca9e0ebdc7f4a1f056090d3c691ae5f660e4c86a60
File details

Details for the file hflm-0.0.1-py3-none-any.whl.

File metadata
- Download URL: hflm-0.0.1-py3-none-any.whl
- Upload date:
- Size: 16.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.14

File hashes

Algorithm | Hash digest
---|---
SHA256 | 0d270218e3fe47a5abaf74bc9b73e68cbc509240e433335709a69dfa38f9c5e0
MD5 | 1e3ab4f7b6708bea092bd198e3f87a9b
BLAKE2b-256 | f4a209c51dd48b0956a29bdfb1c6aa0c33c9208858f71fc3f4a94dbb5eb6db0d