AnglE-optimized Text Embeddings
Project description
AnglE📐: Angle-optimized Text Embeddings
It is Angle 📐, not Angel 👼.
🔥 A New SOTA for Semantic Textual Similarity!
🤗 Pretrained Models
🤗 HF | Backbone | LLM | Language | Use Prompt | Avg. Score |
---|---|---|---|---|---|
SeanLee97/angle-llama-7b-nli-v2 | NousResearch/Llama-2-7b-hf | Y | EN | Y | 85.96 |
SeanLee97/angle-llama-7b-nli-20231027 | NousResearch/Llama-2-7b-hf | Y | EN | Y | 85.90 |
💬 The models above were trained using BERT's hyperparameters. We are currently searching for better hyperparameters for AnglE-LLaMA and plan to release more advanced pre-trained models that further improve performance. Stay tuned 😉
📝 Training Details:
1) SeanLee97/angle-llama-7b-nli-20231027
We fine-tuned AnglE-LLaMA on 4 RTX 3090 Ti GPUs (24 GB each); the training script is as follows:
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=1234 train_angle.py \
--task NLI-STS --save_dir ckpts/NLI-STS-angle-llama-7b \
--w2 35 --learning_rate 2e-4 --maxlen 45 \
--lora_r 32 --lora_alpha 32 --lora_dropout 0.1 \
--save_steps 200 --batch_size 160 --seed 42 --do_eval 0 --load_kbit 4 --gradient_accumulation_steps 4 --epochs 1
The evaluation script is as follows:
CUDA_VISIBLE_DEVICES=0,1 python eval.py \
--load_kbit 16 \
--model_name_or_path NousResearch/Llama-2-7b-hf \
--lora_weight SeanLee97/angle-llama-7b-nli-20231027
Results
English STS Results
Model | STS12 | STS13 | STS14 | STS15 | STS16 | STSBenchmark | SICKRelatedness | Avg. |
---|---|---|---|---|---|---|---|---|
SeanLee97/angle-llama-7b-nli-20231027 | 78.68 | 90.58 | 85.49 | 89.56 | 86.91 | 88.92 | 81.18 | 85.90 |
SeanLee97/angle-llama-7b-nli-v2 | 79.00 | 90.56 | 85.79 | 89.43 | 87.00 | 88.97 | 80.94 | 85.96 |
Usage
Angle-LLaMA
- AnglE
Install AnglE first:
python -m pip install -U angle-emb
from angle_emb import AnglE
angle = AnglE.from_pretrained('NousResearch/Llama-2-7b-hf', pretrained_lora_path='SeanLee97/angle-llama-7b-nli-v2')
angle.set_prompt()
print('prompt:', angle.prompt)
vec = angle.encode({'text': 'hello world'}, to_numpy=True)
print(vec)
vecs = angle.encode([{'text': 'hello world1'}, {'text': 'hello world2'}], to_numpy=True)
print(vecs)
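The returned embeddings are plain NumPy arrays (when to_numpy=True), so semantic similarity can be computed directly, e.g. with cosine similarity. Below is a minimal sketch, assuming the vecs array from the snippet above; the cosine_similarity helper is illustrative and not part of angle_emb:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two 1-D embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# `vecs` holds the embeddings of 'hello world1' and 'hello world2' from above.
print('similarity:', cosine_similarity(vecs[0], vecs[1]))
```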
- transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig
peft_model_id = 'SeanLee97/angle-llama-7b-nli-v2'
config = PeftConfig.from_pretrained(peft_model_id)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path).bfloat16().cuda()
model = PeftModel.from_pretrained(model, peft_model_id).cuda()
def decorate_text(text: str):
return f'Summarize sentence "{text}" in one word:"'
inputs = 'hello world!'
tok = tokenizer([decorate_text(inputs)], return_tensors='pt')
for k, v in tok.items():
tok[k] = v.cuda()
vec = model(output_hidden_states=True, **tok).hidden_states[-1][:, -1].float().detach().cpu().numpy()
print(vec)
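To compare sentences via this raw transformers + PEFT path, the embedding step can be wrapped in a small helper and reused. This is a sketch that assumes the model, tokenizer, and decorate_text objects defined in the snippet above; the embed helper and the example sentences are illustrative, not part of the released code:

```python
import numpy as np
import torch

def embed(text: str) -> np.ndarray:
    # Tokenize the prompted text and use the final layer's last-token
    # hidden state as the sentence embedding, mirroring the snippet above.
    tok = tokenizer([decorate_text(text)], return_tensors='pt')
    tok = {k: v.cuda() for k, v in tok.items()}
    with torch.no_grad():
        hidden = model(output_hidden_states=True, **tok).hidden_states[-1]
    return hidden[:, -1].float().cpu().numpy()[0]

a = embed('A man is playing a guitar.')
b = embed('Someone plays the guitar.')
print('cosine similarity:', np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```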
Train Custom AnglE Model
1. Train NLI
- Prepare your GPU environment
- Install Python dependencies:
python -m pip install -r requirements.txt
- Download data
  - Download multi_nli + snli:
    $ cd data
    $ sh download_data.sh
  - Download STS datasets:
    $ cd SentEval/data/downstream
    $ bash download_dataset.sh
2. Train w/ train_angle.py
The training interface is still messy; we are working on improving it. For now, you can modify train_angle.py to train your own models.
3. Custom Train
Coming soon!
Citation
You are welcome to use our code and pre-trained models. If you do, please support us by citing our work:
@article{li2023angle,
title={AnglE-Optimized Text Embeddings},
author={Li, Xianming and Li, Jing},
journal={arXiv preprint arXiv:2309.12871},
year={2023}
}
When using our pre-trained LLM-based models with the Summarize sentence "xxx" in one word: prompt, it is recommended to cite the following work in addition to the citation above:
@article{jiang2023scaling,
title={Scaling Sentence Embeddings with Large Language Models},
author={Jiang, Ting and Huang, Shaohan and Luan, Zhongzhi and Wang, Deqing and Zhuang, Fuzhen},
journal={arXiv preprint arXiv:2307.16645},
year={2023}
}
File details
Details for the file angle_emb-0.1.0.tar.gz.
File metadata
- Download URL: angle_emb-0.1.0.tar.gz
- Upload date:
- Size: 14.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.10
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 2fb35f737896fb02222c537880091d415da58347ba29f2d3c0ab84c9333c55ee |
MD5 | 3f6cae6f46e13394ccf0033571eb9ff2 |
BLAKE2b-256 | 89d06d0648961a67d6a8f7017095088c97adbdd31a6d479a6eb3664a32ee7760 |
File details
Details for the file angle_emb-0.1.0-py3-none-any.whl.
File metadata
- Download URL: angle_emb-0.1.0-py3-none-any.whl
- Upload date:
- Size: 12.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.10
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 1946fb5473a3fb817c76572c5ac0d68e0fad162bb852a291df6d8388c72b66c6 |
MD5 | 65553005bc892276a3d98d4d9f4f3e1f |
BLAKE2b-256 | 43ebecb771ca6188e9be3369550c03eaac1f0753d37f361f79786c0f3c358242 |