
AnglE-optimized Text Embeddings

Project description

AnglE📐: Angle-optimized Text Embeddings

It is Angle 📐, not Angel 👼.

🔥 A New SOTA for Semantic Textual Similarity!

https://arxiv.org/abs/2309.12871



🤗 Pretrained Models

| 🤗 HF | Backbone | LLM | Language | Use Prompt | Avg. Score |
| --- | --- | --- | --- | --- | --- |
| SeanLee97/angle-llama-7b-nli-v2 | NousResearch/Llama-2-7b-hf | Y | EN | Y | 85.96 |
| SeanLee97/angle-llama-7b-nli-20231027 | NousResearch/Llama-2-7b-hf | Y | EN | Y | 85.90 |

💬 The models above were trained with BERT's hyperparameters. We are currently searching for better hyperparameters for AnglE-LLaMA and plan to release more advanced pre-trained models that further improve performance. Stay tuned 😉

📝 Training Details:

1) SeanLee97/angle-llama-7b-nli-20231027

We fine-tuned AnglE-LLaMA on 4 × RTX 3090 Ti (24 GB) GPUs; the training script is as follows:

CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=1234 train_angle.py \
    --task NLI-STS --save_dir ckpts/NLI-STS-angle-llama-7b \
    --w2 35 --learning_rate 2e-4 --maxlen 45 \
    --lora_r 32 --lora_alpha 32 --lora_dropout 0.1 \
    --save_steps 200 --batch_size 160 --seed 42 --do_eval 0 --load_kbit 4 --gradient_accumulation_steps 4 --epochs 1

The evaluation script is as follows:

CUDA_VISIBLE_DEVICES=0,1 python eval.py \
    --load_kbit 16 \
    --model_name_or_path NousResearch/Llama-2-7b-hf \
    --lora_weight SeanLee97/angle-llama-7b-nli-20231027

Results

English STS Results

| Model | STS12 | STS13 | STS14 | STS15 | STS16 | STSBenchmark | SICKRelatedness | Avg. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SeanLee97/angle-llama-7b-nli-20231027 | 78.68 | 90.58 | 85.49 | 89.56 | 86.91 | 88.92 | 81.18 | 85.90 |
| SeanLee97/angle-llama-7b-nli-v2 | 79.00 | 90.56 | 85.79 | 89.43 | 87.00 | 88.97 | 80.94 | 85.96 |
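
The Avg. column appears to be the unweighted mean of the seven task scores; a quick sanity check (assuming simple averaging):

import statistics

# Assumed: Avg. is the plain mean of the seven STS task scores (v2 row above)
scores_v2 = [79.00, 90.56, 85.79, 89.43, 87.00, 88.97, 80.94]
print(round(statistics.mean(scores_v2), 2))  # 85.96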

Usage

AnglE-LLaMA

  1. AnglE

Install AnglE first:

pip install -U angle-emb

from angle_emb import AnglE

# Load the LLaMA backbone and apply the AnglE LoRA weights
angle = AnglE.from_pretrained('NousResearch/Llama-2-7b-hf', pretrained_lora_path='SeanLee97/angle-llama-7b-nli-v2')
# Use the default prompt template for LLM-based encoding
angle.set_prompt()
print('prompt:', angle.prompt)
# Encode a single text
vec = angle.encode({'text': 'hello world'}, to_numpy=True)
print(vec)
# Encode a batch of texts
vecs = angle.encode([{'text': 'hello world1'}, {'text': 'hello world2'}], to_numpy=True)
print(vecs)
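
For semantic textual similarity, the resulting embeddings are typically compared with cosine similarity. A minimal sketch using the vecs from the snippet above (plain NumPy; not part of the angle_emb API):

import numpy as np

# Cosine similarity between the two embeddings encoded above
a, b = vecs[0], vecs[1]
cos_sim = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print('cosine similarity:', cos_sim)
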
  2. transformers
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig

peft_model_id = 'SeanLee97/angle-llama-7b-nli-v2'
config = PeftConfig.from_pretrained(peft_model_id)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
# Load the base model in bfloat16 and apply the AnglE LoRA adapter
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path).bfloat16().cuda()
model = PeftModel.from_pretrained(model, peft_model_id).cuda()

def decorate_text(text: str):
    # Wrap the input in the prompt template used during training
    return f'Summarize sentence "{text}" in one word:"'

inputs = 'hello world!'
tok = tokenizer([decorate_text(inputs)], return_tensors='pt')
for k, v in tok.items():
    tok[k] = v.cuda()
# The embedding is the last layer's hidden state at the final token position
vec = model(output_hidden_states=True, **tok).hidden_states[-1][:, -1].float().detach().cpu().numpy()
print(vec)

Train Custom AnglE Model

1. Train NLI

  1. Prepare your GPU environment

  2. Install Python dependencies

python -m pip install -r requirements.txt

  3. Download data
  • Download multi_nli + snli:
$ cd data
$ sh download_data.sh
  • Download STS datasets:
$ cd SentEval/data/downstream
$ bash download_dataset.sh

2. Train with train_angle.py

The training interface is still messy; we are working on improving it. For now, you can modify train_angle.py to train your own models.

3. Custom Train

Coming soon!

Citation

You are welcome to use our code and pre-trained models. If you do, please support us by citing our work:

@article{li2023angle,
  title={AnglE-Optimized Text Embeddings},
  author={Li, Xianming and Li, Jing},
  journal={arXiv preprint arXiv:2309.12871},
  year={2023}
}

When using our pre-trained LLM-based models with the "xxx in one word:" prompt, it is recommended to also cite the following work:

@article{jiang2023scaling,
  title={Scaling Sentence Embeddings with Large Language Models},
  author={Jiang, Ting and Huang, Shaohan and Luan, Zhongzhi and Wang, Deqing and Zhuang, Fuzhen},
  journal={arXiv preprint arXiv:2307.16645},
  year={2023}
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

angle_emb-0.1.0.tar.gz (14.9 kB)

Uploaded Source

Built Distribution

angle_emb-0.1.0-py3-none-any.whl (12.5 kB)

Uploaded Python 3

File details

Details for the file angle_emb-0.1.0.tar.gz.

File metadata

  • Download URL: angle_emb-0.1.0.tar.gz
  • Upload date:
  • Size: 14.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.10

File hashes

Hashes for angle_emb-0.1.0.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 2fb35f737896fb02222c537880091d415da58347ba29f2d3c0ab84c9333c55ee |
| MD5 | 3f6cae6f46e13394ccf0033571eb9ff2 |
| BLAKE2b-256 | 89d06d0648961a67d6a8f7017095088c97adbdd31a6d479a6eb3664a32ee7760 |

See more details on using hashes here.
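
For example, a downloaded file can be checked against the published SHA256 digest before installation. A minimal sketch using Python's standard library (the file is assumed to be in the current directory):

import hashlib

# Compare the local file's SHA256 against the digest published above
expected = '2fb35f737896fb02222c537880091d415da58347ba29f2d3c0ab84c9333c55ee'
with open('angle_emb-0.1.0.tar.gz', 'rb') as f:
    digest = hashlib.sha256(f.read()).hexdigest()
print('OK' if digest == expected else 'MISMATCH')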

File details

Details for the file angle_emb-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: angle_emb-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 12.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.10

File hashes

Hashes for angle_emb-0.1.0-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 1946fb5473a3fb817c76572c5ac0d68e0fad162bb852a291df6d8388c72b66c6 |
| MD5 | 65553005bc892276a3d98d4d9f4f3e1f |
| BLAKE2b-256 | 43ebecb771ca6188e9be3369550c03eaac1f0753d37f361f79786c0f3c358242 |

See more details on using hashes here.
