Large language model to corpus
Introduction
The goal of this tool is to apply large language model (LLM) operations to a monolingual corpus in order to generate a parallel corpus.
Use cases:
- Asking a model to translate, summarize, or paraphrase the original sentences so that you can benchmark its performance.
- Generating a corpus from a monolingual corpus, for example a translated corpus.
- Testing a prompt over a list of sentences while developing prompts for your application, so that you can evaluate it.
You provide an input file and a prompt, and the tool generates a target corpus.
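The idea of applying one prompt to every line of an input file can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool's actual implementation: `apply_prompt` and `fake_model` are hypothetical names, the prompt layout (instruction, newline, sentence) is a guess, and `fake_model` stands in for a real LLM call.

```python
from typing import Callable, Iterable, List


def apply_prompt(sentences: Iterable[str], prompt: str,
                 model: Callable[[str], str]) -> List[str]:
    """Return one output line per input sentence."""
    outputs = []
    for sentence in sentences:
        text = sentence.strip()
        # Hypothetical prompt layout: instruction, newline, then the sentence.
        reply = model(f"{prompt}\n{text}")
        # Collapse line breaks so output line i stays aligned with input line i.
        outputs.append(" ".join(reply.split()))
    return outputs


def fake_model(request: str) -> str:
    """Stand-in for a real LLM call: upper-cases the sentence part."""
    return request.split("\n", 1)[1].upper()


if __name__ == "__main__":
    source = ["Hello world.", "Good morning."]
    for line in apply_prompt(source, "translate to French", fake_model):
        print(line)
```

Keeping each reply on a single line matters for a parallel corpus, because source and target are paired by line number.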
Quick start
For example, to use OpenAI ChatGPT to translate a file:
llm-to-corpus samples/eng.txt samples/fra.txt "translate to French"
To see models and options available:
llm-to-corpus --help
Usage
Evaluation with ChatGPT
Translate the Flores200 corpus to evaluate the quality of the Catalan translation:
llm-to-corpus samples/flores200.eng chatgpt.txt "Translate to Catalan the following text:"
pip install sacrebleu
sacrebleu samples/flores200.cat -i chatgpt.txt -m bleu chrf --format text
Evaluation with Bloom
Translate the Flores200 corpus to evaluate the quality of the Catalan translation:
llm-to-corpus samples/flores200.eng bloom.txt "Translate to Catalan the following text:" --model mt0-xxl-mt
pip install sacrebleu
sacrebleu samples/flores200.cat -i bloom.txt -m bleu chrf --format text
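sacrebleu pairs the hypothesis and reference files line by line, so both files need the same number of lines. Before scoring, it can help to check that the generated file is still aligned with the reference. The following is a minimal sketch of such a check; `misaligned_lines` and `check_files` are hypothetical helpers, not part of llm-to-corpus or sacrebleu, and the sketch assumes the tool writes one output line per input line.

```python
from pathlib import Path
from typing import List


def misaligned_lines(reference: List[str], hypothesis: List[str]) -> List[int]:
    """Return 1-based line numbers where the hypothesis is missing or empty."""
    problems = []
    for number, _ in enumerate(reference, start=1):
        if number > len(hypothesis) or not hypothesis[number - 1].strip():
            problems.append(number)
    return problems


def check_files(reference_path: str, hypothesis_path: str) -> bool:
    """True when both files have equal length and no hypothesis line is empty."""
    ref = Path(reference_path).read_text(encoding="utf-8").splitlines()
    hyp = Path(hypothesis_path).read_text(encoding="utf-8").splitlines()
    return len(ref) == len(hyp) and not misaligned_lines(ref, hyp)
```

For example, `check_files("samples/flores200.cat", "chatgpt.txt")` would return `False` if the model skipped a sentence or produced an empty line.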