tools for formatting datasets for fine tuning
Project description
format_conversation_dataset
Convert your diarized content into a dataset that can be used to finetune a model!
Install
pip install format_conversation_dataset
How to use
Designate a speaker number as the ‘assistant’ and supply input and output file paths, and this module with do the rest.
from format_covnersation_dataset.core import *
convert_file(‘input/file/path’, ‘output/file/path’, 1, “You are participating in a conversation”)
This will output a json format like so:
{‘messages’: [ { ‘role’ : ‘system’, ‘content’ : ‘You are participating
in a conversation’ }, { ‘role’ : ‘user’, ‘content’ : ‘SPEAKER_02 : Hello
everyone SPEAKER_03 : Good morning SPEAKER_04 : Hi’ }, { ‘role’ :
‘assistant’, ‘content’ : ‘Hi!’ },
]}
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file format_conversation_dataset-0.0.1.tar.gz
.
File metadata
- Download URL: format_conversation_dataset-0.0.1.tar.gz
- Upload date:
- Size: 8.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.14
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 |
5c970f9f35a128e41a26c024bf51ad25e398d699d2461c1e156bc3f5348e8b76
|
|
MD5 |
4894ecfa0e827cc51936a129d59a6705
|
|
BLAKE2b-256 |
9b0f985d4f20b2ae0b70d44f99397abb5fee6809dd859c901a714595b43254ed
|
File details
Details for the file format_conversation_dataset-0.0.1-py3-none-any.whl
.
File metadata
- Download URL: format_conversation_dataset-0.0.1-py3-none-any.whl
- Upload date:
- Size: 8.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.14
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 |
fd1917e15b2d466ac48e9eb20e2a0e820dac35ceaadaefbf2f1be217e2de8ff3
|
|
MD5 |
78903655adf84e96fe21f5e1546467c2
|
|
BLAKE2b-256 |
a1f3a57eacef20d76e3d41e069585ea47c9dc1655e6fcf02b5860a2994023063
|