Skip to main content

tools for formatting datasets for fine tuning

Project description

format_conversation_dataset

Convert your diarized content into a dataset that can be used to finetune a model!

Install

pip install format_conversation_dataset

How to use

Designate a speaker number as the ‘assistant’ and supply input and output file paths, and this module with do the rest.

from format_covnersation_dataset.core import *

convert_file(‘input/file/path’, ‘output/file/path’, 1, “You are participating in a conversation”)

This will output a json format like so:

{‘messages’: [ { ‘role’ : ‘system’, ‘content’ : ‘You are participating in a conversation’ }, { ‘role’ : ‘user’, ‘content’ : ‘SPEAKER_02 : Hello everyone SPEAKER_03 : Good morning SPEAKER_04 : Hi’ }, { ‘role’ : ‘assistant’, ‘content’ : ‘Hi!’ },
]}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

format_conversation_dataset-0.0.1.tar.gz (8.5 kB view details)

Uploaded Source

Built Distribution

format_conversation_dataset-0.0.1-py3-none-any.whl (8.2 kB view details)

Uploaded Python 3

File details

Details for the file format_conversation_dataset-0.0.1.tar.gz.

File metadata

File hashes

Hashes for format_conversation_dataset-0.0.1.tar.gz
Algorithm Hash digest
SHA256 5c970f9f35a128e41a26c024bf51ad25e398d699d2461c1e156bc3f5348e8b76
MD5 4894ecfa0e827cc51936a129d59a6705
BLAKE2b-256 9b0f985d4f20b2ae0b70d44f99397abb5fee6809dd859c901a714595b43254ed

See more details on using hashes here.

File details

Details for the file format_conversation_dataset-0.0.1-py3-none-any.whl.

File metadata

File hashes

Hashes for format_conversation_dataset-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 fd1917e15b2d466ac48e9eb20e2a0e820dac35ceaadaefbf2f1be217e2de8ff3
MD5 78903655adf84e96fe21f5e1546467c2
BLAKE2b-256 a1f3a57eacef20d76e3d41e069585ea47c9dc1655e6fcf02b5860a2994023063

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page