
OpenAI Model Trainer and Formatter

Project description

codara-model-trainer

Overview

codara-model-trainer is a Python package that helps you build datasets for fine-tuning machine learning models, particularly language models. It simplifies gathering training examples and writing them to a file in JSON Lines (JSONL) format.

Features

  • Builds fine-tuning datasets in JSONL format.
  • Takes the system instructions, training prompt, and generated response for each example as arguments.
  • Creates the output file if needed and appends each example as one correctly formatted line.

Installation

pip install codara-model-trainer

Usage

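  1. Call create_data_set once per training example, passing the system instructions, the user prompt, the model's response, and, optionally, an output file path:
    from codara_model_trainer import create_data_set
    
    # openai_api_call is a placeholder for your own function that sends
    # the prompt to the model and returns the response text.
    gpt_response = openai_api_call("User prompt here")
    
    # The fourth argument (output file path) is optional.
    create_data_set("System instructions here", "User prompt here", gpt_response, "optional-filepath/filename.jsonl")
    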

If no file path is given, the data is saved to model-training/fine-tune-data-set.jsonl.
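
To show what a single call is expected to produce, here is a minimal stdlib-only sketch that writes one example in the format described under Structure of Data. It is not the package's implementation: the function name append_example is hypothetical, the default path simply mirrors the one stated above, and creating missing parent directories is an assumption of this sketch.

```python
import json
from pathlib import Path

# Hypothetical stand-in for create_data_set; not the package's own code.
DEFAULT_PATH = "model-training/fine-tune-data-set.jsonl"


def append_example(system, user, assistant, filepath=DEFAULT_PATH):
    """Append one system/user/assistant example as a single JSONL line."""
    record = {
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
            {"role": "assistant", "content": assistant},
        ]
    }
    path = Path(filepath)
    # Assumption: create missing parent directories such as model-training/.
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" creates the file if it does not exist, then appends to it.
    with path.open("a", encoding="utf-8") as f:
        # json.dumps without indent keeps the whole record on one line.
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```

For example, append_example("You are a helpful assistant.", "What is JSONL?", "One JSON object per line.", "data/train.jsonl") creates data/ if needed and adds one line to data/train.jsonl; a second call adds a second line.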

Structure of Data

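The data is stored in JSON Lines format: each line of the file is one complete JSON object representing a single training example. The example below is pretty-printed for readability; in the file, the whole object occupies a single line: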

{
  "messages": [
    {
      "role": "system",
      "content": "System instructions here"
    },
    {
      "role": "user",
      "content": "User prompt here"
    },
    {
      "role": "assistant",
      "content": "Model response here"
    }
  ]
}
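
Because JSONL requires each record to sit on its own line, a quick check before uploading can catch pretty-printed or malformed entries. The sketch below uses a hypothetical helper, validate_jsonl, which is not part of the package. It assumes every record should follow the system/user/assistant layout shown above; adjust EXPECTED_ROLES if your data is structured differently.

```python
import json

# Assumption: the role sequence produced by this package, as shown above.
EXPECTED_ROLES = ["system", "user", "assistant"]


def validate_jsonl(path):
    """Return a list of (line_number, problem) tuples; empty means no problems found."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                problems.append((lineno, "blank line"))
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                # A pretty-printed object spread over several lines fails here.
                problems.append((lineno, f"invalid JSON: {exc.msg}"))
                continue
            messages = record.get("messages") if isinstance(record, dict) else None
            if not isinstance(messages, list):
                problems.append((lineno, "missing 'messages' list"))
                continue
            # Collect roles in order and compare against the expected layout.
            roles = [m.get("role") for m in messages if isinstance(m, dict)]
            if roles != EXPECTED_ROLES:
                problems.append((lineno, f"unexpected roles: {roles}"))
    return problems
```

Calling validate_jsonl("model-training/fine-tune-data-set.jsonl") returns an empty list when every line parses and follows the expected layout; a file containing the pretty-printed example above would report an "invalid JSON" problem starting at line 1.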


Download files


Source Distribution

codara-model-trainer-2.0.0.tar.gz (2.6 kB)

Uploaded Source

Built Distribution

codara_model_trainer-2.0.0-py3-none-any.whl (3.3 kB)

Uploaded Python 3

File details

Details for the file codara-model-trainer-2.0.0.tar.gz.

File metadata

  • Download URL: codara-model-trainer-2.0.0.tar.gz
  • Upload date:
  • Size: 2.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.6

File hashes

Hashes for codara-model-trainer-2.0.0.tar.gz
Algorithm    Hash digest
SHA256       4fb466f5ad0885438933c900941b5e9dd8ca68a5cbe9d23f3c7c7179dc22d106
MD5          4f69e5b6e8c7d964f37b962ba2509fa4
BLAKE2b-256  5f97e1edffb5d36746a3b9c114d4714bf8187b87280364b48d696cbf2c0db90d
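
To check a downloaded archive against the SHA256 digest listed above, Python's standard hashlib module is enough. This is a minimal sketch; sha256_of and matches_published_hash are hypothetical helper names, and the file path you pass in is wherever you saved the download.

```python
import hashlib

# SHA256 digest published above for codara-model-trainer-2.0.0.tar.gz.
EXPECTED_SHA256 = "4fb466f5ad0885438933c900941b5e9dd8ca68a5cbe9d23f3c7c7179dc22d106"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()
```

matches_published_hash("codara-model-trainer-2.0.0.tar.gz") returns True only if the local file's digest matches the published SHA256 value.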

File details

Details for the file codara_model_trainer-2.0.0-py3-none-any.whl.

File metadata

File hashes

Hashes for codara_model_trainer-2.0.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       83877bed0cccc6f1031d872ff71f813f04e43a15c07c8da4970519628cdc4683
MD5          36b169a61cd563bd467896cd835bd84d
BLAKE2b-256  eec35665642aac12ae2a885a4086ead4a113889414cb236007f377416866fd08
