Skip to main content

Check for data drift with OAI data

Project description

ft-drift

ft-drift helps you check for data drift by comparing two OpenAI multi-turn chat jsonl files.

Install

pip install ft_drift

Background

Checking for dataset drift can help you debug if:

  1. Your model is trained on data that doesn’t reflect production (different prompts, functions, etc).
  2. Your training data contains unexpected or accidental artifacts.

In either situation, you can compare data from relevant sources (i.e. production vs fine-tuning) to find unwanted changes. This is one of the most common source of errors when fine-tuning models!

The demo below shows a cli tool used to detect data drift between two files, file_a.jsonl and file_b.jsonl. Afterwards, a table of important tokens that account for the drift are shown, such as:

  • END-UI-FORMAT
  • UI-FORMAT
  • “```json”
  • etc.

Usage

After installing ft_drift, the cli command detect_drift will be available to you.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ft-drift-0.0.4.tar.gz (12.4 kB view hashes)

Uploaded Source

Built Distribution

ft_drift-0.0.4-py3-none-any.whl (12.0 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page