Check for data drift with OAI data
Project description
ft-drift
ft-drift
helps you check for data drift by comparing two OpenAI
multi-turn chat jsonl
files.
Install
pip install ft_drift
Background
Checking for dataset drift can help you debug if:
- Your model is trained on data that doesn’t reflect production (different prompts, functions, etc).
- Your training data contains unexpected or accidental artifacts.
In either situation, you can compare data from relevant sources (i.e. production vs fine-tuning) to find unwanted changes. This is one of the most common source of errors when fine-tuning models!
The demo below shows a cli tool used to detect data drift between two
files, file_a.jsonl
and file_b.jsonl
. Afterwards, a table of
important tokens that account for the drift are shown, such as:
END-UI-FORMAT
UI-FORMAT
- “```json”
- etc.
Usage
After installing ft_drift
, the cli command detect_drift
will be
available to you.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.