Check for data drift with OAI data

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

Development Status
- 4 - Beta
Intended Audience
- Developers
License
- OSI Approved :: Apache Software License
Natural Language
- English
Programming Language

Project description

ft-drift

ft-drift helps you check for data drift by comparing two OpenAI multi-turn chat jsonl files.

Install

pip install ft_drift

Background

Checking for dataset drift can help you debug if:

Your model is trained on data that doesn’t reflect production (different prompts, functions, etc).
Your training data contains unexpected or accidental artifacts.

In either situation, you can compare data from relevant sources (i.e. production vs fine-tuning) to find unwanted changes. This is one of the most common source of errors when fine-tuning models!

The demo below shows a cli tool used to detect data drift between two files, file_a.jsonl and file_b.jsonl. Afterwards, a table of important tokens that account for the drift are shown, such as:

END-UI-FORMAT
UI-FORMAT
“```json”
etc.

Usage

After installing ft_drift, the cli command detect_drift will be available to you.

Project details

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

Development Status
- 4 - Beta
Intended Audience
- Developers
License
- OSI Approved :: Apache Software License
Natural Language
- English
Programming Language

Release history Release notifications | RSS feed

0.0.13

Apr 10, 2024

0.0.12

Apr 10, 2024

0.0.11

Apr 10, 2024

0.0.10

Apr 10, 2024

0.0.9

Apr 10, 2024

0.0.8

Apr 10, 2024

0.0.7

Apr 10, 2024

0.0.6

Apr 10, 2024

0.0.5

Apr 10, 2024

This version

0.0.4

Apr 10, 2024

0.0.3

Apr 10, 2024

0.0.2

Apr 10, 2024

0.0.1

Apr 10, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ft-drift-0.0.4.tar.gz (12.4 kB view hashes)

Uploaded Apr 10, 2024 Source

Built Distribution

ft_drift-0.0.4-py3-none-any.whl (12.0 kB view hashes)

Uploaded Apr 10, 2024 Python 3

Hashes for ft-drift-0.0.4.tar.gz

Hashes for ft-drift-0.0.4.tar.gz
Algorithm	Hash digest
SHA256	`22fff415945e031121c297f554fe3a35b196f36771ac724b5511f91ad65a60f2`
MD5	`c9b1b6be354bc3298fccce7915a7f515`
BLAKE2b-256	`9d4822c976f392f9c1387815d7070a17bdfb4053ac12dd47675939398a27c6ad`

Hashes for ft_drift-0.0.4-py3-none-any.whl

Hashes for ft_drift-0.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`53f6b74baed75e8f2623f509d01d17bc0a810ad90047e6a197f6a393600851ca`
MD5	`5f5d9a64901e0bb9e0439bf1d24238a8`
BLAKE2b-256	`844c6771df513e69065bd108b221318ec8b0cc13b9b90d8e166d02de6d9ba3a0`