Extract datasets from models and train slimmer LoRAs on them

Project description

unfat

Easily extract prompt/completion datasets from models and generate Axolotl configs to auto-distill smaller, slimmer LoRAs from the original models.

Example

from unfat.datasets import hub_prompts, HubSplit, Dataset
from unfat.extract import Extractor, ClientOpts
from unfat.lora import LoraSettings
import os

output_dir = "output"
extractor = Extractor(
    # Extract from Qwen2.5-Coder-32B-Instruct
    teacher="hf:Qwen/Qwen2.5-Coder-32B-Instruct",
    # Make up to 10 concurrent requests
    max_concurrent=10,
    output_dir=output_dir,
    # Use glhf.chat for the API
    client_opts=ClientOpts(
        base_url="https://glhf.chat/api/openai/v1",
        api_key=os.environ["GLHF_API_KEY"],
    ),
    # Pull the prompts from a coding dataset
    dataset=Dataset(
        train=[
            hub_prompts(
                name="perlthoughts/coding-prompts-small",
                text_field="instruction",
                split=HubSplit(name="train"),
            ),
        ],
    ),
)

# Run the coding prompts through Qwen2.5-Coder-32B-Instruct and save the
# completions to the output dir
extractor.run()

# Training hyperparameters
lora_settings = LoraSettings(
    lora_r=32,
    lora_alpha=16,
    lora_dropout=0.01,
    num_epochs=2,
    learning_rate=4e-4,
    warmup_steps=10,
)
# Save the Axolotl config to train a LoRA for Llama-3.1-70B-Instruct
axolotl_config = lora_settings.llama_70b_axolotl(extractor.output_dataset())
axolotl_config.save(output_dir)
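For intuition, the `lora_r` and `lora_alpha` values above follow the standard LoRA formulation: a rank-r adapter trains two small matrices whose product is added into each frozen weight matrix at scale alpha / r. A minimal back-of-the-envelope sketch (the 8192x8192 projection size is a hypothetical stand-in, not anything from unfat's internals):

```python
# Generic LoRA arithmetic: a rank-r adapter for a frozen d_out x d_in weight
# matrix trains B (d_out x r) and A (r x d_in), applied as W + (alpha/r) * B @ A.
d_in, d_out = 8192, 8192   # hypothetical projection size
r, alpha = 32, 16          # matches the LoraSettings above

scaling = alpha / r                  # adapter mix-in strength
adapter_params = r * (d_in + d_out)  # trainable parameters per adapted matrix
full_params = d_in * d_out           # parameters in the frozen base matrix

print(scaling)                        # 0.5
print(adapter_params)                 # 524288
print(full_params // adapter_params)  # 128
```

With r=32 and alpha=16 the adapter contributes at half strength, and each adapted matrix trains 128x fewer parameters than full fine-tuning would, which is what makes the distilled LoRA "slimmer" than its teacher.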

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
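Most users will not need the files below directly; the package can be installed from PyPI in the usual way (shown for pip, pinned to the version on this page):

```shell
pip install unfat==0.0.3
```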

Source Distribution

unfat-0.0.3.tar.gz (5.3 kB)

Uploaded: Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

unfat-0.0.3-py3-none-any.whl (6.9 kB)

Uploaded: Python 3

File details

Details for the file unfat-0.0.3.tar.gz.

File metadata

  • Download URL: unfat-0.0.3.tar.gz
  • Upload date:
  • Size: 5.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.0.1 CPython/3.13.1 Linux/6.12.10-200.fc41.x86_64

File hashes

Hashes for unfat-0.0.3.tar.gz

  • SHA256: ca8b880e295b30a34606dc1fe92212f5807326d8123bd7d8f96d55e121aa9057
  • MD5: 25ebca6a247911c6b267023ea4c874a5
  • BLAKE2b-256: ce7b7a70fabe13516cb2d8f464b2bfa12b3be7c4ceb7c56f6728b4a0ee5ea617

See more details on using hashes here.

File details

Details for the file unfat-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: unfat-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 6.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.0.1 CPython/3.13.1 Linux/6.12.10-200.fc41.x86_64

File hashes

Hashes for unfat-0.0.3-py3-none-any.whl

  • SHA256: 78d45ecbf1205fce6aea1c0bcc756fe1f19ef09b1dff30c01f5e7093fdd8350a
  • MD5: d7ab93cf9d16095eb91cae15a391decf
  • BLAKE2b-256: c281a4b344e9453a48d1cb80c38ea8d436da0481babe2e6149c493483413cee6

