Extract datasets from models and train slimmer LoRAs on them

Project description

unfat

Easily extract prompt/completion datasets from models and generate Axolotl configs to auto-distill smaller, slimmer LoRAs from the original models.

Example

from unfat.datasets import hub_prompts, HubSplit, Dataset, Prompts
from unfat.extract import Extractor, ClientOpts
from unfat.lora import LoraSettings
import os

output_dir = "output"
extractor = Extractor(
    # Extract from Qwen2.5-Coder-32B-Instruct
    teacher="hf:Qwen/Qwen2.5-Coder-32B-Instruct",
    # Make up to 10 concurrent requests at a time
    max_concurrent=10,
    output_dir=output_dir,
    # Use glhf.chat for the API
    client_opts=ClientOpts(
        base_url="https://glhf.chat/api/openai/v1",
        api_key=os.environ["GLHF_API_KEY"],
    ),
    # Pull the prompts from a coding dataset
    dataset=Dataset(
        train=[
            hub_prompts(
                name="perlthoughts/coding-prompts-small",
                text_field="instruction",
                split=HubSplit(name="train"),
            ),
        ],
    ),
)

# Runs the coding prompts through Qwen2.5-Coder-32B-Instruct and saves the
# completions to the output dir
extractor.run()

# Training hyperparameters
lora_settings = LoraSettings(
    lora_r=32,
    lora_alpha=16,
    lora_dropout=0.01,
    num_epochs=2,
    learning_rate=4e-4,
    warmup_steps=10,
)
# Save the Axolotl config to train a LoRA for Llama-3.1-70B-Instruct
axolotl_config = lora_settings.llama_70b_axolotl(extractor.output_dataset())
axolotl_config.save(output_dir)
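
The extraction step above sends many prompts to a teacher model with a cap on concurrent requests. The sketch below illustrates that general pattern. It is not unfat's actual implementation: `extract_completions`, `fake_teacher`, `run_demo`, the `train.jsonl` file name, and the `prompt`/`completion` field names are all hypothetical, and the teacher is a local stand-in rather than a real API call.

```python
import asyncio
import json
import os
import tempfile
from typing import Awaitable, Callable


async def extract_completions(
    prompts: list[str],
    ask_teacher: Callable[[str], Awaitable[str]],
    output_path: str,
    max_concurrent: int = 10,
) -> int:
    """Run prompts through a teacher with bounded concurrency and write JSONL."""
    # The semaphore caps how many teacher calls are in flight at once.
    semaphore = asyncio.Semaphore(max_concurrent)

    async def one(prompt: str) -> dict[str, str]:
        async with semaphore:
            completion = await ask_teacher(prompt)
        return {"prompt": prompt, "completion": completion}

    # gather() returns results in the same order as the input prompts.
    rows = await asyncio.gather(*(one(p) for p in prompts))
    with open(output_path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
    return len(rows)


async def fake_teacher(prompt: str) -> str:
    # Hypothetical stand-in for a request to an OpenAI-compatible endpoint.
    await asyncio.sleep(0)
    return f"echo: {prompt}"


def run_demo() -> list[dict[str, str]]:
    # Write to a temporary directory, then read the rows back.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "train.jsonl")
        asyncio.run(
            extract_completions(["a", "b", "c"], fake_teacher, path, max_concurrent=2)
        )
        with open(path, encoding="utf-8") as f:
            return [json.loads(line) for line in f]
```

In a real run, `fake_teacher` would be replaced by a coroutine that calls the teacher model's API. The semaphore keeps the number of in-flight requests at or below `max_concurrent`, which is the role `max_concurrent=10` appears to play in the example above.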

Download files

Source Distribution

unfat-0.0.4.tar.gz (5.7 kB view details)

Uploaded Source

Built Distribution

unfat-0.0.4-py3-none-any.whl (7.3 kB view details)

Uploaded Python 3

File details

Details for the file unfat-0.0.4.tar.gz.

File metadata

  • Download URL: unfat-0.0.4.tar.gz
  • Size: 5.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.0.1 CPython/3.13.1 Linux/6.12.10-200.fc41.x86_64

File hashes

Hashes for unfat-0.0.4.tar.gz
Algorithm    Hash digest
SHA256       4bcd9cbde2e314015cef33b9fd543ab63ee3172573e069a1fc52e6d73a2c68b2
MD5          abfc828a1af1be42138e8f2aefe91954
BLAKE2b-256  b675efb2994c27e30fbe84dfe64d9c91632a4793ff4c584d8a2e48fa3e9049c3

File details

Details for the file unfat-0.0.4-py3-none-any.whl.

File metadata

  • Download URL: unfat-0.0.4-py3-none-any.whl
  • Size: 7.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.0.1 CPython/3.13.1 Linux/6.12.10-200.fc41.x86_64

File hashes

Hashes for unfat-0.0.4-py3-none-any.whl
Algorithm    Hash digest
SHA256       7ee62205ef890a2212a316e275d4148efc08d816350874eee825da4665d75ce7
MD5          b5eeb42f1dfdf6bcb24c4ce306659586
BLAKE2b-256  92b06ff2a510931a80dd3b17c6163d0a8023cf7b463dd61c8bda4876df48e9fd
