Extract datasets from models and train slimmer LoRAs on them

Project description

unfat

Easily extract prompt/completion datasets from models and generate Axolotl configs to auto-distill smaller, slimmer LoRAs from the original models.

Example

from unfat.datasets import hub_prompts, HubSplit, Dataset
from unfat.extract import Extractor, ClientOpts
from unfat.lora import LoraSettings
import os

output_dir = "output"
extractor = Extractor(
    # Extract from Qwen2.5-Coder-32B-Instruct
    teacher="hf:Qwen/Qwen2.5-Coder-32B-Instruct",
    # Make up to 10 concurrent requests at a time
    max_concurrent=10,
    output_dir=output_dir,
    # Use glhf.chat for the API
    client_opts=ClientOpts(
        base_url="https://glhf.chat/api/openai/v1",
        api_key=os.environ["GLHF_API_KEY"],
    ),
    # Pull the prompts from a coding dataset
    dataset=Dataset(
        train=[
            hub_prompts(
                name="perlthoughts/coding-prompts-small",
                text_field="instruction",
                split=HubSplit(name="train"),
            ),
        ],
    ),
)

# Runs the coding prompts through Qwen2.5-Coder-32B-Instruct and saves the
# prompt/completion pairs to the output dir
extractor.run()

# Training hyperparameters
lora_settings = LoraSettings(
    lora_r=32,
    lora_alpha=16,
    lora_dropout=0.01,
    num_epochs=2,
    learning_rate=4e-4,
    warmup_steps=10,
)
# Save the Axolotl config to train a LoRA for Llama-3.1-70B-Instruct
axolotl_config = lora_settings.llama_70b_axolotl(extractor.output_dataset())
axolotl_config.save(output_dir)
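
The `max_concurrent=10` setting in the example caps how many requests are in flight at once while prompts are sent to the teacher model. The sketch below illustrates that general pattern with `asyncio.Semaphore`. It is not unfat's actual implementation: `fake_teacher`, `extract`, and the record layout are hypothetical names chosen for illustration, and a real run would call the API instead of sleeping.

```python
import asyncio


async def fake_teacher(prompt: str) -> str:
    """Hypothetical stand-in for a request to the teacher model's API."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"completion for: {prompt}"


async def extract(prompts: list[str], max_concurrent: int) -> list[dict[str, str]]:
    """Collect prompt/completion pairs with at most max_concurrent calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def one(prompt: str) -> dict[str, str]:
        # Only max_concurrent coroutines can hold the semaphore at a time.
        async with semaphore:
            completion = await fake_teacher(prompt)
        return {"prompt": prompt, "completion": completion}

    # gather returns results in the same order as the input prompts.
    return await asyncio.gather(*(one(p) for p in prompts))


rows = asyncio.run(extract(["Write a sort", "Reverse a list"], max_concurrent=10))
```

Each element of `rows` is one prompt/completion pair. The semaphore bounds the load on the API without forcing the requests to run one at a time.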

Download files

Download the file for your platform.

Source Distribution

unfat-0.0.1.tar.gz (5.3 kB)

Uploaded Source

Built Distribution

unfat-0.0.1-py3-none-any.whl (6.9 kB)

Uploaded Python 3

File details

Details for the file unfat-0.0.1.tar.gz.

File metadata

  • Download URL: unfat-0.0.1.tar.gz
  • Upload date:
  • Size: 5.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.0.1 CPython/3.13.1 Linux/6.12.10-200.fc41.x86_64

File hashes

Hashes for unfat-0.0.1.tar.gz
Algorithm Hash digest
SHA256 f7470939bedbb89e0c6a0e35622b9a3d78aa04cb32d256739d6518bac50a8117
MD5 e0ce2f41a5796e5acd32d976764744dc
BLAKE2b-256 37c868bc37126c0386c38fa0a1d187189b196808a7d35230507766b17b8c454d
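
A downloaded file can be checked against the SHA256 digest listed above before installing it. The following is a minimal sketch using only the standard library. The helper names `sha256_of` and `matches` are illustrative, and the file is assumed to be in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with the published one, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches(Path("unfat-0.0.1.tar.gz"), "f7470939bedbb89e0c6a0e35622b9a3d78aa04cb32d256739d6518bac50a8117")` should return `True` for an intact download of the source distribution.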

File details

Details for the file unfat-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: unfat-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 6.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.0.1 CPython/3.13.1 Linux/6.12.10-200.fc41.x86_64

File hashes

Hashes for unfat-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 985455279a8a198af64541919c986234cd64a5bbf5749552637d0703cad5cc81
MD5 c15adda102f694fadb770f6b019f7184
BLAKE2b-256 f24a65ade1dc3e6625b04a18f13b1ecc3a465beb03335e89e6d27510332c771c
