Extract datasets from models and train slimmer LoRAs on them

Project description

unfat

Easily extract prompt/completion datasets from large models and generate Axolotl configs to distill slimmer LoRA adapters from the originals.

Example

from unfat.datasets import hub_prompts, HubSplit, Dataset
from unfat.extract import Extractor, ClientOpts
from unfat.lora import LoraSettings
import os

output_dir = "output"
extractor = Extractor(
    # Extract from Qwen2.5-Coder-32B-Instruct
    teacher="hf:Qwen/Qwen2.5-Coder-32B-Instruct",
    # Make at most 10 concurrent requests
    max_concurrent=10,
    output_dir=output_dir,
    # Use glhf.chat for the API
    client_opts=ClientOpts(
        base_url="https://glhf.chat/api/openai/v1",
        api_key=os.environ["GLHF_API_KEY"],
    ),
    # Pull the prompts from a coding dataset
    dataset=Dataset(
        train=[
            hub_prompts(
                name="perlthoughts/coding-prompts-small",
                text_field="instruction",
                split=HubSplit(name="train"),
            ),
        ],
    ),
)

# Runs the coding prompts through Qwen2.5-Coder-32B-Instruct and saves the
# completions to the output dir
extractor.run()
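The on-disk layout of the extracted data isn't shown on this page. Assuming the extractor writes one JSON object per line with prompt/completion fields (an assumption about the format, not unfat's documented behavior), inspecting the results might look like:

```python
import json
from pathlib import Path

def load_records(path):
    """Parse a JSON-lines file: one record per non-blank line."""
    return [
        json.loads(line)
        for line in Path(path).read_text().splitlines()
        if line.strip()
    ]

# The record shape below is illustrative only:
sample = '{"prompt": "Write a binary search", "completion": "def bsearch(xs, t): ..."}'
record = json.loads(sample)
print(record["prompt"])  # the original dataset instruction
```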

# Training hyperparameters
lora_settings = LoraSettings(
    lora_r=32,
    lora_alpha=16,
    lora_dropout=0.01,
    num_epochs=2,
    learning_rate=4e-4,
    warmup_steps=10,
)
# Save the Axolotl config to train a LoRA for Llama-3.1-70B-Instruct
axolotl_config = lora_settings.llama_70b_axolotl(extractor.output_dataset())
axolotl_config.save(output_dir)
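`save()` writes a standard Axolotl YAML config. The exact file unfat generates is not shown here; a sketch using Axolotl's documented LoRA option names, assuming the settings above map through one-to-one, might look like:

```yaml
# Illustrative sketch only -- not the literal file unfat emits
base_model: meta-llama/Llama-3.1-70B-Instruct
adapter: lora
lora_r: 32
lora_alpha: 16
lora_dropout: 0.01
num_epochs: 2
learning_rate: 0.0004
warmup_steps: 10
```

The saved config can then be handed to Axolotl's trainer on a machine with enough GPU memory for the 70B base model.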

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

unfat-0.0.2.tar.gz (5.3 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

unfat-0.0.2-py3-none-any.whl (6.9 kB)

Uploaded Python 3

File details

Details for the file unfat-0.0.2.tar.gz.

File metadata

  • Download URL: unfat-0.0.2.tar.gz
  • Upload date:
  • Size: 5.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.0.1 CPython/3.13.1 Linux/6.12.10-200.fc41.x86_64

File hashes

Hashes for unfat-0.0.2.tar.gz
Algorithm Hash digest
SHA256 43a8fb4a84ec72dbd9b418b539d301e8a7cec50f3f8f22e70914d8c1f156ec4c
MD5 68d325656c4d862c478d2f772537166c
BLAKE2b-256 acffc9c3a49b4943e1da5a7b7685b4cbc20b85e15891286f4f478816890ce591

See more details on using hashes here.
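A downloaded archive can be checked against the published digests before installing. A minimal sketch using Python's standard `hashlib` (the file path is a placeholder for wherever you saved the archive):

```python
import hashlib

def sha256_of(path):
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Sanity-check the approach on in-memory bytes (known test vector):
assert hashlib.sha256(b"abc").hexdigest().startswith("ba7816bf")

# Then compare against the digest published above, e.g.:
# sha256_of("unfat-0.0.2.tar.gz") == "43a8fb4a84ec72dbd9b418b539d301e8a7cec50f3f8f22e70914d8c1f156ec4c"
```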

File details

Details for the file unfat-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: unfat-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 6.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.0.1 CPython/3.13.1 Linux/6.12.10-200.fc41.x86_64

File hashes

Hashes for unfat-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 a4fa3656e24ad13dda61169fa1f6176eadc2844ccf86ae3d2c72937d929ac285
MD5 abbff711e5d2ddd8e9a5112a60437df0
BLAKE2b-256 266d26afe3dbe69226a6b4232633ab7c84b598d534d072f7782d36ef4192ecae

See more details on using hashes here.
