Machine learning framework for building specialist models.
Project description
Osma is a framework that streamlines fine-tuning language models on data curated by larger, more capable teacher models, with the goal of producing specialist student models that outperform their teachers on a given task. It provides a structured approach to defining the curation process with signatures, generating high-quality training datasets, and fine-tuning local models. Osma is inspired by research from Stanford University's Natural Language Processing (NLP) Group and is in alpha development.
Features
- Dataset Management: Easy loading, manipulation, and saving of datasets.
- Structured Signatures: Define strict input/output schemas ensuring consistency in generated data.
- Teacher-Student Workflow: Use a managed model to curate training examples from raw data.
- Trainset Curation: Automatically generate reasoning and labels for your dataset.
- Filtering: Mechanisms to validate and filter generated data against ground truth or custom logic.
- Local Fine-Tuning: Seamlessly fine-tune local models using curated datasets.
- Evaluation: Tools to evaluate model performance against test sets.
Simple Example

```python
import osma
from typing import Literal

# Load and shuffle data
ds = osma.Dataset("data.jsonl").shuffle()

# Define the task signature with inputs and outputs
classes = Literal["positive", "negative"]
sg = osma.Signature(
    osma.InputFields("text"),
    osma.OutputField("sentiment", classes),
    reasoning=True,
)

# Initialize the teacher model
teacher = osma.LanguageModel("gemini/gemini-1.5-flash")

# Curate a training set
trainset = osma.Trainset(ds.range(0, 500), sg, teacher)
trainset.save("train.jsonl")

# Fine-tune a local student model
student = osma.LanguageModel("google/gemma-2-2b-it", provider=osma.ModelProvider.LOCAL)
student.train(trainset)

# Run inference
print(student(sg, text="I love this framework!"))
```
Installation

Using uv:

```shell
uv add osma
```

Using pip:

```shell
pip install osma
```
Environment Variables
To use Osma, you must export the necessary keys for the models you intend to use.

- HF_TOKEN: Required for accessing open-source models (student models).

When using Osma to curate a trainset, you will also need to specify the appropriate API key for the managed model's provider:

- GEMINI_API_KEY: Required if using Google Gemini as a teacher.
- OPENAI_API_KEY: Required if using OpenAI models as a teacher.

Note: this list is not exhaustive; consult your model provider's documentation for the appropriate API key.
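For example, in a POSIX shell (the values below are placeholders, not real keys):

```shell
# Placeholder values -- substitute your actual credentials
export HF_TOKEN="hf_your_token_here"
export GEMINI_API_KEY="your_gemini_key_here"
```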
Key Methods
Dataset

```python
ds = osma.Dataset("path/to/data.jsonl")  # initialize a dataset from a JSONL file
ds = ds.shuffle()                        # randomly shuffle the dataset rows
ds = ds.range(0, 100)                    # select a subset of rows by index range
ds = ds.head(5)                          # return the first n rows
ds.save("output.jsonl")                  # save the dataset to a file
```
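Dataset operates on JSON Lines files: one JSON object per line. Here is a minimal stdlib sketch of that round trip, purely as an illustration of the file format rather than Osma's implementation:

```python
import io
import json
import random

# Two rows in JSONL form, as they would appear in data.jsonl
raw = '{"text": "great product", "label": "positive"}\n' \
      '{"text": "total letdown", "label": "negative"}\n'

rows = [json.loads(line) for line in io.StringIO(raw)]  # load
random.shuffle(rows)                                    # shuffle in place
subset = rows[:1]                                       # analogous to head(1)
out = "".join(json.dumps(r) + "\n" for r in subset)     # save back to JSONL
```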
Signature

Define a task signature with input fields, output fields, and optional reasoning:

```python
sg = osma.Signature(
    osma.InputFields("input_col"),
    osma.OutputField("output_name", str),
    reasoning=True,
)
```
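The second argument to OutputField is a type. As in the Simple Example above, passing a typing.Literal constrains the label set; the allowed values can be recovered with standard typing machinery:

```python
from typing import Literal, get_args

# The label set for a classification-style output field
Sentiment = Literal["positive", "negative"]

allowed = get_args(Sentiment)
print(allowed)  # ('positive', 'negative')
```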
Trainset

```python
ts = osma.Trainset(ds, sg, teacher_model)  # curate a new trainset with a teacher model
ts = osma.Trainset("train.jsonl")          # load an existing trainset from a file
ts = ts.filter(ds, lambda x, y: x['field'] == y['field'])  # keep rows where generated and source data agree
ts.save("curated.jsonl")                   # save the trainset to a file
ts = ts.shuffle()                          # randomly shuffle the rows
ts = ts.range(0, 100)                      # select a subset by index range
ts = ts.head(5)                            # return the first n rows
```
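To see what a filter predicate does, here is the same comparison applied to plain dicts standing in for generated and source rows (a sketch; the actual row schema depends on your signature):

```python
generated = [{"sentiment": "positive"}, {"sentiment": "negative"}]
source    = [{"sentiment": "positive"}, {"sentiment": "positive"}]

match = lambda x, y: x["sentiment"] == y["sentiment"]

# Keep only rows where the teacher's label agrees with ground truth
kept = [g for g, s in zip(generated, source) if match(g, s)]
print(kept)  # [{'sentiment': 'positive'}]
```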
Model

```python
# Initialize a managed teacher model from a provider/model string
teacher = osma.LanguageModel("gemini/gemini-1.5-flash")

# Initialize a local student model
student = osma.LanguageModel("google/gemma-2b", provider=osma.ModelProvider.LOCAL)

# Generate output for a signature and specific input arguments
result = model(sg, text="example input")

# Fine-tune the local model on a curated trainset
student.train(trainset)

# Evaluate the model on a test dataset using a scoring function
results = student.evaluate(sg, eval_ds, lambda res, row: res['val'] == row['val'])
```
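The scoring function receives a model result and the reference row and returns a boolean; aggregating those booleans yields an accuracy. A plain-Python sketch with hypothetical row shapes:

```python
preds = [{"val": "a"}, {"val": "b"}, {"val": "c"}]
refs  = [{"val": "a"}, {"val": "x"}, {"val": "c"}]

score = lambda res, row: res["val"] == row["val"]

# Fraction of predictions the scoring function accepts
accuracy = sum(score(p, r) for p, r in zip(preds, refs)) / len(refs)
print(accuracy)  # 2 of 3 rows match
```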
File details

Details for the file osma_ai-0.1.0.tar.gz.

File metadata

- Download URL: osma_ai-0.1.0.tar.gz
- Upload date:
- Size: 16.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.9 (macOS)

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 00dc496e2ea26d4807b3860db11772a64c289e8e08e74ade0e0ddd265da563f4 |
| MD5 | dd2f82954cb66544905d107e62438c15 |
| BLAKE2b-256 | 94922bdb7afce71405445aa63f06711870483f6b0910384ecd0d6f6cf403a979 |
File details

Details for the file osma_ai-0.1.0-py3-none-any.whl.

File metadata

- Download URL: osma_ai-0.1.0-py3-none-any.whl
- Upload date:
- Size: 19.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.9 (macOS)

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 4f3a5d0fc2ad3f3d08fd44ec1ed7fc047269acf2900e3080b5958c412d191fdc |
| MD5 | 5b88a9732cff217508a20d565eb34249 |
| BLAKE2b-256 | 7e1da528c084e8a0d320dfe5b07befda139dc23cdb73402f04b0a17dab9175c0 |