
Train SetFit models with Autodistill


Autodistill SetFit Module

This repository contains the code supporting the SetFit target model trainer for use with Autodistill.

SetFit is a framework, developed by Hugging Face, for fine-tuning Sentence Transformer models using only a few labeled examples of each class you want to train on.

Installation

To use the SetFit target model, you will need to install the following dependency:

pip3 install autodistill-setfit

Quickstart

The SetFit module takes in .jsonl files and trains a text classification model.

Each record in the JSONL file should have a `text` entry containing the text to classify and a `label` entry containing the ground-truth label for that text. This is the format produced by Autodistill base text classification models such as the GPTClassifier.
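As a minimal sketch, the snippet below uses Python's standard `json` module to write a few records in this format (one JSON object per line) and then reads the file back, checking that every record has the `text` and `label` fields described above. The helper names `write_jsonl` and `check_jsonl`, the file name `sample.jsonl`, and the sample records are hypothetical. If your labeler emits different keys, pass them through the `required` parameter.

```python
import json

# Field names taken from the format described above (an assumption; they are
# not checked against the autodistill_setfit source).
REQUIRED_FIELDS = ("text", "label")


def write_jsonl(records, path):
    """Write each record as one JSON object per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def check_jsonl(path, required=REQUIRED_FIELDS):
    """Parse every non-empty line and return the list of records.

    Raises ValueError naming the first line that lacks a required field.
    """
    records = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines, e.g. a trailing newline
            record = json.loads(line)
            missing = [key for key in required if key not in record]
            if missing:
                raise ValueError(f"line {line_no}: missing fields {missing}")
            records.append(record)
    return records


if __name__ == "__main__":
    # Hypothetical sample data for a two-class subject classifier.
    sample = [
        {"text": "A corpus of geotagged web pages.", "label": "geospatial"},
        {"text": "Tokenizing multilingual text.", "label": "natural language processing"},
    ]
    write_jsonl(sample, "sample.jsonl")
    print(len(check_jsonl("sample.jsonl")))  # 2
```

Running the check before training surfaces a malformed line with its line number rather than partway through a training run.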

Here is an example record from a dataset used to train a research paper subject classifier:

```json
{"title": "CC-GPX: Extracting High-Quality Annotated Geospatial Data from Common Crawl", "content": "arXiv:2405.11039v1 Announce Type: new \nAbstract: The Common Crawl (CC) corpus....", "classification": "natural language processing"}
```

Train a model on such a file, then load it and run inference:

```python
from autodistill_setfit import SetFitModel

target_model = SetFitModel()

# train a model on the labeled JSONL file; output="model" sets where it is saved
target_model.train("./data.jsonl", output="model", epochs=5)

# load the trained model from the "model" directory
target_model = SetFitModel("model")

# run inference on the new model
pred = target_model.predict("Geospatial data.")

print(pred)
# geospatial
```
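To get a rough sense of how well a trained model generalizes, one option is to score it on records it was not trained on. The sketch below is hypothetical: `split_records`, `load_jsonl`, and `evaluate` are illustrative helpers, not part of autodistill_setfit. It assumes records use the `text`/`label` fields described above and that `predict` takes one string and returns one label string, as the example above suggests.

```python
import json
import random
from collections import Counter


def load_jsonl(path):
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def split_records(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and return (train, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def evaluate(predict, records, text_key="text", label_key="label"):
    """Return (overall accuracy, {label: accuracy}) for a predict callable."""
    if not records:
        raise ValueError("no records to evaluate")
    totals, correct = Counter(), Counter()
    for record in records:
        gold = record[label_key]
        totals[gold] += 1
        if predict(record[text_key]) == gold:
            correct[gold] += 1
    overall = sum(correct.values()) / len(records)
    per_label = {label: correct[label] / n for label, n in totals.items()}
    return overall, per_label


# Hypothetical usage: hold out part of data.jsonl, train on the rest, score.
# from autodistill_setfit import SetFitModel
# train, test = split_records(load_jsonl("data.jsonl"))
# with open("train.jsonl", "w", encoding="utf-8") as f:
#     f.writelines(json.dumps(r) + "\n" for r in train)
# SetFitModel().train("./train.jsonl", output="model", epochs=5)
# overall, per_label = evaluate(SetFitModel("model").predict, test)
# print(f"accuracy: {overall:.2%}")
```

With very small few-shot datasets, a random split can leave a class with no training examples, so it is worth checking the label counts on both sides of the split.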

License

This project is licensed under an MIT license.

🏆 Contributing

We love your input! Please see the core Autodistill contributing guide to get started. Thank you 🙏 to all our contributors!



Download files


Source distribution: autodistill_setfit-0.1.1.tar.gz (3.4 kB)

Built distribution: autodistill_setfit-0.1.1-py3-none-any.whl (3.5 kB, Python 3)

File details

Details for the file autodistill_setfit-0.1.1.tar.gz.

File metadata

  • Download URL: autodistill_setfit-0.1.1.tar.gz
  • Size: 3.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for autodistill_setfit-0.1.1.tar.gz
  • SHA256: 091d0725eae422492dec18a56040a0029618c744abea139166cd58996bda4a42
  • MD5: c3fbb74dbefcd038e1621189aac13ee0
  • BLAKE2b-256: 1b07f21335d52cc7741bda813513d98fd8669cb0a1f835f7ef48d221e411e7ed


File details

Details for the file autodistill_setfit-0.1.1-py3-none-any.whl.

File hashes

Hashes for autodistill_setfit-0.1.1-py3-none-any.whl
  • SHA256: a8d9ade91788ee5392ccd01c14744a59ca766146a8b0878863e0b540eace27c2
  • MD5: d7835e85186b64595ec16c4ce33c6c5e
  • BLAKE2b-256: 87a997ca687159ff67d621ace85004e4211eb44f9e5119e270264a690f21a149

