Build unstructured-to-structured data transformation pipelines

Project description

LabelKit - build and evaluate data labeling pipelines

A lightweight framework for building and evaluating data transformation and data extraction pipelines using LLMs. Designed for simplicity, rapid prototyping, evaluation and optimization.



LLMs can make your treasure trove of unstructured data useful, if only you could transform it into structured data or extract key fields from it. Today, building LLM-powered pipelines is difficult because LLMs are unpredictable. Unlike with traditional software, you can't simply write unit and integration tests that confirm the correctness of your code.

With LLMs you need a different approach: you need to evaluate your code on a dataset, and tune the code to find the right tradeoff between:

  • Accuracy
  • Cost
  • Latency
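As a rough illustration of that evaluation loop, here is a minimal sketch in plain Python. It does not use LabelKit's API, which this description does not show. The names `evaluate`, `fake_llm_label`, `Prediction` and `price_per_token` are hypothetical, and the "LLM" is a keyword rule standing in for a real model call.

```python
import time
from dataclasses import dataclass


@dataclass
class Prediction:
    label: str
    tokens_used: int  # what a real LLM client would report as token usage


def fake_llm_label(text: str) -> Prediction:
    """Hypothetical stand-in for an LLM call: a naive keyword rule."""
    label = "positive" if "good" in text.lower() else "negative"
    return Prediction(label=label, tokens_used=len(text.split()))


def evaluate(label_fn, dataset, price_per_token: float) -> dict:
    """Run label_fn over every (text, expected) pair and aggregate metrics."""
    correct = 0
    total_tokens = 0
    total_seconds = 0.0
    for text, expected in dataset:
        start = time.perf_counter()
        prediction = label_fn(text)
        total_seconds += time.perf_counter() - start
        total_tokens += prediction.tokens_used
        correct += int(prediction.label == expected)
    n = len(dataset)
    return {
        "accuracy": correct / n,
        "total_tokens": total_tokens,
        "cost": total_tokens * price_per_token,
        "mean_latency_s": total_seconds / n,
    }


# A tiny labeled dataset: (input text, ground-truth label).
DATASET = [
    ("Good product", "positive"),
    ("Terrible support", "negative"),
    ("Not good at all", "negative"),  # the keyword rule gets this wrong
    ("Works well", "positive"),       # and this one too
]

report = evaluate(fake_llm_label, DATASET, price_per_token=0.001)
print(report)
```

A single report like this only becomes useful when compared across variants (a different model, prompt or step), which is where the tradeoff between the three metrics shows up.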

LabelKit is an extremely lightweight framework that helps you build these pipelines so that you can:

  • Easily run them on a dataset (not just a single data point)
  • Keep track of token usage, cost and latency
  • Evaluate accuracy against ground truth
  • Evaluate the correctness of each step in the pipeline
  • Easily parametrize each step (e.g. model choice) so you can tune the parameters to optimize performance
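To make per-step evaluation and parameter tuning concrete, here is a small sketch in plain Python. It is not LabelKit's API, which this description does not document. The two steps, the `model` and `threshold` parameters, and the toy dataset are all assumptions made for illustration, and the "small" model is simulated by deliberately truncating numbers.

```python
from itertools import product


def extract_amount(text: str, model: str) -> int:
    """Step 1 (hypothetical): extract a number from free text."""
    digits = "".join(ch for ch in text if ch.isdigit())
    if model == "small":
        digits = digits[:2]  # simulate a weaker model that drops digits
    return int(digits) if digits else 0


def bucket_amount(amount: int, threshold: int) -> str:
    """Step 2 (hypothetical): turn the amount into a label."""
    return "high" if amount >= threshold else "low"


# Ground truth for every step, not just the final output.
DATASET = [
    {"text": "Invoice total: 950 USD", "amount": 950, "bucket": "high"},
    {"text": "Paid 40 USD", "amount": 40, "bucket": "low"},
    {"text": "Refund of 120 USD", "amount": 120, "bucket": "low"},
]


def run(params: dict) -> dict:
    """Run the two-step pipeline on the dataset; return accuracy per step."""
    hits = {"extract": 0, "bucket": 0}
    for row in DATASET:
        amount = extract_amount(row["text"], params["model"])
        bucket = bucket_amount(amount, params["threshold"])
        hits["extract"] += int(amount == row["amount"])
        hits["bucket"] += int(bucket == row["bucket"])
    return {step: count / len(DATASET) for step, count in hits.items()}


# Sweep every combination of step parameters.
GRID = {"model": ["small", "large"], "threshold": [100, 500]}
results = []
for model, threshold in product(GRID["model"], GRID["threshold"]):
    params = {"model": model, "threshold": threshold}
    results.append((params, run(params)))

# Pick the configuration with the best final-step accuracy.
best_params, best_scores = max(results, key=lambda r: r[1]["bucket"])
print(best_params, best_scores)
```

Scoring each step separately shows where errors originate: with the "small" setting the extraction step fails first, so no threshold choice in the second step can fully recover.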

Get Started

Installing LabelKit is a breeze. Simply run pip install labelkit in your terminal.

License

This project is licensed under the terms of the MIT License.


Download files

Source Distribution

labelkit-0.1.0.tar.gz (11.6 kB)

Built Distribution

labelkit-0.1.0-py3-none-any.whl (15.1 kB)
