Build pipelines that transform unstructured data into structured data
LabelKit - build and evaluate data labeling pipelines
A lightweight framework for building and evaluating data transformation and data extraction pipelines using LLMs. Designed for simplicity, rapid prototyping, evaluation and optimization.
LLMs can make your treasure trove of unstructured data useful by transforming it into structured data or extracting key fields from it. Today, building LLM-powered pipelines is difficult because LLM outputs are unpredictable. Unlike with traditional software, you can't simply write unit and integration tests that confirm your code is correct.
With LLMs you need a different approach: evaluate your code on a dataset, then tune it to find the right tradeoff among:
- Accuracy
- Cost
- Latency
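As a rough illustration of evaluating on a dataset instead of a single test case, the sketch below runs one extraction function over a tiny labeled dataset and reports all three metrics. It is plain Python, not LabelKit's API: `fake_extract` is a hypothetical stub standing in for an LLM call, and the token counts, per-token price and dataset are made up for the example.

```python
import time
from dataclasses import dataclass


@dataclass
class EvalResult:
    accuracy: float       # fraction of records matching the ground truth
    cost_usd: float       # total tokens * price per token
    avg_latency_s: float  # mean wall-clock time per record


def fake_extract(text: str) -> tuple[str, int]:
    """Hypothetical stand-in for an LLM call: returns (value, tokens used)."""
    value = text.split()[0].lower() if text.strip() else ""
    tokens = len(text.split()) + 5  # assume a 5-token prompt overhead
    return value, tokens


def evaluate(extract, dataset: list[tuple[str, str]], price_per_token: float) -> EvalResult:
    """Run `extract` on every record and aggregate accuracy, cost and latency."""
    correct, total_tokens, total_latency = 0, 0, 0.0
    for text, expected in dataset:
        start = time.perf_counter()
        value, tokens = extract(text)
        total_latency += time.perf_counter() - start
        total_tokens += tokens
        correct += int(value == expected)
    n = len(dataset)
    return EvalResult(correct / n, total_tokens * price_per_token, total_latency / n)


dataset = [
    ("Apple released a new phone", "apple"),
    ("Tesla reported earnings", "tesla"),
    ("The weather was sunny", "sunny"),  # the naive stub gets this one wrong
]
print(evaluate(fake_extract, dataset, price_per_token=0.000002))
```

A single passing example would hide the failure on the third record; the dataset-level numbers expose it, together with what the run cost and how long it took.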
LabelKit is an extremely lightweight framework that helps you build these pipelines so that you can:
- Easily run them on a dataset (not just a single data point)
- Keep track of token usage, cost and latency
- Evaluate accuracy against ground truth
- Evaluate the correctness of each step in the pipeline
- Easily parametrize each step (e.g. model choice) so you can tune the parameters to optimize performance
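The sketch below shows one way these ideas can fit together in plain Python; it does not use or describe LabelKit's actual API. A two-step pipeline records token usage, cost, latency and correctness per step against per-step ground-truth labels, and the model used by the first step is a parameter that a small loop tunes. The `Step` and `StepStats` classes, the model names, the prices and the stub `find_amount` "LLM" step are all hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical prices in USD per 1K tokens; real prices depend on the provider.
PRICES = {"small-model": 0.0005, "large-model": 0.01}


def find_amount(model: Optional[str], text: str) -> tuple[str, int]:
    """Stub 'LLM' step: returns the first number visible to the model, plus tokens used."""
    words = text.split()
    visible = words if model == "large-model" else words[:4]  # the small model "misses" late mentions
    amount = next((w for w in visible if w.replace(".", "", 1).isdigit()), "")
    return amount, len(visible) + 20  # assume a 20-token prompt overhead


def to_cents(model: Optional[str], amount: str) -> tuple[str, int]:
    """Deterministic step: no model call, so it uses zero tokens."""
    return (str(round(float(amount) * 100)) if amount else ""), 0


@dataclass
class Step:
    name: str
    fn: Callable[[Optional[str], str], tuple[str, int]]
    model: Optional[str] = None  # tunable parameter


@dataclass
class StepStats:
    tokens: int = 0
    cost: float = 0.0
    latency: float = 0.0
    correct: int = 0


def run(steps: list[Step], dataset: list[dict]) -> dict[str, StepStats]:
    """Run every record through the steps in order, scoring each step against its own label."""
    stats = {s.name: StepStats() for s in steps}
    for record in dataset:
        value = record["text"]
        for step in steps:
            start = time.perf_counter()
            value, tokens = step.fn(step.model, value)
            st = stats[step.name]
            st.latency += time.perf_counter() - start
            st.tokens += tokens
            st.cost += tokens / 1000 * PRICES.get(step.model, 0.0)
            st.correct += int(value == record["labels"][step.name])
    return stats


def tune(dataset: list[dict]) -> tuple[str, float, float]:
    """Try each model for the extraction step; prefer higher accuracy, then lower cost."""
    results = []
    for model in PRICES:
        steps = [Step("find_amount", find_amount, model), Step("to_cents", to_cents)]
        stats = run(steps, dataset)
        accuracy = stats["to_cents"].correct / len(dataset)  # end-to-end = last step
        cost = sum(s.cost for s in stats.values())
        results.append((model, accuracy, cost))
    return max(results, key=lambda r: (r[1], -r[2]))


DATASET = [
    {"text": "Invoice total 42.50 due Friday",
     "labels": {"find_amount": "42.50", "to_cents": "4250"}},
    {"text": "Please note that the amount owed is 19.99",
     "labels": {"find_amount": "19.99", "to_cents": "1999"}},
]
print(tune(DATASET))
```

Because `find_amount` and `to_cents` are scored separately, a drop in end-to-end accuracy can be traced to the step that caused it: with the small model, the miss comes from the extraction step, which never sees the amount at the end of the second record.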
Get Started
Installing LabelKit is a breeze. Simply run `pip install labelkit` in your terminal.
License
This project is licensed under the terms of the MIT License.
Download files
Built Distribution
Hashes for superpipe_py-0.1.2-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 77219b04b37f0caee5cc7874cada7782a6a6a241326126262b6b6fc83c85a04a |
| MD5 | 63e71db9221f895a98169c6bdc543f1c |
| BLAKE2b-256 | 7bbb93c58b653a674bc8ba574c5df97adffa9dc7055ac2b2ec1c17da89d30878 |
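To check a downloaded wheel against the SHA256 digest listed for superpipe_py-0.1.2-py3-none-any.whl, a minimal sketch using Python's standard `hashlib` module follows. It assumes the wheel has been saved locally; the helper names `sha256_of` and `verify` are illustrative, not part of any package.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed for superpipe_py-0.1.2-py3-none-any.whl
EXPECTED_SHA256 = "77219b04b37f0caee5cc7874cada7782a6a6a241326126262b6b6fc83c85a04a"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare hex digests case-insensitively."""
    return sha256_of(path) == expected.lower()


# Usage (assumes the wheel sits in the current directory):
# verify(Path("superpipe_py-0.1.2-py3-none-any.whl"))
```

pip's hash-checking mode (`--require-hashes`) can perform the same check automatically at install time.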