Build pipelines that transform unstructured data into structured data

LabelKit - build and evaluate data labeling pipelines

A lightweight framework for building and evaluating data transformation and data extraction pipelines using LLMs. Designed for simplicity, rapid prototyping, evaluation and optimization.


LLMs can make your treasure trove of unstructured data useful, if only you could transform it into structured data or extract key fields from it. Today, building LLM-powered pipelines is difficult because LLMs are unpredictable. Unlike with traditional software, you can't simply write unit and integration tests that confirm the correctness of your code.

With LLMs you need a different approach: you need to evaluate your code on a dataset, and tune the code to find the right tradeoff between:

  • Accuracy
  • Cost
  • Latency

LabelKit is an extremely lightweight framework that helps you build these pipelines such that you can:

  • Easily run them on a dataset (not just a single data point)
  • Keep track of token usage, cost and latency
  • Evaluate accuracy against ground truth
  • Evaluate the correctness of each step in the pipeline
  • Easily parametrize each step (e.g. model choice) so you can tune the parameters to optimize performance
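
The workflow described above can be sketched in plain Python. This is a minimal illustration only and does not use LabelKit's API, which this page does not document. The names fake_llm_extract, evaluate, Report and PRICE_PER_TOKEN, the model names, and the prices are all hypothetical, and the LLM call is replaced by a deterministic stub so the example runs offline.

```python
import time
from dataclasses import dataclass


@dataclass
class StepResult:
    output: str
    tokens: int  # tokens consumed by the (stubbed) model call


@dataclass
class Report:
    accuracy: float
    total_tokens: int
    total_cost: float
    total_latency_s: float


# Hypothetical per-token prices, for illustration only.
PRICE_PER_TOKEN = {"small-model": 0.000001, "large-model": 0.00001}


def fake_llm_extract(text: str, model: str) -> StepResult:
    """Stand-in for an LLM call that extracts a city from a sentence.

    The stub returns the last word of the text. To simulate a weaker model,
    "small-model" truncates its answer to four characters.
    """
    words = text.split()
    answer = words[-1]
    if model == "small-model":
        answer = answer[:4]
    # Assumption: one token per whitespace-separated word.
    return StepResult(output=answer, tokens=len(words))


def evaluate(step, dataset, model: str) -> Report:
    """Run one parametrized step over a whole dataset and collect metrics."""
    correct = 0
    tokens = 0
    start = time.perf_counter()
    for text, expected in dataset:
        result = step(text, model)
        tokens += result.tokens
        correct += int(result.output == expected)
    latency = time.perf_counter() - start
    return Report(
        accuracy=correct / len(dataset),
        total_tokens=tokens,
        total_cost=tokens * PRICE_PER_TOKEN[model],
        total_latency_s=latency,
    )


# A tiny dataset of (input text, ground truth) pairs.
dataset = [
    ("Ann lives in Paris", "Paris"),
    ("Bob lives in Rome", "Rome"),
    ("Cy lives in Lisbon", "Lisbon"),
]

# Sweep the "model" parameter and compare accuracy, cost and latency.
reports = {model: evaluate(fake_llm_extract, dataset, model) for model in PRICE_PER_TOKEN}
for model, report in reports.items():
    print(model, report)
```

Comparing the reports side by side is what makes the tradeoff visible: in this toy setup the larger model is more accurate but costs ten times as much per token.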

Get Started

Installing LabelKit is a breeze. Simply run pip install labelkit in your terminal.

License

This project is licensed under the terms of the MIT License.

Download files

Source Distribution

labelkit-0.1.0.tar.gz (11.6 kB, source)

Built Distribution

labelkit-0.1.0-py3-none-any.whl (15.1 kB, Python 3)
