LLM Labeling UI is an open source project for large language model data labeling

Project description

LLM Labeling UI

About

WARNING: This software is developed mainly around my personal habits and is still under development. I am not responsible for any data loss that may occur while you use it.

LLM Labeling UI is a fork of Chatbot UI, with the following modifications that make it better suited to large language model data labeling tasks.

  • The backend is implemented in Python and the frontend is precompiled, so it runs without a Node.js environment.
  • Chatbot UI saves data in localStorage, which has a 5MB size limit; LLM Labeling UI loads local data when the service starts, with no size limit.
  • Web interaction:
    • Browse data in pages, search by keyword, and filter by message count.
    • Directly modify or delete the model's responses.
    • Split long conversations into multiple conversations.
    • A confirmation button has been added before deleting a conversation message.
    • Display the number of messages and the token length of the current conversation.
    • Allow modifying the system prompt during the dialogue.
    • Replace a string in the current conversation.
  • Command line tools to help you clean and manage your data, such as language cleaning, duplicate removal, embedding clustering, etc.
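The browsing features above (keyword search, message-count filter, paging) can be pictured with a minimal Python sketch. It is not the project's implementation: the function names are hypothetical, and it assumes each conversation is a dict with a "messages" list whose items carry a "content" string, as in a Chatbot UI export.

```python
from typing import Any, Dict, List, Optional

Conversation = Dict[str, Any]


def filter_conversations(
    conversations: List[Conversation],
    keyword: Optional[str] = None,
    min_messages: int = 0,
    max_messages: Optional[int] = None,
) -> List[Conversation]:
    """Keep conversations that match the keyword and the message-count range."""
    needle = keyword.lower() if keyword is not None else None
    result = []
    for conv in conversations:
        messages = conv.get("messages", [])
        count = len(messages)
        if count < min_messages:
            continue
        if max_messages is not None and count > max_messages:
            continue
        # Case-insensitive substring match against every message body.
        if needle is not None and not any(
            needle in str(m.get("content", "")).lower() for m in messages
        ):
            continue
        result.append(conv)
    return result


def paginate(items: List[Conversation], page: int, page_size: int = 50) -> List[Conversation]:
    """Return one page of items; pages are numbered from 1."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size
    return items[start:start + page_size]
```

Filtering first and paging second keeps page numbers stable for a given query.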

Quick Start

pip install llm-labeling-ui

1. Provide OpenAI API Key

You can provide the OpenAI API key before starting the server, or configure it later in the web page.

export OPENAI_API_KEY=YOUR_KEY
export OPENAI_ORGANIZATION=YOUR_ORG
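For reference, a Python process could pick up these two variables roughly as follows. This is a sketch only: the function name is hypothetical and the way the server actually reads them is not shown here. Since the key can also be configured later in the web page, a missing key is treated as "not set" rather than as an error.

```python
import os
from typing import Dict, Mapping, Optional


def read_openai_settings(env: Mapping[str, str] = os.environ) -> Dict[str, Optional[str]]:
    """Read the API key and organization; unset or empty values become None."""
    return {
        "api_key": env.get("OPENAI_API_KEY") or None,
        "organization": env.get("OPENAI_ORGANIZATION") or None,
    }
```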

2. Start Server

llm-labeling-ui server start --data chatbot-ui-v4-format-history.json --tokenizer meta-llama/Llama-2-7b
  • --data: Data in Chatbot-UI-v4 format; here is an example. Before the service starts, a chatbot-ui-v4-format-history.sqlite file is created from chatbot-ui-v4-format-history.json. All modifications you make on the page are saved to the sqlite file. If chatbot-ui-v4-format-history.sqlite already exists, it is read automatically.
  • --tokenizer: Used to display how many tokens the current conversation contains on the web page. Note that this is not the number of tokens consumed by calling the OpenAI API.
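The "create the sqlite file once, then reuse it" behaviour of --data can be sketched in Python as below. The table layout (a single conversations table with id and data columns) and the function names are assumptions made for illustration, not the project's real schema. The sketch also assumes the JSON export keeps its conversations under a top-level "history" key.

```python
import json
import sqlite3
from pathlib import Path


def sqlite_path_for(json_path: str) -> Path:
    """history.json -> history.sqlite, in the same directory."""
    return Path(json_path).with_suffix(".sqlite")


def open_store(json_path: str) -> sqlite3.Connection:
    """Open the sqlite file next to the JSON export, importing the JSON on first use."""
    db_path = sqlite_path_for(json_path)
    already_exists = db_path.exists()
    conn = sqlite3.connect(str(db_path))
    if already_exists:
        # Earlier edits live in the sqlite file, so the JSON is not re-imported.
        return conn
    # Hypothetical schema: one row per conversation, stored as a JSON string.
    conn.execute("CREATE TABLE conversations (id TEXT PRIMARY KEY, data TEXT NOT NULL)")
    with open(json_path, encoding="utf-8") as f:
        export = json.load(f)
    rows = [
        (conv["id"], json.dumps(conv, ensure_ascii=False))
        for conv in export.get("history", [])
    ]
    conn.executemany("INSERT INTO conversations (id, data) VALUES (?, ?)", rows)
    conn.commit()
    return conn
```

Under this scheme the JSON file is only read the first time, which is why later edits to the JSON would not show up unless the sqlite file is removed.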

Command Line Tools

  • cluster: Cluster operations, such as creating embeddings, running clustering, semantic deduplication, etc.
  • conversation: Conversation operations, such as removing prefixes, deduplication, etc.
  • tag: Add tags to your data, such as language classification (en, zh, ...), Traditional or Simplified Chinese classification, etc.

Use --help to see more details, such as:

llm-labeling-ui cluster --help

Usage: llm-labeling-ui cluster [OPTIONS] COMMAND [ARGS]...

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --help          Show this message and exit.                                  │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────╮
│ create-embedding  Create embedding                                           │
│ dedup             Delete redundant data in the same clustering result        │
│                   according to certain strategies.                           │
│ prune-embedding   Remove embedding not exists in db                          │
│ run               DBSCAN embedding cluster                                   │
│ view              View cluster result                                        │
╰──────────────────────────────────────────────────────────────────────────────╯
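To make the idea behind dedup concrete, here is a small Python sketch of removing redundant items within each cluster. The strategy shown (keep the conversation with the most messages) and the function name are assumptions chosen for illustration; the strategies the tool actually offers are listed in its own --help output. Label -1 is treated as "not in any cluster", the convention used by scikit-learn's DBSCAN.

```python
from typing import Dict, List, Sequence

NOISE_LABEL = -1  # unclustered points, following scikit-learn's DBSCAN convention


def dedup_by_cluster(
    conversation_ids: Sequence[str],
    labels: Sequence[int],
    message_counts: Sequence[int],
) -> List[str]:
    """Return the ids to keep: all unclustered items plus one item per cluster."""
    if not (len(conversation_ids) == len(labels) == len(message_counts)):
        raise ValueError("all inputs must have the same length")
    best: Dict[int, int] = {}  # cluster label -> index of its representative
    for index, label in enumerate(labels):
        if label == NOISE_LABEL:
            continue
        current = best.get(label)
        # Hypothetical strategy: prefer more messages; ties keep the first seen.
        if current is None or message_counts[index] > message_counts[current]:
            best[label] = index
    keep = {i for i, label in enumerate(labels) if label == NOISE_LABEL}
    keep.update(best.values())
    return [conversation_ids[i] for i in sorted(keep)]
```

Unclustered items are always kept because nothing marks them as near-duplicates of anything else.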

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llm-labeling-ui-0.10.2.tar.gz (1.3 MB)

Uploaded Source

Built Distribution

llm_labeling_ui-0.10.2-py3-none-any.whl (4.1 MB)

Uploaded Python 3

File details

Details for the file llm-labeling-ui-0.10.2.tar.gz.

File metadata

  • Download URL: llm-labeling-ui-0.10.2.tar.gz
  • Size: 1.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.10

File hashes

Hashes for llm-labeling-ui-0.10.2.tar.gz
Algorithm Hash digest
SHA256 8dcf102fbdc19bf099ef9dc0c321817000f35ba05d29cb97b52972c001c29629
MD5 62fdfc6010f481537fbf743c6e0fbbb1
BLAKE2b-256 82f2b8c56affd41e21b7d4070b12542e9e935d0631ef5e0eef8e3f6c8854c436

See more details on using hashes here.
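A downloaded file can be checked against the published SHA256 digest with Python's standard hashlib module. The sketch below uses hypothetical helper names; the expected value would be the SHA256 line listed for that file.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare the computed digest with the published hex digest, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```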

File details

Details for the file llm_labeling_ui-0.10.2-py3-none-any.whl.

File hashes

Hashes for llm_labeling_ui-0.10.2-py3-none-any.whl
Algorithm Hash digest
SHA256 813fb40ab58736c11d302668e9045ee4cd21d4bae96a21b4d726930ab894e1c0
MD5 e7ff57bfa91d021a258a2375aec2da30
BLAKE2b-256 a3744bbc99a354d3cd1b419e96c25dc86952ba7cac665c2219f34e08083379c8
