General Purpose Tagger using GPT

GPTagger 🏷️

GPT Tagger is a text tagger built on the GPT model. It extracts tags from a given text by leveraging the capabilities of GPT. Using GPT as a text tagger is not trivial, however: GPT tends to return text that does not exist in the input, is fabricated, or has been reformatted. To mitigate this, GPT Tagger provides a method to ensure that the generated tags are derived from the input text, while still allowing GPT to process the extracted tags to some extent.

Below is an example of how GPT may respond incorrectly.

Text: "I earn $1000 this week!"
Prompt: "Extract how much he/she earns"

# Non-existent text
GPT: "one thousand dollar"
# Made-up text
GPT: "$999999"
# Processed text
GPT: "$1,000"

Introduction

GPTagger Demo

These incorrect responses show why a reliable tag extraction tool like GPT Tagger matters. To produce reliable tags, GPT Tagger follows three main steps:

  1. 🕵️‍♀️ Extraction: GPT Tagger sniffs out all possible tags by following your instructions to GPT.
  2. 🔍 Indexing: It spots the exact locations of these tags within the text.
  3. ✅ Validation: GPT Tagger's trusty validator checks whether the extracted tags pass the rule-based and ML-based checks.

See the demo above for how we extract ingredients from a yummy recipe text. 😋
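To make the indexing step concrete, here is a minimal sketch that reuses the salary example from earlier. It is not GPTagger's actual implementation: the function name `index_tags` is hypothetical, and the candidate list stands in for whatever GPT returns. The idea is that a candidate only survives if it can be located verbatim in the input text.

```python
import re
from typing import List, Tuple


def index_tags(text: str, candidates: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end, tag) for every verbatim occurrence of a candidate.

    Hypothetical helper for illustration; not part of the GPTagger API.
    """
    spans = []
    # dict.fromkeys removes duplicate candidates while keeping their order.
    for tag in dict.fromkeys(candidates):
        if not tag:
            continue  # an empty string would match everywhere
        for match in re.finditer(re.escape(tag), text):
            spans.append((match.start(), match.end(), tag))
    return sorted(spans)


text = "I earn $1000 this week!"
# Stand-in for GPT output: three bad responses and one verbatim tag.
candidates = ["one thousand dollar", "$999999", "$1,000", "$1000"]

spans = index_tags(text, candidates)
print(spans)  # [(7, 12, '$1000')]
```

Only `$1000` is kept, because the other three candidates never occur in the text. A real pipeline might relax the exact match (for example, ignoring whitespace differences), which is a design choice this sketch leaves out.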

Features ✨

Scale up GPT annotators and switch between GPT-3.5 and GPT-4 easily

  • Want higher precision? Try using GPT-4!
  • Want higher recall? Scale up the number of GPT annotators!
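One way to see why more annotators can raise recall: each run may miss different tags, so merging the runs recovers more of them. The sketch below is an assumption about how such merging could work, not GPTagger's documented behavior; `merge_annotations` and `min_votes` are hypothetical names.

```python
from collections import Counter
from typing import Iterable, List


def merge_annotations(runs: Iterable[List[str]], min_votes: int = 1) -> List[str]:
    """Merge tags from several annotator runs (hypothetical helper).

    min_votes=1 keeps the union of all runs, which favors recall.
    A higher min_votes keeps only tags that several runs agree on,
    which trades recall for precision.
    """
    votes = Counter()
    for run in runs:
        votes.update(set(run))  # each annotator votes at most once per tag
    return sorted(tag for tag, count in votes.items() if count >= min_votes)


# Stand-in outputs from three annotator runs over the same recipe text.
runs = [
    ["flour", "sugar"],
    ["sugar", "eggs"],
    ["flour", "sugar", "butter"],
]

union = merge_annotations(runs)
agreed = merge_annotations(runs, min_votes=2)
print(union)   # ['butter', 'eggs', 'flour', 'sugar']
print(agreed)  # ['flour', 'sugar']
```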

Instead of crafting a perfect prompt, use validators to filter out bad extractions

  • Simple validator: Length, Regex...
  • ML validator: GPT validator (Consider it like a chain of GPTs!)
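The simple validators can be pictured as small predicates chained together. The following is a sketch under assumptions, not the library's own validator classes: all names are hypothetical, and the regex check is assumed to mean "the pattern occurs somewhere in the tag".

```python
import re
from typing import Callable, List

# A validator takes a tag and reports whether it should be kept.
Validator = Callable[[str], bool]


def max_len_validator(max_len: int) -> Validator:
    """Keep tags that are no longer than max_len characters."""
    return lambda tag: len(tag) <= max_len


def regex_validator(pattern: str) -> Validator:
    """Keep tags in which the pattern occurs at least once (assumed semantics)."""
    compiled = re.compile(pattern)
    return lambda tag: compiled.search(tag) is not None


def validate(tags: List[str], validators: List[Validator]) -> List[str]:
    """Keep only the tags that pass every validator in the chain."""
    return [tag for tag in tags if all(check(tag) for check in validators)]


# Date-like tags must contain a digit and stay under 128 characters.
validators = [max_len_validator(128), regex_validator(r"\d")]
kept = validate(["March 3, 2021", "next week", "2021-03-03"], validators)
print(kept)  # ['March 3, 2021', '2021-03-03']
```

An ML validator would fit the same shape: a callable that asks a model whether a tag is acceptable and returns a boolean.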

How to Use 🚀

Setup

make install
export OPENAI_API_KEY=<your-key>

Pre-defined NER pipeline

The easiest way to dive into GPT Tagger is through the Gradio web demo! Fire it up with a single command:

poetry run python GPTagger/app.py

If you prefer having the power of GPT Tagger at your fingertips in Python, check out this snippet:

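from pathlib import Path
from GPTagger import *

# Configure the pre-defined NER pipeline for a 'date' tag.
cfg = NerConfig(
    tag_name='date',
    tag_regex=r"\d",
    tag_max_len=128,
)
# Load the prompt template from a text file.
prompt = PromptTemplate.from_template(Path('<path-to-prompt>').read_text())
pipeline = NerPipeline.from_config(cfg)

# Run the pipeline on a document to extract tags.
doc = Path('<path-to-doc>').read_text()
tags = pipeline(doc, prompt)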

Build Custom Pipelines 🎉

We believe that the possibilities of using GPT as a text tagger are endless! We invite you to contribute your own custom pipelines. Together, we'll unlock the true potential of GPT Tagger and make text tagging a better experience.

Leave a star if you find GPTagger useful for your product or company! 🌟
