LLM-based text categorization tool

Project description

LLM Enhanced Categorizer (LLMEC)

LEC is a simple tool for categorising string records into predefined, nested categories using LLMs.

  • Upload your categories and subcategories (read them from a YAML file or create them on the fly).
  • Initialize LEC. You can use different modes and your own task-specific prompts if you like.
  • Run LEC and it will output a dataframe with all categories and subcategories, along with the LLM's reasoning for selecting them.
  • You can leave notes for the LLM on each category to help it categorize more accurately.
  • You can also use the included naive classification method, which supports regex-based or keyword-matching mechanisms, to reduce LLM compute.
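
The idea behind the naive classification step can be sketched as follows. This is a minimal illustration and not LEC's own implementation: the pattern table, `naive_classify`, and `split_records` are hypothetical names, and the real pattern file format may differ. Records that match a pattern are resolved locally, and only the remainder would need to be sent to an LLM.

```python
import re
from typing import Optional

# Hypothetical pattern table: category -> list of regex patterns.
# These names and patterns are illustrative, not LEC's actual schema.
PATTERNS = {
    "Groceries": [r"\bsupermarket\b", r"\bgrocer(y|ies)\b"],
    "Transport": [r"\buber\b", r"\bmetro\b", r"\btaxi\b"],
}

# Compile once so repeated lookups are cheap.
COMPILED = {
    category: [re.compile(p, re.IGNORECASE) for p in patterns]
    for category, patterns in PATTERNS.items()
}


def naive_classify(record: str) -> Optional[str]:
    """Return the first category whose pattern matches, or None."""
    for category, regexes in COMPILED.items():
        if any(rx.search(record) for rx in regexes):
            return category
    return None


def split_records(records):
    """Separate records resolved by patterns from those left for the LLM."""
    resolved, pending = {}, []
    for record in records:
        category = naive_classify(record)
        if category is None:
            pending.append(record)
        else:
            resolved[record] = category
    return resolved, pending
```

With this split, the size of `pending` is what drives LLM cost, so broader patterns trade LLM compute against the risk of a wrong keyword match.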

Here are some benchmarks to give you a better picture.

category depth | category combination size | allowed retry

| Number of Records | Main Model | Refiner Model | Categorization Mode | Batch Prompting | Accuracy | Total Time | Avg Token | CPU Type | GPU Type |
|---|---|---|---|---|---|---|---|---|---|
| 1000 | Model A | Refiner X | Mode 1 | Yes | 92.5% | 10 mins | 512 | Intel Xeon E5-2670 | NVIDIA Tesla K80 |
| 2000 | Model B | Refiner Y | Mode 2 | No | 89.0% | 20 mins | 1024 | Intel Xeon E5-2680 | NVIDIA Tesla V100 |
| 5000 | Model C | Refiner Z | Mode 3 | Yes | 94.7% | 50 mins | 768 | AMD EPYC 7742 | NVIDIA A100 |
| 10000 | Model D | Refiner W | Mode 4 | No | 88.3% | 1 hr 40 mins | 2048 | Intel Xeon E5-2690 | NVIDIA RTX 3090 |

Usage

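# Assumes LLMEnhancedClassifier is already imported and that llm_model,
# llm_refiner, and df (the records to classify) are already defined.
lec = LLMEnhancedClassifier(
    llm_model=llm_model,
    llm_refiner_model=llm_refiner,
    categories_yaml_path='categories.yaml',
    meta_patterns_yaml_path='bank_patterns.yaml',
    subcategory_level=2,  # number of subcategory levels to use (max 4)
)

# Load the records, then classify them one category level at a time.
lec.load_records(df)
df = lec.classify_lvl_by_lvl()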
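
To make the level-by-level idea more concrete, here is a small conceptual sketch. It is not how `classify_lvl_by_lvl` is actually implemented: the category tree, `classify_level_by_level`, and `keyword_chooser` are hypothetical, and a simple keyword function stands in for the LLM call. At each level, the chooser only sees the options under the category picked at the previous level.

```python
from typing import Callable, Dict, List

# Hypothetical nested category tree; leaves are empty dicts.
CATEGORIES: Dict[str, dict] = {
    "Food": {"Groceries": {}, "Restaurants": {}},
    "Transport": {"Public": {}, "Ride hailing": {}},
}

# A chooser takes (record, options) and returns one of the options.
Chooser = Callable[[str, List[str]], str]


def classify_level_by_level(
    record: str, tree: dict, choose: Chooser, max_depth: int = 2
) -> List[str]:
    """Walk the tree one level at a time, asking `choose` at each level."""
    path: List[str] = []
    node = tree
    while node and len(path) < max_depth:
        options = list(node)
        picked = choose(record, options)
        if picked not in node:
            break  # the chooser answered outside the options; stop here
        path.append(picked)
        node = node[picked]
    return path


def keyword_chooser(record: str, options: List[str]) -> str:
    """Stand-in for an LLM call: pick the first option named in the record."""
    lowered = record.lower()
    for option in options:
        if option.lower() in lowered:
            return option
    return options[0]
```

In this sketch, `max_depth` plays a role similar to `subcategory_level` above, limiting how many levels of the tree are visited for each record.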

Download files

Source Distribution

categorizer-0.0.1.tar.gz (19.1 kB)

Uploaded Source

Built Distribution

categorizer-0.0.1-py3-none-any.whl (20.7 kB)

Uploaded Python 3
