LLM-based text categorization tool
Categorizer
Categorizer is a simple tool for sorting string records into predefined, nested categories using LLMs.
- Load your categories and subcategories (read them from a YAML file or create them on the fly).
- Initialize the LLMEnhancedClassifier (LEC). You can choose among different modes and supply your own task-specific prompts.
- Run the LEC; it outputs a dataframe containing the selected categories and subcategories, along with the LLM's reasoning for each selection.
- You can attach notes to each category to help the LLM categorize more accurately.
- You can also use the included naive classification method, which supports regex-based or keyword matching, to reduce LLM compute.
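The naive classification idea in the last bullet can be sketched as follows. This is a minimal illustration, not the package's implementation: the `RULES` layout, the `naive_classify` helper, and the example records are all hypothetical. Records matched by a keyword or regex get a category right away, and only the rest would need to go to the LLM.

```python
import re
from typing import Dict, List, Optional

# Hypothetical rule set: category -> plain keywords and regex patterns.
RULES: Dict[str, Dict[str, List[str]]] = {
    "groceries": {"keywords": ["supermarket", "grocery"], "regex": []},
    "transport": {"keywords": ["taxi"], "regex": [r"\buber\b", r"\bmetro\s+ticket\b"]},
}


def naive_classify(text: str, rules: Dict[str, Dict[str, List[str]]] = RULES) -> Optional[str]:
    """Return the first category whose keyword or regex matches, else None."""
    lowered = text.lower()
    for category, rule in rules.items():
        # Keyword matching: case-insensitive substring check.
        if any(keyword in lowered for keyword in rule.get("keywords", [])):
            return category
        # Regex matching: case-insensitive pattern search.
        if any(re.search(p, text, flags=re.IGNORECASE) for p in rule.get("regex", [])):
            return category
    return None


records = ["UBER trip 12/03", "Local supermarket purchase", "Payment ref 88213"]
matched = {record: naive_classify(record) for record in records}

# Only records without a rule-based match would be forwarded to the LLM.
needs_llm = [record for record, category in matched.items() if category is None]
```

Running the cheap rules first keeps the LLM calls for the ambiguous records only, which is where the compute saving comes from.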
Here are some benchmark results to give you a better idea of performance.
Benchmark settings also include category depth, category combination size, and allowed retries.
Number of Records | Main Model | Refiner Model | Categorization Mode | Batch Prompting | Accuracy | Total Time | Avg Token | CPU Type | GPU Type |
---|---|---|---|---|---|---|---|---|---|
1000 | Model A | Refiner X | Mode 1 | Yes | 92.5% | 10 mins | 512 | Intel Xeon E5-2670 | NVIDIA Tesla K80 |
2000 | Model B | Refiner Y | Mode 2 | No | 89.0% | 20 mins | 1024 | Intel Xeon E5-2680 | NVIDIA Tesla V100 |
5000 | Model C | Refiner Z | Mode 3 | Yes | 94.7% | 50 mins | 768 | AMD EPYC 7742 | NVIDIA A100 |
10000 | Model D | Refiner W | Mode 4 | No | 88.3% | 1 hr 40 mins | 2048 | Intel Xeon E5-2690 | NVIDIA RTX 3090 |
Usage
# llm_model, llm_refiner, and df (the records to classify) must be defined beforehand.
lec = LLMEnhancedClassifier(
    llm_model=llm_model,
    llm_refiner_model=llm_refiner,
    categories_yaml_path='categories.yaml',
    meta_patterns_yaml_path='bank_patterns.yaml',
    subcategory_level=2,  # Change this value to set the number of subcategory levels (max 4)
)
lec.load_records(df)
df = lec.classify_lvl_by_lvl()  # classify level by level and collect the results
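To give a feel for what level-by-level classification over nested categories means, here is a small self-contained sketch. It is not the LEC's actual implementation: the `CATEGORIES` tree, the `HINTS` table, `classify_level_by_level`, and `keyword_chooser` are hypothetical names, and the keyword-based chooser merely stands in for the LLM call that would pick one option at each level.

```python
from typing import Callable, Dict, List

# Hypothetical nested categories: each key is a category, each value its subcategories.
CATEGORIES: Dict[str, dict] = {
    "food": {"groceries": {}, "restaurants": {"fast food": {}, "fine dining": {}}},
    "transport": {"taxi": {}, "public transit": {}},
}

# Hypothetical keyword hints used by the stand-in chooser below.
HINTS: Dict[str, List[str]] = {
    "food": ["burger", "supermarket"],
    "groceries": ["supermarket"],
    "restaurants": ["burger"],
    "fast food": ["burger"],
    "transport": ["taxi", "metro"],
    "taxi": ["taxi"],
    "public transit": ["metro"],
}


def keyword_chooser(text: str, options: List[str]) -> str:
    """Stand-in for an LLM: pick the first option with a matching hint."""
    lowered = text.lower()
    for option in options:
        if any(hint in lowered for hint in HINTS.get(option, [])):
            return option
    return ""  # no confident choice


def classify_level_by_level(
    text: str,
    tree: Dict[str, dict],
    choose: Callable[[str, List[str]], str],
    max_depth: int = 2,
) -> List[str]:
    """Walk the tree one level at a time, choosing among the current node's children."""
    path: List[str] = []
    node = tree
    while node and len(path) < max_depth:
        choice = choose(text, list(node))
        if choice not in node:
            break  # stop when the chooser cannot pick a valid option
        path.append(choice)
        node = node[choice]
    return path


two_levels = classify_level_by_level("Burger takeaway", CATEGORIES, keyword_chooser)
three_levels = classify_level_by_level("Burger takeaway", CATEGORIES, keyword_chooser, max_depth=3)
```

Narrowing the options at each level keeps every individual decision small, since the chooser only ever sees the children of the category picked at the previous level.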