Library for performing trends analysis via LLMs.
Text Categorizer
Problem statement
When analyzing texts of any kind (for example, search keywords or YouTube video descriptions), it can be challenging to extract insights because such texts are unstructured.
Solution
text-categorizer uses the power of large language models to categorize your texts: simply provide a couple of seed examples and text-categorizer will do the rest.
Deliverable (implementation)
text-categorizer is implemented as a:
- library - use it in your projects with the help of the TextCategorizer class.
- CLI tool - the text-categorizer tool is available in the terminal.
- HTTP endpoint - text-categorizer can be easily exposed as an HTTP endpoint.
- Langchain tool - integrate text-categorizer into your Langchain applications.
Deployment
Prerequisites
- Python 3.11+
- A GCP project with a billing account attached
- A service account created and its key downloaded, in order to write data to BigQuery.
  - Once you have downloaded the service account key, export it as an environment variable:
    export GOOGLE_APPLICATION_CREDENTIALS=path/to/service_account.json
  - If authenticating via a service account is not possible, you can authenticate with the following command:
    gcloud auth application-default login
- An API key to access Google Gemini.
  - Once you have created the API key, export it as an environment variable:
    export GOOGLE_API_KEY=<YOUR_API_KEY_HERE>
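The prerequisites above can be verified before the first run. The following is a minimal sketch, not part of text-categorizer: the function name `check_prerequisites` is hypothetical, and it assumes that an unset GOOGLE_APPLICATION_CREDENTIALS is acceptable because `gcloud auth application-default login` may have been used instead.

```python
import os
import sys


def check_prerequisites(environ, python_version):
    """Return a list of problems; an empty list means every check passed.

    Hypothetical helper for illustration only, not shipped with text-categorizer.
    """
    problems = []
    # The README asks for Python 3.11 or newer.
    if tuple(python_version[:2]) < (3, 11):
        problems.append("Python 3.11+ is required")
    # The Gemini API key is read from GOOGLE_API_KEY.
    if not environ.get("GOOGLE_API_KEY"):
        problems.append("GOOGLE_API_KEY is not set")
    # Assumption: the key path is only validated when it is set, because
    # application-default credentials from gcloud are an allowed alternative.
    key_path = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key_path and not os.path.isfile(key_path):
        problems.append(f"service account key not found: {key_path}")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites(os.environ, sys.version_info):
        print(problem)
```

Passing the environment and version in as arguments keeps the check easy to exercise without touching the real shell environment.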
Installation
Install text-categorizer with the following command:
pip install text-categorization
Usage
This section focuses on using text-categorizer as a CLI tool. Check the library, HTTP endpoint, and Langchain tool sections to learn more.
Once text-categorizer is installed you can call it:
text-categorizer 'text1' 'text2' 'text3' \
--examples path/to/examples.txt \
--llm gemini \
--llm.model=gemini-1.5-flash \
--output-type csv \
--output-destination sample_results
where:
- 'text1' 'text2' 'text3' - texts that need to be categorized.
- --examples path/to/examples.txt - path to examples (each line of the file should be formatted as text - category, e.g. dog - pet).
- --llm gemini - type of large language model (currently only Google Gemini is supported).
- --llm.model=gemini-1.5-flash - any parameters to initialize the selected LLM.
- --output-type csv - type of output.
- --output-destination sample_results - name of the output table or file.
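To check an examples file before passing it to --examples, the text - category line format described above can be parsed as follows. This is a minimal sketch with a hypothetical function name (`parse_examples`); it is not the parser used by text-categorizer. It assumes the separator is " - " and that, when a text itself contains " - ", the last occurrence separates the category.

```python
def parse_examples(lines):
    """Parse 'text - category' lines into (text, category) pairs.

    Hypothetical helper for illustration; blank lines are skipped.
    """
    examples = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        # Assumption: split on the LAST " - " so the text may contain dashes.
        text, separator, category = line.rpartition(" - ")
        if not separator or not text.strip() or not category.strip():
            raise ValueError(
                f"line {number}: expected 'text - category', got {line!r}"
            )
        examples.append((text.strip(), category.strip()))
    return examples


if __name__ == "__main__":
    print(parse_examples(["dog - pet", "", "oak - tree"]))
```

A file object can be passed directly, for example `parse_examples(open("path/to/examples.txt"))`, since the function only iterates over lines.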
Examples for --examples can also come from BigQuery: simply pass the full table name (project.dataset.table) as the value. The table should contain two columns, text and category.
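The expected shape of a BigQuery examples table can be made concrete with a small sketch. The helper names below (`split_table_name`, `examples_query`) are hypothetical, and the query is only an illustration of reading the two required columns; the README does not show how text-categorizer itself queries the table. The sketch assumes a plain three-part name with no dots inside the project ID.

```python
def split_table_name(full_name):
    """Split 'project.dataset.table' into its three parts.

    Hypothetical helper; assumes no dots inside the individual parts.
    """
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected project.dataset.table, got {full_name!r}"
        )
    return tuple(parts)


def examples_query(full_name):
    """Build a query that reads the two columns the examples table needs."""
    project, dataset, table = split_table_name(full_name)
    # Backticks quote the fully qualified table name in BigQuery standard SQL.
    return f"SELECT text, category FROM `{project}.{dataset}.{table}`"


if __name__ == "__main__":
    print(examples_query("my-project.my_dataset.examples"))
```

Running such a query in the BigQuery console is one way to confirm that the table exposes both columns before using it as an examples source.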
Instead of passing texts as parameters to text-categorizer, you can provide the --remote-texts flag, which accepts a full table name in BigQuery (the table should contain a column named text). You can also combine passing texts with --remote-texts.
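When text-categorizer is called from a script, the command line can be assembled programmatically. The sketch below only builds the argument list from the flags documented above; `build_command` is a hypothetical helper. Two points are assumptions rather than documented behavior: that --remote-texts takes its table name as a separate argument, and that at least one source of texts must be given.

```python
import shlex


def build_command(examples, texts=(), remote_texts=None,
                  model="gemini-1.5-flash", output_type="csv",
                  output_destination="sample_results"):
    """Return the text-categorizer invocation as a list of arguments.

    Hypothetical helper; it does not run the tool.
    """
    # Assumption: the CLI needs inline texts, --remote-texts, or both.
    if not texts and remote_texts is None:
        raise ValueError("provide texts, remote_texts, or both")
    command = ["text-categorizer", *texts]
    if remote_texts is not None:
        # Assumption: the table name follows the flag as a separate argument.
        command += ["--remote-texts", remote_texts]
    command += [
        "--examples", examples,
        "--llm", "gemini",
        f"--llm.model={model}",
        "--output-type", output_type,
        "--output-destination", output_destination,
    ]
    return command


if __name__ == "__main__":
    args = build_command("path/to/examples.txt", texts=["text1", "text2"])
    # shlex.join produces a shell-safe string for logging or copy-pasting.
    print(shlex.join(args))
```

The returned list can be handed to `subprocess.run(args, check=True)` once the tool is installed; keeping the arguments as a list avoids shell quoting problems with texts that contain spaces.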
Disclaimer
This is not an officially supported Google product.
Source Distribution
File details
Details for the file text-categorization-0.1.1.tar.gz.
File metadata
- Download URL: text-categorization-0.1.1.tar.gz
- Upload date:
- Size: 9.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | d337ece1a9b5e2357e1bee7f13f31e7c5e698bdf7b6c63fe6a26a8d7d551de25
MD5 | 4cf8817a1922359995770f2d291fb4e7
BLAKE2b-256 | ce2949384adb8c087e556038bf568f38dd38c351028fe1f86b02d27090964e6a