Library for performing trends analysis via LLMs.

Text Categorizer

Problem statement

When analyzing texts of any kind (for example, search keywords or YouTube video descriptions), it can be challenging to extract insights because such texts are unstructured.

Solution

text-categorizer uses large language models to categorize your texts: provide a couple of seed examples and text-categorizer will do the rest.

Deliverable (implementation)

text-categorizer is available as a:

library - use it in your projects with the help of the TextCategorizer class.
CLI tool - the text-categorizer tool can be used in the terminal.
HTTP endpoint - text-categorizer can be easily exposed as an HTTP endpoint.
Langchain tool - integrate text-categorizer into your Langchain applications.

Deployment

Prerequisites

Python 3.11+
A GCP project with a billing account attached
A service account with its key downloaded, needed to write data to BigQuery.
- Once you have downloaded the service account key, export its path as an environment variable:
```
export GOOGLE_APPLICATION_CREDENTIALS=path/to/service_account.json
```
- If authenticating via a service account is not possible, you can authenticate with the following command:
```
gcloud auth application-default login
```

An API key to access Google Gemini.
- Once you have created the API key, export it as an environment variable:
```
export GOOGLE_API_KEY=<YOUR_API_KEY_HERE>
```
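
Before running the tool, it can help to confirm that both variables from the export commands above are visible to Python. The following is a minimal sketch, not part of text-categorizer: the helper name `missing_env_vars` is hypothetical, and if you authenticate with gcloud application-default credentials instead of a key file, GOOGLE_APPLICATION_CREDENTIALS may legitimately be unset.

```python
import os

# Variable names are taken from the export commands in the prerequisites.
REQUIRED_ENV_VARS = ("GOOGLE_APPLICATION_CREDENTIALS", "GOOGLE_API_KEY")


def missing_env_vars(environ=None, required=REQUIRED_ENV_VARS):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; it is not provided by text-categorizer.
    """
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All prerequisite environment variables are set.")
```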

Installation

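Install text-categorizer with the pip install text-categorization command (note that the package name on PyPI is text-categorization, while the installed tool is text-categorizer).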

Usage

This section focuses on using text-categorizer as a CLI tool. See the library, HTTP endpoint, and Langchain tool sections to learn about the other interfaces.

Once text-categorizer is installed, you can call it:

```
text-categorizer 'text1' 'text2' 'text3' \
  --examples path/to/examples.txt \
  --llm gemini \
  --llm.model=gemini-1.5-flash \
  --output-type csv \
  --output-destination sample_results
```

where:

'text1' 'text2' 'text3' - texts that need to be categorized.
--examples path/to/examples.txt - path to a file with examples (each line should be formatted as text - category, e.g. dog - pet).
--llm gemini - type of large language model (currently only Google Gemini is supported).
--llm.model=gemini-1.5-flash - parameters used to initialize the selected LLM (here, the model name).
--output-type csv - type of output.
--output-destination sample_results - name of the output table or file.

Examples can also come from BigQuery: pass the full table name (project.dataset.table) as the value of --examples. The table should contain two columns: text and category.
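
To make the examples file format concrete, here is a minimal sketch of reading text - category lines into pairs. It is an illustration only and does not reflect how text-categorizer parses the file internally; the function name `parse_examples` is hypothetical, and splitting on the last " - " in each line is an assumption (it presumes the category itself contains no " - ").

```python
def parse_examples(lines):
    """Parse 'text - category' lines into (text, category) pairs.

    Hypothetical helper: splits on the LAST ' - ' in each line, which is an
    assumption about the format, not documented behavior of text-categorizer.
    """
    pairs = []
    for line_number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        text, separator, category = line.rpartition(" - ")
        if not separator or not text.strip() or not category.strip():
            raise ValueError(
                f"Line {line_number} is not 'text - category': {raw!r}"
            )
        pairs.append((text.strip(), category.strip()))
    return pairs


if __name__ == "__main__":
    sample = ["dog - pet", "", "oak - tree"]
    for text, category in parse_examples(sample):
        print(f"{text!r} -> {category!r}")
```

Validating the file this way before a run surfaces malformed lines early, rather than after the request to the LLM has been made.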

Instead of passing texts as parameters to text-categorizer, you can provide the --remote-texts flag, which accepts a full BigQuery table name (the table should contain a column named text). You can also combine positional texts with --remote-texts.
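
If you invoke the CLI from a script, assembling the argument list programmatically avoids shell-quoting mistakes. The sketch below only builds and prints the command using the flags described above; the helper name `build_command` and its parameter names are hypothetical, and passing the table name as a separate argument after --remote-texts is an assumption about the flag's syntax.

```python
import shlex


def build_command(texts, examples, output_destination, *, llm="gemini",
                  model="gemini-1.5-flash", output_type="csv",
                  remote_texts=None):
    """Build an argument list for the text-categorizer CLI.

    Hypothetical helper; only flags shown in the usage section are used.
    """
    command = ["text-categorizer", *texts]
    if remote_texts:
        # Assumption: the table name follows the flag as a separate argument.
        command += ["--remote-texts", remote_texts]
    command += [
        "--examples", examples,
        "--llm", llm,
        f"--llm.model={model}",
        "--output-type", output_type,
        "--output-destination", output_destination,
    ]
    return command


if __name__ == "__main__":
    argv = build_command(["text1", "text2"], "path/to/examples.txt",
                         "sample_results")
    # Print a shell-safe rendering; pass argv to subprocess.run to execute it.
    print(shlex.join(argv))
```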

Disclaimer

This is not an officially supported Google product.

Release history

0.1.2.dev8 (pre-release, this version) - Aug 30, 2024
0.1.2.dev7 (pre-release) - Aug 28, 2024
0.1.2.dev6 (pre-release) - Aug 27, 2024
0.1.2.dev5 (pre-release) - Aug 5, 2024
0.1.2.dev4 (pre-release) - Aug 2, 2024
0.1.2.dev3 (pre-release) - Aug 2, 2024
0.1.2.dev2 (pre-release) - Aug 2, 2024
0.1.2.dev1 (pre-release) - Aug 2, 2024
0.1.1 - Jul 29, 2024
0.1.1.dev0 (pre-release) - Jul 26, 2024
0.1.0 - Jul 26, 2024
0.1.0.dev6 (pre-release) - Jul 25, 2024
0.1.0.dev5 (pre-release) - Jul 25, 2024
0.1.0.dev4 (pre-release) - Jul 25, 2024
0.1.0.dev3 (pre-release) - Jul 25, 2024
0.1.0.dev2 (pre-release) - Jul 25, 2024
0.1.0.dev1 (pre-release) - Jul 25, 2024
0.1.0.dev0 (pre-release) - Jul 25, 2024

Download files

Source Distribution

text-categorization-0.1.2.dev8.tar.gz

File metadata

Upload date: Aug 30, 2024
Size: 10.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.11.9

File hashes

Hashes for text-categorization-0.1.2.dev8.tar.gz
Algorithm	Hash digest
SHA256	`1036306babb40947de02ddc5f8c261ec87058d68215d51860b5b7f1fbeca4cf1`
MD5	`f8cdaff00156de6594730383221bcca9`
BLAKE2b-256	`6749d0e66f814e0a11a86ea62c6f077e1a3a3379ad8732e5e77c3b3fc04427cd`
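
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's standard hashlib module; the helper names and the local file path are hypothetical, and the expected digest is copied from the table.

```python
import hashlib

# SHA256 digest copied from the hash table for text-categorization-0.1.2.dev8.tar.gz.
EXPECTED_SHA256 = "1036306babb40947de02ddc5f8c261ec87058d68215d51860b5b7f1fbeca4cf1"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected value."""
    return sha256_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Hypothetical local path to the downloaded source distribution.
    archive = "text-categorization-0.1.2.dev8.tar.gz"
    print("digest matches" if matches_expected(archive) else "digest mismatch")
```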