Library for performing trends analysis via LLMs.

Text Categorizer

Problem statement

When analyzing texts of any kind (such as search keywords or YouTube video descriptions), it can be challenging to extract insights because the texts are unstructured.

Solution

text-categorizer uses the power of large language models to categorize your texts: simply provide a couple of seed examples and text-categorizer will do the rest.

Deliverable (implementation)

text-categorizer is implemented as a:

  • library - use it in your projects with the help of the TextCategorizer class.
  • CLI tool - the text-categorizer tool can be used in the terminal.
  • HTTP endpoint - text-categorizer can be easily exposed as an HTTP endpoint.
  • Langchain tool - integrate text-categorizer into your Langchain applications.

Deployment

Prerequisites

  • Python 3.11+

  • A GCP project with a billing account attached

  • A service account created and a service account key downloaded in order to write data to BigQuery.

    • Once you have downloaded the service account key, export it as an environment variable:

      export GOOGLE_APPLICATION_CREDENTIALS=path/to/service_account.json
      
    • If authenticating via a service account is not possible, you can authenticate with the following command:

      gcloud auth application-default login
      
  • An API key to access Google Gemini.
    • Once you have created the API key, export it as an environment variable:

      export GOOGLE_API_KEY=<YOUR_API_KEY_HERE>
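
Before running the tool, it can help to confirm that the variables exported above are visible to Python. The following is a minimal sketch and is not part of text-categorizer: the helper name check_environment is hypothetical, and it assumes that an unset GOOGLE_APPLICATION_CREDENTIALS is acceptable because gcloud auth application-default login may have been used instead.

```python
import os

# Variable names taken from the export commands in the prerequisites.
API_KEY_VAR = "GOOGLE_API_KEY"
CREDENTIALS_VAR = "GOOGLE_APPLICATION_CREDENTIALS"


def check_environment(env):
    """Return a list of human-readable problems found in ``env`` (hypothetical helper)."""
    problems = []
    # The Gemini API key is listed as a prerequisite, so treat it as required.
    if not env.get(API_KEY_VAR):
        problems.append(f"{API_KEY_VAR} is not set")
    # The key file is only checked when the variable is set, because
    # application-default credentials from gcloud are an allowed alternative.
    key_path = env.get(CREDENTIALS_VAR)
    if key_path and not os.path.isfile(key_path):
        problems.append(f"{CREDENTIALS_VAR} points to a missing file: {key_path}")
    return problems


if __name__ == "__main__":
    for problem in check_environment(os.environ):
        print(problem)
```

An empty list means that nothing obviously wrong was detected; it does not verify that the key or the credentials are actually valid.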
      

Installation

Install text-categorizer with the pip install text-categorization command.

Usage

This section focuses on using text-categorizer as a CLI tool. See the library, HTTP endpoint, and Langchain tool sections to learn more.

Once text-categorizer is installed, you can call it:

text-categorizer 'text1' 'text2' 'text3' \
  --examples path/to/examples.txt \
  --llm gemini \
  --llm.model=gemini-1.5-flash \
  --output-type csv \
  --output-destination sample_results

where:

  • 'text1' 'text2' 'text3' - texts that need to be categorized.
  • --examples path/to/examples.txt - path to the examples file (each line should be formatted as text - category, e.g. dog - pet).
  • --llm gemini - type of large language model (currently only Google Gemini is supported).
  • --llm.model=gemini-1.5-flash - any parameters used to initialize the selected LLM.
  • --output-type csv - type of output.
  • --output-destination sample_results - name of the output table or file.
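
To make the examples file format concrete, here is a minimal sketch of how lines in the text - category form could be validated before being passed to --examples. This is not the parser used by text-categorizer: the function name parse_examples is hypothetical, and splitting on the last " - " in each line is an assumption made so that texts containing hyphens stay intact.

```python
def parse_examples(lines):
    """Split 'text - category' lines into (text, category) pairs (hypothetical helper)."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            # Skip blank lines instead of treating them as errors.
            continue
        # Assumption: the category follows the LAST " - " on the line.
        text, separator, category = line.rpartition(" - ")
        if not separator:
            raise ValueError(f"Expected 'text - category', got: {line!r}")
        pairs.append((text.strip(), category.strip()))
    return pairs


if __name__ == "__main__":
    sample = ["dog - pet", "rose - plant"]
    for text, category in parse_examples(sample):
        print(f"{text!r} is labelled {category!r}")
```

Running a check like this over examples.txt first can surface malformed lines before any LLM calls are made.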

--examples can also come from BigQuery: simply pass the full table name (project.dataset.table) as the value. The table should contain two columns, text and category.

Instead of passing texts as parameters to text-categorizer, you can provide the --remote-texts flag, which accepts a full table name in BigQuery (the table should contain a column named text). You can also combine directly passed texts with --remote-texts.
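
When the tool is driven from a script, the options described in this section can be assembled programmatically. The sketch below only builds and prints the command line; it does not run it. The helper name build_command is hypothetical, and passing the table name as a separate argument after --remote-texts is an assumption about the flag syntax.

```python
import shlex


def build_command(texts, examples, remote_texts=None,
                  model="gemini-1.5-flash", output_type="csv",
                  output_destination="sample_results"):
    """Assemble a text-categorizer invocation as an argument list (hypothetical helper)."""
    command = ["text-categorizer", *texts]
    if remote_texts:
        # Assumption: the BigQuery table name follows the flag as its own argument.
        command += ["--remote-texts", remote_texts]
    command += [
        "--examples", examples,  # a file path or a full BigQuery table name
        "--llm", "gemini",
        f"--llm.model={model}",
        "--output-type", output_type,
        "--output-destination", output_destination,
    ]
    return command


if __name__ == "__main__":
    command = build_command(
        ["text1", "text2"],
        examples="path/to/examples.txt",
        remote_texts="project.dataset.table",
    )
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(command))
```

Keeping the command as a list rather than a single string avoids quoting problems if it is later handed to subprocess.run.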

Disclaimer

This is not an officially supported Google product.
