
The GABRIEL library for numerical analysis of texts in the social sciences.

Project description

The Generalized Attribute Based Ratings Information Extraction Library (GABRIEL)

Description

GABRIEL is a lightweight Python framework, built on top of large language models (LLMs) such as ChatGPT, that simplifies quantitative text analysis in the social sciences.

IMPORTANT: For the easiest setup, follow this Colab tutorial notebook: https://colab.research.google.com/drive/1tshfY-2al7asU7pTFvFFg1n4NSvLXtZg?usp=sharing

The full documentation is below.

Installation

The new Python library replaces the previous API and makes the package much simpler to use. It is installed with pip.

Before installing GABRIEL, you must install the OpenAI Python library. Open your terminal or command prompt and run:

pip install openai

Once the OpenAI library is installed, install GABRIEL with:

pip install gabriel-ratings
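To confirm that both packages are installed in the Python environment you are actually using, you can query their installed versions. This is a minimal sketch using the standard library's importlib.metadata (Python 3.8+); installed_versions is a hypothetical helper, not part of GABRIEL, and the distribution names are the ones passed to pip above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("openai", "gabriel-ratings")):
    """Return {distribution name: version string, or None if not installed}.

    Hypothetical helper for checking the installation; not part of GABRIEL.
    """
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

If either entry prints NOT INSTALLED, rerun the corresponding pip command in the same environment.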

Use

Simple ratings framework

The main way to get ratings from GABRIEL is the Archangel class. Instantiating it requires an OpenAI API key, which we strongly recommend storing as an environment variable rather than in your code. To create an Archangel object, use the following syntax:

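import os
from GABRIEL.Archangel import Archangel
# Read the key from an environment variable (OPENAI_API_KEY is the conventional name; use the one you set)
archangel = Archangel(os.environ["OPENAI_API_KEY"])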
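Since we recommend keeping the key in an environment variable, a small helper that reads it and fails with a readable message when it is missing can save some debugging. This is a minimal sketch, not part of GABRIEL: load_api_key is a hypothetical helper, and the default variable name OPENAI_API_KEY is an assumption (it is the name the OpenAI library conventionally reads).

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the environment variable `var_name`.

    Hypothetical helper (not part of GABRIEL). Raises a RuntimeError with a
    readable message instead of a bare KeyError when the key is missing.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {var_name} is not set; "
            "export your OpenAI API key before creating an Archangel object."
        )
    return key
```

With this helper, the object can be created as archangel = Archangel(load_api_key()).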

Once you have created the object, you can run a simple ratings framework through the rate_texts function. You must supply texts, a list of the texts to rate; attribute_dict, a dictionary whose keys are your attributes and whose values are their definitions; and task_description, a few sentences describing what you are trying to accomplish (your data, your question, etc.). You must also provide save_folder and file_name, which determine where the output from your run is saved.

You can also choose which OpenAI model to call with the model parameter (the default is GPT-3.5-turbo). The full list of parameters, with more detailed descriptions, is given below.

The simplest ratings call, which returns a pandas DataFrame, is:

ratings = archangel.rate_texts(texts, attribute_dict=attribute_dict, save_folder='path_to_your_folder',
                               file_name='your_file_name.csv', task_description='your_task_description')
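The inputs to rate_texts are ordinary Python objects. The sketch below shows one way to assemble them and check them before spending API credits. The example texts, attribute names, definitions, and the check_rate_inputs helper are hypothetical illustrations; the helper only mirrors the requirements described above and does not call GABRIEL.

```python
from pathlib import Path

# Hypothetical example inputs; replace with your own data.
texts = [
    "The senator urged compromise on the budget bill.",
    "Protesters demanded an immediate end to the policy.",
]
attribute_dict = {
    "conciliatory": "The text encourages cooperation or compromise between opposing sides.",
    "confrontational": "The text expresses demands or hostility toward an opposing side.",
}
task_description = (
    "We are rating short political statements to measure how conciliatory "
    "or confrontational their tone is."
)


def check_rate_inputs(texts, attribute_dict, save_folder, file_name):
    """Return a list of problems found in the inputs (empty if none).

    Hypothetical pre-flight check; it does not call GABRIEL.
    """
    problems = []
    if not isinstance(texts, list) or not all(isinstance(t, str) for t in texts):
        problems.append("texts must be a list of strings")
    if not attribute_dict or not all(
        isinstance(k, str) and isinstance(v, str) and v.strip()
        for k, v in attribute_dict.items()
    ):
        problems.append("attribute_dict must map attribute names to non-empty definitions")
    if not Path(save_folder).is_dir():
        problems.append(f"save_folder {save_folder!r} does not exist")
    if not file_name.endswith(".csv"):
        problems.append("file_name should end in .csv")
    return problems
```

When check_rate_inputs returns an empty list, the same texts, attribute_dict, and task_description objects can be passed to archangel.rate_texts.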

Features

The Archangel class comes with several easy-to-use features to help you run your code:

  • parallelization: the library sends API calls in parallel, which greatly reduces running time. Parallelization is enabled by default.
  • cost estimates: when a run begins, the library gives a rough estimate of its cost, based on the model and the texts you input.
  • auto-saving: the class saves your results to a CSV file at each iteration, provided you supply a valid path.
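Parallelization and auto-saving follow a common pattern: send many requests concurrently and write each result to disk as soon as it arrives, so an interrupted run loses at most the requests still in flight. The sketch below illustrates that pattern with Python's standard library. It is not GABRIEL's actual implementation; rate_one is a hypothetical stand-in for a single API call, and threads are used because API calls are I/O-bound.

```python
import csv
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def rate_one(text: str) -> dict:
    """Hypothetical stand-in for one API call; returns a fake rating."""
    return {"text": text, "length_rating": min(100, len(text))}


def rate_in_parallel(texts, out_path, max_workers=50):
    """Rate `texts` concurrently, appending each result to a CSV as it completes."""
    fieldnames = ["text", "length_rating"]
    write_header = not os.path.exists(out_path)
    results = []
    with open(out_path, "a", newline="", encoding="utf-8") as f, \
            ThreadPoolExecutor(max_workers=max_workers) as pool:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        futures = [pool.submit(rate_one, t) for t in texts]
        # Only the main thread writes to the file, so no lock is needed.
        for future in as_completed(futures):
            row = future.result()
            writer.writerow(row)
            f.flush()  # persist progress after every completed request
            results.append(row)
    return results
```

Results arrive in completion order rather than input order, so a real pipeline would also store an identifier (such as the text's index) with each row.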

Preset classes

To simplify choosing hyperparameters, we provide two presets:

  • 'mazda': cheap, fast, and reliable. Uses GPT-3.5-turbo, truncates texts to 9,500 words to leave room for the prompt, and runs 50 queries in parallel.
  • 'tesla': expensive. Uses GPT-4-turbo and runs 30 queries in parallel. Not recommended because of its cost.
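The presets trade cost against quality. A back-of-the-envelope estimate of what a run might cost can be sketched as below. This is not GABRIEL's formula: it assumes one request per text, roughly 0.75 English words per token (a common rule of thumb for OpenAI tokenizers), and illustrative prompt and output sizes. The prices are placeholders you must look up for your model on OpenAI's pricing page.

```python
def estimate_cost_usd(
    texts,
    price_per_million_input_tokens,
    price_per_million_output_tokens,
    prompt_overhead_tokens=300,
    output_tokens_per_text=200,
    words_per_token=0.75,
):
    """Very rough cost estimate, assuming each text is sent in one request.

    All defaults are illustrative assumptions, not GABRIEL's settings.
    """
    # Input: the text itself (words converted to tokens) plus the prompt around it.
    input_tokens = sum(
        len(t.split()) / words_per_token + prompt_overhead_tokens for t in texts
    )
    # Output: a fixed assumed number of response tokens per text.
    output_tokens = output_tokens_per_text * len(texts)
    return (
        input_tokens * price_per_million_input_tokens
        + output_tokens * price_per_million_output_tokens
    ) / 1_000_000
```

Because the token ratio and prompt size are guesses, treat the result as an order-of-magnitude figure, as the library's own estimate is described above.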

Function parameters

The parameters for the function are as follows (texts, task_description, save_folder, and file_name are described above).

  • attribute_dict: A dictionary whose keys are the attributes you want to evaluate and whose values are their descriptions. See the Colab notebook for examples.
  • truncate (optional, defaults to True): Whether to truncate the text, which avoids exceeding the model's token limit (16k tokens for the default model).
  • truncate_len (optional, defaults to 5000): The amount of text to keep when truncating.
  • project_probs (optional, defaults to False): Whether to rescale the probabilities from a 0 to 100 scale to a 0 to 1 scale.
  • api_key (mandatory): Your OpenAI API key.
  • model (optional): Backend model, default = gpt-3.5-turbo-1106.
  • seed (optional, RECOMMENDED): A seed for cross-run replicability, for instance seed = 0. OpenAI treats seeding as best-effort, so outputs can still differ slightly between runs.
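The truncate, truncate_len, and project_probs options describe simple transformations, sketched below. This is an illustration under assumptions, not GABRIEL's code: it assumes truncate_len counts whitespace-separated words (the 'mazda' preset describes truncation in words) and that projection divides by 100. The function names are hypothetical.

```python
def truncate_text(text: str, truncate: bool = True, truncate_len: int = 5000) -> str:
    """Keep at most `truncate_len` whitespace-separated words (assumed unit)."""
    if not truncate:
        return text
    words = text.split()
    if len(words) <= truncate_len:
        return text  # short texts are returned unchanged
    return " ".join(words[:truncate_len])


def project_to_unit(rating: float) -> float:
    """Rescale a 0 to 100 rating to the 0 to 1 scale, as project_probs=True describes."""
    if not 0 <= rating <= 100:
        raise ValueError(f"rating {rating} is outside 0 to 100")
    return rating / 100
```

For example, truncate_text("a b c d", truncate_len=2) keeps only "a b", and project_to_unit(75) gives 0.75.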

Citation

Please cite the project using:

The Generalized Attribute Based Ratings Information Extraction Library (GABRIEL). Hemanth Asirvatham and Elliott Mokski (2023). https://github.com/elliottmokski/GABRIEL-distribution.


Download files


Source Distribution

gabriel_ratings-0.1.5.tar.gz (37.2 kB)

Built Distribution

GABRIEL_ratings-0.1.5-py3-none-any.whl (40.3 kB, Python 3)

File details

Details for the file gabriel_ratings-0.1.5.tar.gz.

File metadata

  • Download URL: gabriel_ratings-0.1.5.tar.gz
  • Upload date:
  • Size: 37.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.8.5

File hashes

Hashes for gabriel_ratings-0.1.5.tar.gz
Algorithm Hash digest
SHA256 1fe39d6bcde30c9c4bdee91b808ad83c6dd585b0efc0b2b9d3fdc134c89d23fe
MD5 b7765b51e4b3eb7a1f9d41ec2a168c29
BLAKE2b-256 d1a7137381808c4afa79f3b6a268b3459e88fc1b920f6766dd161b497b575e32
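These digests let you confirm that a downloaded archive is intact. A minimal sketch using Python's hashlib follows; the helper names are hypothetical, and the usage assumes you downloaded the source archive to the current directory.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    """True if the file's SHA-256 matches the published (case-insensitive) digest."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


# SHA256 digest published above for gabriel_ratings-0.1.5.tar.gz
EXPECTED_SDIST_SHA256 = "1fe39d6bcde30c9c4bdee91b808ad83c6dd585b0efc0b2b9d3fdc134c89d23fe"
```

Usage: verify_download("gabriel_ratings-0.1.5.tar.gz", EXPECTED_SDIST_SHA256). pip can also enforce digests automatically through its hash-checking mode (--require-hashes).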


File details

Details for the file GABRIEL_ratings-0.1.5-py3-none-any.whl.

File metadata

File hashes

Hashes for GABRIEL_ratings-0.1.5-py3-none-any.whl
Algorithm Hash digest
SHA256 8603afd0193199925a98737d4205c93685c68b024a31912c3cb66562624e8454
MD5 25cc7f20f3c8785b2b2b0b8cc1e984ae
BLAKE2b-256 888f7c52521cc48dc044fea337fe7a710ff7a993d391b6c59cba8cd55f07d6be

