
Performing thematic analysis with OpenAI's GPT-4 models

Project description

AutoThemeGenerator is a package that allows you to perform thematic analysis in qualitative studies using OpenAI's GPT models.


User inputs

Users are only required to specify the folder where their interview transcripts are stored. Accepted transcript formats are PDF, .docx, and .txt (preferred). AutoThemeGenerator assumes that each document is the transcript of one interviewed participant.
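Before running an analysis, it can help to confirm that the folder only contains files in the accepted formats. The following is a minimal sketch of such a check, not part of AutoThemeGenerator itself; the helper name list_transcripts is hypothetical, and the extension set is taken from the formats listed above.

```python
from pathlib import Path

# Extensions taken from the accepted formats listed above.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt"}


def list_transcripts(directory_path):
    """Split the files in a folder into supported and unsupported transcripts.

    Hypothetical helper for illustration; it is not provided by the package.
    """
    supported, unsupported = [], []
    for path in sorted(Path(directory_path).iterdir()):
        if not path.is_file():
            continue  # skip sub-folders
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path.name)
        else:
            unsupported.append(path.name)
    return supported, unsupported
```

Calling list_transcripts("my_transcript_folder") returns two lists of file names, so any file in the second list can be converted or moved out of the folder before the analysis starts.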

Requirements

Required packages

To use AutoThemeGenerator, you are required to have the following packages installed:

  • openai

  • docx

  • tqdm

  • nltk

  • nltk.tokenize (submodule of nltk)

  • python-docx

  • textract

  • requests

  • zipfile (Python standard library)

  • shutil (Python standard library)

  • json (Python standard library)

  • pprint (Python standard library)

If these packages are not installed in your Python environment, you can install them with pip:

pip install openai==1.12.0 python-docx docx tqdm nltk textract requests
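To see which of the third-party packages are still missing, a short check along the following lines may be useful. This is a sketch under assumptions: the helper name find_missing_modules is hypothetical, and the list uses import names rather than pip names (the pip package python-docx is imported as docx).

```python
import importlib.util

# Import names, not pip names: "python-docx" is imported as "docx".
REQUIRED_MODULES = ["openai", "docx", "tqdm", "nltk", "textract", "requests"]


def find_missing_modules(module_names):
    """Return the module names that cannot be found in the current environment.

    Hypothetical helper for illustration; it only looks modules up and does
    not import them.
    """
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]
```

Running find_missing_modules(REQUIRED_MODULES) returns an empty list when every required package can be found.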

OpenAI API key

You also need an OpenAI API key to use this package. If you do not have one, you can create one at platform.openai.com/api-keys.
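To keep the key out of your scripts, one option is to read it from an environment variable and pass the result wherever the package expects an API key. The sketch below assumes the variable is called OPENAI_API_KEY, which is a common convention rather than a requirement of AutoThemeGenerator; the helper name load_api_key is hypothetical.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Read the API key from an environment variable instead of hard-coding it.

    Hypothetical helper; the variable name is an assumption, not something
    the package requires.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your OpenAI API key."
        )
    return api_key
```

Failing early with a clear message avoids starting a long analysis that would only stop at the first API request.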

pip version

The package can only be installed with pip versions older than 24.1. Newer versions of pip will not work due to compatibility issues with textract. To downgrade to a version older than 24.1, run the following:

pip install "pip<24.1"
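If you are unsure which pip version is active, a check like the following can tell you whether a downgrade is needed. It is a simplified sketch: the function names are hypothetical, and only the leading major and minor numbers are compared, so a pre-release such as 24.1b1 is treated the same as 24.1.

```python
import re
from importlib.metadata import version


def pip_is_compatible(pip_version):
    """Return True when the given pip version string is older than 24.1.

    Simplification: only the leading "major.minor" numbers are compared.
    """
    match = re.match(r"(\d+)\.(\d+)", pip_version)
    if match is None:
        raise ValueError(f"Unrecognized pip version: {pip_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) < (24, 1)


def installed_pip_version():
    """Look up the installed pip version from package metadata."""
    return version("pip")
```

For example, pip_is_compatible(installed_pip_version()) returns False on pip 24.1 or newer, in which case the downgrade command above applies.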

Installation

To install the package, run the following:

pip install AutoThemeGenerator

Quick Start

Here is a quick example of how to run AutoThemeGenerator to perform a qualitative analysis of your transcripts. For details on each of the package's functions and parameters, refer to the documentation.

from AutoThemeGenerator import analyze_and_synthesize_transcripts

# Specify the folder containing your transcripts
# (in .docx, .pdf, or .txt format)
directory_path = "my_transcript_folder"
# Specify your OpenAI API key
api_key = "<insert your API key>"
# Specify the folder where you wish to save your themes
save_results_path = "folder_of_my_saved_results"

# Specify the context of your study
context = (
    "Physical inactivity is a major risk factor for developing several chronic illnesses. "
    "However, university students and staff in the UK are found to be more physically inactive "
    "compared to the general UK population. "
    )
# Specify your research questions
research_questions = (
    "This study seeks to understand the barriers and enablers "
    "of physical activity (PA) among university staff and students in "
    "the UK under the university setting, using the Theoretical "
    "Domains Framework (TDF) to guide the investigation. "
    )
# Specify your survey script
survey_script = (
    "Knowledge\n "
    "What do you know about physical activity? How might you define physical activity? "
    "... ..." # note: truncated to save space
    "... ..." 
    )

# Analyze and synthesize transcripts.
# The call is wrapped in parentheses so it can span several lines safely.
initial_themes, individual_synthesized_themes, overall_synthesized_themes = (
    analyze_and_synthesize_transcripts(
        directory_path=directory_path, context=context,
        research_questions=research_questions, script=survey_script,
        api_key=api_key, save_results_path=save_results_path)
)

# Display your study-level themes
print(overall_synthesized_themes)

You can now view each theme in the form of a topic sentence, a detailed explanation, and a relevant quote.

Citation

Yuyi Yang, Charles Alba, Chenyu Wang, Xi Wang, Jami Anderson, and Ruopeng An. "GPT Models Can Perform Thematic Analysis in Public Health Studies, Akin to Qualitative Researchers." Journal of Social Computing, vol. 5, no. 4, (2024): 293-312. doi: 10.23919/JSC.2024.0024

Questions?

Contact me at alba@wustl.edu


