Performing thematic analysis with OpenAI's GPT-4 models
Project description
AutoThemeGenerator is a package that allows you to perform thematic analysis in qualitative studies using OpenAI's GPT models.
User inputs
For the transcripts themselves, users only need to specify the folder where they are stored. Accepted transcript formats are .pdf, .docx, and .txt (preferred). AutoThemeGenerator assumes that each document is the transcript of one interviewed participant.
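Because each file in the folder is treated as one participant, it can help to check which files will be picked up before spending API calls. The sketch below is a hypothetical helper (`list_transcripts` is not part of AutoThemeGenerator); it assumes files are matched by extension, case-insensitively, which may differ from how the package itself discovers files.

```python
from pathlib import Path

# Extensions listed in this README as accepted transcript formats.
# Assumption: matching is case-insensitive (e.g. ".PDF" counts as ".pdf").
ACCEPTED_EXTENSIONS = {".pdf", ".docx", ".txt"}


def list_transcripts(directory_path):
    """Return the sorted transcript files in directory_path, one per participant.

    Hypothetical helper for checking your folder before a run; it is not
    part of AutoThemeGenerator and does not reproduce the package's own
    file-discovery logic.
    """
    folder = Path(directory_path)
    if not folder.is_dir():
        raise NotADirectoryError(f"{folder} is not a folder")
    # Keep regular files whose extension is one of the accepted formats
    return sorted(
        path for path in folder.iterdir()
        if path.is_file() and path.suffix.lower() in ACCEPTED_EXTENSIONS
    )
```

For example, `len(list_transcripts("my_transcript_folder"))` gives the number of participants you expect the analysis to cover.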
Requirements
Required packages
To use AutoThemeGenerator, you are required to have the following packages installed:
- openai
- docx
- tqdm
- nltk
- nltk.tokenize (submodule of nltk)
- python-docx
- textract
- requests
- zipfile (Python standard library)
- shutil (Python standard library)
- json (Python standard library)
If you do not have these packages installed, you can install them with pip (the standard-library modules need no installation):
```bash
pip install openai==1.12.0 python-docx docx tqdm nltk textract requests
```
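To see which of the packages above are missing from your environment, a short check like the following can be run before installing. It is a sketch, not part of AutoThemeGenerator; `missing_modules` is a hypothetical name, and it checks import names (python-docx, for instance, is imported as `docx`) rather than pip distribution names.

```python
import importlib.util

# Top-level import names for the packages listed above.
# The python-docx distribution provides the "docx" import name.
REQUIRED_MODULES = (
    "openai", "docx", "tqdm", "nltk", "textract", "requests",
    "zipfile", "shutil", "json",
)


def missing_modules(module_names=REQUIRED_MODULES):
    """Return the names in module_names that the import system cannot find."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]
```

Calling `missing_modules()` returns an empty list when every listed module is importable.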
OpenAI API key
You also need an OpenAI API key to use this package. If you do not have one, you can create one at platform.openai.com/api-keys.
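Rather than pasting the key directly into your script, you can keep it in an environment variable. `OPENAI_API_KEY` is the variable the official `openai` Python library reads by default; the helper name `load_api_key` is hypothetical and simply returns the string you would pass as `api_key` to `analyze_and_synthesize_transcripts`.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Read the OpenAI API key from an environment variable instead of hard-coding it."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        # Fail early with a clear message rather than at the first API call
        raise RuntimeError(
            f"Set the {env_var} environment variable to your OpenAI API key."
        )
    return api_key
```

This keeps the key out of scripts you might share or commit to version control.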
Installation
To install the package, run:
```bash
pip install AutoThemeGenerator
```
Quick Start
Here is a quick example of how to use AutoThemeGenerator to perform thematic analysis on your transcripts. For details on each of the package's functions and parameters, refer to the documentation.
```python
from AutoThemeGenerator import analyze_and_synthesize_transcripts

# Specify the folder containing your transcripts
# (one file per participant, in .docx, .pdf, or .txt format)
directory_path = "my_transcript_folder"

# Specify your OpenAI API key
api_key = "<insert your API key>"

# Specify the folder where you wish to save your themes
save_results_path = "folder_of_my_saved_results"

# Specify the context of your study
context = (
    "Physical inactivity is a major risk factor for developing several chronic illnesses. "
    "However, university students and staff in the UK are found to be more physically inactive "
    "compared with the general UK population. "
)

# Specify your research questions
research_questions = (
    "This study seeks to understand the barriers and enablers "
    "of physical activity (PA) among university staff and students in "
    "the UK in the university setting, using the Theoretical "
    "Domains Framework (TDF) to guide the investigation. "
)

# Specify your survey script
survey_script = (
    "Knowledge\n "
    "What do you know about physical activity? How might you define physical activity? "
    "... ..."  # note: truncated to save space
    "... ..."
)

# Analyze and synthesize transcripts
initial_themes, individual_synthesized_themes, overall_synthesized_themes = \
    analyze_and_synthesize_transcripts(
        directory_path=directory_path, context=context,
        research_questions=research_questions, script=survey_script,
        api_key=api_key, save_results_path=save_results_path)

# Display your study-level themes
print(overall_synthesized_themes)
```
You can now view the themes, each presented as a topic sentence, a detailed explanation, and a relevant quote.
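If your interview script is organized by TDF domain, as in the truncated `survey_script` above, it may be easier to keep domains and questions in a list and join them into one string. `build_survey_script` is a hypothetical helper that mirrors the layout of the example (a domain heading on its own line, followed by its questions); the README does not specify a required layout, so treat the format as an assumption.

```python
def build_survey_script(sections):
    """Join (heading, questions) pairs into one interview-script string.

    Hypothetical helper: it mirrors the Quick Start layout but is not part
    of AutoThemeGenerator, and the package may not require this exact layout.
    """
    blocks = []
    for heading, questions in sections:
        # Heading on its own line, then that domain's questions on one line
        blocks.append(heading + "\n" + " ".join(questions))
    return "\n".join(blocks)


survey_script = build_survey_script([
    ("Knowledge", [
        "What do you know about physical activity?",
        "How might you define physical activity?",
    ]),
    # Add the remaining domains and questions of your own script here
])
```

The resulting string can be passed as `script=survey_script` in the Quick Start call.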
Citation
Y Yang, C Alba, W Xi, M Li, C Wang, A Jami, R An. "GPT Models Can Perform Thematic Analysis in Public Health Studies, Akin to Qualitative Researchers" Working paper.
Questions?
Contact me at alba@wusl.edu
Download files
File details
Details for the file autothemegenerator-0.1.2.tar.gz.
File metadata
- Download URL: autothemegenerator-0.1.2.tar.gz
- Upload date:
- Size: 11.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | efd28c7f862c9430e14ee9fec7423b7b278ebe60691ac075709a0d1a502be3f7
MD5 | 30c9c41bcbfa9891a7a300c1b6fb79a9
BLAKE2b-256 | b843fb250e8ff8a8b2094996cf4de701746474ba282dccba059de38a53f07bb7
File details
Details for the file AutoThemeGenerator-0.1.2-py3-none-any.whl.
File metadata
- Download URL: AutoThemeGenerator-0.1.2-py3-none-any.whl
- Upload date:
- Size: 11.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3a576f8cb647bb55a071e0e906ce861d09129a22d6caac2a690b3a6c697aa433
MD5 | a2aacbabc4911a5efcca0f4ab8ab2b49
BLAKE2b-256 | c25a5eebe23d632dea0c639c36dcc644f4f8c47045f23de3e35ae429dd28ead7
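To confirm that a downloaded distribution file matches the SHA256 digests listed above, you can compute its hash locally. This sketch uses Python's standard `hashlib`; the function names are illustrative, and the path you pass is whichever distribution file you downloaded.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_sha256):
    """Compare a file's SHA256 digest with a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, call `matches_published_hash` with the path to autothemegenerator-0.1.2.tar.gz and the SHA256 value from its table; `True` indicates the file matches the published digest.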