iQual is a package that leverages natural language processing to scale up interpretative qualitative analysis. It also provides methods to assess the bias, interpretability and efficiency of the machine-enhanced codes.


iQual

This repository contains the code and resources needed to implement the techniques described in the paper A Method to Scale-Up Interpretative Qualitative Analysis, with an Application to Aspirations in Cox’s Bazaar, Bangladesh. The iQual package is designed for qualitative analysis of open-ended interviews. It uses natural language processing to extend a small set of interpretative human codes to a much larger set of documents, and it provides methods for assessing the robustness and reliability of this approach.

The package has been applied to 2,200 open-ended interviews on parents’ aspirations for their children, conducted with Rohingya refugees and their Bangladeshi hosts in Cox’s Bazaar, Bangladesh. That analysis draws on work in anthropology and philosophy to expand conceptions of aspirations in economics, distinguishing between material goals, moral and religious values, and navigational capacity (the ability to achieve goals and aspirations), and shows that these have very different correlates.

With iQual, researchers can efficiently analyze large amounts of qualitative data while maintaining the nuance and accuracy of human interpretation.

Installation

To install iQual using pip, use the following command:

pip install -U iQual

Alternatively, you can install iQual from source. To do so, use the following commands:

git clone https://github.com/worldbank/iQual.git
cd iQual
pip install -e .

Dependencies

iQual requires Python 3.7+ and the following dependencies:

pandas

scikit-learn

sentence-transformers

spaCy

numpy

umap-learn

scipy

statsmodels

simplejson

matplotlib

seaborn
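
To check which of these dependencies are present in the current environment, a small script along the following lines may help. It is only a sketch: it relies on importlib.metadata from the standard library (available from Python 3.8, so it will not run on 3.7), and the helper name installed_versions is illustrative, not part of iQual.

```python
from importlib import metadata  # standard library in Python 3.8+

# Distribution names as listed above, plus iQual itself.
REQUIRED = [
    "iQual", "pandas", "scikit-learn", "sentence-transformers", "spacy",
    "numpy", "umap-learn", "scipy", "statsmodels", "simplejson",
    "matplotlib", "seaborn",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name:24s} {version or 'NOT INSTALLED'}")
```

Any package reported as missing can be installed with pip before iQual is used.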

Features

iQual provides the following features for qualitative analysis of open-ended interviews:

Customizable pipelines built on scikit-learn pipelines.
Text vectorization using:
- Any scikit-learn text feature extraction method.
- Any sentence-transformers compatible model.
- Any spaCy model with a doc.vector attribute.
Classification using any scikit-learn classification method.
Feature transformation:
- Dimensionality reduction using any scikit-learn decomposition method, or UMAP via umap-learn.
- Feature scaling using any scikit-learn preprocessing method.
Model selection using scikit-learn methods.
Model performance evaluation using scikit-learn metrics.
Tests for bias and interpretability using statsmodels.

Basic Usage

The following code demonstrates the basic usage of the iQual package. It shows how to construct a pipeline, fit it to the data, and use it to classify new data.

Import the iqual package and initiate the model class.

from iqual import iqualnlp     # Import `iqualnlp` from the `iqual` package

iqual_model = iqualnlp.Model() # Initiate the model class

Add text features to the model. The add_text_features method takes the following arguments:

question_col: The name of the column containing the question text.
answer_col: The name of the column containing the answer text.
model: The name of a scikit-learn, spaCy, or sentence-transformers model, or the path to a file of precomputed vectors (a pickled dictionary). The default is TfidfVectorizer.
env: The environment or package which is being used. The default is scikit-learn. Available options are scikit-learn, spacy, sentence-transformers, and saved-dict.
**kwargs: Additional keyword arguments to pass to the model.

# Choose ONE of the options below.
# question_col and answer_col are strings naming the question and answer columns.

# Use a scikit-learn feature extraction method
iqual_model.add_text_features(question_col, answer_col, model='TfidfVectorizer', env='scikit-learn')

# OR - Use a sentence-transformers model
iqual_model.add_text_features(question_col, answer_col, model='all-mpnet-base-v2', env='sentence-transformers')

# OR - Use a spaCy model
iqual_model.add_text_features(question_col, answer_col, model='en_core_web_lg', env='spacy')

# OR - Use a precomputed vector (pickled dictionary)
iqual_model.add_text_features(question_col, answer_col, model='qa_precomputed.pkl', env='saved-dict')

(OPTIONAL) Add a feature transformation layer. The add_feature_transformer method takes the following arguments:

name: The name of the feature transformation layer.
transformation: The type of transformation. Available options are FeatureScaler and DimensionalityReduction.

To add a feature scaling layer, use the following code:

iqual_model.add_feature_transformer(name='Normalizer', transformation="FeatureScaler") # or any other scikit-learn scaler

To add a dimensionality reduction layer, use the following code:

iqual_model.add_feature_transformer(name='UMAP', transformation="DimensionalityReduction") # supports UMAP or any other scikit-learn decomposition method
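
To make the feature-scaling option concrete: scikit-learn's Normalizer, named in the scaling example above, rescales each sample (row) to unit length, using the L2 norm by default. The following pure-Python sketch illustrates that operation only. It is not iQual's implementation, and the function name l2_normalize_rows is hypothetical.

```python
import math


def l2_normalize_rows(matrix):
    """Scale each row to unit Euclidean length; all-zero rows are left unchanged."""
    normalized = []
    for row in matrix:
        norm = math.sqrt(sum(value * value for value in row))
        if norm > 0:
            normalized.append([value / norm for value in row])
        else:
            normalized.append(list(row))
    return normalized


features = [[3.0, 4.0], [0.0, 0.0], [1.0, 1.0]]
scaled = l2_normalize_rows(features)
print(scaled[0])  # the first row now has length 1: [0.6, 0.8]
```

Row-wise normalization of this kind removes the effect of document length on the feature vector, which can matter when interview answers vary widely in size.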

Add a classifier layer. The add_classifier method takes the following arguments:

name: The name of the classifier layer. The default is LogisticRegression.
**kwargs: Additional keyword arguments to pass to the classifier.

iqual_model.add_classifier(name="LogisticRegression") # Add a classifier layer from scikit-learn

(OPTIONAL) Add a threshold layer for the classifier using add_threshold.

iqual_model.add_threshold() # Add a threshold layer for the classifier, recommended for imbalanced data
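
Why a threshold helps with imbalanced data: when positive codes are rare, predicted probabilities often stay below the default cut-off of 0.5, so a classifier may never predict the positive class. One common remedy is to pick the cut-off that maximizes F1 on labelled data. The sketch below illustrates that idea in plain Python. How add_threshold chooses its cut-off is not described here, so treat this as an illustration of the general technique, not as iQual's method; both function names are hypothetical.

```python
def f1_score_binary(y_true, y_pred):
    """F1 for 0/1 labels; returns 0.0 when there are no positives at all."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, probabilities):
    """Return (threshold, F1) with the highest F1 among the observed probabilities.

    Falls back to (0.5, 0.0) if no candidate achieves a positive F1.
    """
    best = (0.5, 0.0)
    for candidate in sorted(set(probabilities)):
        preds = [1 if p >= candidate else 0 for p in probabilities]
        score = f1_score_binary(y_true, preds)
        if score > best[1]:
            best = (candidate, score)
    return best


# Toy data: two positives out of eight, none scored above 0.5.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
probs = [0.05, 0.10, 0.15, 0.20, 0.30, 0.45, 0.35, 0.40]
print(best_threshold(y_true, probs))  # a cut-off of 0.35 recovers both positives
```

With the default cut-off of 0.5, this toy example would predict no positives at all. In practice the threshold should be tuned on training or validation data, not on the data used to report final performance.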

Compile the model with compile.

iqual_model.compile() # Compile the model

Fit the model to the data using fit. The fit method takes the following arguments:

X_train: The training data. (pandas dataframe)
y_train: The training labels. (pandas series)

iqual_model.fit(X_train,y_train) # Fit the model to the data

Predict the labels for new data using predict. The predict method takes the following arguments:

X_test: The test data. (pandas dataframe)

y_pred = iqual_model.predict(X_test) # Predict the labels for new data

For examples of cross-validation fitting, model selection and performance evaluation, and tests of bias, interpretability and measurement, refer to the notebooks folder.
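
As background for the cross-validation mentioned here: the human-coded documents are partitioned into k folds, and each fold is held out once while a model is fitted on the rest. The sketch below generates such index splits with the standard library only. It is a generic illustration, not the routine used in the notebooks, and the function name k_fold_indices is hypothetical. It also does not stratify by label, which is often preferable for rare codes.

```python
import random


def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) pairs for shuffled k-fold cross-validation."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducible splits
    # Deal the shuffled indices into folds of near-equal size.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)


for train_idx, test_idx in k_fold_indices(n_samples=10, n_folds=3):
    print(len(train_idx), len(test_idx))
```

With a pandas dataframe X and series y, each pair of index lists could select rows via X.iloc[train_idx] and y.iloc[train_idx], with a fresh model fitted and evaluated per fold.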

Notebooks

The notebooks folder contains detailed examples on using iQual. The notebooks are organized into the following categories:

Basic Modelling: These notebooks demonstrate the basic usage of the package, pipeline construction, and the vectorization and classification options.
Advanced Modelling: These notebooks demonstrate advanced pipeline construction, mixing and matching of feature extraction and classification methods, and model selection.
Interpretability: These notebooks demonstrate the interpretability tests, which measure and compare interpretability across human and enhanced (machine + human) codes.
Bias and Efficiency: These notebooks demonstrate the bias and efficiency tests for determining the value and validity of enhanced codes.

Citation & Authors

If you use this package, please cite the following paper:

Ashwin, Julian; Rao, Vijayendra; Biradavolu, Monica Rao; Chhabra, Aditya; Haque, Arshia; Khan, Afsana Iffat; Krishnan, Nandini.
A Method to Scale-Up Interpretative Qualitative Analysis, with an Application to Aspirations in Cox’s Bazaar, Bangladesh (English). Policy Research Working Paper No. WPS 10046. Washington, D.C.: World Bank Group.
The paper is funded by the Knowledge for Change Program (KCP).
http://documents.worldbank.org/curated/en/099759305162210822/IDU0a357362e00b6004c580966006b1c2f2e3996

Maintainers

Please contact the following people for any queries regarding the package:


Release history

0.1.2 (Feb 22, 2023): this version
0.1.1 (Feb 22, 2023): yanked. Reason: partial upload
0.1.0 (Feb 18, 2023): yanked. Reason: major bug, most modules unusable

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

iQual-0.1.2-py3-none-any.whl (23.1 kB), uploaded Feb 22, 2023, Python 3

Hashes for iQual-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a14e73febf7b027b0800b5e0b12cabe5fbde6e143acf784d871275c78d2b9a65`
MD5	`c0c17a1fb672406564fc186f1fbe3e8e`
BLAKE2b-256	`b8c2a7317940e0732e8de9f61e91312c9b941dd583642b9fbab49494fc8eff3c`