
Qualitative Research support tools in Python!


QRMine

/ˈkärmīn/

QRMine is a suite of qualitative research (QR) data mining tools in Python using Natural Language Processing (NLP) and Machine Learning (ML). QRMine is a work in progress.

What it does

NLP

  • List common categories for open coding.
  • Create a coding dictionary with categories, properties and dimensions.
  • Topic modelling.
  • Arrange docs according to topics.
  • Compare two documents/interviews.
  • Select documents/interviews by sentiment, category or title for further analysis.
  • Sentiment analysis
  • Network analysis
  • Co-citation finder

ML

  • Accuracy of a neural network model trained using the data
  • Confusion matrix from a support vector machine classifier
  • K nearest neighbours of a given record
  • K-Means clustering
  • Principal Component Analysis (PCA)
  • Association rules

How to install

pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.1.0/en_core_web_sm-2.1.0.tar.gz#egg=en_core_web_sm

pip install qrmine
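
A quick way to confirm that both installs are visible to the current interpreter is sketched below. It only checks that the modules can be located; it does not load the spaCy model or run QRMine. The helper name `missing_modules` is made up for this example.

```python
from importlib.util import find_spec

# Import names of the two packages installed above.
REQUIRED_MODULES = ("qrmine", "en_core_web_sm")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that this interpreter cannot locate."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("qrmine and en_core_web_sm are importable.")
```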

How to Use

  • Input files are transcripts in txt format and a single csv file with numeric data. The output txt file can be specified.

  • The coding dictionary, topics and topic assignments can be created from the entire corpus (all documents) using the respective command line options.

  • Categories (concepts), summary and sentiment can be viewed for the entire corpus or for specific titles (documents) specified using the --titles switch. Sentence-level sentiment output is possible with the --sentence flag.

  • You can filter documents based on sentiment, titles or categories and do further analysis, using --filters or -f.

  • Many of the ML functions, such as the neural network, take a second argument (-n). Its meaning depends on the function: the number of epochs for nnet, the number of clusters for kmeans, the number of factors for pca, and the number of neighbours for knn. KNN also takes the --rec or -r argument to specify the record.

  • Variables from the csv file can be selected using --titles (defaults to all). The first variable will be ignored (index) and the last will be the DV (dependent variable).

Command-line options

python -m qrmine --help
Command      Alternate  Description
--inp        -i         Input file in the text format with <break>Topic</break> tags
--out        -o         Output file name
--csv                   csv file name
--num        -n         N (clusters/epochs etc. depending on context)
--rec        -r         Record (based on context)
--titles     -t         Document(s) title(s) to analyze/compare
--codedict              Generate coding dictionary
--topics                Generate topic model
--assign                Assign documents to topics
--cat                   List categories of entire corpus or individual docs
--summary               Generate summary for entire corpus or individual docs
--sentiment             Generate sentiment score for entire corpus or individual docs
--nlp                   Generate all NLP reports
--sentence              Generate sentence level scores when applicable
--nnet                  Display accuracy of a neural network model, -n epochs (3)
--svm                   Display confusion matrix from an svm classifier
--knn                   Display nearest neighbours, -n neighbours (3)
--kmeans                Display KMeans clusters, -n clusters (3)
--cart                  Display Association Rules
--pca                   Display PCA, -n factors (3)
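
When QRMine is driven from a script rather than typed by hand, the switches listed above can be assembled into an argument list. The following is a minimal sketch, not part of QRMine: `build_command` is a hypothetical helper, `survey.csv` is a made-up file name, and the way the switches are combined here is an assumption that should be checked against the output of --help.

```python
import shlex
import sys

# Report switches copied from the options list above (without the leading dashes).
REPORT_FLAGS = {
    "codedict", "topics", "assign", "cat", "summary", "sentiment", "nlp",
    "nnet", "svm", "knn", "kmeans", "cart", "pca",
}


def build_command(report, inp=None, csv=None, out=None, num=None):
    """Build an argv list for `python -m qrmine` (hypothetical helper)."""
    if report not in REPORT_FLAGS:
        raise ValueError(f"unknown report switch: {report}")
    cmd = [sys.executable, "-m", "qrmine", f"--{report}"]
    # Append only the options that were actually supplied.
    for flag, value in (("--inp", inp), ("--csv", csv), ("--out", out), ("--num", num)):
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


# Example: KMeans with 4 clusters on a made-up csv file.
cmd = build_command("kmeans", csv="survey.csv", num=4)
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would execute it; building the list separately keeps the command easy to inspect and log first.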

Use it in your code

from qrmine import Content
from qrmine import Network
from qrmine import Qrmine
from qrmine import ReadData
from qrmine import Sentiment
from qrmine import MLQRMine
  • More instructions and a jupyter notebook available here.

Input file format

NLP

Individual documents or interview transcripts in a single text file, separated by <break>Topic</break> tags. Example below:

Transcript of the first interview with John.
Any number of lines
<break>First_Interview_John</break>

Text of the second interview with Jane.
More text.
<break>Second_Interview_Jane</break>

....

Multiple files are supported, each having only one break tag at the bottom with the topic. (The tag may be renamed in the future.)
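
To make the layout concrete, here is a small sketch that splits text in this format into titled documents. It is an illustration written for this README, not QRMine's own reader; the function name `split_documents` is hypothetical, and it assumes each document body is followed by exactly one break tag.

```python
import re

# Matches one closing tag and captures the title between the break markers.
BREAK_TAG = re.compile(r"<break>(.*?)</break>")


def split_documents(text):
    """Return {title: body} for every body that is closed by a break tag."""
    documents = {}
    start = 0
    for match in BREAK_TAG.finditer(text):
        title = match.group(1).strip()
        # The body is everything between the previous tag and this one.
        documents[title] = text[start:match.start()].strip()
        start = match.end()
    return documents


sample = """Transcript of the first interview with John.
Any number of lines
<break>First_Interview_John</break>

Text of the second interview with Jane.
More text.
<break>Second_Interview_Jane</break>
"""

docs = split_documents(sample)
for title, body in docs.items():
    print(title, "->", len(body.splitlines()), "lines")
```

Any text after the last break tag is ignored by this sketch, since it has no title to attach to.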

ML

A single csv file with the following generic structure.

  • Column 1 with identifier. If it is related to a text document as above, include the title.
  • Last column has the dependent variable (DV). (NLP algorithms like the topic assignments may provide the DV.)
  • All independent variables (numerical) in between.
index, obesity, bmi, exercise, income, bp, fbs, has_diabetes
1, 0, 29, 1, 12, 120, 89, 1
2, 1, 32, 0, 9, 140, 92, 0
......
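
The column convention above (identifier first, DV last, independent variables in between) can be sketched with the standard library alone. This is only an illustration of the layout, not how QRMine reads the file; `split_columns` is a hypothetical name, and the sketch assumes every value after the identifier is numeric.

```python
import csv
import io


def split_columns(csv_text):
    """Split rows into identifiers, independent variables and the DV."""
    # skipinitialspace drops the blank that follows each comma in the example.
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    header = next(reader)
    ids, features, target = [], [], []
    for row in reader:
        if not row:  # tolerate blank lines
            continue
        ids.append(row[0])                              # column 1: identifier
        features.append([float(v) for v in row[1:-1]])  # independent variables
        target.append(float(row[-1]))                   # last column: DV
    return header[1:-1], ids, features, target


sample = """index, obesity, bmi, exercise, income, bp, fbs, has_diabetes
1, 0, 29, 1, 12, 120, 89, 1
2, 1, 32, 0, 9, 140, 92, 0
"""

names, ids, features, target = split_columns(sample)
print(names)
print(features[0], "->", target[0])
```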

Author

Citation

Please cite QRMine in your publications if it helped your research. Here is an example BibTeX entry:


@misc{eapenbr2019qrmine,
  title={QRMine -Qualitative Research Tools in Python.},
  author={Eapen, Bell Raj and contributors},
  year={2019},
  publisher={GitHub},
  journal = {GitHub repository},
  howpublished={\url{https://github.com/dermatologist/qrmine}}
}

A publication on the theoretical foundations of this tool is in progress. QRMine is inspired by this work and the associated paper.

