Qualitative coding tools for computer scientists

License
- OSI Approved :: GNU Affero General Public License v3
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Qualitative Coding

Qualitative coding for computer scientists.

Qualitative coding is a form of feature extraction in which text (or images, video, etc.) is tagged with features of interest. Sometimes the codebook is defined ahead of time, other times it emerges through multiple rounds of coding. For more on how and why to use qualitative coding, see Emerson, Fretz, and Shaw's Writing Ethnographic Fieldnotes or Shaffer's Quantitative Ethnography.

Most of the tools available for qualitative coding and subsequent analysis were designed for non-programmers. They are GUI-based, proprietary, and don't expose the data in well-structured ways. Concepts from computer science, such as trees, sorting, and filtering, could also be applied to qualitative coding analysis if the interface supported it. Furthermore, a command-line based tool can be combined with other utilities into flexible pipelines.

Qualitative Coding, or qc, was designed to address these issues. I have used qc as a primary coding tool in a SIGCSE paper on how K-12 schools define and design computer science courses. The impetus for packaging and releasing a stable version was my own dissertation work.

Limitations

Due to its nature as a command-line program, qc is only well-suited to coding textual data.
qc uses line numbers as a fundamental unit. Therefore, it requires text files in your corpus to be hard-wrapped at 80 characters. The init task can handle this for you.
Currently, the only interface for actually doing the coding is a split-screen in vim, with the corpus text on one side and comma-separated codes adjacent. This works well for me, but might not work well for you. I have other ideas in the pipeline, but they won't be around soon.
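Because codes are attached by line number, the hard-wrapping step matters. Here is a minimal sketch of wrapping a file's text at 80 characters with Python's standard textwrap module. It is not qc's actual implementation, and the helper name hard_wrap is made up for illustration.

```python
import textwrap

def hard_wrap(text, width=80):
    """Wrap each line of text to at most `width` characters, keeping blank lines."""
    wrapped = []
    for line in text.splitlines():
        # textwrap.wrap returns [] for an empty line, so keep blank lines explicitly.
        wrapped.extend(textwrap.wrap(line, width=width) or [""])
    return "\n".join(wrapped)

sample = "word " * 40  # one 200-character line
lines = hard_wrap(sample).splitlines()
print(len(lines), max(len(line) for line in lines))
```

Wrapping once, before any coding starts, keeps line numbers stable; re-wrapping a corpus that already has codes would shift the lines the codes point to.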

Installation

pip install qualitative-coding

Setup

All the source files you want to code should be in a directory (possibly nested).
Choose a working directory. Run qc init. This will create settings.yaml.
In settings.yaml, update corpus_dir with the directory holding your source files. This may be relative to settings.yaml or absolute. Similarly, specify directories for codes_dir, logs_dir, and memos_dir, and the YAML file where you want to store your codebook. Unless you're particular, the default settings are fine.
Run qc init --prepare-corpus --prepare-codes --coder yourname. This will hard-wrap all the text in your corpus at 80 characters and create blank coding files.
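To illustrate the path rule above (directories may be relative to settings.yaml or absolute), here is a small sketch. The settings dict stands in for a parsed settings.yaml. Only the keys corpus_dir, codes_dir, logs_dir, and memos_dir come from the description above; the values, the example paths, and the helper resolve_dirs are assumptions for illustration.

```python
from pathlib import Path

def resolve_dirs(settings_path, settings):
    """Resolve each directory setting relative to the settings file's folder."""
    base = Path(settings_path).parent
    # Joining with an absolute path discards `base`, so absolute settings pass through.
    return {key: base / Path(value) for key, value in settings.items()}

settings = {
    "corpus_dir": "corpus",
    "codes_dir": "codes",
    "logs_dir": "logs",
    "memos_dir": "/tmp/project_memos",
}
for key, path in resolve_dirs("/home/me/project/settings.yaml", settings).items():
    print(key, path)
```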

Usage

Workflow

qc is designed to give you a powerful terminal-based interface. The general workflow is to use code to apply qualitative codes to your text files. As you go, you will start to have ideas about the meanings and organization of your codes. Use memo to capture these.

Once you finish a round of coding, it's time to reorganize your codes. Use codebook to refresh the codebook based on new coding. Use stats to see the distribution of your codes. If you want to move codes into a tree, make these changes directly in the codebook's YAML. If you realize you have redundant codes, use rename.

After you finish coding, you may want to use your codes for analysis. Tools are provided for viewing statistics, cross-tabulation, and examples of codes, with many options for selecting and filtering at various units of analysis. Results can be exported to CSV for downstream analysis.

The --coder argument supports keeping track of multiple coders on a project, and there are options to filter on coder where relevant. More analytical tools, such as inter-rater reliability, are coming.

Tutorial

Create a new directory somewhere. We will create a virtual environment, install qc, and download some sample text from Wikipedia.

$ python3 -m venv env
$ source env/bin/activate
$ pip install qualitative-coding
$ qc init
$ qc init
$ curl -o corpus/what_is_coding.txt "https://en.wikipedia.org/w/index.php?title=Coding_%28social_sciences%29&action=raw"
$ qc init --prepare-corpus --prepare-codes --coder chris

Why run qc init three times? The first time creates a prepopulated settings.yaml file. You could then change any settings. The second time reads settings.yaml and creates the specified files and directories. And the third run, with the flags, processes the corpus file and creates a corresponding coding file.

Now we're ready to start coding. This next command will open a split-window vim session. Add comma-separated codes to the blank file on the right. I usually page-up (control+u) and page-down (control+d) each file to keep their line numbers synchronized. Once you've added some codes, we can analyze and refine them.

$ qc code chris -1
$ qc codebook
$ qc list
- a_priori
- analysis
- coding_process
- computers
- errors
- grounded_coding
- themes
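Since a coding file is plain text with comma-separated codes on the lines that match the corpus, it is easy to read back in your own scripts. A rough sketch, assuming one line of codes per corpus line and zero-based line numbers; parse_codes is a hypothetical helper, not part of qc.

```python
def parse_codes(code_text):
    """Map each coded line number to its list of codes."""
    coded = {}
    for line_number, line in enumerate(code_text.splitlines()):
        # Split on commas and drop empty entries left by blank lines or trailing commas.
        codes = [code.strip() for code in line.split(",") if code.strip()]
        if codes:
            coded[line_number] = codes
    return coded

code_file = "analysis\n\ncoding_process, a_priori\n\nerrors,\n"
print(parse_codes(code_file))
```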

Now that we have coded our corpus (consisting of a single document), we should think about whether these codes have any structure. All data in qc is stored in flat files, so you can easily modify it by hand. Re-organize some of your codes in codebook.yaml. When you finish, run codebook again. It will go through your corpus and add any missing codes.

$ qc list
- analysis
- coding_process
    - a_priori
    - grounded_coding
- computers
- errors
- themes

I decided to group a priori coding and grounded coding together under coding process. Let's see some statistics on the codes:

$ qc stats
Code                  Count
------------------  -------
analysis                  2
coding_process            7
.  a_priori               2
.  grounded_coding        2
computers                 2
errors                    1
themes                    2
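The count for coding_process includes its children. Here is a sketch of that roll-up, using the tree from the tutorial. The table does not show direct counts, so the value 3 for coding_process is inferred (7 - 2 - 2) and should be treated as an assumption. The names tree, direct, and total are illustrative only.

```python
# Parent code -> child codes, mirroring the codebook shown above.
tree = {"coding_process": ["a_priori", "grounded_coding"]}

# Times each code was applied directly (coding_process's 3 is an inferred value).
direct = {
    "analysis": 2, "coding_process": 3, "a_priori": 2,
    "grounded_coding": 2, "computers": 2, "errors": 1, "themes": 2,
}

def total(code):
    """Count direct uses of a code plus all uses of its descendants."""
    return direct.get(code, 0) + sum(total(child) for child in tree.get(code, []))

for code in ["analysis", "coding_process", "a_priori", "computers"]:
    print(code, total(code))
```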

stats has lots of useful filtering and formatting options. For example, qc stats --pattern wiki --depth 1 --min 10 --format latex would only consider files having "wiki" in the filename. Within these files, it would show only top-level categories of codes having at least ten instances, and would output a table suitable for inclusion in a LaTeX document. Use --help on any command to see available options.

Next, we might want to see examples of what we have coded.

$ qc find analysis
Showing results for codes:  analysis

what_is_coding.txt (2)
================================================================================

[0:3]
In the [[social science|social sciences]], '''coding''' is an analytical process | analysis
in which data, in both [[quantitative research|quantitative]] form (such as      | 
[[questionnaire]]s results) or [[qualitative research|qualitative]] form (such   | 

[52:57]
process of selecting core thematic categories present in several documents to    | 
discover common patterns and relations.<ref>Grbich, Carol. (2013). "Qualitative  | 
Data Analysis" (2nd ed.). The Flinders University of South Australia: SAGE       | analysis
Publications Ltd.</ref>                                                          | 
                                                                                 |
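The bracketed ranges in the output above look like slices of line numbers around each coded line. Here is a sketch of how such windows could be computed, assuming two lines of context on each side and zero-based, end-exclusive ranges. Both are guesses from the sample output, not documented behavior, and context_windows is a hypothetical helper.

```python
def context_windows(coded_lines, total_lines, context=2):
    """Return merged (start, stop) line ranges around each coded line."""
    windows = []
    for line in sorted(coded_lines):
        start = max(0, line - context)
        stop = min(total_lines, line + context + 1)
        if windows and start <= windows[-1][1]:
            # Overlapping or adjacent windows are merged into one.
            windows[-1] = (windows[-1][0], max(windows[-1][1], stop))
        else:
            windows.append((start, stop))
    return windows

# Lines 0 and 54 carry the code; the 60-line document length is made up.
print(context_windows([0, 54], total_lines=60))
```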

Again, there are lots of options for filtering and viewing your coding. At some point, you will probably want to revise your codes. You can easily rename a code, or collapse codes together, with the rename command. This updates your codebook as well as all your code files.

$ qc rename grounded_coding grounded

At this point, you are starting to realize some of the deeper themes running through your corpus. Capturing these in an "integrative memo" is an important part of qualitative coding. memo will open a preformatted document for you in vim.

$ qc memo chris --message "Thoughts on coding process"

Congratulations! You have finished the first round of coding. Before you move on, this would be an excellent time to check your files into version control. I hope you find qc to be powerful and efficient; it's worked for me!

-Chris Proctor

Commands

Use --help for a full list of available options for each command.

init

Initializes a new coding project, as described above.

$ qc init

check

Checks that all required files and directories are in place.

$ qc check

code

Opens a split-screen vim window with a corpus file and the corresponding code file. The name of the coder is a required positional argument. After optionally filtering using common options (below), select a document with no existing codes (for this coder) using --first (-1) or --random (-r).

$ qc code chris -1

codebook (cb)

Scans through all the code files and adds new codes to the codebook.

$ qc codebook

list (ls)

Lists all the codes currently in the codebook.

$ qc list --expanded

rename

Goes through all the code files and replaces one or more codes with another. Removes the old codes from the codebook.

$ qc rename humorous funy funnny funny
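In the example above, the last argument appears to be the new code and the earlier ones are replaced by it. Here is a sketch of that replacement on a single line of comma-separated codes. Collapsing duplicates after the merge is an assumption rather than confirmed behavior, and rename_codes is a hypothetical helper.

```python
def rename_codes(line, old_codes, new_code):
    """Replace any of old_codes with new_code in a comma-separated code line."""
    renamed = []
    for code in (part.strip() for part in line.split(",")):
        if not code:
            continue
        code = new_code if code in old_codes else code
        # Skip repeats so merging several codes does not duplicate the new one.
        if code not in renamed:
            renamed.append(code)
    return ", ".join(renamed)

print(rename_codes("funy, math, humorous", {"humorous", "funy", "funnny"}, "funny"))
```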

find

Displays all occurrences of the provided code(s).

$ qc find math science art

stats

Displays frequency of usage for each code. Note that counts include all usages of children. List code names to show only certain codes. In addition to the common options below, code results can be filtered with --max and --min.

$ qc stats --recursive-codes --depth 2

crosstab (ct)

Displays a cross-tabulation of code co-occurrence within the unit of analysis, as counts or as probabilities (--probs, -0). Optionally use a compact (--compact, -z) output format to display more columns. In the future, this may also include odds ratios.

$ qc crosstab planning implementation evaluation --recursive-codes --depth 1 --probs
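Co-occurrence here means two codes appearing in the same unit of analysis. Here is a sketch of the counting step, using the line as the unit. The data and the helper cooccurrence are invented, and the probability shown (the share of units with the row code that also have the column code) is only one plausible reading of --probs.

```python
from collections import Counter
from itertools import permutations

def cooccurrence(units):
    """Count, for each ordered pair of codes, the units containing both."""
    pair_counts, code_counts = Counter(), Counter()
    for codes in units:
        unique = set(codes)
        code_counts.update(unique)
        pair_counts.update(permutations(sorted(unique), 2))
    return pair_counts, code_counts

# Each inner list holds the codes applied to one unit (here, one line).
units = [["planning", "implementation"], ["planning"], ["planning", "evaluation"]]
pairs, totals = cooccurrence(units)
print(pairs[("planning", "implementation")])
print(pairs[("planning", "implementation")] / totals["planning"])
```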

Common Options

Filtering the corpus

--pattern pattern (-p): Only include corpus files and their codes which match (glob-style) pattern.
--invert (-i): Only include corpus files that do not match pattern.
--filenames filepath (-f): Only include corpus files listed in filepath (one per line).
--coder coder (-c): Only include codes entered by coder (if you use different names for different rounds of coding, you can also use this to filter by round of coding).
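The pattern and invert filters compose naturally. Here is a sketch using Python's fnmatch for glob-style matching. Wrapping the pattern in wildcards, so that wiki matches any filename containing "wiki", is an assumption based on the tutorial's description, and filter_corpus is a hypothetical name.

```python
from fnmatch import fnmatch

def filter_corpus(filenames, pattern=None, invert=False):
    """Keep filenames matching a glob-style pattern, or the rest if invert is set."""
    if pattern is None:
        return list(filenames)
    # Assumed behavior: the pattern may match anywhere in the filename.
    return [name for name in filenames if fnmatch(name, f"*{pattern}*") != invert]

corpus = ["wiki_coding.txt", "interview_01.txt", "interview_02.txt"]
print(filter_corpus(corpus, "wiki"))
print(filter_corpus(corpus, "wiki", invert=True))
```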

Filtering code selection

code [codes]: Many commands have an optional positional argument in which you may list codes to consider. If none are given, the root node in the tree of codes is assumed.
--recursive-codes (-r): Include children of selected codes.
--depth depth (-d): Limit the recursive depth of codes to select.
--unit unit (-n): Unit of analysis for reporting. Currently "document" and "line" are supported by most commands.
--recursive-counts (-a): When counting codes, also count instances of codes' children. In contrast to --recursive-codes, which controls which codes will be reported, this option controls how the counting is done.

Output and formatting

--format format (-m): Formatting style for output table. Supported values include "html", "latex", "github", and many more.
--expanded (-e): Show names of codes in expanded form (e.g. "coding_process:grounded")
--outfile outfile (-o): Save tabular results to a csv file instead of displaying.
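The expanded form joins a code to its ancestors. Here is a sketch of building those names from a nested codebook, assuming a colon separator as in the example above. The nested-dict representation and the helper expanded_names are illustrative, not qc's internal format.

```python
def expanded_names(tree, prefix=""):
    """Yield every code in a nested dict as an ancestor:child path."""
    for code, children in tree.items():
        name = f"{prefix}:{code}" if prefix else code
        yield name
        # Recurse into children, carrying the full path as the new prefix.
        yield from expanded_names(children, name)

codebook = {"analysis": {}, "coding_process": {"a_priori": {}, "grounded": {}}}
print(list(expanded_names(codebook)))
```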

Release history

1.7.6

Aug 28, 2025

1.7.5

Aug 28, 2025

1.7.4

Aug 28, 2025

1.7.3

Jan 17, 2025

1.7.2

Jan 17, 2025

1.7.1

Jan 17, 2025

1.7.0

Jan 17, 2025

1.6.2

Nov 1, 2024

1.6.1

Oct 14, 2024

1.6.0

Sep 18, 2024

1.5.6

Sep 16, 2024

1.5.5

Sep 14, 2024

1.5.4

Sep 14, 2024

1.5.3

Sep 14, 2024

1.5.2

Sep 5, 2024

1.5.0

Sep 4, 2024

1.4.2

Sep 3, 2024

1.4.1

Aug 20, 2024

1.4.0

Aug 20, 2024

1.3.2

May 14, 2024

1.3.1

May 11, 2024

1.3.0

Apr 27, 2024

1.2.11

Apr 26, 2024

1.2.10

Apr 26, 2024

1.2.9

Mar 23, 2024

1.2.8

Mar 18, 2024

1.2.7

Mar 17, 2024

1.2.6

Mar 16, 2024

1.2.5

Mar 16, 2024

1.2.4

Mar 14, 2024

1.2.3

Mar 14, 2024

1.2.2

Mar 11, 2024

1.2.1

Mar 10, 2024

1.2.0

Mar 1, 2024

1.1.2

Feb 29, 2024

1.1.1

Feb 29, 2024

1.1.0

Feb 27, 2024

1.0.7

Feb 25, 2024

1.0.6

Feb 23, 2024

1.0.5

Feb 23, 2024

1.0.4

Feb 23, 2024

1.0.3

Feb 20, 2024

1.0.2

Feb 20, 2024

1.0.1

Feb 20, 2024

1.0.0

Feb 20, 2024

0.2.3

Jul 27, 2023

0.2.2

Mar 3, 2023

0.2.1

Feb 23, 2023

0.2.0

Feb 23, 2023

0.1.13

Feb 22, 2023

0.1.12

May 11, 2020

0.1.11

Mar 2, 2020

0.1.10

Feb 23, 2020

0.1.9

Feb 23, 2020

0.1.8

Feb 21, 2020

0.1.7

Feb 21, 2020

0.1.6

Feb 19, 2020

0.1.5

Feb 19, 2020

0.1.4

Feb 17, 2020

0.1.3

Feb 17, 2020

This version

0.1.2

Feb 17, 2020

0.1.1

Feb 17, 2020

0.1.0

Feb 16, 2020

0.0.23

Feb 15, 2020

0.0.22

Feb 15, 2020

0.0.21

Feb 15, 2020

0.0.20

Feb 15, 2020

0.0.19

Jan 25, 2020

0.0.18

Jan 25, 2020

0.0.17

Jan 25, 2020

0.0.16

Jan 25, 2020

0.0.15

Jan 25, 2020

0.0.14

Oct 19, 2019

0.0.13

Oct 19, 2019

0.0.12

Sep 13, 2019

0.0.11

Sep 13, 2019

0.0.10

Sep 13, 2019

0.0.9

Sep 10, 2019

0.0.8

Sep 10, 2019

0.0.7

Sep 10, 2019

0.0.6

Sep 10, 2019

0.0.5

Sep 10, 2019

0.0.4

Sep 10, 2019

0.0.3

Sep 10, 2019

0.0.2

Sep 10, 2019

0.0.1

Sep 10, 2019

Download files

Source Distribution

qualitative-coding-0.1.2.tar.gz (22.4 kB)

Uploaded Feb 17, 2020 Source

Built Distribution

qualitative_coding-0.1.2-py3-none-any.whl (20.1 kB)

Uploaded Feb 17, 2020 Python 3

File details

Details for the file qualitative-coding-0.1.2.tar.gz.

File metadata

Download URL: qualitative-coding-0.1.2.tar.gz
Upload date: Feb 17, 2020
Size: 22.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.7.5

File hashes

Hashes for qualitative-coding-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`3a3f7a4bbe50ee28b77cb42fe4b02d7c0eaa59378eb5de7fff6aab5ebd65ce60`
MD5	`cf236af3eb702435ac8401f167747db3`
BLAKE2b-256	`5892f5184bd765279bb6e6a4d80dfc130da206f90a4e96d6422ed9c7cee2a46d`

File details

Details for the file qualitative_coding-0.1.2-py3-none-any.whl.

File metadata

Download URL: qualitative_coding-0.1.2-py3-none-any.whl
Upload date: Feb 17, 2020
Size: 20.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.7.5

File hashes

Hashes for qualitative_coding-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1331620f41ffff7a2fca51983d286779d79a71196e010c5db3e6649768734208`
MD5	`c9297bc64c286e7b1bcd45b50477bee9`
BLAKE2b-256	`37b431a0fd3c2968ff0e04cbb49bc11ab9b419afb5f1b0ac5ba6d84a9d900859`
