Qualitative Research support tools in Python!


🔍 CRISP-T (Sense-making from Text and Numbers!)

TL;DR 🚀 CRISP-T is a qualitative research method and a toolkit to perform textual (e.g., topic modeling) and numeric (e.g., decision trees) analysis of mixed datasets for computational triangulation and sense-making, optionally assisted by large language models. 👉 See Demo.

✅ CRISP is written in Python, but you don’t need to know Python to use it!

✅ CRISP is not a data science tool; it’s a sense-making tool!

✅ CRISP does not replace your analysis; it just augments it!

✅ CRISP employs an interpretivist approach, and the same lens is required to comprehend its results!

✅ CRISP does not need LLMs but can augment them with tools!

✅ CRISP is designed to simplify your life as a qualitative researcher!

💯 CRISP is open source and licensed under the GPL-3.0 License.

Qualitative research focuses on collecting and analyzing textual data—such as interview transcripts, open-ended survey responses, and field notes—to explore complex phenomena and human experiences. Researchers may also incorporate quantitative or external sources (e.g., demographics, census data, social media) to provide context and triangulate findings. Characterized by an inductive approach, qualitative research emphasizes generating theories from data rather than testing hypotheses. While qualitative and quantitative data are often used together, there is no standard method for combining them.

CRISP-T is a method and toolset to integrate textual data (as a list of documents) and numeric data (as a pandas DataFrame) into structured classes that retain metadata from various analytical processes, such as topic modeling and decision trees. Researchers, with or without GenAI assistance, can define relationships between textual and numeric datasets based on their chosen theoretical lens. An optional final analytical phase checks whether the proposed relationships actually hold in the data. Further, if the numeric and textual datasets share the same id, or if the textual metadata contains keywords that match numeric column names, both datasets are filtered simultaneously, keeping them aligned and facilitating triangulation. 👉 See Demo.
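The shared-id alignment can be pictured with a minimal standard-library sketch. It covers only the shared-id case (not keyword-to-column matching) and is not CRISP-T's implementation: the function `filter_aligned`, the dicts standing in for documents and DataFrame rows, and the sample data are all hypothetical.

```python
from typing import Any


def filter_aligned(
    documents: list[dict[str, Any]],
    rows: list[dict[str, Any]],
    key: str,
    value: Any,
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Keep documents whose metadata[key] == value, plus rows with the same ids.

    Hypothetical helper: each document is a dict with "id" and "metadata";
    each row is a dict with an "id" field, standing in for a DataFrame row.
    """
    # Filter the textual side on a metadata value.
    kept_docs = [d for d in documents if d["metadata"].get(key) == value]
    # Remember which ids survived so the numeric side can follow.
    kept_ids = {d["id"] for d in kept_docs}
    # Filter the numeric side to the same ids, keeping both views aligned.
    kept_rows = [r for r in rows if r["id"] in kept_ids]
    return kept_docs, kept_rows


# Made-up sample data for illustration.
documents = [
    {"id": "p1", "text": "Long wait times.", "metadata": {"site": "A"}},
    {"id": "p2", "text": "Friendly staff.", "metadata": {"site": "B"}},
    {"id": "p3", "text": "Parking was hard to find.", "metadata": {"site": "A"}},
]
rows = [
    {"id": "p1", "satisfaction": 2},
    {"id": "p2", "satisfaction": 5},
    {"id": "p3", "satisfaction": 3},
]

docs_a, rows_a = filter_aligned(documents, rows, "site", "A")
print([d["id"] for d in docs_a], [r["id"] for r in rows_a])  # ['p1', 'p3'] ['p1', 'p3']
```

Filtering the text side first and letting the numeric side follow by id is what keeps both views describing the same participants.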

CRISP-T implements semantic search using ChromaDB to find relevant documents or document chunks based on similarity to a query or reference documents. This is useful for literature reviews to find documents likely to fit inclusion criteria within your corpus/search results. It can also be used for coding/annotating documents by finding relevant chunks within a specific document.
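CRISP-T delegates this to ChromaDB and its embeddings. To show the underlying idea of ranking documents by similarity to a query and keeping only results above a threshold (as --num and --rec do for --similar-docs), here is a standard-library sketch that deliberately swaps embeddings for simple bag-of-words vectors with cosine similarity. The function names and the toy corpus are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts, standing in for a neural embedding."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, corpus: dict[str, str], num: int = 5,
           threshold: float = 0.0) -> list[tuple[str, float]]:
    """Return up to `num` (doc_id, score) pairs whose score exceeds `threshold`."""
    q = vectorize(query)
    scored = [(doc_id, cosine(q, vectorize(text))) for doc_id, text in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return [(doc_id, s) for doc_id, s in scored if s > threshold][:num]


corpus = {
    "d1": "nurses described long shifts and burnout",
    "d2": "patients praised the new parking garage",
    "d3": "nurses reported burnout",
}
results = search("nurses burnout", corpus, num=2, threshold=0.1)
print([doc_id for doc_id, _ in results])  # ['d3', 'd1']
```

Real embeddings also match paraphrases ("nurse", "exhaustion") that exact word counts miss, which is why a vector database is used in practice.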

An MCP server exposes all functionality as tools, resources, and prompts, enabling integration with AI agent platforms such as Claude Desktop, VS Code, and other MCP-compatible clients. CRISP-T cannot code documents directly, but its semantic chunk search can be combined with other tools to achieve automated coding. For example, VS Code provides built-in tools for editing text and markdown files, which can be used to code documents based on semantic search results.

Installation

pip install crisp-t

Include machine learning features for numeric data analysis (Recommended):

pip install "crisp-t[ml]"

Include XGBoost for gradient boosting features (Optional):

pip install "crisp-t[xg]"

On macOS, XGBoost also needs libomp; install it with brew install libomp. (Needed only if you want to use XGBoost.)

Command Line Scripts

CRISP-T now provides four main command-line scripts:

crisp — Main CLI for qualitative triangulation and analysis (see below)
crispviz — Visualization CLI for corpus data (word frequencies, topic charts, wordclouds, etc.)
crispt — Corpus manipulation CLI (create, edit, query, and manage corpus objects)
crisp-mcp — Starts the MCP server for AI integration (see MCP section below)

All scripts are installed as entry points and can be run directly from the command line after installation.

crisp (Analytical CLI)

crisp [OPTIONS]

⚠️ The first step is to create a corpus from your sources (data import).

Source data is read from a directory containing text documents (.txt, .pdf) and a single .csv file (for numeric data). The corpus is saved to the --out folder, which can then be used as input for all subsequent analyses.
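Before importing, it can help to check that a source folder has the expected layout. The sketch below is a hypothetical pre-check, not part of crisp: it lists .txt and .pdf files as documents, treats the .csv as optional (an assumption), and raises an error if more than one .csv is present.

```python
import tempfile
from pathlib import Path
from typing import Optional

DOCUMENT_SUFFIXES = {".txt", ".pdf"}


def collect_sources(source_dir: str) -> tuple[list[Path], Optional[Path]]:
    """Split a source folder into document files and at most one CSV file.

    Hypothetical pre-check: it only inspects file names and reads no content.
    """
    files = sorted(p for p in Path(source_dir).iterdir() if p.is_file())
    documents = [p for p in files if p.suffix.lower() in DOCUMENT_SUFFIXES]
    csv_files = [p for p in files if p.suffix.lower() == ".csv"]
    if len(csv_files) > 1:
        raise ValueError(f"expected at most one .csv file, found {len(csv_files)}")
    return documents, (csv_files[0] if csv_files else None)


# Demo on a throwaway folder with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("interview1.txt", "notes.pdf", "survey.csv", "readme.md"):
        (Path(tmp) / name).write_text("placeholder")
    docs, csv_path = collect_sources(tmp)
    doc_names = [p.name for p in docs]
    csv_name = csv_path.name if csv_path else None

print(doc_names, csv_name)  # ['interview1.txt', 'notes.pdf'] survey.csv
```

Files with other extensions (readme.md above) are simply ignored by this check.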

⚠️ This import step only needs to be done once.

crisp --source PATH --out PATH

e.g., crisp --source crisp_source --out crisp_input

ℹ️ crisp_input is the recommended name for the --out folder above; it is created in the current directory. 👉 See Demo.

ℹ️ From here on, load the corpus from that folder with the --inp option for all subsequent analyses. If you used the crisp_input folder, you can omit --inp, because crisp_input is its default. 👉 See Demo.

⚡️ Advanced users can also load a corpus from a URL with --source, or from multiple URLs with --sources. ⚡️

Input/Output Options

--source, -s PATH|URL: Read source data from a directory (reads .txt, .pdf and a single .csv) or from a URL
--sources PATH|URL: Provide multiple sources; can be used multiple times
--inp, -i PATH: Load an existing corpus from a folder containing corpus.json (and optional corpus_df.csv)
--out, -o PATH: When saving the corpus, provide a folder path; the CLI writes corpus.json (and corpus_df.csv if available) into that folder. When saving analysis results (topics, sentiment, etc.), this acts as a base path: files are written with suffixes, e.g., results_topics.json.
--unstructured, -t TEXT: CSV column(s) containing free text to analyze/compare (can be used multiple times). Use this when your free-form text lives in DataFrame columns; the listed columns are then used as documents.
--ignore TEXT: Comma-separated words to ignore during ingestion (applies to --source/--sources)

Analysis Options

--codedict: Generate qualitative coding dictionary
--topics: Generate topic model using LDA
--assign: Assign documents to topics
--cat: List categories of entire corpus or individual documents
--summary: Generate extractive text summary
--sentiment: Generate sentiment scores using VADER
--sentence: Generate sentence-level scores when applicable
--nlp: Generate all NLP reports (combines above text analyses)
--nnet, --cls, --knn, --kmeans, --cart, --pca, --regression, --lstm, --ml: Machine learning and clustering options (requires crisp-t[ml])
- --regression: Perform linear or logistic regression (automatically detects binary outcomes for logistic regression)
- --lstm: Train an LSTM model on text data to predict the outcome variable (requires a binary outcome and an 'id' column to align documents with DataFrame rows)
--visualize: Generate visualizations (word clouds, topic charts, etc.)
--num, -n INTEGER: Number parameter (clusters, topics, epochs, etc.) - default: 3
--rec, -r INTEGER: Record parameter (top N results, recommendations) - default: 3
--filters, -f TEXT: Filters to apply as key=value (can be used multiple times); keeps only documents where document.metadata[key] == value. Invalid formats raise an error.
--verbose, -v: Print verbose messages for debugging
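The --filters behavior (repeatable key=value pairs, exact match on document.metadata[key], an error for invalid formats) can be sketched as follows. This is not crisp's parser: `parse_filters` and `apply_filters` are hypothetical names, and combining repeated filters with AND and comparing values as strings (since command-line values arrive as text) are both assumptions.

```python
from typing import Any


def parse_filters(filters: list[str]) -> dict[str, str]:
    """Turn repeated key=value strings into a dict; reject malformed input."""
    parsed: dict[str, str] = {}
    for item in filters:
        key, sep, value = item.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"invalid filter {item!r}; expected key=value")
        parsed[key.strip()] = value.strip()
    return parsed


def apply_filters(documents: list[dict[str, Any]],
                  filters: dict[str, str]) -> list[dict[str, Any]]:
    """Keep documents whose metadata matches every filter (compared as strings)."""
    return [
        doc for doc in documents
        if all(k in doc["metadata"] and str(doc["metadata"][k]) == v
               for k, v in filters.items())
    ]


documents = [
    {"id": "1", "metadata": {"site": "A", "wave": 1}},
    {"id": "2", "metadata": {"site": "B", "wave": 1}},
    {"id": "3", "metadata": {"site": "A", "wave": 2}},
]
kept = apply_filters(documents, parse_filters(["site=A", "wave=1"]))
print([doc["id"] for doc in kept])  # ['1']

try:
    parse_filters(["site"])
except ValueError as err:
    print(err)  # invalid filter 'site'; expected key=value
```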

Data Sources

--source, -s PATH|URL: Read source data from a directory (reads .txt, .pdf and a single .csv) or from a URL
--sources PATH|URL: Provide multiple sources; can be used multiple times

Display Options

The --print, -p option provides flexible ways to display corpus information with color-coded output. You can use either quoted or unquoted syntax:

Syntax:

Quoted: --print "command subcommand"
Unquoted: --print command --print subcommand

Basic Options:

--print all: Display all corpus information (documents, dataframe, metadata)
--print documents: Show first 5 documents with IDs, names, and text snippets
--print documents --print N: Show first N documents (e.g., --print documents --print 10 shows 10 documents)
--print documents --print metadata: Display metadata for all documents (categories, scores, etc.)

DataFrame Options:

--print dataframe: Show DataFrame head with shape and column information
--print dataframe --print metadata: Display DataFrame columns starting with metadata_ prefix
--print dataframe --print stats: Show descriptive statistics and value distributions

Metadata Options:

--print metadata: Display all corpus metadata keys and values
--print metadata --print KEY: Show specific metadata (e.g., --print metadata --print pca)
- Available keys include: pca, numeric_clusters, kmeans, nnet_predictions, svm_confusion_matrix, decision_tree_accuracy, and more

Legacy Option:

--print stats: (Deprecated) Use --print dataframe --print stats instead

Examples:

# Show first 10 documents (unquoted syntax)
crisp --print documents --print 10

# Show first 10 documents (quoted syntax - backward compatible)
crisp --print "documents 10"

# View document metadata (unquoted)
crisp --print documents --print metadata

# View document metadata (quoted)
crisp --print "documents metadata"

# Check PCA results (unquoted)
crisp --print metadata --print pca

# View DataFrame statistics (unquoted)
crisp --print dataframe --print stats
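Both syntaxes resolve to the same sequence of words. A minimal sketch of that normalization, using hypothetical helper names and not taken from crisp's source, could look like this; the default of 5 documents comes from the --print documents description.

```python
def normalize_print_args(values: list[str]) -> list[str]:
    """Flatten quoted ("documents 10") and repeated (--print documents --print 10) values."""
    return [token for value in values for token in value.split()]


def document_limit(tokens: list[str], default: int = 5) -> int:
    """Number of documents to show for a 'documents [N]' request (default 5)."""
    if len(tokens) > 1 and tokens[1].isdigit():
        return int(tokens[1])
    return default


quoted = normalize_print_args(["documents 10"])        # --print "documents 10"
unquoted = normalize_print_args(["documents", "10"])   # --print documents --print 10
print(quoted == unquoted, document_limit(quoted))       # True 10
```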

crispviz (Visualization CLI)

crispviz [OPTIONS]

--inp, --source, --sources: Input corpus or sources
--out: Output directory for PNG images
Visualization flags: --freq, --by-topic, --wordcloud, --ldavis, --top-terms, --corr-heatmap, --tdabm
Optional params: --bins, --top-n, --columns, --topics-num

Visualization Options:

--freq: Export word frequency distribution
--by-topic: Export distribution by dominant topic (requires LDA)
--wordcloud: Export topic wordcloud (requires LDA)
--ldavis: Export interactive LDA visualization as HTML (requires LDA and pyLDAvis)
--top-terms: Export top terms bar chart
--corr-heatmap: Export correlation heatmap from CSV numeric columns
--tdabm: Export TDABM visualization (requires TDABM analysis in corpus metadata). Use crispt --tdabm to perform the analysis first.
--topics-num N: Number of topics for LDA (default: 8, based on Mettler et al., 2025)
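A correlation heatmap colors a matrix of pairwise correlations between numeric columns. The sketch below computes such a matrix with the standard library only. It assumes Pearson correlation (the method crispviz uses is not stated here) and treats a column as numeric only if every value parses as a number; plotting is omitted and the CSV data are made up.

```python
import csv
import io
import math


def numeric_columns(rows: list[dict[str, str]]) -> dict[str, list[float]]:
    """Keep only the columns in which every value parses as a float."""
    columns: dict[str, list[float]] = {}
    for name in rows[0]:
        try:
            columns[name] = [float(row[name]) for row in rows]
        except ValueError:
            continue  # e.g. ids or free-text columns
    return columns


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient; NaN when either column is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Root of the summed squared deviations for each column.
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return float("nan")
    return cov / (norm_x * norm_y)


# Made-up CSV; "id" and "comment" are skipped as non-numeric.
data = """id,age,income,satisfaction,comment
p1,25,30,2,too slow
p2,35,50,4,ok
p3,45,70,3,good
p4,55,90,5,great
"""
rows = list(csv.DictReader(io.StringIO(data)))
cols = numeric_columns(rows)
matrix = {(a, b): round(pearson(cols[a], cols[b]), 3) for a in cols for b in cols}
print(sorted(cols), matrix[("age", "satisfaction")])  # ['age', 'income', 'satisfaction'] 0.8
```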

crispt (Corpus Manipulation CLI)

crispt [OPTIONS]

--id, --name, --description: Corpus metadata
--doc: Add document as id|name|text or id|text (repeatable)
--remove-doc: Remove document by ID (repeatable)
--meta: Add/update corpus metadata as key=value (repeatable)
--add-rel: Add relationship as first|second|relation (repeatable)
--clear-rel: Clear all relationships
--out: Save corpus to folder/file as corpus.json
--inp: Load corpus from folder/file containing corpus.json
Query options:
- --df-cols: Print DataFrame column names
- --df-row-count: Print DataFrame row count
- --df-row INDEX: Print DataFrame row by index
- --doc-ids: Print all document IDs
- --doc-id ID: Print document by ID
- --relationships: Print all relationships
- --relationships-for-keyword KEYWORD: Print relationships involving a keyword
Semantic search (requires chromadb):
- --semantic QUERY: Perform semantic search with query string
- --similar-docs DOC_IDS: Find documents similar to comma-separated list of document IDs (useful for literature reviews)
- --num N: Number of results to return (default: 5). Used for --semantic and --similar-docs
- --semantic-chunks QUERY: Perform semantic search on document chunks. Returns matching chunks for a specific document (use with --doc-id and --rec for similarity threshold between 0 and 10 with a default of 8.5)
- --rec THRESHOLD: Threshold for semantic operations. For --semantic-chunks, use 0-10 (default: 8.5). For --similar-docs, use 0-1 (default: 0.7). Only results with similarity above this value are returned
- --metadata-df: Export collection metadata as DataFrame
- --metadata-keys KEYS: Comma-separated metadata keys to include
TDABM analysis:
- --tdabm Y_VAR:X_VARS:RADIUS: Perform Topological Data Analysis Ball Mapper (TDABM) analysis. Format: y_variable:x_variables:radius (e.g., satisfaction:age,income:0.3). Radius defaults to 0.3 if omitted.
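The --tdabm value packs three fields into one string. A hypothetical parser (not crispt's own code) shows how the y_variable:x_variables:radius format and the 0.3 default radius fit together:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TdabmSpec:
    """Parsed form of a y_variable:x_variables:radius string (hypothetical)."""
    y_var: str
    x_vars: tuple[str, ...]
    radius: float = 0.3


def parse_tdabm(spec: str) -> TdabmSpec:
    """Parse 'satisfaction:age,income:0.3'; the radius falls back to 0.3."""
    parts = spec.split(":")
    if len(parts) not in (2, 3):
        raise ValueError("expected y_variable:x_variables[:radius]")
    y_var = parts[0].strip()
    x_vars = tuple(x.strip() for x in parts[1].split(",") if x.strip())
    if not y_var or not x_vars:
        raise ValueError("need a y variable and at least one x variable")
    radius = float(parts[2]) if len(parts) == 3 and parts[2].strip() else 0.3
    return TdabmSpec(y_var, x_vars, radius)


print(parse_tdabm("satisfaction:age,income:0.3"))
# TdabmSpec(y_var='satisfaction', x_vars=('age', 'income'), radius=0.3)
print(parse_tdabm("satisfaction:age,income").radius)  # 0.3
```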

ℹ️ The --metadata-df and --metadata-keys options can be used to export NLP metadata and add it to the DataFrame. For example, you can extract sentiment scores or topic assignments as additional columns for numeric analysis. This is useful when the DataFrame rows and documents are aligned one-to-one, as with survey responses.
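To illustrate the alignment this relies on, here is a sketch of copying selected document metadata into rows that share an id. The helper name, the dict-based rows, and the metadata_ column prefix (borrowed from the --print dataframe --print metadata description) are assumptions, not CRISP-T's export code.

```python
from typing import Any


def add_metadata_columns(
    rows: list[dict[str, Any]],
    documents: list[dict[str, Any]],
    keys: list[str],
) -> list[dict[str, Any]]:
    """Copy selected document metadata into the rows that share each document's id.

    Hypothetical helper; new columns get a metadata_ prefix, and rows without
    a matching document get None.
    """
    meta_by_id = {doc["id"]: doc["metadata"] for doc in documents}
    enriched = []
    for row in rows:
        meta = meta_by_id.get(row["id"], {})
        extra = {f"metadata_{key}": meta.get(key) for key in keys}
        enriched.append({**row, **extra})
    return enriched


# Made-up survey rows and documents aligned by id.
rows = [{"id": "r1", "age": 34}, {"id": "r2", "age": 51}, {"id": "r3", "age": 27}]
documents = [
    {"id": "r1", "metadata": {"sentiment": "neg", "topic": 2}},
    {"id": "r2", "metadata": {"sentiment": "pos", "topic": 1}},
]
enriched = add_metadata_columns(rows, documents, ["sentiment", "topic"])
print(enriched[0])  # {'id': 'r1', 'age': 34, 'metadata_sentiment': 'neg', 'metadata_topic': 2}
```

The missing metadata for r3 shows why one-to-one alignment matters: unmatched rows end up with empty columns.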

Example Usage

When saving the corpus via --out, the CLI writes corpus.json (and corpus_df.csv if present) into the specified folder. If you pass a file path, only its parent directory is used for writing corpus.json.
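The folder-versus-file rule can be expressed as a small path helper. This sketch rests on an assumption: a path with a file extension is treated as a file, anything else as a folder. crisp may decide differently, and folder names containing a dot (such as v1.2) would be misread by this heuristic.

```python
from pathlib import Path


def corpus_json_path(out: str) -> Path:
    """Where corpus.json would land for a given --out value.

    Assumed heuristic: a path with an extension (e.g. old.json) is a file, so
    its parent folder is used; anything else is treated as the folder itself.
    """
    path = Path(out)
    folder = path.parent if path.suffix else path
    return folder / "corpus.json"


print(corpus_json_path("crisp_input").as_posix())           # crisp_input/corpus.json
print(corpus_json_path("crisp_input/old.json").as_posix())  # crisp_input/corpus.json
```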

MCP Server

CRISP-T provides a Model Context Protocol (MCP) server that exposes all functionality as tools, resources, and prompts. This enables integration with AI assistants and other MCP-compatible clients.

Using the MCP Server

Configuring MCP Clients

Claude Desktop

Add to your Claude Desktop configuration file:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "crisp-t": {
      "command": "<python-path>crisp-mcp"
    }
  }
}
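The <python-path> placeholder presumably stands for the location of the crisp-mcp entry point in the Python environment where CRISP-T is installed. The sketch below looks that path up with the standard library's shutil.which and prints a matching configuration entry; `claude_config` is a hypothetical helper name. Merge the printed mcpServers entry into your existing file by hand rather than overwriting other servers. If crisp-mcp is not on your PATH, the sketch falls back to the bare command name.

```python
import json
import shutil


def claude_config(command: str) -> dict:
    """Build the mcpServers entry shown above for a given crisp-mcp command."""
    return {"mcpServers": {"crisp-t": {"command": command}}}


# shutil.which returns the full path of crisp-mcp if it is on PATH, else None.
command = shutil.which("crisp-mcp") or "crisp-mcp"
print(json.dumps(claude_config(command), indent=2))
```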

Using with Other MCP Clients

The server can be used with any MCP-compatible client. Configure your client to run the crisp-mcp command via stdio.

Available Tools

The MCP server provides tools for:

Corpus Management

load_corpus - Load corpus from folder or source
save_corpus - Save corpus to folder
add_document - Add new document
remove_document - Remove document by ID
get_document - Get document details
list_documents - List all document IDs
add_relationship - Link text keywords with numeric columns
get_relationships - Get all relationships
get_relationships_for_keyword - Query relationships by keyword

NLP/Text Analysis

assign_topics - Assign documents to topics (creates keyword labels)
extract_categories - Extract common concepts
generate_summary - Generate extractive summary
sentiment_analysis - VADER sentiment analysis

Semantic Search (requires chromadb)

semantic_search - Find documents similar to a query using semantic similarity
find_similar_documents - Find documents similar to a set of reference documents (useful for literature reviews and qualitative research)
semantic_chunk_search - Find relevant chunks within a specific document (useful for coding/annotating documents)
export_metadata_df - Export ChromaDB metadata as DataFrame

DataFrame/CSV Operations

get_df_columns - Get DataFrame column names
get_df_row_count - Get number of rows
get_df_row - Get specific row by index

Machine Learning (requires crisp-t[ml])

kmeans_clustering - K-Means clustering
decision_tree_classification - Decision tree with feature importance
svm_classification - SVM classification
neural_network_classification - Neural network classification
regression_analysis - Linear/logistic regression with coefficients
pca_analysis - Principal Component Analysis
association_rules - Apriori association rules
knn_search - K-nearest neighbors search
lstm_text_classification - LSTM model for text-based outcome prediction

Resources

The server exposes corpus documents as resources:

corpus://document/{id} - Access document text by ID

Prompts

analysis_workflow - Complete step-by-step analysis guide based on INSTRUCTIONS.md
triangulation_guide - Guide for triangulating qualitative and quantitative findings

Example MCP commands

Role of CRISP-T in research and practice

The MCP workflow enables AI assistants to help conduct comprehensive analyses by combining text analytics, machine learning, and triangulation of qualitative and quantitative findings.

For example, in market research, a company collects:

Textual feedback from customer support interactions.
Numerical data on customer retention and sales performance.

Using this framework, business analysts can investigate how recurring concerns in feedback correspond to measurable business outcomes.

Framework Documentation

For detailed information about available functions, metadata handling, and theoretical frameworks, see the comprehensive user instructions. For semantic search examples and best practices, see the Semantic Search Guide. Documentation (WIP) is also available here.

Data model

References

Citation

Released on 10/11/2025 for presentation at the ICIS 2025 conference.
Paper coming soon. Cite this repository in the meantime:

Give us a star ⭐️

If you find this project useful, give us a star. It helps others discover the project.

Contact

Bell Eapen (UIS)


Release history

2.3.2 (Mar 11, 2026)
2.3.1 (Feb 16, 2026)
2.3.0 (Feb 11, 2026)
2.2.1 (Feb 1, 2026)
2.2.0 (Jan 31, 2026)
2.1.0 (Jan 30, 2026)
2.0.0 (Jan 29, 2026)
1.2.3 (Jan 9, 2026)
1.2.2 (Dec 23, 2025)
1.2.1 (Dec 1, 2025)
1.2.0 (Nov 8, 2025)
1.1.1 (Oct 28, 2025)
1.1.0 (Oct 28, 2025)
1.0.0 (Oct 26, 2025), this version
0.9.0 (Oct 25, 2025)
0.8.0 (Oct 22, 2025)
0.7.0 (Oct 21, 2025)
0.6.0 (Oct 19, 2025)
0.5.0 (Oct 18, 2025)
0.4.0 (Oct 15, 2025)
0.3.0 (Oct 12, 2025)
0.2.0 (Oct 10, 2025)
0.1.0 (Oct 10, 2025)

Download files


Source Distribution

crisp_t-1.0.0.tar.gz (786.4 kB)

Uploaded Oct 26, 2025 Source

Built Distribution


crisp_t-1.0.0-py3-none-any.whl (106.0 kB)

Uploaded Oct 26, 2025 Python 3

File details

Details for the file crisp_t-1.0.0.tar.gz.

File metadata

Download URL: crisp_t-1.0.0.tar.gz
Upload date: Oct 26, 2025
Size: 786.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for crisp_t-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`4072ff230bc2654eeec8e7e07e4547d50c885ea1c157930091445fb2b9d269da`
MD5	`734bec2f63b4c3a88ce4fe065e0cfab8`
BLAKE2b-256	`8ab138da782229875b72f12c9c774b33a789070b155eab3c20bb629be8ec23a5`


File details

Details for the file crisp_t-1.0.0-py3-none-any.whl.

File metadata

Download URL: crisp_t-1.0.0-py3-none-any.whl
Upload date: Oct 26, 2025
Size: 106.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for crisp_t-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`57c5bf3790a6abe2f72873114f114dbaf1465286e923fb29cd0428f46cae6686`
MD5	`7f2c19f06a8cd54988276e01ed97bcc0`
BLAKE2b-256	`2dfa3f7a0e08992e742603e5d1c93f4a83625179b52bb9225e77f3173d92bef0`


crisp-t 1.0.0
