Qualitative Research support tools in Python!
🔍 CRISP-T (Sense-making from Text and Numbers!)
TL;DR 🚀 CRISP-T is a qualitative research method and a toolkit to perform textual (e.g. topic modelling) and numeric (e.g. decision trees) analysis of mixed datasets for computational triangulation and sense-making using large language models.
Qualitative research involves the collection and analysis of textual data, such as interview transcripts, open-ended survey responses, and field notes. It is often used in social sciences, humanities, and health research to explore complex phenomena and understand human experiences. Qualitative researchers may also collect quantitative data, such as survey responses or demographic information, to complement their qualitative findings, and draw on external sources, such as census or social media data, to provide context and triangulate those findings. Qualitative research is often characterized by its inductive approach: researchers aim to generate theories or concepts from the data rather than test pre-existing hypotheses, which places the emphasis on data-driven analysis and theory development.
CRISP-T is a method and corresponding open-source tool set to integrate textual data (as a list of documents) and numeric data (as Pandas DataFrame) into structured classes that retain metadata from various analytical processes, such as topic modeling and decision trees. Researchers, with or without GenAI assistance, can define relationships between textual and numerical datasets based on their chosen theoretical lens. A final analytical phase ensures that proposed relationships actually hold true. 👉 See Demo.
An MCP server exposes all functionality as tools, resources, and prompts, enabling integration with AI agent platforms such as Claude Desktop, VS Code, and other MCP-compatible clients.
Installation
pip install crisp-t
Include machine learning features for numeric data analysis:
pip install crisp-t[ml]
Include XGBoost for gradient boosting features:
pip install crisp-t[xg]
- Mac users need to install libomp for XGBoost to work:
brew install libomp
Command Line Scripts
CRISP-T now provides four main command-line scripts:
- crisp: Main CLI for qualitative triangulation and analysis (see below)
- crispviz: Visualization CLI for corpus data (word frequencies, topic charts, word clouds, etc.)
- crispt: Corpus manipulation CLI (create, edit, query, and manage corpus objects)
- crisp-mcp: Starts the MCP server for AI integration (see MCP section below)
All scripts are installed as entry points and can be run directly from the command line after installation.
crisp (Analytical CLI)
crisp [OPTIONS]
Input/Output Options
- --source, -s PATH|URL: Read source data from a directory (reads .txt, .pdf and a single .csv) or from a URL
- --sources PATH|URL: Provide multiple sources; can be used multiple times
- --inp, -i PATH: Load an existing corpus from a folder containing corpus.json (and optional corpus_df.csv)
- --out, -o PATH: When saving the corpus, provide a folder path; the CLI writes corpus.json (and corpus_df.csv if available) into that folder. When saving analysis results (topics, sentiment, etc.), this acts as a base path: files are written with suffixes, e.g., results_topics.json.
- --unstructured, -t TEXT: Text CSV column(s) to analyze/compare (can be used multiple times). This is useful when you have free-form text data in a DataFrame. If this is provided, those columns are used as documents.
- --ignore TEXT: Comma-separated words to ignore during ingestion (applies to --source/--sources)
Analysis Options
- --codedict: Generate qualitative coding dictionary
- --topics: Generate topic model using LDA
- --assign: Assign documents to topics
- --cat: List categories of entire corpus or individual documents
- --summary: Generate extractive text summary
- --sentiment: Generate sentiment scores using VADER
- --sentence: Generate sentence-level scores when applicable
- --nlp: Generate all NLP reports (combines above text analyses)
- --nnet, --cls, --knn, --kmeans, --cart, --pca, --regression, --ml: Machine learning and clustering options (requires crisp-t[ml])
- --regression: Perform linear or logistic regression (automatically detects binary outcomes for logistic regression)
- --visualize: Generate visualizations (word clouds, topic charts, etc.)
- --num, -n INTEGER: Number parameter (clusters, topics, epochs, etc.); default: 3
- --rec, -r INTEGER: Record parameter (top N results, recommendations); default: 3
- --filters, -f TEXT: Filters to apply as key=value (can be used multiple times); keeps only documents where document.metadata[key] == value. Invalid formats raise an error.
- --verbose, -v: Print verbose messages for debugging
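The --filters behaviour described above can be pictured with a small stand-alone sketch. This is not the CRISP-T implementation: the Document class, parse_filters, and apply_filters are hypothetical names, and two details are assumptions (multiple filters are combined with AND, and metadata values are compared as strings).

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a corpus document; not the crisp-t class."""
    id: str
    text: str
    metadata: dict = field(default_factory=dict)


def parse_filters(raw_filters):
    """Turn ["key=value", ...] into (key, value) pairs; reject malformed items."""
    pairs = []
    for raw in raw_filters:
        key, sep, value = raw.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"Invalid filter {raw!r}; expected key=value")
        pairs.append((key.strip(), value.strip()))
    return pairs


def apply_filters(documents, raw_filters):
    """Keep documents whose metadata matches every pair (AND is an assumption)."""
    pairs = parse_filters(raw_filters)
    return [
        doc
        for doc in documents
        if all(
            key in doc.metadata and str(doc.metadata[key]) == value
            for key, value in pairs
        )
    ]


# Made-up documents, for illustration only.
docs = [
    Document("d1", "Interview with a nurse", {"role": "nurse", "site": "A"}),
    Document("d2", "Interview with a physician", {"role": "physician", "site": "A"}),
]
kept = apply_filters(docs, ["role=nurse"])
print([doc.id for doc in kept])
```

A value such as "role" with no "=" raises ValueError here, mirroring the documented rule that invalid formats raise an error.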
Data Sources
- --source, -s PATH|URL: Read source data from a directory (reads .txt and .pdf) or from a URL
- --sources PATH|URL: Provide multiple sources; can be used multiple times
crispviz (Visualization CLI)
crispviz [OPTIONS]
- --inp, --source, --sources: Input corpus or sources
- --out: Output directory for PNG images
- Visualization flags: --freq, --by-topic, --wordcloud, --top-terms, --corr-heatmap
- Optional params: --bins, --top-n, --columns
crispt (Corpus Manipulation CLI)
crispt [OPTIONS]
- --id, --name, --description: Corpus metadata
- --doc: Add document as id|name|text or id|text (repeatable)
- --remove-doc: Remove document by ID (repeatable)
- --meta: Add/update corpus metadata as key=value (repeatable)
- --add-rel: Add relationship as first|second|relation (repeatable)
- --clear-rel: Clear all relationships
- --out: Save corpus to folder/file as corpus.json
- --inp: Load corpus from folder/file containing corpus.json
- Query options:
  - --df-cols: Print DataFrame column names
  - --df-row-count: Print DataFrame row count
  - --df-row INDEX: Print DataFrame row by index
  - --doc-ids: Print all document IDs
  - --doc-id ID: Print document by ID
  - --relationships: Print all relationships
  - --relationships-for-keyword KEYWORD: Print relationships involving a keyword
- Semantic search (requires chromadb):
  - --semantic QUERY: Perform semantic search with query string
  - --semantic-n N: Number of results to return (default: 5)
  - --metadata-df: Export collection metadata as DataFrame
  - --metadata-keys KEYS: Comma-separated metadata keys to include
  - The above two options can be used to export or add metadata from NLP to the DataFrame. For example, you can extract sentiment scores or topic assignments as additional columns for numerical analysis. This is useful if dataframe and documents are aligned as in a survey response.
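To make the pipe-delimited formats for --doc and --add-rel concrete, here is a minimal parsing sketch. It only illustrates the documented shapes (id|name|text, id|text, first|second|relation); parse_doc_spec and parse_relationship are hypothetical helpers, and how crispt itself handles whitespace or extra pipes is not known from this page.

```python
def parse_doc_spec(spec):
    """Split a --doc value into (id, name, text); name is None for "id|text".

    Assumption: at most two splits, so text after the second pipe is kept whole.
    An "id|text" value whose text contains a pipe would be read as id|name|text.
    """
    parts = spec.split("|", 2)
    if len(parts) == 3:
        doc_id, name, text = parts
    elif len(parts) == 2:
        doc_id, text = parts
        name = None
    else:
        raise ValueError(f"Invalid document spec {spec!r}")
    return doc_id.strip(), (name.strip() if name is not None else None), text.strip()


def parse_relationship(spec):
    """Split an --add-rel value into (first, second, relation)."""
    parts = [part.strip() for part in spec.split("|")]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"Invalid relationship spec {spec!r}")
    return tuple(parts)


# Example values are made up for illustration.
print(parse_doc_spec("1|Intro|Patients described long waiting times."))
print(parse_doc_spec("2|Staff mentioned frequent schedule changes."))
print(parse_relationship("waiting|wait_minutes|correlates"))
```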
Example Usage
When saving the corpus via --out, the CLI writes corpus.json (and corpus_df.csv if present) into the specified folder. If you pass a file path, only its parent directory is used for writing corpus.json.
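The path rule above can be sketched with pathlib. This is an illustration only, not the CLI's code: resolve_corpus_folder and corpus_json_path are hypothetical names, and deciding that a value is a "file path" because it has a suffix such as .json is an assumption.

```python
from pathlib import Path


def resolve_corpus_folder(out):
    """Return the folder that would receive corpus.json for a given --out value.

    Assumption: a path with a file suffix (e.g. ".json") is treated as a file
    path, so only its parent directory is used.
    """
    path = Path(out)
    return path.parent if path.suffix else path


def corpus_json_path(out):
    """Location of corpus.json under the resolved folder."""
    return resolve_corpus_folder(out) / "corpus.json"


# Both calls resolve to the same folder.
print(corpus_json_path("study/output"))
print(corpus_json_path("study/output/my_corpus.json"))
```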
MCP Server
CRISP-T provides a Model Context Protocol (MCP) server that exposes all functionality as tools, resources, and prompts. This enables integration with AI assistants and other MCP-compatible clients.
Using the MCP Server
Configuring MCP Clients
Claude Desktop
Add to your Claude Desktop configuration file:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
{
  "mcpServers": {
    "crisp-t": {
      "command": "<python-path>crisp-mcp"
    }
  }
}
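If the configuration file already lists other servers, the "crisp-t" entry has to be merged in rather than pasted over the file. The sketch below shows one way to do that on an in-memory dictionary; add_crisp_server is a hypothetical helper, the "other-server" entry is made up, and "/path/to/crisp-mcp" is a placeholder for wherever the script is installed (the standard-library call shutil.which("crisp-mcp") can help locate it).

```python
import json


def add_crisp_server(config, command):
    """Return a copy of a Claude Desktop config with a "crisp-t" server entry."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers["crisp-t"] = {"command": command}
    updated["mcpServers"] = servers
    return updated


# A made-up existing configuration with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "other-cmd"}}}

# "/path/to/crisp-mcp" is a placeholder; substitute the real script location.
merged = add_crisp_server(existing, "/path/to/crisp-mcp")
print(json.dumps(merged, indent=2))
```

The function returns a new dictionary instead of editing the input, so the original configuration can be kept as a backup before anything is written to disk.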
Using with Other MCP Clients
The server can be used with any MCP-compatible client. Configure your client to run the crisp-mcp command via stdio.
Available Tools
The MCP server provides tools for:
Corpus Management
- load_corpus - Load corpus from folder or source
- save_corpus - Save corpus to folder
- add_document - Add new document
- remove_document - Remove document by ID
- get_document - Get document details
- list_documents - List all document IDs
- add_relationship - Link text keywords with numeric columns
- get_relationships - Get all relationships
- get_relationships_for_keyword - Query relationships by keyword
NLP/Text Analysis
- assign_topics - Assign documents to topics (creates keyword labels)
- extract_categories - Extract common concepts
- generate_summary - Generate extractive summary
- sentiment_analysis - VADER sentiment analysis
Semantic Search (requires chromadb)
- semantic_search - Find documents similar to a query using semantic similarity
- export_metadata_df - Export ChromaDB metadata as DataFrame
DataFrame/CSV Operations
- get_df_columns - Get DataFrame column names
- get_df_row_count - Get number of rows
- get_df_row - Get specific row by index
Machine Learning (requires crisp-t[ml])
- kmeans_clustering - K-Means clustering
- decision_tree_classification - Decision tree with feature importance
- svm_classification - SVM classification
- neural_network_classification - Neural network classification
- regression_analysis - Linear/logistic regression with coefficients
- pca_analysis - Principal Component Analysis
- association_rules - Apriori association rules
- knn_search - K-nearest neighbors search
Resources
The server exposes corpus documents as resources:
- corpus://document/{id} - Access document text by ID
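For clients that build or inspect these resource URIs, a small sketch using the standard library may help. The helper names are hypothetical, and the sketch assumes document IDs contain no characters with special meaning in a URI (such as "?" or "#"); how the server itself parses the URI is not documented here.

```python
from urllib.parse import urlsplit


def document_uri(doc_id):
    """Build a resource URI following the corpus://document/{id} pattern."""
    return f"corpus://document/{doc_id}"


def document_id_from_uri(uri):
    """Extract the document ID from a corpus://document/{id} URI."""
    parts = urlsplit(uri)
    if parts.scheme != "corpus" or parts.netloc != "document":
        raise ValueError(f"Not a corpus document URI: {uri!r}")
    doc_id = parts.path.lstrip("/")
    if not doc_id:
        raise ValueError(f"Missing document ID in {uri!r}")
    return doc_id


# "interview_01" is a made-up document ID.
uri = document_uri("interview_01")
print(uri, document_id_from_uri(uri))
```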
Prompts
- analysis_workflow - Complete step-by-step analysis guide based on INSTRUCTIONS.md
- triangulation_guide - Guide for triangulating qualitative and quantitative findings
Example MCP commands
Role of CRISP-T in research and practice
The workflow enables AI assistants to help conduct comprehensive analyses by combining text analytics, machine learning, and triangulation of qualitative-quantitative findings.
For example, in market research, a company collects:
- Textual feedback from customer support interactions.
- Numerical data on customer retention and sales performance.
Using this framework, business analysts can investigate how recurring concerns in feedback correspond to measurable business outcomes.
Framework Documentation
For detailed information about available functions, metadata handling, and theoretical frameworks, see the comprehensive user instructions. For semantic search examples and best practices, see the Semantic Search Guide. Documentation (WIP) is also available here.
Citation
- Released on 10/11/2025 for presentation at ICIS 2025 conference.
- Paper coming soon. Cite this repository in the meantime:
Give us a star ⭐️
If you find this project useful, give us a star. It helps others discover the project.
Contact
- Bell Eapen (UIS)
File details
Details for the file crisp_t-0.3.0.tar.gz.
File metadata
- Download URL: crisp_t-0.3.0.tar.gz
- Upload date:
- Size: 706.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 73f2b15885a630140ea8c467dd022dd87ae6ea693b6f86b464d4c7b8ea5411e2 |
| MD5 | e5283679e741e0b8433a2d528a3a01b6 |
| BLAKE2b-256 | 0216ce7b1cc78bcc59bffe30a4c087f0c9021bf48493895701e0072e2acd558d |
File details
Details for the file crisp_t-0.3.0-py3-none-any.whl.
File metadata
- Download URL: crisp_t-0.3.0-py3-none-any.whl
- Upload date:
- Size: 86.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d0bc5c15aec4c17e68c45786db73065acaa63b50f6625f093f214fd371cd6ef4 |
| MD5 | e9f1fe77131b353b4dd8a581aea222aa |
| BLAKE2b-256 | 9ef04016d557bb8af270ab985865fdf207b9596347c49fb714c34f14050dc504 |