SocialSciKit
A zero-code text analysis toolkit for social science researchers
English | 中文文档
What is SocialSciKit?
SocialSciKit is an open-source Python toolkit that enables social science researchers to perform text analysis without writing a single line of code. It provides a Gradio-based web interface with full bilingual support (English / Chinese).
Three core modules:
- QuantiKit — End-to-end text classification pipeline (method recommendation → annotation → prompt/fine-tuning classification → evaluation → export)
- QualiKit — End-to-end qualitative coding pipeline (upload → de-identification → research framework → LLM coding with evidence grounding → human review → export)
- Toolbox — Standalone research methods tools: Inter-Coder Reliability (ICR) calculator, Multi-LLM Consensus Coding, and Methods Section Generator
Highlights
- Visualization Dashboard — academic-style matplotlib charts (confusion matrix heatmaps, per-class P/R/F1 bars, confidence histograms, progress donuts, theme distribution) embedded throughout both pipelines
- Evidence Highlighting — LLM codings include a verbatim `evidence_span` from the source text; the review UI highlights the supporting quote inline in the original document
- Project Save & Restore — serialize the entire research project state (data, annotations, sessions, coding results) to a single JSON file; resume work later from the Home tab
- Zero-code web UI — Gradio 4.44+ with full EN/ZH language switching at runtime
Table of Contents
- Installation
- Quick Start
- QuantiKit: Text Classification
- QualiKit: Qualitative Coding
- Toolbox: Research Methods Tools
- Project Save & Restore
- Supported LLM Backends
- Example Datasets
- Project Structure
- Key References
- Citation
- Development
- License & Disclaimer
- Author
Installation
Requirements
- Python 3.9 or higher
- pip (Python package manager)
Option A: Install from PyPI
pip install socialscikit
Option B: Install from source
git clone https://github.com/Baron-Sun/socialscikit.git
cd socialscikit
pip install -e .
Core Dependencies
| Package | Version | Purpose |
|---|---|---|
| `gradio` | ≥ 4.0 | Web UI framework |
| `pandas` | ≥ 2.0 | Data manipulation |
| `openpyxl` | any | Excel read/write |
| `spacy` | ≥ 3.7 | NLP pipeline (tokenization, NER) |
| `transformers` | ≥ 4.40 | Fine-tuning (RoBERTa / XLM-R) |
| `datasets` | any | HuggingFace dataset handling |
| `openai` | ≥ 1.0 | OpenAI API client |
| `anthropic` | ≥ 0.25 | Anthropic API client |
| `scikit-learn` | any | Evaluation metrics |
| `scipy` | any | Statistical computation |
| `bertopic` | any | Topic modeling |
| `presidio-analyzer` | any | PII detection engine |
| `presidio-anonymizer` | any | PII anonymization |
| `langdetect` | any | Language detection |
| `tiktoken` | any | Token counting |
| `httpx` | any | Ollama HTTP client |
| `rich` | any | CLI formatting |
Optional: spaCy language models
For best de-identification performance, download at least one spaCy model:
# English
python -m spacy download en_core_web_sm
# Chinese
python -m spacy download zh_core_web_sm
Quick Start
Launch the unified app (recommended)
socialscikit launch
# or simply:
socialscikit
# Opens at http://127.0.0.1:7860
Launch individual modules
# QuantiKit only
socialscikit quantikit --port 7860
# QualiKit only
socialscikit qualikit --port 7861
CLI Options
| Flag | Description | Default |
|---|---|---|
| `--port` | Server port number | 7860 / 7861 |
| `--share` | Create a public Gradio link | False |
First-time language switch
The default UI language is English. Use the Language toggle at the top of the page to switch to Chinese. All labels, buttons, and instructions update in real time.
QuantiKit: Text Classification
QuantiKit guides you through the full text classification workflow in 6 steps.
Step 1 · Data Upload
- Supported formats: CSV, Excel (.xlsx/.xls), JSON, JSONL
- Upload your data file, then map the `text` and `label` columns
- Automatic data validation: detects missing values, empty strings, encoding issues
- One-click fix: auto-repair common data quality issues
- Diagnostic report: label distribution, text length statistics, duplicate detection
Step 2 · Recommendation
- Method recommender: analyzes your data characteristics (size, class count, imbalance ratio, text length) and recommends the optimal classification approach — zero-shot, few-shot, or fine-tuning — with literature citations
- Budget recommender: estimates "how many labels do you need?" using power-law learning curve fitting, with 80% confidence intervals and marginal return curves
- Cold-start mode: priors from CSS benchmark datasets (HatEval, SemEval, MFTC)
- Empirical mode: fits `f1 = a * n^b + c` on your labeled subset
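The empirical mode's power-law fit can be sketched with `scipy.optimize.curve_fit` (scipy is already a core dependency). The `(n, F1)` points and starting values below are illustrative, not the toolkit's actual defaults:

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    """Learning curve: F1 as a function of labeled-sample count n."""
    return a * np.power(n, b) + c

# Hypothetical F1 scores measured on growing labeled subsets
n_samples = np.array([50, 100, 200, 400, 800], dtype=float)
f1_scores = np.array([0.55, 0.63, 0.70, 0.75, 0.78])

# a < 0 and b < 0 give a curve that rises toward the asymptote c
params, _ = curve_fit(power_law, n_samples, f1_scores,
                      p0=(-1.0, -0.5, 0.9), maxfev=10000)
a, b, c = params

# Extrapolate expected F1 at larger annotation budgets
print(f"fitted a={a:.3f}, b={b:.3f}, asymptote c={c:.3f}")
print(f"predicted F1 at n=1600: {power_law(1600, a, b, c):.3f}")
```

The marginal-return curve mentioned above is just the derivative of this fit: each doubling of `n` buys a shrinking F1 gain as the curve approaches its asymptote `c`.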
Step 3 · Annotation
- Built-in annotation UI — no need for external tools
- Label each text sample, with skip, undo, flag for review support
- Real-time progress donut chart — visual progress tracker updates after every action
- Export annotated data as CSV, merge with original dataset
Step 4 · Classification
Three sub-approaches available in parallel tabs:
| Sub-tab | Method | When to use |
|---|---|---|
| Prompt Classification | Zero/few-shot via LLM API | Small datasets (< 200 labeled) |
| Fine-tuning | Local transformer fine-tuning | Medium datasets (200+), no API cost |
| API Fine-tuning | OpenAI fine-tuning API | Large datasets, best performance |
Prompt Classification features:
- Prompt Designer: task description + class definitions + positive/negative examples → auto-generates a structured prompt
- Prompt Optimizer: generates 3 variants using APE (Automatic Prompt Engineering), evaluates each on a test split
- One-click batch classification on the full dataset
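The Prompt Designer's structure (task description + class definitions + examples) can be approximated as plain string assembly. This is an illustrative sketch, not SocialSciKit's actual prompt template:

```python
def build_prompt(task, classes, examples):
    """Assemble a structured zero/few-shot classification prompt.

    task     -- one-sentence task description
    classes  -- {class_name: definition}
    examples -- list of (text, label) few-shot pairs (may be empty)
    """
    lines = [f"Task: {task}", "", "Classes:"]
    for name, definition in classes.items():
        lines.append(f"- {name}: {definition}")
    if examples:
        lines += ["", "Examples:"]
        for text, label in examples:
            lines.append(f'Text: "{text}" -> Label: {label}')
    lines += ["", "Respond with exactly one class name."]
    return "\n".join(lines)

prompt = build_prompt(
    task="Classify the sentiment of a product review.",
    classes={"positive": "expresses satisfaction",
             "negative": "expresses dissatisfaction",
             "neutral": "no clear sentiment"},
    examples=[("Fast shipping, great quality!", "positive")],
)
```

The resulting string would be sent as the user message to whichever LLM backend is selected; the APE optimizer then generates paraphrased variants of this prompt and scores each on a held-out split.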
Step 5 · Evaluation
Full visualization dashboard:
- Metric summary cards (HTML) — Accuracy, Macro-F1, Weighted-F1, Cohen's Kappa, total/correct counts
- Confusion matrix heatmap — row-normalized, annotated with counts + percentages
- Per-class metrics bar chart — Precision / Recall / F1 grouped bars per class
- Collapsible full text report below charts
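The dashboard's headline metrics come from standard `scikit-learn` calls; a minimal reproduction on toy labels (the label names are made up):

```python
from sklearn.metrics import (accuracy_score, f1_score,
                             cohen_kappa_score, confusion_matrix)

y_true = ["pos", "neg", "neg", "neu", "pos", "neg"]
y_pred = ["pos", "neg", "pos", "neu", "pos", "neg"]

acc = accuracy_score(y_true, y_pred)
macro_f1 = f1_score(y_true, y_pred, average="macro")
weighted_f1 = f1_score(y_true, y_pred, average="weighted")
kappa = cohen_kappa_score(y_true, y_pred)

cm = confusion_matrix(y_true, y_pred, labels=["pos", "neu", "neg"])
# Row-normalize so each row shows the distribution of predictions
# for one true class (as in the dashboard heatmap)
cm_norm = cm / cm.sum(axis=1, keepdims=True)
```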
Step 6 · Export
- Download classification results as CSV (original text + predicted labels + confidence)
- Pipeline log export — JSON metadata usable by the Toolbox Methods Generator
- Save project — persist all research state (data, predictions, annotation session) to a single JSON file
QualiKit: Qualitative Coding
QualiKit supports the full qualitative coding workflow for interview transcripts, focus group data, and open-ended survey responses.
Step 1 · Upload & Segment
- Supported formats: plain text (.txt)
- Automatic speaker detection and segmentation (by paragraph or by speaker turn)
- Configurable context window (number of surrounding sentences to include)
- Preview segmented results in a table before proceeding
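Speaker-turn segmentation relies on the familiar `Name: utterance` transcript convention. A simplified sketch of the idea — the regex and continuation handling here are assumptions, not the segmenter's actual rules:

```python
import re

def segment_by_speaker(transcript):
    """Split a transcript into speaker turns.

    Lines matching 'Name: utterance' start a new turn; other
    non-empty lines are treated as continuations of the last turn.
    (\\w matches CJK characters too, so Chinese names also work.)
    """
    turn_re = re.compile(r"^(\w[\w .]{0,30}):\s*(.+)$")
    segments = []
    for line in transcript.splitlines():
        m = turn_re.match(line.strip())
        if m:
            segments.append({"speaker": m.group(1), "text": m.group(2)})
        elif segments and line.strip():
            segments[-1]["text"] += " " + line.strip()
    return segments

demo = """Interviewer: How do you use the community clinic?
Participant A: I visit monthly for blood pressure checks.
It is close to my home.
Interviewer: What would you improve?"""
turns = segment_by_speaker(demo)
```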
Step 2 · De-identification
- Automatic PII detection: person names, email addresses, phone numbers, Chinese ID card numbers
- Chinese-aware NER: detects Chinese names with title/honorific patterns
- English NER via spaCy and Presidio
- Replacement strategies: pseudonym, redact (`[REDACTED]`), or tag-based (`[PERSON_1]`)
- Per-item review: accept, reject, or edit each detected PII replacement individually
- Bulk actions: accept all, accept high-confidence only (≥ 0.90), or apply all accepted to the text
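Tag-based replacement keeps a stable mapping so the same entity always receives the same tag. A toy regex-only sketch — the real module layers spaCy/Presidio NER on top, and the patterns and function name here are illustrative:

```python
import re

def tag_pii(text):
    """Replace regex-detectable PII with stable tags like [EMAIL_1].

    Returns (tagged_text, mapping) so the replacement can be
    reviewed or reversed before export.
    """
    patterns = {
        "EMAIL": r"[\w.+-]+@[\w-]+\.\w+",
        "PHONE": r"\b\d{3}[-.\s]?\d{3,4}[-.\s]?\d{4}\b",
    }
    mapping, counters = {}, {}

    def make_sub(kind):
        def _sub(m):
            val = m.group(0)
            if val not in mapping:  # same value -> same tag
                counters[kind] = counters.get(kind, 0) + 1
                mapping[val] = f"[{kind}_{counters[kind]}]"
            return mapping[val]
        return _sub

    for kind, pat in patterns.items():
        text = re.sub(pat, make_sub(kind), text)
    return text, mapping

clean, mapping = tag_pii("Contact Jane at jane.doe@example.org or 555-123-4567.")
```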
Step 3 · Research Framework
- Define your Research Questions (RQs) and Sub-themes using an interactive editable table
- Add/remove rows dynamically
- LLM-powered sub-theme suggestion: connect to an LLM backend, and it analyzes your transcript to suggest relevant sub-themes per RQ
- Confirm framework before proceeding to coding
Step 4 · LLM Coding
- Batch coding: LLM reads each segment and assigns RQ + sub-theme labels with confidence scores
- Evidence grounding: the LLM prompt requires a verbatim `evidence_span` — the exact phrase or sentence from the source text that supports the coding decision
- Supports OpenAI, Anthropic, and Ollama backends
- Results displayed with segment text, assigned codes, confidence levels, and evidence spans
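Evidence grounding is straightforward to verify programmatically: the cited span must occur verbatim in the segment it was coded from. A sketch with assumed field names (the real response schema may differ):

```python
import json

# Hypothetical shape of one LLM coding result
raw = '''{"rq": "RQ1", "sub_theme": "access barriers",
          "confidence": 0.85,
          "evidence_span": "the clinic is too far away"}'''
segment = "She said the clinic is too far away for weekly visits."

coding = json.loads(raw)
# Grounding check: the quoted evidence must appear verbatim;
# if it does not, the review UI falls back to showing the quote alone
grounded = coding["evidence_span"] in segment
```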
Step 5 · Review
- Review coding results in a table sorted by confidence
- Evidence highlighting: when you select an item, the original text is shown with the LLM's `evidence_span` highlighted in green, so you can verify the coding decision at a glance; if the exact quote isn't found, a fallback "Evidence" block displays the cited text
- Visualization dashboard (collapsible accordion):
- Review progress donut — accepted / edited / rejected / pending counts
- Confidence histogram — low/medium/high tier shading + median marker
- Theme distribution — horizontal bar chart of RQ frequencies
- Per-item actions: accept, reject, or edit (reassign RQ/sub-theme)
- Bulk accept by confidence threshold
- Manual coding: select a segment, preview its content, and manually assign RQ + sub-theme labels
- Cascading dropdown: sub-theme choices automatically filter based on selected RQ
Step 6 · Export
- Export reviewed coding results as structured Excel file
- Pipeline log export — JSON metadata usable by the Toolbox Methods Generator
- Save project — persist the entire coding session (segments, RQs, review state, evidence spans) to a single JSON file
Toolbox: Research Methods Tools
The Toolbox provides standalone research utilities that work independently or in combination with QuantiKit / QualiKit.
ICR Calculator
Compute inter-coder reliability for 2 or more coders with automatic metric selection:
| Scenario | Metric |
|---|---|
| 2 coders, single-label | Cohen's Kappa + Krippendorff's Alpha + per-category agreement |
| 3+ coders, single-label | Krippendorff's Alpha + pairwise Cohen's Kappa |
| 2 coders, multi-label | Jaccard index (pairwise) |
| 3+ coders, multi-label | Average pairwise Jaccard |
- Upload a CSV with coder columns, select which columns to compare
- Interpretation follows the Landis & Koch (1977) scale
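For the single-label multi-coder case, pairwise Cohen's Kappa can be computed directly with `scikit-learn`; the coder data below is made up:

```python
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

def pairwise_kappa(coders):
    """Cohen's Kappa for every coder pair (single-label case).

    coders -- {coder_name: list of labels, same item order}
    """
    return {
        (a, b): cohen_kappa_score(coders[a], coders[b])
        for a, b in combinations(coders, 2)
    }

codes = {
    "A": ["econ", "econ", "social", "legal", "social"],
    "B": ["econ", "social", "social", "legal", "social"],
    "C": ["econ", "econ", "social", "legal", "legal"],
}
kappas = pairwise_kappa(codes)
```

Krippendorff's Alpha (the 3+ coder headline metric) needs a separate implementation or package — scikit-learn does not ship it.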
Consensus Coding
Multi-LLM majority-vote coding for qualitative data:
- Configure 2–5 LLM backends (OpenAI, Anthropic, Ollama) with independent models
- Each LLM codes every text segment; final label is determined by majority vote
- Agreement statistics across LLMs are reported automatically
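The majority-vote step reduces to counting votes per segment; a minimal sketch (tie-breaking here is arbitrary and may differ from the module's behavior):

```python
from collections import Counter

def consensus_label(votes):
    """Majority vote across LLM coders.

    Returns (winning label, agreement ratio). On a tie, whichever
    label Counter ranks first wins -- an arbitrary choice.
    """
    counts = Counter(votes)
    label, n = counts.most_common(1)[0]
    return label, n / len(votes)

# Three hypothetical LLM backends coding one segment
label, agreement = consensus_label(["barrier", "barrier", "facilitator"])
```

The per-segment agreement ratios can then be aggregated into the cross-LLM agreement statistics the Toolbox reports.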
Methods Section Generator
Auto-generate a methods section paragraph (English + Chinese) for your paper:
- From pipeline log: QuantiKit and QualiKit can export a pipeline log (JSON) capturing all metadata (sample size, model, metrics, themes, etc.). Import the log and generate a ready-to-use methods paragraph.
- Manual input: Fill in metadata fields manually if you prefer not to use the pipeline log.
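Conceptually the generator is template filling over pipeline metadata. A toy English-only sketch with assumed field names — not the actual pipeline-log schema or the toolkit's templates:

```python
def methods_paragraph(meta):
    """Fill a minimal methods-section template from pipeline metadata."""
    return (
        f"We classified {meta['n']} documents into {meta['k']} categories "
        f"using {meta['model']} ({meta['approach']}). "
        f"Performance on a held-out set of {meta['n_test']} documents was "
        f"accuracy = {meta['accuracy']:.2f}, macro-F1 = {meta['macro_f1']:.2f}."
    )

para = methods_paragraph({
    "n": 5000, "k": 3, "model": "gpt-4o-mini",
    "approach": "few-shot prompting",
    "n_test": 500, "accuracy": 0.91, "macro_f1": 0.89,
})
```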
Project Save & Restore
Long research projects rarely finish in one session. SocialSciKit serialises the full state of your work — loaded DataFrames, annotation sessions (including cursor and history), extraction review sessions, research questions, de-identification results — into a single JSON file:
- Save: at the end of any pipeline, click "Save Project" in Step 6 to download a `.json` archive
- Restore: return to the Home tab, expand "Load Saved Project", upload the JSON file, and all state is restored across both pipelines
- Tagged-union serialisation: complex types (`pd.DataFrame`, `AnnotationSession`, `ExtractionReviewSession`, `ResearchQuestion`, `ExtractionResult`, enums) round-trip losslessly; elapsed-time counters are preserved via monotonic time offsets
- Version-aware: project files include a `__project_version__` field so future readers can migrate old archives
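The tagged-union idea: wrap each non-JSON-native value in a dict carrying a type tag, then dispatch on that tag when loading. A minimal sketch covering only `pd.DataFrame` (the tag and field names are illustrative, not `project_io`'s actual format):

```python
import json
import pandas as pd

def encode(obj):
    """Wrap non-JSON types with a type tag; pass JSON-natives through."""
    if isinstance(obj, pd.DataFrame):
        return {"__type__": "DataFrame", "data": obj.to_dict(orient="split")}
    return obj

def decode(obj):
    """Dispatch on the type tag to rebuild the original object."""
    if isinstance(obj, dict) and obj.get("__type__") == "DataFrame":
        d = obj["data"]
        return pd.DataFrame(d["data"], index=d["index"], columns=d["columns"])
    return obj

state = {"__project_version__": 1,
         "data": encode(pd.DataFrame({"text": ["a", "b"], "label": [0, 1]}))}
blob = json.dumps(state)                                # save
restored = {k: decode(v) for k, v in json.loads(blob).items()}  # restore
```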
Supported LLM Backends
| Backend | Example Models | Use Case |
|---|---|---|
| OpenAI | `gpt-4o`, `gpt-4o-mini`, `gpt-4.1`, `gpt-4.1-mini`, `gpt-4.1-nano` | Classification, coding, prompt optimization |
| Anthropic | `claude-sonnet-4-20250514`, `claude-haiku-4-5-20251001` | Classification, coding, prompt optimization |
| Ollama | `llama3`, `mistral`, `qwen2.5` | Local inference, no API key needed |
To use Ollama, install it from ollama.com and pull a model:
ollama pull llama3
Example Datasets
The examples/ directory contains ready-to-use sample data:
| File | Module | Description |
|---|---|---|
| `sentiment_example.csv` | QuantiKit | 50 Chinese product/service reviews with 3 sentiment labels |
| `policy_example.csv` | QuantiKit | 40 Chinese policy text excerpts with 8 policy-instrument labels |
| `interview_example.txt` | QualiKit | Single-person community healthcare interview transcript |
| `interview_focus_group.txt` | QualiKit | 4-person focus group on elderly digital service experiences |
| `icr_example.csv` | Toolbox | 20 policy texts coded by 3 coders (A/B/C) for ICR calculation |
| `consensus_example.csv` | Toolbox | 15 interview segments for multi-LLM consensus coding |
| `methods_log_quantikit.json` | Toolbox | Sample QuantiKit pipeline log for methods generation |
| `methods_log_qualikit.json` | Toolbox | Sample QualiKit pipeline log for methods generation |
Cookbook: Sentiment Classification (QuantiKit)
1. Launch: `socialscikit launch` → click QuantiKit tab
2. Upload `examples/sentiment_example.csv`
3. Map columns: text → `text`, label → `label`
4. Go to Step 2 → click Recommend to see method suggestion
5. Go to Step 4 → select LLM backend → enter labels
6. Click Generate Prompt → Run Classification
7. Go to Step 5 → evaluate against gold labels
8. Go to Step 6 → export results
Cookbook: Focus Group Coding (QualiKit)
1. Launch: `socialscikit launch` → click QualiKit tab
2. Upload `examples/interview_focus_group.txt`
3. Step 1: select "Speaker turn" segmentation → click Segment
4. Step 2: run de-identification → review and accept/reject each PII replacement
5. Step 3: define RQs and sub-themes → optionally use LLM to suggest sub-themes
6. Step 4: select LLM backend → run batch coding
7. Step 5: review results, bulk accept high-confidence codes, manually fix low-confidence ones
8. Step 6: export to Excel
Project Structure
socialscikit/
├── core/ # Shared infrastructure
│ ├── data_loader.py # Multi-format data reader (CSV/Excel/JSON/txt)
│ ├── data_validator.py # Schema validation + auto-fix
│ ├── data_diagnostics.py # Data quality diagnostics report
│ ├── llm_client.py # Unified LLM client (OpenAI/Anthropic/Ollama)
│ ├── icr.py # Inter-coder reliability (Kappa/Alpha/Jaccard)
│ ├── methods_writer.py # Methods section generator (EN/ZH templates)
│ ├── charts.py # Academic-style matplotlib charts (viz dashboard)
│ ├── project_io.py # Project state serialization (save/restore)
│ └── templates/ # Template files for download
│
├── quantikit/ # Text classification module
│ ├── feature_extractor.py # Dataset feature extraction
│ ├── method_recommender.py # Rule-based method recommendation (with citations)
│ ├── budget_recommender.py # Annotation budget estimation
│ ├── prompt_optimizer.py # APE-based prompt generation & optimization
│ ├── prompt_classifier.py # Zero/few-shot LLM classification
│ ├── annotator.py # Built-in annotation interface
│ ├── classifier.py # Transformer fine-tuning pipeline
│ ├── api_finetuner.py # OpenAI fine-tuning API wrapper
│ └── evaluator.py # Accuracy / F1 / Kappa / confusion matrix
│
├── qualikit/ # Qualitative coding module
│ ├── segmenter.py # Text segmentation (paragraph / speaker turn)
│ ├── segment_extractor.py # Segment-level extraction
│ ├── deidentifier.py # PII detection (Chinese + English)
│ ├── deident_reviewer.py # De-identification interactive review
│ ├── theme_definer.py # Theme definition + LLM suggestion
│ ├── theme_reviewer.py # Theme review & overlap detection
│ ├── coder.py # LLM batch coding
│ ├── confidence_ranker.py # Confidence scoring & ranking
│ ├── coding_reviewer.py # Human-in-the-loop coding review
│ ├── extraction_reviewer.py # Extraction result review
│ ├── consensus.py # Multi-LLM consensus coding (majority vote)
│ └── exporter.py # Excel / Markdown export
│
├── ui/ # Gradio web interface
│ ├── main_app.py # Unified app (Home + QuantiKit + QualiKit + Toolbox)
│ ├── quantikit_app.py # QuantiKit UI callbacks
│ ├── qualikit_app.py # QualiKit UI callbacks
│ ├── toolbox_app.py # Toolbox UI callbacks (ICR/Consensus/Methods)
│ └── i18n.py # Internationalization (EN / ZH)
│
├── cli.py # Command-line entry point
│
examples/ # Sample datasets
tests/ # Test suite (676 tests)
promo/ # Promotional posters + HTML sources
pyproject.toml # Package metadata & dependencies
CITATION.cff # Citation metadata
Key References
The method recommendation engine and workflow design are grounded in the following computational social science literature:
- Sun, B., Chang, C., Ang, Y. Y., Mu, R., Xu, Y. & Zhang, Z. (2026). Creation of the Chinese Adaptive Policy Communication Corpus. ACL 2026.
- Carlson, K. et al. (2026). The use of LLMs to annotate data in management research. Strategic Management Journal.
- Chae, Y. & Davidson, T. (2025). Large Language Models for text classification. Sociological Methods & Research.
- Do, S., Ollion, E. & Shen, R. (2024). The augmented social scientist. Sociological Methods & Research, 53(3).
- Dunivin, Z. O. (2024). Scalable qualitative coding with LLMs. arXiv:2401.15170.
- Montgomery, J. M. et al. (2024). Improving probabilistic models in text classification via active learning. APSR.
- Than, N. et al. (2025). Updating 'The Future of Coding'. Sociological Methods & Research.
- Ziems, C. et al. (2024). Can LLMs transform computational social science? Computational Linguistics, 50(1).
- Zhou, Y. et al. (2022). Large Language Models are human-level prompt engineers. ICLR 2023.
Citation
If you use SocialSciKit in your research, please cite:
@inproceedings{sun2026creation,
title = {Creation of the {Chinese} Adaptive Policy Communication Corpus},
author = {Sun, Bolun and Chang, Charles and Ang, Yuen Yuen and Mu, Ruotong and Xu, Yuchen and Zhang, Zhengxin},
booktitle = {Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2026)},
year = {2026}
}
Development
# Clone the repository
git clone https://github.com/Baron-Sun/socialscikit.git
cd socialscikit
# Install in editable mode with dev dependencies
pip install -e ".[dev]"
# Run the full test suite
pytest tests/ -v
# Code style check
ruff check .
Running the app in development mode
python -c "from socialscikit.ui.main_app import create_app; create_app().launch()"
License & Disclaimer
License: MIT
Disclaimer:
- De-identification module: Automatic PII detection is a preliminary processing tool. Manual review is mandatory before IRB submission. This tool does not guarantee complete removal of all identifying information.
- LLM classification / coding: Results should be treated as research assistance. Critical research conclusions require human validation.
- Budget recommendation: Based on statistical estimation. Actual requirements may vary depending on task complexity and data characteristics.
Author
Bolun Sun (孙伯伦)
Ph.D. Student, Kellogg School of Management, Northwestern University
Research interests: Computational Social Science, NLP, Human-Centered AI
Email: bolun.sun@kellogg.northwestern.edu | Web: baron-sun.github.io
Contributing
This project is actively maintained and updated. Contributions, suggestions, and feedback are very welcome! Feel free to open an issue or submit a pull request.