AI response detection toolkit for crowdsourced behavioral research
Project description
slopit
A modular Python toolkit for detecting AI-assisted responses in crowdsourced behavioral research.
Key Features
- Behavioral Analysis: Analyze keystroke dynamics, focus patterns, and paste events
- Multiple Analyzers: Combine evidence from independent analyzers for robust detection
- Real-time Dashboard: Web-based monitoring with JATOS and Prolific integration
- Pipeline Orchestration: Configure aggregation strategies and confidence thresholds
- Multiple Data Formats: Load data from JATOS, native JSON, and other sources
- CLI and API: Command-line tools and Python API for flexible integration
Installation
Basic Installation
pip install slopit
Or with uv:
uv add slopit
With Dashboard (Recommended)
pip install slopit[dashboard]
This adds the web dashboard with:
- FastAPI-based REST API
- WebSocket real-time updates
- JATOS data synchronization
- Prolific submission management
- React-based UI
Other Optional Dependencies
# LLM-based analysis (requires PyTorch)
pip install slopit[llm]
# Documentation building
pip install slopit[docs]
# All optional dependencies
pip install slopit[dashboard,llm,docs]
Development Installation
git clone https://github.com/slopit/slopit.git
cd slopit/python
pip install -e ".[dev]"
Requirements
- Python 3.13+
- NumPy 2.0+
- pandas 2.2+
- scikit-learn 1.5+
- Pydantic 2.10+
Quick Start
Loading and Analyzing Sessions
from slopit import load_session, load_sessions
from slopit.pipeline import AnalysisPipeline
from slopit.behavioral import (
KeystrokeAnalyzer,
FocusAnalyzer,
PasteAnalyzer,
TimingAnalyzer,
)
# load a single session
session = load_session("data/participant_001.json")
# or load all sessions from a directory
sessions = load_sessions("data/")
# create an analysis pipeline with multiple analyzers
pipeline = AnalysisPipeline([
KeystrokeAnalyzer(),
FocusAnalyzer(),
PasteAnalyzer(),
TimingAnalyzer(),
])
# run analysis
result = pipeline.analyze(sessions)
# check verdicts for each session
for session_id, verdict in result.verdicts.items():
print(f"{session_id}: {verdict.status} (confidence: {verdict.confidence:.2f})")
if verdict.flags:
for flag in verdict.flags:
print(f" - [{flag.analyzer}] {flag.type}: {flag.message}")
Using the CLI
Analyze sessions from the command line:
# analyze sessions and print report
slopit analyze data/
# analyze with specific analyzers
slopit analyze data/ --analyzers keystroke,focus
# export results to JSON
slopit analyze data/ --output results.json
# export results to CSV
slopit analyze data/ --csv results.csv
# print summary table only
slopit analyze data/ --summary
Validate session files:
slopit validate data/session.json
slopit validate data/
Generate reports from previous analysis:
slopit report results.json
slopit report results.json --format csv --output report.csv
Start the dashboard server:
# default settings (localhost:8000)
slopit dashboard
# custom host and port
slopit dashboard --host 0.0.0.0 --port 8080
# development mode with auto-reload
slopit dashboard --reload
Loading Data
Native JSON Format
Load sessions in the native slopit format:
from slopit import load_session, load_sessions
# single file
session = load_session("data/session.json")
# directory of files
sessions = load_sessions("data/")
# with glob pattern
sessions = load_sessions("data/", pattern="*.json")
JATOS Export Format
Load data exported from JATOS:
from slopit.io import JATOSLoader
loader = JATOSLoader()
# load a single result file
session = loader.load("study_result_123.txt")
# load all results from a directory
for session in loader.load_many("jatos_results/"):
print(f"Loaded session {session.session_id}")
Direct JATOS API Access
Connect to a JATOS server to retrieve results:
from slopit.dashboard.integrations import JATOSClient
async with JATOSClient("https://jatos.example.com", "api-token") as client:
# list available studies
studies = await client.list_studies()
# get all sessions for a study
sessions = await client.get_sessions("study-123")
# or stream sessions one at a time
async for session in client.stream_results("study-123"):
print(f"Session: {session.session_id}")
Analysis Pipeline
Creating a Pipeline
from slopit.pipeline import AnalysisPipeline, PipelineConfig
from slopit.behavioral import KeystrokeAnalyzer, FocusAnalyzer
# basic pipeline
pipeline = AnalysisPipeline([
KeystrokeAnalyzer(),
FocusAnalyzer(),
])
# pipeline with custom configuration
config = PipelineConfig(
aggregation="weighted", # "any", "majority", or "weighted"
severity_threshold="low", # minimum severity to include
confidence_threshold=0.5, # minimum confidence to include
)
pipeline = AnalysisPipeline([KeystrokeAnalyzer()], config)
Configuring Analyzers
Each analyzer accepts a configuration object:
from slopit.behavioral import (
KeystrokeAnalyzer,
KeystrokeAnalyzerConfig,
FocusAnalyzer,
FocusAnalyzerConfig,
PasteAnalyzer,
PasteAnalyzerConfig,
TimingAnalyzer,
TimingAnalyzerConfig,
)
# keystroke analyzer configuration
keystroke_config = KeystrokeAnalyzerConfig(
pause_threshold_ms=2000.0, # minimum IKI to count as pause
burst_threshold_ms=500.0, # maximum IKI within typing burst
min_keystrokes=20, # minimum keystrokes for analysis
min_iki_std_threshold=100.0, # threshold for low variance flag
max_ppr_threshold=0.95, # threshold for minimal revision flag
)
keystroke_analyzer = KeystrokeAnalyzer(keystroke_config)
# focus analyzer configuration
focus_config = FocusAnalyzerConfig(
max_blur_count=5, # blur events before flagging
max_hidden_duration_ms=30000.0, # hidden duration before flagging
blur_paste_window_ms=5000.0, # window for blur-paste detection
)
focus_analyzer = FocusAnalyzer(focus_config)
# paste analyzer configuration
paste_config = PasteAnalyzerConfig(
large_paste_threshold=50, # characters for large paste flag
suspicious_preceding_keystrokes=5, # max keystrokes before paste
)
paste_analyzer = PasteAnalyzer(paste_config)
# timing analyzer configuration
timing_config = TimingAnalyzerConfig(
min_rt_per_char_ms=20.0, # minimum ms per character
max_rt_cv_threshold=0.1, # CV threshold for consistent timing
instant_response_threshold_ms=2000.0,
instant_response_min_chars=100,
)
timing_analyzer = TimingAnalyzer(timing_config)
Processing Results
result = pipeline.analyze(sessions)
# access per-session verdicts
for session_id, verdict in result.verdicts.items():
print(f"Session: {session_id}")
print(f" Status: {verdict.status}")
print(f" Confidence: {verdict.confidence:.2f}")
print(f" Summary: {verdict.summary}")
# access aggregated flags per session
for session_id, flags in result.aggregated_flags.items():
for flag in flags:
print(f"{session_id}: {flag.type} ({flag.severity})")
# access raw analyzer results
for analyzer_name, results in result.analyzer_results.items():
for analysis_result in results:
print(f"{analyzer_name}: {len(analysis_result.flags)} flags")
Generating Reports
from slopit.pipeline import TextReporter, CSVExporter
# text report
reporter = TextReporter()
report_text = reporter.generate(result)
print(report_text)
# summary table (using Rich)
reporter.print_summary(result)
# CSV export
exporter = CSVExporter()
exporter.export(result, "results.csv")
# export individual flags
exporter.export_flags(result, "flags.csv")
Dashboard Server
The dashboard provides a web interface for real-time analysis and monitoring.
Starting the Server
CLI (recommended):
# default settings (localhost:8000)
slopit dashboard
# custom host and port
slopit dashboard --host 0.0.0.0 --port 8080
# development mode with auto-reload
slopit dashboard --reload
# custom data directory
slopit dashboard --data-dir ./sessions
Python API:
from slopit.dashboard import DashboardConfig
from slopit.dashboard.app import create_app
import uvicorn
config = DashboardConfig(
host="127.0.0.1",
port=8000,
data_dir="./data",
jatos_url="https://jatos.example.com",
jatos_token="your-api-token",
prolific_token="your-prolific-token",
)
app = create_app(config)
uvicorn.run(app, host=config.host, port=config.port)
Dashboard Features
The dashboard provides:
- Session Management: Upload, browse, and inspect session data
- Real-time Analysis: Automatic background processing with live updates
- JATOS Sync: Import data directly from JATOS experiment servers
- Prolific Integration: Approve, reject, or return submissions based on analysis
- Export: Download sessions, trials, and verdicts as CSV
REST API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/sessions/ |
List sessions with pagination |
| GET | /api/v1/sessions/{id} |
Get session details |
| GET | /api/v1/trials/ |
List trials with pagination |
| GET | /api/v1/trials/{session_id}/{index} |
Get trial details |
| GET | /api/v1/analysis/summary |
Get analysis statistics |
| GET | /api/v1/analysis/verdicts |
List verdicts with pagination |
| POST | /api/v1/analysis/batch |
Start batch analysis |
| GET | /api/v1/export/sessions/csv |
Export sessions as CSV |
| GET | /api/v1/export/verdicts/csv |
Export verdicts as CSV |
| POST | /api/v1/jatos/sync |
Sync from JATOS study |
| POST | /api/v1/prolific/submissions/batch-transition |
Batch approve/reject |
WebSocket Events
Connect to /ws for real-time updates:
import asyncio
import json
import websockets
async def listen_for_updates():
async with websockets.connect("ws://localhost:8000/ws") as ws:
async for message in ws:
event = json.loads(message)
if event["type"] == "session.new":
print(f"New session: {event['data']['session_id']}")
elif event["type"] == "verdict.computed":
print(f"Verdict ready: {event['data']['session_id']}")
print(f" Status: {event['data']['status']}")
elif event["type"] == "sync.progress":
print(f"Sync: {event['data']['progress']}/{event['data']['total']}")
asyncio.run(listen_for_updates())
Services
StorageService provides file-based persistence:
from pathlib import Path
from slopit.dashboard.services import StorageService
storage = StorageService(Path("./data"))
# save and retrieve sessions
storage.save_session(session)
retrieved = storage.get_session(session.session_id)
# list with pagination
sessions, total = storage.list_sessions(page=1, page_size=20, has_verdict=True)
# save and retrieve verdicts
storage.save_verdict("session-123", {"status": "flagged", "confidence": 0.85})
verdict = storage.get_verdict("session-123")
AnalysisService provides background processing:
from slopit.dashboard.services import AnalysisService
service = AnalysisService()
# register callback for completed analysis
def on_verdict(session_id: str, verdict: dict) -> None:
storage.save_verdict(session_id, verdict)
service.on_complete(on_verdict)
# start worker and enqueue sessions
await service.start()
await service.enqueue_session(session)
# or analyze synchronously
verdict = await service.analyze_session(session)
Prolific Integration
Manage participant submissions based on analysis results:
from slopit.dashboard.integrations import ProlificClient, ParticipantAction
async with ProlificClient("your-api-token") as client:
# list studies
studies = await client.list_studies()
# get submissions awaiting review
from slopit.dashboard.integrations import SubmissionStatus
submissions = await client.get_submissions(
"study-123",
status=SubmissionStatus.AWAITING_REVIEW,
)
# approve a submission
await client.approve_submission("submission-456")
# reject with category
await client.reject_submission(
"submission-789",
rejection_category="NO_DATA",
message="Data validation failed",
)
# return submission to pool
await client.return_submission("submission-012")
# batch operations
await client.batch_approve("study-123", ["sub-1", "sub-2", "sub-3"])
Schemas
Session Data
from slopit.schemas import SlopitSession, SlopitTrial
# validate session data
session = SlopitSession.model_validate(raw_dict)
# access session properties
print(f"Session ID: {session.session_id}")
print(f"Participant: {session.participant_id}")
print(f"Platform: {session.platform.name}")
print(f"Trials: {len(session.trials)}")
# iterate trials
for trial in session.trials:
print(f"Trial {trial.trial_index}: {trial.trial_type}")
if trial.behavioral:
keystrokes = trial.behavioral.keystrokes or []
print(f" Keystrokes: {len(keystrokes)}")
Behavioral Data
from slopit.schemas import (
BehavioralData,
KeystrokeEvent,
FocusEvent,
PasteEvent,
)
# access behavioral data from a trial
behavioral: BehavioralData = trial.behavioral
# keystroke events
for event in behavioral.keystrokes or []:
print(f"{event.time}ms: {event.key} ({event.event})")
# focus events
for event in behavioral.focus or []:
print(f"{event.time}ms: {event.event}")
# paste events
for event in behavioral.paste or []:
print(f"{event.time}ms: pasted {event.text_length} chars")
Analysis Results
from slopit.schemas.analysis import (
AnalysisResult,
PipelineResult,
SessionVerdict,
)
from slopit.schemas.flags import AnalysisFlag, Severity
# verdicts are "clean", "suspicious", or "flagged"
verdict: SessionVerdict = result.verdicts["session-123"]
print(f"Status: {verdict.status}")
# flags have type, analyzer, severity, message, and confidence
for flag in verdict.flags:
flag_info: AnalysisFlag = flag
print(f"{flag_info.analyzer}: {flag_info.type}")
print(f" Severity: {flag_info.severity}")
print(f" Confidence: {flag_info.confidence}")
print(f" Message: {flag_info.message}")
Examples
Batch Processing
from pathlib import Path
from slopit import load_sessions
from slopit.pipeline import AnalysisPipeline, CSVExporter
from slopit.behavioral import (
KeystrokeAnalyzer,
FocusAnalyzer,
PasteAnalyzer,
TimingAnalyzer,
)
def process_study(data_dir: Path, output_dir: Path) -> None:
"""Process all sessions in a study directory."""
sessions = load_sessions(data_dir)
print(f"Loaded {len(sessions)} sessions")
pipeline = AnalysisPipeline([
KeystrokeAnalyzer(),
FocusAnalyzer(),
PasteAnalyzer(),
TimingAnalyzer(),
])
result = pipeline.analyze(sessions)
# count by status
counts = {"clean": 0, "suspicious": 0, "flagged": 0}
for verdict in result.verdicts.values():
counts[verdict.status] += 1
print(f"Results: {counts}")
# export to CSV
exporter = CSVExporter()
exporter.export(result, output_dir / "verdicts.csv")
exporter.export_flags(result, output_dir / "flags.csv")
# usage
process_study(Path("data/study1"), Path("output"))
Custom Aggregation
from slopit.pipeline import AnalysisPipeline, PipelineConfig
# flag if ANY analyzer detects an issue (most sensitive)
config_any = PipelineConfig(aggregation="any")
# flag if MAJORITY of analyzers detect issues
config_majority = PipelineConfig(aggregation="majority")
# use confidence-weighted voting (recommended)
config_weighted = PipelineConfig(
aggregation="weighted",
confidence_threshold=0.6, # require 60% confidence
severity_threshold="medium", # ignore "info" and "low" flags
)
Real-time Analysis with Dashboard
from pathlib import Path
from slopit.dashboard import DashboardConfig
from slopit.dashboard.app import create_app
from slopit.dashboard.services import AnalysisService, StorageService
# create services
storage = StorageService(Path("./data"))
analysis = AnalysisService()
# register callback for completed analysis
def on_verdict(session_id: str, verdict: dict) -> None:
print(f"Analysis complete for {session_id}: {verdict['status']}")
storage.save_verdict(session_id, verdict)
analysis.on_complete(on_verdict)
# start the service
import asyncio
asyncio.run(analysis.start())
API Reference
For complete API documentation, see:
- Schemas: Data models
- IO Loaders: Data loading
- Analyzers: Behavioral analyzers
- Pipeline: Analysis orchestration
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file slopit-0.1.0.tar.gz.
File metadata
- Download URL: slopit-0.1.0.tar.gz
- Upload date:
- Size: 312.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
19a3c64ecab9939015c66b4e74a23f2d0a1523d295238da18378ac6b3d3014f7
|
|
| MD5 |
90172b87faceaaa7ad8448b62ea12f4f
|
|
| BLAKE2b-256 |
37912295bcaf0496d485621d45daf2952581fe306dd1265e768d397208b92889
|
Provenance
The following attestation bundles were made for slopit-0.1.0.tar.gz:
Publisher:
publish-pypi.yml on slopit/slopit
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
slopit-0.1.0.tar.gz -
Subject digest:
19a3c64ecab9939015c66b4e74a23f2d0a1523d295238da18378ac6b3d3014f7 - Sigstore transparency entry: 850023646
- Sigstore integration time:
-
Permalink:
slopit/slopit@0ea8d0cae56cea32e40c8a8db97e5eab88739fa3 -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/slopit
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish-pypi.yml@0ea8d0cae56cea32e40c8a8db97e5eab88739fa3 -
Trigger Event:
release
-
Statement type:
File details
Details for the file slopit-0.1.0-py3-none-any.whl.
File metadata
- Download URL: slopit-0.1.0-py3-none-any.whl
- Upload date:
- Size: 159.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
4bf5b52aca99d2cc23fd964571b71895dabcaaa8e87f0e318bb10a5318e6beef
|
|
| MD5 |
8b6a95dcdb0f3a171a4851dd2122c69e
|
|
| BLAKE2b-256 |
4cdc5ab197b3550f8957cc53f79d874a963a375ccb60bddc0253063a58a81863
|
Provenance
The following attestation bundles were made for slopit-0.1.0-py3-none-any.whl:
Publisher:
publish-pypi.yml on slopit/slopit
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
slopit-0.1.0-py3-none-any.whl -
Subject digest:
4bf5b52aca99d2cc23fd964571b71895dabcaaa8e87f0e318bb10a5318e6beef - Sigstore transparency entry: 850023648
- Sigstore integration time:
-
Permalink:
slopit/slopit@0ea8d0cae56cea32e40c8a8db97e5eab88739fa3 -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/slopit
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish-pypi.yml@0ea8d0cae56cea32e40c8a8db97e5eab88739fa3 -
Trigger Event:
release
-
Statement type: