Skip to main content

Evidence-first discussion intelligence tooling.

Project description

ThreadSense

Discussion intelligence, not scraping. ThreadSense is a reproducible pipeline that turns public discussion threads into structured, evidence-backed product and research intelligence.

Why ThreadSense

Scrapers give you raw payloads. AI summarizers give you prose. Neither gives you a defensible basis for decisions.

ThreadSense keeps the evidence chain intact at every stage:

  1. Acquire — source connectors fetch threads from Reddit, Hacker News, and GitHub Discussions
  2. Normalize — source-specific payloads are mapped into a canonical thread model with provenance metadata
  3. Analyze — deterministic extraction identifies issues, feature requests, themes, and sentiment — each linked to the comment that produced it
  4. Synthesize — optional local-model inference adds summaries on top of the deterministic evidence layer
  5. Report — structured outputs in Markdown, HTML, or JSON, with full traceability from finding back to source comment

Every stage produces a persisted, inspectable artifact. Rerun any stage independently. Diff results across runs. Audit exactly where a finding came from.

What This Enables

Single-thread analysis

Feed a discussion URL. Get structured findings — issues, requests, themes, severity — with every claim linked to the comment that produced it.

uv run threadsense run reddit \
  "https://www.reddit.com/r/ClaudeCode/comments/1ro0qbl/anyone_actually_built_a_second_brain_that_isnt/" \
  --format markdown \
  --with-summary \
  --summary-required

Cross-thread research

Search a topic across multiple subreddits. ThreadSense deterministically selects, ranks, and analyzes matching threads, then synthesizes a corpus-level report.

uv run threadsense --output-format human research reddit \
  --query "second brain OR agentic PKM" \
  --subreddit ClaudeCode \
  --subreddit LocalLLaMA \
  --subreddit AI_Agents \
  --limit 5 \
  --per-subreddit-limit 3 \
  --with-summary

Domain-aware analysis

The analysis layer uses a contract system with domain-specific vocabularies (developer tools, product feedback, hiring, research, financial markets, gaming). Each domain defines its own theme keywords, issue markers, and severity calibration — so analysis adapts to context rather than applying one-size-fits-all heuristics.

Architecture

fetch → normalize → analyze → [optional inference] → report
         ↓              ↓              ↓                ↓
     raw artifact   canonical     analysis         report artifact
                    artifact      artifact
  • Deterministic core — parsing, normalization, scoring, and selection are reproducible across runs
  • Inference on top — LLM synthesis is optional and layered over deterministic evidence, never a substitute for it
  • Stable artifacts — each stage persists a separate JSON artifact with schema_version and SHA256 provenance
  • Fail fast — invalid URLs, malformed payloads, and schema inconsistencies surface immediately

Sources and Discovery

Capability Reddit Hacker News GitHub Discussions
Thread analysis yes yes yes
Topic research yes

Output Modes

Mode Purpose
json Machine-readable payloads for downstream tooling
human Rich terminal panels and summaries for operators
quiet Status-only output for scripts and CI
uv run threadsense --output-format human research reddit ...

See docs/output-modes.md for details.

Who This Is For

  • Product teams validating pain points and feature demand from community discussions
  • Founders doing market and competitor research across technical communities
  • DevRel teams tracking developer workflow friction and tooling sentiment
  • Researchers studying technical communities with reproducible methodology

Quickstart

# 1. Install
uv sync

# 2. Validate local setup
uv run threadsense preflight

# 3. Analyze a single thread
uv run threadsense run reddit \
  "https://www.reddit.com/r/ClaudeCode/comments/1ro0qbl/anyone_actually_built_a_second_brain_that_isnt/"

# 4. Research a topic across subreddits
uv run threadsense research reddit \
  --query "second brain OR agentic PKM" \
  --subreddit ClaudeCode \
  --subreddit LocalLLaMA \
  --subreddit AI_Agents

CLI Commands

Command Purpose
run End-to-end single-thread pipeline
research reddit Cross-subreddit topic research and corpus synthesis
fetch Acquire raw thread data
normalize Map raw data to canonical model
analyze Deterministic evidence extraction
infer LLM-assisted synthesis
report Generate output reports
corpus Build and analyze cross-thread corpora
inspect Examine persisted artifacts
batch run Process multiple threads
preflight Validate local environment
serve Local API server

Full command reference: docs/usage.md

Artifact Storage

Every pipeline run produces inspectable artifacts under .threadsense/:

.threadsense/
├── raw/<source>/          # Source payloads as fetched
├── normalized/<source>/   # Canonical thread model
├── analysis/<source>/     # Evidence-linked findings
├── reports/<source>/      # Rendered reports
├── corpora/<corpus-id>/   # Manifest, analysis, and report
└── batches/               # Batch run metadata

Details: docs/artifacts.md

Local Runtime

ThreadSense runs without a local model for deterministic analysis. Summaries and synthesis require a local OpenAI-compatible endpoint (default: http://127.0.0.1:8080/v1/chat/completions).

Details: docs/local-runtime-contract.md

Documentation

Document Content
usage.md Command reference
research-reddit.md Reddit topic research workflow
output-modes.md JSON, human, and quiet output modes
artifacts.md Artifact types and storage layout
overview.md Product and workflow overview
system-design.md Architecture and system boundaries
local-runtime-contract.md Local inference contract
pitch.md Product positioning

Validation

uv run ruff check
uv run ruff format --check .
uv run mypy --strict src tests
uv run pytest

Current Limits

  • Topic research is implemented for Reddit; other source discovery workflows are planned
  • Reddit research queries support OR/| clause unions only (intentionally narrow for deterministic alignment)
  • Corpus reports are Markdown only
  • The local API is a trusted local surface, not a hardened public service

Direction

  • Richer corpus presentation and operator workflows
  • Discovery workflows beyond Reddit
  • Evaluation and replay benchmarking
  • Source-distribution and research-quality reporting

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

threadsense-0.2.1.tar.gz (176.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

threadsense-0.2.1-py3-none-any.whl (126.9 kB view details)

Uploaded Python 3

File details

Details for the file threadsense-0.2.1.tar.gz.

File metadata

  • Download URL: threadsense-0.2.1.tar.gz
  • Upload date:
  • Size: 176.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.14

File hashes

Hashes for threadsense-0.2.1.tar.gz
Algorithm Hash digest
SHA256 14497e21437d233e5103928b853c620a207c6f35c64ce280669a946f7818dc4a
MD5 c4b28420a90b8320a7e808cc2e957803
BLAKE2b-256 b97c9bf434cfda4772f43b1fa7f85d6618ff2e6216f8688e469cb84b93573ae5

See more details on using hashes here.

File details

Details for the file threadsense-0.2.1-py3-none-any.whl.

File metadata

File hashes

Hashes for threadsense-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 cdf931ee4c5375f29f41a91b77ea762c62f435d9a38ad1ea2dafc4c5367181e8
MD5 a91a55ee7fad1944a9c78ffd00672b52
BLAKE2b-256 fe38d1bec73b9e5403d36c91b2cf9d5f989f4ef0b71d2faf71a6f600b000c931

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page