Local-first task classifier that infers your work type from computer activity signals
taskclf — Task Type Classifier from Local Activity Signals
Train and run a personal task-type classifier (e.g. coding / writing / meetings) using privacy-preserving computer activity signals such as foreground app/window metadata and aggregated input statistics (counts/rates only).
This project is intentionally scoped as a personalized classifier (single-user first). The architecture keeps:
- Collectors (platform/tool dependent) isolated behind adapters
- Features as a versioned, validated contract
- Models as bundled artifacts with schema checks
- Inference as a small, stable loop that emits task segments and daily summaries
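As a rough illustration of the first bullet, a collector can sit behind a small adapter interface so that downstream code never depends on a specific tool. The names below (`Event`, `Collector`, `StaticCollector`, `count_events`) are hypothetical and are not taken from the taskclf codebase.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Protocol


@dataclass(frozen=True)
class Event:
    """Normalized, privacy-safe event (hypothetical shape)."""
    timestamp: datetime
    app_id: str       # e.g. a reverse-domain identifier
    title_hash: str   # hashed window title, never raw text


class Collector(Protocol):
    """Anything that can yield normalized events for a time range."""
    def events(self, start: datetime, end: datetime) -> Iterable[Event]: ...


class StaticCollector:
    """Toy adapter backed by an in-memory list, handy in tests."""
    def __init__(self, items: list[Event]) -> None:
        self._items = items

    def events(self, start: datetime, end: datetime) -> Iterable[Event]:
        # Half-open range: start is included, end is excluded.
        return [e for e in self._items if start <= e.timestamp < end]


def count_events(collector: Collector, start: datetime, end: datetime) -> int:
    # Downstream code depends only on the Collector protocol.
    return sum(1 for _ in collector.events(start, end))


t0 = datetime(2026, 2, 16, 9, 0)
demo = StaticCollector(
    [Event(t0 + timedelta(minutes=i), "org.example.editor", "abc123") for i in range(5)]
)
```

A new source would then only need another class with the same `events` method.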
Goals
- Fast iteration: first useful model in < 1 week of data
- Privacy: no raw keystrokes, no raw window titles persisted
- Stability: feature schema versioning + schema hash gates
- Extensibility: add new collectors and models without breaking consumers
Non-Goals
- Universal (multi-user) generalization out of the box
- Storing or analyzing raw typed content
- "Perfect" labeling UI (start minimal, iterate later)
Labels (v1)
Eight core labels defined in schema/labels_v1.json:
| ID | Label | Description |
|---|---|---|
| 0 | Build | Writing or implementing structured content in editor/terminal |
| 1 | Debug | Investigating issues, terminal-heavy troubleshooting |
| 2 | Review | Reviewing technical material or diffs with light edits |
| 3 | Write | Writing structured non-code content |
| 4 | ReadResearch | Consuming information with minimal production |
| 5 | Communicate | Asynchronous coordination (chat/email) |
| 6 | Meet | Synchronous meetings or calls |
| 7 | BreakIdle | Idle or break period |
Labels are stored as time spans (not per-keystroke events). Users can remap
core labels to personal categories via a taxonomy config
(see configs/user_taxonomy_example.yaml).
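A minimal sketch of a label span and a taxonomy remap is shown here. The `LabelSpan` class, the `remap` helper, and the personal categories in `USER_TAXONOMY` are illustrative assumptions; the real config format lives in configs/user_taxonomy_example.yaml and may differ.

```python
from dataclasses import dataclass
from datetime import datetime

# The eight core labels from the v1 table.
CORE_LABELS = ["Build", "Debug", "Review", "Write",
               "ReadResearch", "Communicate", "Meet", "BreakIdle"]


@dataclass(frozen=True)
class LabelSpan:
    """A labeled time span (hypothetical shape)."""
    start: datetime
    end: datetime
    label: str

    def __post_init__(self) -> None:
        if self.label not in CORE_LABELS:
            raise ValueError(f"unknown core label: {self.label}")
        if self.end <= self.start:
            raise ValueError("span must have positive duration")


# Hypothetical personal taxonomy: several core labels collapse into one bucket.
USER_TAXONOMY = {
    "Build": "Deep work", "Debug": "Deep work",
    "Review": "Deep work", "Write": "Deep work",
    "ReadResearch": "Learning",
    "Communicate": "Collaboration", "Meet": "Collaboration",
    "BreakIdle": "Break",
}


def remap(span: LabelSpan, taxonomy: dict[str, str]) -> str:
    # Fall back to the core label when no personal mapping is defined.
    return taxonomy.get(span.label, span.label)


span = LabelSpan(datetime(2026, 2, 16, 9), datetime(2026, 2, 16, 10), "Debug")
```

Keeping the stored label as a core label and mapping only at read time means the personal taxonomy can change without relabeling old data.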
Data Flow Overview
Structures (pipelines)
- ETL pipeline reads raw → produces features parquet
- Training pipeline reads features + labels → produces model
- Inference pipeline reads new events → emits predictions + segments
Batch (repeatable)
- Ingest: pull ActivityWatch export → data/raw/aw/
- Feature build: events → per-minute features → data/processed/features_v1/
- Label import: label spans → data/processed/labels_v1/
- Build dataset: join features + labels, split by time → training arrays
- Train: fit model → models/<run_id>/
- Evaluate: metrics, acceptance checks, calibration
- Report: daily summaries → artifacts/
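The "build dataset" step can be sketched in plain Python as a join of per-minute rows against label spans followed by a chronological split. The function names, the `keys_per_min` feature, and the 80/20 fraction are assumptions for illustration, not the project's actual implementation.

```python
from datetime import datetime, timedelta


def label_for(ts, spans):
    """Return the label of the span covering ts, or None if unlabeled."""
    for start, end, label in spans:
        if start <= ts < end:
            return label
    return None


def build_dataset(feature_rows, spans):
    """Join per-minute rows with label spans; drop unlabeled minutes."""
    joined = []
    for ts, feats in feature_rows:
        label = label_for(ts, spans)
        if label is not None:
            joined.append((ts, feats, label))
    return sorted(joined, key=lambda r: r[0])


def split_by_time(rows, train_fraction=0.8):
    """Chronological split: earliest rows train, latest rows validate."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


t0 = datetime(2026, 2, 16, 9, 0)
rows = [(t0 + timedelta(minutes=i), {"keys_per_min": i}) for i in range(10)]
spans = [
    (t0, t0 + timedelta(minutes=5), "Build"),
    (t0 + timedelta(minutes=5), t0 + timedelta(minutes=10), "Write"),
]
train, valid = split_by_time(build_dataset(rows, spans))
```

Splitting by time rather than at random keeps adjacent, highly correlated minutes from landing on both sides of the split.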
Online (real-time)
Every N seconds:
- read the last minute(s) of events
- compute the latest feature bucket
- predict + smooth (with optional calibration and taxonomy mapping)
- append predictions → artifacts/
At end-of-day:
- produce report
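To illustrate how per-bucket predictions might turn into segments and a daily summary, here is a small sketch. It assumes time-sorted, gap-free one-minute buckets; `to_segments` and `daily_summary` are hypothetical helpers, not functions from taskclf.

```python
from datetime import datetime, timedelta
from itertools import groupby


def to_segments(predictions, bucket=timedelta(minutes=1)):
    """Collapse consecutive same-label buckets into (start, end, label) segments.

    `predictions` is a time-sorted list of (bucket_start, label) pairs.
    """
    segments = []
    for label, group in groupby(predictions, key=lambda p: p[1]):
        items = list(group)
        # A segment ends one bucket after its last bucket starts.
        segments.append((items[0][0], items[-1][0] + bucket, label))
    return segments


def daily_summary(segments):
    """Total minutes per label across all segments."""
    totals = {}
    for start, end, label in segments:
        minutes = (end - start).total_seconds() / 60
        totals[label] = totals.get(label, 0) + minutes
    return totals


t0 = datetime(2026, 2, 16, 9, 0)
labels = ["Build"] * 3 + ["Meet"] * 2 + ["Build"]
preds = [(t0 + timedelta(minutes=i), lab) for i, lab in enumerate(labels)]
segments = to_segments(preds)
```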
Privacy & Safety
This repo enforces the following:
- No raw keystrokes are stored (only aggregate counts/rates).
- No raw window titles are stored by default.
- Titles are hashed or locally tokenized; you can keep a local mapping if you choose.
- Dataset artifacts stay local-first.
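The "aggregate counts/rates only" rule can be pictured as follows: the aggregation step sees only the timestamps of input events, never their content. This is a sketch under that assumption, with hypothetical function names.

```python
from collections import Counter
from datetime import datetime


def per_minute_counts(event_times):
    """Count input events per minute bucket; only timestamps are ever seen."""
    buckets = Counter(ts.replace(second=0, microsecond=0) for ts in event_times)
    return dict(buckets)


def per_minute_rates(counts, seconds_per_bucket=60):
    """Convert per-minute counts into events per second."""
    return {minute: n / seconds_per_bucket for minute, n in counts.items()}


times = [
    datetime(2026, 2, 16, 9, 0, 5),
    datetime(2026, 2, 16, 9, 0, 40),
    datetime(2026, 2, 16, 9, 1, 10),
]
counts = per_minute_counts(times)
```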
Quick Start
Requirements
- Python >= 3.14
- For the recommended CLI install: uv
Install
Command-line (PyPI) — install the taskclf CLI:
uv tool install taskclf
Or with pip only:
pip install taskclf
Then run taskclf --help.
Desktop app (optional) — a small Electron shell is built for Windows, Linux, and macOS. Download the latest launcher installers from GitHub Releases. Choose the file for your OS:
| OS | File |
|---|---|
| Windows | *.exe (NSIS installer) |
| Linux | *.AppImage |
| macOS | *.dmg (open and drag taskclf to Applications) |
Those assets are published on GitHub releases whose tag starts with
launcher-v. The PyInstaller backend that the shell downloads at runtime is
published on separate v* tags (see make build-payload and payload release CI).
Development (from a git checkout):
uv sync
uv run taskclf --help
Ingest (ActivityWatch)
uv run taskclf ingest aw --input /path/to/activitywatch-export.json
This parses an ActivityWatch JSON export, normalizes app names to reverse-domain
identifiers, hashes window titles (never storing raw text), and writes
privacy-safe events to data/raw/aw/<YYYY-MM-DD>/events.parquet partitioned by
date.
Options:
- --out-dir: output directory (default: data/raw/aw)
- --title-salt: salt for hashing window titles (default: taskclf-default-salt)
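The exact hashing scheme is not described here, so the following is only a sketch of what salted title hashing could look like. HMAC-SHA256 is an assumed stand-in, and `hash_title` is a hypothetical helper.

```python
import hashlib
import hmac


def hash_title(title: str, salt: str = "taskclf-default-salt") -> str:
    """Keyed SHA-256 digest of a window title; the raw text is not kept."""
    return hmac.new(
        salt.encode("utf-8"), title.encode("utf-8"), hashlib.sha256
    ).hexdigest()


same_a = hash_title("README.md - editor")
same_b = hash_title("README.md - editor")
other_salt = hash_title("README.md - editor", salt="my-private-salt")
```

With any scheme of this kind, changing the salt changes every digest, so data ingested under different salts would not share title hashes.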
Build features
uv run taskclf features build --date 2026-02-16
Import labels
uv run taskclf labels import --file labels.csv
Or add individual label blocks:
uv run taskclf labels add-block \
--start 2026-02-16T09:00:00 --end 2026-02-16T10:00:00 --label Build
Or label what you're doing right now (no timestamps needed):
uv run taskclf labels label-now --minutes 10 --label Build
This queries ActivityWatch for a live summary of apps used in the last N minutes and creates the label span automatically.
Export labels to CSV
uv run taskclf labels export --out my_labels.csv
Train
uv run taskclf train lgbm --from 2026-02-01 --to 2026-02-16
Run batch inference
uv run taskclf infer batch --model-dir models/<run_id> --from 2026-02-01 --to 2026-02-16
Run online inference
uv run taskclf infer online --model-dir models/<run_id>
Starts a polling loop that queries a running ActivityWatch server, builds
feature rows from live window events, predicts task types using a trained model,
smooths predictions, and writes running outputs to artifacts/. Press Ctrl+C
to stop; a final daily report is generated on shutdown.
Options:
- --poll-seconds: seconds between polls (default: 60)
- --aw-host: ActivityWatch server URL (default: http://localhost:5600)
- --smooth-window: rolling majority window size (default: 3)
- --title-salt: salt for hashing window titles (default: taskclf-default-salt)
- --out-dir: output directory (default: artifacts)
- --label-queue/--no-label-queue: auto-enqueue low-confidence predictions for manual labeling
- --label-confidence: confidence threshold for auto-enqueue (default: 0.55)
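To make the smoothing and label-queue options concrete, here is a sketch of a rolling majority vote and a confidence check. The tie-breaking rule (newest tied label wins) and the strict `<` comparison are assumptions; the real behavior may differ.

```python
from collections import Counter, deque


def smooth(labels, window=3):
    """Rolling majority over the last `window` predictions.

    Ties are resolved in favour of the most recent label among the tied ones.
    """
    recent = deque(maxlen=window)
    out = []
    for label in labels:
        recent.append(label)
        counts = Counter(recent)
        best = max(counts.values())
        # Walk backwards so the newest tied label wins.
        out.append(next(l for l in reversed(recent) if counts[l] == best))
    return out


def needs_review(confidence, threshold=0.55):
    """Flag a prediction for manual labeling when confidence is below threshold."""
    return confidence < threshold


smoothed = smooth(["Build", "Build", "Meet", "Build", "Build"])
```

In this example the single "Meet" blip is outvoted by its neighbours, which is the kind of flicker a small window is meant to suppress.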
Run baseline (no model needed)
uv run taskclf infer baseline --from 2026-02-01 --to 2026-02-16
Rule-based classifier useful for day-1 bootstrapping before you have a trained model.
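The actual rules are not listed here. Purely as an illustration of the idea, a rule-based baseline over one per-minute feature row could look like this; the feature names (`app_category`, `input_events`, `app_switches`), the thresholds, and the rule order are all invented for the example.

```python
def baseline_label(row):
    """Toy rule-based classifier over one per-minute feature row (a dict)."""
    category = row.get("app_category")
    inputs = row.get("input_events", 0)

    # Check meetings first: a call with no typing is not idle time.
    if category == "meeting":
        return "Meet"
    if inputs == 0 and row.get("app_switches", 0) == 0:
        return "BreakIdle"
    if category in {"chat", "email"}:
        return "Communicate"
    if category in {"editor", "terminal"}:
        return "Build"
    if category == "browser" and inputs < 20:
        return "ReadResearch"
    # Arbitrary fallback for this sketch.
    return "Write"
```

Rules like these need no training data, which is why a baseline of this kind can produce segments on the first day.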
Produce report
uv run taskclf report daily --segments-file artifacts/segments.json
CLI Reference
All commands: uv run taskclf --help
| Group | Commands | Purpose |
|---|---|---|
| ingest | aw | Import ActivityWatch exports |
| features | build | Build per-minute feature rows |
| labels | import, add-block, label-now, show-queue, project | Manage label spans and labeling queue |
| train | build-dataset, lgbm, evaluate, tune-reject, calibrate, retrain, check-retrain | Training, evaluation, and retraining pipeline |
| taxonomy | validate, show, init | User-defined label groupings |
| infer | batch, online, baseline, compare | Prediction (ML, rule-based, comparison) |
| report | daily | Daily summaries (JSON/CSV/Parquet) |
| monitor | drift-check, telemetry, show | Feature drift and telemetry tracking |
| (top-level) | tray | System tray labeling app with activity transition detection |
| (top-level) | ui | Web UI for labeling, queue, and live prediction streaming |
Full CLI docs: docs/api/cli/main.md
Repo Layout
- src/taskclf/: application code (adapters, core, features, labels, train, infer, report, ui)
- schema/: versioned JSON schemas for features and labels
- configs/: configuration files (model params, retrain policy, taxonomy examples)
- docs/: API reference and guides (served via make docs-serve)
- data/: raw and processed datasets (local, gitignored)
- models/: trained model bundles (one folder per run)
- artifacts/: predictions, segments, reports, evaluation outputs
- tests/: test suite
Model Artifact Contract
Every saved model bundle (models/<run_id>/) contains:
- the model file
- metadata.json: feature schema version + hash, label set, training date range, params, dataset hash
- metrics.json: macro/weighted F1, per-class metrics
- confusion_matrix.csv
- categorical encoders (if applicable)
Inference refuses to run if the schema hash mismatches the model bundle.
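A schema hash gate of this kind might be sketched as below. The canonical-JSON-plus-SHA-256 recipe and the `schema_hash` key in the metadata are assumptions for illustration; the actual hashing and field names in metadata.json may differ.

```python
import hashlib
import json


def schema_hash(schema: dict) -> str:
    """Stable hash of a feature schema: canonical JSON, then SHA-256."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_bundle(metadata: dict, current_schema: dict) -> None:
    """Raise if the model bundle was trained against a different schema."""
    expected = metadata["schema_hash"]  # hypothetical key name
    actual = schema_hash(current_schema)
    if expected != actual:
        raise RuntimeError(
            f"schema hash mismatch: bundle={expected[:8]} current={actual[:8]}"
        )


schema_v1 = {"version": "v1", "columns": ["app_id", "keys_per_min"]}
bundle_meta = {"schema_hash": schema_hash(schema_v1)}
check_bundle(bundle_meta, schema_v1)  # passes silently when hashes match
```

Sorting keys before hashing makes the digest independent of key order, so only real schema changes trip the gate.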
Development
Common tasks are in the Makefile:
make lint # ruff check .
make test # pytest
make typecheck # mypy src
make docs-serve # local preview at http://127.0.0.1:8000
make docs-build # static site in site/
Electron backend payload (for packaged app downloads): after make ui-build,
build the PyInstaller one-folder sidecar zip used by GitHub releases (v* tags):
uv sync --group bundle
make build-payload # writes build/payload-<triple>.zip
See docs/api/scripts/payload_build.md for details.
License
TBD (local-first personal project by default).