Local-first task classifier that infers your work type from computer activity signals

taskclf — Task Type Classifier from Local Activity Signals

Train and run a personal task-type classifier (e.g. coding / writing / meetings) using privacy-preserving computer activity signals such as foreground app/window metadata and aggregated input statistics (counts/rates only).

This project is intentionally scoped as a personalized classifier (single-user first). The architecture keeps:

Collectors (platform/tool dependent) isolated behind adapters
Features as a versioned, validated contract
Models as bundled artifacts with schema checks
Inference as a small, stable loop that emits task segments and daily summaries
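
The adapter boundary described above can be sketched as a small interface. This is an illustration only: `ActivityEvent`, `Collector`, `InMemoryCollector`, and the field names are hypothetical and are not taken from the taskclf source.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Protocol


@dataclass(frozen=True)
class ActivityEvent:
    """Hypothetical normalized event; the field names are assumptions."""
    timestamp: datetime
    app_id: str       # e.g. a reverse-domain identifier
    title_hash: str   # hashed window title, never the raw text
    duration_s: float


class Collector(Protocol):
    """Anything that can yield normalized events for a time range."""

    def events(self, start: datetime, end: datetime) -> Iterable[ActivityEvent]: ...


class InMemoryCollector:
    """Toy adapter showing that consumers depend only on the protocol."""

    def __init__(self, events: list[ActivityEvent]) -> None:
        self._events = events

    def events(self, start: datetime, end: datetime) -> Iterable[ActivityEvent]:
        # Half-open range: start is included, end is excluded.
        return [e for e in self._events if start <= e.timestamp < end]


def total_active_seconds(collector: Collector, start: datetime, end: datetime) -> float:
    # Downstream code sees only the Collector interface, not the data source.
    return sum(e.duration_s for e in collector.events(start, end))
```

With this shape, a new data source would only need to provide an `events` method; feature and inference code would not change.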

Goals

Fast iteration: a first useful model from less than 1 week of data
Privacy: no raw keystrokes, no raw window titles persisted
Stability: feature schema versioning + schema hash gates
Extensibility: add new collectors and models without breaking consumers

Non-Goals

Universal (multi-user) generalization out of the box
Storing or analyzing raw typed content
"Perfect" labeling UI (start minimal, iterate later)

Labels (v1)

Eight core labels defined in schema/labels_v1.json:

ID	Label	Description
0	`Build`	Writing or implementing structured content in editor/terminal
1	`Debug`	Investigating issues, terminal-heavy troubleshooting
2	`Review`	Reviewing technical material or diffs with light edits
3	`Write`	Writing structured non-code content
4	`ReadResearch`	Consuming information with minimal production
5	`Communicate`	Asynchronous coordination (chat/email)
6	`Meet`	Synchronous meetings or calls
7	`BreakIdle`	Idle or break period

Labels are stored as time spans (not per-keystroke events). Users can remap core labels to personal categories via a taxonomy config (see configs/user_taxonomy_example.yaml).
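
As a rough illustration of remapping, the sketch below encodes the eight core labels from the table and inverts a personal taxonomy. The category names and the `USER_TAXONOMY` structure are made up for this example and do not reflect the actual format of configs/user_taxonomy_example.yaml.

```python
from enum import IntEnum


class CoreLabel(IntEnum):
    """The eight core labels and IDs from the table above."""
    Build = 0
    Debug = 1
    Review = 2
    Write = 3
    ReadResearch = 4
    Communicate = 5
    Meet = 6
    BreakIdle = 7


# Hypothetical personal taxonomy (assumed structure and names).
USER_TAXONOMY: dict[str, list[CoreLabel]] = {
    "Deep work": [CoreLabel.Build, CoreLabel.Debug, CoreLabel.Write],
    "Learning": [CoreLabel.Review, CoreLabel.ReadResearch],
    "Collaboration": [CoreLabel.Communicate, CoreLabel.Meet],
    "Away": [CoreLabel.BreakIdle],
}


def build_reverse_map(taxonomy: dict[str, list[CoreLabel]]) -> dict[CoreLabel, str]:
    """Invert the taxonomy and check every core label is mapped exactly once."""
    reverse: dict[CoreLabel, str] = {}
    for category, labels in taxonomy.items():
        for label in labels:
            if label in reverse:
                raise ValueError(f"{label.name} is mapped to more than one category")
            reverse[label] = category
    missing = set(CoreLabel) - set(reverse)
    if missing:
        raise ValueError(f"unmapped labels: {sorted(m.name for m in missing)}")
    return reverse


REVERSE = build_reverse_map(USER_TAXONOMY)
```

Validating that each core label maps to exactly one category keeps the remapped output unambiguous.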

Data Flow Overview

Structures (pipelines)

ETL pipeline reads raw events → produces a features Parquet dataset
Training pipeline reads features + labels → produces model
Inference pipeline reads new events → emits predictions + segments

Batch (repeatable)

Ingest: pull ActivityWatch export → data/raw/aw/
Feature build: events → per-minute features → data/processed/features_v1/
Label import: label spans → data/processed/labels_v1/
Build dataset: join features + labels, split by time → training arrays
Train: fit model → models/<run_id>/
Evaluate: metrics, acceptance checks, calibration
Report: daily summaries → artifacts/

Online (real-time)

Every N seconds:

read the last minute(s) of events
compute the latest feature bucket
predict + smooth (with optional calibration and taxonomy mapping)
append predictions → artifacts/

At end-of-day:

produce report
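
The "predict + smooth" step and the segment output can be illustrated as follows. This is a sketch of a rolling-majority smoother under assumed conventions (per-minute string labels, index-based segments); taskclf's actual smoothing and segment format may differ.

```python
from collections import Counter, deque
from itertools import groupby


def rolling_majority(labels: list[str], window: int = 3) -> list[str]:
    """Replace each prediction with the most common label among the last
    `window` predictions, including the current one."""
    recent: deque[str] = deque(maxlen=window)
    smoothed = []
    for label in labels:
        recent.append(label)
        smoothed.append(Counter(recent).most_common(1)[0][0])
    return smoothed


def to_segments(labels: list[str]) -> list[tuple[int, int, str]]:
    """Merge consecutive identical labels into (start, end_exclusive, label)."""
    segments = []
    i = 0
    for label, run in groupby(labels):
        n = len(list(run))
        segments.append((i, i + n, label))
        i += n
    return segments


raw = ["Build", "Build", "Debug", "Build", "Build", "Meet", "Meet", "Meet"]
smoothed = rolling_majority(raw, window=3)
segments = to_segments(smoothed)
```

In this example the single `Debug` minute is absorbed into the surrounding `Build` run, and the switch to `Meet` appears one bucket late. That lag is the usual trade-off of a trailing window.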

Privacy & Safety

This repo enforces the following:

No raw keystrokes are stored (only aggregate counts/rates).
No raw window titles are stored by default; titles are hashed or locally tokenized, and you can keep a local mapping if you choose.
Dataset artifacts stay local-first.

Quick Start

Requirements

Python >= 3.14
uv installed

Install

uv tool install taskclf

For development (from source):

uv sync
uv run taskclf --help

Ingest (ActivityWatch)

uv run taskclf ingest aw --input /path/to/activitywatch-export.json

This parses an ActivityWatch JSON export, normalizes app names to reverse-domain identifiers, hashes window titles (never storing raw text), and writes privacy-safe events to data/raw/aw/<YYYY-MM-DD>/events.parquet partitioned by date.

Options:

--out-dir — output directory (default: data/raw/aw)
--title-salt — salt for hashing window titles (default: taskclf-default-salt)
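
To illustrate what salted title hashing means in practice, here is a minimal sketch using HMAC-SHA256 from the standard library. The choice of HMAC, the hex output, and the `hash_title` name are assumptions; the README does not state which hash function taskclf uses or how it mixes in the salt.

```python
import hashlib
import hmac


def hash_title(title: str, salt: str = "taskclf-default-salt") -> str:
    """Return a keyed SHA-256 digest of a window title as a hex string."""
    return hmac.new(salt.encode("utf-8"), title.encode("utf-8"), hashlib.sha256).hexdigest()


# The same title and salt always give the same digest, so repeated
# windows can still be grouped without storing the raw text.
digest = hash_title("quarterly-report.docx - Editor", salt="my-private-salt")
```

Because the default salt is public, anyone can hash a guessed title and compare digests, so a private value passed via --title-salt is likely the safer choice.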

Build features

uv run taskclf features build --date 2026-02-16

Import labels

uv run taskclf labels import --file labels.csv

Or add individual label blocks:

uv run taskclf labels add-block \
  --start 2026-02-16T09:00:00 --end 2026-02-16T10:00:00 --label Build

Or label what you're doing right now (no timestamps needed):

uv run taskclf labels label-now --minutes 10 --label Build

This queries ActivityWatch for a live summary of apps used in the last N minutes and creates the label span automatically.

Train

uv run taskclf train lgbm --from 2026-02-01 --to 2026-02-16

Run batch inference

uv run taskclf infer batch --model-dir models/<run_id> --from 2026-02-01 --to 2026-02-16

Run online inference

uv run taskclf infer online --model-dir models/<run_id>

Starts a polling loop that queries a running ActivityWatch server, builds feature rows from live window events, predicts task types using a trained model, smooths predictions, and writes running outputs to artifacts/. Press Ctrl+C to stop; a final daily report is generated on shutdown.

Options:

--poll-seconds — seconds between polls (default: 60)
--aw-host — ActivityWatch server URL (default: http://localhost:5600)
--smooth-window — rolling majority window size (default: 3)
--title-salt — salt for hashing window titles (default: taskclf-default-salt)
--out-dir — output directory (default: artifacts)
--label-queue / --no-label-queue — auto-enqueue low-confidence predictions for manual labeling
--label-confidence — confidence threshold for auto-enqueue (default: 0.55)
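
The auto-enqueue option can be pictured as a threshold check on the model's top class probability. This sketch assumes "confidence" means the maximum predicted probability; the function names are hypothetical.

```python
def top_prediction(probabilities: dict[str, float]) -> tuple[str, float]:
    """Return the most likely label and its probability."""
    label = max(probabilities, key=probabilities.get)
    return label, probabilities[label]


def should_enqueue(probabilities: dict[str, float], threshold: float = 0.55) -> bool:
    """True when the top class probability falls below the threshold."""
    _, confidence = top_prediction(probabilities)
    return confidence < threshold


uncertain = {"Build": 0.50, "Debug": 0.30, "Write": 0.20}
confident = {"Build": 0.90, "Debug": 0.10}
queue = [p for p in (uncertain, confident) if should_enqueue(p)]
```

Only the first prediction lands in the queue here, since 0.50 is below the default threshold of 0.55.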

Run baseline (no model needed)

uv run taskclf infer baseline --from 2026-02-01 --to 2026-02-16

Rule-based classifier useful for day-1 bootstrapping before you have a trained model.

Produce report

uv run taskclf report daily --segments-file artifacts/segments.json
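
Conceptually, a daily summary totals time per label across segments. The sketch below assumes a segments file shaped as a JSON list of objects with `start`, `end`, and `label` keys; the real layout of artifacts/segments.json is not documented here and may differ.

```python
import json
from collections import defaultdict
from datetime import datetime


def summarize(segments_json: str) -> dict[str, float]:
    """Return total minutes per label from a JSON list of segments."""
    totals: dict[str, float] = defaultdict(float)
    for seg in json.loads(segments_json):
        start = datetime.fromisoformat(seg["start"])
        end = datetime.fromisoformat(seg["end"])
        totals[seg["label"]] += (end - start).total_seconds() / 60
    return dict(totals)


# Assumed example input, not real taskclf output.
example = json.dumps([
    {"start": "2026-02-16T09:00:00", "end": "2026-02-16T10:00:00", "label": "Build"},
    {"start": "2026-02-16T10:00:00", "end": "2026-02-16T10:30:00", "label": "Meet"},
    {"start": "2026-02-16T10:30:00", "end": "2026-02-16T11:00:00", "label": "Build"},
])
summary = summarize(example)
```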

CLI Reference

All commands: uv run taskclf --help

Group	Commands	Purpose
`ingest`	`aw`	Import ActivityWatch exports
`features`	`build`	Build per-minute feature rows
`labels`	`import`, `add-block`, `label-now`, `show-queue`, `project`	Manage label spans and labeling queue
`train`	`build-dataset`, `lgbm`, `evaluate`, `tune-reject`, `calibrate`, `retrain`, `check-retrain`	Training, evaluation, and retraining pipeline
`taxonomy`	`validate`, `show`, `init`	User-defined label groupings
`infer`	`batch`, `online`, `baseline`, `compare`	Prediction (ML, rule-based, comparison)
`report`	`daily`	Daily summaries (JSON/CSV/Parquet)
`monitor`	`drift-check`, `telemetry`, `show`	Feature drift and telemetry tracking
(top-level)	`tray`	System tray labeling app with activity transition detection
(top-level)	`ui`	Web UI for labeling, queue, and live prediction streaming

Full CLI docs: docs/api/cli/main.md

Repo Layout

src/taskclf/ — application code (adapters, core, features, labels, train, infer, report, ui)
schema/ — versioned JSON schemas for features and labels
configs/ — configuration files (model params, retrain policy, taxonomy examples)
docs/ — API reference and guides (served via make docs-serve)
data/ — raw and processed datasets (local, gitignored)
models/ — trained model bundles (one folder per run)
artifacts/ — predictions, segments, reports, evaluation outputs
tests/ — test suite

Model Artifact Contract

Every saved model bundle (models/<run_id>/) contains:

the model file
metadata.json: feature schema version + hash, label set, training date range, params, dataset hash
metrics.json: macro/weighted F1, per-class metrics
confusion_matrix.csv
categorical encoders (if applicable)

Inference refuses to run if the schema hash mismatches the model bundle.
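
A schema hash gate of this kind can be sketched as follows. The canonical-JSON hashing, the `feature_schema_hash` metadata key, and the error type are assumptions made for the example, not the bundle's actual field names.

```python
import hashlib
import json


class SchemaMismatchError(RuntimeError):
    """Raised when the current feature schema differs from the bundle's."""


def schema_hash(schema: dict) -> str:
    """Hash a schema via canonical JSON so key order does not matter."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_bundle(metadata: dict, current_schema: dict) -> None:
    """Refuse to continue if the bundle was trained on a different schema."""
    expected = metadata["feature_schema_hash"]  # hypothetical key name
    actual = schema_hash(current_schema)
    if expected != actual:
        raise SchemaMismatchError(f"schema hash mismatch: bundle={expected[:8]} current={actual[:8]}")


schema_v1 = {"version": 1, "columns": {"n_events": "int", "n_apps": "int"}}
metadata = {"feature_schema_hash": schema_hash(schema_v1)}
check_bundle(metadata, schema_v1)  # passes silently when hashes agree
```

Failing fast at load time is cheaper than discovering a column mismatch through silently degraded predictions.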

Development

Common tasks are in the Makefile:

make lint        # ruff check .
make test        # pytest
make typecheck   # mypy src
make docs-serve  # local preview at http://127.0.0.1:8000
make docs-build  # static site in site/

License

TBD (local-first personal project by default).

Project details

Maintainers

fruitiecutiepie

Release history

0.4.15

Apr 29, 2026

0.4.14

Apr 28, 2026

0.4.13

Apr 28, 2026

0.4.12

Apr 28, 2026

0.4.11

Apr 13, 2026

0.4.10

Apr 12, 2026

0.4.9

Apr 6, 2026

0.4.8

Apr 5, 2026

0.4.7

Apr 4, 2026

0.4.6

Apr 4, 2026

0.4.5

Apr 4, 2026

0.4.4

Mar 31, 2026

0.4.3

Mar 31, 2026

0.4.2

Mar 30, 2026

0.4.1

Mar 30, 2026

0.4.0

Mar 30, 2026

0.3.12

Mar 28, 2026

0.3.11

Mar 23, 2026

0.3.10

Mar 19, 2026

0.3.9

Mar 19, 2026

0.3.8

Mar 15, 2026

0.3.7

Mar 12, 2026

0.3.6 (this version)

Mar 12, 2026

0.3.5

Mar 7, 2026

0.3.4

Mar 6, 2026

0.3.3

Mar 6, 2026

0.3.2

Mar 6, 2026

0.3.1

Mar 5, 2026

0.3.0

Mar 3, 2026

0.2.3

Mar 3, 2026

0.2.2

Mar 3, 2026

0.2.1

Mar 2, 2026

0.2.0

Feb 28, 2026

0.1.0

Feb 27, 2026

Download files

Source Distribution

taskclf-0.3.6.tar.gz (35.0 MB)

Uploaded Mar 12, 2026 Source

Built Distribution

taskclf-0.3.6-py3-none-any.whl (36.8 MB)

Uploaded Mar 12, 2026 Python 3

File details

Details for the file taskclf-0.3.6.tar.gz.

File metadata

Download URL: taskclf-0.3.6.tar.gz
Upload date: Mar 12, 2026
Size: 35.0 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for taskclf-0.3.6.tar.gz
Algorithm	Hash digest
SHA256	`e8d6471a6ba29bc90fd05e7c4989c90ce4bda4f2666c36ac6d159b00180d5a3f`
MD5	`9c49e18df0b9064128581e49a1728df6`
BLAKE2b-256	`0fd15d6ee839ec44c1ec13402c2c5c690c5a9cc236e9002f1201f3a7bef5f32c`
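
One way to check a downloaded file against a published SHA256 digest is to hash it in chunks and compare. This is a generic sketch using the standard library; `sha256_stream` and `matches` are names chosen for this example.

```python
import hashlib
import hmac
import io
from typing import BinaryIO


def sha256_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large files need little memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches(stream: BinaryIO, expected_hex: str) -> bool:
    """Compare the computed digest with the published one."""
    return hmac.compare_digest(sha256_stream(stream), expected_hex.lower())


# Demo on in-memory bytes; for a real check, open the file with open(path, "rb").
demo = io.BytesIO(b"example payload")
demo_digest = sha256_stream(demo)
```

For the source distribution, the expected value would be the SHA256 digest listed in the table above.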

Provenance

The following attestation bundles were made for taskclf-0.3.6.tar.gz:

Publisher: publish.yml on fruitiecutiepie/taskclf

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: taskclf-0.3.6.tar.gz
- Subject digest: e8d6471a6ba29bc90fd05e7c4989c90ce4bda4f2666c36ac6d159b00180d5a3f
- Sigstore transparency entry: 1090162940
- Sigstore integration time: Mar 12, 2026
Source repository:
- Permalink: fruitiecutiepie/taskclf@1025ef3d473627b22b8317cc85b6bd8adf0f5961
- Branch / Tag: refs/tags/v0.3.6
- Owner: https://github.com/fruitiecutiepie
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@1025ef3d473627b22b8317cc85b6bd8adf0f5961
- Trigger Event: push

File details

Details for the file taskclf-0.3.6-py3-none-any.whl.

File metadata

Download URL: taskclf-0.3.6-py3-none-any.whl
Upload date: Mar 12, 2026
Size: 36.8 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for taskclf-0.3.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`cf76891cc75da0254a5bd9f032ee9a3ebb6a3c7c8324d6cb4be75c186452284f`
MD5	`ea74ac8200a4e63b9a929b3d370c7e2d`
BLAKE2b-256	`99d8a78bb312c4b5d096a0bc2a59abf93169f14921d13b4da3a40a9b9f98eed9`

Provenance

The following attestation bundles were made for taskclf-0.3.6-py3-none-any.whl:

Publisher: publish.yml on fruitiecutiepie/taskclf

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: taskclf-0.3.6-py3-none-any.whl
- Subject digest: cf76891cc75da0254a5bd9f032ee9a3ebb6a3c7c8324d6cb4be75c186452284f
- Sigstore transparency entry: 1090162976
- Sigstore integration time: Mar 12, 2026
Source repository:
- Permalink: fruitiecutiepie/taskclf@1025ef3d473627b22b8317cc85b6bd8adf0f5961
- Branch / Tag: refs/tags/v0.3.6
- Owner: https://github.com/fruitiecutiepie
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@1025ef3d473627b22b8317cc85b6bd8adf0f5961
- Trigger Event: push

taskclf 0.3.6
