Local CLI tax-prep briefing tool powered by xAI + RAG
taxgrok
█████████████ ██
███████████████████ ███ █████████ █████
██████░░░░░░░░░██████░ ████ ████████████ █████░
█████░░░░░░░░██████████░ █████████ █████████ █████ █████ ██████░░░░██░░ ████████ █████████ █████░██████
███░░░░░░░░██████░░░████░ █████████░██████████ ██████████░█████░░░██████░░████████████████████ ███████████░░
███░░░░░░█████░░░░░░████░░ ░████░░░░██████████░ ░██████░░░█████░░░██████░░████░░░░████░░░░████░████████░░░░░░
███░░░░█████░░░░░░░░████░░ ███████░███████████░░ ████████░░░███████░░████░░████░░░░██████░█████░██████████░░░░░
██████████░░░░░░░░░████░░░░ ██████████████████░██████░█████░░████████████░░████░░░░░██████████░░█████░██████░░░░
███████░░░░░░░░░░████░░░░░░ ░████░░░████░░███░████░░░░████░░░░░███████░░░░████░░░░░░░██████░░░░░███░░░░████░░░
██████████████░░░░ ░░░░░░░░░ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ ░░░░░░░░░░░░░░░░░░░░ ░░░░░░░░░░░░░░░░░░░░░░░░
████░█████████░░ ░░░░░░░░ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ ░░░░░░░░░░░░░░░░░░░ ░░░░░░░░░░░░░░░░░░░░░░░░
░░░░░░░░░░░░░░░ ░░░░░░ ░░░░░░░░░░░░░░░░░░░░░░░░░ ░░░░░ ░░░░░░░░░░░░ ░░░░░ ░░░░░░░░░░ ░░░░░ ░░░░░░
░░░░░░░░░░░░░░░ ░░░░ ░░░░ ░░░░ ░░░ ░░░░ ░░░░ ░░░░░░░ ░░░░ ░░░░░░ ░░░ ░░░░
░░░░░░░░░░░░░░░
░░░░ ░░░░░░░░░
taxgrok is a local Python CLI app for generating a tax-prep briefing from user documents using xAI + RAG.
Planned behavior:
- User runs taxgrok from terminal.
- Startup shows a black-themed Unicode logo + dashboard in TTY terminals (auto-fits terminal width).
- Menu lets user add one file or an entire folder.
- Accepted input types: .txt, .md, .pdf, .png.
- Before analysis, app asks taxpayer name + filing status (single/MFJ/MFS/HOH/QSS/not sure).
- App analyzes content and writes TAXGROK-<username>.md.
- Output provides practical filing guidance: what to file, checklist, common mistakes, and refund/payment expectation notes.
- One-run privacy default: uploaded remote files are deleted after report generation.
Product scope
This tool is for educational planning and organization, not legal/tax advice.
Primary goals:
- Fast local ingestion workflow for mixed document types.
- RAG-grounded report with citations and explicit unknowns.
- Up-to-date IRS grounding data used as baseline context.
- Packaged for PyPI with pip install taxgrok.
- Single-user local experience per install.
xAI API assumptions (verified Feb 10, 2026)
Current official xAI docs indicate:
- Base REST API: https://api.x.ai
- Preferred text generation API: POST /v1/responses
- Legacy chat API (still available): POST /v1/chat/completions
- Files API: POST /v1/files and related file routes (API key)
- Files attached to chats automatically trigger document retrieval (attachment_search) for RAG-style workflows.
- Collections search API: POST /v1/documents/search (API key for querying collection content).
- Collections management API base: https://management-api.x.ai (only needed if creating/managing collections programmatically).
Important auth detail:
- For this v1 design, only XAI_API_KEY is required (Files + chat/reasoning flow).
- A Management key is only needed if we later adopt Collections lifecycle operations.
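As a rough illustration of the endpoint and auth notes above, the sketch below builds (but does not send) a POST /v1/responses request using only XAI_API_KEY. The helper name build_responses_request and the JSON body shape (model plus input) are assumptions made for this example, not something confirmed by this README.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.x.ai"  # base REST API listed above


def build_responses_request(prompt, model="grok-4-fast", api_key=None):
    """Build, without sending, a POST /v1/responses request.

    Hypothetical helper: the body fields (model, input) are an assumed shape.
    """
    api_key = api_key or os.environ.get("XAI_API_KEY")
    if not api_key:
        raise RuntimeError("XAI_API_KEY is not set")
    body = json.dumps({"model": model, "input": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/responses",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Passing the result to urllib.request.urlopen would perform the call; keeping construction separate makes the request easy to inspect in tests without network access.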
Proposed architecture
- CLI Layer
  - taxgrok command with interactive menu.
  - Commands: add file, add folder, review queue, run analysis, exit.
- Ingestion Layer
  - File validation and MIME detection.
  - .txt, .md, .pdf routed to text extraction.
  - .png routed to image understanding pipeline, converted into structured text notes.
- Retrieval Layer (RAG)
  - Upload accepted files for the current run using Files API.
  - Attach uploaded files to model requests so xAI performs server-side document retrieval (attachment_search).
  - Keep retrieval ephemeral: delete uploaded files after report output.
- Tax Reasoning Layer
  - Prepend strict system prompt for tax assistant behavior.
  - Use IRS baseline corpus plus user corpus.
  - Force output schema (sections/checklists/warnings/citations).
- Output Layer
  - Render TAXGROK-<username>.md.
  - Include generation timestamp, data sources, confidence notes, and disclaimer.
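The ingestion routing described above could look roughly like the following. This is a minimal sketch that routes on file extension only; the function names route_file and collect_inputs are hypothetical, and real MIME detection is left out.

```python
from pathlib import Path

# Extensions taken from the accepted input types in this README.
TEXT_EXTENSIONS = {".txt", ".md", ".pdf"}
IMAGE_EXTENSIONS = {".png"}


def route_file(path):
    """Return 'text', 'image', or None when the extension is unsupported."""
    suffix = Path(path).suffix.lower()
    if suffix in TEXT_EXTENSIONS:
        return "text"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    return None


def collect_inputs(paths):
    """Split paths into an accepted queue (path, route) and a rejected list."""
    accepted, rejected = [], []
    for p in paths:
        route = route_file(p)
        if route is None:
            rejected.append(str(p))
        else:
            accepted.append((str(p), route))
    return accepted, rejected
```

Matching on the lowercased suffix means W2.PDF and w2.pdf are treated the same way.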
IRS grounding plan
Use authoritative IRS pages/documents as curated source list with refresh metadata:
- Forms, Instructions and Publications (latest index)
- Publication 17 (current year)
- Form 1040 Instructions (current year)
- Inflation-adjusted tax items by tax year
- Relevant IRS news releases for threshold updates
The app will record the IRS source URL + reviewed date in report metadata.
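One way to carry the source URL and reviewed date into report metadata is sketched below. The IrsSource class and render_sources_metadata function are illustrative names, and the output layout is an assumption rather than the format taxgrok actually writes.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class IrsSource:
    """One curated IRS source plus the date it was last reviewed."""
    title: str
    url: str
    reviewed: date


def render_sources_metadata(sources):
    """Render a plain-text metadata block listing each source and review date."""
    lines = ["IRS sources reviewed:"]
    for s in sources:
        lines.append(f"- {s.title}: {s.url} (reviewed {s.reviewed.isoformat()})")
    return "\n".join(lines)
```

For example, IrsSource("Publication 17", "https://www.irs.gov/", date(2026, 2, 10)) uses a placeholder URL and a made-up review date purely to show the shape of an entry.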
Packaging and distribution
Target packaging:
- pyproject.toml + setup.py setuptools package.
- Console script entrypoint: taxgrok = taxgrok.cli:main
- Python 3.9+ baseline.
- Publishable to PyPI under package name taxgrok (if available; otherwise reserve fallback).
- Required env var: XAI_API_KEY
Current status
Phases 1, 2, 3, and 4 are implemented:
- Installable local package with taxgrok CLI entrypoint.
- Interactive menu for add file, add folder, view queue, run analysis, and exit.
- Input filtering for .txt, .md, .pdf, .png.
- Config validation with clear errors for missing XAI_API_KEY.
- Local ingestion adapters for .txt, .md, .pdf, and .png.
- .png files are analyzed with xAI and normalized into markdown artifacts.
- Artifacts are uploaded run-scoped via xAI Files API and attached for retrieval generation.
- Generation uses POST /v1/responses first, with fallback to chat completions for compatibility.
- If all uploads fail, pipeline falls back to local-context mode (no remote file attachments).
- In local-context mode, extracted text (after local redaction when enabled) is sent as prompt content.
- If generation endpoints are denied (403/1010) or return empty text, pipeline falls back to local heuristic structured guidance.
- Strict JSON guidance schema is requested and rendered into final report sections.
- IRS source loader is integrated and writes reviewed-source metadata into report output.
- Remote uploaded files are deleted by default after generation.
- Report now includes federal filing checklist, what to file, reminders, mistakes, rough expectation, missing info, citations, and cleanup metadata.
- Optional local PII redaction pass before upload.
- PII-safe logging filter for runtime logs.
- Expanded unit/integration tests and CI workflow for lint/test/package checks.
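The redaction pass and PII-safe logging filter mentioned above might be approximated as follows. This is only a sketch: the two regex patterns (SSN-like and EIN-like numbers) and the PiiSafeFilter class are assumptions for illustration, and the real tool may detect more or different identifiers.

```python
import logging
import re

# Assumed patterns: 123-45-6789 (SSN-like) and 12-3456789 (EIN-like).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EIN_PATTERN = re.compile(r"\b\d{2}-\d{7}\b")


def redact(text):
    """Mask SSN-like and EIN-like numbers in a string."""
    text = SSN_PATTERN.sub("[REDACTED-SSN]", text)
    return EIN_PATTERN.sub("[REDACTED-EIN]", text)


class PiiSafeFilter(logging.Filter):
    """Logging filter that redacts the fully formatted message of each record."""

    def filter(self, record):
        # Format msg % args first so identifiers passed as args are also masked.
        record.msg = redact(record.getMessage())
        record.args = None
        return True  # never drop the record, only rewrite it
```

Attaching the filter with logging.getLogger("taxgrok").addFilter(PiiSafeFilter()) would apply it to records emitted directly on that logger; the logger name here is also hypothetical.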
Quickstart (local development)
- Create and activate a virtual environment.
- Install the project.
- Export XAI_API_KEY.
- Run taxgrok.
python3 -m venv .venv
source .venv/bin/activate
pip install .
export XAI_API_KEY="your-xai-api-key"
taxgrok
Global command setup (run from anywhere)
If taxgrok is not found outside this repo, create a global launcher symlink:
ln -sf "$PWD/.venv/bin/taxgrok" "$HOME/.local/bin/taxgrok"
Then verify:
command -v taxgrok
taxgrok --help
If command -v taxgrok is empty, ensure ~/.local/bin is in your shell PATH.
For zsh, add this to ~/.zshrc if needed:
export PATH="$HOME/.local/bin:$PATH"
You can also put config in a .env file in the repo/runtime directory:
cp .env.example .env
# then edit .env
Optional runtime env vars:
- TAXGROK_MODEL (default: grok-4-fast)
- TAXGROK_TIMEOUT_SECONDS (default: 90)
- TAXGROK_XAI_BASE_URL (default: https://api.x.ai)
- TAXGROK_KEEP_REMOTE_FILES=1 to disable auto-delete during debugging
- TAXGROK_REFRESH_IRS_SOURCES=1 to run live IRS URL HEAD checks before generation
- TAXGROK_LOCAL_REDACTION=0 to disable local PII redaction (enabled by default)
- TAXGROK_NO_STYLE=1 to force plain menu mode (skip ASCII intro/dashboard)
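To make the defaults above concrete, here is a minimal sketch of reading these variables into a settings object. The Settings dataclass and load_settings function are hypothetical names, and treating only the exact value 1 as "on" is an assumption about how the flags are parsed.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    api_key: str
    model: str = "grok-4-fast"
    timeout_seconds: int = 90
    base_url: str = "https://api.x.ai"
    keep_remote_files: bool = False
    refresh_irs_sources: bool = False
    local_redaction: bool = True
    no_style: bool = False


def _flag(env, name, default):
    """Read a 0/1 style flag; an unset variable keeps the default."""
    raw = env.get(name)
    return default if raw is None else raw.strip() == "1"


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    api_key = env.get("XAI_API_KEY", "").strip()
    if not api_key:
        raise ValueError("XAI_API_KEY is required; export it or add it to .env")
    return Settings(
        api_key=api_key,
        model=env.get("TAXGROK_MODEL", "grok-4-fast"),
        timeout_seconds=int(env.get("TAXGROK_TIMEOUT_SECONDS", "90")),
        base_url=env.get("TAXGROK_XAI_BASE_URL", "https://api.x.ai"),
        keep_remote_files=_flag(env, "TAXGROK_KEEP_REMOTE_FILES", False),
        refresh_irs_sources=_flag(env, "TAXGROK_REFRESH_IRS_SOURCES", False),
        local_redaction=_flag(env, "TAXGROK_LOCAL_REDACTION", True),
        no_style=_flag(env, "TAXGROK_NO_STYLE", False),
    )
```

Accepting an explicit mapping keeps the loader testable without touching the real process environment.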
Phase 3 notes:
- Startup includes a taxgrok Unicode intro and a dashboard-style menu in TTY terminals.
- Logo rendering is width-aware and auto-compacts for smaller terminal windows.
- pypdf is included as a package dependency for local PDF text extraction.
- If local PDF extraction quality is poor, the pipeline attempts an xAI OCR fallback before report generation.
- If OCR fallback still returns weak text, the original PDF is uploaded for retrieval as a final fallback.
- If structured JSON parsing fails, report generation falls back to raw model text and records a warning.
- If you see repeated 403 + error code: 1010, try TAXGROK_XAI_BASE_URL=https://us-east-1.api.x.ai and verify key permissions with xAI support.
- If all xAI generation endpoints fail, report generation continues with a local heuristic fallback and explicit low-confidence warnings.
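The generation and parsing fallbacks in these notes can be pictured as a simple ordered chain. The sketch below assumes each endpoint is wrapped in a plain callable; generate_with_fallback and parse_guidance are illustrative names, not taxgrok's internal API.

```python
import json


def generate_with_fallback(prompt, backends, heuristic):
    """Try each (name, callable) backend in order, then a local heuristic.

    Returns (text, backend_name, warnings).
    """
    warnings = []
    for name, call in backends:
        try:
            text = call(prompt)
        except Exception as exc:  # e.g. a 403 / error code 1010 denial
            warnings.append(f"{name} failed: {exc}")
            continue
        if text and text.strip():
            return text, name, warnings
        warnings.append(f"{name} returned empty text")
    warnings.append("all remote endpoints failed; low-confidence local heuristic used")
    return heuristic(prompt), "local-heuristic", warnings


def parse_guidance(text):
    """Parse structured JSON guidance; keep raw text plus a warning on failure."""
    try:
        data = json.loads(text)
        if isinstance(data, dict):
            return data, None
    except json.JSONDecodeError:
        pass
    return {"raw_text": text}, "structured JSON parsing failed; using raw model text"
```

A caller would pass the /v1/responses wrapper first and the chat completions wrapper second, so the order of the list encodes the preference described above.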
CLI debug/security options:
- taxgrok --debug-keep-remote-files
- taxgrok --refresh-irs-sources
- taxgrok --no-style
- taxgrok --local-redaction
- taxgrok --no-local-redaction
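A parser accepting the flags listed above could be sketched with argparse as follows. This is an assumption about how the options might be wired, not the project's actual cli module; it relies on argparse.BooleanOptionalAction, which is available from Python 3.9 (the stated baseline).

```python
import argparse


def build_parser():
    """Hypothetical parser covering the debug/security flags listed above."""
    parser = argparse.ArgumentParser(prog="taxgrok")
    parser.add_argument("--debug-keep-remote-files", action="store_true",
                        help="keep uploaded remote files after the run")
    parser.add_argument("--refresh-irs-sources", action="store_true",
                        help="check IRS source URLs before generation")
    parser.add_argument("--no-style", action="store_true",
                        help="force plain menu mode")
    # BooleanOptionalAction registers both --local-redaction and
    # --no-local-redaction; redaction is assumed on by default.
    parser.add_argument("--local-redaction",
                        action=argparse.BooleanOptionalAction, default=True,
                        help="toggle the local PII redaction pass")
    return parser
```

Using a single paired option keeps the two redaction flags from contradicting each other; whichever appears last on the command line wins.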
Interactive run behavior:
- Analysis start prompts for taxpayer name and filing status before uploading/processing.
- Report filename uses the entered name (TAXGROK-<sanitized-name>.md) instead of OS username.
- While analysis runs, CLI shows a processing indicator until report generation completes.
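The exact sanitization rules are not spelled out here, so the following is only one plausible sketch: collapse every run of non-alphanumeric characters into a hyphen and fall back to a fixed token when nothing is left. The function name report_filename and the fallback value unknown are assumptions.

```python
import re


def report_filename(taxpayer_name):
    """Build a TAXGROK-<sanitized-name>.md filename from free-form input."""
    # Replace runs of anything outside A-Z, a-z, 0-9 with a single hyphen,
    # which also strips path separators and dots from the name.
    cleaned = re.sub(r"[^A-Za-z0-9]+", "-", taxpayer_name.strip()).strip("-")
    return f"TAXGROK-{cleaned or 'unknown'}.md"
```

With this rule, an input such as "Jane Q. Public" becomes TAXGROK-Jane-Q-Public.md, and input containing ../ cannot escape the output directory.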
GitHub safety defaults
- .env and .env.* are ignored; keep secrets in .env only and never commit real keys.
- Generated reports (TAXGROK-*.md) are ignored by default.
- Local tax document folders are ignored by default (morales-taxes-2025/, user-docs/, reports/).
- Keep only sanitized examples in the repo (.env.example and synthetic test fixtures).
Document quality tips
- Prefer text-based PDFs over scanned image PDFs when possible.
- For scans/screenshots, use high resolution and clear contrast (avoid blur/shadows).
- Crop large screenshots to just the relevant form area before upload.
- If extraction warnings mention missing/unclear fields, re-export/re-scan and rerun analysis.
Quality and release
- CI workflow: .github/workflows/ci.yml
- Security notes: SECURITY.md
- Changelog: CHANGELOG.md
- Release and rollback checklist: RELEASE.md
Locked v1 decisions
- Single user profile per install
- One-time use workflow
  - Remote uploaded files are deleted after report generation.
  - No persistent cloud index by default.
- Federal scope only
  - IRS/federal guidance only in v1 (no state-specific coverage).
- PNG strategy
  - PNG screenshots are analyzed and converted into text notes before reasoning.
- Estimate strictness
  - Report provides rough expectation ranges and qualitative drivers only, with explicit disclaimer.
References used for planning
xAI docs:
- https://docs.x.ai/docs/overview
- https://docs.x.ai/docs/api-reference
- https://docs.x.ai/docs/guides/rag
- https://docs.x.ai/docs/guides/collections
- https://docs.x.ai/docs/guides/chat-with-files
- https://docs.x.ai/docs/guides/images
IRS sources:
Download files
File details
Details for the file taxgrok-0.1.2.tar.gz.
File metadata
- Download URL: taxgrok-0.1.2.tar.gz
- Upload date:
- Size: 44.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 13228ce30c17d79c0b7710a984ee9272390c3b78842486fd2aba57ca61a88246 |
| MD5 | 4aa8204d4adcd95268e20cdfc3fb31cb |
| BLAKE2b-256 | 0e1b877994e50331a1a12066fc171916c33fd0e69f853cbc3aa4215e83fd63b7 |
Provenance
The following attestation bundles were made for taxgrok-0.1.2.tar.gz:
Publisher: pypi-publish.yml on lalomorales22/taxgrok
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: taxgrok-0.1.2.tar.gz
  - Subject digest: 13228ce30c17d79c0b7710a984ee9272390c3b78842486fd2aba57ca61a88246
  - Sigstore transparency entry: 939439756
  - Sigstore integration time:
- Permalink: lalomorales22/taxgrok@c3c9be3f2a2104af518efc11fbc4b1f2fe0cb848
- Branch / Tag: refs/heads/main
- Owner: https://github.com/lalomorales22
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@c3c9be3f2a2104af518efc11fbc4b1f2fe0cb848
- Trigger Event: workflow_dispatch
File details
Details for the file taxgrok-0.1.2-py3-none-any.whl.
File metadata
- Download URL: taxgrok-0.1.2-py3-none-any.whl
- Upload date:
- Size: 37.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 810b2e7c5cde879df5488adae83b5586c1d556462b5e60a6391cec7de76c0eba |
| MD5 | 5d798193ae5d77d5bc8ec2e538b7c74f |
| BLAKE2b-256 | 55579d0542cf71524266e74faf93174d128576a5a430f0500f3bfb703a0f5e1d |
Provenance
The following attestation bundles were made for taxgrok-0.1.2-py3-none-any.whl:
Publisher: pypi-publish.yml on lalomorales22/taxgrok
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: taxgrok-0.1.2-py3-none-any.whl
  - Subject digest: 810b2e7c5cde879df5488adae83b5586c1d556462b5e60a6391cec7de76c0eba
  - Sigstore transparency entry: 939439764
  - Sigstore integration time:
- Permalink: lalomorales22/taxgrok@c3c9be3f2a2104af518efc11fbc4b1f2fe0cb848
- Branch / Tag: refs/heads/main
- Owner: https://github.com/lalomorales22
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yml@c3c9be3f2a2104af518efc11fbc4b1f2fe0cb848
- Trigger Event: workflow_dispatch