Comparative attribution analysis tool for take-home hiring submissions on GitHub.
byline
byline is a comparative attribution toolkit for hiring reviewers. It compares the writing surface of a take-home submission (its READMEs, comments, and shell scripts) against the same candidate's prior public GitHub writing, and reports where the two diverge. Output is framed as a set of stylistic signals for human review, not as a judgement about who wrote the code.
- Source: https://github.com/rupivbluegreen/byline
- PyPI: https://pypi.org/project/byline-audit/
- Author / maintainer: rupivbluegreen
- License: Apache 2.0
What this is, what this isn't
byline measures eight stylistic signals on a candidate's submission (em-dash density, emoji-in-headers ratio, sentence length, lexical diversity, typo rate, vocabulary sophistication, banner-comment density, and progress-UX scaffolding), compares each against the candidate's own GitHub baseline, and surfaces the deltas. It also runs a catalogue of phrasing and structural fingerprints against the prose, plus disproportion checks on doc-to-code ratio, diagram inventory, numbered-diagram patterns, and comment density. The intent is to give a reviewer one extra structured data point when deciding whether a submission warrants a follow-up conversation.
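To make the per-metric comparison concrete, here is a minimal sketch of three of the eight signals and their deltas. It is not byline's implementation: the function names, the tokenisation, and the per-1,000-words normalisation are assumptions chosen for illustration.

```python
import re


def style_metrics(text: str) -> dict[str, float]:
    """Compute three illustrative stylistic metrics (hypothetical helper)."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    return {
        # Em-dashes per 1,000 words (normalisation is an assumption).
        "em_dash_density": 1000 * text.count("\u2014") / n_words if n_words else 0.0,
        # Mean number of words per sentence.
        "mean_sentence_length": n_words / len(sentences) if sentences else 0.0,
        # Type-token ratio: distinct words over total words.
        "lexical_diversity": len(set(words)) / n_words if n_words else 0.0,
    }


def deltas(submission: str, baseline: str) -> dict[str, float]:
    """Submission minus baseline, metric by metric."""
    sub, base = style_metrics(submission), style_metrics(baseline)
    return {name: sub[name] - base[name] for name in sub}
```

A positive delta means the submission scores higher on that metric than the candidate's baseline; what counts as a large delta is a separate scoring question.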
byline does not classify writing as machine-authored or human-authored. Nothing in the output asserts authorship, and the framing is comparative attribution analysis rather than detection. The tool is built around the assumption that humans and machines both write prose, and the only signal worth surfacing is whether the submission diverges sharply from the candidate's own observable writing history. Treat every report as one input among many, alongside the interview, code review, and reference checks.
Install
macOS users: see docs/install-macos.md for a step-by-step setup.
byline ships in two modes. The base install is deterministic and has no LLM dependency; the [llm] extra adds the Anthropic and OpenAI SDKs and unlocks the subcommands that call an LLM.
The distribution is published on PyPI as byline-audit (the byline name on PyPI was already taken by an unrelated abandoned project); the import name and CLI entry point are still byline.
# Base install (deterministic features only)
pip install byline-audit
# Full install (includes LLM-powered subcommands)
pip install 'byline-audit[llm]'
The [llm] extra installs both the Anthropic and OpenAI SDKs. byline defaults to Claude but can be pointed at OpenAI's API or any OpenAI-compatible self-hosted endpoint (Ollama, vLLM, LM Studio, llama.cpp) via the BYLINE_LLM_PROVIDER / OPENAI_BASE_URL env vars. See docs/llm-providers.md for setup snippets.
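As a rough illustration of how the two env vars could interact, the sketch below resolves a provider and an optional base URL from the environment. The function name and the provider values `"anthropic"` and `"openai"` are assumptions made for this example, not documented byline behaviour; docs/llm-providers.md is the authority on accepted values.

```python
import os


def resolve_llm_settings(env=None) -> dict:
    """Hypothetical resolver; the provider value names are assumed."""
    env = os.environ if env is None else env
    # Assumption: an unset variable means the Claude default.
    provider = env.get("BYLINE_LLM_PROVIDER", "anthropic")
    settings = {"provider": provider}
    if provider == "openai" and env.get("OPENAI_BASE_URL"):
        # A self-hosted OpenAI-compatible endpoint overrides the hosted API.
        settings["base_url"] = env["OPENAI_BASE_URL"]
    return settings
```

For example, a local Ollama server exposes an OpenAI-compatible API at `http://localhost:11434/v1`, so under these assumptions that URL would go in `OPENAI_BASE_URL`.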
The questions and chat subcommands require both the [llm] extra and a working API key for the configured provider. The align subcommand runs deterministically by default; the optional semantic pass is enabled via flag and also needs the extra and a provider key. Every other command (scan, baseline, audit, and the deterministic align) works on the base install.
Quickstart
# Audit (now includes history forensics, alignment, voice, boilerplate by default)
byline audit ./candidate-submission --candidate candidate-username
# Standalone alignment check
byline align ./candidate-submission
# Standalone alignment with semantic mode (requires LLM)
byline align ./candidate-submission --with-llm
# Generate interview questions (requires LLM)
byline questions ./candidate-submission --candidate candidate-username -n 8
# Open an interactive chat session over the audit (requires LLM)
byline chat ./candidate-submission --candidate candidate-username
# Just the target, without a baseline
byline scan ./candidate-submission
# Build the baseline alone
byline baseline candidate-username
A byline scan run produces a short Markdown summary. The shape (illustrative, abbreviated):
## Overall signal
**Overall signal: mixed.** Some metrics diverge from baseline while others
align; treat as a soft signal worth a closer look.
## Fingerprint findings
### README.md
- phrase / <pattern>: "<excerpt around the matched phrase>"
- structure / emoji_header_cluster: 6 emoji-prefixed headers
- phrase / <pattern>: "<excerpt around the matched phrase>"
Run byline --help (or byline <command> --help) for the full flag list.
What's new in v0.2
- Commit history forensics: timeline burst detection, first-commit paste detection, commit-message style profiling against the README, and author identity drift across the commit log.
- Documentation-implementation alignment: a deterministic pass cross-checks README-documented CLI flags, env vars, commands, and dependencies against the code; an optional semantic pass under [llm] adds LLM-driven gap detection.
- Within-repo self-baseline: compares the stylistic profile of commit messages, the README, and code comments, and reports the within-repo divergence as consistent, notable, or significant.
- Voice and AI-use disclosure: first-person voice density in the README is reported as a positive presence signal, and explicit AI-use disclosure is surfaced as a positive trust signal that shifts the overall label toward aligned.
- Boilerplate meta-file density: measures how completely a canonical meta-file slate is populated, with a severity bump for small repos where a full set is more notable.
- New CLI subcommands: byline align (deterministic by default), byline questions (interview-question generator, LLM required), and byline chat (interactive REPL over the audit, LLM required).
- New opt-out flag on byline audit to skip the commit-forensics pass when the input is a directory of files rather than a real git repo.
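The deterministic alignment idea can be sketched for the CLI-flag case alone: collect long flags mentioned in the README, collect long flags that appear in the source, and report the set differences. The regex and function name are assumptions for illustration; byline's own pass also covers env vars, commands, and dependencies and may work quite differently.

```python
import re

# Long options such as --dry-run; the lookbehind avoids matching inside
# longer tokens, and a Markdown rule like "---" does not match.
FLAG = re.compile(r"(?<![\w-])--[a-z][a-z0-9-]*")


def flag_alignment(readme: str, source: str) -> dict[str, set[str]]:
    """Hypothetical check: compare flags named in docs with flags in code."""
    documented = set(FLAG.findall(readme))
    implemented = set(FLAG.findall(source))
    return {
        "documented_not_implemented": documented - implemented,
        "implemented_not_documented": implemented - documented,
    }
```

A flag that is documented but never appears in the code is the more interesting gap for a reviewer, since it suggests the README describes behaviour the submission does not have.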
Limitations
This report presents stylistic signals comparing a candidate's submission to their own observable writing baseline. It is one input into a hiring decision, never a determination of authorship, and must not be treated as evidence of misconduct. False positives are possible — non-native English writers, proofread submissions, tutorial-derived code, and team-authored repos can all produce divergent signals.
Gameability. A candidate who knows byline is part of the process can adjust their style: rewrite the README in their own voice, strip emoji headers, drop the banner-comment scaffolding from shell scripts. The tool is most useful when the comparison is run silently, and when the reviewer treats a clean report as no signal rather than positive evidence.
False positives. Several plausible candidate profiles can produce divergent signals without any underlying authorship problem. Non-native English writers may show shifted typo rates and sentence-length distributions versus a baseline collected from native-language prose. Candidates who proofread their submission heavily can look stylistically different from their casual GitHub commits. Tutorial-derived code carries the tutorial author's voice. Team-authored repositories blend multiple writers. Candidates with sparse public GitHub history have small baselines, and small baselines produce noisy deltas; the report flags this case.
Scope. byline is English-only and GitHub-only. It reads prose and shell scripts; it does not analyse the code itself for stylistic signals, and it does not attempt to identify which model (if any) generated a passage. Those questions are out of scope.
History forensics requires a real git repo. The commit-history pass (history.timeline, history.messages, history.identity, history.file_evolutions) reads from .git. A directory of files that was extracted from a zip or copied without its git history will produce a zeroed HistoryFindings result. When you want history signals, clone the submission with its full git history rather than unpacking a snapshot. The audit command also exposes a flag to skip this pass entirely when you know the input has no usable history.
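Before running an audit, a reviewer can check whether a submission directory carries git metadata at all. This is a minimal sketch with a hypothetical helper name; it only tests for the presence of `.git` and says nothing about whether the history is complete.

```python
from pathlib import Path


def has_git_metadata(submission: Path) -> bool:
    """Return True if the directory looks like a git checkout."""
    git = submission / ".git"
    # A normal clone has a .git directory; worktrees and submodules
    # store .git as a small file pointing at the real metadata.
    return git.is_dir() or git.is_file()
```

If this returns False, the history pass has nothing to read, which is the case the skip flag on the audit command is meant for.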
Ethical use
Disclose to candidates that this analysis is part of your review process before they submit. Use the report as one input alongside the interview, code walkthrough, and reference checks, never as the sole basis for a decision. Do not share the report with the candidate as an accusation; if a follow-up conversation is warranted, ask open questions about how the submission was put together and let the candidate explain. Reviewer judgement matters more than the tool's output. When in doubt, weight the human signal.
How it works
The audit pipeline computes a StyleProfile for the candidate's baseline corpus and a matching profile for the submission, then emits a per-metric ComparativeDelta with a severity bucket (aligned, notable, significant, extreme). A separate pass scans Markdown and shell files for catalogued phrasing and structural patterns; a third pass measures doc-to-code, diagram inventory, and comment-density ratios. The three streams combine into a single overall signal label. See docs/methodology.md for the full breakdown of metrics, fingerprint patterns, disproportion heuristics, and severity scoring.
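The severity bucketing can be pictured with a small sketch. The class name `ComparativeDelta` and the four bucket labels come from the description above; the fields, the relative-change formula, and the threshold values are assumptions for illustration only. docs/methodology.md describes the actual scoring.

```python
from dataclasses import dataclass

SEVERITIES = ("aligned", "notable", "significant", "extreme")


@dataclass(frozen=True)
class ComparativeDelta:
    # Field layout is hypothetical, not byline's real class definition.
    metric: str
    baseline: float
    submission: float
    severity: str


def bucket(relative_change: float, thresholds=(0.25, 0.5, 1.0)) -> str:
    """Map a relative change to a label; thresholds are made-up values."""
    magnitude = abs(relative_change)
    for label, limit in zip(SEVERITIES, thresholds):
        if magnitude < limit:
            return label
    return SEVERITIES[-1]


def compare(metric: str, baseline: float, submission: float) -> ComparativeDelta:
    """Build a delta from a baseline value and a submission value."""
    if baseline == 0:
        # No baseline signal: any non-zero submission value is unbounded.
        relative = 0.0 if submission == 0 else float("inf")
    else:
        relative = (submission - baseline) / abs(baseline)
    return ComparativeDelta(metric, baseline, submission, bucket(relative))
```

With these assumed thresholds, a submission value 40% above the baseline lands in notable, and one 150% above lands in extreme.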
Contributing
Run tests with pytest. Lint with ruff check .. PRs are welcome; please run the full test suite before submitting, and keep new prose in the docs framed comparatively (signals, divergence, indicators) rather than as detection language.
License
Apache 2.0. See LICENSE.