Verify that your sources still say what you think they say
apysource
AIs hallucinate citations. Link rot silently breaks the real ones. Silent edits change what your sources actually say.
apysource is an automated verifier: define what text you expect at which URL, and it fetches, caches, and checks that it still matches. Use it as a CI gate, a research notebook guard, or a self-correction layer for AI-generated content — the tool can verify its own output.
Install
pip install apysource
Requires Python 3.12+.
Quick start
1. Define your sources
Create sources.yaml:
sources:
  - label: "UN Charter"
    url: "https://www.un.org/en/about-us/un-charter/full-text"
    type: text/html
    fragments:
      - label: "Preamble"
        section: "Preamble"
        snippet: "to save succeeding generations from the scourge of war"
      - label: "Article 2 principles"
        section: "Article 2, paragraph 1"
        snippet: "The Organization and its Members, in pursuit of the Purposes stated in Article 1, shall act in accordance with the following Principles"
2. Check
apysource check sources.yaml
apysource fetches the page (caching it on disk), finds the section by name, and checks that your snippet appears in the result. Cached pages aren't re-fetched on subsequent runs.
======================================================================
apysource Verification Report
======================================================================
[PASS] Fragments: cache resolution.................. 2/2
[PASS] Fragments: content extraction................ 2/2
[PASS] Fragments: snippet verified.................. 2/2
======================================================================
Summary: 3 PASS, 0 FAIL, 0 WARN
EXIT CODE: 0 (all checks passed)
======================================================================
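The fetch, cache, and check cycle described above can be sketched with the standard library alone. This is an illustration of the idea, not apysource's implementation: the function names, the cache file layout (one file per URL, named by a SHA-256 of the URL), and the whitespace normalization are all assumptions, and section extraction is left out.

```python
import hashlib
import re
import tempfile
from pathlib import Path
from typing import Callable
from urllib.request import urlopen


def fetch_url(url: str) -> str:
    """Download a page and decode it as UTF-8 (needs network access)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def cached_fetch(url: str, cache_dir: Path,
                 fetcher: Callable[[str], str] = fetch_url) -> str:
    """Return the page body, downloading only when it is not cached on disk."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical cache layout: one file per URL, named by the URL's SHA-256.
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".txt"
    cache_file = cache_dir / name
    if cache_file.exists():
        return cache_file.read_text(encoding="utf-8")
    body = fetcher(url)
    cache_file.write_text(body, encoding="utf-8")
    return body


def normalize(text: str) -> str:
    """Collapse whitespace runs so line wrapping does not affect matching."""
    return re.sub(r"\s+", " ", text).strip()


def snippet_found(page_text: str, snippet: str) -> bool:
    """True if the snippet occurs in the page text after normalization."""
    return normalize(snippet) in normalize(page_text)


if __name__ == "__main__":
    # Offline demo: a stand-in fetcher replaces the real download.
    def fake_fetcher(url: str) -> str:
        return "to save succeeding\n  generations from the scourge of war"

    with tempfile.TemporaryDirectory() as tmp:
        page = cached_fetch("https://example.org/charter", Path(tmp), fake_fetcher)
        print(snippet_found(page, "to save succeeding generations from the scourge of war"))
```

Passing the fetcher as a parameter keeps the cache logic testable without network access; a second call with the same URL and cache directory reads the file instead of calling the fetcher again.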
3. Discover
Use locate to find how apysource would target a snippet, then add to save it:
# Find where a snippet lives in a page
apysource locate "https://www.un.org/en/about-us/un-charter/full-text" \
    "to save succeeding generations from the scourge of war"

# Add it directly to your sources file
apysource add sources.yaml "https://www.un.org/en/about-us/un-charter/full-text" \
    "to save succeeding generations from the scourge of war" \
    --label "Preamble"
locate outputs a YAML fragment you can paste directly. add writes it to the file for you. Use locate --ttl for Turtle output with full Web Annotation alignment.
Targeting content
apysource supports several ways to pinpoint where in a document your snippet lives:
| Targetter | Key | Example | Best for |
|---|---|---|---|
| Section | section | "Chapter I, Article 1" | Structured documents (HTML, Markdown, Wikitext, RFC) |
| CSS selector | selector | "div.content p" | HTML pages |
| Line range | lines | "40-41" | Plain text, RFCs |
| Repo location | location | "chapter:1" | Repository modules (Gutenberg, Wikisource, etc.) |
Section selectors are the most versatile — they work across HTML, Markdown, Wikitext, and RFC plain text. They support roman numeral equivalence (Chapter IV = Chapter 4), nested paths (Chapter I, Article 1, paragraph 2), and quoted titles ('The Fox and the Grapes').
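To make the roman numeral equivalence and nested paths concrete, here is one way such a comparison could work. It is a sketch under assumptions, not apysource's matching code: `roman_to_int` and `normalize_section` are hypothetical helpers, only uppercase numerals are recognized, and titles that themselves contain commas are not handled.

```python
import re
from typing import Optional, Tuple

# Well-formed uppercase Roman numerals from 1 to 3999; the lookahead rejects "".
_ROMAN = re.compile(
    r"^(?=[IVXLCDM])M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$"
)
_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(token: str) -> Optional[int]:
    """Return the value of a well-formed uppercase Roman numeral, else None."""
    if not _ROMAN.match(token):
        return None
    total = 0
    for ch, nxt in zip(token, token[1:] + " "):
        value = _VALUES[ch]
        # Subtractive notation: a smaller value before a larger one is subtracted.
        if nxt in _VALUES and _VALUES[nxt] > value:
            total -= value
        else:
            total += value
    return total


def normalize_section(path: str) -> Tuple[str, ...]:
    """Split a nested path on commas and normalize each step for comparison."""
    steps = []
    for step in path.split(","):
        words = []
        for word in step.split():
            number = roman_to_int(word)
            # Numerals become digits; every other word is compared case-insensitively.
            words.append(str(number) if number is not None else word.lower())
        steps.append(" ".join(words))
    return tuple(steps)


# Both spellings normalize to the same key.
print(normalize_section("Chapter IV") == normalize_section("Chapter 4"))
print(normalize_section("Chapter I, Article 1, paragraph 2"))
```

A real matcher would need more context than this: a standalone capital such as "I" or "C" in a heading is treated as a numeral here, which is acceptable for labels like "Chapter I" but wrong for ordinary prose.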
CSS selectors target HTML elements directly. Useful when section headings aren't available or you need a specific element.
Line ranges extract by line number (1-based, inclusive). Useful for plain text and RFCs.
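The 1-based, inclusive convention is easy to get off by one, so a small sketch may help. `extract_lines` is a hypothetical helper, not part of apysource; accepting a single number such as "40" and clipping ranges that run past the end of the text are assumptions of this example.

```python
def extract_lines(text: str, line_range: str) -> str:
    """Return lines start..end (1-based, inclusive) joined with newlines."""
    start_text, _, end_text = line_range.partition("-")
    start = int(start_text)
    end = int(end_text) if end_text else start  # assumption: "40" means one line
    if start < 1 or end < start:
        raise ValueError(f"invalid line range: {line_range!r}")
    lines = text.splitlines()
    # Line 1 sits at index 0; the slice end is exclusive, so `end` needs no +1.
    return "\n".join(lines[start - 1:end])


sample = "\n".join(f"line {n}" for n in range(1, 51))
print(extract_lines(sample, "40-41"))
```

With the 50-line sample, the range "40-41" yields exactly two lines, "line 40" and "line 41".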
If no targetter is given, apysource checks the full page text for your snippet.
YAML schema
Each YAML file has a top-level sources list. Each source has nested fragments.
Source properties
| Key | What it does |
|---|---|
| label | Name of the source (required) |
| url | URL to fetch (required) |
| type | IANA media type: text/html, text/plain, text/markdown, etc. Short names (html, plain-text) also accepted. Auto-detected if omitted. |
| language | Language code, RFC 5646 (metadata) |
| title | Document title (metadata) |
| date | Publication or access date (metadata) |
| part_of | Parent source label (for hierarchical sources) |
| isbn | International Standard Book Number |
| doi | Digital Object Identifier |
| publisher | Publisher name |
| edition | Edition or version |
| license | License URI |
Fragment properties
| Key | What it does |
|---|---|
| label | Name of the fragment (required) |
| snippet | The text you expect to find |
| selector | CSS selector to narrow extraction (HTML) |
| lines | Line range to extract, e.g. 30-35 |
| section | Human-readable section selector, e.g. Chapter I, Article 1 |
| location | Repo-specific location hint (e.g. chapter:1) |
| page_start | Starting page number (for print sources) |
| page_end | Ending page number (for print sources) |
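The required keys listed in the two tables can be checked before any fetching happens. The sketch below is an assumption-laden illustration rather than apysource's own validation: `validate_sources` is a hypothetical helper that takes the already-parsed document as a plain dict (for example, what PyYAML's `yaml.safe_load` returns) and only checks the keys marked required above.

```python
from typing import Any, Dict, List


def validate_sources(document: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    sources = document.get("sources")
    if not isinstance(sources, list):
        return ["top-level 'sources' must be a list"]
    problems: List[str] = []
    for i, source in enumerate(sources):
        where = f"sources[{i}]"
        if not isinstance(source, dict):
            problems.append(f"{where}: expected a mapping")
            continue
        for key in ("label", "url"):  # marked required for sources
            if not source.get(key):
                problems.append(f"{where}: missing required key '{key}'")
        for j, fragment in enumerate(source.get("fragments") or []):
            # 'label' is the only fragment key marked required.
            if not isinstance(fragment, dict) or not fragment.get("label"):
                problems.append(
                    f"{where}.fragments[{j}]: missing required key 'label'"
                )
    return problems


document = {
    "sources": [
        {
            "label": "UN Charter",
            "url": "https://www.un.org/en/about-us/un-charter/full-text",
            "fragments": [{"label": "Preamble", "section": "Preamble"}],
        },
        {"label": "No URL yet", "fragments": [{"snippet": "unlabeled"}]},
    ]
}
for problem in validate_sources(document):
    print(problem)
```

Collecting every problem in a list, instead of raising on the first one, lets a single run report all the entries that need fixing.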
CLI
apysource [-c config.toml] <command> [args...]
| Command | What it does |
|---|---|
| check [sources.yaml] [--provenance file.ttl] | Fetch, extract, and verify all snippets |
| locate <url> <snippet> | Find a snippet in a page, show the targetter |
| add <file> <url> <snippet> | Locate a snippet and add it to a YAML file |
| validate | Check that .ttl files parse correctly (with optional SHACL) |
Without -c, apysource uses built-in defaults (all built-in repos enabled). Pass -c config.toml to customize repos and HTTP settings (requires pip install apysource[dev]).
Pass --provenance file.ttl to check to write a PROV-O graph recording which fragments were verified, when, and by which activity.
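Because the report ends with an exit code (0 when all checks pass), check can gate a build from a script. The sketch below wraps the command with `subprocess`; it assumes that a failed check exits non-zero, which the sample report implies but does not state, and `run_gate` and `main` are hypothetical names.

```python
import subprocess
import sys
from typing import Sequence


def run_gate(command: Sequence[str]) -> bool:
    """Run a command and report whether it exited with status 0."""
    completed = subprocess.run(list(command), check=False)
    return completed.returncode == 0


def main() -> int:
    """Return a process exit status suitable for a CI step."""
    # Same invocation as in the Quick start; the file name is a placeholder.
    if run_gate(["apysource", "check", "sources.yaml"]):
        return 0
    print("source verification failed", file=sys.stderr)
    return 1
```

A CI step would call `sys.exit(main())`. If apysource is not installed, `subprocess.run` raises `FileNotFoundError`, which also fails the step.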
Advanced Features
For RDF support, Python API, custom source repositories and more, see docs/advanced.md.
License
ISC