
Verify that your sources still say what you think they say


apysource


AIs hallucinate citations. Link rot silently breaks the real ones. Silent edits change what your sources actually say.

apysource is an automated verifier: define what text you expect at which URL, and it fetches, caches, and checks that it still matches. Use it as a CI gate, a research notebook guard, or a self-correction layer for AI-generated content — the tool can verify its own output.

Install

pip install apysource

Requires Python 3.12+.

Quick start

1. Define your sources

Create sources.yaml:

sources:
  - label: "UN Charter"
    url: "https://www.un.org/en/about-us/un-charter/full-text"
    type: text/html
    fragments:
      - label: "Preamble"
        section: "Preamble"
        snippet: "to save succeeding generations from the scourge of war"
      - label: "Article 2 principles"
        section: "Article 2, paragraph 1"
        snippet: "The Organization and its Members, in pursuit of the Purposes stated in Article 1, shall act in accordance with the following Principles"

2. Check

apysource check sources.yaml

apysource fetches the page (caching it on disk), finds the section by name, and checks that your snippet appears in the result. Cached pages aren't re-fetched on subsequent runs.

======================================================================
  apysource Verification Report
======================================================================

  [PASS] Fragments: cache resolution.................. 2/2
  [PASS] Fragments: content extraction................ 2/2
  [PASS] Fragments: snippet verified.................. 2/2

  ======================================================================
  Summary: 3 PASS, 0 FAIL, 0 WARN
  EXIT CODE: 0 (all checks passed)
  ======================================================================

3. Discover

Use locate to find how apysource would target a snippet, then add to save it:

# Find where a snippet lives in a page
apysource locate "https://www.un.org/en/about-us/un-charter/full-text" \
  "to save succeeding generations from the scourge of war"

# Add it directly to your sources file
apysource add sources.yaml "https://www.un.org/en/about-us/un-charter/full-text" \
  "to save succeeding generations from the scourge of war" \
  --label "Preamble"

locate outputs a YAML fragment you can paste directly. add writes it to the file for you. Use locate --ttl for Turtle output with full Web Annotation alignment.

Targeting content

apysource supports several ways to pinpoint where in a document your snippet lives:

| Targetter | Key | Example | Best for |
|---|---|---|---|
| Section | section | "Chapter I, Article 1" | Structured documents (HTML, Markdown, Wikitext, RFC) |
| CSS selector | selector | "div.content p" | HTML pages |
| Line range | lines | "40-41" | Plain text, RFCs |
| Repo location | location | "chapter:1" | Repository modules (Gutenberg, Wikisource, etc.) |

Section selectors are the most versatile — they work across HTML, Markdown, Wikitext, and RFC plain text. They support roman numeral equivalence (Chapter IV = Chapter 4), nested paths (Chapter I, Article 1, paragraph 2), and quoted titles ('The Fox and the Grapes').
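Roman numeral equivalence can be sketched in a few lines of Python. The helper names `roman_to_int` and `normalize_section` are hypothetical, and the matching rules (uppercase numerals only, commas treated as separators, everything else lowercased) are assumptions rather than apysource's actual behaviour.

```python
import re

# Matches well-formed uppercase Roman numerals from I to MMMCMXCIX.
_ROMAN = re.compile(
    r"^(?=[MDCLXVI])M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$"
)
_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(token: str) -> int:
    """Convert a valid uppercase Roman numeral to an integer."""
    total = 0
    for char, following in zip(token, token[1:] + " "):
        value = _VALUES[char]
        # A smaller value before a larger one is subtracted (IV, IX, XL, ...).
        if following in _VALUES and _VALUES[following] > value:
            total -= value
        else:
            total += value
    return total


def normalize_section(label: str) -> str:
    """Reduce a section label to a comparable form with Arabic numbers."""
    parts = []
    for word in re.split(r"[\s,]+", label.strip()):
        if _ROMAN.match(word):
            parts.append(str(roman_to_int(word)))
        else:
            parts.append(word.lower())
    return " ".join(parts)
```

With this normalization, "Chapter IV" and "Chapter 4" both reduce to the same string, so a plain equality test is enough to treat them as the same section. A real implementation would need more care with words that happen to look like numerals.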

CSS selectors target HTML elements directly. Useful when section headings aren't available or you need a specific element.

Line ranges extract by line number (1-based, inclusive). Useful for plain text and RFCs.
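The 1-based, inclusive convention is easy to get wrong by one, so a short Python sketch may help. `extract_lines` is a hypothetical helper, and accepting a single number such as "5" as a one-line range is an assumption not confirmed by the documentation above.

```python
def extract_lines(text: str, spec: str) -> str:
    """Return the lines named by a 1-based, inclusive range such as "40-41"."""
    first_part, _, last_part = spec.partition("-")
    first = int(first_part)
    # Assumption: a bare number selects a single line.
    last = int(last_part) if last_part else first
    if first < 1 or last < first:
        raise ValueError(f"invalid line range: {spec!r}")
    lines = text.splitlines()
    # Python slices are 0-based and end-exclusive, hence first - 1 and last.
    return "\n".join(lines[first - 1:last])
```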

If no targetter is given, apysource checks the full page text for your snippet.

YAML schema

Each YAML file has a top-level sources list. Each source has nested fragments.

Source properties

| Key | What it does |
|---|---|
| label | Name of the source (required) |
| url | URL to fetch (required) |
| type | IANA media type: text/html, text/plain, text/markdown, etc. Short names (html, plain-text) also accepted. Auto-detected if omitted. |
| language | Language code, RFC 5646 (metadata) |
| title | Document title (metadata) |
| date | Publication or access date (metadata) |
| part_of | Parent source label (for hierarchical sources) |
| isbn | International Standard Book Number |
| doi | Digital Object Identifier |
| publisher | Publisher name |
| edition | Edition or version |
| license | License URI |

Fragment properties

| Key | What it does |
|---|---|
| label | Name of the fragment (required) |
| snippet | The text you expect to find |
| selector | CSS selector to narrow extraction (HTML) |
| lines | Line range to extract, e.g. 30-35 |
| section | Human-readable section selector, e.g. Chapter I, Article 1 |
| location | Repo-specific location hint (e.g. chapter:1) |
| page_start | Starting page number (for print sources) |
| page_end | Ending page number (for print sources) |
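The required keys listed above can be checked before running a verification. The Python sketch below works on an already-parsed document (a plain dict, however it was loaded). `validate_sources` is a hypothetical helper, not part of apysource, and it only enforces the keys marked as required in the tables.

```python
def validate_sources(doc: dict) -> list[str]:
    """Return a list of problems found in a parsed sources document."""
    sources = doc.get("sources")
    if not isinstance(sources, list):
        return ["top-level 'sources' must be a list"]

    errors = []
    for i, source in enumerate(sources):
        # Every source needs a label and a url.
        for key in ("label", "url"):
            if not source.get(key):
                errors.append(f"sources[{i}]: missing required key '{key}'")
        # Every fragment needs a label; other keys are optional.
        for j, fragment in enumerate(source.get("fragments") or []):
            if not fragment.get("label"):
                errors.append(
                    f"sources[{i}].fragments[{j}]: missing required key 'label'"
                )
    return errors
```

An empty list means that the document has the minimum structure; it says nothing about whether the snippets still match.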

CLI

apysource [-c config.toml] <command> [args...]

| Command | What it does |
|---|---|
| check [sources.yaml] [--provenance file.ttl] | Fetch, extract, and verify all snippets |
| locate <url> <snippet> | Find a snippet in a page, show the targetter |
| add <file> <url> <snippet> | Locate a snippet and add it to a YAML file |
| validate | Check that .ttl files parse correctly (with optional SHACL) |

Without -c, apysource uses built-in defaults (all built-in repos enabled). Pass -c config.toml to customize repos and HTTP settings (requires pip install apysource[dev]).

Pass --provenance file.ttl to check to write a PROV-O graph recording which fragments were verified, when, and by which activity.
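When using the check command as a CI gate from Python, the exit status is the natural signal. The sketch below wraps a command in a hypothetical `gate` helper. The sample report shows exit code 0 when all checks pass; that a failed check produces a non-zero exit code is an assumption here.

```python
import subprocess
import sys


def gate(command: list[str]) -> bool:
    """Run command and report whether it exited with status 0."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the tool's report so the CI log shows what failed.
        sys.stderr.write(result.stdout + result.stderr)
    return result.returncode == 0
```

A CI script could call `gate(["apysource", "check", "sources.yaml"])` and stop the build when it returns False.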

Advanced Features

For RDF support, Python API, custom source repositories and more, see docs/advanced.md.

License

ISC
