citation-integrity-check
Verify answer citations refer to supplied source ids and that cited sources actually support the claims. Zero runtime dependencies.
Python port of @mukundakatta/citation-integrity-check. The JS sibling has the original API; this README sticks to the Python surface.
Install
pip install citation-integrity-check
Usage
from citation_integrity_check import verify
sources = [
{"id": "1", "text": "Photosynthesis converts light into chemical energy in plants."},
{"id": "abc123", "text": "Chlorophyll absorbs red and blue wavelengths of light."},
]
answer = (
"Plants use photosynthesis to convert light into energy [1]. "
"Chlorophyll absorbs red and blue light [id:abc123]."
)
result = verify(answer, sources)
result.ok # True if no missing ids and no unsupported claims
result.missing # list[str] -- cited ids that don't exist in sources
result.unsupported # list[Claim] -- sentences with no valid supporting citation
result.coverage # float in [0, 1] -- fraction of sentences with a valid citation
Citation forms
Two markers are recognized inside the answer:
| Form | Resolves to |
|---|---|
| `[1]` | `sources[0]` (1-based index) and `source.id == "1"` |
| `[id:abc]` | `source.id == "abc"` |
Anything else inside brackets (like [Note]) is ignored, so stylistic prose doesn't count as a citation.
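The two marker forms can be illustrated with a small standalone sketch. This is not the package's internal code: the regular expression, the helper names `extract_citations` and `resolve`, and the choice to try an exact id match before the 1-based position for numeric markers are all assumptions made for illustration.

```python
import re

# Hypothetical pattern: matches [12] or [id:abc123]; other bracketed text is skipped.
CITATION_RE = re.compile(r"\[(?:id:([^\[\]\s]+)|(\d+))\]")


def extract_citations(text):
    """Return (kind, value) pairs for each recognized marker, in order."""
    found = []
    for match in CITATION_RE.finditer(text):
        named, numeric = match.groups()
        if named is not None:
            found.append(("id", named))
        else:
            found.append(("index", numeric))
    return found


def resolve(citation, sources):
    """Map one citation to a source dict, or None if it does not resolve."""
    kind, value = citation
    # Both forms can match on the source id (assumed lookup order for numeric markers).
    by_id = next((s for s in sources if s["id"] == value), None)
    if kind == "id" or by_id is not None:
        return by_id
    # Numeric marker with no matching id: fall back to the 1-based position.
    position = int(value) - 1
    return sources[position] if 0 <= position < len(sources) else None
```

With the `sources` list from the usage example, `[1]` and `[id:abc123]` both resolve in this sketch, while `[Note]` produces no citation at all.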
How "unsupported" is decided
A sentence is unsupported when any of these is true:
- It has no citation marker at all (`reason="no_citation"`).
- All cited ids are missing from `sources` (`reason="missing_source"`).
- The cited source's text doesn't share enough non-stopword tokens with the sentence (`reason="insufficient_overlap"`).
Token-overlap is |claim_tokens & source_tokens| / |claim_tokens|, with a small built-in stopword list. The threshold is tunable:
verify(answer, sources, support_threshold=0.5) # stricter
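To make the overlap formula concrete, here is a minimal sketch of the ratio computed by hand. It is an approximation under stated assumptions, not the package's implementation: the stopword list, the tokenizer, and the helper names `tokens`, `overlap`, and `is_supported` are hypothetical, and comparing with `>=` against the threshold is a guess.

```python
import re

# Illustrative stopword list; the package's built-in list may differ.
STOPWORDS = {"a", "an", "and", "in", "into", "is", "of", "the", "to"}


def tokens(text):
    """Lowercase word tokens, minus bracketed markers and stopwords."""
    text = re.sub(r"\[[^\]]*\]", " ", text)  # drop markers such as [1] or [id:abc]
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {word for word in words if word not in STOPWORDS}


def overlap(claim, source_text):
    """|claim_tokens & source_tokens| / |claim_tokens|, or 0.0 for an empty claim."""
    claim_tokens = tokens(claim)
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & tokens(source_text)) / len(claim_tokens)


def is_supported(claim, source_text, threshold):
    """Assumed rule: a claim counts as supported when overlap reaches the threshold."""
    return overlap(claim, source_text) >= threshold
```

In this sketch the chlorophyll sentence from the usage example scores 1.0 against its source, because every non-stopword token in the claim also appears in the source text. The photosynthesis sentence scores lower, since "convert" and "converts" are different tokens when no stemming is applied.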
API differences from the JS sibling
- Returns a `CitationResult` dataclass with `unsupported` claims (per-sentence) instead of the JS `unused` ids list.
- Adds the `[id:foo]` named-citation form alongside numeric `[N]`.
- Adds the token-overlap `support_threshold` to verify the cited source actually mentions the claim.
See the JS sibling's README for the full design notes.