# Attribution

A Python library for constructing citable prompts and attributing the contributions of context sources in Retrieval-Augmented Generation (RAG) systems.
## Installation

```shell
pip install attribution-lib
```
## Features

- **LLM-Based Citation**: Explicit citations during LLM generation using DSPy
- **Embedding-Based Citation**: Post-processing attribution via semantic similarity
- **Profit Share Calculation**: Weighted citation aggregation for fair attribution
## Quick Start

### Embedding-Based Attribution

```python
from attribution import run_embedding_attribution_pipeline

# Your LLM response (without citations)
response = "The capital of France is Paris. It is known for the Eiffel Tower."

# Context sources keyed by node index
context = {
    1: "Paris is the capital and largest city of France.",
    2: "The Eiffel Tower is a famous landmark in Paris.",
}

# Map node indices to server/source names
node_map = {
    1: "wikipedia",
    2: "travel_guide",
}

# Run the pipeline
result = run_embedding_attribution_pipeline(
    response_text=response,
    context=context,
    node_map=node_map,
)

print(result["cited_response"])  # Response with <cite:X>...</cite> tags
print(result["stats"])           # Citation statistics per server
print(result["profit_share"])    # Normalized contribution scores
```
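To illustrate the idea behind embedding-based attribution, here is a self-contained sketch (not the library's internals): each response sentence is matched to the context source with the highest cosine similarity, and a citation tag is emitted when the score clears a threshold. A toy bag-of-words vectorizer stands in for the sentence-embedding model the library actually uses (`BAAI/bge-small-en-v1.5` by default), so the example runs without extra dependencies; `embed`, `cosine`, and `attribute` are illustrative names, not part of the library's API.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; the real pipeline uses a sentence-
    # embedding model, but the matching logic is analogous.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def attribute(sentences, context, threshold=0.3):
    # Tag each sentence with its best-matching source, if the match is
    # strong enough; otherwise leave the sentence uncited.
    cited = []
    for sent in sentences:
        best, score = max(
            ((idx, cosine(embed(sent), embed(src))) for idx, src in context.items()),
            key=lambda pair: pair[1],
        )
        cited.append(f"<cite:{best}>{sent}</cite>" if score >= threshold else sent)
    return cited

context = {
    1: "Paris is the capital and largest city of France.",
    2: "The Eiffel Tower is a famous landmark in Paris.",
}
sentences = [
    "The capital of France is Paris.",
    "It is known for the Eiffel Tower.",
]
print(attribute(sentences, context))
```

With this toy vectorizer, the first sentence matches source 1 and the second matches source 2, mirroring the citations the pipeline would attach.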
### LLM-Based Attribution

```python
from attribution import run_llm_attribution_pipeline

# Pre-generated response with citation tags
cited_response = "<cite:[1]>Paris is the capital of France.</cite> <cite:[2]>The Eiffel Tower is iconic.</cite>"

# Map node indices to server names
node_map = {
    1: "wikipedia",
    2: "travel_guide",
}

result = run_llm_attribution_pipeline(
    generated_response=cited_response,
    node_map=node_map,
)

print(result["profit_share"])
```
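The first step of this pipeline is extracting citations from the tagged response and counting them per server. A minimal sketch of that parsing, assuming the `<cite:[N]>...</cite>` format shown above (the library's `aggregate_server_citations` may be implemented differently):

```python
import re
from collections import Counter

def count_citations(cited_response, node_map):
    # Extract every node index that appears in a citation tag, accepting
    # both <cite:1> and <cite:[1]> spellings, then count per server.
    indices = re.findall(r"<cite:\[?(\d+)\]?>", cited_response)
    return dict(Counter(node_map[int(i)] for i in indices))

cited = (
    "<cite:[1]>Paris is the capital of France.</cite> "
    "<cite:[2]>The Eiffel Tower is iconic.</cite>"
)
print(count_citations(cited, {1: "wikipedia", 2: "travel_guide"}))
# → {'wikipedia': 1, 'travel_guide': 1}
```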
### Constructing Citation Prompts

```python
from attribution import construct_citation_prompt

prompt = construct_citation_prompt(
    query="What is the capital of France?",
    context={
        1: "Paris is the capital of France.",
        2: "France is a country in Europe.",
    },
)
```
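As a rough sketch of what such a prompt might contain, the function below numbers the context passages and instructs the model to wrap claims in citation tags. This is an assumption for illustration only: `build_prompt` is a hypothetical helper, and the actual DSPy prompt produced by `construct_citation_prompt` may be structured quite differently.

```python
def build_prompt(query, context):
    # Number the context passages so the model can cite them by index.
    sources = "\n".join(f"[{i}] {text}" for i, text in context.items())
    return (
        "Answer the question using only the numbered sources below. "
        "Wrap each claim in <cite:[N]>...</cite> tags, where N is the "
        "index of the source that supports it.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {query}\nAnswer:"
    )

print(build_prompt(
    "What is the capital of France?",
    {1: "Paris is the capital of France.", 2: "France is a country in Europe."},
))
```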
## API Reference

### Core Functions

- `aggregate_server_citations(llm_response, node_to_server_map)`: Parse citations and count per server
- `calculate_contribution(stats, k_multi=1.5, k_single=1.0)`: Calculate normalized profit shares

### LLM-Based Citation

- `construct_citation_prompt(query, context)`: Build DSPy prompt for citation generation
- `run_llm_attribution_pipeline(generated_response, node_map, ...)`: Full LLM attribution pipeline

### Embedding-Based Citation

- `auto_cite_response(response, context, server_map, ...)`: Add citations using embeddings
- `run_embedding_attribution_pipeline(response_text, context, node_map, ...)`: Full embedding pipeline
## Configuration

### Embedding-Based Settings

- `DEFAULT_EMBEDDING_MODEL`: `'BAAI/bge-small-en-v1.5'`
- `DEFAULT_SIMILARITY_THRESHOLD`: `0.75`
- `DEFAULT_N_GRAM_SIZE`: `5`

### Profit Share Weights

- `k_multi`: Weight for multi-source citations (default: `1.5`)
- `k_single`: Weight for single-source citations (default: `1.0`)
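One plausible reading of these weights, sketched below: a citation backed by several sources is weighted by `k_multi` and split among them, a single-source citation gets `k_single`, and per-server totals are normalized to sum to 1. The library's `calculate_contribution` may define the weighting differently; `profit_share` here is an illustrative name.

```python
def profit_share(citations, k_multi=1.5, k_single=1.0):
    # citations: one set of server names per cited span in the response.
    totals = {}
    for servers in citations:
        # Multi-source spans earn the higher weight, shared evenly.
        weight = k_multi if len(servers) > 1 else k_single
        for server in servers:
            totals[server] = totals.get(server, 0.0) + weight / len(servers)
    grand_total = sum(totals.values())
    # Normalize so the shares sum to 1.
    return {server: v / grand_total for server, v in totals.items()}

# One span cited to wikipedia alone, one span cited to both servers:
print(profit_share([{"wikipedia"}, {"wikipedia", "travel_guide"}]))
# → {'wikipedia': 0.7, 'travel_guide': 0.3}
```

The multi-source bonus (1.5 vs. 1.0) rewards servers whose content corroborates other sources rather than standing alone.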
## License

MIT License - see LICENSE for details.
## File details

Details for the file `attribution_lib-0.2.0.tar.gz`.

### File metadata

- Download URL: attribution_lib-0.2.0.tar.gz
- Upload date:
- Size: 10.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `49f85046ca909ad0fb4192b5ade570638320121754f3d88b40cd1f482cc6b1ff` |
| MD5 | `865cb775f291be0b7fe1a4a9cd47adba` |
| BLAKE2b-256 | `e144deafa7ecf33821d6771b83c80e494ed22381940213e91abdc42e5690825e` |
## File details

Details for the file `attribution_lib-0.2.0-py3-none-any.whl`.

### File metadata

- Download URL: attribution_lib-0.2.0-py3-none-any.whl
- Upload date:
- Size: 10.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `4c1a9061f8240dc333ac7a7f80aca18684d7d62d9f07f20905a35dc041766b81` |
| MD5 | `bd25bd894f39920cb098193bafa29763` |
| BLAKE2b-256 | `e31eebd724f6fe087de38cd43b120bd85617f8ec5ba4e12cc7dad61a82cb3d76` |