A library for constructing citable prompts and attributing contributions of context sources in RAG systems
Attribution
A Python library for attributing contributions of context sources in Retrieval-Augmented Generation (RAG) systems.
Installation
pip install attribution-lib
Features
- LLM-Based Citation: Explicit citations during LLM generation using DSPy
- Embedding-Based Citation: Post-processing attribution via semantic similarity
- Profit Share Calculation: Weighted citation aggregation for fair attribution
Quick Start
Embedding-Based Attribution
from attribution import run_embedding_attribution_pipeline
# Your LLM response (without citations)
response = "The capital of France is Paris. It is known for the Eiffel Tower."
# Context sources with their indices
context = {
    1: "Paris is the capital and largest city of France.",
    2: "The Eiffel Tower is a famous landmark in Paris.",
}
# Map node indices to server/source names
node_map = {
    1: "wikipedia",
    2: "travel_guide",
}
# Run the pipeline
result = run_embedding_attribution_pipeline(
    response_text=response,
    context=context,
    node_map=node_map,
)
print(result["cited_response"])  # Response with <cite:X>...</cite> tags
print(result["stats"])  # Citation statistics per server
print(result["profit_share"])  # Normalized contribution scores
LLM-Based Attribution
from attribution import run_llm_attribution_pipeline
# Pre-generated response with citation tags
cited_response = "<cite:[1]>Paris is the capital of France.</cite> <cite:[2]>The Eiffel Tower is iconic.</cite>"
# Map node indices to server names
node_map = {
    1: "wikipedia",
    2: "travel_guide",
}
result = run_llm_attribution_pipeline(
    generated_response=cited_response,
    node_map=node_map,
)
print(result["profit_share"])
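To make the tag format concrete, here is a minimal sketch of how citation tags like the ones above could be parsed and counted per server. It is not the library's implementation: the function name `count_citations_per_server`, the support for multi-source tags such as `<cite:[1, 2]>`, and the shape of the returned statistics are all assumptions made for illustration.

```python
import re

# Tag format taken from the example above: <cite:[1]>text</cite>.
# Accepting several indices per tag (e.g. <cite:[1, 2]>) is an assumption.
CITE_PATTERN = re.compile(r"<cite:\[([\d,\s]+)\]>(.*?)</cite>", re.DOTALL)


def count_citations_per_server(cited_text, node_to_server):
    """Count single- and multi-source citations for each server name."""
    stats = {}
    for match in CITE_PATTERN.finditer(cited_text):
        indices = [int(part) for part in match.group(1).split(",") if part.strip()]
        # De-duplicate so two nodes from the same server count once per tag.
        servers = {node_to_server[i] for i in indices if i in node_to_server}
        kind = "multi" if len(servers) > 1 else "single"
        for server in servers:
            stats.setdefault(server, {"single": 0, "multi": 0})[kind] += 1
    return stats
```

Indices that are missing from the node map are skipped rather than raising, which keeps the sketch tolerant of stray citations in model output.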
Constructing Citation Prompts
from attribution import construct_citation_prompt
prompt = construct_citation_prompt(
    query="What is the capital of France?",
    context={
        1: "Paris is the capital of France.",
        2: "France is a country in Europe.",
    },
)
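The README does not show what the generated prompt looks like. As a rough illustration only, a citation prompt could number each source and ask the model to wrap supported statements in citation tags. The helper below, `build_citation_prompt`, is hypothetical: it builds a plain string and does not use DSPy, and its wording is an assumption rather than the library's actual prompt.

```python
def build_citation_prompt(query, context):
    """Hypothetical plain-string stand-in for a citation prompt builder."""
    # List sources in index order so citation numbers are easy to follow.
    sources = "\n".join(f"[{idx}] {text}" for idx, text in sorted(context.items()))
    return (
        "Answer the question using only the numbered sources below.\n"
        "Wrap each supported statement in <cite:[N]>...</cite>, "
        "where N is the source number.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {query}\n"
    )
```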
API Reference
Core Functions
- aggregate_server_citations(llm_response, node_to_server_map): Parse citations and count per server
- calculate_contribution(stats, k_multi=1.5, k_single=1.0): Calculate normalized profit shares
LLM-Based Citation
- construct_citation_prompt(query, context): Build DSPy prompt for citation generation
- run_llm_attribution_pipeline(generated_response, node_map, ...): Full LLM attribution pipeline
Embedding-Based Citation
- auto_cite_response(response, context, server_map, ...): Add citations using embeddings
- run_embedding_attribution_pipeline(response_text, context, node_map, ...): Full embedding pipeline
Configuration
Embedding-Based Settings
- DEFAULT_EMBEDDING_MODEL: 'BAAI/bge-small-en-v1.5'
- DEFAULT_SIMILARITY_THRESHOLD: 0.75
- DEFAULT_N_GRAM_SIZE: 5
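The sketch below shows one way an n-gram size and a similarity threshold could work together: slide a fixed-size word window over the response, score each window against every source, and keep only the windows whose best score clears the threshold. This is an assumption about the general technique, not the library's algorithm. In particular, `toy_embed` is a bag-of-words stand-in used so the example runs without downloading a model; it is not the BAAI/bge-small-en-v1.5 embedding, and `match_windows` is a hypothetical name.

```python
import math
import re
from collections import Counter

N_GRAM_SIZE = 5              # mirrors DEFAULT_N_GRAM_SIZE
SIMILARITY_THRESHOLD = 0.75  # mirrors DEFAULT_SIMILARITY_THRESHOLD


def toy_embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_windows(response, context, n=N_GRAM_SIZE, threshold=SIMILARITY_THRESHOLD):
    """Return (window_text, source_index, score) for windows above the threshold."""
    if not context:
        return []
    words = response.split()
    source_vectors = {idx: toy_embed(text) for idx, text in context.items()}
    matches = []
    for start in range(max(len(words) - n + 1, 1)):
        window = " ".join(words[start:start + n])
        vec = toy_embed(window)
        # Pick the source with the highest similarity to this window.
        idx, score = max(
            ((i, cosine(vec, sv)) for i, sv in source_vectors.items()),
            key=lambda pair: pair[1],
        )
        if score >= threshold:
            matches.append((window, idx, score))
    return matches
```

Raising the threshold yields fewer but more confident matches; a smaller window size makes matching more fine-grained at the cost of noisier scores.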
Profit Share Weights
- k_multi: Weight for multi-source citations (default: 1.5)
- k_single: Weight for single-source citations (default: 1.0)
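As a worked illustration of how these weights might turn citation counts into normalized shares, the sketch below scores each server as `k_single * single + k_multi * multi` and divides by the total. The formula, the function name `weighted_profit_share`, and the `{"single": ..., "multi": ...}` shape of the statistics are assumptions; the README does not specify how `calculate_contribution` computes its result.

```python
def weighted_profit_share(stats, k_multi=1.5, k_single=1.0):
    """Turn per-server citation counts into shares that sum to 1.

    `stats` is assumed to map server name -> {"single": int, "multi": int}.
    """
    scores = {
        server: k_single * counts.get("single", 0) + k_multi * counts.get("multi", 0)
        for server, counts in stats.items()
    }
    total = sum(scores.values())
    if total == 0:
        # No citations at all: nobody earns a share.
        return {server: 0.0 for server in scores}
    return {server: score / total for server, score in scores.items()}


example_stats = {
    "wikipedia": {"single": 1, "multi": 1},     # score 1.0 + 1.5 = 2.5
    "travel_guide": {"single": 0, "multi": 1},  # score 0.0 + 1.5 = 1.5
}
shares = weighted_profit_share(example_stats)
# shares == {"wikipedia": 0.625, "travel_guide": 0.375}
```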
License
MIT License - see LICENSE for details.