A library for constructing citable prompts and attributing contributions of context sources in RAG systems
Attribution
A Python library for attributing contributions of context sources in Retrieval-Augmented Generation (RAG) systems.
Installation
```shell
pip install Cite-Attribution
```
Features
- LLM-Based Citation: Explicit citations during LLM generation using DSPy
- Embedding-Based Citation: Post-processing attribution via semantic similarity
- Profit Share Calculation: Weighted citation aggregation for fair attribution
Quick Start
Embedding-Based Attribution
```python
from attribution import run_embedding_attribution_pipeline

# Your LLM response (without citations)
response = "The capital of France is Paris. It is known for the Eiffel Tower."

# Context sources with their indices
context = {
    1: "Paris is the capital and largest city of France.",
    2: "The Eiffel Tower is a famous landmark in Paris.",
}

# Map node indices to server/source names
node_map = {
    1: "wikipedia",
    2: "travel_guide",
}

# Run the pipeline
result = run_embedding_attribution_pipeline(
    response_text=response,
    context=context,
    node_map=node_map,
)

print(result["cited_response"])  # Response with <cite:X>...</cite> tags
print(result["stats"])           # Citation statistics per server
print(result["profit_share"])    # Normalized contribution scores
```
LLM-Based Attribution
```python
from attribution import run_llm_attribution_pipeline

# Pre-generated response with citation tags
cited_response = "<cite:[1]>Paris is the capital of France.</cite> <cite:[2]>The Eiffel Tower is iconic.</cite>"

# Map node indices to server names
node_map = {
    1: "wikipedia",
    2: "travel_guide",
}

result = run_llm_attribution_pipeline(
    generated_response=cited_response,
    node_map=node_map,
)
print(result["profit_share"])
```
Constructing Citation Prompts
```python
from attribution import construct_citation_prompt

prompt = construct_citation_prompt(
    query="What is the capital of France?",
    context={
        1: "Paris is the capital of France.",
        2: "France is a country in Europe.",
    },
)
```
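construct_citation_prompt builds a DSPy prompt whose exact wording is not shown in this README. As a rough illustration of the idea, the hypothetical build_citation_prompt_sketch below renders each context passage with its numeric index and asks the model to wrap supported claims in the same <cite:[i]>...</cite> tags that run_llm_attribution_pipeline consumes. The function name and instruction text are assumptions; this is plain Python, not the library's DSPy prompt.

```python
def build_citation_prompt_sketch(query: str, context: dict[int, str]) -> str:
    """Hypothetical plain-text stand-in for construct_citation_prompt (not the library's DSPy prompt)."""
    # Render each passage with its index so the model can refer to it as [i].
    sources = "\n".join(f"[{idx}] {text}" for idx, text in sorted(context.items()))
    # Assumed instruction text asking for the <cite:[i]>...</cite> tag format.
    instructions = (
        "Answer the question using only the sources below. "
        "Wrap every sentence that relies on a source in <cite:[i]>...</cite>, "
        "where i is the source index."
    )
    return f"{instructions}\n\nSources:\n{sources}\n\nQuestion: {query}\nAnswer:"


prompt = build_citation_prompt_sketch(
    query="What is the capital of France?",
    context={1: "Paris is the capital of France.", 2: "France is a country in Europe."},
)
print(prompt)
```

Sorting the context by index keeps the numbering in the prompt aligned with the indices used in node_map.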
API Reference
Core Functions
- aggregate_server_citations(llm_response, node_to_server_map): Parse citations and count them per server
- calculate_contribution(stats, k_multi=1.5, k_single=1.0): Calculate normalized profit shares
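The README does not document the full tag grammar or the shape of the returned stats. Assuming tags of the form <cite:[1]> or <cite:[1, 2]> (as in the LLM-based example), a hypothetical aggregate_citations_sketch could parse them with a regular expression and count, per server, how many single-source and multi-source citations reference it. The function name, the comma-separated multi-index form, and the {"single": ..., "multi": ...} return shape are all illustrative assumptions.

```python
import re
from collections import Counter

# Assumed tag format: <cite:[1]>...</cite> or <cite:[1, 2]>...</cite>
CITE_TAG = re.compile(r"<cite:\[([\d,\s]+)\]>(.*?)</cite>", re.DOTALL)


def aggregate_citations_sketch(
    llm_response: str, node_to_server_map: dict[int, str]
) -> dict[str, dict[str, int]]:
    """Hypothetical: count single- and multi-source citations per server."""
    single, multi = Counter(), Counter()
    for indices, _cited_text in CITE_TAG.findall(llm_response):
        nodes = [int(tok) for tok in indices.split(",") if tok.strip()]
        # Map node indices to servers, skipping indices missing from the map.
        servers = {node_to_server_map[n] for n in nodes if n in node_to_server_map}
        # A tag citing exactly one node counts as single-source.
        bucket = single if len(nodes) == 1 else multi
        for server in servers:
            bucket[server] += 1
    return {
        server: {"single": single[server], "multi": multi[server]}
        for server in sorted(set(single) | set(multi))
    }


stats = aggregate_citations_sketch(
    "<cite:[1]>Paris is the capital of France.</cite> "
    "<cite:[1, 2]>The Eiffel Tower is in Paris.</cite>",
    {1: "wikipedia", 2: "travel_guide"},
)
print(stats)
# {'travel_guide': {'single': 0, 'multi': 1}, 'wikipedia': {'single': 1, 'multi': 1}}
```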
LLM-Based Citation
- construct_citation_prompt(query, context): Build a DSPy prompt for citation generation
- run_llm_attribution_pipeline(generated_response, node_map, ...): Run the full LLM-based attribution pipeline
Embedding-Based Citation
- auto_cite_response(response, context, server_map, ...): Add citations using embeddings
- run_embedding_attribution_pipeline(response_text, context, node_map, ...): Run the full embedding-based attribution pipeline
Configuration
Embedding-Based Settings
- DEFAULT_EMBEDDING_MODEL: 'BAAI/bge-small-en-v1.5'
- DEFAULT_SIMILARITY_THRESHOLD: 0.75
- DEFAULT_N_GRAM_SIZE: 5
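One way these three defaults could fit together, sketched here as an assumption rather than a description of how auto_cite_response actually works: split the response into sentences, slide a window of DEFAULT_N_GRAM_SIZE words over each context passage, and cite the passage whose best window reaches DEFAULT_SIMILARITY_THRESHOLD under cosine similarity. auto_cite_sketch, word_ngrams, and toy_embed are hypothetical names, and the toy bag-of-words embedding stands in for BAAI/bge-small-en-v1.5 so the example runs offline.

```python
import math
import re
from typing import Callable, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def word_ngrams(text: str, n: int) -> list[str]:
    """Sliding windows of n words; passages of n words or fewer stay whole."""
    words = text.split()
    if len(words) <= n:
        return [text]
    return [" ".join(words[i : i + n]) for i in range(len(words) - n + 1)]


def auto_cite_sketch(
    response: str,
    context: dict[int, str],
    embed: Callable[[str], Vector],
    threshold: float = 0.75,
    n_gram_size: int = 5,
) -> str:
    """Hypothetical: tag each sentence with the best-matching context index at or above threshold."""
    tagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", response.strip()):
        sentence_vec = embed(sentence)
        best_idx: Optional[int] = None
        best_score = threshold
        for idx, passage in context.items():
            # Score a passage by its most similar n-gram window.
            score = max(cosine(sentence_vec, embed(g)) for g in word_ngrams(passage, n_gram_size))
            if score >= best_score:
                best_idx, best_score = idx, score
        tagged.append(f"<cite:[{best_idx}]>{sentence}</cite>" if best_idx is not None else sentence)
    return " ".join(tagged)


# Toy bag-of-words "embedding" so the sketch runs without downloading a model.
VOCAB = ["paris", "capital", "france", "eiffel", "tower", "landmark"]


def toy_embed(text: str) -> list[float]:
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(v)) for v in VOCAB]


cited = auto_cite_sketch(
    "The capital of France is Paris. It is known for the Eiffel Tower.",
    {
        1: "Paris is the capital and largest city of France.",
        2: "The Eiffel Tower is a famous landmark in Paris.",
    },
    embed=toy_embed,
)
print(cited)
# <cite:[1]>The capital of France is Paris.</cite> <cite:[2]>It is known for the Eiffel Tower.</cite>
```

To approximate a real model, a callable such as SentenceTransformer("BAAI/bge-small-en-v1.5").encode from the sentence-transformers package could be passed as embed; whether this library uses that package internally is not stated here. Sentences with no passage above the threshold are left uncited.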
Profit Share Weights
- k_multi: Weight for multi-source citations (default: 1.5)
- k_single: Weight for single-source citations (default: 1.0)
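How calculate_contribution combines the two weights is not specified beyond their defaults. One plausible reading, sketched here as an assumption, scores each server as k_single * (single-source citations) + k_multi * (multi-source citations) and then divides by the total so the shares sum to 1. profit_share_sketch and the {"single": ..., "multi": ...} stats shape are hypothetical.

```python
def profit_share_sketch(
    stats: dict[str, dict[str, int]], k_multi: float = 1.5, k_single: float = 1.0
) -> dict[str, float]:
    """Hypothetical weighting: score = k_single * single + k_multi * multi, normalized to sum to 1."""
    raw = {
        server: k_single * counts.get("single", 0) + k_multi * counts.get("multi", 0)
        for server, counts in stats.items()
    }
    total = sum(raw.values())
    if total == 0:
        # No citations at all: nobody receives a share.
        return {server: 0.0 for server in raw}
    return {server: score / total for server, score in raw.items()}


shares = profit_share_sketch({
    "wikipedia": {"single": 1, "multi": 1},     # 1.0 * 1 + 1.5 * 1 = 2.5
    "travel_guide": {"single": 0, "multi": 1},  # 1.5 * 1 = 1.5
})
print(shares)  # {'wikipedia': 0.625, 'travel_guide': 0.375}
```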
License
MIT License - see LICENSE for details.
Project details
File details
Details for the file cite_attribution-0.2.0.tar.gz.
File metadata
- Download URL: cite_attribution-0.2.0.tar.gz
- Upload date:
- Size: 9.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f7d5d136c91992d117607d3dfceebc06edb3fef62295abe5c12cdc248c46601b |
| MD5 | 03c2202da08f94390b6a94e312538f07 |
| BLAKE2b-256 | 44186d80bbba089f27feb6e6ae9e57b0ea9bf463401b12ce55ef2f22959e5d78 |
File details
Details for the file cite_attribution-0.2.0-py3-none-any.whl.
File metadata
- Download URL: cite_attribution-0.2.0-py3-none-any.whl
- Upload date:
- Size: 10.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 45c4d138ac03751e33c3ad26aa25a1d1b5f15a97c2f3212bb06cf4f42990024d |
| MD5 | 572440d437a6112892f187e64384e0b6 |
| BLAKE2b-256 | 41638ff15819e5a4196ffc139336fd7f72fe89408507786ecc07c10b8901b6ae |