A library for constructing citable prompts and attributing contributions of context sources in RAG systems

Attribution

A Python library for attributing contributions of context sources in Retrieval-Augmented Generation (RAG) systems.

Installation

pip install attribution-lib

Features

  • LLM-Based Citation: Builds citation-aware prompts and decodes cited responses (the library itself makes no LLM calls)
  • Embedding-Based Citation: Adds citations after generation by matching response text to context sources via semantic similarity
  • Profit Share Calculation: Aggregates weighted citation counts into normalized per-source contribution scores

Quick Start

Embedding-Based Attribution

from attribution import run_embedding_attribution_pipeline

# Your LLM response (without citations)
response = "The capital of France is Paris. It is known for the Eiffel Tower."

# Context sources with their indices
context = {
    1: "Paris is the capital and largest city of France.",
    2: "The Eiffel Tower is a famous landmark in Paris.",
}

# Map node indices to server/source names
node_map = {
    1: "wikipedia",
    2: "travel_guide",
}

# Run the pipeline
result = run_embedding_attribution_pipeline(
    response_text=response,
    context=context,
    node_map=node_map,
)

print(result["cited_response"])  # Response with <cite:X>...</cite> tags
print(result["stats"])           # Citation statistics per server
print(result["profit_share"])    # Normalized contribution scores

LLM-Based Attribution

from attribution import run_llm_attribution_pipeline

# Pre-generated response with citation tags
cited_response = "<cite:[1]>Paris is the capital of France.</cite> <cite:[2]>The Eiffel Tower is iconic.</cite>"

# Map node indices to server names
node_map = {
    1: "wikipedia",
    2: "travel_guide",
}

result = run_llm_attribution_pipeline(
    generated_response=cited_response,
    node_map=node_map,
)

print(result["profit_share"])

Constructing Citation Prompts

from attribution import construct_citation_prompt

prompt = construct_citation_prompt(
    query="What is the capital of France?",
    context={
        1: "Paris is the capital of France.",
        2: "France is a country in Europe.",
    },
    additional_instructions="Please cite your sources appropriately.",  # Optional
)
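
The template that construct_citation_prompt actually produces is not shown here. As a rough illustration of what a citation-aware prompt could look like, the sketch below numbers each context source and asks the model to wrap claims in <cite:[N]>...</cite> tags, matching the tag format in the LLM-based example above. The function name build_citation_prompt_sketch and the instruction wording are assumptions, not the library's implementation.

```python
def build_citation_prompt_sketch(query, context, additional_instructions=None):
    """Hypothetical stand-in for a citation-aware prompt builder."""
    lines = [
        "Answer the question using only the numbered sources below.",
        "Wrap every claim in <cite:[N]>...</cite>, where N is the source index.",
        "",
        "Sources:",
    ]
    # List sources in index order so the numbering is stable.
    for index in sorted(context):
        lines.append(f"[{index}] {context[index]}")
    if additional_instructions:
        lines += ["", additional_instructions]
    lines += ["", f"Question: {query}"]
    return "\n".join(lines)


prompt = build_citation_prompt_sketch(
    query="What is the capital of France?",
    context={
        1: "Paris is the capital of France.",
        2: "France is a country in Europe.",
    },
)
print(prompt)
```

The resulting string is sent to whichever LLM you use; the cited response that comes back is what run_llm_attribution_pipeline expects as generated_response.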

API Reference

Core Functions

  • aggregate_server_citations(llm_response, node_to_server_map): Parse citations and count per server
  • calculate_contribution(stats, k_multi=1.5, k_single=1.0): Calculate normalized profit shares
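
To make the "parse citations and count per server" step concrete, here is a minimal sketch of how cited text could be turned into per-server counts. It assumes the <cite:[N]>...</cite> tag format from the Quick Start, assumes that a multi-source citation is written as <cite:[1, 2]>, and assumes a stats shape of {"server": {"single": n, "multi": m}}. The name count_citations_sketch is hypothetical, and the library's own return structure may differ.

```python
import re
from collections import defaultdict

# Matches <cite:[1]>...</cite> and (by assumption) <cite:[1, 2]>...</cite>.
CITE_PATTERN = re.compile(r"<cite:\[([\d,\s]+)\]>(.*?)</cite>", re.DOTALL)


def count_citations_sketch(llm_response, node_to_server_map):
    """Count single-source and multi-source citations per server."""
    stats = defaultdict(lambda: {"single": 0, "multi": 0})
    for match in CITE_PATTERN.finditer(llm_response):
        indices = [int(tok) for tok in match.group(1).split(",") if tok.strip()]
        # Unknown node indices are skipped rather than raising.
        servers = {node_to_server_map[i] for i in indices if i in node_to_server_map}
        kind = "multi" if len(servers) > 1 else "single"
        for server in servers:
            stats[server][kind] += 1
    return dict(stats)


text = (
    "<cite:[1]>Paris is the capital of France.</cite> "
    "<cite:[1, 2]>The Eiffel Tower is in Paris.</cite>"
)
stats = count_citations_sketch(text, {1: "wikipedia", 2: "travel_guide"})
print(stats)
```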

LLM-Based Citation

  • construct_citation_prompt(query, context, ...): Build a citation-aware prompt for your LLM; accepts an optional additional_instructions argument (library does not call any LLM)
  • run_llm_attribution_pipeline(generated_response, node_map, ...): Decode LLM response and compute attribution scores

Embedding-Based Citation

  • auto_cite_response(response, context, server_map, ...): Add citations using embeddings
  • run_embedding_attribution_pipeline(response_text, context, node_map, ...): Full embedding pipeline

Configuration

Embedding-Based Settings

  • DEFAULT_EMBEDDING_MODEL: 'BAAI/bge-small-en-v1.5'
  • DEFAULT_SIMILARITY_THRESHOLD: 0.75
  • DEFAULT_N_GRAM_SIZE: 5
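
The sketch below illustrates how an n-gram size and a similarity threshold could interact in embedding-based citation. It is not the library's algorithm: it assumes the response is split into non-overlapping windows of n words, and it swaps the real embedding model for a toy bag-of-words vector so that it runs without downloads. The names toy_embed and match_windows_sketch are hypothetical, and toy scores are not comparable to scores from 'BAAI/bge-small-en-v1.5'.

```python
import math
import re
from collections import Counter

N_GRAM_SIZE = 5              # mirrors DEFAULT_N_GRAM_SIZE
SIMILARITY_THRESHOLD = 0.75  # mirrors DEFAULT_SIMILARITY_THRESHOLD


def toy_embed(text):
    """Bag-of-words counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def match_windows_sketch(response, context, n=N_GRAM_SIZE,
                         threshold=SIMILARITY_THRESHOLD):
    """Pair each n-word window with its best source, or None below threshold."""
    words = response.split()
    source_vectors = {idx: toy_embed(text) for idx, text in context.items()}
    matches = []
    for start in range(0, len(words), n):
        window = " ".join(words[start:start + n])
        window_vec = toy_embed(window)
        best_idx, best_score = max(
            ((idx, cosine(window_vec, vec)) for idx, vec in source_vectors.items()),
            key=lambda pair: pair[1],
            default=(None, 0.0),
        )
        matches.append((window, best_idx if best_score >= threshold else None))
    return matches


context = {
    1: "Paris is the capital of France",
    2: "The Eiffel Tower is in Paris",
}
response = "The capital of France is Paris. The Eiffel Tower is famous."
matches = match_windows_sketch(response, context)
for window, source in matches:
    print(repr(window), "->", source)
```

Raising the threshold leaves more windows uncited; lowering it attributes more text but risks weak matches.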

Profit Share Weights

  • k_multi: Weight for multi-source citations (default: 1.5)
  • k_single: Weight for single-source citations (default: 1.0)
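
As an illustration of how these two weights could shape the result, the sketch below scores each server as k_single × (single-source citations) + k_multi × (multi-source citations) and then normalizes the scores so they sum to 1. Both the formula and the stats shape ({"server": {"single": n, "multi": m}}) are assumptions made for this example; calculate_contribution may compute shares differently.

```python
def profit_share_sketch(stats, k_multi=1.5, k_single=1.0):
    """Hypothetical weighted normalization of per-server citation counts."""
    scores = {
        server: k_single * counts["single"] + k_multi * counts["multi"]
        for server, counts in stats.items()
    }
    total = sum(scores.values())
    if total == 0:
        # No citations at all: every server gets a zero share.
        return {server: 0.0 for server in scores}
    return {server: score / total for server, score in scores.items()}


example_stats = {
    "wikipedia": {"single": 2, "multi": 1},     # score 2*1.0 + 1*1.5 = 3.5
    "travel_guide": {"single": 1, "multi": 1},  # score 1*1.0 + 1*1.5 = 2.5
}
shares = profit_share_sketch(example_stats)
print(shares)
```

Under this assumed formula, a k_multi above k_single rewards sources that are cited together with others more than sources cited alone.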

License

MIT License - see LICENSE for details.
