A library for constructing citable prompts and attributing contributions of context sources in RAG systems

Attribution

A Python library for attributing contributions of context sources in Retrieval-Augmented Generation (RAG) systems.

Installation

pip install attribution-lib

Features

  • LLM-Based Citation: Prompt construction for citation-aware generation and response decoding (no LLM calls made by the library)
  • Embedding-Based Citation: Post-processing attribution via semantic similarity
  • Profit Share Calculation: Weighted citation aggregation for fair attribution

Quick Start

Embedding-Based Attribution

from attribution import run_embedding_attribution_pipeline

# Your LLM response (without citations)
response = "The capital of France is Paris. It is known for the Eiffel Tower."

# Context sources with their indices
context = {
    1: "Paris is the capital and largest city of France.",
    2: "The Eiffel Tower is a famous landmark in Paris.",
}

# Map node indices to server/source names
node_map = {
    1: "wikipedia",
    2: "travel_guide",
}

# Run the pipeline
result = run_embedding_attribution_pipeline(
    response_text=response,
    context=context,
    node_map=node_map,
)

print(result["cited_response"])  # Response with <cite:X>...</cite> tags
print(result["stats"])           # Citation statistics per server
print(result["profit_share"])    # Normalized contribution scores

LLM-Based Attribution

from attribution import run_llm_attribution_pipeline

# Pre-generated response with citation tags
cited_response = "<cite:[1]>Paris is the capital of France.</cite> <cite:[2]>The Eiffel Tower is iconic.</cite>"

# Map node indices to server names
node_map = {
    1: "wikipedia",
    2: "travel_guide",
}

result = run_llm_attribution_pipeline(
    generated_response=cited_response,
    node_map=node_map,
)

print(result["profit_share"])

Constructing Citation Prompts

from attribution import construct_citation_prompt

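# Build the prompt, then pass it to your own LLM client
# (the library itself makes no LLM calls).
prompt = construct_citation_prompt(
    query="What is the capital of France?",
    context={
        1: "Paris is the capital of France.",
        2: "France is a country in Europe.",
    },
)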
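
The library's actual prompt template is not shown in this README. As a rough illustration only, a citation-aware prompt could number each context source and ask the model to wrap supported sentences in the `<cite:[N]>...</cite>` tags used in the LLM-based example above. The function name `sketch_citation_prompt` and the wording of the instructions are hypothetical and are not part of the library.

```python
def sketch_citation_prompt(query: str, context: dict[int, str]) -> str:
    """Hypothetical stand-in for a citation-aware prompt builder (not the library's template)."""
    # Number each source so the model can refer to it by index.
    sources = "\n".join(f"[{idx}] {text}" for idx, text in sorted(context.items()))
    return (
        "Answer the question using only the numbered sources below.\n"
        "Wrap every sentence that relies on a source in <cite:[N]>...</cite>, "
        "where N is the source index.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )


prompt_sketch = sketch_citation_prompt(
    "What is the capital of France?",
    {1: "Paris is the capital of France.", 2: "France is a country in Europe."},
)
print(prompt_sketch)
```

Whatever template is used, the numbering in the prompt needs to match the keys of the `node_map` passed to the attribution pipeline afterwards, so that cited indices resolve to the right source.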

API Reference

Core Functions

  • aggregate_server_citations(llm_response, node_to_server_map): Parse citations and count per server
  • calculate_contribution(stats, k_multi=1.5, k_single=1.0): Calculate normalized profit shares
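
To make the "parse citations and count per server" step concrete, here is a minimal sketch of how such counting could work. It assumes the `<cite:[1]>...</cite>` tag format from the LLM-based example, and additionally assumes that a multi-source citation is written as `<cite:[1,2]>`. The function name `sketch_aggregate` and the shape of the returned dictionary are illustrative and may differ from what `aggregate_server_citations` actually returns.

```python
import re
from collections import Counter

# Assumed tag format: <cite:[1]>...</cite>; comma-separated indices
# such as <cite:[1,2]> are an assumption of this sketch.
CITE_PATTERN = re.compile(r"<cite:\[(\d+(?:\s*,\s*\d+)*)\]>(.*?)</cite>", re.DOTALL)


def sketch_aggregate(llm_response, node_to_server_map):
    """Count single- and multi-source citations per server (illustrative only)."""
    stats = {}
    for match in CITE_PATTERN.finditer(llm_response):
        node_ids = [int(token) for token in match.group(1).split(",")]
        # Several nodes may belong to the same server, so deduplicate by server.
        servers = {node_to_server_map[n] for n in node_ids if n in node_to_server_map}
        kind = "multi" if len(servers) > 1 else "single"
        for server in servers:
            stats.setdefault(server, Counter())[kind] += 1
    return stats


example_response = (
    "<cite:[1]>Paris is the capital of France.</cite> "
    "<cite:[1,2]>The Eiffel Tower is in Paris.</cite>"
)
example_stats = sketch_aggregate(example_response, {1: "wikipedia", 2: "travel_guide"})
print({server: dict(counts) for server, counts in example_stats.items()})
```

Indices that do not appear in the map are silently skipped here; a stricter variant could raise an error instead.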

LLM-Based Citation

  • construct_citation_prompt(query, context): Build a citation-aware prompt for your LLM (library does not call any LLM)
  • run_llm_attribution_pipeline(generated_response, node_map, ...): Decode LLM response and compute attribution scores

Embedding-Based Citation

  • auto_cite_response(response, context, server_map, ...): Add citations using embeddings
  • run_embedding_attribution_pipeline(response_text, context, node_map, ...): Full embedding pipeline

Configuration

Embedding-Based Settings

  • DEFAULT_EMBEDDING_MODEL: 'BAAI/bge-small-en-v1.5'
  • DEFAULT_SIMILARITY_THRESHOLD: 0.75
  • DEFAULT_N_GRAM_SIZE: 5
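
This README does not describe how the n-gram size and the similarity threshold interact inside the library. One plausible reading, sketched below, is that text is split into word n-grams, each n-gram is embedded, and a source is cited only when the best cosine similarity reaches the threshold. To stay self-contained, the sketch swaps the real embedding model for a toy bag-of-words vector, so the scores are not comparable to those of `BAAI/bge-small-en-v1.5`. All function names here are hypothetical.

```python
import math
from collections import Counter


def toy_embed(text):
    """Bag-of-words vector; a stand-in for a real sentence-embedding model."""
    return Counter(word.strip(".,!?").lower() for word in text.split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def word_ngrams(text, n):
    """Sliding windows of n words; short texts yield a single window."""
    words = text.split()
    if len(words) <= n:
        return [" ".join(words)]
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def best_source(sentence, context, n=5, threshold=0.75):
    """Return the context index whose best n-gram match meets the threshold, else None."""
    best_idx, best_score = None, 0.0
    for idx, source_text in context.items():
        for s_gram in word_ngrams(sentence, n):
            for c_gram in word_ngrams(source_text, n):
                score = cosine(toy_embed(s_gram), toy_embed(c_gram))
                if score > best_score:
                    best_idx, best_score = idx, score
    return best_idx if best_score >= threshold else None


context = {
    1: "Paris is the capital and largest city of France.",
    2: "The Eiffel Tower is a famous landmark in Paris.",
}
print(best_source("Paris is the capital of France.", context))  # 1
print(best_source("Bananas are rich in potassium.", context))   # None
```

In this reading, raising the threshold produces fewer but more reliable citations, and lowering it cites more of the response at the risk of spurious matches.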

Profit Share Weights

  • k_multi: Weight for multi-source citations (default: 1.5)
  • k_single: Weight for single-source citations (default: 1.0)
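
The exact formula behind `calculate_contribution` is not given here. A simple scheme consistent with the two weights above would score each server as `k_single * single_count + k_multi * multi_count` and then normalize so that the shares sum to 1. The sketch below implements that assumed formula. The function name `sketch_profit_share` and the `{"single": ..., "multi": ...}` stats layout are hypothetical.

```python
def sketch_profit_share(stats, k_multi=1.5, k_single=1.0):
    """Weighted, normalized shares from per-server citation counts (assumed formula)."""
    raw = {
        server: k_single * counts.get("single", 0) + k_multi * counts.get("multi", 0)
        for server, counts in stats.items()
    }
    total = sum(raw.values())
    if total == 0:
        # No citations at all: every server gets a zero share.
        return {server: 0.0 for server in raw}
    return {server: score / total for server, score in raw.items()}


share_input = {
    "wikipedia": {"single": 1, "multi": 1},     # 1.0 * 1 + 1.5 * 1 = 2.5
    "travel_guide": {"single": 0, "multi": 1},  # 1.0 * 0 + 1.5 * 1 = 1.5
}
shares = sketch_profit_share(share_input)
print(shares)  # {'wikipedia': 0.625, 'travel_guide': 0.375}
```

With the defaults, a citation shared between sources counts for 1.5 times a single-source citation for each server involved. Setting `k_multi` below `k_single` would instead reward sources that are cited on their own.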

License

MIT License - see LICENSE for details.
