A library for constructing citable prompts and attributing contributions of context sources in RAG systems

Attribution

A Python library for attributing contributions of context sources in Retrieval-Augmented Generation (RAG) systems.

Installation

pip install attribution_lib

Features

  • LLM-Based Citation: Explicit citations during LLM generation using DSPy
  • Embedding-Based Citation: Post-processing attribution via semantic similarity
  • Profit Share Calculation: Weighted citation aggregation for fair attribution

Quick Start

Embedding-Based Attribution

from attribution import run_embedding_attribution_pipeline

# Your LLM response (without citations)
response = "The capital of France is Paris. It is known for the Eiffel Tower."

# Context sources with their indices
context = {
    1: "Paris is the capital and largest city of France.",
    2: "The Eiffel Tower is a famous landmark in Paris.",
}

# Map node indices to server/source names
node_map = {
    1: "wikipedia",
    2: "travel_guide",
}

# Run the pipeline
result = run_embedding_attribution_pipeline(
    response_text=response,
    context=context,
    node_map=node_map,
)

print(result["cited_response"])  # Response with <cite:X>...</cite> tags
print(result["stats"])           # Citation statistics per server
print(result["profit_share"])    # Normalized contribution scores

LLM-Based Attribution

from attribution import run_llm_attribution_pipeline

# Pre-generated response with citation tags
cited_response = "<cite:[1]>Paris is the capital of France.</cite> <cite:[2]>The Eiffel Tower is iconic.</cite>"

# Map node indices to server names
node_map = {
    1: "wikipedia",
    2: "travel_guide",
}

result = run_llm_attribution_pipeline(
    generated_response=cited_response,
    node_map=node_map,
)

print(result["profit_share"])
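To make the parsing step more concrete, here is a minimal sketch of how citation tags like the ones above could be counted per server. It is not the library's implementation: the helper name count_citations, the shape of the returned stats, and the support for several comma-separated indices in one tag are all assumptions made for illustration.

```python
import re

# Assumed tag format, taken from the example above: <cite:[1]>text</cite>.
# Accepting several comma-separated indices (e.g. <cite:[1, 2]>) is an assumption.
CITE_PATTERN = re.compile(r"<cite:\[([\d,\s]+)\]>(.*?)</cite>", re.DOTALL)


def count_citations(cited_response, node_map):
    """Count single- and multi-source citations per server (hypothetical helper)."""
    stats = {}
    for indices, _text in CITE_PATTERN.findall(cited_response):
        nodes = [int(i) for i in indices.split(",") if i.strip()]
        # Resolve node indices to server names, ignoring unknown indices.
        servers = {node_map[n] for n in nodes if n in node_map}
        kind = "multi" if len(servers) > 1 else "single"
        for server in servers:
            stats.setdefault(server, {"single": 0, "multi": 0})[kind] += 1
    return stats


example = (
    "<cite:[1]>Paris is the capital of France.</cite> "
    "<cite:[2]>The Eiffel Tower is iconic.</cite> "
    "<cite:[1, 2]>Paris is home to the Eiffel Tower.</cite>"
)
example_stats = count_citations(example, {1: "wikipedia", 2: "travel_guide"})
print(example_stats)
```

Separating single-source from multi-source counts here mirrors the k_single and k_multi weights described under Configuration; the real stats structure may differ.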

Constructing Citation Prompts

from attribution import construct_citation_prompt

prompt = construct_citation_prompt(
    query="What is the capital of France?",
    context={
        1: "Paris is the capital of France.",
        2: "France is a country in Europe.",
    }
)
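For intuition about what a citation prompt needs to contain, the sketch below assembles a plain-text prompt from the same query and context arguments. The function name build_citation_prompt and the instruction wording are hypothetical; the library's construct_citation_prompt builds a DSPy prompt, whose exact form is not shown here.

```python
def build_citation_prompt(query, context):
    """Format numbered sources and a citation instruction (illustrative only)."""
    # List the sources in index order so the model can refer to them by number.
    sources = "\n".join(f"[{idx}] {text}" for idx, text in sorted(context.items()))
    return (
        "Answer the question using only the numbered sources below.\n"
        "Wrap each claim in <cite:[N]>...</cite>, where N is the source index.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {query}\n"
    )


sketch_prompt = build_citation_prompt(
    "What is the capital of France?",
    {
        1: "Paris is the capital of France.",
        2: "France is a country in Europe.",
    },
)
print(sketch_prompt)
```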

API Reference

Core Functions

  • aggregate_server_citations(llm_response, node_to_server_map): Parse citations and count per server
  • calculate_contribution(stats, k_multi=1.5, k_single=1.0): Calculate normalized profit shares

LLM-Based Citation

  • construct_citation_prompt(query, context): Build DSPy prompt for citation generation
  • run_llm_attribution_pipeline(generated_response, node_map, ...): Full LLM attribution pipeline

Embedding-Based Citation

  • auto_cite_response(response, context, server_map, ...): Add citations using embeddings
  • run_embedding_attribution_pipeline(response_text, context, node_map, ...): Full embedding pipeline

Configuration

Embedding-Based Settings

  • DEFAULT_EMBEDDING_MODEL: 'BAAI/bge-small-en-v1.5'
  • DEFAULT_SIMILARITY_THRESHOLD: 0.75
  • DEFAULT_N_GRAM_SIZE: 5
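The sketch below shows one way an n-gram size and a similarity threshold could interact: the response is split into word windows, and a window is attributed to a source only if its best similarity score reaches the threshold. This is an illustration under stated assumptions, not the library's code. The helpers word_ngrams, toy_embed, cosine, and best_source are hypothetical, the n-gram is assumed to be counted in words, and a bag-of-words vector stands in for the real embedding model so the example runs without downloads.

```python
import math
from collections import Counter

N_GRAM_SIZE = 5              # mirrors DEFAULT_N_GRAM_SIZE
SIMILARITY_THRESHOLD = 0.75  # mirrors DEFAULT_SIMILARITY_THRESHOLD


def word_ngrams(text, n=N_GRAM_SIZE):
    """Split text into overlapping windows of n words (assumed word-level)."""
    words = text.split()
    if len(words) <= n:
        return [" ".join(words)] if words else []
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def toy_embed(text):
    """Stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(w.strip(".,").lower() for w in text.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def best_source(span, context, threshold=SIMILARITY_THRESHOLD):
    """Return the index of the most similar source, or None if below threshold."""
    scores = {idx: cosine(toy_embed(span), toy_embed(text)) for idx, text in context.items()}
    idx, score = max(scores.items(), key=lambda kv: kv[1])
    return idx if score >= threshold else None


sources = {
    1: "Paris is the capital and largest city of France.",
    2: "The Eiffel Tower is a famous landmark in Paris.",
}
print(word_ngrams("The capital of France is Paris."))
print(best_source("Paris is the capital and largest city", sources))
print(best_source("Bananas are yellow", sources))
```

A higher threshold yields fewer but more reliable citations; a lower one attributes more text at the cost of weaker matches.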

Profit Share Weights

  • k_multi: Weight for multi-source citations (default: 1.5)
  • k_single: Weight for single-source citations (default: 1.0)
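As a worked example of how these weights might combine, the sketch below multiplies each server's single-source and multi-source citation counts by k_single and k_multi, then normalizes so the shares sum to 1. The function name contribution_shares and the stats layout are assumptions; calculate_contribution may use a different formula or input shape.

```python
def contribution_shares(stats, k_multi=1.5, k_single=1.0):
    """Weighted, normalized share per server (illustrative formula)."""
    weighted = {
        server: k_single * counts.get("single", 0) + k_multi * counts.get("multi", 0)
        for server, counts in stats.items()
    }
    total = sum(weighted.values())
    if total == 0:
        # No citations at all: every server gets a zero share.
        return {server: 0.0 for server in stats}
    return {server: weight / total for server, weight in weighted.items()}


# Hypothetical counts: wikipedia has 2 single-source and 1 multi-source citation.
sample_stats = {
    "wikipedia": {"single": 2, "multi": 1},
    "travel_guide": {"single": 1, "multi": 0},
}
shares = contribution_shares(sample_stats)
print(shares)
```

With the default weights, wikipedia scores 2 × 1.0 + 1 × 1.5 = 3.5 and travel_guide scores 1.0, so the shares are 3.5 / 4.5 and 1.0 / 4.5.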

License

MIT License - see LICENSE for details.
