A library for constructing citable prompts and attributing contributions of context sources in RAG systems

Attribution

A Python library for attributing contributions of context sources in Retrieval-Augmented Generation (RAG) systems.

Installation

pip install attribution-lib

Features

  • LLM-Based Citation: Explicit citations during LLM generation using DSPy
  • Embedding-Based Citation: Post-processing attribution via semantic similarity
  • Profit Share Calculation: Weighted citation aggregation for fair attribution

Quick Start

Embedding-Based Attribution

from attribution import run_embedding_attribution_pipeline

# Your LLM response (without citations)
response = "The capital of France is Paris. It is known for the Eiffel Tower."

# Context sources with their indices
context = {
    1: "Paris is the capital and largest city of France.",
    2: "The Eiffel Tower is a famous landmark in Paris.",
}

# Map node indices to server/source names
node_map = {
    1: "wikipedia",
    2: "travel_guide",
}

# Run the pipeline
result = run_embedding_attribution_pipeline(
    response_text=response,
    context=context,
    node_map=node_map,
)

print(result["cited_response"])  # Response with <cite:X>...</cite> tags
print(result["stats"])           # Citation statistics per server
print(result["profit_share"])    # Normalized contribution scores
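
To give a feel for what similarity-based citation does, here is a minimal, self-contained sketch. It is not the library's implementation: the function names (`bow_vector`, `cosine`, `toy_auto_cite`) are hypothetical, a bag-of-words count vector stands in for a real embedding model, and the `<cite:[N]>` tag format is assumed from the LLM-based example in this README. The toy threshold of 0.3 is not comparable to the library's default, since word-count vectors score differently from learned embeddings.

```python
import math
import re
from collections import Counter


def bow_vector(text):
    """Toy 'embedding': lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def toy_auto_cite(response, context, threshold=0.3):
    """Wrap each sentence in a cite tag for its best-matching source, if any."""
    sentences = re.split(r"(?<=[.!?])\s+", response.strip())
    cited = []
    for sentence in sentences:
        vec = bow_vector(sentence)
        best_idx, best_score = None, 0.0
        for idx, source in context.items():
            score = cosine(vec, bow_vector(source))
            if score > best_score:
                best_idx, best_score = idx, score
        if best_idx is not None and best_score >= threshold:
            cited.append(f"<cite:[{best_idx}]>{sentence}</cite>")
        else:
            cited.append(sentence)  # no source is similar enough; leave uncited
    return " ".join(cited)


response = "The capital of France is Paris. It is known for the Eiffel Tower."
context = {
    1: "Paris is the capital and largest city of France.",
    2: "The Eiffel Tower is a famous landmark in Paris.",
}
print(toy_auto_cite(response, context))
```

With these inputs, the first sentence overlaps most with source 1 and the second with source 2, so each is tagged accordingly.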

LLM-Based Attribution

from attribution import run_llm_attribution_pipeline

# Pre-generated response with citation tags
cited_response = "<cite:[1]>Paris is the capital of France.</cite> <cite:[2]>The Eiffel Tower is iconic.</cite>"

# Map node indices to server names
node_map = {
    1: "wikipedia",
    2: "travel_guide",
}

result = run_llm_attribution_pipeline(
    generated_response=cited_response,
    node_map=node_map,
)

print(result["profit_share"])
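
As a rough illustration of the parsing step behind this pipeline, the sketch below extracts `<cite:[N]>` tags with a regular expression and counts citations per server. The helper name `count_citations_per_server` is hypothetical, and support for comma-separated indices such as `<cite:[1,2]>` is an assumption about how multi-source citations might be written; the library's own parser may differ.

```python
import re
from collections import Counter

# Matches <cite:[1]>text</cite> and, by assumption, <cite:[1, 2]>text</cite>.
CITE_PATTERN = re.compile(r"<cite:\[([\d,\s]+)\]>(.*?)</cite>", re.DOTALL)


def count_citations_per_server(cited_response, node_map):
    """Count how many cited spans reference each server."""
    counts = Counter()
    for indices, _span in CITE_PATTERN.findall(cited_response):
        for raw in indices.split(","):
            raw = raw.strip()
            if not raw:
                continue
            server = node_map.get(int(raw))
            if server is not None:  # skip indices with no known server
                counts[server] += 1
    return dict(counts)


cited_response = (
    "<cite:[1]>Paris is the capital of France.</cite> "
    "<cite:[2]>The Eiffel Tower is iconic.</cite>"
)
node_map = {1: "wikipedia", 2: "travel_guide"}
print(count_citations_per_server(cited_response, node_map))
# {'wikipedia': 1, 'travel_guide': 1}
```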

Constructing Citation Prompts

from attribution import construct_citation_prompt

prompt = construct_citation_prompt(
    query="What is the capital of France?",
    context={
        1: "Paris is the capital of France.",
        2: "France is a country in Europe.",
    }
)
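
The README does not show what the returned prompt looks like. Purely as an illustration of the idea, the sketch below builds a plain-text prompt that numbers each context source and asks the model to tag its claims. The function `build_toy_citation_prompt` and its wording are hypothetical; the real `construct_citation_prompt` builds a DSPy prompt whose structure may be quite different.

```python
def build_toy_citation_prompt(query, context):
    """Format numbered sources and citation instructions into one prompt string."""
    lines = [
        "Answer the question using only the numbered sources below.",
        "Wrap each supported statement as <cite:[N]>...</cite>, "
        "where N is the source number.",
        "",
        "Sources:",
    ]
    for idx in sorted(context):
        lines.append(f"[{idx}] {context[idx]}")
    lines += ["", f"Question: {query}"]
    return "\n".join(lines)


print(
    build_toy_citation_prompt(
        "What is the capital of France?",
        {
            1: "Paris is the capital of France.",
            2: "France is a country in Europe.",
        },
    )
)
```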

API Reference

Core Functions

  • aggregate_server_citations(llm_response, node_to_server_map): Parse citation tags and count citations per server
  • calculate_contribution(stats, k_multi=1.5, k_single=1.0): Calculate normalized profit shares

LLM-Based Citation

  • construct_citation_prompt(query, context): Build DSPy prompt for citation generation
  • run_llm_attribution_pipeline(generated_response, node_map, ...): Full LLM attribution pipeline

Embedding-Based Citation

  • auto_cite_response(response, context, server_map, ...): Add citations using embeddings
  • run_embedding_attribution_pipeline(response_text, context, node_map, ...): Full embedding pipeline

Configuration

Embedding-Based Settings

  • DEFAULT_EMBEDDING_MODEL: 'BAAI/bge-small-en-v1.5'
  • DEFAULT_SIMILARITY_THRESHOLD: 0.75
  • DEFAULT_N_GRAM_SIZE: 5
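
The README does not say how the n-gram size is used. One plausible reading, shown here only as an assumption, is that text is split into overlapping windows of 5 words that are then compared against the response by embedding similarity. The helper `word_ngrams` is hypothetical and not part of the library.

```python
def word_ngrams(text, n=5):
    """Return overlapping windows of n whitespace-separated words."""
    words = text.split()
    if len(words) <= n:
        # Short text: keep it as a single window (or nothing if empty).
        return [" ".join(words)] if words else []
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


for window in word_ngrams("The Eiffel Tower is a famous landmark in Paris."):
    print(window)
```

A 9-word sentence yields 5 windows at `n=5`, each shifted by one word from the previous one.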

Profit Share Weights

  • k_multi: Weight for multi-source citations (default: 1.5)
  • k_single: Weight for single-source citations (default: 1.0)
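
To make the effect of the two weights concrete, the sketch below computes normalized shares from per-server citation counts. Both the formula (a weighted sum of single-source and multi-source counts, divided by the total) and the shape of the `stats` dictionary are assumptions for illustration; `toy_profit_share` is a hypothetical name, and `calculate_contribution` may compute shares differently.

```python
def toy_profit_share(stats, k_multi=1.5, k_single=1.0):
    """Weight single- and multi-source citation counts, then normalize to sum to 1."""
    raw = {
        server: k_single * counts.get("single", 0) + k_multi * counts.get("multi", 0)
        for server, counts in stats.items()
    }
    total = sum(raw.values())
    if total == 0:
        return {server: 0.0 for server in raw}  # no citations at all
    return {server: score / total for server, score in raw.items()}


# Assumed stats shape: counts of single-source and multi-source citations per server.
stats = {
    "wikipedia": {"single": 2, "multi": 1},
    "travel_guide": {"single": 1, "multi": 1},
}
shares = toy_profit_share(stats)
print(shares)
```

Under these assumptions, wikipedia scores 2 × 1.0 + 1 × 1.5 = 3.5 and travel_guide scores 1 × 1.0 + 1 × 1.5 = 2.5, giving shares of 3.5/6 and 2.5/6.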

License

MIT License - see LICENSE for details.
