A library for constructing citable prompts and attributing contributions of context sources in RAG systems

Attribution

A Python library for attributing contributions of context sources in Retrieval-Augmented Generation (RAG) systems.

Installation

pip install Cite-Attribution

Features

  • LLM-Based Citation: Explicit citations during LLM generation using DSPy
  • Embedding-Based Citation: Post-processing attribution via semantic similarity
  • Profit Share Calculation: Weighted citation aggregation for fair attribution

Quick Start

Embedding-Based Attribution

from attribution import run_embedding_attribution_pipeline

# Your LLM response (without citations)
response = "The capital of France is Paris. It is known for the Eiffel Tower."

# Context sources with their indices
context = {
    1: "Paris is the capital and largest city of France.",
    2: "The Eiffel Tower is a famous landmark in Paris.",
}

# Map node indices to server/source names
node_map = {
    1: "wikipedia",
    2: "travel_guide",
}

# Run the pipeline
result = run_embedding_attribution_pipeline(
    response_text=response,
    context=context,
    node_map=node_map,
)

print(result["cited_response"])  # Response with <cite:X>...</cite> tags
print(result["stats"])           # Citation statistics per server
print(result["profit_share"])    # Normalized contribution scores

LLM-Based Attribution

from attribution import run_llm_attribution_pipeline

# Pre-generated response with citation tags
cited_response = "<cite:[1]>Paris is the capital of France.</cite> <cite:[2]>The Eiffel Tower is iconic.</cite>"

# Map node indices to server names
node_map = {
    1: "wikipedia",
    2: "travel_guide",
}

result = run_llm_attribution_pipeline(
    generated_response=cited_response,
    node_map=node_map,
)

print(result["profit_share"])

Constructing Citation Prompts

from attribution import construct_citation_prompt

prompt = construct_citation_prompt(
    query="What is the capital of France?",
    context={
        1: "Paris is the capital of France.",
        2: "France is a country in Europe.",
    }
)

print(prompt)  # Inspect the constructed prompt before sending it to the LLM

API Reference

Core Functions

  • aggregate_server_citations(llm_response, node_to_server_map): Parse citations and count per server
  • calculate_contribution(stats, k_multi=1.5, k_single=1.0): Calculate normalized profit shares
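To make the "parse citations and count per server" step concrete, here is a minimal sketch of how such an aggregation could work. It is not the library's implementation: the function name count_citations_per_server, the regular expression, the support for several indices in one tag (e.g. <cite:[1, 2]>), and the "single"/"multi" layout of the result are all assumptions made for illustration. Only the <cite:[1]>...</cite> tag shape is taken from the Quick Start example.

```python
import re
from collections import Counter

# Hypothetical pattern for tags shaped like <cite:[1]>...</cite>.
# Accepting several comma-separated indices per tag is an assumption.
CITE_TAG = re.compile(r"<cite:\[([\d,\s]+)\]>(.*?)</cite>", re.DOTALL)


def count_citations_per_server(cited_text, node_to_server_map):
    """Count citations per server, split into single- and multi-source ones."""
    stats = {}
    for match in CITE_TAG.finditer(cited_text):
        # Turn "1, 2" into [1, 2], ignoring empty tokens.
        indices = [int(tok) for tok in match.group(1).split(",") if tok.strip()]
        # Map node indices to server names, skipping unknown indices.
        servers = {node_to_server_map[i] for i in indices if i in node_to_server_map}
        kind = "multi" if len(servers) > 1 else "single"
        for server in servers:
            stats.setdefault(server, Counter())[kind] += 1
    return stats


text = (
    "<cite:[1]>Paris is the capital of France.</cite> "
    "<cite:[2]>The Eiffel Tower is iconic.</cite> "
    "<cite:[1, 2]>Paris is home to the Eiffel Tower.</cite>"
)
node_map = {1: "wikipedia", 2: "travel_guide"}
stats = count_citations_per_server(text, node_map)
print({server: dict(counts) for server, counts in stats.items()})
```

Separating single-source from multi-source citations at this stage is one way to let a later step weight them differently, as the k_single and k_multi parameters suggest.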

LLM-Based Citation

  • construct_citation_prompt(query, context): Build DSPy prompt for citation generation
  • run_llm_attribution_pipeline(generated_response, node_map, ...): Full LLM attribution pipeline

Embedding-Based Citation

  • auto_cite_response(response, context, server_map, ...): Add citations using embeddings
  • run_embedding_attribution_pipeline(response_text, context, node_map, ...): Full embedding pipeline

Configuration

Embedding-Based Settings

  • DEFAULT_EMBEDDING_MODEL: 'BAAI/bge-small-en-v1.5'
  • DEFAULT_SIMILARITY_THRESHOLD: 0.75
  • DEFAULT_N_GRAM_SIZE: 5
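
How these settings interact inside the library is not documented here. The sketch below shows one plausible reading: each context passage is split into word windows of N_GRAM_SIZE words, and a response sentence is attributed to a source only if its best window reaches SIMILARITY_THRESHOLD. To stay self-contained it replaces the real embedding model with a toy bag-of-words vector, so the scores are not comparable to those of 'BAAI/bge-small-en-v1.5'. All function names (toy_embed, cosine, word_ngrams, best_source) are hypothetical.

```python
import math
import re
from collections import Counter

SIMILARITY_THRESHOLD = 0.75  # mirrors DEFAULT_SIMILARITY_THRESHOLD
N_GRAM_SIZE = 5              # mirrors DEFAULT_N_GRAM_SIZE


def toy_embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def word_ngrams(text, n):
    """Sliding windows of n words; short texts yield a single window."""
    words = text.split()
    if len(words) <= n:
        return [" ".join(words)]
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def best_source(sentence, context, threshold=SIMILARITY_THRESHOLD, n=N_GRAM_SIZE):
    """Return the context index whose best window reaches the threshold, else None."""
    query = toy_embed(sentence)
    best_idx, best_score = None, threshold
    for idx, passage in context.items():
        for window in word_ngrams(passage, n):
            score = cosine(query, toy_embed(window))
            if score >= best_score:
                best_idx, best_score = idx, score
    return best_idx


context = {
    1: "Paris is the capital and largest city of France.",
    2: "The Eiffel Tower is a famous landmark in Paris.",
}
print(best_source("Paris is the capital", context))
print(best_source("Bananas are yellow", context))
```

Under this reading, a higher threshold leaves more sentences uncited, while a lower one risks attributing loosely related text to a source.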

Profit Share Weights

  • k_multi: Weight for multi-source citations (default: 1.5)
  • k_single: Weight for single-source citations (default: 1.0)
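
As a rough illustration of how these weights could turn citation counts into normalized shares, consider the sketch below. It assumes each server's score is k_single times its single-source citations plus k_multi times its multi-source citations, divided by the total across servers. The function name weighted_profit_share and the shape of the stats dictionary are hypothetical; the library's calculate_contribution may use a different formula or input format.

```python
def weighted_profit_share(stats, k_multi=1.5, k_single=1.0):
    """Turn per-server citation counts into shares that sum to 1.0."""
    # Assumed scoring rule: a weighted sum of single- and multi-source counts.
    scores = {
        server: k_single * counts.get("single", 0) + k_multi * counts.get("multi", 0)
        for server, counts in stats.items()
    }
    total = sum(scores.values())
    if total == 0:
        # No citations at all: nobody receives a share.
        return {server: 0.0 for server in scores}
    return {server: score / total for server, score in scores.items()}


example_stats = {
    "wikipedia": {"single": 2, "multi": 1},     # 2 * 1.0 + 1 * 1.5 = 3.5
    "travel_guide": {"single": 0, "multi": 1},  # 0 * 1.0 + 1 * 1.5 = 1.5
}
shares = weighted_profit_share(example_stats)
print(shares)  # {'wikipedia': 0.7, 'travel_guide': 0.3}
```

With the default weights, a multi-source citation counts 1.5 times as much as a single-source one for every server it names.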

License

MIT License - see LICENSE for details.
