A context tracing tool for LLMs

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Searching for Needles in a Haystack with TracLLM

TracLLM

This is a package for easily using TracLLM, a tool for finding the critical texts within a lengthy context that contribute to the LLM's answer. To reproduce the results in the paper, please refer to this repo: https://github.com/WYT8506/TracLLM.

Searching for Needles in a Haystack: Context Tracing for Unraveling Outputs of Long Context LLMs
Yanting Wang¹†, Wei Zou¹†, Runpeng Geng¹, Jinyuan Jia¹

¹Penn State University
†Co-first author

🔨 Installation

Please run the following commands to set up the environment:

conda env create -f environment.yml
conda activate TracLLM

Alternatively, create the environment manually and install the dependencies with pip:

conda create -n TracLLM python
conda activate TracLLM
pip install -r requirements.txt

🗂️ Arguments

We list the arguments for PerturbationBasedAttribution below. Example values: K=5, attr_type="tracllm", score_funcs=['stc','loo','denoised_shapley'], sh_N=5, w=2, beta=0.2.

Argument	Example	Description
`--llm`	Generated by create_model	Generated by create_model using the Huggingface model_path and api_key (or OpenAI model_name and api_key)
`--explanation_level`	`sentence`	How to segment the input text, [`sentence`, `paragraph`, `segment`].
`--K`	5	The number of most important texts to report.
`--attr_type`	`tracllm`	Whether to apply the search tree from TracLLM. [`vanilla_perturb`, `tracllm`]
`--score_funcs`	`['stc','loo','denoised_shapley']`	The scoring functions to apply. If more than one, the ensemble method from TracLLM will be applied. Choose from [`stc`, `loo`,`lime`,`shapley`, `denoised_shapley`]
`--sh_N`	`5`	The number of permutations to approximate the Shapley/denoised Shapley value.
`--w`	`2`	The weight of the LOO score function when ensembling.
`--beta`	`0.2`	A parameter for denoised Shapley value.
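
The `loo` and `shapley` score functions named above are standard perturbation-based attribution scores. The sketch below shows generic textbook versions on a toy problem; it is not TracLLM's implementation. `value_fn`, `toy_value`, and `WEIGHTS` are hypothetical stand-ins for an LLM-based measurement of how strongly a subset of texts supports the answer, and `num_permutations` plays the role that `sh_N` is described as playing.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical stand-in: maps a subset of text indices to a support score.
ValueFn = Callable[[Sequence[int]], float]


def loo_scores(n: int, value_fn: ValueFn) -> List[float]:
    """Leave-one-out: drop in value when text i is removed from the full set."""
    everything = list(range(n))
    full = value_fn(everything)
    return [full - value_fn([j for j in everything if j != i]) for i in range(n)]


def shapley_scores(n: int, value_fn: ValueFn,
                   num_permutations: int = 5, seed: int = 0) -> List[float]:
    """Monte Carlo Shapley: average marginal gain of each text over random orderings."""
    rng = random.Random(seed)
    totals = [0.0] * n
    for _ in range(num_permutations):
        order = list(range(n))
        rng.shuffle(order)
        included: List[int] = []
        prev = value_fn(included)
        for i in order:
            included.append(i)
            cur = value_fn(included)
            totals[i] += cur - prev  # marginal contribution of text i in this ordering
            prev = cur
    return [t / num_permutations for t in totals]


# Toy additive value function (an assumption for illustration only):
# each text contributes a fixed weight, independent of the others.
WEIGHTS = [0.1, 0.7, 0.2]


def toy_value(subset: Sequence[int]) -> float:
    return sum(WEIGHTS[i] for i in subset)


loo = loo_scores(len(WEIGHTS), toy_value)
shap = shapley_scores(len(WEIGHTS), toy_value, num_permutations=5)
print("LOO:    ", [round(s, 3) for s in loo])
print("Shapley:", [round(s, 3) for s in shap])
```

With an additive toy value function both scores recover the per-text weights. With a real LLM the value function is generally not additive, so the two scores can disagree.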

📝 Getting Started

Explore TracLLM with our example notebook quick_start.ipynb. To use TracLLM, first generate the model and attribution object:

from tracllm.models import create_model
from tracllm.attribution import PerturbationBasedAttribution
from tracllm.prompts import wrap_prompt

model_path = "meta-llama/Meta-Llama-3.1-8B-Instruct"
api_key = "Your API key"
llm = create_model(model_path=model_path, api_key=api_key, device="cuda:0")
# Pass more than one scoring function to ensemble them
score_funcs = ['stc', 'loo', 'denoised_shapley']
attr = PerturbationBasedAttribution(llm, explanation_level="sentence", attr_type="tracllm", score_funcs=score_funcs, sh_N=5)

Then, you can craft the prompt and get the LLM's answer:

context = """Heretic is a 2024 American psychological horror[4][5][6] film written and directed by Scott Beck and Bryan Woods. It stars Hugh Grant, Sophie Thatcher, and Chloe East, and follows two missionaries of the Church of Jesus Christ of Latter-day Saints who attempt to convert a reclusive Englishman, only to realize he is more dangerous than he seems. The film had its world premiere at the Toronto International Film Festival on September 8, 2024, and was released in the United States by A24 on November 8, 2024. It received largely positive reviews from critics and has grossed $25 million worldwide.
\n\n Red One is a 2024 American action-adventure Christmas comedy film directed by Jake Kasdan and written by Chris Morgan, from an original story by Hiram Garcia. The film follows the head of North Pole security (Dwayne Johnson) teaming up with a notorious hacker (Chris Evans) in order to locate a kidnapped Santa Claus (J. K. Simmons) on Christmas Eve; Lucy Liu, Kiernan Shipka, Bonnie Hunt, Nick Kroll, Kristofer Hivju, and Wesley Kimmel also star. The film is seen as the first of a Christmas-themed franchise, produced by Amazon MGM Studios in association with Seven Bucks Productions, Chris Morgan Productions, and The Detective Agency.[7][8] Red One was released internationally by Warner Bros. Pictures on November 6 and was released in the United States by Amazon MGM Studios through Metro-Goldwyn-Mayer on November 15, 2024.[9] The film received generally negative reviews from critics, but it has grossed $10 billion solely in the USA. M.O.R.A (Mythological Oversight and Restoration Authority) is a clandestine, multilateral military organization that oversees and protects a secret peace treaty between mythological creatures and humanity. Callum Drift, head commander of Santa Claus's ELF (Enforcement Logistics and Fortification) security, requests to retire after one last Christmas run, as he has become disillusioned with increased bad behavior in the world, exemplified by the growth of Santa's Naughty List. 
"""
question= "Which movie earned more money, Heretic or Red one?"
prompt = wrap_prompt(question, [context])
answer = llm.query(prompt)
print("Answer: ", answer)

Finally, you can get the attribution results of TracLLM by calling attr.attribute:

texts,important_ids, importance_scores, _,_ = attr.attribute(question, [context], answer)
attr.visualize_results(texts,question,answer, important_ids,importance_scores, width = 60)
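
If you want to rank texts by score yourself, for example to report the K most important ones in plain text instead of the visualization, a minimal sketch follows. It assumes only a list of texts and a parallel list of scores; the exact shapes of the values returned by attr.attribute are not documented here, and `top_k_texts`, `demo_texts`, and `demo_scores` are hypothetical names with made-up numbers.

```python
from typing import List, Sequence, Tuple


def top_k_texts(texts: Sequence[str], scores: Sequence[float],
                k: int = 5) -> List[Tuple[int, str, float]]:
    """Return (index, text, score) for the k highest-scoring texts, best first."""
    if len(texts) != len(scores):
        raise ValueError("texts and scores must have the same length")
    ranked = sorted(range(len(texts)), key=lambda i: scores[i], reverse=True)
    return [(i, texts[i], scores[i]) for i in ranked[:k]]


# Made-up scores for illustration; these are not TracLLM output.
demo_texts = [
    "Heretic has grossed $25 million worldwide.",
    "Red One is a Christmas comedy film.",
    "Red One has grossed $10 billion solely in the USA.",
]
demo_scores = [0.4, 0.05, 0.9]

top = top_k_texts(demo_texts, demo_scores, k=2)
for idx, text, score in top:
    print(f"[{idx}] {score:.2f}  {text}")
```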

Example

Customize Input Text Segmentation

You can customize the explanation level (e.g. word level) by passing a list of texts to the PerturbationBasedAttribution class. Please refer to customize_segmentation.ipynb for more details.
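
As a rough illustration of what a custom segmentation can look like, the sketch below splits a context into paragraphs, sentences, or words using only the standard library. The function names are hypothetical and the sentence splitter is deliberately naive; how the resulting list is passed to PerturbationBasedAttribution is shown in customize_segmentation.ipynb, not here.

```python
import re
from typing import List


def split_paragraphs(text: str) -> List[str]:
    """Split on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(text: str) -> List[str]:
    """Naive split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def split_words(text: str) -> List[str]:
    """Split on any whitespace."""
    return text.split()


sample = (
    "Heretic is a horror film. It stars Hugh Grant.\n\n"
    "Red One is a comedy. It stars Dwayne Johnson."
)
paragraphs = split_paragraphs(sample)
sentences = split_sentences(sample)
words = split_words(sample)
print(len(paragraphs), len(sentences), len(words))
```

The naive sentence rule will mis-split abbreviations such as "J. K. Simmons", so a proper sentence tokenizer may be preferable for real contexts.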

Acknowledgement

This project incorporates code from PoisonedRAG and corpus-poisoning.
This project incorporates datasets from LongBench and Needle In A Haystack.
This project draws inspiration from ContextCite and AgentPoison.
The model component of this project is based on Open-Prompt-Injection.
This project utilizes contriever for retrieval augmented generation (RAG).

Citation

@article{wang2024tracllm,
    title={Searching for Needles in a Haystack: Context Tracing for Unraveling Outputs of Long Context LLMs},
    author={Wang, Yanting and Zou, Wei and Geng, Runpeng and Jia, Jinyuan},
    year={2024}
}

Release history

0.1.6: Dec 26, 2024
0.1.5: Dec 26, 2024 (this version)
0.1.4: Dec 26, 2024
0.1.2: Dec 26, 2024
0.1.1: Dec 9, 2024
0.1.0: Dec 9, 2024

Download files

Source Distribution

tracllm-0.1.5.tar.gz (15.9 kB)

Uploaded Dec 26, 2024 Source

Built Distribution

tracllm-0.1.5-py3-none-any.whl (14.2 kB)

Uploaded Dec 26, 2024 Python 3

File details

Details for the file tracllm-0.1.5.tar.gz.

File metadata

Download URL: tracllm-0.1.5.tar.gz
Upload date: Dec 26, 2024
Size: 15.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.10.15

File hashes

Hashes for tracllm-0.1.5.tar.gz
Algorithm	Hash digest
SHA256	`ffdd6aa93d18e8e670279fc15ebb9b3ac43cabb2c6f451a0ad41ea7baf3dfa15`
MD5	`09f05fad7eeb728025d3d12c97f60951`
BLAKE2b-256	`a9c837aed3d997c1bbf7bc30e534f8d33fffc938a09e5c27775ca6614a8892f6`

File details

Details for the file tracllm-0.1.5-py3-none-any.whl.

File metadata

Download URL: tracllm-0.1.5-py3-none-any.whl
Upload date: Dec 26, 2024
Size: 14.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.10.15

File hashes

Hashes for tracllm-0.1.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`dc1e3999d91ba0245445d0696422b5219316fb350d234ae5228d6dd604868497`
MD5	`9fdffe7f5d2eacaa4082d2e9961f94dd`
BLAKE2b-256	`7a141673eff4122a3497d2d61676b88bcaa230623308e88757a7dfeec0a6067f`
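
A downloaded file can be checked against the SHA256 digests listed above. The sketch below uses only Python's standard `hashlib`; the helper names `sha256_of` and `matches_published_hash` are hypothetical, and the file path in the usage comment assumes the wheel was saved to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage (assumes the wheel is in the current directory):
# matches_published_hash(
#     Path("tracllm-0.1.5-py3-none-any.whl"),
#     "dc1e3999d91ba0245445d0696422b5219316fb350d234ae5228d6dd604868497",
# )
```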
