Utilities for harm dataset generation

Generating updated artifacts locally

This project can be used to generate updated artifacts and models used in the langkit metrics, such as the ChromaDB used to compute similarity to known patterns like prompt injections or other harmful content. To generate these artifacts locally, run:

make train-chroma

This calls the data/scriptstraining/train_chroma.py script, which outputs ChromaDB artifacts for each encoder and each topic. The artifacts are stored on the local file system in the results folder, at a path like:

llm-toolkit/results/dirname/model/AllMiniLML6V2/chromadb/injection/chroma.sqlite3

This artifact is then used by langkit metrics such as injections_metric_chroma; see llm-toolkit/langkit/metrics/injections_chroma.py, where local_path can specify how the ChromaVectorDB is instantiated.
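To check what a training run produced, the results folder can be scanned for artifacts. The following is a minimal sketch that relies only on the path layout shown above (.../<encoder>/chromadb/<topic>/chroma.sqlite3) and on the fact that chroma.sqlite3 is an ordinary SQLite file. The helper names find_chroma_artifacts and describe_artifact are hypothetical and are not part of langkit or this toolkit.

```python
import sqlite3
from pathlib import Path


def find_chroma_artifacts(results_dir):
    """Return every chroma.sqlite3 file found under results_dir, sorted."""
    return sorted(Path(results_dir).rglob("chroma.sqlite3"))


def describe_artifact(db_path):
    """Report the encoder, topic, and SQLite table names for one artifact.

    Assumes the layout .../<encoder>/chromadb/<topic>/chroma.sqlite3.
    """
    db_path = Path(db_path)
    topic = db_path.parent.name
    encoder = db_path.parent.parent.parent.name
    # Open read-only so that inspecting an artifact never modifies it.
    uri = db_path.resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return {"encoder": encoder, "topic": topic, "tables": [r[0] for r in rows]}
```

Calling find_chroma_artifacts("results") from the repository root and passing each result to describe_artifact gives a quick overview of which encoder and topic combinations were generated. The table names it prints depend on the installed ChromaDB version.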

Developing

Install requirements first:

Git LFS (https://git-lfs.com/) for the data we commit to this repo

To get changes merged into mainline you'll want to create a PR from any branch.

git push origin HEAD:my-branch

To run the CI checks locally, run the following:

make lint format test

Before make test will work, you need to run make install and make data-load to download the assets. You also need to provide an org-0 WhyLabs production API key.

The songbird-client is downloaded from a private GitLab repo, so you will need a personal access token (PAT) to access it. You can set poetry to use your PAT with the following command:

poetry config http-basic.songbird_client_gitlab <gitlab-user-name> <gitlab-pat>

To automatically fix everything that can be fixed, run:

make fix

If you are making dataset changes, it is possible that the exclusion list will change. If you need to update the exclusion list, you can run the following command:

make generate-exclusion-list

Releasing

Releasing has a few steps to it. The first step is to bump the version number, which will usually look like this if you're bumping the patch version.

# bump-patch will go from 0.1.0 to 0.1.1-dev0
make bump-patch

# bump-release will just shave off the -dev0 if you don't need a dev build
make bump-release

That will generate changes that you can review and commit to git.

git add -p
git commit -m "Bumping for release"
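The version changes described above can be illustrated with a small sketch. It only mirrors the string transformation stated in the comments (0.1.0 to 0.1.1-dev0, then dropping -dev0); the actual make targets are not shown here and may use a dedicated bumping tool. The function names, and the behavior of bump_patch when the input is already a dev version, are assumptions.

```python
import re

# Matches versions like 0.1.0 or 0.1.1-dev0.
_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-dev(\d+))?$")


def _parse(version):
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    return int(match.group(1)), int(match.group(2)), int(match.group(3))


def bump_patch(version):
    """Increment the patch number and start a dev build, e.g. 0.1.0 -> 0.1.1-dev0.

    Assumption: an existing -devN suffix on the input is ignored.
    """
    major, minor, patch = _parse(version)
    return f"{major}.{minor}.{patch + 1}-dev0"


def bump_release(version):
    """Drop the -devN suffix, e.g. 0.1.1-dev0 -> 0.1.1."""
    major, minor, patch = _parse(version)
    return f"{major}.{minor}.{patch}"
```

Chaining the two, bump_release(bump_patch("0.1.0")), gives the version that ends up published when no dev build is needed.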

Then you can send a PR for these changes to mainline just to make sure CI isn't broken.

# Create a PR after pushing to a branch
git push origin HEAD:my-branch-name

After the CI clears and the PR merges, you can open another PR from mainline into release via the GitLab UI. When that PR merges, it kicks off a release CI pipeline that publishes the Python package to the private GitLab repo as the new version.

Poetry lock on a Mac

If you're on a Mac and make dependency updates, you may need to use a Linux VM or container to get a lock file that is consistent with the CI pipeline.

docker run --rm -it -v $(pwd):/work python:3.10-bookworm sh
cd /work
pip install poetry==1.7.1
poetry config http-basic.songbird_client_gitlab <gitlab-user> <gitlab-pat>
poetry lock --no-update

Project details

This version: 0.1.37, released Jan 22, 2025

Download files

Source Distribution

whylabs_llm_toolkit-0.1.37.tar.gz (124.9 kB), uploaded Jan 22, 2025, Source

Built Distribution

whylabs_llm_toolkit-0.1.37-py3-none-any.whl (196.9 kB), uploaded Jan 22, 2025, Python 3

File details

Details for the file whylabs_llm_toolkit-0.1.37.tar.gz.

File metadata

Download URL: whylabs_llm_toolkit-0.1.37.tar.gz
Upload date: Jan 22, 2025
Size: 124.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.7.1 CPython/3.12.3 Linux/6.8.0-51-generic

File hashes

Hashes for whylabs_llm_toolkit-0.1.37.tar.gz
Algorithm	Hash digest
SHA256	`286d1d5c3d7b1b2c3fe022d3e66897473c4ce0fd48edeba73d37045aadc443cc`
MD5	`ef483922438212c9e0731919e5c8fc79`
BLAKE2b-256	`ecfdaef160340ece4ccf41ee5a68d899a04a5a969343c9714ec2d1111dc7d1ac`

File details

Details for the file whylabs_llm_toolkit-0.1.37-py3-none-any.whl.

File metadata

Download URL: whylabs_llm_toolkit-0.1.37-py3-none-any.whl
Upload date: Jan 22, 2025
Size: 196.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.7.1 CPython/3.12.3 Linux/6.8.0-51-generic

File hashes

Hashes for whylabs_llm_toolkit-0.1.37-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f631d32877f7161e5d2e8c8673f0c58de1091351a4248a7385023d7069e34ed7`
MD5	`b4742d2841f5d63dca9e46442da52e21`
BLAKE2b-256	`fa490a45ecfd9f4038f893a08409059fe6f0398ca63ad75bf9b28f97d274d1aa`
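A downloaded file can be checked against the digests listed above. The following is a minimal sketch using Python's standard hashlib module; BLAKE2b-256 here means BLAKE2b with a 32-byte digest. The helper names file_digests and matches_published are hypothetical.

```python
import hashlib


def file_digests(path, chunk_size=1 << 20):
    """Compute the SHA256, MD5, and BLAKE2b-256 hex digests of a file.

    The file is read in chunks so large archives do not need to fit in memory.
    """
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def matches_published(path, expected_sha256):
    """Return True if the file's SHA256 digest equals the published value."""
    return file_digests(path)["SHA256"] == expected_sha256.strip().lower()
```

For example, matches_published("whylabs_llm_toolkit-0.1.37.tar.gz", "286d1d5c3d7b1b2c3fe022d3e66897473c4ce0fd48edeba73d37045aadc443cc") should return True for an intact copy of the source distribution.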
