Utilities for harm dataset generation
Generating updated artifacts locally
This project can be used to generate updated artifacts and models used in the langkit metrics, such as the ChromaDB used to compute similarity to known patterns like prompt injections or other harmful content. To generate these artifacts locally, you can try running:
make train-chroma
which calls the data/scriptstraining/train_chroma.py script and outputs ChromaDB artifacts for each encoder and each topic. The artifacts are stored on the local file system in the results folder, at a path like:
llm-toolkit/results/dirname/model/AllMiniLML6V2/chromadb/injection/chroma.sqlite3
This artifact is then used by langkit metrics such as injections_metric_chroma; see llm-toolkit/langkit/metrics/injections_chroma.py, where local_path can be used to specify how the ChromaVectorDB is instantiated.
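To check which artifacts a training run produced, a small script can walk the results folder. The sketch below is illustrative only: it assumes the layout shown in the example path above (run directory, then model, encoder, chromadb, topic), and the helper name find_chroma_artifacts is hypothetical, not part of the toolkit.

```python
from pathlib import Path
from typing import Iterator, Tuple


def find_chroma_artifacts(results_dir: Path) -> Iterator[Tuple[str, str, Path]]:
    """Yield (encoder, topic, sqlite_path) for each ChromaDB artifact found.

    Assumed layout (taken from the example path in this README):
    <results_dir>/<run>/model/<encoder>/chromadb/<topic>/chroma.sqlite3
    """
    pattern = "*/model/*/chromadb/*/chroma.sqlite3"
    for sqlite_path in sorted(results_dir.glob(pattern)):
        topic = sqlite_path.parent.name  # e.g. "injection"
        encoder = sqlite_path.parent.parent.parent.name  # e.g. "AllMiniLML6V2"
        yield encoder, topic, sqlite_path


if __name__ == "__main__":
    # Run from the repository root after `make train-chroma`.
    for encoder, topic, path in find_chroma_artifacts(Path("results")):
        print(f"{encoder:20s} {topic:15s} {path}")
```

If the script prints nothing, either the training step has not run yet or the results layout differs from the one assumed here.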
Developing
Install requirements first:
- [Git LFS](https://git-lfs.com/) for the data we commit to this repo
To get changes merged into mainline you'll want to create a PR from any branch.
git push origin HEAD:my-branch
To run the CI checks locally, run the following:
make lint format test
Before make test will work, you need to run make install and make data-load to download the assets.
You also need to provide an org-0 WhyLabs production API key.
The songbird-client is downloaded from a private GitLab repo, so you will need a personal access token (PAT) to access it. You can configure Poetry to use your PAT with the following command:
poetry config http-basic.songbird_client_gitlab <gitlab-user-name> <gitlab-pat>
To automatically fix everything that can be fixed, run:
make fix
If you are making dataset changes, it is possible that the exclusion list will change. If you need to update the exclusion list, you can run the following command:
make generate-exclusion-list
Releasing
Releasing has a few steps to it. The first step is to bump the version number, which will usually look like this if you're bumping the patch version.
# bump-patch will go from 0.1.0 to 0.1.1-dev0
make bump-patch
# bump-release will just shave off the -dev0 if you don't need a dev build
make bump-release
That will generate changes that you can review and commit to git.
git add -p
git commit -m "Bumping for release"
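The version transitions described in the comments on the bump commands can be sketched in a few lines of Python. This is only an illustration of the two documented transitions (0.1.0 to 0.1.1-dev0, and dropping the -dev0 suffix); the function names are hypothetical and this is not how the make targets are implemented.

```python
import re

# Matches versions such as "0.1.0" or "0.1.1-dev0".
_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-dev(\d+))?$")


def _parse(version: str) -> tuple:
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return match.groups()


def bump_patch(version: str) -> str:
    """Increment the patch number and start a dev build: 0.1.0 -> 0.1.1-dev0."""
    # Assumption: how the real target treats a version that already has a
    # -devN suffix is not documented here; this sketch simply increments.
    major, minor, patch, _dev = _parse(version)
    return f"{major}.{minor}.{int(patch) + 1}-dev0"


def bump_release(version: str) -> str:
    """Drop the -devN suffix: 0.1.1-dev0 -> 0.1.1."""
    major, minor, patch, _dev = _parse(version)
    return f"{major}.{minor}.{patch}"


if __name__ == "__main__":
    dev = bump_patch("0.1.0")
    print(dev, "->", bump_release(dev))
```

Running both steps in sequence corresponds to a normal patch release, where no separate dev build is published.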
Then you can send a PR for these changes to mainline just to make sure CI isn't broken.
# Create a PR after pushing to a branch
git push origin HEAD:my-branch-name
After the CI clears and the PR merges, you can open another PR from mainline into release via the GitLab UI. When that PR merges, it will kick off a release CI pipeline that publishes the Python package to the private GitLab repo as the new version.
Poetry lock on a Mac
If you're on a Mac and make dependency updates, you may need to use a Linux container to get a lock file that is consistent with the CI pipeline.
docker run --rm -it -v $(pwd):/work python:3.10-bookworm sh
cd /work
pip install poetry==1.7.1
poetry config http-basic.songbird_client_gitlab <gitlab-user> <gitlab-pat>
poetry lock --no-update