Utilities for harm dataset generation

Generating updated artifacts locally

This project can be used to generate updated artifacts and models used in the langkit metrics, such as the ChromaDB used to compute similarity to known patterns like prompt injections or other harmful content. To generate these artifacts locally, run:

make train-chroma

which calls the data/scriptstraining/train_chroma.py script and outputs ChromaDB artifacts for each encoder and each topic. The artifacts are stored on the local file system in the results folder, at a path like:

llm-toolkit/results/dirname/model/AllMiniLML6V2/chromadb/injection/chroma.sqlite3

This artifact is then used by langkit metrics such as injections_metric_chroma; see llm-toolkit/langkit/metrics/injections_chroma.py, where local_path specifies how the ChromaVectorDB is instantiated.
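To check that a training run produced the artifact you expect, it can help to build the path from its parts. The following is a minimal sketch based only on the example path above; the helper name chroma_artifact_path and its parameters are hypothetical and are not part of the toolkit, and the directory layout is assumed to match that single example.

```python
from pathlib import Path


def chroma_artifact_path(results_root: Path, run_dir: str, encoder: str, topic: str) -> Path:
    """Build the expected location of a ChromaDB artifact.

    Hypothetical helper: the layout mirrors the example path in this README
    (results/<run_dir>/model/<encoder>/chromadb/<topic>/chroma.sqlite3) and
    may not cover every artifact the training script writes.
    """
    return results_root / run_dir / "model" / encoder / "chromadb" / topic / "chroma.sqlite3"


if __name__ == "__main__":
    path = chroma_artifact_path(
        Path("llm-toolkit/results"), "dirname", "AllMiniLML6V2", "injection"
    )
    # Report whether the artifact is present on the local file system.
    status = "found" if path.is_file() else "missing (run `make train-chroma` first)"
    print(f"{path}: {status}")
```

The directory containing chroma.sqlite3 is presumably what you would pass as local_path, but check injections_chroma.py before relying on that.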

Developing

Install requirements first:

  • Git LFS (https://git-lfs.com/) for the data we commit to this repo

To get changes merged into mainline you'll want to create a PR from any branch.

git push origin HEAD:my-branch

To run the CI checks locally, run the following:

make lint format test

Before make test will work, you need to run make install and make data-load to download the assets. You also need to provide an org-0 WhyLabs production API key.

The songbird-client is downloaded from a private GitLab repo, so you will need a personal access token (PAT) to access it. You can configure poetry to use your PAT with the following command:

poetry config http-basic.songbird_client_gitlab <gitlab-user-name> <gitlab-pat>

To automatically fix everything that can be fixed, run:

make fix

If you are making dataset changes, it is possible that the exclusion list will change. If you need to update the exclusion list, you can run the following command:

make generate-exclusion-list

Releasing

Releasing has a few steps to it. The first step is to bump the version number, which will usually look like this if you're bumping the patch version.

# bump-patch will go from 0.1.0 to 0.1.1-dev0
make bump-patch

# bump-release will just shave off the -dev0 if you don't need a dev build
make bump-release

That will generate changes that you can review and commit to git.

git add -p
git commit -m "Bumping for release"

Then you can send a PR for these changes to mainline just to make sure CI isn't broken.

# Create a PR after pushing to a branch
git push origin HEAD:my-branch-name

After CI passes and the PR merges, you can open another PR from mainline into release via the GitLab UI. When that PR merges, it kicks off a release CI pipeline that publishes the Python package to the private GitLab repo as the new version.
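The version transitions described for bump-patch and bump-release can be illustrated with a small sketch. This is not how the Makefile targets are implemented (the source does not say which tool they use); the function names are hypothetical, and the behavior of bump_patch on a version that already carries a -dev suffix is an assumption.

```python
import re

# Matches versions like 0.1.0 or 0.1.1-dev0 (the two shapes shown in this README).
_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-dev(\d+))?$")


def _parse(version: str) -> tuple[int, int, int]:
    """Return (major, minor, patch), ignoring any -devN suffix."""
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    major, minor, patch, _dev = match.groups()
    return int(major), int(minor), int(patch)


def bump_patch(version: str) -> str:
    """Increment the patch number and start a dev series: 0.1.0 -> 0.1.1-dev0."""
    major, minor, patch = _parse(version)
    return f"{major}.{minor}.{patch + 1}-dev0"


def bump_release(version: str) -> str:
    """Drop the -devN suffix: 0.1.1-dev0 -> 0.1.1."""
    major, minor, patch = _parse(version)
    return f"{major}.{minor}.{patch}"


if __name__ == "__main__":
    dev = bump_patch("0.1.0")
    print(dev, "->", bump_release(dev))
```

Running both steps in sequence takes a released version to the next released patch version, which matches the two make commands being run back to back.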

Poetry lock on a Mac

If you're on a Mac and make dependency updates, you may need to use a Linux VM to get a lock file that is consistent with the CI pipeline.

docker run --rm -it -v $(pwd):/work python:3.10-bookworm sh
cd /work
pip install poetry==1.7.1
poetry config http-basic.songbird_client_gitlab <gitlab-user> <gitlab-pat>
poetry lock --no-update
