GenAI DLP and Prompt generator

GenAI DLP Prompt Generator

GenAI DLP Prompt Generator is a Python tool that scrapes DLP data and uses it to generate GenAI prompts. It has three main modules:

  • It fetches DLP test sample data from specified URLs, saves the data in text format, and then converts these text files into PDFs.
  • It uses an OpenAI Assistant to generate mock DLP data.
  • It uses OpenAI Chat Completions to generate prompts for each DLP category.

The output is suitable for benchmarking DLP systems or Generative AI Large Language Models (GenAI LLMs).

Features

  • Web scraping from specified URLs.
  • Data extraction and saving in text format.
  • Conversion of text data to PDF format, ideal for benchmarking DLP systems or GenAI LLMs.

Installing

To install DLP Data Scraper, clone the repository and install the required packages:

git clone https://github.com/BenderScript/DLPDataScraper.git
cd DLPDataScraper/dlp_data_scraper
pip3 install -r requirements.txt

Usage

Make sure you have an OpenAI API key and set it as an environment variable:

export OPENAI_API_KEY=<your key here>
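
Both OpenAI-backed modules depend on this variable, so it can help to fail early when it is missing. The following is a minimal sketch; the helper name `require_openai_key` is hypothetical and is not part of this package:

```python
import os


def require_openai_key(env=None):
    """Return the OpenAI API key, or raise a clear error if it is not set.

    Hypothetical helper for illustration; not part of this package.
    """
    # Default to the process environment; passing a dict makes the check testable.
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before running.")
    return key
```

Calling `require_openai_key()` at the top of a script surfaces a missing key before any request is attempted.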

The file with DLP categories is currently at tests/dlp_categories.md. It needs to be copied to a new location, and that path passed as dlp_categories_file to the OpenAIDLPAssistant or OpenAIChat classes.
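
The copy step can be sketched with the standard library. The function name `copy_categories_file` is hypothetical and not part of this package; the source and destination paths are whatever you choose:

```python
import shutil
from pathlib import Path


def copy_categories_file(src, dst):
    """Copy the DLP categories file to dst and return the new path as a string.

    Hypothetical helper for illustration; not part of this package.
    """
    src_path, dst_path = Path(src), Path(dst)
    if not src_path.is_file():
        raise FileNotFoundError(f"Categories file not found: {src_path}")
    # Create the destination directory if it does not exist yet.
    dst_path.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src_path, dst_path)
    return str(dst_path)
```

For example, `copy_categories_file("tests/dlp_categories.md", "dlp/dlp_categories.md")` would produce the path used in the examples in this section, assuming the script runs from the directory that contains tests/.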

To use the DLP Data Scraper:

The scraper accesses a URL with dynamic content, waits for it to load, and extracts all DLP categories:

from dlp_data_scraper.umbrella import Umbrella
from file_utils.FileUtils import FileUtils

# Output directories for the scraped text files and the converted PDFs
pdf_data = "umbrella/pdf_data"
text_data = "umbrella/text_data"
file_utils = FileUtils()
url = (
    'https://support.umbrella.com/hc/en-us/articles/4402023980692-Data-Loss-Prevention-DLP-Test-Sample-Data-for'
    '-Built-In-Data-Identifiers')
scraper = Umbrella(url=url, text_data=text_data, pdf_data=pdf_data)
# Open the page and wait for its dynamic content to load
html_content = scraper.initialize_browser()
# Extract the DLP categories from the loaded page
scraped_data = scraper.scrape_data()
# Write the scraped data as text files, then convert them to PDFs
scraper.save_data_to_files()
file_utils.convert_txt_to_pdf(text_data, pdf_data)
print("Scraping and conversion to PDF completed.")

After the run is over, the generated data is under the umbrella/text_data and umbrella/pdf_data directories. There will be one file for each DLP category.
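
A quick way to confirm that every category received both outputs is to compare the two directories. The sketch below assumes the text files end in .txt and the PDFs end in .pdf with matching base names; neither the extensions nor the naming scheme is confirmed by this README, and `missing_pdfs` is a hypothetical helper:

```python
from pathlib import Path


def missing_pdfs(text_dir, pdf_dir):
    """Return base names that have a .txt file in text_dir but no .pdf in pdf_dir.

    Assumption: each PDF shares its base name with the text file it came from.
    """
    text_stems = {p.stem for p in Path(text_dir).glob("*.txt")}
    pdf_stems = {p.stem for p in Path(pdf_dir).glob("*.pdf")}
    # Sorted so the result is stable across runs.
    return sorted(text_stems - pdf_stems)
```

An empty list from `missing_pdfs("umbrella/text_data", "umbrella/pdf_data")` would indicate that, under those assumptions, every text file has a PDF counterpart.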

To use the OpenAI Assistant DLP generator:

from dlp_data_gen.openai_dlp_assistant import OpenAIDLPAssistant

dlp_gen_assistant = OpenAIDLPAssistant(
    text_data="openai_dlp/text_data",
    pdf_data="openai_dlp/pdf_data",
    dlp_categories_file="dlp/dlp_categories.md",
)
dlp_gen_assistant.run()

After the run is over, the generated data will be under the openai_dlp/text_data and openai_dlp/pdf_data directories. There will be a single file with mock DLP data for each category.

To use the OpenAI DLP Prompt Generator:

from prompt_gen.openai_chat import OpenAIChat

chat_gen = OpenAIChat(
    text_data="openai_chat_prompt/text_data",
    pdf_data="openai_chat_prompt/pdf_data",
    dlp_categories_file="dlp/dlp_categories.md",
)

chat_gen.run()

Contributing

Contributions to DLP Data Scraper are welcome. Please feel free to submit pull requests or open issues to improve the project.


Download files

Source Distribution

genai_dlp_prompter-0.1.0.tar.gz (11.8 kB view details)

Uploaded Source

Built Distribution

genai_dlp_prompter-0.1.0-py3-none-any.whl (14.3 kB view details)

Uploaded Python 3

File details

Details for the file genai_dlp_prompter-0.1.0.tar.gz.

File metadata

  • Download URL: genai_dlp_prompter-0.1.0.tar.gz
  • Size: 11.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.1 CPython/3.11.7 Darwin/23.2.0

File hashes

Hashes for genai_dlp_prompter-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       fde6cfea1d422f4cc86f4496ddd671c4bf8c359bd79fa24dfc137518686fd673
MD5          8379d7a8e5423aa627f36dd733d08c0b
BLAKE2b-256  12b437a5cf6c248a5949ceadcf21f3dcb308b50b7d066dc6362e42a5657adbdb
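
A downloaded archive can be checked against the SHA256 digest listed above. This is a minimal sketch using Python's standard hashlib module; the function name `sha256_of_file` and the local file path are assumptions for illustration:

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing `sha256_of_file("genai_dlp_prompter-0.1.0.tar.gz")` with the SHA256 value listed for that file shows whether the download is intact, assuming the archive was saved under that name in the current directory.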

File details

Details for the file genai_dlp_prompter-0.1.0-py3-none-any.whl.

File hashes

Hashes for genai_dlp_prompter-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       d6adee052595e8d827ae5158bf0ca758c30f4de3c5ec3e1376487be733f7d24d
MD5          a1917ed6734e7ca9149939fa0c8c7151
BLAKE2b-256  3a273154a935dbd6d5aa0a85796831bffce713f13eb0773c201ee11bb81e666e
