Shield against LLM-generated insecure code

Project description

CodeShield

CodeShield is a robust inference-time filtering tool engineered to prevent insecure code generated by LLMs from being introduced into production systems. LLMs are instrumental in automating coding tasks and aiding developers, but they can sometimes output insecure code, even when they have been security-conditioned. CodeShield acts as a guardrail that helps ensure such code is intercepted and filtered out before it makes it into the codebase.

Overview

LLMs have become an integral part of the coding process, automating coding tasks and serving as co-pilots for developers. However, our study, CyberSecEval, revealed that it is not uncommon for these code-producing models to inadvertently generate insecure code. This poses a significant risk when developers incorporate such code without verification, especially developers who do not have a strong cybersecurity background. CodeShield helps mitigate this risk by intercepting and blocking insecure code generated by LLMs in a configurable way.

CodeShield leverages a static analysis library, the Insecure Code Detector (ICD), to identify insecure code. ICD uses a suite of static analysis tools to perform the analysis across 7 programming languages, covering more than 50 CWEs. For more details, please see here.
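ICD's internals are not described on this page. To make "static analysis mapped to CWEs" concrete, here is a minimal, hypothetical sketch of the simplest form such a detector can take: per-language pattern rules, each tagged with a CWE identifier. The rule list and the names `Rule`, `Issue`, and `detect` are illustrative assumptions, not ICD's API or rule set; production analyzers typically go well beyond line-level regular expressions.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A hypothetical pattern rule: source language, regex, and the CWE it maps to."""
    language: str
    pattern: re.Pattern
    cwe_id: str
    description: str


@dataclass(frozen=True)
class Issue:
    line: int  # 1-based line number of the match
    cwe_id: str
    description: str


# Illustrative rules only; these are NOT ICD's actual rules.
RULES = [
    Rule("python", re.compile(r"\bhashlib\.md5\("), "CWE-328", "Use of weak hash"),
    Rule("python", re.compile(r"\bpickle\.loads?\("), "CWE-502", "Deserialization of untrusted data"),
    Rule("c", re.compile(r"\bgets\s*\("), "CWE-242", "Use of inherently dangerous function"),
]


def detect(code: str, language: str) -> list[Issue]:
    """Return one Issue per (line, rule) match among rules for the given language."""
    issues = []
    for lineno, text in enumerate(code.splitlines(), start=1):
        for rule in RULES:
            if rule.language == language and rule.pattern.search(text):
                issues.append(Issue(lineno, rule.cwe_id, rule.description))
    return issues


sample = "import hashlib\ndigest = hashlib.md5(data).hexdigest()\n"
for issue in detect(sample, "python"):
    print(issue.line, issue.cwe_id)  # prints: 2 CWE-328
```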

Use Cases

CodeShield is designed to be applicable to various scenarios. Here are a few example use cases:

  • An LLM is used as a coding assistant. CodeShield is an ideal fit for AI coding assistants integrated with IDEs such as VSCode, or with any other development framework, where it can block insecure code suggestions.
  • A chatbot is used to help with coding tasks. It has become common practice for developers to ask LLMs for code snippets, so more and more code is now produced by LLMs. CodeShield can fortify any code-producing LLM by either adding a warning message or blocking the response entirely.
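The page does not say how code is located inside a chatbot reply. Assuming replies use Markdown-style fenced code blocks, a chatbot integration might first extract each block, with its language tag, before handing it to a scanner. The sketch below shows only that hypothetical pre-processing step; `extract_code_blocks` is an illustrative helper, not part of CodeShield.

```python
import re

# Built at runtime so this snippet can itself live inside a Markdown fence.
FENCE = "`" * 3

# Group 1: optional language tag after the opening fence; group 2: the code body.
FENCE_RE = re.compile(FENCE + r"([\w+#-]*)[^\n]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(reply: str) -> list[tuple[str, str]]:
    """Return (language, code) pairs for every fenced block in a chat reply."""
    return [(m.group(1) or "unknown", m.group(2)) for m in FENCE_RE.finditer(reply)]
```

Prose outside the fences is ignored, and unlabeled blocks are tagged `"unknown"`, so a caller would need its own fallback for deciding which language rules to apply to them.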


Fig 1: How CodeShield should be used to scan LLM output before suggestions are propagated to user-facing applications.

Latency

CodeShield is optimized for production environments where latency is a critical factor in user experience. It is designed to process input quickly using a two-layer scanning approach: CodeShield first looks for alarming code patterns in the content to be scanned, and performs a more comprehensive analysis only if the content is deemed suspicious in that first step.

Our studies indicate that in production environments, over 98% of traffic is classified as benign and does not require comprehensive scanning. This means that in approximately 99% of cases, requests are processed within 70ms. For the remaining traffic, which requires more thorough scanning, the p90 latency is 450ms in modern production server environments. This optimization lets CodeShield provide robust security without compromising performance, making it well suited to production environments where both security and speed are crucial.
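The two-layer design can be sketched as a cheap pattern prefilter that gates an expensive analyzer. In this minimal sketch, the `SUSPICIOUS` trigger list, `two_layer_scan`, and the `deep_analyzer` callback are hypothetical stand-ins rather than CodeShield internals, and the sketch makes no attempt to reproduce the latency figures quoted above.

```python
import re
import time
from dataclasses import dataclass
from typing import Callable

# Layer 1: hypothetical cheap trigger words that only mark content as suspicious.
SUSPICIOUS = re.compile(r"\b(md5|sha1|pickle|eval|exec|gets|strcpy|system)\b")


@dataclass(frozen=True)
class ScanResult:
    insecure: bool
    deep_scanned: bool  # True if layer 2 ran
    elapsed_ms: float


def two_layer_scan(code: str, deep_analyzer: Callable[[str], bool]) -> ScanResult:
    """Run the expensive analyzer only when the cheap prefilter fires."""
    start = time.perf_counter()
    if SUSPICIOUS.search(code) is None:
        # Fast path: content with no trigger words skips layer 2 entirely.
        return ScanResult(False, False, (time.perf_counter() - start) * 1000)
    # Slow path: layer 2 decides whether the suspicious code is actually insecure.
    insecure = deep_analyzer(code)
    return ScanResult(insecure, True, (time.perf_counter() - start) * 1000)
```

Because layer 1 only decides whether to escalate, it can tolerate false positives; the deep analyzer makes the final call. A prefilter that misses a pattern, however, lets that code through unexamined, so in a design like this the trigger list should err on the side of being broad.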

Security Signals

CodeShield's primary function is to flag insecure code snippets, acting as a preventative shield that enforces secure coding guidelines. As such, it may flag not only directly exploitable vulnerabilities but also insecure coding practices, with the aim of improving code hygiene.

Signals generated by CodeShield can be used in different ways. For example, one can expedite the productionization of benign code. Some applications might opt to prevent insecure code from being suggested at all, while others could display a warning message to developers about potential security issues in a code snippet.
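One way an application might turn a scan verdict into one of the behaviors above is a small policy function. This sketch assumes the scanner's output reduces to a boolean `insecure` verdict and that each application chooses `block_insecure` up front; `Action`, `Decision`, and `decide` are hypothetical names, not CodeShield's API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    ALLOW = "allow"  # benign: fast-track the suggestion
    WARN = "warn"    # show the suggestion with a security warning
    BLOCK = "block"  # never show the suggestion


@dataclass(frozen=True)
class Decision:
    action: Action
    code: Optional[str]     # None when the suggestion is withheld
    warning: Optional[str]  # message to show the developer, if any


def decide(code: str, insecure: bool, block_insecure: bool) -> Decision:
    """Map a (hypothetical) scanner verdict to an application-specific action."""
    if not insecure:
        # Benign code can skip extra review steps.
        return Decision(Action.ALLOW, code, None)
    if block_insecure:
        # Strict applications never surface the snippet.
        return Decision(Action.BLOCK, None, "Suggestion withheld: potential security issue.")
    # Permissive applications show the snippet alongside a warning.
    return Decision(Action.WARN, code, "This snippet may contain insecure code. Review before use.")
```

Keeping the policy separate from the scan lets the same verdict drive different behavior in, say, an IDE plugin (block) and a chatbot (warn).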

Getting Started

Follow the instructions and examples shown in the notebook.


Download files


Source Distribution

codeshield-1.0.1.tar.gz (274.2 kB)

Uploaded Source

Built Distribution

codeshield-1.0.1-py3-none-any.whl (173.4 kB)

Uploaded Python 3

File details

Details for the file codeshield-1.0.1.tar.gz.

File metadata

  • Download URL: codeshield-1.0.1.tar.gz
  • Upload date:
  • Size: 274.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.2

File hashes

Hashes for codeshield-1.0.1.tar.gz
  • SHA256: 61866b9281c506f9e176995408daab931d52832e625f6056bba273e80a81139f
  • MD5: 8d814ebf7eaec194efd74446340055d0
  • BLAKE2b-256: dd0ecb79d48ba05eda459a5a2e90b6056019cf7f41441cdee2a17e8dd63e5502
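To check that a downloaded file matches the published digests, you can recompute its SHA256 locally with Python's standard `hashlib` module and compare. A minimal sketch follows; `sha256_of` and `verify` are illustrative helper names, and the expected digest is the SHA256 value listed above for the source distribution.

```python
import hashlib
from pathlib import Path

# SHA256 for codeshield-1.0.1.tar.gz, as listed above.
EXPECTED_SHA256 = "61866b9281c506f9e176995408daab931d52832e625f6056bba273e80a81139f"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()


# Usage, after downloading the sdist into the current directory:
# print(verify(Path("codeshield-1.0.1.tar.gz")))
```

pip can also enforce this itself: in hash-checking mode (`--require-hashes`), each requirement carries a `--hash=sha256:<digest>` option and pip refuses to install any file whose digest does not match.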

File details

Details for the file codeshield-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: codeshield-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 173.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.2

File hashes

Hashes for codeshield-1.0.1-py3-none-any.whl
  • SHA256: cd3516a5006002e0e7400a98e5a4592256a37ce6caf5e162d45ed093eb548377
  • MD5: 9c0e8e1b2ff81e304d8973f0ac416daa
  • BLAKE2b-256: 99a81ce1dcbdc8593e04048b1ea469db8eb92783e82e351405a375e929979f0b
