Shield against LLM generated insecure code

CodeShield

CodeShield is a robust inference-time filtering tool engineered to prevent insecure code generated by LLMs from being introduced into production systems. LLMs, while instrumental in automating coding tasks and aiding developers, can sometimes output insecure code, even when they have been security-conditioned. CodeShield serves as a guardrail to help ensure that such code is intercepted and filtered out before it makes it into the codebase.

Overview

LLMs have become an integral part of the coding process, automating coding tasks and serving as a co-pilot for developers. However, our study, CyberSecEval, revealed that it is not uncommon for these code-producing models to inadvertently generate insecure code. This poses a significant risk when developers incorporate such code without verification, especially developers who do not have a strong cybersecurity background. CodeShield helps mitigate this risk by intercepting and blocking insecure code generated by LLMs in a configurable way. CodeShield leverages a static analysis library, the Insecure Code Detector (ICD), to identify insecure code. ICD uses a suite of static analysis tools to perform the analysis across 7 programming languages, covering more than 50 CWEs. For more details, please see here.

Use Cases

CodeShield is designed to apply to a variety of scenarios. Here are a few example use cases:

  • An LLM is used as a coding assistant. CodeShield is an ideal fit for AI coding assistants integrated with IDEs like VSCode or any other development framework, where it can block insecure code suggestions.
  • Chatbots are used to help with coding tasks. It has become common practice for developers to ask LLMs for code snippets, so more and more code is now produced by LLMs. CodeShield can fortify any code-producing LLM by either adding a warning message or completely blocking the response.

Fig. 1: How CodeShield should be used to scan LLM output before suggestions are propagated to user-facing applications.

Latency

CodeShield is optimized for production environments where latency is a critical factor for user experience. It processes input quickly using a two-layer scanning approach. Specifically, CodeShield first looks for alarming code patterns in the content to be scanned, and performs a more comprehensive analysis only if the first step deems the content suspicious.
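The two-layer idea can be illustrated with a minimal sketch. This is not CodeShield's implementation or API: the pattern list, the rule table, and the function names `prefilter`, `deep_scan`, and `scan` are all hypothetical, and Python's `ast` module stands in for the real static analysis tools. The point is only the control flow: a cheap check runs on everything, and the expensive check runs only on content the cheap check flags.

```python
import ast
import re

# Hypothetical first-layer patterns: cheap regexes that only hint at risk.
SUSPICIOUS = re.compile(r"\b(md5|eval|pickle\.loads)\b")

# Hypothetical second-layer rules: (module or None, function) -> finding.
RULES = {
    ("hashlib", "md5"): "weak hash function (CWE-327)",
    (None, "eval"): "eval on dynamic input (CWE-95)",
    ("pickle", "loads"): "unsafe deserialization (CWE-502)",
}


def prefilter(code: str) -> bool:
    """Layer 1: fast textual check; false positives are acceptable here."""
    return SUSPICIOUS.search(code) is not None


def deep_scan(code: str) -> list[str]:
    """Layer 2: parse the code and report only actual calls listed in RULES."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return []  # this sketch simply gives up on unparsable input
    findings = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name):
            key = (None, func.id)
        elif isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
            key = (func.value.id, func.attr)
        else:
            continue
        if key in RULES:
            findings.append(RULES[key])
    return findings


def scan(code: str) -> list[str]:
    """Run the expensive layer only when the cheap layer raises a flag."""
    if not prefilter(code):
        return []
    return deep_scan(code)
```

In this sketch, a snippet that merely mentions `md5` in a comment passes the first layer but produces no findings in the second, while benign snippets never reach the second layer at all.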

Our studies indicate that in production environments, over 98% of the traffic is classified as benign and does not require comprehensive scanning. This means that in approximately 98% of cases, requests are processed within a 70ms window. For the remaining traffic that requires more thorough scanning, the p90 latency is 450ms in modern production server environments. This optimization lets CodeShield provide robust security without compromising on performance, making it a good fit for production environments where both security and speed are crucial.
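As a rough back-of-the-envelope check, the figures above can be combined into a single blended estimate. The sketch below assumes, purely for illustration, that 70ms and 450ms can be treated as representative per-request costs of the fast and slow paths. They are actually a window and a p90 respectively, so the result is a pessimistic approximation rather than a measured mean. The function name `blended_latency_ms` is hypothetical.

```python
# Figures quoted in the text above.
BENIGN_SHARE = 0.98        # share of traffic that stops at the first layer
FAST_PATH_MS = 70.0        # window for requests handled by the first layer
SLOW_PATH_P90_MS = 450.0   # p90 latency when the comprehensive scan runs


def blended_latency_ms(benign_share: float, fast_ms: float, slow_ms: float) -> float:
    """Weight each path's latency by the share of traffic that takes it."""
    return benign_share * fast_ms + (1.0 - benign_share) * slow_ms


estimate = blended_latency_ms(BENIGN_SHARE, FAST_PATH_MS, SLOW_PATH_P90_MS)
print(f"Blended estimate: {estimate:.1f} ms")
```

Under these assumptions the blended figure is about 77.6 ms, which shows how little the slow path contributes when it handles only a small share of requests.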

Security Signals

CodeShield's primary function is to flag insecure code snippets, acting as a preventative shield that enforces secure coding guidelines. As such, it may flag not only directly exploitable vulnerabilities but also insecure coding practices, with the aim of improving code hygiene.

Signals generated from CodeShield can be used in different ways. For example, one can expedite the productionization of benign code. Some applications might opt to prevent insecure code from being suggested at all. Alternatively, they could display a warning message to developers about potential security issues within a code snippet.
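The options above (pass benign code through, warn, or block) amount to a small policy function applied to each scan result. The following is a minimal sketch of such a policy. The `Action` enum, the `ScanSignal` dataclass, and `apply_signal` are hypothetical names introduced for illustration and are not CodeShield's own types, so consult the notebook for the real result object.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    """Hypothetical treatments an application might choose between."""
    ALLOW = "allow"
    WARN = "warn"
    BLOCK = "block"


@dataclass
class ScanSignal:
    """Hypothetical stand-in for the outcome of scanning one LLM response."""
    is_insecure: bool
    action: Action = Action.ALLOW
    issues: list[str] = field(default_factory=list)


def apply_signal(llm_output: str, signal: ScanSignal) -> str:
    """Return the text the user should see, given the scan outcome."""
    if not signal.is_insecure or signal.action is Action.ALLOW:
        return llm_output  # benign code passes through unchanged
    if signal.action is Action.BLOCK:
        return "*** Response withheld: insecure code was detected. ***"
    # Remaining case is WARN: keep the code but append a notice.
    notes = "; ".join(signal.issues) or "unspecified issue"
    return f"{llm_output}\n\n*** Warning: possible security issues: {notes} ***"
```

Keeping the policy separate from the scan makes it easy for different applications to react differently to the same signal, for example blocking in an IDE plugin while only warning in a chatbot.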

Getting Started

Follow the instructions and examples shown in the notebook.
