Baish (Bash AI Shield)

Baish is a security-focused tool that uses Large Language Models (LLMs) and other heuristics to analyze shell scripts before they are executed. It's designed to be a more secure alternative to the common curl | bash pattern.

Importantly, Baish is a cybersecurity learning project: the developers implement a relatively narrow solution while learning a lot about the problem space, for example how to use LLMs, how to secure them, and how to accept and understand untrusted input.

Admittedly, it's unlikely that anyone who would curl | bash would run curl | baish --shield | bash first (except perhaps in a CI/CD pipeline). That said, as an industry we run a lot of unknown code, so there may be uses for this.

The underlying problems are the same in almost every application, and we are trying to use different heuristics in combination with general AI capabilities to build a cybersecurity tool. So there are two parts to the project:

  1. Build a tool that uses LLMs to help improve security
  2. Understand how to use LLMs themselves more securely

About TAICO

The Toronto Artificial Intelligence and Cybersecurity Organization (TAICO) is a group of AI and cybersecurity experts who meet monthly to discuss the latest trends and technologies in the field. Baish is a project of TAICO.

Caveats and Disclaimers

⚠️ Baish's analysis is not foolproof! This is a proof of concept! To be completely sure that a script is safe, you would have to review and analyze it yourself.

⚠️ Different LLM providers will give different results. One provider and one model may give a script a low risk score, while another model or provider gives a high risk score. You would have to experiment with different providers and models to see which one you trust the most.

⚠️ Baish is in heavy development. Expect breaking changes.

⚠️ Using Ollama for local LLMs is still experimental and may not work as expected.

Features

  • Accepts files on stdin, à la the curl | bash pattern, but instead you would do curl | baish --shield | bash
  • Can analyze any file, not just shell scripts curled to bash
  • Analyzes scripts using various configurable LLMs for potential security risks
  • Provides a harm score (1-10) indicating potential dangerous operations (higher is more dangerous)
  • Provides a complexity score (1-10) indicating how complex the script is (higher is more complex)
  • Saves downloaded scripts for later review
  • Logs all requests and responses from LLMs along with the script ID
  • Uses YARA rules and other heuristics to detect potential prompt injection
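As an illustration of the last bullet, here is a toy YARA rule of the kind such heuristics might use. This is not Baish's actual ruleset, just a sketch that flags one well-known injection phrase:

```shell
# Write an illustrative YARA rule (NOT Baish's real rules) that flags a
# classic prompt-injection phrase in a downloaded script.
cat > injection.yar <<'EOF'
rule prompt_injection_hint
{
    strings:
        $a = "ignore previous instructions" nocase
    condition:
        $a
}
EOF
echo "rule written"
# With the yara CLI installed you could then scan a script:
#   yara injection.yar downloaded_script.sh
```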

Large Language Model Provider Support

Baish currently supports the following providers:

  • Groq
  • Anthropic
  • Experimental support for Ollama for local LLMs, e.g. llama3, mistral, etc.

It is straightforward to add support for other providers (pretty much anything LangChain supports), and contributions are welcome!

Prerequisites

  • An API key from a supported LLM provider.
  • The name of the model from that provider you are going to use.
  • Python 3.10 or later
  • libmagic (for file type detection)
    • Ubuntu/Debian: apt install libmagic1
    • RHEL/CentOS: dnf install file-libs
    • macOS: brew install libmagic

Installation

From PyPI

  • Best to create a virtual environment to install baish into:
python3 -m venv baish-env
source baish-env/bin/activate
  • Then install from PyPI:
pip install baish

From Source

  • Checkout the repo:
git clone https://github.com/taicodotca/baish.git
cd baish
  • Install the dependencies:
pip install -r requirements.txt
  • Set the API key for the LLM provider in the config.yaml file. You can also specify a different model, temperature, etc.

  • Run baish:

$ ./baish 
Error: No input provided
Usage: cat script.sh | baish

Usage

  • Technically, you can pipe any file to baish, but it's really meant to be used with shell scripts, especially via the curl evil.com/evil.sh | baish pattern.
curl -sSL https://thisisapotentiallyunsafescript.com/script.sh | baish

Baish will output the harm score, complexity score, and an explanation for why the script is either safe or not.

Setting Provider and Model

You can set the provider and model in the config.yaml file.

E.g. config.yaml:

default_llm: haiku # default model to use
llms:
  haiku: # memorable name
    provider: anthropic # provider name
    model: claude-3-5-haiku-latest # model name
    temperature: 0.1 # temperature

  other_model:
    provider: groq
    model: llama3-70b-8192
    temperature: 0.1

Using Ollama

If using Ollama, you can also specify the base URL, though it will default to http://localhost:11434 if not specified.

other_model:
  provider: ollama
  model: llama3:latest
  url: http://localhost:11434

Currently our prompt is quite long; when using llama3, for example, the default context window is 2048 tokens, so you may see errors like this:

time=2024-12-08T11:22:33.343-05:00 level=WARN source=runner.go:129 msg="truncating input prompt" limit=2048 prompt=2815 keep=25 new=2048

You can increase the context window from inside an interactive Ollama session. Note that /set is a REPL command, so it must be entered at the >>> prompt; if passed on the command line it is sent to the model as an ordinary prompt instead:

$ ollama run llama3
>>> /set parameter num_ctx 4096
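To make the larger context window persistent, you can also bake it into a derived model with an Ollama Modelfile (the name llama3-4k below is illustrative):

```shell
# Write a Modelfile that raises the context window for a derived model
cat > Modelfile <<'EOF'
FROM llama3
PARAMETER num_ctx 4096
EOF
echo "Modelfile written"
# Build it and point config.yaml at the new model:
#   ollama create llama3-4k -f Modelfile
#   # then in config.yaml: model: llama3-4k
```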

Examples

Here are a few examples of real-world scripts that Baish can help you analyze before execution. These mostly involve installing real-world software.

$ curl -fsSL https://ollama.com/install.sh | ./baish
⠙ Analyzing file...
╭────────────────────────────── Baish - Bash AI Shield ───────────────────────────────╮
│ Analysis Results - script_1732984526.sh                                             │
│                                                                                     │
│ Harm Score:       2/10 ████────────────────                                         │
│ Complexity Score: 8/10 ████████████████────                                         │
│ Uses Root:    True                                                                  │
│                                                                                     │
│ File type: text/x-shellscript                                                       │
│                                                                                     │
│ Explanation:                                                                        │
│ This script is a Linux installer for Ollama, a software package. It installs Ollama │
│ on the system, detects the operating system architecture, and installs the          │
│ appropriate version of Ollama. It also checks for and installs NVIDIA CUDA drivers  │
│ if necessary. The script uses various tools and commands to perform these tasks,    │
│ including curl, tar, and dpkg. The script is designed to be run as root and         │
│ modifies the system by installing software and configuring system settings.         │
│                                                                                     │
│ Script saved to: /home/curtis/.baish/scripts/script_1732984526.sh                   │
│ To execute, run: bash /home/curtis/.baish/scripts/script_1732984526.sh              │
│                                                                                     │
│ ⚠️  AI-based analysis is not perfect and should not be considered a complete         │
│ security audit. For complete trust in a script, you should analyze it in detail     │
│ yourself. Baish has downloaded the script so you can review and execute it in your  │
│ own environment.                                                                    │
╰─────────────────────────────────────────────────────────────────────────────────────╯

Install rvm:

curl -sSL https://get.rvm.io | ./baish --debug

Install rust:

curl --silent https://sh.rustup.rs | ./baish

Install docker:

curl -fsSL https://get.docker.com | ./baish --debug

Shield Mode

Baish can also be used in "shield" mode, which, if the script is deemed unsafe, replaces the output with a harmless error message so that nothing dangerous reaches bash.

curl -sSL https://thisisapotentiallyunsafescript.com/script.sh | ./baish -s | bash

For example, running an unsafe script through baish in shield mode, where bash executes the output of baish, in this case an error message:

$ cat tests/fixtures/secret-upload.sh | ./baish -s | bash
Script unsafe: High risk score detected

Or without piping to bash. Note how the output is itself a "script" that echoes the message to the terminal, which bash would then execute:

$ cat tests/fixtures/secret-upload.sh | ./baish -s
echo "Script unsafe: High risk score detected"
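In a CI/CD pipeline you would usually want a hard failure rather than executing the error echo. Here is a sketch of such a guard, with baish's unsafe output simulated so the snippet is self-contained (in practice, replace the simulated line with `curl ... | baish -s`):

```shell
out='echo "Script unsafe: High risk score detected"'   # simulated `baish -s` output
case "$out" in
    *"Script unsafe"*)
        echo "blocked"             # in real CI you would also `exit 1` here
        ;;
    *)
        printf '%s\n' "$out" | sh  # analysis passed: execute the script
        ;;
esac
```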

Logging and Stored Scripts

Baish logs all requests and responses from LLMs along with the script ID. It also saves the script to disk with the ID so it can be reviewed later.

Below we see the results of one Baish run.

$ tree ~/.baish/
/home/ubuntu/.baish/
├── logs
│   └── 2024-12-05_15-50-43_c6f3de91_llm.jsonl
└── scripts
    └── 2024-12-05_15-50-43_c6f3de91_script.sh

3 directories, 2 files

Known Issues

  • LLMs with short context windows (like some local models) may fail to analyze longer scripts due to prompt length limitations. Even commercial models with short context windows can fail to analyze longer scripts.

Future Work and TODOs

| Feature | Status | Description | Details |
|---------|--------|-------------|---------|
| JSON Output | DONE | Structured output format | Enables programmatic parsing of Baish results |
| LLM Logging | DONE | Request/response tracking | Log all LLM interactions with script IDs for audit trails |
| Prompt Injection Detection | DONE | YARA-based detection | Use YARA rules to identify potential prompt injection attempts |
| Shield Mode | DONE | Safe execution pipeline | Enable the curl \| baish \| bash pattern with security controls |
| System Prompts | DONE | LLM prompt configuration | Configure system prompts for supported LLM providers |
| Root Usage Detection | IN PROGRESS | Improve detection | Enhance accuracy of root privilege usage detection |
| End-to-End Tests | IN PROGRESS | Dockerized tests | Run end-to-end tests in Docker |
| Atomic Red Team Integration | TODO | Use ART for testing | Use Atomic Red Team tests to validate Baish's detection capabilities against known malicious patterns |
| CI/CD Mode | TODO | Add pipeline integration | Create a specialized mode for CI/CD environments |
| Directory Analysis | TODO | Bulk file scanning | Analyze multiple files and generate comprehensive security reports |
| Custom YARA Rules | TODO | User-defined rules | Allow users to add their own YARA rules for custom threat detection |
| N/A Scoring | TODO | Better non-script handling | Display N/A instead of scores for non-scripts or prompt injection cases |
| Vector DB Memory | TODO | Long-term analysis storage | Implement a vector database for historical analysis and pattern recognition |
| LLM Self-evaluation | TODO | Prompt injection checks | Enable LLMs to self-evaluate for prompt injection vulnerabilities |
| Token Length Management | TODO | Better chunking | Improve text chunking for large scripts using LangChain |
| Custom Prompts | TODO | User-defined prompts | Allow users to specify custom analysis prompts |
| Guardrails Integration | TODO | Add guardrails-ai | Integrate with guardrails-ai for additional security checks |
| Script Deobfuscation | TODO | Pre-analysis cleanup | Implement deobfuscation using tools like debash |
| VM Sandbox | TODO | Isolated execution | Run scripts in a VM sandbox before actual execution |
| Shell Compilation | TODO | Compiled shell scripts | Support compiled shell scripts using shc |
| One-time API Keys | TODO | Temporary credentials | Implement single-use API keys for safer execution |
| Base64 Detection | TODO | Encoded content handling | Detect and handle base64-encoded content |
| VirusTotal Integration | TODO | Hash checking | Check script hashes against the VirusTotal database |
| VM Detonation | TODO | Dynamic analysis | Execute scripts in isolated environments for behavior analysis |
| Ollama JSON Support | TODO | JSON output | Support the Ollama JSON output format arriving in version 0.5 |
| Fix Debug Logging | TODO | Debug logging | Many stray debug statements remain in the code |
| Results Manager Coverage | TODO | Results Manager | The Results Manager should manage logs, results, and scripts |
