Fast and syntax-aware semantic code pattern search for many languages: like grep but for code
Project description
Semgrep
semgrep is a tool for easily detecting and preventing bugs and anti-patterns in
your codebase. It combines the convenience of grep with the correctness of
syntactic and semantic search. Developers, DevOps engineers, and security engineers
use semgrep to write code with confidence.
Try it now: https://semgrep.live
Overview
Language support:
| Python | JavaScript | Go | Java | C | TypeScript | PHP |
|---|---|---|---|---|---|---|
| ✅ | ✅ | ✅ | ✅ | ✅ | Coming... | Coming... |
Example patterns:
| Pattern | Matches |
|---|---|
| $X == $X | if (node.id == node.id): ... |
| requests.get(..., verify=False, ...) | requests.get(url, timeout=3, verify=False) |
| os.system(...) | from os import system; system('echo semgrep') |
| $ELEMENT.innerHTML | el.innerHTML = "<img src='x' onerror='alert(`XSS`)'>"; |
| $TOKEN.SignedString([]byte("...")) | ss, err := token.SignedString([]byte("HARDCODED KEY")) |
→ see more example patterns in the live registry viewer
Installation
On macOS, binaries are available via Homebrew:
brew install returntocorp/semgrep/semgrep
On Ubuntu, an install script is attached to each release:
./semgrep-v0.9.0-ubuntu-generic.sh
To try semgrep without installation, you can also run it via Docker:
docker run --rm -v "${PWD}:/home/repo" returntocorp/semgrep --help
Usage
Example Usage
Here is a simple Python example, test.py. We want to retrieve an object by ID:
def get_node(node_id, nodes):
    for node in nodes:
        if node.id == node.id:  # Oops, supposed to be 'node_id'
            return node
    return None
This is a bug. Let's use semgrep to find bugs like it, using a simple search pattern: $X == $X. It will find all places in our code where the left- and right-hand sides of a comparison are the same expression:
$ semgrep --lang python --pattern '$X == $X' test.py
test.py
3: if node.id == node.id: # Oops, supposed to be 'node_id'
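For intuition only, the idea behind $X == $X can be approximated for Python with the standard ast module: parse the file, then flag every == comparison whose two operands have identical syntax trees. This is a toy sketch, not how semgrep is implemented, and find_self_comparisons is a hypothetical name.

```python
import ast
from typing import List

def find_self_comparisons(source: str) -> List[int]:
    """Toy approximation of the $X == $X pattern (hypothetical helper, not semgrep's code).

    Returns the line numbers of `a == a` style comparisons.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        # A Compare node holds `left` plus parallel lists of ops and comparators;
        # only simple two-operand `==` comparisons are considered here.
        if isinstance(node, ast.Compare) and len(node.ops) == 1 and isinstance(node.ops[0], ast.Eq):
            # ast.dump omits line/column info by default, so this compares
            # structure only: whitespace and formatting are ignored.
            if ast.dump(node.left) == ast.dump(node.comparators[0]):
                hits.append(node.lineno)
    return sorted(hits)

TEST_PY = """def get_node(node_id, nodes):
    for node in nodes:
        if node.id == node.id:  # Oops, supposed to be 'node_id'
            return node
    return None
"""
print(find_self_comparisons(TEST_PY))  # prints [3]
```

Because syntax trees are compared rather than raw text, (a+1) == (a + 1) is flagged as well, which a plain textual comparison of the two sides would miss.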
Configuration
For simple patterns use the --lang and --pattern flags. This mode of
operation is useful for quickly iterating on a pattern on a single file or
folder:
semgrep --lang javascript --pattern 'eval(...)' path/to/file.js
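When calling semgrep from a script, passing the arguments as a list instead of a shell string avoids the quoting that otherwise keeps the shell from expanding $X. A minimal Python sketch, assuming semgrep is installed on PATH and using only the --lang and --pattern flags shown above; pattern_command and run_pattern are hypothetical helper names.

```python
import shutil
import subprocess
from typing import List

def pattern_command(lang: str, pattern: str, target: str) -> List[str]:
    """Build argv for an ad-hoc semgrep search (hypothetical helper)."""
    # No shell is involved, so "$X" reaches semgrep literally without quoting.
    return ["semgrep", "--lang", lang, "--pattern", pattern, target]

def run_pattern(lang: str, pattern: str, target: str) -> str:
    """Run semgrep and return its stdout (assumes semgrep is on PATH)."""
    if shutil.which("semgrep") is None:
        raise FileNotFoundError("semgrep is not installed or not on PATH")
    # check=False: this sketch makes no assumption about semgrep's exit codes.
    proc = subprocess.run(pattern_command(lang, pattern, target),
                          capture_output=True, text=True, check=False)
    return proc.stdout

print(pattern_command("javascript", "eval(...)", "path/to/file.js"))
# prints ['semgrep', '--lang', 'javascript', '--pattern', 'eval(...)', 'path/to/file.js']
```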
Configuration Files
For advanced configuration use the --config flag. This flag automagically
handles a multitude of input configuration types:
--config <file|folder|yaml_url|tarball_url|registry_name>
In the absence of this flag, a default configuration is loaded from .semgrep.yml
or multiple files matching .semgrep/**/*.yml.
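The lookup can be sketched with pathlib. This is an approximation under stated assumptions, not semgrep's code: it assumes a top-level .semgrep.yml takes precedence, and otherwise collects every .yml file under .semgrep/ (including subfolders) in sorted order. find_default_configs is a hypothetical name.

```python
from pathlib import Path
from typing import List

def find_default_configs(root: Path) -> List[Path]:
    """Approximate the default config lookup (hypothetical helper, assumed precedence)."""
    single = root / ".semgrep.yml"
    if single.is_file():
        # Assumption: a top-level .semgrep.yml wins when it exists.
        return [single]
    folder = root / ".semgrep"
    if not folder.is_dir():
        return []
    # "**/*.yml" matches .yml files directly in .semgrep/ and in every subfolder.
    return sorted(p for p in folder.glob("**/*.yml") if p.is_file())
```

For example, find_default_configs(Path.cwd()) lists the files this sketch would treat as the default configuration for the current directory.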
Registry
As mentioned above, you may also specify a registry_name as configuration.
r2c provides a registry
of rules. These rules have been tuned on thousands of repositories
using our analysis platform.
You can browse the registry at semgrep.live/r. To run a set of rules, use a rule ID or namespace.
# Run a specific rule
semgrep --config=https://semgrep.live/c/r/java.spring.security.audit.cookie-missing-samesite
# Run a set of rules
semgrep --config=https://semgrep.live/c/r/java.spring.security
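Both commands share one URL shape: https://semgrep.live/c/r/ followed by a rule ID or a namespace. The hypothetical helper below builds the --config value from either; the prefix is inferred only from the two examples above, so treat it as an assumption rather than a documented contract.

```python
# Prefix inferred from the two example commands above; not an official constant.
REGISTRY_PREFIX = "https://semgrep.live/c/r/"

def registry_config(rule_or_namespace: str) -> str:
    """Build a --config argument for a registry rule ID or namespace (hypothetical helper)."""
    name = rule_or_namespace.strip()
    # Registry names in the examples are dotted identifiers without spaces.
    if not name or any(ch.isspace() for ch in name):
        raise ValueError(f"not a registry name: {rule_or_namespace!r}")
    return f"--config={REGISTRY_PREFIX}{name}"

# A single rule and a whole namespace use the same URL shape.
print(registry_config("java.spring.security.audit.cookie-missing-samesite"))
print(registry_config("java.spring.security"))
```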
The registry features rules for many programming errors, including security issues and correctness bugs. Security rules are annotated with CWE and OWASP metadata when applicable. OWASP rule coverage per language is displayed below.
Pattern Features
semgrep patterns make use of two primary features:
- Metavariables like $X, $WIDGET, or $USERS_2. Metavariable names can only contain uppercase characters, _, or digits, and must start with an uppercase character or _. Names like $x or $some_value are invalid. Metavariables are used to track a value across a specific code scope: every occurrence of the same metavariable in a pattern must match the same code.
- The ... (ellipsis) operator. The ellipsis operator abstracts away sequences of zero or more arguments, statements, characters, and more.
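The naming rule for metavariables can be restated as a regular expression. The pattern below is our own reading of the rule above, not code taken from semgrep:

```python
import re

# Our reading of the rule: a leading "$", then an uppercase letter or "_",
# then any mix of uppercase letters, digits, and "_".
METAVARIABLE = re.compile(r"\$[A-Z_][A-Z0-9_]*")

def is_valid_metavariable(name: str) -> bool:
    """True if `name` follows the metavariable naming rule."""
    return METAVARIABLE.fullmatch(name) is not None

print([is_valid_metavariable(n) for n in ["$X", "$WIDGET", "$USERS_2", "$x", "$some_value"]])
# prints [True, True, True, False, False]
```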
For example,
$FILE = open(...)
will find all occurrences in your code where the result of an open() call with zero or more arguments is assigned
to a variable.
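To picture how the two features combine, here is a toy matcher for $FILE = open(...) built on Python's ast module: the metavariable binds to whichever name is assigned, and the ellipsis means the call's arguments are never inspected. It is a hypothetical sketch (find_open_assignments is not a semgrep API) and handles only single, plain-name assignments.

```python
import ast
from typing import List, Tuple

def find_open_assignments(source: str) -> List[Tuple[str, int]]:
    """Toy version of `$FILE = open(...)`: returns (bound name, line) pairs.

    Hypothetical helper for illustration; not part of semgrep.
    """
    matches = []
    for node in ast.walk(ast.parse(source)):
        # `$FILE = ...`: a plain assignment to exactly one simple name.
        if not (isinstance(node, ast.Assign) and len(node.targets) == 1):
            continue
        target, value = node.targets[0], node.value
        if not isinstance(target, ast.Name):
            continue
        # `open(...)`: a call to the name `open`. Like the ellipsis, this never
        # looks at the arguments, so zero or more of any kind are accepted.
        if (isinstance(value, ast.Call) and isinstance(value.func, ast.Name)
                and value.func.id == "open"):
            matches.append((target.id, node.lineno))  # $FILE binds to target.id
    return sorted(matches, key=lambda m: m[1])

CODE = """f = open("a.txt")
log = open("b.log", "w", encoding="utf-8")
data = read("c.txt")
"""
print(find_open_assignments(CODE))  # prints [('f', 1), ('log', 2)]
```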
Composing Patterns
You can also construct rules by composing multiple patterns together.
Let's consider an example:
rules:
  - id: open-never-closed
    patterns:
      - pattern: $FILE = open(...)
      - pattern-not-inside: |
          $FILE = open(...)
          ...
          $FILE.close()
    message: "file object opened without corresponding close"
    languages: [python]
    severity: ERROR
This rule looks for files that are opened but never closed. It accomplishes
this by matching the open(...) pattern only where it is not followed by a
close() call on the same variable. The $FILE metavariable ensures that the same
variable name is used in the open and close calls. The ellipsis operator allows
any arguments to be passed to open and any sequence of statements in between
the open and close calls. We don't care how open is called or what happens
before the close call; we just need to make sure close is called.
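A rough Python approximation of what this rule checks, using the standard ast module. It is a sketch under our own simplifying assumptions, not semgrep's engine: "following" means any later statement in the same block, and only plain NAME = open(...) assignments are considered. All function names are hypothetical.

```python
import ast
from typing import Iterator, List, Optional

def opened_name(stmt: ast.stmt) -> Optional[str]:
    """Return NAME when stmt is `NAME = open(...)`, otherwise None."""
    if (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)
            and isinstance(stmt.value, ast.Call)
            and isinstance(stmt.value.func, ast.Name)
            and stmt.value.func.id == "open"):
        return stmt.targets[0].id
    return None

def closes(stmt: ast.stmt, name: str) -> bool:
    """True if stmt contains a call `NAME.close()` anywhere inside it."""
    for node in ast.walk(stmt):
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)
                and node.func.attr == "close"
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == name):
            return True
    return False

def statement_blocks(tree: ast.AST) -> Iterator[List[ast.stmt]]:
    """Yield each statement list: module, function, loop, if/else, try bodies, etc."""
    for node in ast.walk(tree):
        for field in ("body", "orelse", "finalbody"):
            block = getattr(node, field, None)
            if isinstance(block, list) and block and isinstance(block[0], ast.stmt):
                yield block

def open_never_closed(source: str) -> List[int]:
    """Toy open-never-closed: lines of `F = open(...)` with no later `F.close()`."""
    hits = []
    for block in statement_blocks(ast.parse(source)):
        for i, stmt in enumerate(block):
            name = opened_name(stmt)
            # Like the `...` in pattern-not-inside, any statements may sit between
            # the open and the close, so every later sibling statement is searched.
            if name and not any(closes(later, name) for later in block[i + 1:]):
                hits.append(stmt.lineno)
    return sorted(hits)

SAMPLE = """def ok(path):
    f = open(path)
    data = f.read()
    f.close()
    return data

def leaky(path):
    g = open(path)
    return g.read()
"""
print(open_never_closed(SAMPLE))  # prints [8]
```

A with open(path) as f: block is not an assignment statement, so this sketch never flags it.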
For more information on rule fields like patterns and pattern-not-inside,
see the configuration documentation.
Equivalences
Equivalences are another key concept in semgrep. semgrep automatically searches
for code that is semantically equivalent. For example, the following two snippets
are semantically equivalent, and the pattern subprocess.Popen(...) will match both.
import subprocess
subprocess.Popen("ls")

from subprocess import Popen as sub_popen
result = sub_popen("ls")
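One way to picture this equivalence: resolve each call's name through the file's import statements before comparing it to the pattern. The sketch below does only that, for plain and aliased imports; it is an illustration with hypothetical names (import_aliases, dotted_name, calls_to), not semgrep's actual resolution logic.

```python
import ast
from typing import Dict, List, Optional

def import_aliases(tree: ast.AST) -> Dict[str, str]:
    """Map each locally bound name to the qualified name it was imported as."""
    aliases: Dict[str, str] = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.asname:
                    # `import subprocess as sp` binds sp -> subprocess
                    aliases[alias.asname] = alias.name
                else:
                    # `import os.path` binds only the top-level name "os"
                    top = alias.name.split(".")[0]
                    aliases[top] = top
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            for alias in node.names:
                # `from subprocess import Popen as sub_popen` binds sub_popen -> subprocess.Popen
                aliases[alias.asname or alias.name] = f"{node.module}.{alias.name}"
    return aliases

def dotted_name(expr: ast.expr) -> Optional[str]:
    """Turn `a.b.c` into "a.b.c"; return None for anything more complex."""
    if isinstance(expr, ast.Name):
        return expr.id
    if isinstance(expr, ast.Attribute):
        base = dotted_name(expr.value)
        return f"{base}.{expr.attr}" if base else None
    return None

def calls_to(source: str, qualified: str) -> List[int]:
    """Line numbers of calls that resolve to `qualified` after import aliasing."""
    tree = ast.parse(source)
    aliases = import_aliases(tree)
    hits = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name is None:
                continue
            head, dot, rest = name.partition(".")
            # Rewrite only the first segment, which is the name the import bound.
            if aliases.get(head, head) + dot + rest == qualified:
                hits.append(node.lineno)
    return sorted(hits)

PLAIN = 'import subprocess\nsubprocess.Popen("ls")\n'
ALIASED = 'from subprocess import Popen as sub_popen\nresult = sub_popen("ls")\n'
print(calls_to(PLAIN, "subprocess.Popen"), calls_to(ALIASED, "subprocess.Popen"))
# prints [2] [2]
```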
For a full list of semgrep feature support by language, see the
language matrix.
Programmatic Usage
To integrate semgrep's results with other tools, you can get results in
machine-readable JSON format with the --json option, or formatted according
to the SARIF standard with the --sarif flag.
See our output documentation for details.
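A minimal sketch of consuming the --json output from Python. The field names it reads (results, path, start.line, check_id) are assumptions about the JSON layout; check them against the output documentation before relying on them. summarize is a hypothetical helper.

```python
import json
from typing import List, Tuple

def summarize(json_text: str) -> List[Tuple[str, int, str]]:
    """Return sorted (path, line, check_id) rows from semgrep --json output.

    Assumed layout: a top-level "results" list whose items carry "path",
    "start": {"line": ...} and "check_id". Verify against the output docs.
    """
    report = json.loads(json_text)
    rows = []
    for result in report.get("results", []):
        rows.append((result["path"], result["start"]["line"],
                     result.get("check_id", "<unknown>")))
    return sorted(rows)
```

For example, pass it the stdout of a semgrep --json run captured with subprocess.run(..., capture_output=True, text=True).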
Resources
- Semgrep presentation and slides at Bay Area OWASP meetup. Check out the r2c YouTube channel for more videos.
- Pattern features documentation
- Configuration files documentation
- Integrations
- Output
- Development
- Bug reports
Contribution
semgrep is LGPL-licensed; feel free to help out: CONTRIBUTING.
semgrep is a frontend to a larger program analysis library named pfff. pfff began at Facebook, where it was open-sourced, but is now archived. The primary maintainer now works at r2c. semgrep was originally named sgrep and was renamed to avoid collisions with existing projects.
Commercial Support
semgrep is proudly supported by r2c. We're hiring!
Interested in a fully-supported, hosted version of semgrep? Drop your email and we'll ping you!
Download files
File details
Details for the file semgrep-0.9.0.tar.gz.
File metadata
- Download URL: semgrep-0.9.0.tar.gz
- Upload date:
- Size: 36.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/46.0.0 requests-toolbelt/0.8.0 tqdm/4.36.1 CPython/3.7.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3387ecf72c853a6a0949e8d41b38166e8ec4d773f5c8e26e47bf78178660fa61 |
| MD5 | 701b1e0649f28da6fa993518272e4a46 |
| BLAKE2b-256 | e9144e508813f544a3f2390e8660a57e47ebe4873eab6c80ff9402633dce730c |
File details
Details for the file semgrep-0.9.0-cp36.cp37.cp38.py36.py37.py38-none-manylinux1_x86_64.whl.
File metadata
- Download URL: semgrep-0.9.0-cp36.cp37.cp38.py36.py37.py38-none-manylinux1_x86_64.whl
- Upload date:
- Size: 1.8 MB
- Tags: CPython 3.6, CPython 3.7, CPython 3.8, Python 3.6, Python 3.7, Python 3.8
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/46.0.0 requests-toolbelt/0.8.0 tqdm/4.36.1 CPython/3.7.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b983f4daf629d2da51b690bb3d45150412b51fb8b1e3698debed29b413a41bdd |
| MD5 | cf2b480a27485bab2a177bfee15f9511 |
| BLAKE2b-256 | a7f7df58997a22ab6a782e0bbb43b99c0e580a7b0da95999803dae3baf5c95fc |
File details
Details for the file semgrep-0.9.0-cp36.cp37.cp38.py36.py37.py38-none-macosx_10_14_x86_64.whl.
File metadata
- Download URL: semgrep-0.9.0-cp36.cp37.cp38.py36.py37.py38-none-macosx_10_14_x86_64.whl
- Upload date:
- Size: 1.4 MB
- Tags: CPython 3.6, CPython 3.7, CPython 3.8, Python 3.6, Python 3.7, Python 3.8, macOS 10.14+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/46.0.0 requests-toolbelt/0.8.0 tqdm/4.36.1 CPython/3.7.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dfd5e8d17418732d510e0e3607feae6d167253821940c91292d3ece7adbc07a2 |
| MD5 | 28c73727a9bd2e44468967e1b7bd07e9 |
| BLAKE2b-256 | cd7f6809151ea39ac9e5c7ffcf151e04663340bd764a2c9713f4f0e0ccaa2ac8 |