A better tool for secrets search
DeepSecrets - a better tool for secret scanning
Yet another tool - why?
Existing tools don't really "understand" code. Instead, they mostly treat it as plain text.
DeepSecrets expands classic regex-search approaches with semantic analysis, dangerous variable detection, and more efficient usage of entropy analysis. Code understanding supports 500+ languages and formats and is achieved by lexing and parsing - techniques commonly used in SAST tools.
DeepSecrets also introduces a new way to find secrets: supply hashed values of your known secrets, and the tool will find their plaintext occurrences in your code.
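The hashed-secret idea can be illustrated with a minimal sketch. It is not DeepSecrets' actual implementation: the hash algorithm (SHA-256), the word-splitting regex, and the names `sha256_hex`, `KNOWN_HASHES` and `find_known_secrets` are all assumptions made for illustration.

```python
import hashlib
import re


def sha256_hex(value: str) -> str:
    """Return the hex SHA-256 digest of a string (algorithm is an assumption)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Hypothetical ruleset: only hashes are kept, never the plaintext secrets.
# The hash is computed inline here just to keep the example self-contained.
KNOWN_HASHES = {sha256_hex("s3cr3t-api-key-123")}


def find_known_secrets(text: str, known_hashes: set) -> list:
    """Return every word in `text` whose hash appears in `known_hashes`."""
    # Split on whitespace, quotes and common assignment/separator characters.
    words = re.findall(r"[^\s\"'=:,;]+", text)
    return [word for word in words if sha256_hex(word) in known_hashes]
```

Calling `find_known_secrets('API_KEY = "s3cr3t-api-key-123"', KNOWN_HASHES)` returns the matching plaintext value, even though the ruleset itself contains only its hash.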
Installation
From GitHub via pip
$ pip install git+https://github.com/avito-tech/deepsecrets.git
From PyPI
$ pip install deepsecrets
Scanning
The easiest way:
$ deepsecrets --target-dir /path/to/your/code --outfile report.json
This will run a scan against /path/to/your/code using the default configuration:
- Regex checks by the built-in ruleset
- Semantic checks (variable detection, entropy checks)
The report will be saved to report.json.
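To give a feel for what a semantic check combining variable detection and entropy might look like, here is a rough sketch. It does not reproduce DeepSecrets' logic: the name fragments in `SUSPICIOUS_NAME_PARTS`, the `ENTROPY_THRESHOLD` value, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(value: str) -> float:
    """Shannon entropy in bits per character, from character frequencies."""
    if not value:
        return 0.0
    total = len(value)
    counts = Counter(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical tuning values, chosen only for this example.
SUSPICIOUS_NAME_PARTS = ("password", "secret", "token", "api_key")
ENTROPY_THRESHOLD = 3.5


def looks_like_secret(var_name: str, value: str) -> bool:
    """Flag a value only if the variable name is suspicious AND the value looks random."""
    name_hit = any(part in var_name.lower() for part in SUSPICIOUS_NAME_PARTS)
    return name_hit and shannon_entropy(value) >= ENTROPY_THRESHOLD
```

Requiring both signals is one way to use entropy more sparingly: a high-entropy string assigned to an unrelated variable, or a placeholder like "aaaaaaaa" assigned to a password variable, is not reported.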
Fine-tuning
Run deepsecrets --help for details.
Basically, you can use your own ruleset by specifying --regex-rules. Paths to be excluded from scanning can be set via --excluded-paths.
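If you drive the scanner from a script, the options above can be assembled into a command line. The sketch below only uses flags named in this document; the helper name `build_scan_command` and the example file names are hypothetical, and the value format each flag expects should be confirmed with deepsecrets --help.

```python
from typing import List, Optional


def build_scan_command(
    target_dir: str,
    outfile: str,
    regex_rules: Optional[str] = None,
    excluded_paths: Optional[str] = None,
) -> List[str]:
    """Assemble a deepsecrets command line from the documented flags."""
    cmd = ["deepsecrets", "--target-dir", target_dir, "--outfile", outfile]
    if regex_rules:
        # Custom regex ruleset (file name is up to you).
        cmd += ["--regex-rules", regex_rules]
    if excluded_paths:
        # Exclusions; the expected value format is an assumption here.
        cmd += ["--excluded-paths", excluded_paths]
    return cmd


# To actually run the scan (requires deepsecrets to be installed):
#   import subprocess
#   subprocess.run(build_scan_command("/path/to/your/code", "report.json"), check=True)
```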
Building rulesets
Regex
The built-in ruleset for regex checks is located in /deepsecrets/rules/regexes.json. You're free to follow the format and create a custom ruleset.
HashedSecret
Example ruleset for regex checks is located in /deepsecrets/rules/regexes.json. You're free to follow the format and create a custom ruleset.
Contributing
Under the hood
There are several core concepts:
File
Tokenizer
Token
Engine
Finding
ScanMode
File
Just a pythonic representation of a file with all needed methods for management.
Tokenizer
A component able to break the content of a file into pieces - Tokens - by its logic. The following tokenizers are available:
- FullContentTokenizer: treats all content as a single token. Useful for regex-based search.
- PerWordTokenizer: breaks given content by words and line breaks.
- LexerTokenizer: uses language-specific smarts to break code into semantically correct pieces with additional context for each token.
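The difference between the first two tokenizers can be sketched in a few lines. This is an illustration of the concept only: the `Token` fields and the function names are hypothetical and do not mirror the project's classes, and the lexer-based tokenizer is left out because it depends on language-specific parsing.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    """Hypothetical token: a piece of content plus where it came from."""
    content: str
    path: str
    start: int  # character offset inside the file content


def full_content_tokens(path: str, content: str) -> List[Token]:
    """Treat the whole file as a single token (convenient for regex search)."""
    return [Token(content, path, 0)]


def per_word_tokens(path: str, content: str) -> List[Token]:
    """Split on any whitespace, including line breaks, keeping offsets."""
    return [Token(m.group(), path, m.start()) for m in re.finditer(r"\S+", content)]
```

Keeping the offset on each token is what later allows a finding to point at a precise location in the file.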
Token
A string with additional information about its semantic role, corresponding file, and location inside it.
Engine
A component performing secrets search for a single token by its own logic. Returns a set of Findings. There are three engines available:
- RegexEngine: checks tokens' values through a special ruleset
- SemanticEngine: checks tokens produced by the LexerTokenizer using additional context - variable names and values
- HashedSecretEngine: checks tokens' values by hashing them and trying to find coinciding hashes inside a special ruleset
Finding
A data structure representing a problem detected inside code. It records the precise location inside a file and the rule that found it.
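The relationship between an Engine and its Findings can be sketched as follows. This is an assumption-laden illustration rather than the project's code: the `Finding` fields, the `regex_engine` function, and the single rule in `RULES` are hypothetical and not taken from the built-in ruleset.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Pattern


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding: which rule matched, and where."""
    rule_name: str
    path: str
    start: int
    end: int
    match: str


# Illustrative rule only; real rulesets live in a JSON file.
RULES: Dict[str, Pattern] = {
    "aws-access-key-id": re.compile(r"AKIA[0-9A-Z]{16}"),
}


def regex_engine(path: str, content: str, rules: Dict[str, Pattern]) -> List[Finding]:
    """Apply every rule to the token content and collect the matches."""
    findings = []
    for name, pattern in rules.items():
        for m in pattern.finditer(content):
            findings.append(Finding(name, path, m.start(), m.end(), m.group()))
    return findings
```

Running `regex_engine("app.py", 'aws_key = "AKIAIOSFODNN7EXAMPLE"', RULES)` yields one finding that names the rule and gives the character span of the match.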
ScanMode
This component is responsible for the scan process.
- Defines the scope of analysis for a given working directory, respecting exclusions
- Allows declaring a PerFileAnalyzer - the method called against each file, returning a list of findings. The primary usage is to initialize necessary engines, tokenizers, and rulesets.
- Runs the scan: a multiprocessing pool analyzes every file in parallel.
- Prepares results for output and outputs them.
The current implementation provides a CliScanMode, built from the user-provided configuration passed via CLI arguments.
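The scan flow described above (collect files, analyze each one, merge the results) can be sketched roughly as below. To keep the example safe to run anywhere, it uses `multiprocessing.pool.ThreadPool`, which shares the `Pool` interface, instead of a real process pool. The excluded directory names, the private-key pattern, and all function names are hypothetical.

```python
import os
import re
from multiprocessing.pool import ThreadPool
from typing import List, Tuple

EXCLUDED_DIRS = {".git", "node_modules"}  # hypothetical exclusions
PATTERN = re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----")  # hypothetical rule


def collect_files(workdir: str) -> List[str]:
    """Define the scan scope: every file under workdir, minus excluded dirs."""
    paths = []
    for root, dirs, files in os.walk(workdir):
        # Prune excluded directories in place so os.walk skips them.
        dirs[:] = [d for d in dirs if d not in EXCLUDED_DIRS]
        paths.extend(os.path.join(root, name) for name in files)
    return sorted(paths)


def analyze_file(path: str) -> List[Tuple[str, int, str]]:
    """Per-file analyzer: returns (path, offset, match) for each hit."""
    with open(path, "r", encoding="utf-8", errors="ignore") as fh:
        content = fh.read()
    return [(path, m.start(), m.group()) for m in PATTERN.finditer(content)]


def run_scan(workdir: str, workers: int = 4) -> List[Tuple[str, int, str]]:
    """Analyze all files in parallel and flatten the per-file results."""
    paths = collect_files(workdir)
    with ThreadPool(workers) as pool:
        per_file = pool.map(analyze_file, paths)
    return [finding for findings in per_file for finding in findings]
```

Swapping `ThreadPool` for `multiprocessing.Pool` would give true process-level parallelism, at the cost of requiring picklable analyzers and a main-module guard on some platforms.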
Local development
The project is intended to be developed using VSCode and its 'Remote containers' feature.
Steps:
- Clone the repository
- Open the cloned folder with VSCode
- Agree with 'Reopen in container'
- Wait until the container is built and necessary extensions are installed
- You're ready