A Git Repository Secrets Scanner written in Rust
About The Project
PyRepScan is a Python library written in Rust. It uses git2-rs to parse and traverse repositories, regex for pattern matching, and crossbeam for concurrency. The library was written to combine high performance with Python bindings.
Built With
Performance
CPU
Library | Time | Peak Memory
---|---|---
PyRepScan | 8.74s | 1,149,152 KB
gitleaks | 1118s | 1,146,300 KB
Installation
pip3 install PyRepScan
Documentation
class GitRepositoryScanner:
    def __init__(
        self,
    ) -> None
This class holds all the added rules for fast reuse.
def add_content_rule(
    self,
    name: str,
    pattern: str,
    whitelist_patterns: typing.List[str],
    blacklist_patterns: typing.List[str],
) -> None
The add_content_rule function adds a new rule to an internal list of rules that can be reused across scans of different repositories. The same name can be used for more than one rule, in which case results from different rules carry the same name. A content rule tests its regex pattern against the content of the files.
- name - The name of the rule, so that its results can be identified.
- pattern - The regex pattern (Rust Regex syntax) to match against the content of the committed files.
- whitelist_patterns - A list of regex patterns (Rust Regex syntax) to match against the content of the committed file to filter in results. A result passes through if at least one of the patterns matches; the patterns are combined with OR.
- blacklist_patterns - A list of regex patterns (Rust Regex syntax) to match against the content of the committed file to filter out results. A result is omitted if at least one of the patterns matches; the patterns are combined with OR.
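The filter-in and filter-out behaviour described above can be sketched in plain Python. This is only an illustration, not the library's implementation: it uses Python's re module instead of Rust Regex, the helper name passes_filters is hypothetical, and it assumes that an empty whitelist lets every match through (as the empty lists in the Usage section suggest) and that the whitelist is checked before the blacklist.

```python
import re
from typing import List


def passes_filters(
    content: str,
    whitelist_patterns: List[str],
    blacklist_patterns: List[str],
) -> bool:
    """Return True if a match found in `content` would be kept."""
    # Assumption: an empty whitelist means there is no filter-in step.
    if whitelist_patterns and not any(
        re.search(pattern, content) for pattern in whitelist_patterns
    ):
        return False
    # A single blacklist hit is enough to omit the result (OR relation).
    return not any(re.search(pattern, content) for pattern in blacklist_patterns)
```

For example, passes_filters("token = 'example'", [], [r'example']) is False, because one blacklist pattern matches.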
def add_file_path_rule(
    self,
    name: str,
    pattern: str,
) -> None
The add_file_path_rule function adds a new rule to an internal list of rules that can be reused across scans of different repositories. The same name can be used for more than one rule, in which case results from different rules carry the same name. A file path rule tests its regex pattern against the file paths.
- name - The name of the rule, so that its results can be identified.
- pattern - The regex pattern (Rust Regex syntax) to match against the file paths of the committed files.
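To get a feel for which paths a file path rule would flag, the patterns can be tried against sample paths first. The sketch below uses Python's re module rather than Rust Regex, the helper matching_rule_names is hypothetical, and it assumes a rule fires when its pattern matches anywhere in the path. The two patterns are the ones from the Usage section.

```python
import re
from typing import List, Tuple


def matching_rule_names(file_path: str, rules: List[Tuple[str, str]]) -> List[str]:
    """Return the names of the rules whose pattern matches the given path."""
    # Assumption: a match anywhere in the path is enough (search, not full match).
    return [name for name, pattern in rules if re.search(pattern, file_path)]


rules = [
    ('Second Rule', r'.+\.pem'),
    ('Third Rule', r'(prod|dev|stage).+key'),
]

for path in ('certs/server.pem', 'config/prod_signing_key.pem', 'README.md'):
    print(path, matching_rule_names(path, rules))
```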
def add_file_extension_to_skip(
    self,
    file_extension: str,
) -> None
The add_file_extension_to_skip function adds a file extension to the filtering phase, which reduces the number of inspected files and speeds up the scan.
- file_extension - A file extension, without a leading dot, to filter out from the scan.
def add_file_path_to_skip(
    self,
    file_path: str,
) -> None
The add_file_path_to_skip function adds a file path pattern to the filtering phase, which reduces the number of inspected files and speeds up the scan. Every file path that includes the file_path substring is left out of the scan.
- file_path - Free text. If an inspected file path includes this substring, the file is not scanned.
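The two skip filters can be pictured as a single predicate applied to every file path before any rule runs. The following is a rough sketch under assumptions, not the library's code: the helper should_skip is hypothetical, the extension is compared case-sensitively, and the extension is taken to be whatever follows the last dot of the file name.

```python
from pathlib import PurePosixPath
from typing import Iterable, Set


def should_skip(
    file_path: str,
    extensions_to_skip: Set[str],
    paths_to_skip: Iterable[str],
) -> bool:
    """Return True if the file would be filtered out before scanning."""
    # Extensions are registered without a leading dot, so strip it here.
    extension = PurePosixPath(file_path).suffix.lstrip('.')
    if extension in extensions_to_skip:
        return True
    # Path filters are plain substrings, not regex patterns.
    return any(substring in file_path for substring in paths_to_skip)
```

With extensions {'bin', 'jpg'} and paths ['site-packages', 'node_modules'], a path such as web/node_modules/lib/index.js would be skipped, while src/main.py would be scanned.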
def scan(
    self,
    repository_path: str,
    branch_glob_pattern: typing.Optional[str],
    from_timestamp: typing.Optional[int],
) -> typing.List[typing.Dict[str, str]]
The scan function is the main function in the library. Calling it triggers a new scan and returns a list of matches. The scan is multithreaded and uses all the available cores in the system. The results do not include the file content, only the matching regex group. To retrieve the full file content, take the file_oid value of a result and pass it to the get_file_content function.
- repository_path - The git repository folder path.
- branch_glob_pattern - A glob pattern to filter branches for the scan. If None is sent, defaults to *.
- from_timestamp - A UTC timestamp (int); only commits created after this timestamp are included in the scan. If None is sent, defaults to 0.
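The two optional arguments can be prepared with the standard library alone. The sketch below shows one way to build a from_timestamp value from a calendar date and to preview which branch names a glob would select. Both helper names are hypothetical, and fnmatch is only an approximation: the library's own glob matching may treat some patterns differently.

```python
import fnmatch
from datetime import datetime, timezone
from typing import List


def utc_timestamp(year: int, month: int, day: int) -> int:
    """Seconds since the Unix epoch for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def preview_branches(branch_names: List[str], branch_glob_pattern: str) -> List[str]:
    """Approximate which branches a glob pattern would select."""
    return [
        name for name in branch_names
        if fnmatch.fnmatchcase(name, branch_glob_pattern)
    ]


from_timestamp = utc_timestamp(2020, 1, 1)  # 1577836800
selected = preview_branches(['master', 'release/1.0', 'feature/login'], 'release/*')
```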
A sample result would look like this:
{
    'rule_name': 'First Rule',
    'author_email': 'author@email.email',
    'author_name': 'Author Name',
    'commit_id': '1111111111111111111111111111111111111111',
    'commit_message': 'The commit message',
    'commit_time': '2020-01-01T00:00:00e',
    'file_path': 'full/file/path',
    'file_oid': '47d2739ba2c34690248c8f91b84bb54e8936899a',
    'match': 'The matched group',
}
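Because each result is a plain dictionary of strings, post-processing needs nothing beyond the standard library. As a small illustration, the hypothetical helper below groups a results list by rule name; it relies only on the rule_name key shown in the sample result.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_rule(results: List[Dict[str, str]]) -> Dict[str, List[Dict[str, str]]]:
    """Group scan results by the name of the rule that produced them."""
    grouped = defaultdict(list)
    for result in results:
        grouped[result['rule_name']].append(result)
    return dict(grouped)
```

Since several rules may share a name, one group can hold results produced by more than one rule.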
def scan_from_url(
    self,
    url: str,
    repository_path: str,
    branch_glob_pattern: typing.Optional[str],
    from_timestamp: typing.Optional[int],
) -> typing.List[typing.Dict[str, str]]
The same as the scan function, but it first clones a repository from the given URL into the provided repository path.
- url - URL of a git repository.
- repository_path - The path to clone the repository to.
- branch_glob_pattern - A glob pattern to filter branches for the scan. If None is sent, defaults to *.
- from_timestamp - A UTC timestamp (int); only commits created after this timestamp are included in the scan. If None is sent, defaults to 0.
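When the clone is only needed for the duration of the scan, a temporary directory is one option. The helper below is a hypothetical wrapper, not part of the library: it accepts any object exposing scan_from_url with the signature documented above and assumes the method accepts keyword arguments, as the calls in the Usage section do.

```python
import tempfile
from typing import Any, Dict, List, Optional


def scan_url_in_temp_dir(
    scanner: Any,
    url: str,
    branch_glob_pattern: Optional[str] = None,
    from_timestamp: Optional[int] = None,
) -> List[Dict[str, str]]:
    """Clone into a throwaway directory, scan it, and clean up afterwards."""
    with tempfile.TemporaryDirectory() as clone_path:
        return scanner.scan_from_url(
            url=url,
            repository_path=clone_path,
            branch_glob_pattern=branch_glob_pattern,
            from_timestamp=from_timestamp,
        )
```

Note the trade-off: once the temporary directory is removed, get_file_content can no longer be called for these results, so keep the clone if the full file contents are needed.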
def get_file_content(
    self,
    repository_path: str,
    file_oid: str,
) -> bytes
The get_file_content function retrieves the content of a file that was previously matched. The full file content is omitted from the scan results to keep the results list small and the scan fast.
- repository_path - The git repository folder path.
- file_oid - A string representing the file oid. This value appears in the result dictionaries returned by the scan function.
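Several results often point at the same file, so it can be worth fetching each file_oid only once. The helper below is a hypothetical sketch: it works with any object that exposes get_file_content as documented above, and it decodes the returned bytes as UTF-8 with replacement characters, which is an assumption about the files rather than something the library guarantees.

```python
from typing import Any, Dict, List


def fetch_matched_files(
    scanner: Any,
    repository_path: str,
    results: List[Dict[str, str]],
) -> Dict[str, str]:
    """Map each distinct file_oid in the results to its decoded content."""
    contents: Dict[str, str] = {}
    for result in results:
        file_oid = result['file_oid']
        if file_oid in contents:
            continue  # already fetched for an earlier result
        raw = scanner.get_file_content(
            repository_path=repository_path,
            file_oid=file_oid,
        )
        # Assumption: text files; undecodable bytes become U+FFFD.
        contents[file_oid] = raw.decode('utf-8', errors='replace')
    return contents
```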
Usage
import pyrepscan
grs = pyrepscan.GitRepositoryScanner()
# Adds a specific rule, can be called multiple times or none
grs.add_content_rule(
    name='First Rule',
    pattern=r'(-----BEGIN PRIVATE KEY-----)',
    whitelist_patterns=[],
    blacklist_patterns=[],
)
grs.add_file_path_rule(
    name='Second Rule',
    pattern=r'.+\.pem',
)
grs.add_file_path_rule(
    name='Third Rule',
    pattern=r'(prod|dev|stage).+key',
)
# Add file extensions to ignore during the search
grs.add_file_extension_to_skip(
    file_extension='bin',
)
grs.add_file_extension_to_skip(
    file_extension='jpg',
)
# Add file paths to ignore during the search. Free text is allowed
grs.add_file_path_to_skip(
    file_path='site-packages',
)
grs.add_file_path_to_skip(
    file_path='node_modules',
)
# Scans a repository
results = grs.scan(
    repository_path='/repository/path',
    branch_glob_pattern='*',
)
# Results is a list of dicts. Each dict is in the following format:
{
    'rule_name': 'First Rule',
    'author_email': 'author@email.email',
    'author_name': 'Author Name',
    'commit_id': '1111111111111111111111111111111111111111',
    'commit_message': 'The commit message',
    'commit_time': '2020-01-01T00:00:00e',
    'file_path': 'full/file/path',
    'file_oid': '47d2739ba2c34690248c8f91b84bb54e8936899a',
    'match': 'The matched group',
}
# Fetch the file_oid full content
file_content = grs.get_file_content(
    repository_path='/repository/path',
    file_oid='47d2739ba2c34690248c8f91b84bb54e8936899a',
)
# file_content
b'binary data'
# Creating a RulesManager directly
rules_manager = pyrepscan.RulesManager()
# For testing purposes, check your regexes pattern using check_pattern function
rules_manager.check_pattern(
    content='some content1 to check, another content2 in the same line\nanother content3 in another line\n',
    pattern=r'(content\d)',
)
# Results are the list of captured matches
[
    'content1',
    'content2',
    'content3',
]
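For a quick check without the library installed, the check_pattern example above can be approximated with Python's re module. This is only a stand-in: Rust Regex and Python's re agree on simple patterns like this one but differ elsewhere (Rust Regex has no look-around or backreferences, for instance), and the function name check_pattern_sketch is hypothetical.

```python
import re
from typing import List


def check_pattern_sketch(content: str, pattern: str) -> List[str]:
    """Return every captured match of a single-group pattern in the content."""
    # With exactly one capturing group, re.findall returns the group text
    # for each match; patterns with several groups would yield tuples instead.
    return re.findall(pattern, content)


matches = check_pattern_sketch(
    content='some content1 to check, another content2 in the same line\nanother content3 in another line\n',
    pattern=r'(content\d)',
)
print(matches)
```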
License
Distributed under the MIT License. See LICENSE for more information.
Contact
Gal Ben David - gal@intsights.com
Project Link: https://github.com/intsights/PyRepScan