A Git Repository Leaks Scanner Python library written in C++
About The Project
PyRepScan is a Python library written in C++. It uses libgit2 to parse and traverse repositories, re2 for regex pattern matching, and cpp-taskflow for concurrency. It was written to deliver high performance while exposing Python bindings.
Built With
- libgit2
- re2
- cpp-taskflow
Performance
CPU
| Library | Time | Time Relative to PyRepScan |
|---|---|---|
| PyRepScan | 2.18s | 1.0x |
| gitleaks | 63.0s | 28.9x |
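The last column of the table appears to be each tool's run time divided by PyRepScan's run time. A quick check of that arithmetic, using only the two timings listed above (the variable names are illustrative):

```python
# Timings in seconds, copied from the table above
timings_seconds = {"PyRepScan": 2.18, "gitleaks": 63.0}

# Use PyRepScan as the baseline and express every time relative to it
baseline = timings_seconds["PyRepScan"]
relative = {
    name: round(seconds / baseline, 1)
    for name, seconds in timings_seconds.items()
}

for name, factor in relative.items():
    print(f"{name}: {factor}x")
```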
Prerequisites
To compile this package, you need GCC and the Python development package installed.
- Fedora
sudo dnf install python3-devel gcc-c++ libgit2-devel re2-devel
- Ubuntu 18.04
sudo apt install python3-dev g++-9 libgit2-dev libre2-dev
Installation
pip3 install PyRepScan
Usage
import pyrepscan

grs = pyrepscan.GitRepositoryScanner()

# Add a rule. Can be called any number of times, including zero
grs.add_rule(
    name='First Rule',
    match_pattern=r'''(-----BEGIN PRIVATE KEY-----)''',
    match_whitelist_patterns=[],
    match_blacklist_patterns=[],
)

# Compile the rules. Should be called only once, after all the rules were added
grs.compile_rules()

# Add file extensions to ignore during the search
grs.add_ignored_file_extension(
    file_extension='bin',
)
grs.add_ignored_file_extension(
    file_extension='jpg',
)

# Add file paths to ignore during the search. Free text is allowed
grs.add_ignored_file_path(
    file_path='site-packages',
)
grs.add_ignored_file_path(
    file_path='node_modules',
)

# Scan a repository
results = grs.scan(
    repository_path='/repository/path',
)

# results is a list of dicts. Each dict has the following format:
{
    'rule_name': 'First Rule',
    'author_email': 'author@email.email',
    'author_name': 'Author Name',
    'commit_id': '1111111111111111111111111111111111111111',
    'commit_message': 'The commit message',
    'commit_time': '2020-01-01T00:00:00',
    'file_path': 'full/file/path',
    'file_oid': '47d2739ba2c34690248c8f91b84bb54e8936899a',
    'match': 'The matched group',
}

# Fetch the full content of a file by its file_oid
file_content = grs.get_file_content(
    repository_path='/repository/path',
    file_oid='47d2739ba2c34690248c8f91b84bb54e8936899a',
)

# file_content is a bytes object
b'binary data'
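Because scan returns a plain list of dicts in the format shown above, the results can be post-processed with ordinary Python. The following is a minimal sketch and not part of PyRepScan: `group_matches_by_rule` is a hypothetical helper, and `sample_results` is made-up data shaped like the documented result dicts so that the snippet runs without a repository.

```python
from collections import defaultdict


def group_matches_by_rule(results):
    """Group result dicts by rule name, skipping exact repeat hits.

    A hit counts as a repeat when rule, commit, file path and match are
    all identical to an earlier one.
    """
    grouped = defaultdict(list)
    seen = set()
    for result in results:
        key = (
            result['rule_name'],
            result['commit_id'],
            result['file_path'],
            result['match'],
        )
        if key in seen:
            continue
        seen.add(key)
        grouped[result['rule_name']].append(result)
    return dict(grouped)


# Made-up data in the documented result format (not real scan output)
sample_results = [
    {
        'rule_name': 'First Rule',
        'commit_id': '1111111111111111111111111111111111111111',
        'file_path': 'keys/server.pem',
        'match': '-----BEGIN PRIVATE KEY-----',
    },
    {
        'rule_name': 'First Rule',
        'commit_id': '1111111111111111111111111111111111111111',
        'file_path': 'keys/server.pem',
        'match': '-----BEGIN PRIVATE KEY-----',
    },
    {
        'rule_name': 'Second Rule',
        'commit_id': '2222222222222222222222222222222222222222',
        'file_path': 'config/settings.py',
        'match': 'example-match',
    },
]

grouped = group_matches_by_rule(sample_results)
for rule_name, hits in grouped.items():
    print(f'{rule_name}: {len(hits)} unique hit(s)')
```

With real output, `sample_results` would be replaced by the list returned from `grs.scan(...)`; the helper only relies on the `rule_name`, `commit_id`, `file_path` and `match` keys.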
License
Distributed under the MIT License. See LICENSE for more information.
Contact
Gal Ben David - gal@intsights.com
Project Link: https://github.com/intsights/PyRepScan