A Git Repository Secrets Scanner written in Rust

About The Project

PyRepScan is a Python library written in Rust. It uses git2-rs for repository parsing and traversal, regex for pattern matching, and crossbeam for concurrency. The library was written to achieve high performance while providing Python bindings.

Performance

CPU

Library      Time      Peak Memory
PyRepScan    8.74 s    1,149,152 KB
gitleaks     1118 s    1,146,300 KB

Installation

pip3 install PyRepScan

Documentation

class GitRepositoryScanner:
    def __init__(
      self,
    ) -> None

This class holds all the added rules for fast reuse.

def add_content_rule(
    self,
    name: str,
    pattern: str,
    whitelist_patterns: typing.List[str],
    blacklist_patterns: typing.List[str],
) -> None

The add_content_rule function adds a new rule to an internal list of rules that can be reused against different repositories. The same name can be given to more than one rule, in which case several results may carry the same rule name. A content rule means that the regex pattern is tested against the content of the files.

  • name - The name of the rule, used to identify it.
  • pattern - The regex pattern (Rust Regex syntax) to match against the content of the committed files.
  • whitelist_patterns - A list of regex patterns (Rust Regex syntax) matched against the content of the committed file to filter results in. A result passes if at least one of the patterns matches (OR relation between the patterns).
  • blacklist_patterns - A list of regex patterns (Rust Regex syntax) matched against the content of the committed file to filter results out. A result is omitted if at least one of the patterns matches (OR relation between the patterns).

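The whitelist/blacklist semantics described above can be modelled in a few lines of pure Python. This is a hypothetical sketch, not PyRepScan's implementation: `passes_filters` is an illustrative name, Python's `re` stands in for Rust's regex crate (the syntaxes overlap for simple patterns but are not identical), and treating an empty whitelist as "no filtering" is an assumption.

```python
import re
from typing import List


def passes_filters(text: str, whitelist_patterns: List[str], blacklist_patterns: List[str]) -> bool:
    """Hypothetical model of the whitelist/blacklist OR semantics described above."""
    # Whitelist: at least one pattern must match (OR relation).
    # Assumption: an empty whitelist applies no filtering at all.
    if whitelist_patterns and not any(re.search(p, text) for p in whitelist_patterns):
        return False
    # Blacklist: a single matching pattern is enough to drop the result (OR relation).
    if any(re.search(p, text) for p in blacklist_patterns):
        return False
    return True


content = "-----BEGIN PRIVATE KEY-----\n# test fixture, not a real key\n"
print(passes_filters(content, [], [r"test fixture"]))  # False: the blacklist matches
print(passes_filters(content, [r"BEGIN"], []))  # True: the whitelist matches
```

A blacklist entry such as `r"test fixture"` is a typical way to suppress known false positives in test data.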
def add_file_path_rule(
    self,
    name: str,
    pattern: str,
) -> None

The add_file_path_rule function adds a new rule to an internal list of rules that can be reused against different repositories. The same name can be given to more than one rule, in which case several results may carry the same rule name. A file path rule means that the regex pattern is tested against the file paths.

  • name - The name of the rule so it can be identified.
  • pattern - The regex pattern (Rust Regex syntax) to match against the file paths of the committed files.

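To preview which paths a file path rule would catch, the two file path rules from the Usage section can be tried locally. This sketch assumes the pattern is searched anywhere in the path (as Rust's `Regex::is_match` does) rather than fully matched, and uses Python's `re` as a stand-in for the Rust regex crate; `matching_rules` is a hypothetical helper.

```python
import re
from typing import List

# The file path rules from the Usage section (name -> pattern).
FILE_PATH_RULES = {
    "Second Rule": r".+\.pem",
    "Third Rule": r"(prod|dev|stage).+key",
}


def matching_rules(file_path: str) -> List[str]:
    """Return the names of rules whose pattern matches anywhere in file_path (unanchored search)."""
    return [name for name, pattern in FILE_PATH_RULES.items() if re.search(pattern, file_path)]


for path in ["certs/server.pem", "config/prod/api.key", "src/main.py"]:
    print(path, matching_rules(path))
# certs/server.pem ['Second Rule']
# config/prod/api.key ['Third Rule']
# src/main.py []
```

Because the search is unanchored, `.+\.pem` also matches a path such as `certs/server.pem.bak`; appending `$` to the pattern restricts it to paths that end in `.pem`.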
def add_file_extension_to_skip(
    self,
    file_extension: str,
) -> None

The add_file_extension_to_skip function adds a file extension to the filtering phase, reducing the number of inspected files and increasing scan performance.

  • file_extension - A file extension, without a leading dot, to exclude from the scan.

def add_file_path_to_skip(
    self,
    file_path: str,
) -> None

The add_file_path_to_skip function adds a file path pattern to the filtering phase, reducing the number of inspected files and increasing scan performance. Any file path that contains the file_path substring is excluded from the scan.

  • file_path - A substring; any inspected file path that contains it is not scanned. This parameter is free text (a plain substring).

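The two skip lists together form the filtering phase. The sketch below models that phase under stated assumptions: `should_skip` is a hypothetical helper, extensions are compared exactly (case-sensitive, last extension only), and path fragments are plain substring checks as described above. PyRepScan's actual comparison may differ.

```python
from pathlib import PurePosixPath
from typing import Iterable


def should_skip(file_path: str, extensions_to_skip: Iterable[str], paths_to_skip: Iterable[str]) -> bool:
    """Hypothetical model of the filtering phase: skip by extension or by path substring."""
    # Extensions are registered without a leading dot, e.g. "jpg".
    # Assumption: only the last extension is compared ("a.tar.gz" -> "gz").
    suffix = PurePosixPath(file_path).suffix.lstrip(".")
    if suffix and suffix in set(extensions_to_skip):
        return True
    # Free-text path filters: any substring hit excludes the file.
    return any(fragment in file_path for fragment in paths_to_skip)


print(should_skip("img/logo.jpg", ["bin", "jpg"], []))  # True
print(should_skip("venv/lib/site-packages/x.py", [], ["site-packages"]))  # True
print(should_skip("src/app.py", ["jpg"], ["node_modules"]))  # False
```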
def scan(
    self,
    repository_path: str,
    branch_glob_pattern: typing.Optional[str],
    from_timestamp: typing.Optional[int],
) -> typing.List[typing.Dict[str, str]]

The scan function is the main function in the library. Calling it triggers a new scan and returns a list of matches. The scan is multithreaded and uses all available CPU cores. The results do not include the file content, only the matched regex group. To retrieve the full file content, take the 'file_oid' value from a result and pass it to the get_file_content function.

  • repository_path - The git repository folder path.
  • branch_glob_pattern - A glob pattern that selects the branches to scan. If None is passed, it defaults to *.
  • from_timestamp - A UTC timestamp (int); only commits created after this timestamp are included in the scan. If None is passed, it defaults to 0.
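A common use of from_timestamp is an incremental scan that covers only recent commits. This sketch assumes the value is a Unix timestamp in seconds (the unit git uses for commit times); `timestamp_days_ago` is a hypothetical helper, not part of PyRepScan.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def timestamp_days_ago(days: int, now: Optional[datetime] = None) -> int:
    """Integer UTC Unix timestamp for `days` days before `now` (defaults to the current time)."""
    now = now or datetime.now(timezone.utc)
    return int((now - timedelta(days=days)).timestamp())


fixed_now = datetime(2020, 1, 31, tzinfo=timezone.utc)
print(timestamp_days_ago(30, now=fixed_now))  # 1577836800, i.e. 2020-01-01T00:00:00Z
```

The returned integer can then be passed as `from_timestamp=timestamp_days_ago(30)` to scan. Always pass a timezone-aware `now`; a naive datetime would be interpreted in local time.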

A sample result would look like this:

{
    'rule_name': 'First Rule',
    'author_email': 'author@email.email',
    'author_name': 'Author Name',
    'commit_id': '1111111111111111111111111111111111111111',
    'commit_message': 'The commit message',
    'commit_time': '2020-01-01T00:00:00e',
    'file_path': 'full/file/path',
    'file_oid': '47d2739ba2c34690248c8f91b84bb54e8936899a',
    'match': 'The matched group',
}
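Several results can point at the same blob, for example when one file matches several rules. A minimal post-processing sketch, assuming results shaped like the sample above, groups them by 'file_oid' so that each blob's content needs to be fetched only once; `group_by_file_oid` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_file_oid(results: List[Dict[str, str]]) -> Dict[str, List[Dict[str, str]]]:
    """Group scan results by blob id so each blob's content is fetched at most once."""
    groups: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for result in results:
        groups[result["file_oid"]].append(result)
    return dict(groups)


results = [
    {"rule_name": "First Rule", "file_path": "keys/a.pem", "file_oid": "47d2739b", "match": "-----BEGIN PRIVATE KEY-----"},
    {"rule_name": "Second Rule", "file_path": "keys/a.pem", "file_oid": "47d2739b", "match": "keys/a.pem"},
    {"rule_name": "First Rule", "file_path": "old/b.key", "file_oid": "9f1c0a22", "match": "-----BEGIN PRIVATE KEY-----"},
]
print(len(group_by_file_oid(results)))  # 2 distinct blobs
```

Each key of the returned dict can then be passed once to get_file_content. The shortened oids above are placeholders, not real git object ids.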
def scan_from_url(
    self,
    url: str,
    repository_path: str,
    branch_glob_pattern: typing.Optional[str],
    from_timestamp: typing.Optional[int],
) -> typing.List[typing.Dict[str, str]]

Same as the scan function, but first clones the repository from the given URL into the provided repository path.

  • url - URL of a git repository.
  • repository_path - The path to clone the repository into.
  • branch_glob_pattern - A glob pattern that selects the branches to scan. If None is passed, it defaults to *.
  • from_timestamp - A UTC timestamp (int); only commits created after this timestamp are included in the scan. If None is passed, it defaults to 0.
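When a remote repository only needs to be scanned once, the clone can live in a throwaway directory. This is a sketch under stated assumptions: `scan_remote` is a hypothetical wrapper, `scanner` is expected to be a `pyrepscan.GitRepositoryScanner` with rules already added, and scan_from_url is assumed to create the target directory itself.

```python
import os
import tempfile
from typing import Any, Dict, List, Optional


def scan_remote(
    scanner: Any,
    url: str,
    branch_glob_pattern: Optional[str] = None,
    from_timestamp: Optional[int] = None,
) -> List[Dict[str, str]]:
    """Clone `url` into a temporary directory, scan it, and delete the clone afterwards."""
    with tempfile.TemporaryDirectory() as workdir:
        # A not-yet-existing subdirectory is used as the clone target.
        clone_path = os.path.join(workdir, "repository")
        return scanner.scan_from_url(
            url=url,
            repository_path=clone_path,
            branch_glob_pattern=branch_glob_pattern,
            from_timestamp=from_timestamp,
        )
```

For example, `scan_remote(grs, "https://github.com/intsights/PyRepScan")`. Because the clone is removed when the function returns, get_file_content cannot be called on these results afterwards; keep the clone in a persistent path if file contents are needed.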
def get_file_content(
    self,
    repository_path: str,
    file_oid: str,
) -> bytes

The get_file_content function exists to retrieve the content of a file that was previously matched. The full file content is omitted from the results to reduce the results list size and to deliver better performance.

  • repository_path - The git repository folder path.
  • file_oid - A string representing the file oid. This parameter exists in the results dictionary returned by the scan function.
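Since get_file_content returns raw bytes, a small helper can locate where a reported match sits in the file. This is a hypothetical sketch (`match_line_numbers` is not part of PyRepScan): it decodes leniently because blobs are not guaranteed to be valid UTF-8, and it only finds matches that lie within a single line.

```python
from typing import List


def match_line_numbers(file_content: bytes, match: str) -> List[int]:
    """Return the 1-based numbers of the lines in file_content that contain `match`."""
    # Invalid UTF-8 bytes become U+FFFD instead of raising UnicodeDecodeError.
    text = file_content.decode("utf-8", errors="replace")
    # Splitting on "\n" keeps line numbers consistent with git's notion of lines.
    return [number for number, line in enumerate(text.split("\n"), start=1) if match in line]


content = b"line one\n-----BEGIN PRIVATE KEY-----\nline three\n"
print(match_line_numbers(content, "-----BEGIN PRIVATE KEY-----"))  # [2]
```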

Usage

import pyrepscan

grs = pyrepscan.GitRepositoryScanner()

# Adds a specific rule, can be called multiple times or none
grs.add_content_rule(
    name='First Rule',
    pattern=r'(-----BEGIN PRIVATE KEY-----)',
    whitelist_patterns=[],
    blacklist_patterns=[],
)
grs.add_file_path_rule(
    name='Second Rule',
    pattern=r'.+\.pem',
)
grs.add_file_path_rule(
    name='Third Rule',
    pattern=r'(prod|dev|stage).+key',
)

# Add file extensions to ignore during the search
grs.add_file_extension_to_skip(
    file_extension='bin',
)
grs.add_file_extension_to_skip(
    file_extension='jpg',
)

# Add file paths to ignore during the search. Free text is allowed
grs.add_file_path_to_skip(
    file_path='site-packages',
)
grs.add_file_path_to_skip(
    file_path='node_modules',
)

# Scans a repository
results = grs.scan(
    repository_path='/repository/path',
    branch_glob_pattern='*',
)

# Results is a list of dicts. Each dict is in the following format:
{
    'rule_name': 'First Rule',
    'author_email': 'author@email.email',
    'author_name': 'Author Name',
    'commit_id': '1111111111111111111111111111111111111111',
    'commit_message': 'The commit message',
    'commit_time': '2020-01-01T00:00:00e',
    'file_path': 'full/file/path',
    'file_oid': '47d2739ba2c34690248c8f91b84bb54e8936899a',
    'match': 'The matched group',
}

# Fetch the full content of a matched file using its file_oid
file_content = grs.get_file_content(
    repository_path='/repository/path',
    file_oid='47d2739ba2c34690248c8f91b84bb54e8936899a',
)

# file_content
b'binary data'

# Creating a RulesManager directly
rules_manager = pyrepscan.RulesManager()

# For testing purposes, check your regex patterns using the check_pattern function
rules_manager.check_pattern(
    content='some content1 to check, another content2 in the same line\nanother content3 in another line\n',
    pattern=r'(content\d)',
)

# Results are the list of captured matches
[
    'content1',
    'content2',
    'content3',
]
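check_pattern also lends itself to small regression tests for a rule set. The sketch below is a hypothetical harness: `rule_failures` is an illustrative name, and it assumes check_pattern returns an empty list when nothing matches, consistent with the list of captured matches shown above. The check function is passed in, so the harness itself does not import pyrepscan.

```python
from typing import Callable, Iterable, List


def rule_failures(
    check_pattern: Callable[..., List[str]],
    pattern: str,
    should_match: Iterable[str],
    should_not_match: Iterable[str],
) -> List[str]:
    """Describe every sample the pattern handles incorrectly; an empty list means all passed."""
    failures = []
    for sample in should_match:
        if not check_pattern(content=sample, pattern=pattern):
            failures.append(f"expected a match in {sample!r}")
    for sample in should_not_match:
        if check_pattern(content=sample, pattern=pattern):
            failures.append(f"unexpected match in {sample!r}")
    return failures
```

With a real RulesManager this could be called as `rule_failures(rules_manager.check_pattern, r'(content\d)', ['content1 here'], ['no digits here'])` and asserted to be empty in a test suite.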

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Gal Ben David - gal@intsights.com

Project Link: https://github.com/intsights/PyRepScan


Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • pyrepscan-0.12.0-cp311-none-win_amd64.whl (1.2 MB): CPython 3.11, Windows x86-64
  • pyrepscan-0.12.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.4 MB): CPython 3.11, manylinux (glibc 2.17+) x86-64
  • pyrepscan-0.12.0-cp311-cp311-macosx_11_0_arm64.whl (1.7 MB): CPython 3.11, macOS 11.0+ ARM64
  • pyrepscan-0.12.0-cp311-cp311-macosx_10_7_x86_64.whl (1.9 MB): CPython 3.11, macOS 10.7+ x86-64
  • pyrepscan-0.12.0-cp310-none-win_amd64.whl (1.2 MB): CPython 3.10, Windows x86-64
  • pyrepscan-0.12.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.4 MB): CPython 3.10, manylinux (glibc 2.17+) x86-64
  • pyrepscan-0.12.0-cp310-cp310-macosx_11_0_arm64.whl (1.7 MB): CPython 3.10, macOS 11.0+ ARM64
  • pyrepscan-0.12.0-cp310-cp310-macosx_10_7_x86_64.whl (1.9 MB): CPython 3.10, macOS 10.7+ x86-64
  • pyrepscan-0.12.0-cp39-none-win_amd64.whl (1.2 MB): CPython 3.9, Windows x86-64
  • pyrepscan-0.12.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.4 MB): CPython 3.9, manylinux (glibc 2.17+) x86-64
  • pyrepscan-0.12.0-cp39-cp39-macosx_11_0_arm64.whl (1.7 MB): CPython 3.9, macOS 11.0+ ARM64
  • pyrepscan-0.12.0-cp39-cp39-macosx_10_7_x86_64.whl (1.9 MB): CPython 3.9, macOS 10.7+ x86-64
  • pyrepscan-0.12.0-cp38-none-win_amd64.whl (1.2 MB): CPython 3.8, Windows x86-64
  • pyrepscan-0.12.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.4 MB): CPython 3.8, manylinux (glibc 2.17+) x86-64
  • pyrepscan-0.12.0-cp38-cp38-macosx_11_0_arm64.whl (1.7 MB): CPython 3.8, macOS 11.0+ ARM64
  • pyrepscan-0.12.0-cp38-cp38-macosx_10_7_x86_64.whl (1.9 MB): CPython 3.8, macOS 10.7+ x86-64
  • pyrepscan-0.12.0-cp37-none-win_amd64.whl (1.2 MB): CPython 3.7, Windows x86-64
  • pyrepscan-0.12.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.4 MB): CPython 3.7m, manylinux (glibc 2.17+) x86-64
  • pyrepscan-0.12.0-cp37-cp37m-macosx_11_0_arm64.whl (1.7 MB): CPython 3.7m, macOS 11.0+ ARM64
  • pyrepscan-0.12.0-cp37-cp37m-macosx_10_7_x86_64.whl (1.9 MB): CPython 3.7m, macOS 10.7+ x86-64

