
This program allows you to search for a specific pattern within the files of a GitHub repository (using PyGithub).


GitHub Search Using Regex

Saikat Karmakar | 28 Jan 2023


This program allows you to search for a specific pattern within the files of a GitHub repository (using PyGithub). It uses regular expressions and multithreading to search through the repository's contents and return every file that contains the specified pattern.

Features

  • Search for a specific pattern in the files of a GitHub repository
  • Supports a single repository URL or a file containing multiple repository URLs
  • Uses multithreading to improve search performance

Installation

# using pip
pip install git-regex-search

To use this program, you need Python 3 and the PyGithub library. You can install PyGithub, along with any other dependencies listed in requirements.txt, using pip:

pip3 install -r requirements.txt

You will also need to have a GitHub personal access token. You can create one by going to the GitHub Developer Settings.

Usage

To use the program, you will need to provide a GitHub access token. The token can be passed as an environment variable or in a config.toml file. The program will look for the token in the following order:

  • GITHUB_TOKEN environment variable
  • config.toml file

Run the script with -h to list the available options:

python3 git_regex_search.py -h
usage: git_regex_search.py [-h] [-u URL] [-r REGEX]

options:
  -h, --help            show this help message and exit
  -u URL, --url URL     URL of the repository (single or file
                        containing URLs)
  -r REGEX, --regex REGEX
                        Regex pattern to search for
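
A parser that accepts the same two options could be sketched with argparse as shown here. The helper resolve_urls is hypothetical: how the program tells a single URL apart from a file of URLs is not stated, so this sketch simply assumes that a value naming an existing file is read as one URL per line.

```python
import argparse
import os


def build_parser():
    """Build a parser with the -u/--url and -r/--regex options."""
    parser = argparse.ArgumentParser(prog="git_regex_search.py")
    parser.add_argument(
        "-u", "--url",
        help="URL of the repository (single or file containing URLs)",
    )
    parser.add_argument("-r", "--regex", help="Regex pattern to search for")
    return parser


def resolve_urls(value):
    """Return a list of repository URLs for the -u value.

    Assumption for this sketch: if the value names an existing file,
    it holds one URL per line; otherwise it is a single URL.
    """
    if os.path.isfile(value):
        with open(value, encoding="utf-8") as fh:
            return [line.strip() for line in fh if line.strip()]
    return [value]
```
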

For example, to search for the pattern brownie in the repository Aviksaikat/RoadClosed-quillctf-brownie, you would run the following command:

python3 git_regex_search.py -u https://github.com/Aviksaikat/RoadClosed-quillctf-brownie -r "brownie"

The program will then return a list of all files that contain the specified pattern, along with the line number where the pattern was found.
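
To give a rough idea of how the files of a repository can be enumerated before searching, here is a sketch. The names repo_full_name and iter_repo_files are hypothetical and this is not necessarily how the program walks the repository. The repo argument is assumed to behave like a PyGithub Repository: get_contents(path) returns one entry or a list of entries, each with type, path, and decoded_content attributes.

```python
from urllib.parse import urlparse


def repo_full_name(url):
    """Turn https://github.com/<owner>/<name> into "<owner>/<name>"."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"not a repository URL: {url}")
    owner, name = parts[0], parts[1]
    if name.endswith(".git"):
        name = name[:-4]
    return f"{owner}/{name}"


def iter_repo_files(repo, path=""):
    """Yield (path, text) for every regular file under path."""
    stack = [path]
    while stack:
        entries = repo.get_contents(stack.pop())
        if not isinstance(entries, list):
            entries = [entries]  # a single file comes back as one object
        for entry in entries:
            if entry.type == "dir":
                stack.append(entry.path)  # descend into it later
            elif entry.type == "file":
                # decoded_content is bytes; undecodable bytes are replaced
                text = entry.decoded_content.decode("utf-8", errors="replace")
                yield entry.path, text
```

With PyGithub installed, a suitable repo object would come from calling get_repo(repo_full_name(url)) on an authenticated Github client.
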

Multithreading

The program uses multithreading to speed up the search. Two threads are created and run simultaneously, each searching through the repository's contents, so the search finishes sooner than it would on a single thread.
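
One way to run a search on two threads is sketched below with the standard library's ThreadPoolExecutor. How the program actually divides the work between its threads is not described, so this sketch assumes the split is per file; search_in_parallel and search_one are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def search_in_parallel(files, search_one, workers=2):
    """Apply search_one(path, text) to each file using a small thread pool.

    files is an iterable of (path, text) pairs. Returns a dict mapping
    each path to its result, keeping only non-empty results.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submit one task per file; the pool shares them among the workers.
        futures = {
            pool.submit(search_one, path, text): path for path, text in files
        }
        for future, path in futures.items():
            found = future.result()  # re-raises any exception from the worker
            if found:
                results[path] = found
    return results
```

In CPython, threads tend to help most when the per-file work includes waiting on the network, such as downloading file contents, rather than pure in-memory matching.
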

Regular Expressions

The program utilizes regular expressions to search for the specified pattern within the files of the repository. This allows for more powerful and flexible searches, as opposed to simple string matching.
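
As an illustration of matching a pattern line by line and reporting line numbers, here is a minimal sketch using Python's re module. The function name find_matches is hypothetical and the program's own matching logic may differ.

```python
import re


def find_matches(text, pattern):
    """Return (line_number, line) for each line that matches pattern.

    Line numbers start at 1. re.search is used, so the pattern may
    match anywhere in the line unless it is anchored.
    """
    regex = re.compile(pattern)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if regex.search(line)
    ]
```

A plain word such as brownie behaves like a substring search, while a pattern such as ^import\s+\w+ only matches lines that begin with an import statement, which simple string matching cannot express.
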

Docker

You can build the image by running the following command in the same directory where the Dockerfile is located:

docker build -t <image-name> .

You can then run the container using the following command:

docker run -e GITHUB_TOKEN=<access_token> <image-name>

Limitations

This program searches only the contents of files, not their names. For my use case, I don't need it to search file names.

Additional Features

The program also prints the full path of each file where the pattern was found, which makes the match easy to locate.

Conclusion

This program is a powerful tool for quickly searching through the contents of a GitHub repository and finding specific patterns. With the use of regular expressions and multithreading, it is able to search through large repositories quickly and efficiently.

Version 0.3

Published as pip package

