This program allows you to search for a specific pattern within the files of a GitHub repository(using PyGithub).
Project description
GitHub Search Using Regex
Saikat Karmakar | 28 Jan:2023
This program allows you to search for a specific pattern within the files of a GitHub repository(Using PyGithub). The program utilizes regular expressions and multithreading to quickly search through the repository's contents and return all files that contain the specified pattern.
Features
- Search for a specific pattern in the files of a GitHub repository
- Supports single repository or file containing multiple repository URLs
- Uses multithreading to improve search performance
Installation
# using pip
pip install git-regex-search
To use this program, you will need to have Python 3 and the PyGithub library installed. You can install PyGithub using pip:
pip3 install -r requirements.txt
You will also need to have a GitHub personal access token. You can create one by going to the GitHub Developer Settings.
Usage
To use the program, you will need to provide a GitHub access token. The token can be passed as an environment variable or in a config.toml file. The program will look for the token in the following order:
GITHUB_TOKEN
environment variableconfig.toml
file
python3 git_regex_search.py -h
usage: git_regex_search.py [-h] [-u URL] [-r REGEX]
options:
-h, --help show this help message and exit
-u URL, --url URL URL of the repository (single or file
containing URLs)
-r REGEX, --regex REGEX
Regex pattern to search for
For example, to search for the pattern brownie
in the repository Aviksaikat/RoadClosed-quillctf-brownie
, you would run the following command:
python3 git_regex_search.py -u https://github.com/Aviksaikat/RoadClosed-quillctf-brownie -r "brownie"
The program will then return a list of all files that contain the specified pattern, along with the line number where the pattern was found.
Multithreading
The program utilizes multithreading to search through the repository's contents more quickly. Two threads are created and run simultaneously, each searching through the repository's contents. This allows the program to search through the repository's contents much faster than if it were only using a single thread.
Regular Expressions
The program utilizes regular expressions to search for the specified pattern within the files of the repository. This allows for more powerful and flexible searches, as opposed to simple string matching.
Docker
You can build the image by running the following command in the same directory where the Dockerfile is located:
docker build -t <image-name> .
You can then run the container using the following command:
docker run -e GITHUB_TOKEN=<access_token> <image-name>
Limitations
This program only searches the contents of the files and not the name of the files. For my use I don't need it to search the names.
Additional Features
The program also prints the whole file path where the pattern was found, which is helpful in identifying the location of the pattern.
Conclusion
This program is a powerful tool for quickly searching through the contents of a GitHub repository and finding specific patterns. With the use of regular expressions and multithreading, it is able to search through large repositories quickly and efficiently.
Version 0.3
Published as pip package
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file git_regex_search-1.tar.gz
.
File metadata
- Download URL: git_regex_search-1.tar.gz
- Upload date:
- Size: 4.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.9
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 8a5f0ce4c3e71bbc747f96e946a31d367cd2a591bd165fdeb1a8a2ae5dfdf155 |
|
MD5 | 3aaa7b1e0c8c5af073d7ebb2ecc192e3 |
|
BLAKE2b-256 | c42cd1c08b27627a4e5dec2eb4472101e65356f44c7b1b7d119d7bce9c29dbb7 |
File details
Details for the file git_regex_search-1-py3-none-any.whl
.
File metadata
- Download URL: git_regex_search-1-py3-none-any.whl
- Upload date:
- Size: 4.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.9
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | b0735a68a746445e2b7569f5c6efc566ce71ee127ad96f6903db5fb5deff7fa4 |
|
MD5 | bd69d1fa97d0889aab159a1b47adace2 |
|
BLAKE2b-256 | 19d7a561661fd487abc706d557af496fb20b0a0cd9a0063ed69227f2d8f33439 |