GitHub Scrapper

A tool to scrape code from Git repositories for LLM analysis.

Installation

Install the package from PyPI:

pip install github-scrapper

or directly from GitHub:

pip install git+https://github.com/Pioannid/GitHubScrapper.git

Or, if you use Poetry, add the dependency in your pyproject.toml:

[tool.poetry.dependencies]
github-scrapper = { git = "https://github.com/Pioannid/GitHubScrapper.git" }

Then run:

poetry install

Usage

Python Script

from github_scrapper import GitHubCodeScraper

repo_url = "https://github.com/Pioannid/GitHubScrapper"
scraper = GitHubCodeScraper(repo_path=repo_url, branch="main")
code_contents = scraper.scrape_repository()
formatted_output = scraper.format_for_llm(code_contents)
print(formatted_output)
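To keep the formatted text for later use instead of printing it, the same two calls can be wrapped in a small helper. This is a minimal sketch: `save_llm_context` is a hypothetical name, not part of the package, and it assumes that `format_for_llm` returns a string, as the `print` call above suggests.

```python
from pathlib import Path


def save_llm_context(scraper, output_path):
    """Scrape, format, and write the result to output_path (hypothetical helper)."""
    # The two calls below are the ones shown in the usage example above.
    code_contents = scraper.scrape_repository()
    formatted_output = scraper.format_for_llm(code_contents)
    # Assumption: format_for_llm returns a str, so it can be written as text.
    Path(output_path).write_text(formatted_output, encoding="utf-8")
    return formatted_output


# Intended use with the package installed (not executed here):
#   from github_scrapper import GitHubCodeScraper
#   scraper = GitHubCodeScraper(
#       repo_path="https://github.com/Pioannid/GitHubScrapper", branch="main"
#   )
#   save_llm_context(scraper, "output.txt")
```

The helper accepts any object that exposes `scrape_repository` and `format_for_llm`, so it can be exercised with a stand-in object when the network is unavailable.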

CLI

After installation, the CLI tool is available as github-scrapper. The basic usage is:

github-scrapper [OPTIONS] REPO_PATH

REPO_PATH is the path to a local Git repository or the URL of a remote one. The following options are available:

--output, -o: Specify a file path to save the formatted output. Example:

--output output.txt

--ignore-dirs, -id: Additional directories to ignore. Accepts one or more directory names. Example:

--ignore-dirs venv node_modules

--ignore-files, -if: Specific files to ignore. Accepts one or more filenames. Example:

--ignore-files README.md LICENSE

--ignore-file, -c: Path to a configuration file with ignore rules (for both files and directories). Example:

--ignore-file .gitignore

--token, -t: GitHub token for private repositories (used when REPO_PATH is a URL). Example:

--token YOUR_GITHUB_TOKEN

--branch, -b: The branch to scrape from. The default is main. Example:

--branch develop

Example Command

To scrape the repository on the main branch and save the output to output.txt:

github-scrapper https://github.com/Pioannid/GitHubScrapper --branch main --output output.txt

If you run github-scrapper without any arguments, the tool will display the help message listing all these options.
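When the CLI is driven from another Python script, the documented options can be assembled into an argument list and passed to `subprocess`. The sketch below is illustrative only: `build_command` and `run_scrape` are hypothetical helper names, and only the long option names listed above are used. It assumes the `github-scrapper` executable is on `PATH`.

```python
import subprocess


def build_command(repo_path, branch=None, output=None,
                  ignore_dirs=(), ignore_files=(), ignore_file=None, token=None):
    """Build an argument list for the github-scrapper CLI (hypothetical helper)."""
    # REPO_PATH goes first, as in the example command above, so that
    # multi-value options such as --ignore-dirs cannot absorb it as one
    # more directory name.
    cmd = ["github-scrapper", repo_path]
    if branch:
        cmd += ["--branch", branch]
    if output:
        cmd += ["--output", output]
    if ignore_dirs:
        cmd += ["--ignore-dirs", *ignore_dirs]
    if ignore_files:
        cmd += ["--ignore-files", *ignore_files]
    if ignore_file:
        cmd += ["--ignore-file", ignore_file]
    if token:
        cmd += ["--token", token]
    return cmd


def run_scrape(repo_path, **options):
    """Run the CLI; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_command(repo_path, **options), check=True)
```

For example, `build_command("https://github.com/Pioannid/GitHubScrapper", branch="main", output="output.txt")` produces the same arguments as the example command. Passing a list rather than a shell string avoids quoting problems with paths that contain spaces.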

License

This project is licensed under the MIT License.


