This Python package is designed for web crawling through input links that belong to GitHub domains. It offers a wide range of functionalities beyond simple crawling, including the ability to list repositories associated with the provided link, download GitHub repositories, and extract the contents of GitHub repositories.
github-domain-scraper
The github-domain-scraper is a tool for extracting information from GitHub domains. It covers several use cases: listing the repositories behind a link, collecting links from search results, and retrieving user profile information.
Installation
You can install the github-domain-scraper from PyPI:
python -m pip install github-domain-scraper
The package is supported on Python 3.8 and above.
How to use
The github-domain-scraper supports a wide variety of use cases.
Command-line Tool
You can use the github-domain-scraper as a command-line tool to extract information from GitHub domains:
- Extracting User Repository Links
python -m github_domain_scraper --link="https://github.com/Parth971"
You can also specify a JSON output file for the results and a maximum number of links:
python -m github_domain_scraper \
    --link "https://github.com/Parth971" \
    --json output.json \
    --max-repositories 10
- Extracting Links from Search Results
python -m github_domain_scraper --link "https://github.com/search?q=ori+python&type=users"
You can also specify a JSON output file for the results and a maximum number of links:
python -m github_domain_scraper \
    --link "https://github.com/search?q=ori+python&type=users" \
    --json output.json \
    --max-repositories 10
- Extracting User Profile Information
python -m github_domain_scraper --github-username <GitHub Username> [<GitHub Username>, ...]
For example, for a single user:
python -m github_domain_scraper --github-username Parth971
You can also pass several usernames and specify a JSON output file for the results:
python -m github_domain_scraper \
    --github-username Parth971 OrionXV oriana04bedoya oriolval Ailothaen \
    --json output.json
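When these commands need to be launched from another script, one option is to assemble the argument list in Python and pass it to `subprocess.run`. The sketch below relies only on the flags shown above (`--link`, `--json`, `--max-repositories`, `--github-username`). The helper names `search_link`, `link_command`, `profile_command`, and `run` are hypothetical and are not part of the package.

```python
import subprocess
import sys
from typing import List, Optional, Sequence
from urllib.parse import urlencode

# Base invocation; sys.executable keeps the call on the current interpreter.
BASE = [sys.executable, "-m", "github_domain_scraper"]


def search_link(query: str, result_type: str = "users") -> str:
    """Build a GitHub search URL like the one in the search example."""
    # urlencode turns spaces into "+", giving "q=ori+python" for "ori python".
    return "https://github.com/search?" + urlencode({"q": query, "type": result_type})


def link_command(
    link: str,
    json_path: Optional[str] = None,
    max_repositories: Optional[int] = None,
) -> List[str]:
    """Argument list for the --link mode (hypothetical helper)."""
    command = BASE + ["--link", link]
    if json_path is not None:
        command += ["--json", json_path]
    if max_repositories is not None:
        command += ["--max-repositories", str(max_repositories)]
    return command


def profile_command(
    usernames: Sequence[str],
    json_path: Optional[str] = None,
) -> List[str]:
    """Argument list for the --github-username mode (hypothetical helper)."""
    if not usernames:
        raise ValueError("at least one GitHub username is required")
    command = BASE + ["--github-username", *usernames]
    if json_path is not None:
        command += ["--json", json_path]
    return command


def run(command: List[str]) -> None:
    """Execute the scraper; needs github-domain-scraper installed."""
    subprocess.run(command, check=True)
```

For instance, `run(link_command(search_link("ori python"), "output.json", 10))` would correspond to the search example with a JSON output file and a limit of 10. Passing a list rather than a shell string avoids quoting problems with characters such as `&` in search URLs.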
Integration in Python Modules
The github-domain-scraper can also be integrated into other Python modules. Import the LinkExtractor and UserProfileInformationExtractor classes from github_domain_scraper.extractor and use them as follows:
from github_domain_scraper.extractor import LinkExtractor, UserProfileInformationExtractor
links = LinkExtractor(initial_link="github_link").extract()
info = UserProfileInformationExtractor(github_username="Parth971").extract()
This makes it easy to incorporate github-domain-scraper functionality into your custom Python projects.
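As a rough illustration of such an integration, the sketch below wraps the `LinkExtractor` call shown above and stores its output in a JSON file. It assumes that `extract()` returns JSON-serializable data such as a list of links, which the description above does not confirm. The helper names `collect_links` and `save_results` are hypothetical.

```python
import json
from pathlib import Path
from typing import Any


def collect_links(initial_link: str) -> Any:
    """Run LinkExtractor on a GitHub link and return whatever extract() yields."""
    # Imported inside the function so this module still loads in
    # environments where github-domain-scraper is not installed.
    from github_domain_scraper.extractor import LinkExtractor

    return LinkExtractor(initial_link=initial_link).extract()


def save_results(results: Any, path: str) -> Path:
    """Write results to a JSON file (assumes they are JSON-serializable)."""
    output = Path(path)
    output.write_text(json.dumps(results, indent=2), encoding="utf-8")
    return output
```

A call such as `save_results(collect_links("https://github.com/Parth971"), "links.json")` would then mirror the command-line `--json` option from within Python, provided the assumption about the return value holds.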
License
This project is licensed under the MIT License - see the LICENSE.md file for details.
File details
Details for the file github_domain_scraper-3.0.0.tar.gz.
File metadata
- Download URL: github_domain_scraper-3.0.0.tar.gz
- Upload date:
- Size: 11.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 40d3c4860622188d03bd460c1f980b492ee87ec88fddc8f7a194ac1039527d78
MD5 | bd976e2bb304e9d51a7ff225eef00d37
BLAKE2b-256 | f5926aa2a2857451e1051d3eb34636bdeea6e587ccd7e40f8a6020c8db77279a
File details
Details for the file github_domain_scraper-3.0.0-py3-none-any.whl.
File metadata
- Download URL: github_domain_scraper-3.0.0-py3-none-any.whl
- Upload date:
- Size: 12.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | d9350b61f851f85d0f01cbbeb8fd3c7149f4cd7fb9f4470d939a7426da3d77dc
MD5 | f3421e41de360887162123ea5d7095a8
BLAKE2b-256 | 5662587fb80e8a0b6eea39eba552da0486ebc3b88953c29177f9d2a652b4ea25