This Python package is designed for web crawling through input links that belong to GitHub domains. It offers a wide range of functionalities beyond simple crawling, including the ability to list repositories associated with the provided link, download GitHub repositories, and extract the contents of GitHub repositories.
github-domain-scraper
The github-domain-scraper is a tool for extracting information from GitHub domains. It can list the repositories behind a link, collect links from search results, and retrieve user profile information.
Installation
You can install the github-domain-scraper from PyPI:
python -m pip install github-domain-scraper
The package is supported on Python 3.8 and above.
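As a quick sanity check after installation, the sketch below confirms that the interpreter meets the stated minimum version and that the package can be located. The helper names are illustrative and not part of github-domain-scraper; the module name github_domain_scraper is taken from the `python -m github_domain_scraper` commands used in this README.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in this README


def python_is_supported(version_info=None):
    """Return True if a (major, minor, ...) version tuple meets MIN_PYTHON."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_PYTHON


def package_is_installed(module_name="github_domain_scraper"):
    """Return True if the top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("Package installed:", package_is_installed())
```

If the second check reports False, the package was probably installed into a different interpreter than the one running the script.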
How to use
The github-domain-scraper supports several use cases, both as a command-line tool and as a Python library.
Command-line Tool
You can use the github-domain-scraper as a command-line tool to extract information from GitHub domains:
- Extracting a user's repository links
python -m github_domain_scraper --link="https://github.com/Parth971"
You can also specify a JSON output file for the results and a maximum number of links:
python -m github_domain_scraper \
    --link "https://github.com/Parth971" \
    --json output.json \
    --max-repositories 10
- Extracting links from search results
python -m github_domain_scraper --link "https://github.com/search?q=ori+python&type=users"
You can also specify a JSON output file for the results and a maximum number of links:
python -m github_domain_scraper \
    --link "https://github.com/search?q=ori+python&type=users" \
    --json output.json \
    --max-repositories 10
- Extracting user profile information
python -m github_domain_scraper --github-username <GitHub Username> [<GitHub Username>, ...]
For example, for a single username:
python -m github_domain_scraper --github-username Parth971
You can also pass several usernames and specify a JSON output file for the results:
python -m github_domain_scraper \
    --github-username Parth971 OrionXV oriana04bedoya oriolval Ailothaen \
    --json output.json
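When the command-line tool is driven from a script, it can help to assemble the arguments programmatically and read the JSON file back afterwards. The following is a minimal sketch that uses only the flags shown above (--link, --github-username, --json, --max-repositories). The function names are hypothetical, and the structure of the JSON output is not documented here, so the loader returns whatever was written without assuming a schema.

```python
import json
import sys
from pathlib import Path


def build_command(link=None, usernames=None, json_path=None, max_repositories=None):
    """Build an argv list for `python -m github_domain_scraper`.

    Exactly one of `link` or `usernames` must be given; combining them is
    not shown in this README, so this sketch rejects it.
    """
    has_link = link is not None
    has_users = bool(usernames)
    if has_link == has_users:
        raise ValueError("pass either link or usernames, but not both or neither")

    # sys.executable keeps the call on the interpreter running this script.
    cmd = [sys.executable, "-m", "github_domain_scraper"]
    if has_link:
        cmd += ["--link", link]
        # --max-repositories is only shown together with --link in this README.
        if max_repositories is not None:
            cmd += ["--max-repositories", str(max_repositories)]
    else:
        cmd += ["--github-username", *usernames]
    if json_path is not None:
        cmd += ["--json", str(json_path)]
    return cmd


def load_results(json_path):
    """Parse the file written via --json; no particular schema is assumed."""
    return json.loads(Path(json_path).read_text(encoding="utf-8"))
```

With the package installed, the list returned by `build_command(link="https://github.com/Parth971", json_path="output.json", max_repositories=10)` can be passed to `subprocess.run(cmd, check=True)`, after which `load_results("output.json")` returns the parsed results.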
Integration in Python Modules
The github-domain-scraper can also be seamlessly integrated into other Python modules.
Import the LinkExtractor and UserProfileInformationExtractor classes from github_domain_scraper.extractor and use them as follows:
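from github_domain_scraper.extractor import LinkExtractor, UserProfileInformationExtractor
# Replace "github_link" with a GitHub URL, such as a user page or a search-results page.
links = LinkExtractor(initial_link="github_link").extract()
# Look up profile information for a single GitHub username.
info = UserProfileInformationExtractor(github_username="Parth971").extract()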
This makes it easy to incorporate github-domain-scraper functionality into your custom Python projects.
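As an illustration of such an integration, the sketch below gathers profile information for several usernames and keeps going when one lookup fails. It assumes only what the snippet above shows: the extractor accepts a github_username keyword and exposes .extract(). The helper collect_profiles is hypothetical and not part of the package, and the return value of .extract() is stored unchanged because its structure is not documented here.

```python
def collect_profiles(usernames, extractor_cls):
    """Run one extractor per username and gather the results by username.

    extractor_cls is expected to accept github_username=... and to expose
    .extract(), as UserProfileInformationExtractor does in the snippet above.
    Returns a (results, errors) pair of dictionaries keyed by username.
    """
    results, errors = {}, {}
    for name in usernames:
        try:
            results[name] = extractor_cls(github_username=name).extract()
        except Exception as exc:
            # Broad on purpose: the package's failure modes are not
            # documented in this README, so any error is recorded per user.
            errors[name] = exc
    return results, errors
```

With the package installed, the real class can be passed in directly, for example `collect_profiles(["Parth971", "OrionXV"], UserProfileInformationExtractor)`. Passing the class as an argument also makes the helper easy to exercise with a stand-in class when no network access is available.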
Documentation
For detailed documentation, including publishing guides and development information, see the docs directory.
License
This project is licensed under the MIT License - see the LICENSE.md file for details.
File details
Details for the file github_domain_scraper-3.2.0.tar.gz.
File metadata
- Download URL: github_domain_scraper-3.2.0.tar.gz
- Upload date:
- Size: 15.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cb3f0617e816a3572f69e13bbf61e1797ea8a3344698a004ae092c0e12e22bd7 |
| MD5 | 93849f7c059fe5354def177a27720447 |
| BLAKE2b-256 | 7d03c05762b4f32c42d58609429fde269ce34eabefbd9adf4e906bd769d78d5f |
File details
Details for the file github_domain_scraper-3.2.0-py3-none-any.whl.
File metadata
- Download URL: github_domain_scraper-3.2.0-py3-none-any.whl
- Upload date:
- Size: 17.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d9a6c437b1d37229f7914d2a007dd2bf93876d68e694dc94a42d36365c5757a5 |
| MD5 | aa179e620fe649e3e704a77ea72d84eb |
| BLAKE2b-256 | 8afb59d9f9b6498f86ba3e5afac6a9761f4e06cdeac540809424e3a03d8ff83d |
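The SHA256 digests listed above can be used to check that a downloaded file is intact. The following is a minimal sketch using the standard library's hashlib; the function names are illustrative, and the file is read in chunks so that larger archives do not need to fit in memory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the source distribution, this would be called as `matches_published_digest("github_domain_scraper-3.2.0.tar.gz", "cb3f0617e816a3572f69e13bbf61e1797ea8a3344698a004ae092c0e12e22bd7")`, assuming the archive has been downloaded to the current directory.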