A universal package of scraper scripts for humans
About The Project
Scrapera provides a variety of scraper scripts for the most commonly used machine learning and data science domains. Currently, Scrapera supports the following crawlers:
The main aim of this package is to group common scraping tasks in one place, so that ML researchers and engineers can focus on their models rather than on the data collection process.
DISCLAIMER: The owner and contributors take no responsibility for misuse of data obtained through Scrapera. Contact the owner if any module provided by Scrapera violates copyright terms.
Prerequisites
Prerequisites can be installed separately from the requirements.txt file, as shown below:
pip install -r requirements.txt
Apart from this, some modules specifically require ChromeDriver. Check which ChromeDriver version is compatible with your installed Chrome browser and download it from the official site.
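As a quick sanity check before running a module that needs ChromeDriver, something like the following sketch can confirm that an executable is reachable. It uses only the standard library. The helper names `find_chromedriver` and `chromedriver_version` are hypothetical and not part of Scrapera, and it is an assumption that the driver is placed on PATH; the modules themselves may expect an explicit path instead.

```python
import shutil
import subprocess


def find_chromedriver(name="chromedriver"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(name)


def chromedriver_version(path):
    """Return the output of `<path> --version`, or None if it cannot be run."""
    try:
        result = subprocess.run(
            [path, "--version"],
            capture_output=True,
            text=True,
            timeout=10,
            check=True,
        )
    except (OSError, subprocess.SubprocessError):
        # Missing file, non-executable file, non-zero exit, or timeout.
        return None
    return result.stdout.strip()


path = find_chromedriver()
if path is None:
    print("chromedriver was not found on PATH")
else:
    print(path, chromedriver_version(path))
```

The reported version can then be compared by hand against the installed Chrome version.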
Installation
Scrapera is built with Python 3 and can be installed directly with pip:
pip install scrapera
Alternatively, to install the latest version directly from GitHub, run:
pip install git+https://github.com/DarshanDeshpande/Scrapera.git
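To confirm which version ended up installed after either command, a minimal sketch using the standard library's `importlib.metadata` (Python 3.8 or newer) could look like this. The helper name `installed_version` is hypothetical, and the distribution name `scrapera` is taken from the pip command above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string, or None if the package is not installed.
print(installed_version("scrapera"))
```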
Usage
To use any sub-module, you just need to import it, instantiate the scraper, and call its scrape method:
from scrapera.video.vimeo import VimeoScraper
scraper = VimeoScraper()
scraper.scrape('https://vimeo.com/191955190', '540p')
For more examples, please refer to the test folders in the respective modules.
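When several URLs need to be scraped, the import, instantiate, and execute pattern can be wrapped in a small loop that records failures instead of stopping at the first one. The sketch below is illustrative only: `scrape_all` is a hypothetical helper, not part of Scrapera, and it assumes nothing beyond a `scrape(url, quality)` method like the one in the Vimeo example. A stand-in scraper is used here so the block runs without network access.

```python
def scrape_all(scraper, urls, quality="540p"):
    """Call scraper.scrape(url, quality) for each URL.

    Returns a dict mapping each failed URL to the exception it raised.
    """
    failures = {}
    for url in urls:
        try:
            scraper.scrape(url, quality)
        except Exception as exc:  # a scraper may fail for many reasons
            failures[url] = exc
    return failures


class FakeScraper:
    """Stand-in with the same scrape(url, quality) shape, for demonstration."""

    def __init__(self):
        self.calls = []

    def scrape(self, url, quality):
        if "bad" in url:
            raise ValueError(f"cannot scrape {url}")
        self.calls.append((url, quality))


demo = FakeScraper()
failed = scrape_all(demo, ["https://example.com/a", "https://example.com/bad"])
print(len(demo.calls), "succeeded;", len(failed), "failed")
```

With the real package, the same helper would presumably be called as `scrape_all(VimeoScraper(), urls, '540p')`. The collected failures are a convenient starting point when raising an issue about a scraper that stopped working.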
Roadmap
- The Instagram Comments Scraper needs updating due to recent changes in the GraphQL implementation
Contributing
Scrapera welcomes any and all contributions and scraper requests. Please raise an issue if the scraper fails at any instance. Feel free to fork the repository and add your own scrapers to help the community!
For more guidelines, refer to CONTRIBUTING.
License
Distributed under the MIT License. See LICENSE for more information.
Contact
Feel free to reach out with any issues or requests related to Scrapera.
Darshan Deshpande (Owner) - Email | LinkedIn
Acknowledgements
Hashes for scrapera-1.0.15-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 4d42c95b7d93ec6a4f2a1f2b6dc9389170c06cc23f3cf53d7b6ef9788db6366b
MD5 | 17e37a773c0c265c97de4acfb422db67
BLAKE2b-256 | 7026f09e81511031f5d9ee115ac285bcbad1eed17ce1d6a36f59ff8f6a083b28