
A universal package of scraper scripts for humans


Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Contributing
  5. Sponsors
  6. License
  7. Contact
  8. Acknowledgements

About The Project

Scrapera is a completely Chromedriver-free package that provides scraper scripts for the most commonly used machine learning and data science domains. Scrapera scrapes directly and asynchronously from public API endpoints, which removes the heavy browser overhead and makes it fast and robust to DOM changes. Currently, Scrapera supports the following crawlers:

  • Images
  • Text
  • Audio
  • Videos
  • Miscellaneous
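
As a rough illustration of the asynchronous, browser-free approach described above, the sketch below fetches several endpoints concurrently with asyncio. It is not Scrapera's actual implementation: `fetch_record`, `fetch_all`, `max_concurrency` and the example.com URLs are hypothetical names, and the network call is simulated with a short sleep so the snippet runs offline.

```python
import asyncio


async def fetch_record(endpoint: str) -> dict:
    """Hypothetical stand-in for an HTTP request to a public API endpoint."""
    await asyncio.sleep(0.01)  # simulates network latency without real I/O
    return {"endpoint": endpoint, "status": "ok"}


async def fetch_all(endpoints, max_concurrency: int = 5):
    """Fetch every endpoint concurrently, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(endpoint):
        # The semaphore caps how many requests are in flight at once.
        async with semaphore:
            return await fetch_record(endpoint)

    # gather returns results in the same order as the input endpoints.
    return await asyncio.gather(*(bounded(e) for e in endpoints))


if __name__ == "__main__":
    urls = [f"https://api.example.com/items/{i}" for i in range(10)]
    results = asyncio.run(fetch_all(urls))
    print(len(results), "records fetched")
```

Because the requests overlap instead of running one after another, total time is governed by the slowest responses rather than by their sum, and no browser process is needed.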

The main aim of this package is to group common scraping tasks so that ML researchers and engineers can focus on their models rather than on the data collection process.

    DISCLAIMER: The owner and contributors do not take any responsibility for misuse of data obtained through Scrapera. Contact the owner if any module provided by Scrapera violates copyright terms.

    Prerequisites

    Prerequisites can be installed separately from the requirements.txt file:

    pip install -r requirements.txt
    

    Installation

    Scrapera is built with Python 3 and can be installed directly with pip:

    pip install scrapera
    

    Alternatively, to install the latest version directly from GitHub, run:

    pip install git+https://github.com/DarshanDeshpande/Scrapera.git
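
After either install command, one way to confirm that the package is visible to the current interpreter is to query its installed version. This is a minimal sketch using the standard library's importlib.metadata (Python 3.8+); `installed_version` is a hypothetical helper, not part of Scrapera.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("scrapera")
    print("scrapera version:", found if found else "not installed")
```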
    

    Usage

    To use any sub-module, import its scraper class, instantiate it and call scrape:

    from scrapera.video.vimeo import VimeoScraper
    scraper = VimeoScraper()
    scraper.scrape('https://vimeo.com/191955190', '540p')
    

    For more examples, refer to the test folders of the respective modules.
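
When scraping several URLs, it can help to keep going after a single failure. The sketch below assumes only the `scrape(url, quality)` call shown in the Vimeo example above; `scrape_all` and `FakeScraper` are hypothetical names introduced for illustration, and the fake scraper stands in for a real one so the snippet runs without network access.

```python
def scrape_all(scraper, urls, quality):
    """Call scraper.scrape(url, quality) for each URL; return (done, failed)."""
    done, failed = [], {}
    for url in urls:
        try:
            scraper.scrape(url, quality)
            done.append(url)
        except Exception as exc:
            # Record the error and continue so one bad URL does not stop the batch.
            failed[url] = repr(exc)
    return done, failed


class FakeScraper:
    """Hypothetical stand-in for a real scraper; raises for URLs containing 'bad'."""

    def scrape(self, url, quality):
        if "bad" in url:
            raise ValueError("could not resolve video")


done, failed = scrape_all(
    FakeScraper(),
    ["https://vimeo.com/191955190", "https://vimeo.com/bad"],
    "540p",
)
print("scraped:", done)
print("failed:", failed)
```

To use it with the real package, pass `VimeoScraper()` in place of `FakeScraper()`. Which exceptions a real scraper raises is not documented here, so the broad `except Exception` is a deliberate simplification.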

    Contributing

    Scrapera welcomes any and all contributions and scraper requests. Please raise an issue if a scraper fails. Feel free to fork the repository and add your own scrapers to help the community!
    For more guidelines, refer to CONTRIBUTING.

    License

    Distributed under the MIT License. See LICENSE for more information.

    Sponsors


    Contact

    Feel free to reach out for any issues or requests related to Scrapera

    Darshan Deshpande (Owner) - Email | LinkedIn

    Acknowledgements

