
A simple scraper

Project description

Peviitor Scraper

Pe Viitor logo

Description

peviitor_pyscraper is a Python-based scraping library built on top of Beautiful Soup for HTML parsing and Requests for HTTP. It lets you extract the data you need from web pages and save it in an easily usable format such as CSV or JSON. With peviitor_pyscraper, you can select specific HTML elements from a web page and extract information such as text, links, and images.

Features of peviitor_pyscraper:

  • Utilizes the popular Python libraries BeautifulSoup and Requests to facilitate web scraping.
  • Extracts the required data from a web page by selecting specific HTML elements.
  • Provides storage options for the scraped data, including JSON.
  • Is easy to use and integrate into existing Python projects.
  • Renders pages with dynamically generated elements (requires Node.js).

peviitor_pyscraper is an excellent choice for Python developers seeking a powerful and flexible web scraping library. With peviitor_pyscraper, you can automate the process of extracting data from web pages, saving time and effort.
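
For a concrete taste of that workflow, here is a minimal sketch that collects the absolute links from a page and stores them as JSON; the Scraper class and get_from_url() method are those shown in the Usage Examples below, and writing the output with Python's standard json module is simply one way to save the data:

    import json
    import re

    from scraper.Scraper import Scraper

    # Download the page; get_from_url() returns a BeautifulSoup object.
    scraper = Scraper()
    html = scraper.get_from_url('https://www.example.ro')

    # Collect the text and target of every absolute link on the page.
    links = [
        {'text': a.get_text(strip=True), 'href': a.get('href')}
        for a in html.find_all('a', href=re.compile('^https://'))
    ]

    # Save the scraped data in an easily usable format (JSON).
    with open('links.json', 'w', encoding='utf-8') as f:
        json.dump(links, f, ensure_ascii=False, indent=2)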

Installation

  1. Python 3.6 or higher must be installed on your computer. Install the package with pip: pip install peviitor-pyscraper
  2. Node.js is required for rendering pages with dynamically generated elements. Install the companion package with npm: npm i peviitor_jsscraper. A quick check for both is sketched right after this list.
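
    A minimal sanity check after installation; the scraper.Scraper import path comes from the Usage Examples below, while the Node.js lookup with Python's standard shutil module is our own addition:

     import shutil

     from scraper.Scraper import Scraper

     # The Python package is importable ...
     print('peviitor_pyscraper:', Scraper)

     # ... and Node.js is on PATH, which render_page() needs for dynamic pages.
     print('node:', shutil.which('node') or 'not found')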

Usage Examples

  1. Downloading the content from a specific URL:

     from scraper.Scraper import Scraper
     scraper = Scraper()
     html = scraper.get_from_url('https://www.example.ro')
     print(html.prettify())
    

    This code creates a Scraper object and then downloads the HTML from https://www.example.ro using the get_from_url() method, which returns a BeautifulSoup object that can later be used to search for specific elements within the page.

    To extract all "a" tags that contain an "href" attribute starting with "https://" from the downloaded HTML code, you can use the following code:

    import re

    from scraper.Scraper import Scraper
    scraper = Scraper()
    html = scraper.get_from_url('https://www.example.ro')
    links = html.find_all('a', href=re.compile('^https://'))
    for link in links:
        print(link.get('href'))
    

    To extract the first "h1" tag from the page:

    from scraper.Scraper import Scraper
    scraper = Scraper()
    html = scraper.get_from_url('https://www.example.ro')
    h1 = html.find('h1')
    print(h1.text)
    
  2. Downloading JSON content from a specific URL:

    from scraper.Scraper import Scraper
    scraper = Scraper()
    json = scraper.get_from_url('https://api.example.ro', type='JSON')
    print(json)
    

    These lines of code create a Scraper object and then download the JSON content from https://api.example.ro using the get_from_url() method with type='JSON', which returns the parsed JSON data so that specific fields of the response can be accessed later.

    To make a POST request to a specific URL:

    from scraper.Scraper import Scraper
    scraper = Scraper()
    data = {'key1': 'value1', 'key2': 'value2'}
    response = scraper.post('https://api.example.ro', data=data)
    json = response.json()
    print(json)
    
  3. peviitor_pyscraper can also render pages with dynamically generated elements. To do so, install Node.js and the peviitor_jsscraper package (npm i peviitor_jsscraper), then use the render_page() method.

    from scraper.Scraper import Scraper
    scraper = Scraper()
    html = scraper.render_page('https://www.example.ro')
    print(html.prettify())
    
  4. The object returned by get_from_url() exposes all BeautifulSoup methods and attributes, as illustrated in the sketch below.
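
    In practice this means the familiar BeautifulSoup API can be called directly on the object returned by get_from_url(); a small sketch, where the tag names and selectors are only illustrative and depend on the page being scraped:

     from scraper.Scraper import Scraper

     scraper = Scraper()
     html = scraper.get_from_url('https://www.example.ro')

     # CSS selectors via select() work as in plain BeautifulSoup ...
     for paragraph in html.select('p'):
         print(paragraph.get_text(strip=True))

     # ... and so does attribute access on individual tags.
     first_image = html.find('img')
     if first_image is not None:
         print(first_image.attrs)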

Contributing

If you want to contribute to the development of the scraper, there are several ways you can do so. First, you can help by contributing to the source code by adding new features or fixing existing issues. Second, you can contribute to improving the documentation or translating it into other languages. Additionally, if you want to help but are unsure how to get started, you can check our list of open issues and ask us how you can assist. For more information, please refer to the "Contribute" section in our documentation.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

If you have any questions or suggestions, please contact us at

Acknowledgements

