
A simple scraper


Peviitor Scraper


Description

peviitor_pyscraper is a Python-based scraping library built on the HTML parsing library Beautiful Soup and the HTTP library Requests. It lets you extract the required data from web pages and save it in an easily usable format such as CSV or JSON. With peviitor_pyscraper, you can select specific HTML elements on a page and extract information such as text, links, and images.

Features of peviitor_pyscraper:

  • Utilizes the popular Python libraries BeautifulSoup and Requests to facilitate web scraping.
  • Extracts the required data from a web page using specific HTML selectors.
  • Provides a variety of storage options for the scraped data, including JSON.
  • Is easy to use and integrate into existing Python projects.
  • Can render pages with dynamically generated elements.

peviitor_pyscraper is an excellent choice for Python developers seeking a powerful and flexible web scraping library. With peviitor_pyscraper, you can automate the process of extracting data from web pages, saving time and effort.

Installation

  1. You need Python 3.6 or higher installed on your computer. Then run pip install peviitor-pyscraper.
  2. Node.js is required for rendering pages with dynamically generated elements: npm i peviitor_jsscraper.

Usage Examples

  1. Downloading the content from a specific URL:

     from scraper.Scraper import Scraper
     scraper = Scraper()
     html = scraper.get_from_url('https://www.example.ro')
     print(html.prettify())
    

    This snippet creates a Scraper object and downloads the HTML from https://www.example.ro using the get_from_url() method, which returns a BeautifulSoup object that can then be searched for specific elements within the page.

    To extract all "a" tags that contain an "href" attribute starting with "https://" from the downloaded HTML code, you can use the following code:

    import re

    from scraper.Scraper import Scraper

    scraper = Scraper()
    html = scraper.get_from_url('https://www.example.ro')
    links = html.find_all('a', href=re.compile('^https://'))
    for link in links:
        print(link.get('href'))
    

    To extract the first "h1" tag from the page:

    from scraper.Scraper import Scraper
    scraper = Scraper()
    html = scraper.get_from_url('https://www.example.ro')
    h1 = html.find('h1')
    print(h1.text)
    
  2. Downloading JSON content from a specific URL:

    from scraper.Scraper import Scraper
    scraper = Scraper()
    data = scraper.get_from_url('https://api.example.ro', type='JSON')
    print(data)
    

    This snippet creates a Scraper object and downloads the JSON content from https://api.example.ro by passing type='JSON' to the get_from_url() method, which returns the parsed JSON data for further processing.

    To make a POST request to a specific URL:

    from scraper.Scraper import Scraper
    scraper = Scraper()
    data = {'key1': 'value1', 'key2': 'value2'}
    response = scraper.post('https://api.example.ro', data=data)
    result = response.json()
    print(result)
    
  3. peviitor_pyscraper can render pages with dynamically generated elements. This requires Node.js and the peviitor_jsscraper package (npm i peviitor_jsscraper); once installed, use the render_page() method.

    from scraper.Scraper import Scraper
    scraper = Scraper()
    html = scraper.render_page('https://www.example.ro')
    print(html.prettify())
    
  4. The object returned by get_from_url() and render_page() exposes all BeautifulSoup methods and attributes.
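    Because the returned object is a BeautifulSoup instance, any of its methods, such as find(), select(), or get_text(), can be used on it directly. A minimal self-contained sketch (parsing a hypothetical local HTML snippet instead of calling get_from_url(), so it runs without a network connection):

    ```python
    from bs4 import BeautifulSoup

    # In real use this object would come from scraper.get_from_url(...);
    # here a local snippet is parsed so the sketch needs no network call.
    html = BeautifulSoup(
        '<div class="job"><h2>Engineer</h2>'
        '<a href="https://jobs.example.ro/1">Apply</a></div>',
        'html.parser',
    )

    # Standard BeautifulSoup calls work on the result:
    print(html.find('h2').get_text())                              # text of the first h2 tag
    print([a['href'] for a in html.select("a[href^='https://']")])  # CSS attribute selector
    ```

    The CSS selector form (select() with href^='https://') is an alternative to the find_all() and re.compile() approach shown earlier; both return the same links.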

Contributing

If you want to contribute to the development of the scraper, there are several ways you can do so. First, you can help by contributing to the source code by adding new features or fixing existing issues. Second, you can contribute to improving the documentation or translating it into other languages. Additionally, if you want to help but are unsure how to get started, you can check our list of open issues and ask us how you can assist. For more information, please refer to the "Contribute" section in our documentation.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

If you have any questions or suggestions, please contact us at

Acknowledgements
