
A simple scraper


Peviitor Scraper


Description

peviitor_pyscraper is a Python scraping library built on Beautiful Soup (HTML parsing) and Requests (HTTP). It lets you extract data from web pages and save it in an easily usable format such as CSV or JSON. With peviitor_pyscraper, you can select specific HTML elements on a web page and extract the information you need, such as text, links, and images.

Features of peviitor_pyscraper:

  • Uses the popular Python libraries Beautiful Soup and Requests to facilitate web scraping.
  • Extracts the required data from a web page by selecting specific HTML elements.
  • Provides a variety of storage options for the scraped data, including JSON.
  • Is easy to use and integrate into existing Python projects.
  • Renders pages with dynamically generated elements.

peviitor_pyscraper is an excellent choice for Python developers seeking a powerful and flexible web scraping library. With peviitor_pyscraper, you can automate the process of extracting data from web pages, saving time and effort.
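
As an illustration of the "easily usable format" mentioned above, here is a minimal sketch of saving extracted records as JSON and CSV. It uses only the Python standard library and does not rely on any storage helper from peviitor_pyscraper. The `records` list, the field names, and the `save_json()` and `save_csv()` helpers are hypothetical.

```python
import csv
import json
from pathlib import Path

# Hypothetical records, shaped like data you might collect from a page.
records = [
    {"title": "Python Developer", "link": "https://www.example.ro/jobs/1"},
    {"title": "QA Engineer", "link": "https://www.example.ro/jobs/2"},
]


def save_json(rows, path):
    """Write the records to `path` as a UTF-8 JSON array."""
    text = json.dumps(rows, ensure_ascii=False, indent=2)
    Path(path).write_text(text, encoding="utf-8")


def save_csv(rows, path):
    """Write the records as CSV; the keys of the first row become the header."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    save_json(records, "records.json")
    save_csv(records, "records.csv")
```

JSON keeps nested structures intact, while CSV is convenient when every record has the same flat set of fields.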

Installation

  1. Python 3.6 or higher must be installed on your computer. Install the library with pip:

     pip install peviitor-pyscraper

  2. Node JS is required for rendering pages with dynamically generated elements. Install the companion package with npm:

     npm i peviitor_jsscraper

Usage Examples

  1. Downloading the content from a specific URL:

     from scraper.Scraper import Scraper
     scraper = Scraper()
     html = scraper.get_from_url('https://www.example.ro')
     print(html.prettify())
    

    This code creates a Scraper object and downloads the HTML from https://www.example.ro using the get_from_url() method. The method returns a BeautifulSoup object that you can then use to search for specific elements within the web page.

    To extract all "a" tags that contain an "href" attribute starting with "https://" from the downloaded HTML code, you can use the following code:

    import re

    from scraper.Scraper import Scraper

    scraper = Scraper()
    html = scraper.get_from_url('https://www.example.ro')
    # Keep only the "a" tags whose href starts with "https://"
    links = html.find_all('a', href=re.compile('^https://'))
    for link in links:
        print(link.get('href'))
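
The filter above keeps only links that are already absolute. Pages often contain relative links as well, so here is a minimal sketch of resolving them with the standard library's `urljoin()`. The base URL, the `hrefs` list, and the `absolute_links()` helper are hypothetical and are not part of peviitor_pyscraper.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.example.ro/jobs/"  # hypothetical URL of the scraped page

# Hypothetical href values, as they might appear in the page's "a" tags.
hrefs = [
    "/about",
    "listing?page=2",
    "https://www.example.ro/contact",
    "mailto:team@example.ro",
]


def absolute_links(base_url, raw_hrefs):
    """Resolve each href against base_url and keep only https:// results."""
    resolved = (urljoin(base_url, href) for href in raw_hrefs)
    return [url for url in resolved if url.startswith("https://")]


for url in absolute_links(BASE_URL, hrefs):
    print(url)
```

In practice the `hrefs` list would come from `link.get('href')` on each tag found on the page.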
    

    To extract the first "h1" tag from the page:

    from scraper.Scraper import Scraper
    scraper = Scraper()
    html = scraper.get_from_url('https://www.example.ro')
    h1 = html.find('h1')
    if h1 is not None:  # find() returns None when the page has no "h1" tag
        print(h1.text)
    
  2. Downloading JSON content from a specific URL:

    from scraper.Scraper import Scraper
    scraper = Scraper()
    # Named "data" so that it does not shadow Python's json module
    data = scraper.get_from_url('https://api.example.ro', type='JSON')
    print(data)
    

    These lines create a Scraper object and download the JSON content from https://api.example.ro using the get_from_url() method. With type='JSON', the method returns a JSON object rather than a BeautifulSoup object, and you can then look up specific values in the response.
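
Reading values out of the returned JSON might look like the sketch below. It assumes the returned object behaves like the dictionaries and lists produced by Python's `json` module, which this README does not state explicitly. The response body and its `jobs`, `title`, and `city` keys are hypothetical.

```python
import json

# Hypothetical response body; a real API will have its own structure.
raw = """
{
  "total": 2,
  "jobs": [
    {"title": "Python Developer", "city": "Cluj"},
    {"title": "QA Engineer", "city": "Iasi"}
  ]
}
"""

# json.loads() stands in here for the object returned by the scraper.
data = json.loads(raw)

# Use .get() with a default so a missing "jobs" key yields an empty list.
titles = [job["title"] for job in data.get("jobs", [])]
print(titles)
```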

    To make a POST request to a specific URL:

    from scraper.Scraper import Scraper
    scraper = Scraper()
    data = {'key1': 'value1', 'key2': 'value2'}
    response = scraper.post('https://api.example.ro', data=data)
    result = response.json()  # avoid shadowing Python's json module
    print(result)
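
How post() encodes the data argument is not documented here. If it forwards the dictionary to Requests unchanged (an assumption), Requests sends it as a form-encoded body. The sketch below uses only the standard library to show what the same dictionary looks like form-encoded and as JSON, which can help when checking what an API expects.

```python
import json
from urllib.parse import urlencode

data = {"key1": "value1", "key2": "value2"}

# Body as sent with Content-Type: application/x-www-form-urlencoded
form_body = urlencode(data)

# Body as sent with Content-Type: application/json
json_body = json.dumps(data)

print(form_body)  # key1=value1&key2=value2
print(json_body)  # {"key1": "value1", "key2": "value2"}
```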
    
  3. peviitor_pyscraper can render pages with dynamically generated elements. This requires Node JS and the peviitor_jsscraper package, which you install with npm i peviitor_jsscraper. Once both are installed, use the render_page() method:

    from scraper.Scraper import Scraper
    scraper = Scraper()
    html = scraper.render_page('https://www.example.ro')
    print(html.prettify())
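
Because rendering depends on Node JS, it can be useful to check for it before calling render_page(). This is a minimal sketch using the standard library. It assumes the Node JS executable is named `node` and is on your PATH, and it does not check whether peviitor_jsscraper itself is installed. The `node_available()` helper is hypothetical.

```python
import shutil


def node_available(executable="node"):
    """Return True if the given executable can be found on PATH."""
    return shutil.which(executable) is not None


if not node_available():
    print("Node JS was not found on PATH; install it before calling render_page().")
```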
    
  4. The BeautifulSoup object returned by get_from_url() supports all BeautifulSoup methods and attributes.

Contributing

If you want to contribute to the development of the scraper, there are several ways you can do so. First, you can help by contributing to the source code by adding new features or fixing existing issues. Second, you can contribute to improving the documentation or translating it into other languages. Additionally, if you want to help but are unsure how to get started, you can check our list of open issues and ask us how you can assist. For more information, please refer to the "Contribute" section in our documentation.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

If you have any questions or suggestions, please contact us at

Acknowledgements
