
This package contains an implementation of a web scraper with Gemini Pro integrated into it.

Project description

gemini-pro-web-scraper

Ever wondered about scraping a website without writing a single line of code? Gemini Pro Web Scraper is a tool for doing exactly that. It generates the scraping code for you and automatically scrapes the data you want from a website of your choice.

Source Code

The source code of Gemini Pro Web Scraper is available at the Source Code link.

Installation

pip install gemini-pro-web-scraper

How to Use the Application?

Pre-requisites:

  1. Python installed on your device.
  2. A .env file in the same directory as <GEMINI_PRO_WEB_SCRAPER_DIRECTORY> that sets the value of GEMINI_API_KEY.
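
The second prerequisite can be checked before launching the application. The following is a minimal sketch, not the package's own code: the helper names load_env_file and get_api_key are hypothetical, and it assumes the .env file uses plain KEY=VALUE lines.

```python
import os
from pathlib import Path


def load_env_file(path):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an '=' separator.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and any surrounding quote characters.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def get_api_key(env_path=".env"):
    """Return GEMINI_API_KEY from the environment, else from the .env file."""
    key = os.environ.get("GEMINI_API_KEY")
    if not key and Path(env_path).is_file():
        key = load_env_file(env_path).get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set")
    return key
```

Calling get_api_key() from inside the application directory raises a RuntimeError if the key is missing, which surfaces the problem before any request is made.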

First, open a Terminal or Command Prompt window and run the following command.

cd <GEMINI_PRO_WEB_SCRAPER_DIRECTORY>
python3 main.py

Note: Replace <GEMINI_PRO_WEB_SCRAPER_DIRECTORY> with the path to the directory of the application Gemini Pro Web Scraper.

The application then starts and displays a screen like the one in the screenshot below.

[Screenshot: Application]

You will then be asked to input the following values.

  1. Temperature - between 0 and 1 inclusive
  2. Top P - between 0 and 1 inclusive
  3. Top K - at least 1
  4. Max output tokens - at least 1
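
The ranges listed above can be expressed as a small validation step. This is a sketch under assumptions: the function name validate_generation_config and the dictionary keys are illustrative, and the application's actual input handling may differ.

```python
def validate_generation_config(temperature, top_p, top_k, max_output_tokens):
    """Check the four values against the documented ranges and return them."""
    if not 0 <= temperature <= 1:
        raise ValueError("Temperature must be between 0 and 1 inclusive")
    if not 0 <= top_p <= 1:
        raise ValueError("Top P must be between 0 and 1 inclusive")
    if top_k < 1:
        raise ValueError("Top K must be at least 1")
    if max_output_tokens < 1:
        raise ValueError("Max output tokens must be at least 1")
    # Key names are an assumption chosen for illustration.
    return {
        "temperature": float(temperature),
        "top_p": float(top_p),
        "top_k": int(top_k),
        "max_output_tokens": int(max_output_tokens),
    }
```

For example, validate_generation_config(0.5, 1, 40, 2048) passes, while a temperature of 1.5 or a Top K of 0 raises a ValueError.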

The following screenshot shows what is displayed after inputting the mentioned values.

[Screenshot: Web Scraper]

You will be required to input the following pieces of information.

  1. The URL of the website you want to scrape (e.g., https://sandbox.oxylabs.io/products).
  2. What the URL entered in step 1 contains (e.g., games for https://sandbox.oxylabs.io/products).
  3. The number of elements you want to scrape.
  4. The details of each element you want to scrape (i.e., the name and the corresponding CSS selector for each element).
  5. The name of the file you want the code to be in (without the extension).
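
One way to picture how these five inputs fit together is as a single request object that is turned into a prompt for the model. The sketch below is purely illustrative: the ScrapeRequest class, the build_prompt function, the prompt wording, and the CSS selectors in the usage note are all assumptions, not the package's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class ScrapeRequest:
    url: str                  # input 1: the website to scrape
    content_description: str  # input 2: what the page contains
    fields: dict              # inputs 3 and 4: element name -> CSS selector
    output_name: str          # input 5: file name without the extension


def build_prompt(request):
    """Assemble a plain-text prompt describing the scraping task."""
    lines = [
        f"Write Python code that scrapes {request.content_description} "
        f"from {request.url}.",
        f"Extract these {len(request.fields)} elements using the given CSS selectors:",
    ]
    for name, selector in request.fields.items():
        lines.append(f"- {name}: {selector}")
    lines.append(f"Save the results to csvs/{request.output_name}.csv.")
    return "\n".join(lines)
```

A request for the example URL above might be built as ScrapeRequest("https://sandbox.oxylabs.io/products", "games", {"title": "h4", "price": ".price"}, "games"), where the two selectors are made up for illustration.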

Once you enter the values mentioned above, the file containing the code is created inside the "scrapers" directory, and the CSV file containing the scraped data is generated inside the "csvs" directory. You are then asked whether you want to continue scraping. If you enter 'Y', you are returned to the application window shown in the screenshot above. Otherwise, the application exits.
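
The output layout described above can be sketched as follows. The directory names "scrapers" and "csvs" come from the description; the helper names output_paths and write_rows are hypothetical, and the real generated code may write its CSV differently.

```python
import csv
from pathlib import Path


def output_paths(name, base_dir="."):
    """Return the paths of the generated script and its CSV for a file name."""
    base = Path(base_dir)
    return base / "scrapers" / f"{name}.py", base / "csvs" / f"{name}.csv"


def write_rows(csv_path, field_names, rows):
    """Write scraped rows (a list of dicts) to a CSV file with a header row."""
    csv_path = Path(csv_path)
    # Create the target directory if it does not exist yet.
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=field_names)
        writer.writeheader()
        writer.writerows(rows)
```

With a file name of "games", output_paths("games") yields scrapers/games.py and csvs/games.csv relative to the current directory.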

[Screenshot: Continue Scraping]

The generated Python file containing the web scraping code looks like the one below.

[Screenshot: Web Scraper Code]

Below is what the generated CSV file looks like.

[Screenshot: CSV File]

Project details



This version

1

Download files


Source Distribution

gemini-pro-web-scraper-1.tar.gz (3.2 kB view details)

Uploaded Source

Built Distribution


gemini_pro_web_scraper-1-py3-none-any.whl (3.5 kB view details)

Uploaded Python 3

File details

Details for the file gemini-pro-web-scraper-1.tar.gz.

File metadata

  • Download URL: gemini-pro-web-scraper-1.tar.gz
  • Upload date:
  • Size: 3.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.31.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.4

File hashes

Hashes for gemini-pro-web-scraper-1.tar.gz
Algorithm Hash digest
SHA256 aaf933a8b5b1c1fc37a4cac08abda4dbaf12435d58497651986cf831ff3a7c16
MD5 4f901dd789f28a62be655c0b7aa4be57
BLAKE2b-256 7a29a0607e602af2a392f647fe65875c53e4bca17728f1bfb3b57b25ac30b2d7


File details

Details for the file gemini_pro_web_scraper-1-py3-none-any.whl.

File metadata

  • Download URL: gemini_pro_web_scraper-1-py3-none-any.whl
  • Upload date:
  • Size: 3.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.31.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.4

File hashes

Hashes for gemini_pro_web_scraper-1-py3-none-any.whl
Algorithm Hash digest
SHA256 b7f6a960549ac94cf714aac50ca958645780cc8fcecc5fc49a9636c37c01793e
MD5 9a3e11149ec11967e0565effe6b0fcb0
BLAKE2b-256 c9f451aa11997339414301f93cca868818ea8ac169a6956361b891a653a68657
