
Ujeebu API Python SDK

Ujeebu is a set of powerful APIs for web data scraping and automatic content extraction. This SDK provides an easy-to-use interface to the Ujeebu API. It is written in Python and uses the requests library under the hood, so every method returns a standard requests.Response object.

Installation

You can install the SDK using pip:

pip install ujeebu-python

Usage

To use the SDK, you first need to create an instance of it with your API credentials:

from ujeebu_python import UjeebuClient
import json
ujeebu = UjeebuClient(api_key="__YOUR-API-KEY__")
url = "https://ujeebu.com/blog/scraping-javascript-heavy-pages-using-puppeteer/"
response = ujeebu.extract(url=url)
if response.status_code == 200:
    result = response.json()
    print(json.dumps(result['article'], indent=2))
else:
    print("Error:\n", json.dumps(response.json(), indent=2))

APIs

The SDK provides the following methods:

Core Methods

  • scrape(url, params, headers)

    • url: The URL to scrape (required).
    • params: Dict of scrape API params (optional).
    • headers: Dict of headers to forward (optional).
  • extract(url, params, headers)

    • url: The URL to extract from (required).
    • params: Dict of extract API params (optional).
    • headers: Dict of headers to forward (optional).
  • preview(url, params, headers)

    • url: The URL to preview (required).
    • params: Dict of preview API params (optional).
    • headers: Dict of headers to forward (optional).
  • serp(params, headers)

    • params: Dict of SERP API params (optional).
    • headers: Dict of headers to forward (optional).
  • account()

    • Returns account information including usage, balance, and plan details.

Scrape Helper Methods

  • get_pdf(url, params, headers)

    • Gets a PDF of a web page using the Scrape API.
    • url: The URL to create a PDF from (required).
    • params: Additional parameters for the PDF generation (optional).
    • headers: Headers to forward to the request (optional).
    • Automatically sets response_type to 'pdf' and json to True.
  • get_screenshot(url, params, headers)

    • Gets a screenshot of a web page using the Scrape API.
    • url: The URL to take a screenshot of (required).
    • params: Additional parameters for the screenshot (optional).
    • headers: Headers to forward to the request (optional).
    • Automatically sets response_type to 'screenshot' and json to True.
  • get_html(url, params, headers)

    • Gets the HTML of a web page using the Scrape API.
    • url: The URL to get HTML from (required).
    • params: Additional parameters for the request (optional).
    • headers: Headers to forward to the request (optional).
    • Automatically sets response_type to 'html' and json to True.
  • scrape_with_rules(url, extract_rules, params, headers)

    • Extracts data from a web page using extraction rules with the Scrape API.
    • url: The URL to extract data from (required).
    • extract_rules: The rules to extract data with (required).
    • params: Additional parameters for the extraction (optional).
    • headers: Headers to forward to the request (optional).
    • Automatically sets json to True.

SERP Helper Methods

  • search_text(search, params)

    • Performs a Google text search using the SERP API.
    • search: The search query to perform on Google (required).
    • params: Additional parameters for the search (optional).
  • search_news(search, params)

    • Performs a Google news search using the SERP API.
    • search: The search query to perform on Google News (required).
    • params: Additional parameters for the search (optional).
  • search_images(search, params)

    • Performs a Google images search using the SERP API.
    • search: The search query to perform on Google Images (required).
    • params: Additional parameters for the search (optional).
  • search_videos(search, params)

    • Performs a Google videos search using the SERP API.
    • search: The search query to perform on Google Videos (required).
    • params: Additional parameters for the search (optional).
  • search_maps(search, params)

    • Performs a Google Maps search using the SERP API.
    • search: The search query to perform on Google Maps (required).
    • params: Additional parameters for the search (optional).

Examples

  • Example of scraping the HTML of a URL with infinite scroll
url = "https://scrape.li/load-more"
response = ujeebu.scrape(url, params={
    # define what to return: 'raw', 'html', 'screenshot' or 'pdf'
    "response_type": "html",

    # return the response in JSON format or not
    "json": False,

    # user-agent header to forward
    "useragent": "Ujeebu-Python",

    # cookies to forward
    "cookies": {"Cookie1": "Cookie Value"},

    # execute js
    "js": True,

    # wait for selector or time in ms
    "wait_for": ".products-list",

    # if the selector doesn't appear in 5000ms, ignore and continue
    "wait_for_timeout": 5000,

    # scroll the page down
    "scroll_down": True,

    # wait 2000ms between two scrolls
    "scroll_wait": 2000,

    # scroll to this element in each scroll
    "scroll_to_selector": ".load-more-section",

    # scroll condition: the page keeps scrolling while this returns true
    "scroll_callback": "() => (document.querySelector('.no-more-products') === null)",

    "proxy_type": "premium",

    # proxy country code
    "proxy_country": "US",

    # device type "desktop" or "mobile"
    "device": "desktop",

    "window_width": 1200,

    "window_height": 900,

    "block_ads": True,

    "block_resources": True,
}, headers={
    # forwarded headers
    "Authorization": "Basic eWSjaW5lnlhY4luZUdxMDE2"
})
if response.status_code == 200:
    print(response.content)
else:
    print("Error:\n", json.dumps(response.json(), indent=2))
  • Example of taking a screenshot of a URL
url = "https://scrape.li/load-more"

response = ujeebu.scrape(url, params={
    "response_type": "screenshot",
    "screenshot_fullpage": True,
    "js": True,
    # CSS selector to screenshot, or coordinates of the rect to screenshot:
    # "screenshot_partial": {
    #     "x": 0,
    #     "y": 0,
    # },
    # if json is set to True, the screenshot is returned base64 encoded
    "json": False,
    "wait_for": 4000,
    "block_ads": True,
})
if response.status_code == 200:
    from pathlib import Path
    Path('screenshot.png').write_bytes(response.content)
else:
    print("Error:\n", json.dumps(response.json(), indent=2))
  • Example of extracting a list of products from a page
response = ujeebu.scrape(url=url, params={
    "wait_for": 5000,
    "block_resources": False,
    "js": True,
    "extract_rules": {
        "products": {
            "selector": ".product-card",
            "type": "obj",
            "multiple": True,
            "children": {
                "name": {
                    "selector": ".title",
                    "type": "text"
                },
                "description": {
                    "selector": ".description",
                    "type": "text"
                },
                "price": {
                    "selector": ".price",
                    "type": "text"
                },
                "image": {
                    "selector": ".card__image > img",
                    "type": "image",
                }
            }
        }
    }
})
if response.status_code == 200:
    print(json.dumps(response.json(), indent=2))
else:
    print("Error:\n", json.dumps(response.json(), indent=2))
  • Example of extracting the main body of an article
url = "https://thenextweb.com/news/european-space-agency-unveils-new-plan-for-growing-plants-on-the-moon"

response = ujeebu.extract(url=url, params={
    "js": True
})
if response.status_code == 200:
    result = response.json()
    print(json.dumps(result['article'], indent=2))
else:
    print("Error:\n", json.dumps(response.json(), indent=2))

Helper Functions Examples

Using Scrape Helper Methods

  • Get PDF using helper function:
from ujeebu_python import UjeebuClient

ujeebu = UjeebuClient(api_key="__YOUR-API-KEY__")

# Get PDF
response = ujeebu.get_pdf(
    "https://ujeebu.com/blog/scraping-javascript-heavy-pages-using-puppeteer/"
)
if response.status_code == 200:
    result = response.json()
    # PDF is base64 encoded
    print(result['pdf'][:100])
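Since the PDF comes back base64 encoded, it can be decoded with the standard library before saving. A minimal sketch, using a sample base64 string in place of the `result['pdf']` value from the response above:

```python
import base64
from pathlib import Path

# sample base64 payload standing in for result['pdf'];
# it decodes to the "%PDF-1.4" file header
pdf_b64 = "JVBERi0xLjQ="
pdf_bytes = base64.b64decode(pdf_b64)
Path("page.pdf").write_bytes(pdf_bytes)
```

The same decode step applies to the base64 screenshot returned by get_screenshot.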
  • Get screenshot using helper function:
# Take a full page screenshot
response = ujeebu.get_screenshot(
    "https://ujeebu.com",
    params={
        "screenshot_fullpage": True
    }
)
if response.status_code == 200:
    result = response.json()
    # Screenshot is base64 encoded
    print(result['screenshot'][:100])
  • Get HTML using helper function:
# Get HTML with JavaScript execution
response = ujeebu.get_html(
    "https://ujeebu.com",
    params={
        "js": True,
        "wait_for": 2000
    }
)
if response.status_code == 200:
    result = response.json()
    print(result['html'][:100])
  • Scrape with extraction rules using helper function:
# Extract product data using helper function
extract_rules = {
    "products": {
        "selector": ".product-card",
        "type": "obj",
        "multiple": True,
        "children": {
            "name": {"selector": ".title", "type": "text"},
            "price": {"selector": ".price", "type": "text"}
        }
    }
}

response = ujeebu.scrape_with_rules(
    "https://example.com/products",
    extract_rules=extract_rules,
    params={"js": True, "wait_for": 3000}
)
if response.status_code == 200:
    result = response.json()
    print(json.dumps(result['result'], indent=2))
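The extraction result is plain JSON, so post-processing is ordinary dict and list handling. A minimal sketch with a hypothetical response body shaped like the rules above (field names and values are illustrative, not real API output):

```python
# hypothetical response body, mirroring the extract_rules defined above
result = {
    "result": {
        "products": [
            {"name": "Widget", "price": "$9.99"},
            {"name": "Gadget", "price": "$19.99"},
        ]
    }
}

# each rule with "multiple": True yields a list of child objects
for product in result["result"]["products"]:
    print(f"{product['name']} costs {product['price']}")
```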

Using SERP Helper Methods

  • Google text search:
# Perform a text search
response = ujeebu.search_text(
    "Nikola Tesla",
    params={"results_count": 10, "lang": "en"}
)
if response.status_code == 200:
    result = response.json()
    for item in result['organic_results']:
        print(f"{item['title']}: {item['link']}")
  • Google news search:
# Search for news articles
response = ujeebu.search_news(
    "Donald Trump",
    params={"results_count": 20}
)
if response.status_code == 200:
    result = response.json()
    for news in result['news']:
        print(f"{news['title']}: {news['link']}")
  • Google images search:
# Search for images
response = ujeebu.search_images(
    "Coffee",
    params={"results_count": 10}
)
if response.status_code == 200:
    result = response.json()
    for image in result['images']:
        print(f"{image['title']}: {image['image']}")
  • Google videos search:
# Search for videos
response = ujeebu.search_videos(
    "Bitcoin",
    params={"results_count": 10}
)
if response.status_code == 200:
    result = response.json()
    for video in result['videos']:
        print(f"{video['title']}: {video['url']}")
  • Google Maps search:
# Search for places on Google Maps
response = ujeebu.search_maps(
    "Italian restaurant",
    params={"results_count": 10, "location": "ca"}
)
if response.status_code == 200:
    result = response.json()
    for place in result['maps_results']:
        print(f"{place['title']} - Rating: {place['rating']}")

Using Account Method

# Get account information
response = ujeebu.account()
if response.status_code == 200:
    account_info = response.json()
    print(f"Plan: {account_info['plan']}")
    print(f"Used: {account_info['used']} / {account_info['quota']}")
    print(f"Used Percent: {account_info['used_percent']}%")

Contributing

Contributions are welcome! If you find a bug or have a feature request, please open an issue or submit a pull request.

License

This library is licensed under the MIT License. See the LICENSE file for more information.

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
