Python SDK for Grabba API

Grabba Python SDK

The Grabba Python SDK provides a simple interface for scheduling web data extraction jobs, retrieving job results, and managing your extraction workflows.

Installation

Install the SDK using pip:

pip install grabba

Basic Setup

Import the client and the required types:

from grabba import Grabba, Job, JobNavigationType, JobSchedulePolicy, JobTaskType

Initialize a client instance:

grabba = Grabba(api_key="your-api-key", region="US")  # region is optional and defaults to "US"
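A hard-coded key is fine for a quick test. As a minimal sketch, the key could instead be read from the environment. The variable name GRABBA_API_KEY and the helper load_api_key are assumptions made for illustration; the SDK does not define either.

```python
import os


def load_api_key(var_name: str = "GRABBA_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The default variable name is a hypothetical convention, not an SDK feature.
    """
    api_key = os.environ.get(var_name, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return api_key


# Usage, following the constructor call shown above:
# grabba = Grabba(api_key=load_api_key(), region="US")
```

Failing early with a clear message keeps a missing key from surfacing later as a rejected API request.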

Methods

extract

extract(job: Job) -> Dict

Schedules a new web data extraction job.

Parameters:

  • job: A Job object containing the extraction configuration.

Returns:

  • A dictionary containing the response status, message, and job_result.

Example:

job = Job(
    url="https://docs.grabba.dev/home",
    schedule={"policy": JobSchedulePolicy.IMMEDIATELY.value},
    navigation={"type": JobNavigationType.NONE.value},
    tasks=[{
        "type": JobTaskType.WEB_PAGE_AS_MARKDOWN.value,
        "options": {"onlyMainContent": True},
    }],
)

response = grabba.extract(job)
print(f"Job completed with status: {response['status']}")
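The returned dictionary is described above as carrying status, message, and job_result. A small sketch of summarizing such a response without assuming every key is present follows. The helper describe_response and the sample values are hypothetical, and the real payload may contain more fields.

```python
from typing import Any, Dict


def describe_response(response: Dict[str, Any]) -> str:
    """Build a one-line summary from an extract()-style response dictionary."""
    status = response.get("status", "unknown")
    message = response.get("message", "")
    has_result = response.get("job_result") is not None
    return f"status={status} message={message!r} has_result={has_result}"


# Hypothetical response, shaped like the return value described above.
sample = {
    "status": "success",
    "message": "Job scheduled",
    "job_result": {"id": "67890"},
}
print(describe_response(sample))
```

Using dict.get with defaults avoids a KeyError if a failed request omits job_result.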

schedule_job

schedule_job(job_id: str) -> Dict

Schedules an existing job for execution.

Parameters:

  • job_id: The ID of the job to schedule.

Returns:

  • A dictionary containing the response status, message, and job_result.

Example:

response = grabba.schedule_job("12345")
print(f"Job completed with status: {response['status']}")

get_jobs

get_jobs() -> GetJobsResponse

Retrieves a list of all jobs associated with the API key.

Returns:

  • A list of Job objects.

Example:

jobs = grabba.get_jobs()
for job in jobs:
    print(job)

get_job

get_job(job_id: str) -> GetJobResponse

Retrieves details of a specific job by its ID.

Parameters:

  • job_id: The ID of the job to retrieve.

Returns:

  • A JobDetail object containing job details.

Example:

job = grabba.get_job("12345")
print(job)

get_job_result

get_job_result(job_result_id: str) -> JobResult

Retrieves the results of a specific job by its result ID.

Parameters:

  • job_result_id: The ID of the job result to retrieve.

Returns:

  • A JobResult object.

Example:

result = grabba.get_job_result("67890")
print(result)

get_available_regions

get_available_regions() -> List[Dict[str, PuppetRegion]]

Retrieves a list of available regions for Web Agent execution.

Returns:

  • A list of region objects.

Example:

regions = grabba.get_available_regions()
print(regions)

Types

Job

Represents a web data extraction job.

@dataclass
class Job:
    url: str
    tasks: List[JobTask]
    schedule: Optional[JobSchedule] = None
    navigation: Optional[JobNavigation] = None
    puppet_config: Optional[WebAgentConfig] = None

JobTask

Represents a single task in an extraction job.

@dataclass
class JobTask:
    type: JobTaskType
    options: Optional[Union[SpecificDataExtractionOptions, WebpageAsMarkdownOptions, WebScreenCaptureOptions]] = None

JobTaskType

Enumeration of available job task types.

class JobTaskType(str, Enum):
    WEB_PAGE_AS_HTML = "webPageAsHTML"
    WEB_PAGE_METADATA = "webPageMetadata"
    WEB_SCREEN_CAPTURE = "webScreenCapture"
    WEB_PAGE_AS_MARKDOWN = "webPageAsMarkdown"
    SPECIFIC_DATA_EXTRACTION = "specificDataExtraction"
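Because JobTaskType mixes in str, its members compare equal to their string values and can be looked up from those strings. The sketch below uses a local copy of the enum so it runs without the SDK installed. The helper make_task is hypothetical and only mirrors the task dictionaries used in the extract example.

```python
from enum import Enum
from typing import Any, Dict, Optional


class JobTaskType(str, Enum):
    # Local copy of the enum above, so the sketch is self-contained.
    WEB_PAGE_AS_HTML = "webPageAsHTML"
    WEB_PAGE_METADATA = "webPageMetadata"
    WEB_SCREEN_CAPTURE = "webScreenCapture"
    WEB_PAGE_AS_MARKDOWN = "webPageAsMarkdown"
    SPECIFIC_DATA_EXTRACTION = "specificDataExtraction"


def make_task(task_type: JobTaskType, options: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a task dictionary; 'options' is included only when provided."""
    task: Dict[str, Any] = {"type": task_type.value}
    if options is not None:
        task["options"] = options
    return task


# Look up a member from its string value, then build a task from it.
markdown_type = JobTaskType("webPageAsMarkdown")
task = make_task(markdown_type, {"onlyMainContent": True})
print(task)
```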

JobResult

Represents the result of a job.

@dataclass
class JobResult:
    id: str
    output: Dict[str, Dict]
    start_time: datetime
    stop_time: datetime
    duration: str
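Since start_time and stop_time are datetime values, the elapsed time can be computed directly rather than parsed from the duration string. The sketch below uses a local copy of the dataclass. The sample values, including the "42s" duration format, are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict


@dataclass
class JobResult:
    # Local copy of the dataclass above, so the sketch is self-contained.
    id: str
    output: Dict[str, Dict]
    start_time: datetime
    stop_time: datetime
    duration: str


def elapsed(result: JobResult) -> timedelta:
    """Return how long the job ran, based on its start and stop timestamps."""
    return result.stop_time - result.start_time


# Hypothetical result; the duration string format is an assumption.
example = JobResult(
    id="67890",
    output={},
    start_time=datetime(2025, 1, 1, 12, 0, 0),
    stop_time=datetime(2025, 1, 1, 12, 0, 42),
    duration="42s",
)
print(elapsed(example).total_seconds())
```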

WebAgentConfig

Configuration for Web Agent.

@dataclass
class WebAgentConfig:
    region: PuppetRegion
    device_type: Optional[PuppetDeviceType] = None
    viewport: Optional[Dict] = None

Error Handling

The SDK raises exceptions for:

  • Invalid API keys
  • Failed API requests
  • Missing or invalid parameters

Example:

try:
    response = grabba.extract(job)
    if response["status"] == "success":
        print("Job result:", response["job_result"])
    else:
        print("Error message:", response["message"])
except Exception as err:
    print("Error:", err)
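Failed API requests are sometimes transient, so a caller may want to retry them. The following is a minimal sketch of a generic retry helper with exponential backoff. call_with_retries is hypothetical and not part of the SDK. Because the exception classes the SDK raises are not documented here, the sketch catches Exception broadly, which also retries errors that will never succeed, such as an invalid API key.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 1.0) -> T:
    """Call fn, retrying on any exception and doubling the delay each time."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error to the caller.
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Usage with the client from the examples above:
# response = call_with_retries(lambda: grabba.extract(job))
```

Narrowing the except clause to the SDK's own request-failure exception, once its name is known, would avoid retrying errors that cannot recover.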

Contributing

Contributions are welcome! Please open an issue or submit a pull request on GitHub.


License

This project is licensed under the MIT License. See the LICENSE file for details.
