Python SDK for Grabba API

Grabba Python SDK

The Grabba Python SDK provides a simple interface for scheduling web data extraction jobs, retrieving their results, and managing your extraction workflows.

Installation

Install the SDK using pip:

pip install grabba

Basic Setup

Import the client and the required types:

from grabba import Grabba, Job, JobNavigationType, JobSchedulePolicy, JobTaskType

Initialize a client instance:

grabba = Grabba(api_key="your-api-key", region="US")  # region is optional and defaults to "US"
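Hard-coding the key as above is fine for a quick test. A minimal sketch of reading it from the environment instead follows; the variable name `GRABBA_API_KEY` and the helper `load_api_key` are hypothetical, and the SDK does not read that variable on its own.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None,
                 var_name: str = "GRABBA_API_KEY") -> str:
    """Return the API key stored in `var_name` (a hypothetical variable name)."""
    source = os.environ if env is None else env
    key = source.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"environment variable {var_name} is not set")
    return key


# Usage, assuming the grabba package is installed:
# from grabba import Grabba
# grabba = Grabba(api_key=load_api_key(), region="US")
```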

Methods

extract

extract(job: Job) -> Dict

Schedules a new web data extraction job.

Parameters:

  • job: A Job object containing the extraction configuration.

Returns:

  • A dictionary containing the response status, message, and jobResult.

Example:

job = Job(
    url="https://docs.grabba.dev/home",
    schedule={"policy": JobSchedulePolicy.IMMEDIATELY},
    navigation={"type": JobNavigationType.NONE},
    tasks=[{"type": JobTaskType.WEB_PAGE_AS_MARKDOWN}],
)

response = grabba.extract(job)
print(f"Job completed with status: {response['status']}")
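The Returns entry above lists `status`, `message`, and `jobResult` keys, and the error-handling example compares `status` with `"success"`. Assuming those keys and that value, a small helper (hypothetical, not part of the SDK) can unpack a response in one place:

```python
from typing import Any, Dict, Optional, Tuple


def unpack_extract_response(
    response: Dict[str, Any],
) -> Tuple[bool, str, Optional[Any]]:
    """Split an extract() response into (succeeded, message, job_result).

    Assumes the keys "status", "message" and "jobResult" described above and
    that a successful call reports status == "success".
    """
    succeeded = response.get("status") == "success"
    message = str(response.get("message", ""))
    # Only hand back the job result when the call succeeded.
    job_result = response.get("jobResult") if succeeded else None
    return succeeded, message, job_result


# Usage:
# ok, message, job_result = unpack_extract_response(grabba.extract(job))
```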

schedule_job

schedule_job(job_id: str) -> Dict

Schedules an existing job for execution.

Parameters:

  • job_id: The ID of the job to schedule.

Returns:

  • A dictionary containing the response status, message, and jobResult.

Example:

response = grabba.schedule_job("12345")
print(f"Job completed with status: {response['status']}")
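When several stored jobs need to run, `schedule_job` can be called in a loop. The sketch below (the helper `schedule_all` and the `SchedulesJobs` protocol are hypothetical) keeps going when one ID fails, since the SDK raises an exception for failed requests, and records each outcome:

```python
from typing import Any, Dict, Iterable, Protocol


class SchedulesJobs(Protocol):
    """Anything with a schedule_job(job_id) method, such as a Grabba client."""

    def schedule_job(self, job_id: str) -> Dict[str, Any]: ...


def schedule_all(client: SchedulesJobs, job_ids: Iterable[str]) -> Dict[str, str]:
    """Schedule each job ID in turn and record its status or error text."""
    results: Dict[str, str] = {}
    for job_id in job_ids:
        try:
            response = client.schedule_job(job_id)
            results[job_id] = str(response.get("status", "unknown"))
        except Exception as err:  # the SDK raises on failed requests
            results[job_id] = f"error: {err}"
    return results


# Usage:
# print(schedule_all(grabba, ["12345", "67890"]))
```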

get_jobs

get_jobs() -> GetJobsResponse

Retrieves a list of all jobs associated with the API key.

Returns:

  • A list of Job objects.

Example:

jobs = grabba.get_jobs()
for job in jobs:
    print(job)
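The Returns entry describes the jobs as `Job` objects, each carrying a `url`. Assuming that, a hypothetical helper can pick out the jobs that target one page; it also accepts plain dictionaries in case the response holds dict records instead:

```python
from typing import Any, Iterable, List


def jobs_for_url(jobs: Iterable[Any], url: str) -> List[Any]:
    """Return the jobs whose `url` equals `url` exactly."""
    matches: List[Any] = []
    for job in jobs:
        # Support both Job-like objects and dict records.
        job_url = job.get("url") if isinstance(job, dict) else getattr(job, "url", None)
        if job_url == url:
            matches.append(job)
    return matches


# Usage:
# docs_jobs = jobs_for_url(grabba.get_jobs(), "https://docs.grabba.dev/home")
```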

get_job

get_job(job_id: str) -> GetJobResponse

Retrieves details of a specific job by its ID.

Parameters:

  • job_id: The ID of the job to retrieve.

Returns:

  • A JobDetail object containing job details.

Example:

job = grabba.get_job("12345")
print(job)
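A job's details can change while it runs, so a caller may want to re-fetch them until the job reaches some state. The fields of `JobDetail` are not documented here, so this hypothetical `wait_for` helper takes the fetch function plus a caller-supplied `is_done` check instead of assuming a status field name:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_for(fetch: Callable[[], T], is_done: Callable[[T], bool],
             interval: float = 5.0, timeout: float = 300.0,
             sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch() every `interval` seconds until is_done(value) is true."""
    deadline = time.monotonic() + timeout
    while True:
        value = fetch()
        if is_done(value):
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        sleep(interval)


# Usage (is_finished is a check you write against the fields get_job returns):
# job = wait_for(lambda: grabba.get_job("12345"), is_finished)
```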

get_job_result

get_job_result(job_result_id: str) -> JobResult

Retrieves the results of a specific job by its result ID.

Parameters:

  • job_result_id: The ID of the job result to retrieve.

Returns:

  • A JobResult object.

Example:

result = grabba.get_job_result("67890")
print(result)
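The `JobResult` type below stores the extracted data in an `output` dictionary. A minimal sketch for saving that dictionary as JSON, assuming `result.output` holds mostly JSON-compatible values (anything else, such as a `datetime`, is written as its string form); the helper name `save_output` is hypothetical:

```python
import json
from pathlib import Path
from typing import Any, Dict


def save_output(output: Dict[str, Any], path: Path) -> Path:
    """Write a job result's output dictionary to `path` as indented JSON."""
    # default=str converts values json cannot encode natively (e.g. datetime).
    path.write_text(json.dumps(output, indent=2, default=str), encoding="utf-8")
    return path


# Usage:
# save_output(result.output, Path("result-67890.json"))
```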

get_available_regions

get_available_regions() -> List[Dict[str, PuppetRegion]]

Retrieves a list of available regions for Web Agent execution.

Returns:

  • A list of region objects.

Example:

regions = grabba.get_available_regions()
print(regions)
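The keys inside each region entry are not documented here. Assuming each entry's values are region identifiers (plain strings or `PuppetRegion` enum members), a hypothetical check like the following can confirm a region is offered before it is used in a `WebAgentConfig`:

```python
from enum import Enum
from typing import Any, Dict, Iterable, Set


def region_values(regions: Iterable[Dict[str, Any]]) -> Set[str]:
    """Collect every value from the region entries as a plain string."""
    values: Set[str] = set()
    for entry in regions:
        for value in entry.values():
            # Unwrap Enum members (such as PuppetRegion) to their raw value.
            values.add(str(value.value) if isinstance(value, Enum) else str(value))
    return values


def is_region_available(regions: Iterable[Dict[str, Any]], region: str) -> bool:
    """True if `region` appears among the values of the region entries."""
    return region in region_values(regions)


# Usage:
# if not is_region_available(grabba.get_available_regions(), "US"):
#     raise RuntimeError("US region is not currently offered")
```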

Types

Job

Represents a web data extraction job.

@dataclass
class Job:
    url: str
    tasks: List[JobTask]
    schedule: Optional[JobSchedule] = None
    navigation: Optional[JobNavigation] = None
    puppetConfig: Optional[WebAgentConfig] = None
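The SDK reports missing or invalid parameters as errors (see Error Handling), so catching obvious mistakes before sending a request can save a round trip. The checks below are assumptions chosen for this sketch (an absolute http(s) URL and at least one task), not rules published for the API, and `job_problems` is a hypothetical helper:

```python
from typing import Any, List, Sequence
from urllib.parse import urlparse


def job_problems(url: str, tasks: Sequence[Any]) -> List[str]:
    """Return readable problems with a Job's url and tasks (empty if none)."""
    problems: List[str] = []
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append(f"url must be an absolute http(s) URL, got {url!r}")
    if not tasks:
        problems.append("tasks must contain at least one task")
    return problems


# Usage:
# problems = job_problems(job.url, job.tasks)
# if problems:
#     raise ValueError("; ".join(problems))
```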

JobTask

Represents a single task in an extraction job.

@dataclass
class JobTask:
    type: JobTaskType
    options: Optional[Union[SpecificDataExtractionOptions, WebpageAsMarkdownOptions, WebScreenCaptureOptions]] = None

JobTaskType

Enumeration of available job task types.

class JobTaskType(str, Enum):
    WEB_PAGE_AS_HTML = "webPageAsHTML"
    WEB_PAGE_METADATA = "webPageMetadata"
    WEB_SCREEN_CAPTURE = "webScreenCapture"
    WEB_PAGE_AS_MARKDOWN = "webPageAsMarkdown"
    SPECIFIC_DATA_EXTRACTION = "specificDataExtraction"
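Enum members can be looked up by value, and because `JobTaskType` also subclasses `str`, each member compares equal to its string value. That makes it easy to build a task list from strings in a config file. The sketch copies the enum as documented above so it runs on its own (in real code, import `JobTaskType` from `grabba`) and builds entries in the same `{"type": ...}` shape as the `extract` example; `tasks_from_names` is hypothetical:

```python
from enum import Enum
from typing import Any, Dict, Iterable, List


class JobTaskType(str, Enum):
    # Local copy of the documented enum; import it from grabba in real code.
    WEB_PAGE_AS_HTML = "webPageAsHTML"
    WEB_PAGE_METADATA = "webPageMetadata"
    WEB_SCREEN_CAPTURE = "webScreenCapture"
    WEB_PAGE_AS_MARKDOWN = "webPageAsMarkdown"
    SPECIFIC_DATA_EXTRACTION = "specificDataExtraction"


def tasks_from_names(names: Iterable[str]) -> List[Dict[str, Any]]:
    """Turn task-type strings such as "webPageAsMarkdown" into task entries.

    JobTaskType(name) raises ValueError for an unknown name.
    """
    return [{"type": JobTaskType(name)} for name in names]


# Usage:
# job = Job(url="https://docs.grabba.dev/home",
#           tasks=tasks_from_names(["webPageAsMarkdown", "webPageMetadata"]))
```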

JobResult

Represents the result of a job.

@dataclass
class JobResult:
    id: str
    output: Dict[str, Dict]
    startTime: datetime
    stopTime: datetime
    duration: str
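The format of the `duration` string is not specified here. If a numeric duration is needed, it can be derived from the two timestamps instead; a minimal sketch, assuming `startTime` and `stopTime` are `datetime` objects as annotated (the helper name `elapsed_seconds` is hypothetical):

```python
from datetime import datetime


def elapsed_seconds(start: datetime, stop: datetime) -> float:
    """Seconds between a job result's startTime and stopTime."""
    if stop < start:
        raise ValueError("stopTime is earlier than startTime")
    return (stop - start).total_seconds()


# Usage:
# print(f"Job ran for {elapsed_seconds(result.startTime, result.stopTime):.1f} s")
```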

WebAgentConfig

Configuration for the Web Agent that executes a job.

@dataclass
class WebAgentConfig:
    region: PuppetRegion
    deviceType: Optional[PuppetDeviceType] = None
    viewport: Optional[Dict] = None

Error Handling

The SDK raises exceptions for:

  • Invalid API keys
  • Failed API requests
  • Missing or invalid parameters

Example:

try:
    response = grabba.extract(job)
    if response["status"] == "success":
        print("Results data:", response["output"]["data"])
    else:
        print("Error message:", response["message"])
except Exception as err:
    print("Error:", err)
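Some failed API requests may be transient. One option is to retry a call a few times with exponential backoff before giving up. This is a generic pattern, not an SDK feature, and because the SDK's exception types are not documented here the sketch retries on any `Exception`; `with_retries` is hypothetical:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying up to `attempts` times with doubling delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ... by default
    raise AssertionError("unreachable")


# Usage:
# job = with_retries(lambda: grabba.get_job("12345"))
```

Retrying does not help with an invalid API key or invalid parameters, and retrying `extract` may create a duplicate job if the first request actually reached the server, so read-only calls such as `get_job` are the safer candidates.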

Contributing

Contributions are welcome! Please open an issue or submit a pull request on GitHub.


License

This project is licensed under the MIT License. See the LICENSE file for details.

Download files

Source distribution: grabba-0.0.2.tar.gz (6.8 kB)

Built distribution: grabba-0.0.2-py3-none-any.whl (9.6 kB, Python 3)
