Python SDK for Grabba API
Grabba Python SDK
Grabba Python SDK provides a simple and intuitive interface for scheduling web data extraction jobs, retrieving job results, and managing your extraction workflows. All SDK types are now implemented as Pydantic BaseModels for full JSON compatibility and built-in validation.
Installation
Install the SDK using pip:
pip install grabba
Basic Setup
Import the Client and Required Types
from grabba import Grabba, Job, JobNavigationType, JobSchedulePolicy, JobTaskType
Note: All types such as Job, JobTask, etc., are now Pydantic BaseModels. This means that they support methods like .model_dump() and .json() for serialization, and Enum fields are automatically converted to their literal values during JSON encoding.
Initialize a Client Instance
grabba = Grabba(api_key="your-api-key", region="US")  # region is optional and defaults to "US"
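Hard-coding the key is fine for a quick test, but you may prefer to read it from the environment. The following is a minimal sketch; the variable name GRABBA_API_KEY and the helper load_api_key are assumptions made for illustration and are not defined by the SDK.

```python
import os


def load_api_key(env_var: str = "GRABBA_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise if it is unset or blank.

    The variable name is a hypothetical convention, not part of the SDK.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return api_key
```

The returned value can then be passed as the api_key argument, for example Grabba(api_key=load_api_key(), region="US").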
Methods
extract
extract(job: Job) -> Dict
Schedules a new web data extraction job.
Parameters:
job: A Job object containing the extraction configuration.
Returns:
- A dictionary containing the response status, message, and job_result.
Example:
from grabba import Job, JobSchedulePolicy, JobNavigationType, JobTaskType
job = Job(
    url="https://docs.grabba.dev/home",
    schedule={
        "policy": JobSchedulePolicy.IMMEDIATELY  # Enum values will be serialized as literals
    },
    navigation={
        "type": JobNavigationType.NONE
    },
    tasks=[{
        "type": JobTaskType.WEB_PAGE_AS_MARKDOWN,
        "options": {"only_main_content": True}
    }],
)
# Note: Since Job is a Pydantic model, you may also print its JSON representation:
print(job.json())
response = grabba.extract(job)
print(f"Job completed with status: {response['status']}")
schedule_job
schedule_job(job_id: str) -> Dict
Schedules an existing job for execution.
Parameters:
job_id: The ID of the job to schedule.
Returns:
- A dictionary containing the response status, message, and job_result.
Example:
response = grabba.schedule_job("12345")
print(f"Job completed with status: {response['status']}")
get_jobs
get_jobs() -> GetJobsResponse
Retrieves a list of all jobs associated with the API key.
Returns:
- A list of Job objects.
Example:
jobs = grabba.get_jobs()
for job in jobs:
    print(job.model_dump())
get_job
get_job(job_id: str) -> GetJobResponse
Retrieves details of a specific job by its ID.
Parameters:
job_id: The ID of the job to retrieve.
Returns:
- A JobDetail object containing job details.
Example:
job = grabba.get_job("12345")
print(job.model_dump())
get_job_result
get_job_result(job_result_id: str) -> JobResult
Retrieves the results of a specific job by its result ID.
Parameters:
job_result_id: The ID of the job result to retrieve.
Returns:
- A JobResult object.
Example:
result = grabba.get_job_result("67890")
print(result.model_dump())
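Because the result exposes model_dump(), it can be written to disk for later inspection. The sketch below is one possible approach; save_result is a hypothetical helper, and it accepts any object with a model_dump() method rather than importing the SDK.

```python
import json
from pathlib import Path
from typing import Any


def save_result(result: Any, path: str) -> Path:
    """Write `result.model_dump()` to `path` as indented JSON."""
    target = Path(path)
    # default=str is a fallback for values json cannot encode natively
    # (for example datetimes); whether real results contain any is unknown.
    text = json.dumps(result.model_dump(), indent=2, default=str)
    target.write_text(text, encoding="utf-8")
    return target
```

With a real client this might be called as save_result(grabba.get_job_result("67890"), "result.json").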
delete_job (New in 0.0.4)
delete_job(job_id: str) -> Dict
Deletes a specific job by its ID.
Parameters:
job_id: The ID of the job to delete.
Returns:
- A dictionary containing the response status and message.
Example:
response = grabba.delete_job("12345")
print(f"Job deletion status: {response['status']}")
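To remove several jobs in one pass, delete_job can be called in a loop. The helper below is a sketch only: delete_jobs is a hypothetical name, it assumes the response is a dictionary with a status key as described above, and it catches Exception broadly because the SDK's exception types are not documented here.

```python
from typing import Any, Dict, Iterable


def delete_jobs(client: Any, job_ids: Iterable[str]) -> Dict[str, str]:
    """Call client.delete_job for each ID and map each ID to an outcome string."""
    outcomes: Dict[str, str] = {}
    for job_id in job_ids:
        try:
            response = client.delete_job(job_id)
            outcomes[job_id] = str(response["status"])
        except Exception as err:  # assumption: exact exception types are unknown
            outcomes[job_id] = f"error: {err}"
    return outcomes
```

One failing deletion does not stop the loop, so the returned mapping shows which IDs still need attention.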
Error Handling
The SDK raises exceptions for:
- Invalid API keys
- Failed API requests
- Missing or invalid parameters
Example:
try:
    response = grabba.extract(job)
    if response["status"] == "success":
        print("Results data:", response["job_result"]["data"])
    else:
        print("Error message:", response["message"])
except Exception as err:
    print("Error:", err)
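For transient failures such as a dropped connection, a small retry wrapper may help. This is a generic sketch rather than an SDK feature: with_retries, attempts, and base_delay are hypothetical names, and every exception is retried because the SDK's exception classes are not listed here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T], attempts: int = 3, base_delay: float = 1.0) -> T:
    """Run `call`, retrying on any exception with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next try.
            time.sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

A call might look like with_retries(lambda: grabba.get_job("12345")). Retrying read-only calls is the safer use; retrying extract could schedule a job twice if the first request reached the server, and retrying an invalid API key will never succeed.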
Contributing
Contributions are welcome! Please open an issue or submit a pull request on GitHub.
License
This project is licensed under the MIT License. See the LICENSE file for details.
Additional Notes
- Pydantic Serialization: All SDK models are now Pydantic BaseModels, so you can use .json() or .model_dump() to serialize your models with Enums automatically converted to their literal values.
- Enum Configuration: In our BaseModels, we use json_encoders in the Config class to ensure Enum fields (such as JobSchedulePolicy and JobTaskType) are output as their literal values.
- Type Safety: With Pydantic, all input data is validated against the model definitions, which helps catch errors early.
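To make "output as their literal values" concrete, here is a standard-library sketch of the same idea. It is not the SDK's implementation: SchedulePolicy and its "immediately" value are hypothetical stand-ins, and json.dumps with a default hook is used in place of Pydantic's encoders.

```python
import json
from enum import Enum


class SchedulePolicy(Enum):
    """Hypothetical stand-in; the SDK's real enum values are not shown here."""

    IMMEDIATELY = "immediately"


def encode_enum(value: object) -> object:
    """json.dumps fallback: replace an Enum member with its underlying value."""
    if isinstance(value, Enum):
        return value.value
    raise TypeError(f"Cannot serialize {type(value).__name__}")


payload = {"schedule": {"policy": SchedulePolicy.IMMEDIATELY}}
encoded = json.dumps(payload, default=encode_enum)
print(encoded)  # {"schedule": {"policy": "immediately"}}
```

The enum member is replaced by its plain string value, which is what lets the payload travel as ordinary JSON.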
Change Log
Version 0.0.4 (Latest)
- Improved Job Task Handling: Enhanced task validation and error handling for better reliability.
- New API Method - delete_job: You can now delete jobs using their job_id.
- Performance Optimizations: Improved response times by optimizing API requests and serialization.
- Bug Fixes: Fixed minor serialization issues with nested Pydantic models.
- Added delete_job(job_id: str) -> BaseResponse
- Added delete_job_result(job_result_id: str) -> BaseResponse
Version 0.0.3
- Pydantic Integration: All SDK types are now Pydantic BaseModels, enabling JSON serialization and validation.
- Enum Field Optimization: Enums are automatically serialized to their literal values.
- Better Error Handling: More descriptive error messages and improved exception handling.