GCP Pal Library
The `gcp-pal` library provides a set of utilities for interacting with Google Cloud Platform (GCP) services, streamlining the process of implementing GCP functionalities within your Python applications. The utilities are designed to work with the `google-cloud` Python libraries, providing a more user-friendly and intuitive interface for common tasks.
- Source code: https://github.com/VitaminB16/gcp-pal
- PyPI: https://pypi.org/project/gcp-pal/
Table of Contents
Module | Python Class |
---|---|
Firestore | gcp_pal.Firestore |
BigQuery | gcp_pal.BigQuery |
Storage | gcp_pal.Storage |
Cloud Functions | gcp_pal.CloudFunctions |
Cloud Run | gcp_pal.CloudRun |
Docker | gcp_pal.Docker |
Logging | gcp_pal.Logging |
Secret Manager | gcp_pal.SecretManager |
Cloud Scheduler | gcp_pal.CloudScheduler |
Project | gcp_pal.Project |
Dataplex | gcp_pal.Dataplex |
Artifact Registry | gcp_pal.ArtifactRegistry |
PubSub | gcp_pal.PubSub |
Request | gcp_pal.Request |
Schema | gcp_pal.Schema |
Parquet | gcp_pal.Parquet |
Installation
The package is available on PyPI as `gcp-pal`. To install with `pip`:
pip install gcp-pal
The library has module-specific dependencies. These can be installed via `pip install gcp-pal[ModuleName]`, e.g.:
pip install gcp-pal[BigQuery]
# Installing 'google-cloud-bigquery'
pip install gcp-pal[CloudRun]
# Installing 'google-cloud-run' and 'docker'
To install all optional dependencies:
pip install gcp-pal[all]
The modules are also set up to notify the user if any required libraries are missing. For example, when attempting to use the `Firestore` module:
from gcp_pal import Firestore
Firestore()
# ImportError: Module 'Firestore' requires 'google.cloud.firestore' (PyPI: 'google-cloud-firestore') to be installed.
This lets the user know that the `google-cloud-firestore` package is required to use the `Firestore` module.
Configuration
Before you start using the `gcp-pal` library with Firestore or any other GCP service, make sure you have set up your GCP credentials properly and have the necessary permissions to access the services you want to use:
gcloud auth application-default login
And specify the project ID to be used as the default for all API requests:
gcloud config set project PROJECT_ID
You can also set default variables such as the project ID and location using environment variables. The reserved variables are `GCP_PROJECT_ID` and `GCP_LOCATION`:
export GCP_PROJECT_ID=project-id
export GCP_LOCATION=us-central1
The order of precedence is as follows:
1. Keyword arguments (e.g. BigQuery(project="project-id"))
2. Environmental variables (e.g. export GCP_PROJECT_ID=project-id)
3. Default project set in gcloud (e.g. gcloud config set project project-id)
4. None
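For illustration, here is a minimal sketch of how the precedence rules combine, assuming the environment variable names shown above:
import os
from gcp_pal import BigQuery
os.environ["GCP_PROJECT_ID"] = "env-project"
bq = BigQuery()                         # project resolves from the environment variable (precedence 2)
bq = BigQuery(project="kwarg-project")  # the keyword argument takes precedence (precedence 1)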
Firestore Module
The Firestore module in the `gcp-pal` library allows you to perform read and write operations on Firestore documents and collections.
Initializing Firestore
First, import the Firestore class from the `gcp_pal` module:
from gcp_pal import Firestore
Writing Data to Firestore
To write data to a Firestore document, create a dictionary with your data, specify the path to your document, and use the `write` method:
data = {
    "field1": "value1",
    "field2": "value2"
}
path = "collection/document"
Firestore(path).write(data)
Reading Data from Firestore
To read a single document from Firestore, specify the document's path and use the `read` method:
path = "collection/document"
document = Firestore(path).read()
print(document)
# Output: {'field1': 'value1', 'field2': 'value2'}
Reading All Documents in a Collection
To read all documents within a specific collection, specify the collection's path and use the `read` method:
path = "collection"
documents = Firestore(path).read()
print(documents)
# Output: {'document': {'field1': 'value1', 'field2': 'value2'}}
Working with Pandas DataFrames
The Firestore module also supports writing and reading Pandas DataFrames, preserving their structure and data types:
import pandas as pd
# Example DataFrame
df = pd.DataFrame({
    "field1": ["value1"],
    "field2": ["value2"]
})
path = "collection/document"
Firestore(path).write(df)
read_df = Firestore(path).read()
print(read_df)
# Output:
# field1 field2
# 0 value1 value2
Listing Firestore Documents and Collections
To list all documents and collections within a Firestore database, use the `ls` method (similar to `ls` in bash):
colls = Firestore().ls()
print(colls)
# Output: ['collection']
docs = Firestore("collection").ls()
print(docs)
# Output: ['document1', 'document2']
BigQuery Module
The BigQuery module in the `gcp-pal` library allows you to perform read and write operations on BigQuery datasets and tables.
Initializing BigQuery
Import the BigQuery class from the `gcp_pal` module:
from gcp_pal import BigQuery
Listing objects
To list all objects (datasets and tables) within a BigQuery project, use the `ls` method (similar to `ls` in bash):
datasets = BigQuery().ls()
print(datasets)
# Output: ['dataset1', 'dataset2']
tables = BigQuery(dataset="dataset1").ls()
print(tables)
# Output: ['table1', 'table2']
Creating objects
To create an object (dataset or table) within a BigQuery project, initialize the BigQuery class with the object's path and use the `create` method:
BigQuery(dataset="new-dataset").create()
# Output: Dataset "new-dataset" created
BigQuery("new-dataset2.new-table").create(schema=schema)
# Output: Dataset "new-dataset2" created, table "new-dataset2.new-table" created
To create a table from a Pandas DataFrame, pass the DataFrame to the `create` method:
df = pd.DataFrame({
    "field1": ["value1"],
    "field2": ["value2"]
})
BigQuery("new-dataset3.new-table").create(data=df)
# Output: Dataset "new-dataset3" created, table "new-dataset3.new-table" created, data inserted
Deleting objects
Deleting objects is similar to creating them, but you use the `delete` method instead:
BigQuery(dataset="dataset").delete()
# Output: Dataset "dataset" and all its tables deleted
BigQuery("dataset.table").delete()
# Output: Table "dataset.table" deleted
Querying data
To read data from a BigQuery table, use the `query` method:
query = "SELECT * FROM dataset.table"
data = BigQuery().query(query)
print(data)
# Output: [{'field1': 'value1', 'field2': 'value2'}]
Alternatively, there is a simple `read` method to read data from a table with the given `columns`, `filters` and `limit`:
data = BigQuery("dataset.table").read(
    columns=["field1"],
    filters=[("field1", "=", "value1")],
    limit=1,
    to_dataframe=True,
)
print(data)
# Output: pd.DataFrame({'field1': ['value1']})
By default, the `read` method returns a Pandas DataFrame, but you can also get the data as a list of dictionaries by setting the `to_dataframe` parameter to `False`.
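For example, a minimal sketch under the same assumptions as the example above:
data = BigQuery("dataset.table").read(to_dataframe=False)
print(data)
# Expected output (assumed): [{'field1': 'value1', 'field2': 'value2'}]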
Inserting data
To insert data into a BigQuery table, use the `insert` method:
data = {
    "field1": "value1",
    "field2": "value2"
}
BigQuery("dataset.table").insert(data)
# Output: Data inserted
External tables
One can also create BigQuery external tables by specifying the file path:
file_path = "gs://bucket/file.parquet"
BigQuery("dataset.external_table").create(file_path)
# Output: Dataset "dataset" created, external table "dataset.external_table" created
The allowed file formats are CSV, JSON, Avro, Parquet (single and partitioned), and ORC.
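For instance, a CSV-backed external table could be created the same way (a sketch, assuming the file path is passed exactly as in the Parquet example above):
BigQuery("dataset.external_csv_table").create("gs://bucket/file.csv")
# Expected, by analogy with the example above: external table "dataset.external_csv_table" created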
Storage Module
The Storage module in the `gcp-pal` library allows you to perform read and write operations on Google Cloud Storage buckets and objects.
Initializing Storage
Import the Storage class from the `gcp_pal` module:
from gcp_pal import Storage
Listing objects
Similar to the other modules, listing objects in a bucket is done using the `ls` method:
buckets = Storage().ls()
print(buckets)
# Output: ['bucket1', 'bucket2']
objects = Storage("bucket1").ls()
print(objects)
# Output: ['object1', 'object2']
Creating buckets
To create a bucket, use the `create` method:
Storage("new-bucket").create()
# Output: Bucket "new-bucket" created
Deleting objects
Deleting objects is similar to creating them, but you use the `delete` method instead:
Storage("bucket").delete()
# Output: Bucket "bucket" and all its objects deleted
Storage("bucket/object").delete()
# Output: Object "object" in bucket "bucket" deleted
Uploading and downloading objects
To upload an object to a bucket, use the `upload` method:
Storage("bucket/uploaded_file.txt").upload("local_file.txt")
# Output: File "local_file.txt" uploaded to "bucket/uploaded_file.txt"
To download an object from a bucket, use the `download` method:
Storage("bucket/uploaded_file.txt").download("downloaded_file.txt")
# Output: File "bucket/uploaded_file.txt" downloaded to "downloaded_file.txt"
Cloud Functions Module
The Cloud Functions module in the `gcp-pal` library allows you to deploy and manage Cloud Functions.
Initializing Cloud Functions
Import the `CloudFunctions` class from the `gcp_pal` module:
from gcp_pal import CloudFunctions
Deploying Cloud Functions
To deploy a Cloud Function, specify the function's name, the source codebase, the entry point and any other parameters that are to be passed to `BuildConfig`, `ServiceConfig` and `Function` (see docs):
CloudFunctions("function-name").deploy(
    path="path/to/function_codebase",
    entry_point="main",
    environment=2,
)
Deploying a Cloud Function from a local source depends on the `gcp_pal.Storage` module. By default, the source codebase is uploaded to the `gcf-v2-sources-{PROJECT_NUMBER}-{REGION}` bucket and is deployed from there. An alternative bucket can be specified via the `source_bucket` parameter:
CloudFunctions("function-name").deploy(
    path="path/to/function_codebase",
    entry_point="main",
    environment=2,
    source_bucket="bucket-name",
)
Listing Cloud Functions
To list all Cloud Functions within a project, use the `ls` method:
functions = CloudFunctions().ls()
print(functions)
# Output: ['function1', 'function2']
Deleting Cloud Functions
To delete a Cloud Function, use the `delete` method:
CloudFunctions("function-name").delete()
# Output: Cloud Function "function-name" deleted
Invoking Cloud Functions
To invoke a Cloud Function, use the `invoke` (or `call`) method:
response = CloudFunctions("function-name").invoke({"key": "value"})
print(response)
# Output: {'output_key': 'output_value'}
Getting Cloud Function details
To get the details of a Cloud Function, use the `get` method:
details = CloudFunctions("function-name").get()
print(details)
# Output: {'name': 'projects/project-id/locations/region/functions/function-name',
# 'build_config': {...}, 'service_config': {...}, 'state': {...}, ... }
Using service accounts
The service account email can be specified either within the constructor or via the `service_account` parameter:
CloudFunctions("function-name", service_account="account@email.com").deploy(**kwargs)
# or
CloudFunctions("function-name").deploy(service_account="account@email.com", **kwargs)
Cloud Run Module
The Cloud Run module in the `gcp-pal` library allows you to deploy and manage Cloud Run services.
Initializing Cloud Run
Import the `CloudRun` class from the `gcp_pal` module:
from gcp_pal import CloudRun
Deploying Cloud Run services
CloudRun("test-app").deploy(path="samples/cloud_run")
# Output:
# - Docker image "test-app" built based on "samples/cloud_run" codebase and "samples/cloud_run/Dockerfile".
# - Docker image "test-app" pushed to Google Container Registry as "gcr.io/{PROJECT_ID}/test-app:random_tag".
# - Cloud Run service "test-app" deployed from "gcr.io/{PROJECT_ID}/test-app:random_tag".
The default tag is a random string but can be specified via the `image_tag` parameter:
CloudRun("test-app").deploy(path="samples/cloud_run", image_tag="5fbd72c")
# Output: Cloud Run service deployed
Listing Cloud Run services
To list all Cloud Run services within a project, use the `ls` method:
services = CloudRun().ls()
print(services)
# Output: ['service1', 'service2']
To list jobs instead, set the `job` parameter to `True`:
jobs = CloudRun(job=True).ls()
print(jobs)
# Output: ['job1', 'job2']
Deleting Cloud Run services
To delete a Cloud Run service, use the `delete` method:
CloudRun("service-name").delete()
# Output: Cloud Run service "service-name" deleted
Similarly, to delete a job, set the `job` parameter to `True`:
CloudRun("job-name", job=True).delete()
Invoking Cloud Run services
To invoke a Cloud Run service, use the `invoke`/`call` method:
response = CloudRun("service-name").invoke({"key": "value"})
print(response)
# Output: {'output_key': 'output_value'}
Getting Cloud Run service details
To get the details of a Cloud Run service, use the `get` method:
details = CloudRun("service-name").get()
print(details)
# Output: ...
To get the status of a Cloud Run service, use the `status`/`state` method:
service_status = CloudRun("service-name").status()
print(service_status)
# Output: Active
job_status = CloudRun("job-name", job=True).status()
print(job_status)
# Output: Active
Using service accounts
The service account email can be specified either within the constructor or via the `service_account` parameter:
CloudRun("run-name", service_account="account@email.com").deploy(**kwargs)
# or
CloudRun("run-name").deploy(service_account="account@email.com", **kwargs)
Docker Module
The Docker module in the `gcp-pal` library allows you to build and push Docker images to Google Container Registry.
Initializing Docker
Import the Docker class from the `gcp_pal` module:
from gcp_pal import Docker
Building Docker images
Docker("image-name").build(path="path/to/context", dockerfile="Dockerfile")
# Output: Docker image "image-name:latest" built based on "path/to/context" codebase and "path/to/context/Dockerfile".
The default `tag` is `"latest"` but can be specified via the `tag` parameter:
Docker("image-name", tag="5fbd72c").build(path="path/to/context", dockerfile="Dockerfile")
# Output: Docker image "image-name:5fbd72c" built based on "path/to/context" codebase and "path/to/context/Dockerfile".
Pushing Docker images
Docker("image-name").push()
# Output: Docker image "image-name" pushed to Google Container Registry.
The default destination is `"gcr.io/{PROJECT_ID}/{IMAGE_NAME}:{TAG}"` but can be specified via the `destination` parameter:
Docker("image-name").push(destination="gcr.io/my-project/image-name:5fbd72c")
# Output: Docker image "image-name" pushed to "gcr.io/my-project/image-name:5fbd72c".
Logging Module
The Logging module in the `gcp-pal` library allows you to access and manage logs from Google Cloud Logging.
Initializing Logging
Import the Logging class from the `gcp_pal` module:
from gcp_pal import Logging
Listing logs
To list all logs within a project, use the `ls` method:
logs = Logging().ls(limit=2)
for log in logs:
    print(log)
# Output: LogEntry - [2024-04-16 17:30:04.308 UTC] {Message payload}
Each entry is a `LogEntry` object with the following attributes: `project`, `log_name`, `resource`, `severity`, `message`, `timestamp`, `time_zone`, `timestamp_str`.
The `message` attribute is the main payload of the log entry.
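A short sketch of working with these attributes (the attribute names are those listed above; the printed values are illustrative):
for log in Logging().ls(limit=5):
    print(log.severity, log.timestamp_str, log.message)
# e.g.: ERROR 2024-04-16 17:30:04.308 UTC {Message payload}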
Filtering logs
To filter logs based on a query, pass a `filter` to the `ls` method:
logs = Logging().ls(filter="severity=ERROR")
# Output: [LogEntry - [2024-04-16 17:30:04.308 UTC] {Message payload}, ...]
Some common filters are also supported natively: `severity` (str), `time_start` (str), `time_end` (str), `time_range` (int: hours). For example, the following are equivalent:
# Time now: 2024-04-16 17:30:04.308 UTC
logs = Logging().ls(filter="severity=ERROR AND time_start=2024-04-16T16:30:04.308Z AND time_end=2024-04-16T17:30:04.308Z")
logs = Logging().ls(severity="ERROR", time_start="2024-04-16T16:30:04.308Z", time_end="2024-04-16T17:30:04.308Z")
logs = Logging().ls(severity="ERROR", time_range=1)
Streaming logs
To stream logs in real time, use the `stream` method:
Logging().stream()
# LogEntry - [2024-04-16 17:30:04.308 UTC] {Message payload}
# LogEntry - [2024-04-16 17:30:05.308 UTC] {Message payload}
# ...
Secret Manager Module
The Secret Manager module in the `gcp-pal` library allows you to access and manage secrets from Google Cloud Secret Manager.
Initializing Secret Manager
Import the SecretManager class from the `gcp_pal` module:
from gcp_pal import SecretManager
Creating secrets
To create a secret, specify the secret's name and value:
SecretManager("secret1").create("value1", labels={"env": "dev"})
# Output: Secret 'secret1' created
Listing secrets
To list all secrets within a project, use the `ls` method:
secrets = SecretManager().ls()
print(secrets)
# Output: ['secret1', 'secret2']
The `ls` method also supports filtering secrets based on the `filter` or `label` parameters:
secrets = SecretManager().ls(filter="name:secret1")
print(secrets)
# Output: ['secret1']
secrets = SecretManager().ls(label="env:*")
print(secrets)
# Output: ['secret1', 'secret2']
Accessing secrets
To access the value of a secret, use the `value` method:
value = SecretManager("secret1").value()
print(value)
# Output: 'value1'
Deleting secrets
To delete a secret, use the `delete` method:
SecretManager("secret1").delete()
# Output: Secret 'secret1' deleted
Cloud Scheduler Module
The Cloud Scheduler module in the `gcp-pal` library allows you to create and manage Cloud Scheduler jobs.
Initializing Cloud Scheduler
Import the CloudScheduler class from the `gcp_pal` module:
from gcp_pal import CloudScheduler
Creating Cloud Scheduler jobs
To create a Cloud Scheduler job, specify the job's name in the constructor, and use the `create` method to set the schedule and target:
CloudScheduler("job-name").create(
    schedule="* * * * *",
    time_zone="UTC",
    target="https://example.com/api",
    payload={"key": "value"},
)
# Output: Cloud Scheduler job "job-name" created with HTTP target "https://example.com/api"
If the `target` is not an HTTP endpoint, it will be treated as a PubSub topic:
CloudScheduler("job-name").create(
    schedule="* * * * *",
    time_zone="UTC",
    target="pubsub-topic-name",
    payload={"key": "value"},
)
# Output: Cloud Scheduler job "job-name" created with PubSub target "pubsub-topic-name"
Additionally, a `service_account` can be specified to add the OAuth and OIDC tokens to the request:
CloudScheduler("job-name").create(
    schedule="* * * * *",
    time_zone="UTC",
    target="https://example.com/api",
    payload={"key": "value"},
    service_account="PROJECT@PROJECT.iam.gserviceaccount.com",
)
# Output: Cloud Scheduler job "job-name" created with HTTP target "https://example.com/api" and OAuth+OIDC tokens
Listing Cloud Scheduler jobs
To list all Cloud Scheduler jobs within a project, use the `ls` method:
jobs = CloudScheduler().ls()
print(jobs)
# Output: ['job1', 'job2']
Deleting Cloud Scheduler jobs
To delete a Cloud Scheduler job, use the `delete` method:
CloudScheduler("job-name").delete()
# Output: Cloud Scheduler job "job-name" deleted
Managing Cloud Scheduler jobs
To pause or resume a Cloud Scheduler job, use the `pause` or `resume` methods:
CloudScheduler("job-name").pause()
# Output: Cloud Scheduler job "job-name" paused
CloudScheduler("job-name").resume()
# Output: Cloud Scheduler job "job-name" resumed
To run a Cloud Scheduler job immediately, use the `run` method:
CloudScheduler("job-name").run()
# Output: Cloud Scheduler job "job-name" run
If the job is paused, it will be resumed before running. To prevent this, set the `force` parameter to `False`:
CloudScheduler("job-name").run(force=False)
# Output: Cloud Scheduler job "job-name" not run if it is paused
Using service accounts
The service account email can be specified either within the constructor or via the `service_account` parameter:
CloudScheduler("job-name", service_account="account@email.com").create(**kwargs)
# or
CloudScheduler("job-name").create(service_account="account@email.com", **kwargs)
Project Module
The Project module in the `gcp-pal` library allows you to access and manage Google Cloud projects.
Initializing Project
Import the Project class from the `gcp_pal` module:
from gcp_pal import Project
Listing projects
To list all projects available to the authenticated user, use the `ls` method:
projects = Project().ls()
print(projects)
# Output: ['project1', 'project2']
Creating projects
To create a new project, use the `create` method:
Project("new-project").create()
# Output: Project "new-project" created
Deleting projects
To delete a project, use the `delete` method:
Project("project-name").delete()
# Output: Project "project-name" deleted
Google Cloud permanently deletes the project only after 30 days. During this time, you can undelete the project with the `undelete` method:
Project("project-name").undelete()
# Output: Project "project-name" undeleted
Getting project details
To get the details of a project, use the `get` method:
details = Project("project-name").get()
print(details)
# Output: {'name': 'projects/project-id', 'project_id': 'project-id', ...}
To obtain the project number, use the `number` method:
project_number = Project("project-name").number()
print(project_number)
# Output: "1234567890"
Dataplex Module
The Dataplex module in the `gcp-pal` library allows you to interact with Dataplex services.
Initializing Dataplex
Import the Dataplex class from the `gcp_pal` module:
from gcp_pal import Dataplex
Listing Dataplex objects
The Dataplex module supports listing all lakes, zones, and assets within a Dataplex instance:
lakes = Dataplex().ls()
print(lakes)
# Output: ['lake1', 'lake2']
zones = Dataplex("lake1").ls()
print(zones)
# Output: ['zone1', 'zone2']
assets = Dataplex("lake1/zone1").ls()
print(assets)
# Output: ['asset1', 'asset2']
Creating Dataplex objects
To create a lake, zone, or asset within a Dataplex instance, use the `create_lake`, `create_zone`, and `create_asset` methods.
To create a lake:
Dataplex("lake1").create_lake()
# Output: Lake "lake1" created
To create a zone (zone type and location type are required):
Dataplex("lake1/zone1").create_zone(zone_type="raw", location_type="single-region")
# Output: Zone "zone1" created in Lake "lake1"
To create an asset (asset source and asset type are required):
Dataplex("lake1/zone1").create_asset(asset_source="dataset_name", asset_type="bigquery")
# Output: Asset "asset1" created in Zone "zone1" of Lake "lake1"
Deleting Dataplex objects
Deleting objects can be done using a single `delete` method:
Dataplex("lake1/zone1/asset1").delete()
# Output: Asset "asset1" deleted
Dataplex("lake1/zone1").delete()
# Output: Zone "zone1" and all its assets deleted
Dataplex("lake1").delete()
# Output: Lake "lake1" and all its zones and assets deleted
Artifact Registry Module
The Artifact Registry module in the `gcp-pal` library allows you to interact with Artifact Registry services.
Initializing Artifact Registry
Import the ArtifactRegistry class from the `gcp_pal` module:
from gcp_pal import ArtifactRegistry
Listing Artifact Registry objects
The objects within the Artifact Registry module follow the hierarchy: repositories > packages > versions > tags.
To list all repositories within a project, use the `ls` method:
repositories = ArtifactRegistry().ls()
print(repositories)
# Output: ['repo1', 'repo2']
To list all packages (or "images") within a repository, use the `ls` method with the repository name:
images = ArtifactRegistry("repo1").ls()
print(images)
# Output: ['image1', 'image2']
To list all versions of a package, use the `ls` method with the repository and package names:
versions = ArtifactRegistry("repo1/image1").ls()
print(versions)
# Output: ['repo1/image1/sha256:version1', 'repo1/image1/sha256:version2']
To list all tags of a version, use the `ls` method with the repository, package, and version names:
tags = ArtifactRegistry("repo1/image1/sha256:version1").ls()
print(tags)
# Output: ['repo1/image1/tag1', 'repo1/image1/tag2']
Creating Artifact Registry objects
To create a repository, use the `create_repository` method with the repository name:
ArtifactRegistry("repo1").create_repository()
# Output: Repository "repo1" created
Some additional parameters can be specified within the method, such as the format (`"docker"` or `"maven"`) and the mode (`"standard"`, `"remote"` or `"virtual"`).
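For example, a sketch assuming the parameters are indeed named `format` and `mode`:
ArtifactRegistry("repo2").create_repository(format="docker", mode="standard")
# Expected output, analogous to the example above: Repository "repo2" created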
To create a tag, use the `create_tag` method with the repository, package, version, and tag names:
ArtifactRegistry("repo1/image1/sha256:version1").create_tag("tag1")
# Output: Tag "tag1" created for version "version1" of package "image1" in repository "repo1"
Deleting Artifact Registry objects
Deleting objects can be done using a single `delete` method:
ArtifactRegistry("repo1/image1:tag1").delete()
# Output: Tag "tag1" deleted for package "image1" in repository "repo1"
ArtifactRegistry("repo1/image1/sha256:version1").delete()
# Output: Version "version1" deleted for package "image1" in repository "repo1"
ArtifactRegistry("repo1/image1").delete()
# Output: Package "image1" deleted in repository "repo1"
ArtifactRegistry("repo1").delete()
# Output: Repository "repo1" deleted
PubSub Module
The PubSub module in the `gcp-pal` library allows you to publish and subscribe to PubSub topics.
Initializing PubSub
First, import the PubSub class from the `gcp_pal` module:
from gcp_pal import PubSub
The `PubSub` class prefers to take the `path` argument in the format `project/topic/subscription`:
PubSub("my-project/my-topic/my-subscription")
Alternatively, you can specify the project and topic/subscription separately:
PubSub(project="my-project", topic="my-topic", subscription="my-subscription")
Listing objects
To list all topics within a project or all subscriptions within a topic, use the `ls` method:
topics = PubSub("my-project").ls()
# Output: ['topic1', 'topic2']
subscriptions = PubSub("my-project/topic1").ls()
# Output: ['subscription1', 'subscription2']
Or to list all subscriptions within a project:
subscriptions = PubSub("my-project").ls_subscriptions()
# Output: ['subscription1', 'subscription2', ...]
Creating objects
To create a PubSub topic, use the `create` method:
PubSub("my-project/new-topic").create()
# Output: PubSub topic "new-topic" created
To create a PubSub subscription, specify the topic (in the path or via the `topic` parameter) and use the `create` method:
PubSub("my-project/my-topic/new-subscription").create()
Deleting objects
To delete a PubSub topic or subscription, use the `delete` method:
PubSub("my-project/topic/subscription").delete()
# Output: PubSub subscription "subscription" deleted
PubSub("my-project/topic").delete()
# Output: PubSub topic "topic" deleted
To delete a subscription without specifying the topic, use the `subscription` parameter:
PubSub(subscription="my-project/subscription").delete()
# Output: PubSub subscription "subscription" deleted
Publishing Messages to a Topic
To publish a message to a PubSub topic, specify the topic's name and the message you want to publish:
topic = "topic-name"
message = "Hello, PubSub!"
PubSub(topic).publish(message)
Request Module
The Request module in the `gcp-pal` library allows you to make authorized HTTP requests.
Initializing Request
Import the Request class from the `gcp_pal` module:
from gcp_pal import Request
Making Authorized Get/Post/Put Requests
To make an authorized request, specify the URL you want to access and use the relevant method:
url = "https://example.com/api"
get_response = Request(url).get()
print(get_response)
# Output: <Response [200]>
post_response = Request(url).post(data={"key": "value"})
print(post_response)
# Output: <Response [201]>
put_response = Request(url).put(data={"key": "value"})
print(put_response)
# Output: <Response [200]>
Using service accounts
To make requests on behalf of a service account, specify the service account email within the constructor:
Request(url, service_account="account@email.com").get()
Schema Module
The Schema module is not strictly GCP-related, but it is a useful utility. It allows one to translate schemas between different formats, such as Python, PyArrow, BigQuery, and Pandas.
Initializing Schema
Import the `Schema` class from the `gcp_pal.schema` module:
from gcp_pal.schema import Schema
Translating schemas
To translate a schema from one format to another, use the respective methods:
str_schema = {
    "a": "int",
    "b": "str",
    "c": "float",
    "d": {
        "d1": "datetime",
        "d2": "timestamp",
    },
}
python_schema = Schema(str_schema).python()
# {
# "a": int,
# "b": str,
# "c": float,
# "d": {
# "d1": datetime,
# "d2": datetime,
# },
# }
pyarrow_schema = Schema(str_schema).pyarrow()
# pa.schema(
# [
# pa.field("a", pa.int64()),
# pa.field("b", pa.string()),
# pa.field("c", pa.float64()),
# pa.field("d", pa.struct([
# pa.field("d1", pa.timestamp("ns")),
# pa.field("d2", pa.timestamp("ns")),
# ])),
# ]
# )
bigquery_schema = Schema(str_schema).bigquery()
# [
# bigquery.SchemaField("a", "INTEGER"),
# bigquery.SchemaField("b", "STRING"),
# bigquery.SchemaField("c", "FLOAT"),
# bigquery.SchemaField("d", "RECORD", fields=[
# bigquery.SchemaField("d1", "DATETIME"),
# bigquery.SchemaField("d2", "TIMESTAMP"),
# ]),
# ]
pandas_schema = Schema(str_schema).pandas()
# {
# "a": "int64",
# "b": "object",
# "c": "float64",
# "d": {
# "d1": "datetime64[ns]",
# "d2": "datetime64[ns]",
# },
# }
Inferring schemas
To infer and translate a schema from a dictionary of data or a Pandas DataFrame, use the `is_data` parameter:
import datetime
import pandas as pd
df = pd.DataFrame(
    {
        "a": [1, 2, 3],
        "b": ["a", "b", "c"],
        "c": [1.0, 2.0, 3.0],
        "date": [datetime.datetime.now() for _ in range(3)],
    }
)
inferred_schema = Schema(df, is_data=True).schema
# {
# "a": "int",
# "b": "str",
# "c": "float",
# "date": "datetime",
# }
pyarrow_schema = Schema(df, is_data=True).pyarrow()
# pa.schema(
# [
# pa.field("a", pa.int64()),
# pa.field("b", pa.string()),
# pa.field("c", pa.float64()),
# pa.field("date", pa.timestamp("ns")),
# ]
# )
Parquet Module
The Parquet module in the `gcp-pal` library allows you to read and write Parquet files in Google Cloud Storage. The `gcp_pal.Storage` module uses it under the hood to read and write Parquet files.
Initializing Parquet
Import the Parquet class from the `gcp_pal` module:
from gcp_pal import Parquet
Reading Parquet files
To read a Parquet file from Google Cloud Storage, use the `read` method:
data = Parquet("bucket/file.parquet").read()
print(data)
# Output: pd.DataFrame({'field1': ['value1'], 'field2': ['value2']})
Writing Parquet files
To write a Pandas DataFrame to a Parquet file in Google Cloud Storage, use the `write` method:
df = pd.DataFrame({
    "field1": ["value1"],
    "field2": ["value2"]
})
Parquet("bucket/file.parquet").write(df)
# Output: Parquet file "file.parquet" created in "bucket"
Partitioning can be specified via the `partition_cols` parameter:
Parquet("bucket/file.parquet").write(df, partition_cols=["field1"])
# Output: Parquet file "file.parquet" created in "bucket" partitioned by "field1"