
Solution for DS Team

Project description

pipelinesds

pipelinesds is a library of helper functions used in Kubeflow pipelines. It is organized into the following modules:

vertex_pipeline.py

get_data_from_bq()

Returns data from BigQuery as a DataFrame.

  • Parameters:

    • bq_client: BigQuery client.
    • bq_storage_client: BigQuery Storage client.
    • table: Name of the table/view to get data from.
    • where_clause: Optional SQL WHERE clause to filter data.
  • Returns:

    • pd.DataFrame: Data from the view/table.
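A minimal usage sketch. The import path, client setup, and the `my_dataset.my_view` table name are assumptions; the live calls are commented out because they require GCP credentials:

```python
from datetime import date, timedelta

# Restrict the query to recent rows; the clause is passed through as SQL.
cutoff = date.today() - timedelta(days=30)
where_clause = f"WHERE event_date >= '{cutoff.isoformat()}'"

# With live clients (requires GCP credentials; import path assumed):
# from google.cloud import bigquery, bigquery_storage
# from pipelinesds.vertex_pipeline import get_data_from_bq
# bq_client = bigquery.Client()
# bq_storage_client = bigquery_storage.BigQueryReadClient()
# df = get_data_from_bq(bq_client, bq_storage_client,
#                       table="my_dataset.my_view", where_clause=where_clause)
```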

call_procedure_and_get_data_from_bq()

Calls a stored procedure in BigQuery and returns the results as a DataFrame.

  • Parameters:

    • bq_client: BigQuery client.
    • procedure_name: Name of the stored procedure to call.
    • parameters: Optional list of parameters to pass to the procedure. If no parameters are provided, an empty list is used.
  • Returns:

    • pd.DataFrame: The result of the procedure call as a DataFrame.
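A sketch of how the parameter list might be prepared; the procedure name is hypothetical and the live call is commented out:

```python
procedure_name = "my_dataset.get_training_data"  # hypothetical procedure
parameters = ["2024-01-01", "EU"]  # positional arguments for the procedure

# Omitting `parameters` falls back to an empty list, i.e. a call with no arguments.
# result_df = call_procedure_and_get_data_from_bq(bq_client, procedure_name, parameters)
```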

delete_old_data()

Deletes old data from a BigQuery table.

  • Parameters:
    • bq_client: BigQuery client.
    • table: Name of the table/view to delete data from.
    • where_clause: SQL WHERE clause to filter data for deletion.
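A sketch of a retention-style deletion; the table name and 90-day window are assumptions, and the live call is commented out:

```python
table = "my_dataset.predictions"  # hypothetical table
# Only rows matching the clause are deleted, so the clause doubles as a safety net.
where_clause = "WHERE prediction_date < DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)"

# delete_old_data(bq_client, table=table, where_clause=where_clause)
```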

write_dataframe_to_bq()

Writes a DataFrame to a BigQuery table.

  • Parameters:
    • bq_client: BigQuery client.
    • df: DataFrame to write.
    • table_id: Table in BigQuery to write the DataFrame to.
    • write_disposition: Type of write operation ('WRITE_APPEND', 'WRITE_TRUNCATE', or 'WRITE_EMPTY').
    • job_config: Configuration for the load job.
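A sketch of preparing the inputs; the table id is hypothetical, and the load job itself is commented out since it needs a live client:

```python
import pandas as pd

df = pd.DataFrame({"customer_id": [1, 2], "score": [0.91, 0.47]})

# 'WRITE_APPEND' adds rows, 'WRITE_TRUNCATE' replaces the table,
# 'WRITE_EMPTY' fails if the table already contains data.
write_disposition = "WRITE_APPEND"

# from google.cloud import bigquery
# job_config = bigquery.LoadJobConfig(write_disposition=write_disposition)
# write_dataframe_to_bq(bq_client, df, "project.dataset.scores",
#                       write_disposition, job_config)
```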

read_gcs_file()

Reads a file from a specific path on Google Cloud Storage.

  • Parameters:

    • gcs_client: Google Cloud Storage client.
    • bucket_name: Name of the bucket on GCS where the file is stored.
    • destination_blob_name: Path in the bucket to read the file from.
  • Returns:

    • object: The object read from the file.

save_gcs_file()

Saves content to a specific path on Google Cloud Storage.

  • Parameters:
    • gcs_client: Google Cloud Storage client.
    • bucket_name: Name of the bucket on GCS where the file will be saved.
    • destination_blob_name: Path in the bucket to save the file.
    • content: The content to be saved.
    • content_type: The MIME type of the content (e.g., 'text/html' or 'application/json').
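A round-trip sketch for the two GCS helpers; the bucket and blob names are assumptions, and the live calls are commented out:

```python
import json

# Serialize an artifact and pick a matching MIME type.
report = {"status": "ok", "rows_checked": 1250}
content = json.dumps(report)
content_type = "application/json"

# from google.cloud import storage
# gcs_client = storage.Client()
# save_gcs_file(gcs_client, bucket_name="ds-artifacts",
#               destination_blob_name="monitoring/report.json",
#               content=content, content_type=content_type)
# Reading the same blob back later:
# obj = read_gcs_file(gcs_client, "ds-artifacts", "monitoring/report.json")
```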

monitoring.py

mapping()

Creates a column mapping from a configuration file.

  • Parameters:

    • mapping_file: Dictionary containing mapping configuration with possible keys:
      • numerical_features
      • categorical_features
      • datetime
      • id
  • Returns:

    • ColumnMapping: Evidently ColumnMapping object with configured mappings.
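A sketch of the configuration dictionary, using only the keys listed above; the column names are illustrative and the call is commented out:

```python
# Assumed shape of the mapping configuration.
mapping_file = {
    "numerical_features": ["age", "income"],
    "categorical_features": ["country", "segment"],
    "datetime": "event_ts",
    "id": "customer_id",
}

# from pipelinesds.monitoring import mapping  # import path assumed
# column_mapping = mapping(mapping_file)      # Evidently ColumnMapping
```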

test_data()

Tests data for issues using a test suite.

  • Parameters:

    • current_data: Current data to test.
    • reference_data: Reference data.
    • config_file: Tests configuration file.
    • stage: Stage of the pipeline ('test_input' or 'test_output').
  • Returns:

    • pd.DataFrame: Test results.

check_data_drift()

Checks data for drift.

  • Parameters:

    • current_data: Current data to check.
    • reference_data: Reference data.
    • config_file: Tests configuration file.
  • Returns:

    • pd.DataFrame: Test results.
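A sketch covering both `test_data()` and `check_data_drift()`; the sample frames are illustrative, and the calls are commented out because they need a tests configuration file:

```python
import pandas as pd

# Reference data fixes the expected distribution; current data is compared against it.
reference_data = pd.DataFrame({"age": [31, 45, 27, 52],
                               "income": [40_000, 52_000, 38_000, 61_000]})
current_data = pd.DataFrame({"age": [30, 64, 29, 58],
                             "income": [41_000, 95_000, 39_000, 88_000]})

# input_checks = test_data(current_data, reference_data, config_file, stage="test_input")
# drift_report = check_data_drift(current_data, reference_data, config_file)
```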

send_email_with_table()

Sends an email with an HTML table.

  • Parameters:
    • credentials_frame: DataFrame with credentials.
    • subject: Subject of the email.
    • html_table: Data to send in the email.
    • receiver_email: Email address to send the email to.
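A sketch of building the HTML table from monitoring results with pandas; the subject and addresses are placeholders, and the send call is commented out because it needs real credentials:

```python
import pandas as pd

results = pd.DataFrame({"test": ["null check", "drift check"],
                        "status": ["PASS", "FAIL"]})
html_table = results.to_html(index=False)

# send_email_with_table(credentials_frame, subject="Data monitoring alert",
#                       html_table=html_table, receiver_email="ds-team@example.com")
```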

Download files

Download the file for your platform.

Source Distribution

pipelinesds-0.0.8.tar.gz (5.6 kB)

Uploaded Source

Built Distribution


pipelinesds-0.0.8-py3-none-any.whl (5.7 kB)

Uploaded Python 3

File details

Details for the file pipelinesds-0.0.8.tar.gz.

File metadata

  • Download URL: pipelinesds-0.0.8.tar.gz
  • Size: 5.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.13.1

File hashes

Hashes for pipelinesds-0.0.8.tar.gz

  • SHA256: 0c480e73efbd9a4b54531d971ad23b653abd553fa4ad4e78458113a789793d1c
  • MD5: 827b21b340069530ce76e61277979f54
  • BLAKE2b-256: 13b0cd9ffc22b9f278e6b42c243de9aad1fba1950435f6d296ba42925f3fdee6


File details

Details for the file pipelinesds-0.0.8-py3-none-any.whl.

File metadata

  • Download URL: pipelinesds-0.0.8-py3-none-any.whl
  • Size: 5.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.13.1

File hashes

Hashes for pipelinesds-0.0.8-py3-none-any.whl

  • SHA256: aab2ab8a943df552c3b991fb8d325d1d0e2148e395bc18c935341f0f85debae4
  • MD5: c3611dbf6201397a07247ce9af0513a5
  • BLAKE2b-256: a8b81d36533467726904dda222b7991e01a31db3f4dc0a1b826c4c15e917e0fd

