Solution for DS Team

Project description

pipelinesds

pipelinesds is a library of helper functions for Kubeflow pipelines, organized into the following modules:

vertex_pipeline.py

get_data_from_bq()

Returns data from BigQuery as a DataFrame.

  • Parameters:

    • bq_client: BigQuery client.
    • bq_storage_client: BigQuery Storage client.
    • table: Name of the table/view to get data from.
    • where_clause: Optional SQL WHERE clause to filter data.
  • Returns:

    • pd.DataFrame: Data from the view/table.
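As a hedged illustration of how a helper like this might assemble its SQL before handing it to the BigQuery clients, here is a minimal sketch; `build_select_query` is a hypothetical name, not part of pipelinesds.

```python
from typing import Optional

# Hypothetical sketch: how get_data_from_bq() might build its query.
def build_select_query(table: str, where_clause: Optional[str] = None) -> str:
    """Build a SELECT over the given table/view, with an optional filter."""
    query = f"SELECT * FROM `{table}`"
    if where_clause:
        query += f" WHERE {where_clause}"
    return query

query = build_select_query("project.dataset.sales", "order_date >= '2024-01-01'")
print(query)
# The real function would then presumably run something like:
#   bq_client.query(query).result().to_dataframe(bqstorage_client=bq_storage_client)
```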

call_procedure_and_get_data_from_bq()

Calls a stored procedure in BigQuery and returns the results as a DataFrame.

  • Parameters:

    • bq_client: BigQuery client.
    • procedure_name: Name of the stored procedure to call.
    • parameters: Optional list of parameters to pass to the procedure. If no parameters are provided, an empty list is used.
  • Returns:

    • pd.DataFrame: The result of the procedure call as a DataFrame.
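A sketch of the CALL statement such a function might execute, assuming named query parameters; `build_call_statement` and the `@p0`-style placeholders are assumptions, not the library's actual internals.

```python
from typing import Optional, Sequence

# Hypothetical sketch: building the CALL statement that
# call_procedure_and_get_data_from_bq() might run.
def build_call_statement(procedure_name: str,
                         parameters: Optional[Sequence] = None) -> str:
    # Mirrors the documented default: no parameters means an empty list.
    params = list(parameters) if parameters is not None else []
    placeholders = ", ".join(f"@p{i}" for i in range(len(params)))
    return f"CALL `{procedure_name}`({placeholders})"

print(build_call_statement("dataset.refresh_features", ["2024-01-01"]))
```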

delete_old_data()

Deletes old data from a BigQuery table.

  • Parameters:
    • bq_client: BigQuery client.
    • table: Name of the table/view to delete data from.
    • where_clause: SQL WHERE clause to filter data for deletion.
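Since the WHERE clause is mandatory here, a reasonable implementation would refuse an empty filter to guard against deleting every row. A hedged sketch (`build_delete_query` is hypothetical):

```python
# Hypothetical sketch of the DELETE statement delete_old_data() might issue.
def build_delete_query(table: str, where_clause: str) -> str:
    # Requiring a non-empty WHERE clause guards against a full-table delete.
    if not where_clause or not where_clause.strip():
        raise ValueError("where_clause is required for delete_old_data")
    return f"DELETE FROM `{table}` WHERE {where_clause}"

stmt = build_delete_query("project.dataset.events", "event_ts < '2024-01-01'")

try:
    build_delete_query("project.dataset.events", "")
    raised = False
except ValueError:
    raised = True
```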

write_dataframe_to_bq()

Writes a DataFrame to a BigQuery table.

  • Parameters:
    • bq_client: BigQuery client.
    • df: DataFrame to write.
    • table_id: ID of the BigQuery table to write the DataFrame to.
    • write_disposition: Type of write operation ('WRITE_APPEND', 'WRITE_TRUNCATE', or 'WRITE_EMPTY').
    • job_config: Configuration for the load job.
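The three write dispositions listed above are the standard BigQuery values; a sketch of validating them before the load, assuming a hypothetical `check_write_disposition` helper:

```python
# The three dispositions documented above, as BigQuery defines them.
VALID_DISPOSITIONS = {"WRITE_APPEND", "WRITE_TRUNCATE", "WRITE_EMPTY"}

# Hypothetical sketch: validate the disposition before building the load job.
def check_write_disposition(write_disposition: str) -> str:
    if write_disposition not in VALID_DISPOSITIONS:
        raise ValueError(f"invalid write_disposition: {write_disposition!r}")
    return write_disposition

disposition = check_write_disposition("WRITE_TRUNCATE")
# The real function would then presumably call something like:
#   bq_client.load_table_from_dataframe(df, table_id, job_config=job_config).result()

try:
    check_write_disposition("WRITE_MAYBE")
    raised = False
except ValueError:
    raised = True
```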

read_gcs_file()

Reads a file from a specific path on Google Cloud Storage.

  • Parameters:

    • gcs_client: Google Cloud Storage client.
    • bucket_name: Name of the bucket on GCS where the file is stored.
    • destination_blob_name: Path in the bucket of the file to read.
  • Returns:

    • object: The object read from the file.
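Returning "the object read from the file" suggests a deserialization step. A sketch assuming pickle as the format (an assumption; `deserialize_blob` is a hypothetical helper):

```python
import pickle

# Hypothetical sketch: turning raw blob bytes back into a Python object,
# assuming read_gcs_file() pickles its payloads.
def deserialize_blob(raw: bytes) -> object:
    return pickle.loads(raw)

# In the real function, `raw` would come from something like:
#   gcs_client.bucket(bucket_name).blob(destination_blob_name).download_as_bytes()
payload = pickle.dumps({"model_version": 3})
restored = deserialize_blob(payload)
```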

save_gcs_file()

Saves content to a specific path on Google Cloud Storage.

  • Parameters:
    • gcs_client: Google Cloud Storage client.
    • bucket_name: Name of the bucket on GCS where the file will be saved.
    • destination_blob_name: Path in the bucket to save the file.
    • content: The content to be saved.
    • content_type: The MIME type of the content (e.g., 'text/html' or 'application/json').
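A sketch of how content might be encoded to bytes according to its MIME type before upload; `encode_content` is a hypothetical helper, and the dispatch on `content_type` is an assumption.

```python
import json

# Hypothetical sketch: encode content for upload based on its MIME type.
def encode_content(content, content_type: str) -> bytes:
    if content_type == "application/json":
        return json.dumps(content).encode("utf-8")
    if isinstance(content, str):
        # Covers text types such as 'text/html'.
        return content.encode("utf-8")
    return bytes(content)

# The real function would then presumably upload with something like:
#   gcs_client.bucket(bucket_name).blob(destination_blob_name).upload_from_string(
#       content, content_type=content_type)
encoded = encode_content({"k": 1}, "application/json")
```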

monitoring.py

mapping()

Creates a column mapping from a configuration file.

  • Parameters:

    • mapping_file: Dictionary containing mapping configuration with possible keys:
      • numerical_features
      • categorical_features
      • datetime
      • id
  • Returns:

    • ColumnMapping: Evidently ColumnMapping object with configured mappings.
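A pure-Python stand-in for the mapping step, using a plain dict instead of Evidently's `ColumnMapping` so the sketch runs without the dependency; the key names mirror the config keys listed above.

```python
# Hypothetical sketch: the shape of mapping() with a plain dict standing in
# for Evidently's ColumnMapping object.
def build_column_mapping(mapping_file: dict) -> dict:
    return {
        "numerical_features": mapping_file.get("numerical_features", []),
        "categorical_features": mapping_file.get("categorical_features", []),
        "datetime": mapping_file.get("datetime"),
        "id": mapping_file.get("id"),
    }

cfg = {
    "numerical_features": ["price"],
    "categorical_features": ["region"],
    "datetime": "event_ts",
}
col_map = build_column_mapping(cfg)
```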

test_data()

Tests data for issues using a test suite.

  • Parameters:

    • current_data: Current data to test.
    • reference_data: Reference data.
    • config_file: Tests configuration file.
    • stage: Stage of the pipeline ('test_input' or 'test_output').
  • Returns:

    • pd.DataFrame: Test results.
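To illustrate the idea of a staged test suite, here is a simplified pure-Python sketch with two example checks (non-empty data, schema match); the real function runs a configured test suite, and `run_basic_tests` with list-of-dict rows is purely illustrative.

```python
# Hypothetical sketch: a tiny stand-in for test_data(), operating on lists
# of row dicts instead of DataFrames.
def run_basic_tests(current_data, reference_data, stage):
    results = []
    # Example check 1: the current batch is not empty.
    results.append({"stage": stage, "test": "non_empty",
                    "passed": len(current_data) > 0})
    # Example check 2: the current schema matches the reference schema.
    ref_cols = set(reference_data[0]) if reference_data else set()
    cur_cols = set(current_data[0]) if current_data else set()
    results.append({"stage": stage, "test": "schema_matches_reference",
                    "passed": cur_cols == ref_cols})
    return results

current = [{"a": 1, "b": 2}]
reference = [{"a": 0, "b": 0}]
results = run_basic_tests(current, reference, "test_input")
```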

check_data_drift()

Checks data for drift.

  • Parameters:

    • current_data: Current data to check.
    • reference_data: Reference data.
    • config_file: Tests configuration file.
  • Returns:

    • pd.DataFrame: Test results.
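As a minimal sketch of what a drift check does, here is a mean-shift comparison for one numeric column; the metric and the 0.2 threshold are illustrative assumptions, not the library's actual drift test.

```python
from statistics import mean

# Hypothetical sketch: flag drift when the relative shift in means between
# current and reference data exceeds `threshold`.
def mean_shift_drift(current, reference, threshold=0.2):
    ref_mean = mean(reference)
    cur_mean = mean(current)
    shift = abs(cur_mean - ref_mean) / (abs(ref_mean) or 1.0)
    return {"ref_mean": ref_mean, "cur_mean": cur_mean,
            "shift": shift, "drift_detected": shift > threshold}

stable = mean_shift_drift([10, 11, 12], [10, 10, 10])   # 10% shift
drifted = mean_shift_drift([20, 20, 20], [10, 10, 10])  # 100% shift
```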

send_email_with_table()

Sends an email with an HTML table.

  • Parameters:
    • credentials_frame: DataFrame with credentials.
    • subject: Subject of the email.
    • html_table: HTML table to include in the email body.
    • receiver_email: Email address to send the email to.
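A sketch of the message-building half of such a function using the standard library's `EmailMessage`; `build_email` is hypothetical, and the real function would additionally pull the sender address and password from `credentials_frame` and deliver via SMTP.

```python
from email.message import EmailMessage

# Hypothetical sketch: assembling the HTML email send_email_with_table()
# would deliver. Credentials handling and SMTP delivery are omitted.
def build_email(subject, html_table, sender_email, receiver_email):
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender_email
    msg["To"] = receiver_email
    # Plain-text fallback plus an HTML alternative carrying the table.
    msg.set_content("This message contains an HTML table.")
    msg.add_alternative(f"<html><body>{html_table}</body></html>",
                        subtype="html")
    return msg

msg = build_email("Data drift report",
                  "<table><tr><td>ok</td></tr></table>",
                  "ds-team@example.com", "oncall@example.com")
```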

Download files

Download the file for your platform.

Source Distribution

pipelinesds-0.0.6.tar.gz (5.4 kB)

Uploaded Source

Built Distribution

pipelinesds-0.0.6-py3-none-any.whl (5.5 kB)

Uploaded Python 3

File details

Details for the file pipelinesds-0.0.6.tar.gz.

File metadata

  • Download URL: pipelinesds-0.0.6.tar.gz
  • Upload date:
  • Size: 5.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.13.0

File hashes

Hashes for pipelinesds-0.0.6.tar.gz
Algorithm Hash digest
SHA256 8fe903a4f4f6a55c095f4dc9b9470d1436c4bb84fa9a2c24fcb66c03ac9ddebb
MD5 c24bcbf50e43e6f572cd40d3efb9e430
BLAKE2b-256 0ac99810c29c1579c614af48ace3122f7030c9a9ff35f3a0e30ef026eaa9cb97

File details

Details for the file pipelinesds-0.0.6-py3-none-any.whl.

File metadata

  • Download URL: pipelinesds-0.0.6-py3-none-any.whl
  • Upload date:
  • Size: 5.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.13.0

File hashes

Hashes for pipelinesds-0.0.6-py3-none-any.whl
Algorithm Hash digest
SHA256 565371fcc2238b84143247669d52bb9a2b670d840ac96d24b9aac97f6aa7ae46
MD5 9c2fdea3e68aef0fcf22895d320626a2
BLAKE2b-256 275daf4b0ab7bb642254c1fc663627c8618e166d8bb0357ac4e688d37d967a56
