Solution for DS Team

Project description

pipelinesds

Pipelinesds is a library of helper functions used in Kubeflow pipelines, organized into the following modules:

vertex_pipeline.py

get_data_from_bq()

Returns data from BigQuery as a DataFrame.

  • Parameters:

    • bq_client: BigQuery client.
    • bq_storage_client: BigQuery Storage client.
    • table: Name of the table/view to get data from.
    • where_clause: Optional SQL WHERE clause to filter data.
  • Returns:

    • pd.DataFrame: Data from the view/table.

call_procedure_and_get_data_from_bq()

Calls a stored procedure in BigQuery and returns the results as a DataFrame.

  • Parameters:

    • bq_client: BigQuery client.
    • procedure_name: Name of the stored procedure to call.
    • parameters: Optional list of parameters to pass to the procedure. If no parameters are provided, an empty list is used.
  • Returns:

    • pd.DataFrame: The result of the procedure call as a DataFrame.

delete_old_data()

Deletes old data from a BigQuery table.

  • Parameters:
    • bq_client: BigQuery client.
    • table: Name of the table/view to delete data from.
    • where_clause: SQL WHERE clause to filter data for deletion.
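A sketch of the DELETE such a helper might issue; requiring a non-empty WHERE clause (a guard assumed here, not confirmed by the docs) prevents accidentally wiping the whole table:

```python
def build_delete_query(table: str, where_clause: str) -> str:
    """Compose a guarded DELETE statement; the WHERE clause is mandatory."""
    if not where_clause or not where_clause.strip():
        raise ValueError("where_clause is required for deletion")
    return f"DELETE FROM `{table}` WHERE {where_clause}"
```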

write_dataframe_to_bq()

Writes a DataFrame to a BigQuery table.

  • Parameters:
    • bq_client: BigQuery client.
    • df: DataFrame to write.
    • table_id: ID of the BigQuery table to write the DataFrame to.
    • write_disposition: Type of write operation ('WRITE_APPEND', 'WRITE_TRUNCATE', or 'WRITE_EMPTY').
    • job_config: Configuration for the load job.
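The three dispositions above are the values BigQuery load jobs accept; a sketch of validating them, with the actual load call (standard google-cloud-bigquery API) in the comment:

```python
VALID_WRITE_DISPOSITIONS = {"WRITE_APPEND", "WRITE_TRUNCATE", "WRITE_EMPTY"}

def check_write_disposition(write_disposition: str) -> str:
    """Reject dispositions BigQuery load jobs do not accept."""
    if write_disposition not in VALID_WRITE_DISPOSITIONS:
        raise ValueError(f"unsupported write_disposition: {write_disposition!r}")
    return write_disposition

# The helper then roughly does (google-cloud-bigquery API):
#   job = bq_client.load_table_from_dataframe(df, table_id, job_config=job_config)
#   job.result()  # block until the load completes
```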

read_gcs_file()

Reads a file from a specific path on Google Cloud Storage.

  • Parameters:

    • gcs_client: Google Cloud Storage client.
    • bucket_name: Name of the bucket on GCS where the file is stored.
    • destination_blob_name: Path in the bucket of the file to read.
  • Returns:

    • object: The object read from the file.
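A sketch of the read path; pickle is assumed as the serialization format here, since the library does not document how the stored object is encoded:

```python
import pickle

def deserialize_blob(payload: bytes):
    """Turn raw blob bytes back into a Python object (pickle assumed)."""
    return pickle.loads(payload)

# Real read path, roughly (google-cloud-storage API):
#   blob = gcs_client.bucket(bucket_name).blob(destination_blob_name)
#   obj = deserialize_blob(blob.download_as_bytes())
```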

save_gcs_file()

Saves content to a specific path on Google Cloud Storage.

  • Parameters:
    • gcs_client: Google Cloud Storage client.
    • bucket_name: Name of the bucket on GCS where the file will be saved.
    • destination_blob_name: Path in the bucket to save the file.
    • content: The content to be saved.
    • content_type: The MIME type of the content (e.g., 'text/html' or 'application/json').
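How the content might be encoded for upload based on its MIME type; this per-type policy is an illustration, not the library's documented behavior:

```python
import json

def serialize_content(content, content_type: str) -> bytes:
    """Encode content for upload according to its MIME type (assumed policy)."""
    if content_type == "application/json":
        return json.dumps(content).encode("utf-8")
    if isinstance(content, str):          # e.g. 'text/html'
        return content.encode("utf-8")
    return bytes(content)                 # already-binary payloads

# Upload path, roughly (google-cloud-storage API):
#   blob = gcs_client.bucket(bucket_name).blob(destination_blob_name)
#   blob.upload_from_string(serialize_content(content, content_type),
#                           content_type=content_type)
```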

monitoring.py

mapping()

Creates a column mapping from a configuration file.

  • Parameters:

    • mapping_file: Dictionary containing mapping configuration with possible keys:
      • numerical_features
      • categorical_features
      • datetime
      • id
  • Returns:

    • ColumnMapping: Evidently ColumnMapping object with configured mappings.
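The mapping logic can be sketched with a plain dataclass standing in for Evidently's ColumnMapping (so the example carries no Evidently dependency); the field names mirror the config keys listed above:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ColumnMappingStub:
    """Stand-in for evidently's ColumnMapping with the same field names."""
    numerical_features: list = field(default_factory=list)
    categorical_features: list = field(default_factory=list)
    datetime: Optional[str] = None
    id: Optional[str] = None

def build_mapping(mapping_file: dict) -> ColumnMappingStub:
    """Copy only the keys present in the config onto the mapping object."""
    return ColumnMappingStub(
        numerical_features=mapping_file.get("numerical_features", []),
        categorical_features=mapping_file.get("categorical_features", []),
        datetime=mapping_file.get("datetime"),
        id=mapping_file.get("id"),
    )
```

Missing keys fall back to empty lists or None, so a partial config still yields a usable mapping.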

test_data()

Tests data for issues using a test suite.

  • Parameters:

    • current_data: Current data to test.
    • reference_data: Reference data.
    • config_file: Tests configuration file.
    • stage: Stage of the pipeline ('test_input' or 'test_output').
  • Returns:

    • pd.DataFrame: Test results.
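The shape of such test results can be sketched with one illustrative check (a missing-value test); the real function delegates to an Evidently test suite, whose checks and output columns differ:

```python
def run_missing_value_tests(rows: list[dict], columns: list[str]) -> list[dict]:
    """One illustrative data test: every listed column must have no missing values.
    Returns one result record per test, mirroring a test-results table."""
    results = []
    for col in columns:
        n_missing = sum(1 for row in rows if row.get(col) is None)
        results.append({
            "test": f"no_missing_values:{col}",
            "status": "SUCCESS" if n_missing == 0 else "FAIL",
            "missing": n_missing,
        })
    return results
```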

check_data_drift()

Checks data for drift.

  • Parameters:

    • current_data: Current data to check.
    • reference_data: Reference data.
    • config_file: Tests configuration file.
  • Returns:

    • pd.DataFrame: Test results.
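The idea behind a drift check can be sketched with a simple mean-shift test (Evidently's actual drift tests use different statistics; this stand-in only illustrates the reference-vs-current comparison):

```python
from statistics import mean, pstdev

def mean_shift_detected(reference: list[float], current: list[float],
                        threshold: float = 2.0) -> bool:
    """Flag drift when the current mean moves more than `threshold` reference
    standard deviations away from the reference mean."""
    ref_mean, ref_std = mean(reference), pstdev(reference)
    if ref_std == 0:
        return mean(current) != ref_mean
    return abs(mean(current) - ref_mean) / ref_std > threshold
```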

send_email_with_table()

Sends an email with an HTML table.

  • Parameters:
    • credentials_frame: DataFrame with credentials.
    • subject: Subject of the email.
    • html_table: HTML table to include in the email body.
    • receiver_email: Email address to send the email to.
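Assembling such a message can be sketched with the standard-library email API; the sender address, SMTP host, and how credentials are pulled from credentials_frame are assumptions here:

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def build_html_email(subject: str, html_table: str,
                     sender_email: str, receiver_email: str) -> MIMEMultipart:
    """Assemble an HTML message with the table as its body."""
    msg = MIMEMultipart("alternative")
    msg["Subject"] = subject
    msg["From"] = sender_email
    msg["To"] = receiver_email
    msg.attach(MIMEText(html_table, "html"))
    return msg

# Sending, roughly (smtplib; host/port and credential lookup are assumptions):
#   with smtplib.SMTP_SSL("smtp.example.com", 465) as server:
#       server.login(user, password)
#       server.send_message(msg)
```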

Download files


Source Distribution

pipelinesds-0.0.7.tar.gz (5.5 kB)

Uploaded Source

Built Distribution


pipelinesds-0.0.7-py3-none-any.whl (5.6 kB)

Uploaded Python 3

File details

Details for the file pipelinesds-0.0.7.tar.gz.

File metadata

  • Download URL: pipelinesds-0.0.7.tar.gz
  • Upload date:
  • Size: 5.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.13.1

File hashes

Hashes for pipelinesds-0.0.7.tar.gz:

  • SHA256: da8a0cf4a2f12016a07c3117f54fd7869d8985775078f0cd1d5b78b897a3ca15
  • MD5: 0b1500119b7baaa81bf77a70f48503f0
  • BLAKE2b-256: 7ab0ee37de18349b6eeb4f6ddd9b9c839fbc0e0914c2c63dc816c20a1a0df565


File details

Details for the file pipelinesds-0.0.7-py3-none-any.whl.

File metadata

  • Download URL: pipelinesds-0.0.7-py3-none-any.whl
  • Upload date:
  • Size: 5.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.13.1

File hashes

Hashes for pipelinesds-0.0.7-py3-none-any.whl:

  • SHA256: 731d324107732884d525649342bd8c27c8161d6f48902a9feebc3a1b9372859f
  • MD5: 173ff1533c46cb87b664f2a934e02610
  • BLAKE2b-256: 2fab2b8a6865b3ce8e01c7a89294350012947181b186c2271dbdb14c1b32a2f3

