Solution for DS Team

Project description

pipelinesds

Pipelinesds is a library of helper functions for Kubeflow pipelines. It includes the following modules:

pipeliner.py

get_results_from_bq()

Queries a BigQuery table or view and returns the results as a pandas DataFrame.

  • Parameters:
    • bq_client: BigQuery client.
    • bq_storage_client: BigQuery Storage client.
    • table: Name of the table/view to get data from.
    • where_clause: Optional SQL WHERE clause to filter data.
  • Returns:
    • pd.DataFrame: Data from the table/view.
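A minimal sketch of what such a function might do, assuming `table` is a fully qualified `project.dataset.table` name and `where_clause` holds only the condition text without the `WHERE` keyword (both are assumptions; the package's own implementation is not shown here). `build_select_query` and `get_results_from_bq_sketch` are hypothetical names. The clients are duck-typed, so the sketch works with a `google.cloud.bigquery.Client` and a BigQuery Storage read client, or with test doubles.

```python
from typing import Any, Optional


def build_select_query(table: str, where_clause: Optional[str] = None) -> str:
    """Compose a SELECT over a table/view.

    Assumption: `where_clause` is the condition only, e.g. "dt >= '2024-01-01'".
    """
    query = f"SELECT * FROM `{table}`"
    if where_clause:
        query += f" WHERE {where_clause}"
    return query


def get_results_from_bq_sketch(
    bq_client: Any,
    bq_storage_client: Any,
    table: str,
    where_clause: Optional[str] = None,
) -> Any:
    """Run the query and download the result as a pandas DataFrame.

    `bq_client.query(...)` returns a QueryJob; its `to_dataframe` method can use
    a BigQuery Storage client to download large results faster.
    """
    job = bq_client.query(build_select_query(table, where_clause))
    return job.to_dataframe(bqstorage_client=bq_storage_client)
```

Because `where_clause` is interpolated into the SQL as-is, it should come only from trusted pipeline configuration; untrusted values are better passed as query parameters through `bigquery.QueryJobConfig(query_parameters=...)`.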

delete_old_data()

Deletes old data from a BigQuery table.

  • Parameters:
    • bq_client: BigQuery client.
    • table: Name of the table to delete data from (BigQuery DML statements cannot modify views).
    • where_clause: SQL WHERE clause to filter data for deletion.
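A minimal sketch, assuming the deletion is issued as a BigQuery DML `DELETE` statement and that `where_clause` holds only the condition (without the `WHERE` keyword); `build_delete_query` and `delete_old_data_sketch` are hypothetical names. BigQuery requires a `WHERE` clause on every `DELETE` (`WHERE true` deletes all rows), so the sketch rejects an empty condition before sending the query.

```python
from typing import Any


def build_delete_query(table: str, where_clause: str) -> str:
    """Compose a BigQuery DML DELETE statement.

    BigQuery requires a WHERE clause on DELETE, so an empty condition is
    rejected here instead of failing at query time.
    """
    if not where_clause or not where_clause.strip():
        raise ValueError("where_clause is required for DELETE in BigQuery")
    return f"DELETE FROM `{table}` WHERE {where_clause}"


def delete_old_data_sketch(bq_client: Any, table: str, where_clause: str) -> None:
    """Run the DELETE and block until the job finishes."""
    job = bq_client.query(build_delete_query(table, where_clause))
    job.result()  # waits for completion; raises if the job failed
```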

write_dataframe_to_bq()

Writes a DataFrame to a BigQuery table.

  • Parameters:
    • bq_client: BigQuery client.
    • df: DataFrame to write.
    • table_id: Table in BigQuery to write the DataFrame to.
    • write_disposition: Type of write operation: 'WRITE_APPEND' (append rows), 'WRITE_TRUNCATE' (overwrite existing table data), or 'WRITE_EMPTY' (write only if the table is empty).
    • job_config: Configuration for the load job.
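A minimal sketch built on the `google-cloud-bigquery` load API (`Client.load_table_from_dataframe`), assuming that the explicit `write_disposition` argument takes precedence over any value already set on `job_config` (for example a `bigquery.LoadJobConfig`). `write_dataframe_to_bq_sketch` is a hypothetical name, and the client is duck-typed.

```python
from typing import Any

VALID_DISPOSITIONS = {"WRITE_APPEND", "WRITE_TRUNCATE", "WRITE_EMPTY"}


def write_dataframe_to_bq_sketch(
    bq_client: Any,
    df: Any,
    table_id: str,
    write_disposition: str,
    job_config: Any,
) -> Any:
    """Load a DataFrame into BigQuery and wait for the load job to finish.

    Assumption: `write_disposition` overrides whatever is set on `job_config`.
    """
    if write_disposition not in VALID_DISPOSITIONS:
        raise ValueError(f"unsupported write_disposition: {write_disposition!r}")
    job_config.write_disposition = write_disposition
    load_job = bq_client.load_table_from_dataframe(df, table_id, job_config=job_config)
    return load_job.result()  # blocks until done; raises on failure
```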

read_gcs_file()

Reads a file from a specific path on Google Cloud Storage.

  • Parameters:
    • gcs_client: Google Cloud Storage client.
    • bucket_name: Name of the bucket on GCS where the file is stored.
    • destination_blob_name: Path of the file (blob) within the bucket.
  • Returns:
    • object: The object read from the file.
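A minimal sketch using the `google-cloud-storage` blob API. It returns the raw bytes and leaves deserialization to the caller; this is an assumption, since the package documents the return value only as an object. `read_gcs_file_sketch` is a hypothetical name, and the client is duck-typed.

```python
from typing import Any


def read_gcs_file_sketch(gcs_client: Any, bucket_name: str, destination_blob_name: str) -> bytes:
    """Download a blob's contents as raw bytes.

    Assumption: the caller decodes the bytes (e.g. json.loads for JSON files).
    """
    bucket = gcs_client.bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)
    return blob.download_as_bytes()
```

For a JSON file, the result could be decoded with `json.loads(read_gcs_file_sketch(storage.Client(), "my-bucket", "configs/tests.json"))`, where the bucket and path are placeholders.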

save_gcs_file()

Saves content to a specific path on Google Cloud Storage.

  • Parameters:
    • gcs_client: Google Cloud Storage client.
    • bucket_name: Name of the bucket on GCS where the file will be saved.
    • destination_blob_name: Path in the bucket to save the file.
    • content: The content to be saved.
    • content_type: The MIME type of the content (e.g., 'text/html' or 'application/json').
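A minimal sketch using `Blob.upload_from_string` from `google-cloud-storage`, which accepts either `str` or `bytes` and records `content_type` as the object's MIME type. `save_gcs_file_sketch` is a hypothetical name, and the client is duck-typed.

```python
from typing import Any, Union


def save_gcs_file_sketch(
    gcs_client: Any,
    bucket_name: str,
    destination_blob_name: str,
    content: Union[str, bytes],
    content_type: str,
) -> None:
    """Upload `content` to gs://<bucket_name>/<destination_blob_name>."""
    blob = gcs_client.bucket(bucket_name).blob(destination_blob_name)
    # str content is encoded as UTF-8; content_type sets the stored MIME type
    blob.upload_from_string(content, content_type=content_type)
```

For example, a dictionary could be stored as JSON with `save_gcs_file_sketch(storage.Client(), "my-bucket", "reports/summary.json", json.dumps(payload), "application/json")`, where the bucket, path, and `payload` are placeholders.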

tester.py

test_data()

Tests data for issues using a test suite.

  • Parameters:
    • current_data: Current data to test.
    • reference_data: Reference data.
    • config_file: Test configuration file.
    • stage: Stage of the pipeline ('test_input' or 'test_output').
  • Returns:
    • pd.DataFrame: Test results.
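The configuration format used by `test_data()` is not documented here, so the sketch below invents a hypothetical one: a dictionary keyed by stage ('test_input' or 'test_output'), each holding a list of named checks with parameters, such as might be read from a JSON file with `json.load`. To stay dependency-free it works on lists of row dictionaries rather than DataFrames, and the two checks (`no_missing_values`, `row_count_ratio`) and `run_data_tests_sketch` are illustrative names, not the package's test suite.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

VALID_STAGES = {"test_input", "test_output"}


def no_missing_values(current: List[Row], reference: List[Row], columns: List[str]) -> bool:
    """Pass if none of the listed columns is None in any current row.

    `reference` is unused; it is accepted so all checks share one signature.
    """
    return all(row.get(col) is not None for row in current for col in columns)


def row_count_ratio(
    current: List[Row], reference: List[Row], min_ratio: float, max_ratio: float
) -> bool:
    """Pass if len(current) / len(reference) lies within [min_ratio, max_ratio]."""
    if not reference:
        return False
    return min_ratio <= len(current) / len(reference) <= max_ratio


# Hypothetical registry mapping check names used in the config to functions
CHECKS: Dict[str, Callable[..., bool]] = {
    "no_missing_values": no_missing_values,
    "row_count_ratio": row_count_ratio,
}


def run_data_tests_sketch(
    current: List[Row], reference: List[Row], config: Dict[str, Any], stage: str
) -> List[Row]:
    """Run the checks configured for `stage` and return one result row per check."""
    if stage not in VALID_STAGES:
        raise ValueError(f"stage must be one of {sorted(VALID_STAGES)}, got {stage!r}")
    results = []
    for check in config.get(stage, []):
        passed = CHECKS[check["name"]](current, reference, **check.get("params", {}))
        results.append({"test": check["name"], "status": "SUCCESS" if passed else "FAIL"})
    return results
```

The list of result rows can be turned into a DataFrame with `pd.DataFrame(results)` to match the documented return type.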

check_data_drift()

Checks data for drift.

  • Parameters:
    • current_data: Current data to check.
    • reference_data: Reference data.
    • config_file: Test configuration file.
  • Returns:
    • pd.DataFrame: Test results.
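The drift method behind `check_data_drift()` is not documented. As a swapped-in technique, the sketch below flags drift per numeric column with the two-sample Kolmogorov-Smirnov statistic, the largest gap between the empirical CDFs of the current and reference values. The 0.2 threshold, the dictionary-of-columns input, and the names `ks_statistic` and `check_drift_sketch` are hypothetical choices, not the package's defaults.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Union


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute gap
    between the empirical CDFs of `a` and `b`."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)
    d = 0.0
    # Both ECDFs are step functions that only change at observed values,
    # so evaluating at those points is enough to find the maximum gap.
    for x in set(a_sorted) | set(b_sorted):
        fa = bisect_right(a_sorted, x) / n  # share of a that is <= x
        fb = bisect_right(b_sorted, x) / m  # share of b that is <= x
        d = max(d, abs(fa - fb))
    return d


def check_drift_sketch(
    current: Dict[str, List[float]],
    reference: Dict[str, List[float]],
    threshold: float = 0.2,  # hypothetical cut-off, not a package default
) -> List[Dict[str, Union[str, float, bool]]]:
    """Compare each reference column with the same column in the current data."""
    results = []
    for column, ref_values in reference.items():
        stat = ks_statistic(current[column], ref_values)
        results.append(
            {"column": column, "ks_statistic": round(stat, 4), "drift_detected": stat > threshold}
        )
    return results
```

`scipy.stats.ks_2samp` computes the same statistic together with a p-value, which is a more common basis for a drift decision than a fixed threshold.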

send_email_with_table()

Sends an email with an HTML table.

  • Parameters:
    • credentials_frame: DataFrame with credentials.
    • subject: Subject of the email.
    • html_table: HTML table to include in the body of the email.
    • receiver_email: Email address to send the email to.
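A minimal sketch using the standard library's `smtplib` and `email.message.EmailMessage`. The layout of `credentials_frame` is not documented, so the sketch takes the sender address and password directly; they might be pulled from the frame with something like `credentials_frame["sender"].iloc[0]`, where the column name is hypothetical. The SMTP host, port 465 (implicit TLS), and the names `build_table_email` and `send_email_with_table_sketch` are assumptions. The `html_table` string could be produced with pandas `DataFrame.to_html(index=False)`.

```python
import smtplib
from email.message import EmailMessage
from typing import Callable


def build_table_email(sender: str, receiver_email: str, subject: str, html_table: str) -> EmailMessage:
    """Build a multipart/alternative message with a plain-text fallback and an HTML body."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = receiver_email
    # Plain-text fallback for clients that do not render HTML
    msg.set_content("This message contains an HTML table; open it in an HTML-capable client.")
    msg.add_alternative(f"<html><body>{html_table}</body></html>", subtype="html")
    return msg


def send_email_with_table_sketch(
    sender: str,
    password: str,
    subject: str,
    html_table: str,
    receiver_email: str,
    smtp_host: str = "smtp.example.com",  # hypothetical host
    smtp_port: int = 465,  # implicit-TLS SMTP port
    smtp_factory: Callable[..., smtplib.SMTP] = smtplib.SMTP_SSL,
) -> None:
    """Log in to the SMTP server and send the table email."""
    msg = build_table_email(sender, receiver_email, subject, html_table)
    with smtp_factory(smtp_host, smtp_port) as server:
        server.login(sender, password)
        server.send_message(msg)
```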

Download files

Source Distribution

pipelinesds-0.0.3.tar.gz (5.2 kB)

Built Distribution

pipelinesds-0.0.3-py3-none-any.whl (5.1 kB)

File details

Details for the file pipelinesds-0.0.3.tar.gz.

File metadata

  • Download URL: pipelinesds-0.0.3.tar.gz
  • Size: 5.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.6

File hashes

Hashes for pipelinesds-0.0.3.tar.gz
  • SHA256: b648d7e4f929415b04b1cf72c8045ce30a114a9d04d5dc68ddac5f9a0f7ed014
  • MD5: 47f5a75d44db69ad15cb39eae6517595
  • BLAKE2b-256: d45cde818c247ca9655c0f0f4df93b9ec287052ef3543f78f060a44783a067c8

File details

Details for the file pipelinesds-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: pipelinesds-0.0.3-py3-none-any.whl
  • Size: 5.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.6

File hashes

Hashes for pipelinesds-0.0.3-py3-none-any.whl
  • SHA256: e91847004ae83221659746b1069ecfef36869f63e96f2c48a36216c3f7c2795a
  • MD5: 10a041582b18cabcc20930b47a7ac26e
  • BLAKE2b-256: d3617234a6b5f99fa1a7001562d5372ce1a968182baf388f816d6cc88fb73cc9
