
Solution for DS Team

Project description

pipelinesds

Pipelinesds is a library of helper functions for use in Kubeflow pipelines, organized into the following modules:

pipeliner.py

get_results_from_bq()

Returns SQL results from BigQuery as a DataFrame.

  • Parameters:

    • bq_client: BigQuery client.
    • bq_storage_client: BigQuery Storage client.
    • table: Name of the table/view to get data from.
    • where_clause: Optional SQL WHERE clause to filter data.
  • Returns:

    • pd.DataFrame: Data from the view/table.
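
As a rough illustration, the function presumably composes and runs a query shaped like the one below; the `build_query` helper and the query form are assumptions, not the library's actual code:

```python
def build_query(table: str, where_clause: str = "") -> str:
    """Compose the SELECT statement that get_results_from_bq
    presumably issues (an assumption; not taken from the library)."""
    query = f"SELECT * FROM `{table}`"
    if where_clause:
        query = f"{query} {where_clause}"
    return query

# The resulting DataFrame would then come from something like
# bq_client.query(query).to_dataframe(bqstorage_client=bq_storage_client).
print(build_query("my_dataset.my_view", "WHERE event_date >= '2024-01-01'"))
```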

delete_old_data()

Deletes old data from a BigQuery table.

  • Parameters:
    • bq_client: BigQuery client.
    • table: Name of the table/view to delete data from.
    • where_clause: SQL WHERE clause to filter data for deletion.
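
A sketch of the DELETE statement such a function presumably runs; the shape is assumed, and the mandatory WHERE clause here is a sensible guard rather than a documented behavior of the library:

```python
def build_delete_statement(table: str, where_clause: str) -> str:
    """Compose the DELETE statement delete_old_data presumably runs
    (assumed shape; requiring a WHERE clause guards against wiping
    the whole table)."""
    if not where_clause.strip():
        raise ValueError("where_clause is required to avoid deleting all rows")
    return f"DELETE FROM `{table}` {where_clause}"

print(build_delete_statement("my_dataset.events", "WHERE event_date < '2024-01-01'"))
```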

write_dataframe_to_bq()

Writes a DataFrame to a BigQuery table.

  • Parameters:
    • bq_client: BigQuery client.
    • df: DataFrame to write.
    • table_id: ID of the BigQuery table to write the DataFrame to.
    • write_disposition: Type of write operation ('WRITE_APPEND', 'WRITE_TRUNCATE', or 'WRITE_EMPTY').
    • job_config: Configuration for the load job.
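
The three dispositions are BigQuery's standard load-job modes: append rows, overwrite the table, or fail unless the table is empty. A small sketch of validating the argument before building a job config (the helper itself is hypothetical, not part of the library):

```python
# BigQuery's standard write dispositions, as listed above.
VALID_DISPOSITIONS = {"WRITE_APPEND", "WRITE_TRUNCATE", "WRITE_EMPTY"}

def check_write_disposition(write_disposition: str) -> str:
    """Validate the value write_dataframe_to_bq expects
    (hypothetical helper)."""
    if write_disposition not in VALID_DISPOSITIONS:
        raise ValueError(
            f"write_disposition must be one of {sorted(VALID_DISPOSITIONS)}, "
            f"got {write_disposition!r}"
        )
    return write_disposition

print(check_write_disposition("WRITE_TRUNCATE"))
```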

read_gcs_file()

Reads a file from a specific path on Google Cloud Storage.

  • Parameters:

    • gcs_client: Google Cloud Storage client.
    • bucket_name: Name of the bucket on GCS where the file is stored.
    • destination_blob_name: Path in the bucket of the file to read.
  • Returns:

    • object: The object read from the file.
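
How the downloaded bytes become "the object" is not documented here; one plausible approach is dispatching on the blob's extension. The helper below is a guess at that behavior, not the library's code:

```python
import json
import pickle

def deserialize_blob(raw: bytes, blob_name: str):
    """Guess at how read_gcs_file might turn downloaded bytes into an
    object based on the blob's extension (an assumption about the
    library, not its documented behavior)."""
    if blob_name.endswith(".json"):
        return json.loads(raw)
    if blob_name.endswith(".pkl"):
        return pickle.loads(raw)
    return raw.decode("utf-8")

print(deserialize_blob(b'{"threshold": 0.5}', "config/params.json"))
```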

save_gcs_file()

Saves content to a specific path on Google Cloud Storage.

  • Parameters:
    • gcs_client: Google Cloud Storage client.
    • bucket_name: Name of the bucket on GCS where the file will be saved.
    • destination_blob_name: Path in the bucket to save the file.
    • content: The content to be saved.
    • content_type: The MIME type of the content (e.g., 'text/html' or 'application/json').
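
The content_type values are standard MIME types; Python's stdlib mimetypes module can derive one from the destination path, which is a convenient way to fill the parameter (the plain-text fallback below is an assumption, not the library's default):

```python
import mimetypes

def guess_content_type(destination_blob_name: str) -> str:
    """Derive a MIME type for save_gcs_file's content_type parameter
    from the blob path, falling back to plain text (assumed default)."""
    content_type, _ = mimetypes.guess_type(destination_blob_name)
    return content_type or "text/plain"

print(guess_content_type("reports/summary.html"))  # text/html
```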

tester.py

test_data()

Tests data for issues using a test suite.

  • Parameters:

    • current_data: Current data to test.
    • reference_data: Reference data.
    • config_file: Tests configuration file.
    • stage: Stage of the pipeline ('test_input' or 'test_output').
  • Returns:

    • pd.DataFrame: Test results.
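
The test suite itself is driven by config_file and is not documented here; as a rough sketch under that caveat, stage-tagged per-check results might look like the rows below (the check names and result layout are assumptions, and the real function returns a pd.DataFrame):

```python
def run_basic_checks(current_data, reference_data, stage):
    """Sketch of the kind of per-check results test_data might return
    (hypothetical checks; data is a list of row dicts here for brevity)."""
    results = []
    # Check 1: the current batch is not empty.
    results.append({
        "stage": stage,
        "test": "row_count_nonzero",
        "passed": len(current_data) > 0,
    })
    # Check 2: current columns match the reference schema.
    results.append({
        "stage": stage,
        "test": "columns_match_reference",
        "passed": set(current_data[0]) == set(reference_data[0]),
    })
    return results

rows = [{"id": 1, "value": 3.2}, {"id": 2, "value": 4.8}]
print(run_basic_checks(rows, rows, stage="test_input"))
```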

check_data_drift()

Checks data for drift.

  • Parameters:

    • current_data: Current data to check.
    • reference_data: Reference data.
    • config_file: Tests configuration file.
  • Returns:

    • pd.DataFrame: Test results.
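
To make the idea of a drift check concrete, here is a deliberately simple stand-in that flags drift when the current mean strays too far from the reference mean; the library's config-driven tests are presumably more sophisticated than this:

```python
import statistics

def mean_shift_drift(current, reference, threshold=2.0):
    """Toy drift check: flag drift when the current mean deviates from
    the reference mean by more than `threshold` reference standard
    deviations (a simple illustration, not the library's method)."""
    ref_mean = statistics.mean(reference)
    ref_std = statistics.stdev(reference)
    shift = abs(statistics.mean(current) - ref_mean)
    return shift > threshold * ref_std

reference = [10.0, 10.5, 9.8, 10.2, 9.9]
print(mean_shift_drift([10.1, 9.9, 10.3], reference))   # False: no drift
print(mean_shift_drift([14.8, 15.2, 15.0], reference))  # True: drifted
```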

send_email_with_table()

Sends an email with an HTML table.

  • Parameters:
    • credentials_frame: DataFrame with credentials.
    • subject: Subject of the email.
    • html_table: HTML table to include in the email body.
    • receiver_email: Email address to send the email to.
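
The transport side depends on credentials_frame, but the message itself can be built with the stdlib email package. A sketch of that assembly step (the structure is assumed, the addresses are placeholders, and actually sending via smtplib would need real credentials):

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def build_email(subject, html_table, sender_email, receiver_email):
    """Assemble the HTML message send_email_with_table presumably sends
    (structure assumed; the real function reads credentials_frame and
    pushes the message through an SMTP server)."""
    message = MIMEMultipart("alternative")
    message["Subject"] = subject
    message["From"] = sender_email
    message["To"] = receiver_email
    message.attach(MIMEText(html_table, "html"))
    return message

msg = build_email(
    "Data quality report",
    "<table><tr><td>all tests passed</td></tr></table>",
    "pipeline@example.com",
    "ds-team@example.com",
)
print(msg["Subject"])
```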

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pipelinesds-0.0.2.tar.gz (5.0 kB)

Built Distribution

pipelinesds-0.0.2-py3-none-any.whl (4.9 kB)

File details

Details for the file pipelinesds-0.0.2.tar.gz.

File metadata

  • Download URL: pipelinesds-0.0.2.tar.gz
  • Upload date:
  • Size: 5.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.13.0

File hashes

Hashes for pipelinesds-0.0.2.tar.gz

  • SHA256: 0ce637a351946825cd35abfcbe4302288fb021b0d4309b0dbd2581f63c6cd197
  • MD5: 39d2bda9956b782465d04464fbe66960
  • BLAKE2b-256: 05fcded810c3fcf8301bc78b2605ba9b75cd80c0e60618d2ad61c2d8b9ada3c3


File details

Details for the file pipelinesds-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: pipelinesds-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 4.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.13.0

File hashes

Hashes for pipelinesds-0.0.2-py3-none-any.whl

  • SHA256: 01d75cbe400617c45ff1e4537a9fa7040735e4e45399947b14b2962fbdcbc0c8
  • MD5: 3df89218f1ec4c32dbce93bf29efe88d
  • BLAKE2b-256: f853bfb20d2755af1b663aa6623ff064471654df5be72c2011e8c4a6b333507c

