Solution for DS Team
Project description
pipelinesds
Pipelinesds is a library of helper functions for use in Kubeflow pipelines. It includes the following modules:
pipeliner.py
get_results_from_bq()
Returns SQL results from BigQuery as a DataFrame.
- Parameters:
  - bq_client: BigQuery client.
  - bq_storage_client: BigQuery Storage client.
  - table: Name of the table/view to get data from.
  - where_clause: Optional SQL WHERE clause to filter data.
- Returns:
  - pd.DataFrame: Data from the view/table.
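Below is a minimal sketch of how get_results_from_bq() could be built on google-cloud-bigquery. It is not the library's actual source. It assumes two things:
- `where_clause` is passed without the `WHERE` keyword.
- `bq_storage_client` is a `google.cloud.bigquery_storage.BigQueryReadClient` (or `None`). `QueryJob.to_dataframe()` uses it to download results through the BigQuery Storage Read API.

The `build_select_query()` helper is hypothetical.

```python
from typing import Any, Optional


def build_select_query(table: str, where_clause: Optional[str] = None) -> str:
    """Hypothetical helper: build a SELECT for a fully qualified table/view.

    Assumes where_clause is given without the WHERE keyword.
    """
    query = f"SELECT * FROM `{table}`"
    if where_clause:
        query += f" WHERE {where_clause}"
    return query


def get_results_from_bq(bq_client: Any, bq_storage_client: Any, table: str,
                        where_clause: Optional[str] = None) -> Any:
    """Sketch: run the query and return the rows as a pandas DataFrame.

    bq_client is expected to be a google.cloud.bigquery.Client; passing
    bq_storage_client=None falls back to the slower REST download.
    """
    query_job = bq_client.query(build_select_query(table, where_clause))
    # to_dataframe() waits for the job to finish and downloads all rows
    return query_job.to_dataframe(bqstorage_client=bq_storage_client)
```

The sketch pastes `where_clause` into the SQL text, so the clause should only come from trusted pipeline configuration. For external values, query parameters (`bigquery.QueryJobConfig(query_parameters=[...])`) are the safer choice.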
delete_old_data()
Deletes old data from a BigQuery table.
- Parameters:
  - bq_client: BigQuery client.
  - table: Name of the table/view to delete data from.
  - where_clause: SQL WHERE clause to filter data for deletion.
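The delete step might look like the sketch below. It again assumes `where_clause` comes without the `WHERE` keyword, and `build_delete_query()` is a hypothetical helper. Two BigQuery rules shape the sketch:
- A DML `DELETE` must have a `WHERE` clause.
- A DML `DELETE` can only target tables, not views.

A typical filter would be something like `created_at < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)`, where `created_at` is a placeholder column name.

```python
from typing import Any


def build_delete_query(table: str, where_clause: str) -> str:
    """Hypothetical helper: build a DELETE statement for a BigQuery table.

    BigQuery rejects DELETE without WHERE, so an empty filter is treated as
    a caller error here instead of being sent to the server.
    """
    if not where_clause or not where_clause.strip():
        raise ValueError("where_clause must not be empty")
    return f"DELETE FROM `{table}` WHERE {where_clause}"


def delete_old_data(bq_client: Any, table: str, where_clause: str) -> None:
    """Sketch: run the DELETE and wait for it to complete."""
    query_job = bq_client.query(build_delete_query(table, where_clause))
    # result() blocks until the DML job finishes and raises if it failed
    query_job.result()
```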
write_dataframe_to_bq()
Writes a DataFrame to a BigQuery table.
- Parameters:
  - bq_client: BigQuery client.
  - df: DataFrame to write.
  - table_id: Table in BigQuery to write the DataFrame to.
  - write_disposition: Type of write operation ('WRITE_APPEND', 'WRITE_TRUNCATE', or 'WRITE_EMPTY').
  - job_config: Configuration for the load job.
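A possible shape for write_dataframe_to_bq() uses `Client.load_table_from_dataframe()` from google-cloud-bigquery, which needs pyarrow installed. The description does not say how the real function combines `write_disposition` with `job_config`. This sketch assumes:
- The explicit `write_disposition` argument overrides whatever is already set on `job_config`.
- A fresh `bigquery.LoadJobConfig` is created when no `job_config` is passed.

The default values and the `check_write_disposition()` helper are hypothetical.

```python
from typing import Any, Optional

# The three modes listed in the parameter description
VALID_DISPOSITIONS = {"WRITE_APPEND", "WRITE_TRUNCATE", "WRITE_EMPTY"}


def check_write_disposition(write_disposition: str) -> str:
    """Hypothetical helper: normalise and validate the write mode."""
    mode = write_disposition.upper()
    if mode not in VALID_DISPOSITIONS:
        raise ValueError(f"unsupported write_disposition: {write_disposition!r}")
    return mode


def write_dataframe_to_bq(bq_client: Any, df: Any, table_id: str,
                          write_disposition: str = "WRITE_APPEND",
                          job_config: Optional[Any] = None) -> None:
    """Sketch: load df into table_id and wait for the load job to finish."""
    if job_config is None:
        # Imported lazily so the helper above works without the library
        from google.cloud import bigquery
        job_config = bigquery.LoadJobConfig()
    # Assumption: the explicit argument wins over any preset on job_config
    job_config.write_disposition = check_write_disposition(write_disposition)
    load_job = bq_client.load_table_from_dataframe(df, table_id, job_config=job_config)
    load_job.result()  # raises if the load failed
```

The three modes behave differently:
- `WRITE_TRUNCATE` replaces the existing rows.
- `WRITE_APPEND` adds to the existing rows.
- `WRITE_EMPTY` makes the load fail if the table already contains data.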
read_gcs_file()
Reads a file from a specific path on Google Cloud Storage.
- Parameters:
  - gcs_client: Google Cloud Storage client.
  - bucket_name: Name of the bucket on GCS where the file is stored.
  - destination_blob_name: Path of the file in the bucket.
- Returns:
  - object: The object read from the file.
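The description only says that read_gcs_file() returns "the object read from the file", so the decoding step below is an assumption. It uses a hypothetical `decode_blob_content()` that picks JSON, pickle, or UTF-8 text by file extension. The download itself uses `Bucket.blob()` and `Blob.download_as_bytes()` from google-cloud-storage.

```python
import json
import pickle
from typing import Any


def decode_blob_content(blob_name: str, data: bytes) -> Any:
    """Hypothetical helper: turn raw bytes into a Python object.

    Convention assumed here: .json -> parsed JSON, .pkl/.pickle -> unpickled
    object, anything else -> UTF-8 text.
    """
    if blob_name.endswith(".json"):
        return json.loads(data)
    if blob_name.endswith((".pkl", ".pickle")):
        # Only unpickle files your own pipeline wrote: pickle can execute code
        return pickle.loads(data)
    return data.decode("utf-8")


def read_gcs_file(gcs_client: Any, bucket_name: str, destination_blob_name: str) -> Any:
    """Sketch: download gs://bucket_name/destination_blob_name and decode it."""
    blob = gcs_client.bucket(bucket_name).blob(destination_blob_name)
    data = blob.download_as_bytes()
    return decode_blob_content(destination_blob_name, data)
```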
save_gcs_file()
Saves content to a specific path on Google Cloud Storage.
- Parameters:
  - gcs_client: Google Cloud Storage client.
  - bucket_name: Name of the bucket on GCS where the file will be saved.
  - destination_blob_name: Path in the bucket to save the file.
  - content: The content to be saved.
  - content_type: The MIME type of the content (e.g., 'text/html' or 'application/json').
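A matching sketch for save_gcs_file() is built on `Blob.upload_from_string()` from google-cloud-storage. That method accepts `str` or `bytes` and stores `content_type` as the object's Content-Type metadata. This is an illustration, not the package's own code.

```python
from typing import Any, Union


def save_gcs_file(gcs_client: Any, bucket_name: str, destination_blob_name: str,
                  content: Union[str, bytes], content_type: str) -> None:
    """Sketch: upload content to gs://bucket_name/destination_blob_name."""
    blob = gcs_client.bucket(bucket_name).blob(destination_blob_name)
    # str content is encoded as UTF-8; content_type becomes the object's
    # Content-Type, e.g. 'text/html' for a report viewed in the browser
    blob.upload_from_string(content, content_type=content_type)
```

To store a Python object as JSON, for example, the caller would pass `json.dumps(obj)` together with `content_type='application/json'`.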
tester.py
test_data()
Tests data for issues using a test suite.
- Parameters:
  - current_data: Current data to test.
  - reference_data: Reference data to compare the current data against.
  - config_file: Test configuration file.
  - stage: Stage of the pipeline ('test_input' or 'test_output').
- Returns:
  - pd.DataFrame: Test results.
check_data_drift()
Checks data for drift.
- Parameters:
  - current_data: Current data to check.
  - reference_data: Reference data to compare the current data against.
  - config_file: Test configuration file.
- Returns:
  - pd.DataFrame: Test results.
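The description does not say which drift tests check_data_drift() runs, since they come from `config_file`. As a stand-in, the sketch below applies one common drift measure to numeric columns: the two-sample Kolmogorov-Smirnov statistic, which is the largest gap between the two empirical CDFs. The function names, the dict-of-lists input, and the 0.1 threshold are all hypothetical. The sketch sticks to the standard library; `pd.DataFrame(rows)` would turn its output into a results table like the one documented above.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def ks_statistic(current: Sequence[float], reference: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the two ECDFs."""
    cur, ref = sorted(current), sorted(reference)
    if not cur or not ref:
        raise ValueError("both samples must be non-empty")
    n, m = len(cur), len(ref)
    gap = 0.0
    # ECDFs only change at observed values, so evaluating there is enough
    for x in set(cur) | set(ref):
        gap = max(gap, abs(bisect_right(cur, x) / n - bisect_right(ref, x) / m))
    return gap


def check_columns_for_drift(current: Dict[str, Sequence[float]],
                            reference: Dict[str, Sequence[float]],
                            threshold: float = 0.1) -> List[dict]:
    """Hypothetical check: flag shared columns whose KS statistic exceeds threshold."""
    rows = []
    for column in sorted(set(current) & set(reference)):
        stat = ks_statistic(current[column], reference[column])
        rows.append({"column": column, "ks_statistic": stat,
                     "drift_detected": stat > threshold})
    return rows
```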
send_email_with_table()
Sends an email with an HTML table.
- Parameters:
  - credentials_frame: DataFrame with credentials.
  - subject: Subject of the email.
  - html_table: Data to send in the email.
  - receiver_email: Email address to send the email to.
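One way send_email_with_table() could work is sketched below with the standard-library `smtplib` and `email` modules. Several details are hypothetical:
- The column names read from `credentials_frame`: `sender_email`, `password`, `smtp_server`, and `smtp_port`.
- The `build_html_email()` helper.
- The use of SMTP over SSL.

The message carries a plain-text fallback plus the HTML table as an alternative part.

```python
import smtplib
import ssl
from email.message import EmailMessage
from typing import Any


def build_html_email(sender: str, receiver: str, subject: str,
                     html_table: str) -> EmailMessage:
    """Hypothetical helper: build a multipart message with an HTML body."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = receiver
    # Plain-text part for mail clients that do not render HTML
    msg.set_content("This message contains an HTML table.")
    # HTML part; together with the text part this makes multipart/alternative
    msg.add_alternative(html_table, subtype="html")
    return msg


def send_email_with_table(credentials_frame: Any, subject: str,
                          html_table: str, receiver_email: str) -> None:
    """Sketch: send html_table using SMTP credentials from the first row."""
    # Hypothetical column names; adapt them to the real credentials frame
    creds = credentials_frame.iloc[0]
    msg = build_html_email(creds["sender_email"], receiver_email,
                           subject, html_table)
    context = ssl.create_default_context()
    with smtplib.SMTP_SSL(creds["smtp_server"], int(creds["smtp_port"]),
                          context=context) as server:
        server.login(creds["sender_email"], creds["password"])
        server.send_message(msg)
```

If the table starts out as a pandas DataFrame, `df.to_html(index=False)` produces a suitable `html_table` string.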
Download files
- Source distribution: pipelinesds-0.0.3.tar.gz
- Built distribution: pipelinesds-0.0.3-py3-none-any.whl
File details
Details for the file pipelinesds-0.0.3.tar.gz.
File metadata
- Download URL: pipelinesds-0.0.3.tar.gz
- Upload date:
- Size: 5.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.11.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b648d7e4f929415b04b1cf72c8045ce30a114a9d04d5dc68ddac5f9a0f7ed014 |
| MD5 | 47f5a75d44db69ad15cb39eae6517595 |
| BLAKE2b-256 | d45cde818c247ca9655c0f0f4df93b9ec287052ef3543f78f060a44783a067c8 |
File details
Details for the file pipelinesds-0.0.3-py3-none-any.whl.
File metadata
- Download URL: pipelinesds-0.0.3-py3-none-any.whl
- Upload date:
- Size: 5.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.11.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e91847004ae83221659746b1069ecfef36869f63e96f2c48a36216c3f7c2795a |
| MD5 | 10a041582b18cabcc20930b47a7ac26e |
| BLAKE2b-256 | d3617234a6b5f99fa1a7001562d5372ce1a968182baf388f816d6cc88fb73cc9 |