A recipe for every data baker
DataRecipe
Overview
This toolkit provides a variety of Python functions to facilitate common data manipulation, data import/export, and database operations.
Functions
General Features
send_email
Sends an email using SMTP with SSL/TLS options, supporting attachments if provided.
- Parameters:
  - subject: Email subject as a string.
  - body: Main content of the email.
  - send_email_address: Sender's email address.
  - send_email_password: Sender's email password for SMTP authentication.
  - receive_email_address: Recipient's email address.
  - attachment_path: Directory path where attachments are stored (optional).
  - attachment_list: List of filenames to be attached (optional).
  - smtp_address: SMTP server address (default: 'smtp.feishu.cn').
  - smtp_port: SMTP server port (default: 465).
Example with Attachments:
send_email(
    "Meeting Documents",
    "Please see attached documents for the upcoming meeting.",
    "sender@example.com",
    "password123",
    "receiver@example.com",
    attachment_path="/path/to/documents",
    attachment_list=["agenda.pdf", "minutes.docx"],
)
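For readers curious how a helper of this kind might work, here is a minimal sketch built only on Python's standard library (`email.message.EmailMessage` and `smtplib.SMTP_SSL`). It is not DataRecipe's implementation: the function names `build_message` and `send_message` are hypothetical, and the defaults simply mirror the ones documented above.

```python
import mimetypes
import smtplib
from email.message import EmailMessage
from pathlib import Path


def build_message(subject, body, sender, recipient,
                  attachment_path=None, attachment_list=None):
    """Assemble an email, attaching each listed file found under attachment_path."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(body)

    for name in attachment_list or []:
        file_path = Path(attachment_path or ".") / name
        # Guess the MIME type from the file name; fall back to generic binary.
        ctype, _ = mimetypes.guess_type(file_path.name)
        maintype, subtype = (ctype or "application/octet-stream").split("/", 1)
        msg.add_attachment(file_path.read_bytes(), maintype=maintype,
                           subtype=subtype, filename=name)
    return msg


def send_message(msg, password, smtp_address="smtp.feishu.cn", smtp_port=465):
    """Send msg over an implicit-SSL SMTP connection (hypothetical helper)."""
    with smtplib.SMTP_SSL(smtp_address, smtp_port) as server:
        server.login(msg["From"], password)
        server.send_message(msg)
```

Splitting message construction from sending keeps the first half testable without a network connection or real credentials.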
Data Validation and Cleaning
check_empty
Checks for empty entries in specified DataFrame columns.
- Parameters:
  - df: DataFrame to check.
  - columns: Columns to check for missing values.
  - output_cols: Columns to include in the output.
Example:
empty_data = check_empty(df, columns=["name", "email"])
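The check described above can be approximated in plain pandas. The sketch below is an illustration, not DataRecipe's code: `find_empty_rows` is a hypothetical name, and treating whitespace-only strings as empty is an assumption of this example rather than documented behavior.

```python
import pandas as pd


def _is_empty(value):
    """True for NaN/None and for strings that are blank after stripping."""
    return bool(pd.isna(value)) or (isinstance(value, str) and value.strip() == "")


def find_empty_rows(df, columns, output_cols=None):
    """Return rows where any of `columns` holds an empty entry."""
    mask = df[columns].apply(lambda col: col.map(_is_empty)).any(axis=1)
    result = df.loc[mask]
    # Optionally narrow the output to the requested columns.
    return result[output_cols] if output_cols else result
```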
clean_dataframe
Cleans a DataFrame by replacing infinite values with NaN.
- Parameters:
  - df: DataFrame to clean.
Example:
clean_dataframe(df)
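The same replacement can be written directly with pandas and NumPy. This is a sketch under the assumption that a new DataFrame is returned; the text above does not say whether `clean_dataframe` modifies its argument in place, and `replace_infinite` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def replace_infinite(df):
    """Return a copy of df with +inf and -inf replaced by NaN."""
    return df.replace([np.inf, -np.inf], np.nan)
```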
Data Import/Export
local_to_df
Converts files from a local directory to a pandas DataFrame.
- Parameters:
  - path: Directory path to search for files.
  - partial_file_name: File name pattern to match.
  - skip_rows: Number of rows to skip at the start of each file.
  - keep_file_name: If True, adds a column with the file name.
  - sheet_num: For Excel files, specifies the sheet number to read.
  - encoding: Character encoding of the files.
Example with CSV files:
df = local_to_df("./data", "sample", keep_file_name=True)
Example with Excel files:
df = local_to_df("./data", "report", sheet_num=2, encoding='utf-8')
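As a rough illustration of the idea for the CSV case only, the sketch below collects matching files with `pathlib` and stacks them with `pandas.concat`. It is not the library's implementation: `read_matching_csvs` and the `file_name` column are hypothetical, and matching by substring is an assumption about what "pattern" means here.

```python
from pathlib import Path

import pandas as pd


def read_matching_csvs(path, partial_file_name, skip_rows=0,
                       keep_file_name=False, encoding="utf-8"):
    """Read every CSV in `path` whose name contains `partial_file_name`."""
    frames = []
    for file in sorted(Path(path).glob(f"*{partial_file_name}*.csv")):
        frame = pd.read_csv(file, skiprows=skip_rows, encoding=encoding)
        if keep_file_name:
            frame["file_name"] = file.name  # column name chosen for this sketch
        frames.append(frame)
    if not frames:
        raise FileNotFoundError(
            f"no CSV files matching {partial_file_name!r} in {path}"
        )
    # Stack all files into one frame with a fresh 0..n-1 index.
    return pd.concat(frames, ignore_index=True)
```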
df_to_xlsx
Saves a DataFrame to an Excel file.
- Parameters:
  - df: DataFrame to save.
  - directory_path: Path to directory where the file will be saved.
  - file_name: Name of the output file.
Example:
df_to_xlsx(df, "./output", "output_data")
df_to_csv
Saves a DataFrame to a CSV file.
- Parameters:
  - df: DataFrame to save.
  - directory_path: Path to directory where the file will be saved.
  - file_name: Name of the output file.
Example:
df_to_csv(df, "./output", "output_data")
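A comparable export helper can be sketched with `pathlib` and `DataFrame.to_csv`. Two things here are assumptions, not documented behavior: that the `.csv` extension is appended to `file_name` (suggested by the example passing "output_data" without one) and that a missing directory is created. `save_csv` is a hypothetical name. The same pattern would apply to Excel output via `DataFrame.to_excel`, which additionally needs an Excel engine such as openpyxl installed.

```python
from pathlib import Path

import pandas as pd


def save_csv(df, directory_path, file_name):
    """Write df to <directory_path>/<file_name>.csv and return the path."""
    directory = Path(directory_path)
    directory.mkdir(parents=True, exist_ok=True)  # create the folder if absent
    target = directory / f"{file_name}.csv"
    df.to_csv(target, index=False)  # omit the index column from the file
    return target
```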
Database Operations
update
Updates records in a database table based on conditions.
- Parameters:
  - raw_df: DataFrame containing new data to update.
  - database: Database name.
  - table: Table name.
  - yaml_file_name: YAML file name with DB configuration.
  - clause: SQL clause for record deletion.
  - date_col: Column name containing date data.
  - custom_path: Path to directory containing the YAML file.
Example:
update(df, "test_db", "user_data", clause="user_id > 10")
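Because `clause` is described as a clause for record deletion, one plausible reading is a delete-then-insert pattern: remove the rows matching the clause, then append the new data. The sketch below shows that pattern against SQLite from the standard library. This is an inference, not confirmed behavior: `replace_rows` is a hypothetical name, SQLite stands in for whichever database DataRecipe targets, and the YAML configuration and `date_col` handling are not modeled.

```python
import sqlite3

import pandas as pd


def replace_rows(conn, table, new_rows, clause):
    """Delete rows matching `clause`, then append `new_rows` to `table`."""
    # `table` and `clause` are interpolated into SQL verbatim, so they must
    # come from trusted code, never from user input.
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(f"DELETE FROM {table} WHERE {clause}")
        new_rows.to_sql(table, conn, if_exists="append", index=False)
```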
sql_query
Executes a SELECT SQL query and returns a DataFrame.
- Parameters:
  - database: Database name.
  - sql: SQL SELECT statement.
  - yaml_file_name: YAML file name with DB configuration.
  - custom_path: Optional path to directory containing the YAML file.
Example:
result_df = sql_query("test_db", "SELECT * FROM users")
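The query-to-DataFrame step can be illustrated with `pandas.read_sql_query`. This sketch assumes an already open SQLite connection instead of the YAML-based configuration described above, and `select_to_df` is a hypothetical name. The SELECT-only guard is a design choice of this example, not documented behavior.

```python
import sqlite3

import pandas as pd


def select_to_df(conn, sql):
    """Run a SELECT statement and return the result as a DataFrame."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are accepted")
    return pd.read_sql_query(sql, conn)
```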
Contact Information
For any questions or suggestions regarding the toolkit, please contact us at:
- Email: HanfanC@outlook.com
- GitHub: DataRecipe GitHub Repository