flowerpower_io Library Documentation

1. Introduction

flowerpower_io is an extensible Python library for data input/output (I/O). It provides a unified interface to a range of file formats and database systems, hiding format-specific and backend-specific handling behind a common set of reader and writer classes. The library focuses on moving data to and from common in-memory structures such as Pandas DataFrames, Polars DataFrames, and PyArrow Tables.

2. Core Class Hierarchy and Functionalities

The library's core is built around a clear hierarchy of base classes defined in src/flowerpower_io/base.py, which are then extended by specialized classes in the loader/ and saver/ modules.

  • BaseFileIO: This is the foundational class for all file-based I/O operations. It handles common functionalities like path resolution, filesystem initialization, and storage option management across various storage backends (e.g., local, S3, GCS, Azure, GitHub, GitLab).

    • BaseFileReader (inherits from BaseFileIO): Provides the fundamental methods for reading data from files. It supports various output formats (Pandas, Polars, PyArrow, DuckDB, DataFusion) and features like batch processing, file path inclusion, and concatenation of multiple files.
    • BaseDatasetReader (inherits from BaseFileReader): Extends BaseFileReader for handling dataset-specific reading operations, particularly for partitioned datasets. It supports PyArrow Dataset and Pydala Dataset conversions, offering more advanced dataset-level functionalities.
    • BaseFileWriter (inherits from BaseFileIO): Defines the core logic for writing data to files. It manages output basename, concatenation, uniqueness constraints, and various write modes (append, overwrite, delete_matching, error_if_exists).
    • BaseDatasetWriter (inherits from BaseFileWriter): Specializes BaseFileWriter for writing data as datasets, supporting partitioning, compression, and fine-grained control over file and row group sizes. It also integrates with Pydala for advanced dataset writing.
  • BaseDatabaseIO: This is the foundational class for all database I/O operations. It manages database connection strings, credentials, and provides methods for connecting to various SQL and NoSQL databases.

    • BaseDatabaseReader (inherits from BaseDatabaseIO): Provides methods for reading data from relational and non-relational databases into various DataFrame formats (Polars, Pandas, PyArrow) and integrates with DuckDB and DataFusion for SQL query execution.
    • BaseDatabaseWriter (inherits from BaseDatabaseIO): Defines the core logic for writing data to databases, supporting different write modes (append, replace, fail) and handling data conversion for various database types.
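
The layering described above can be illustrated with a small, self-contained sketch. It is not the library's implementation: the class names mirror the hierarchy, but the method names (_read_rows, to_records) and the constructor arguments are assumptions made for illustration, and only the standard library is used. The point is the division of labour: the base class holds path and storage options, the reader base class defines the shared entry point, and the format-specific subclass contributes only the parsing.

```python
import csv
import tempfile
from pathlib import Path


class BaseFileIO:
    """Toy stand-in: holds the path and storage options shared by readers and writers."""

    def __init__(self, path, storage_options=None):
        self.path = Path(path)
        self.storage_options = storage_options or {}


class BaseFileReader(BaseFileIO):
    """Toy stand-in: subclasses implement _read_rows(); to_records() is the shared entry point."""

    def _read_rows(self):
        raise NotImplementedError

    def to_records(self):
        return list(self._read_rows())


class CSVFileReader(BaseFileReader):
    """Format-specific subclass: only the CSV parsing lives here."""

    def _read_rows(self):
        with self.path.open(newline="") as handle:
            yield from csv.DictReader(handle)


# Write a small CSV file to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "data.csv"
    sample.write_text("col1,col2\n1,A\n2,B\n")
    records = CSVFileReader(sample).to_records()

print(records)
```

Supporting another format in this sketch means adding one subclass with its own _read_rows(); the shared logic in the base classes stays untouched.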

3. Supported Data Formats and Storage Backends

flowerpower_io supports a wide array of data formats and integrates with various storage solutions, categorized by their respective loader and saver classes:

File-based I/O

  • CSV: CSVFileReader, CSVDatasetReader, CSVFileWriter, CSVDatasetWriter
  • Parquet: ParquetFileReader, ParquetDatasetReader, ParquetFileWriter, ParquetDatasetWriter, PydalaDatasetReader, PydalaDatasetWriter
  • JSON: JsonFileReader, JsonDatasetReader, JsonFileWriter, JsonDatasetWriter
  • DeltaTable: DeltaTableReader, DeltaTableWriter
  • MQTT (Loader Only): PayloadReader (for consuming MQTT payloads)

Supported Storage Backends (via fsspec_utils):

  • Local filesystem
  • AWS S3
  • Google Cloud Storage (GCS)
  • Azure Blob Storage
  • GitHub
  • GitLab
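
As a rough illustration of how one path argument can address different backends, the sketch below maps a URL scheme to a backend name using only the standard library. The mapping table and the backend_for helper are hypothetical; the schemes that flowerpower_io and fsspec_utils actually accept are not stated here and may differ.

```python
from urllib.parse import urlsplit

# Illustrative mapping only (an assumption, not the library's table).
BACKENDS = {
    "": "local",
    "file": "local",
    "s3": "AWS S3",
    "gs": "Google Cloud Storage",
    "az": "Azure Blob Storage",
}


def backend_for(path: str) -> str:
    """Return a backend name for the path based on its URL scheme."""
    scheme = urlsplit(path).scheme
    return BACKENDS.get(scheme, f"unknown ({scheme})")


for example in ("output/new_data.csv", "s3://your-bucket/your-parquet-dataset/"):
    print(example, "->", backend_for(example))
```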

Database I/O

  • SQLite: SQLiteReader, SQLiteWriter
  • DuckDB: DuckDBReader, DuckDBWriter
  • PostgreSQL: PostgreSQLReader, PostgreSQLWriter
  • MySQL: MySQLReader, MySQLWriter
  • Microsoft SQL Server (MSSQL): MSSQLReader, MSSQLWriter
  • Oracle: OracleDBReader, OracleDBWriter
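
The database write modes mentioned for BaseDatabaseWriter (append, replace, fail) can be sketched with the standard library's sqlite3 module. This assumes semantics similar to the if_exists options of pandas' DataFrame.to_sql; whether flowerpower_io behaves identically is not confirmed here. The write_rows helper and the fixed two-column table are hypothetical.

```python
import sqlite3


def write_rows(conn, table, rows, mode="append"):
    """Write (event_id, event_name) rows using 'append', 'replace', or 'fail' semantics."""
    if mode not in {"append", "replace", "fail"}:
        raise ValueError(f"unknown mode: {mode!r}")
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone() is not None
    if exists and mode == "fail":
        raise ValueError(f"table {table!r} already exists")
    if exists and mode == "replace":
        # Replace drops the existing table so that only the new rows remain.
        conn.execute(f'DROP TABLE "{table}"')
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{table}" (event_id INTEGER, event_name TEXT)'
    )
    conn.executemany(f'INSERT INTO "{table}" VALUES (?, ?)', rows)
    conn.commit()


def count(conn, table):
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


conn = sqlite3.connect(":memory:")
write_rows(conn, "events", [(101, "login")], mode="append")
write_rows(conn, "events", [(102, "logout")], mode="append")
after_append = count(conn, "events")

write_rows(conn, "events", [(103, "signup")], mode="replace")
after_replace = count(conn, "events")

try:
    write_rows(conn, "events", [(104, "delete")], mode="fail")
    fail_raised = False
except ValueError:
    fail_raised = True

print(after_append, after_replace, fail_raised)
```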

4. Metadata Handling

The src/flowerpower_io/metadata.py module collects and manages metadata during I/O operations. This includes:

  • Schema information
  • Row and column counts
  • File paths
  • Timestamps of operations
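
A minimal sketch of what such a metadata record could look like is shown below. The IOMetadata dataclass, its field names, and the collect_metadata helper are assumptions for illustration; the actual structure returned by the library's metadata module may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class IOMetadata:
    """Hypothetical metadata record covering the four items listed above."""

    path: str
    schema: dict
    num_rows: int
    num_columns: int
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def collect_metadata(path, rows):
    """Derive schema and counts from a list of row dictionaries."""
    # The schema is inferred from the first row only, which is enough for a sketch.
    schema = {name: type(value).__name__ for name, value in rows[0].items()} if rows else {}
    return IOMetadata(path=path, schema=schema, num_rows=len(rows), num_columns=len(schema))


rows = [{"col1": 1, "col2": "A"}, {"col1": 2, "col2": "B"}, {"col1": 3, "col2": "C"}]
meta = collect_metadata("output/new_data.csv", rows)
print(meta)
```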

5. Module Structure and Dependencies

The library's design emphasizes separation of concerns, ensuring modularity and extensibility:

  • src/flowerpower_io/base.py: Defines the fundamental abstract classes for I/O operations.
  • src/flowerpower_io/metadata.py: Handles metadata collection and management.
  • src/flowerpower_io/loader/: Contains specific implementations for reading data from various sources and formats.
  • src/flowerpower_io/saver/: Contains specific implementations for writing data to various destinations and formats.
  • fsspec_utils: Orchestrates file system interactions, providing a unified interface for different storage backends.

6. Usage Examples

The following examples illustrate common usage patterns for flowerpower_io:

import pandas as pd
import polars as pl
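# JsonFileReader (Example 7) and DeltaTableWriter (Example 8) must be imported as well
from flowerpower_io.loader import CSVFileReader, JsonFileReader, ParquetDatasetReader, SQLiteReader
from flowerpower_io.saver import CSVFileWriter, DeltaTableWriter, ParquetDatasetWriter, PostgreSQLWriter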

# Example 1: Reading a CSV file into a Pandas DataFrame
csv_loader = CSVFileReader(path="path/to/your/data.csv")
df_pandas = csv_loader.to_pandas()
print("CSV data (Pandas):")
print(df_pandas.head())

# Example 2: Writing a Polars DataFrame to a CSV file
data_polars = pl.DataFrame({
    "col1": [1, 2, 3],
    "col2": ["A", "B", "C"]
})
csv_writer = CSVFileWriter(path="output/new_data.csv")
csv_writer.write(data=data_polars)
print("\nPolars DataFrame written to output/new_data.csv")

# Example 3: Reading a Parquet dataset into a Polars LazyFrame
# Assuming the dataset on S3 uses Hive-style partitioning (e.g. year=2023/month=01/)
parquet_dataset_loader = ParquetDatasetReader(
    path="s3://your-bucket/your-parquet-dataset/",
    format="parquet",
    storage_options={"key": "YOUR_AWS_ACCESS_KEY", "secret": "YOUR_AWS_SECRET_KEY"},
    partitioning="hive" # Or specify list of column names, e.g., ["year", "month"]
)
lf_polars = parquet_dataset_loader.to_polars(lazy=True)
print("\nParquet dataset loaded as Polars LazyFrame (first 5 rows after collection):")
print(lf_polars.limit(5).collect())

# Example 4: Writing a Pandas DataFrame to a Parquet dataset with partitioning
df_to_save = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "value": ["foo", "bar", "baz", "qux"],
    "year": [2023, 2023, 2024, 2024]
})
parquet_dataset_writer = ParquetDatasetWriter(
    path="output/partitioned_data/",
    format="parquet",
    partition_by="year",
    mode="overwrite"
)
parquet_dataset_writer.write(data=df_to_save)
print("\nPandas DataFrame written to partitioned Parquet dataset in output/partitioned_data/")

# Example 5: Reading from a SQLite database into a Polars DataFrame
# Ensure the database file at the given path and the table 'your_table_name' exist
sqlite_reader = SQLiteReader(table_name="your_table_name", path="path/to/your/database.db")
db_data_polars = sqlite_reader.to_polars()
print("\nData from SQLite (Polars):")
print(db_data_polars.head())

# Example 6: Writing a PyArrow Table to a PostgreSQL database
import pyarrow as pa
table_to_write = pa.table({
    "event_id": [101, 102],
    "event_name": ["login", "logout"]
})
postgres_writer = PostgreSQLWriter(
    table_name="events",
    server="localhost",
    port=5432,
    username="user",
    password="password",
    database="mydatabase",
    mode="append" # "append", "replace", or "fail"
)
postgres_writer.write(data=table_to_write)
print("\nPyArrow Table written to PostgreSQL database 'events' table.")

# Example 7: Reading a JSON file with metadata
json_loader = JsonFileReader(path="path/to/your/data.json")
df_json, metadata_json = json_loader.to_pandas(metadata=True)
print("\nJSON data (Pandas) with metadata:")
print(df_json.head())
print("Metadata:", metadata_json)

# Example 8: Writing data as a Delta Table
delta_writer = DeltaTableWriter(path="output/my_delta_table/", mode="overwrite")
delta_writer.write(data=pd.DataFrame({"colA": [1, 2], "colB": ["X", "Y"]}))
print("\nPandas DataFrame written to Delta Lake table.")
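
Example 4 writes a dataset partitioned by year. As a sketch of what Hive-style partitioning means on disk, the following standard-library code groups rows by the partition column and writes one file per key=value directory. It writes CSV rather than Parquet to stay dependency-free, and the write_hive_partitions helper and the part-0.csv file name are hypothetical; the file names and layout produced by ParquetDatasetWriter may differ.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def write_hive_partitions(rows, root, partition_by):
    """Write rows into <root>/<partition_by>=<value>/part-0.csv and return relative paths."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_by]].append(row)

    written = []
    for key, group in sorted(groups.items()):
        part_dir = Path(root) / f"{partition_by}={key}"
        part_dir.mkdir(parents=True, exist_ok=True)
        # The partition value is encoded in the directory name, so it is
        # dropped from the columns stored inside the file.
        columns = [name for name in group[0] if name != partition_by]
        target = part_dir / "part-0.csv"
        with target.open("w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=columns, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(group)
        written.append(target.relative_to(root).as_posix())
    return written


rows = [
    {"id": 1, "value": "foo", "year": 2023},
    {"id": 2, "value": "bar", "year": 2023},
    {"id": 3, "value": "baz", "year": 2024},
    {"id": 4, "value": "qux", "year": 2024},
]

with tempfile.TemporaryDirectory() as tmp:
    files = write_hive_partitions(rows, tmp, partition_by="year")
    first_partition = (Path(tmp) / files[0]).read_text().splitlines()

print(files)
```

A reader that understands this layout can recover the year column from the directory names and skip whole directories when a query filters on year.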
