Simplifies commonly used functions in data processing.

Binary Rain Helper Toolkit: Data Processing

binaryrain_helper_data_processing is a Python package that simplifies common data processing tasks. It builds on the pandas library and adds functionality that makes data processing easier, reduces boilerplate code, and provides clear error messages.

Supported File Formats

  • PARQUET: For efficient columnar storage
  • CSV: For common tabular data
  • JSON: For structured data exchange
  • DICT: For Python dictionary data
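
To make the format list concrete, here is a minimal sketch in plain pandas of how an enum could dispatch in-memory data to the matching reader. It is not the package's implementation: `SketchFormat` and `sketch_create_dataframe` are hypothetical names used only for illustration, and Parquet is mentioned in a comment only because it needs an extra engine such as pyarrow.

```python
import io
import json
from enum import Enum

import pandas as pd


class SketchFormat(Enum):
    """Hypothetical stand-in for the package's FileFormat enum."""
    CSV = "csv"
    JSON = "json"
    DICT = "dict"
    # PARQUET would map to pd.read_parquet(io.BytesIO(data)) and needs
    # a Parquet engine such as pyarrow to be installed.


def sketch_create_dataframe(data, file_format, options=None):
    """Build a DataFrame from in-memory data using plain pandas calls."""
    options = options or {}
    try:
        if file_format is SketchFormat.CSV:
            return pd.read_csv(io.BytesIO(data), **options)
        if file_format is SketchFormat.JSON:
            return pd.DataFrame(json.loads(data), **options)
        if file_format is SketchFormat.DICT:
            return pd.DataFrame.from_dict(data, **options)
    except Exception as exc:
        # Re-raise with a message that names the format that failed.
        raise ValueError(
            f"Could not create DataFrame from {file_format.name}: {exc}"
        ) from exc
    raise ValueError(f"Unsupported format: {file_format!r}")


csv_df = sketch_create_dataframe(b"id,name\n1,Ada\n2,Grace\n", SketchFormat.CSV)
json_df = sketch_create_dataframe(b'[{"id": 1, "name": "Ada"}]', SketchFormat.JSON)
dict_df = sketch_create_dataframe({"id": [1, 2], "name": ["Ada", "Grace"]}, SketchFormat.DICT)
```

All three calls return an ordinary pandas DataFrame, which is the kind of single entry point the package describes.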

Key Functions

  • create_dataframe(): simplifies creating pandas DataFrames from various formats:

      from binaryrain_helper_data_processing import FileFormat, create_dataframe
    
      # Create from CSV bytes
      df = create_dataframe(csv_bytes, FileFormat.CSV)
    
      # Create with custom options
      df = create_dataframe(parquet_bytes, FileFormat.PARQUET,
                            file_format_options={'engine': 'pyarrow'})
    
  • convert_dataframe_to_type(): handles converting DataFrames to different formats:

      from binaryrain_helper_data_processing import FileFormat, convert_dataframe_to_type
    
      # ....df is a pandas DataFrame
    
      # Convert to CSV bytes
      csv_bytes = convert_dataframe_to_type(df, FileFormat.CSV)
    
      # Convert with custom options
      parquet_bytes = convert_dataframe_to_type(df, FileFormat.PARQUET,
                                                file_format_options={'engine': 'pyarrow'})
    
  • combine_dataframes(): provides a simple way to combine multiple DataFrames:

      from binaryrain_helper_data_processing import combine_dataframes
    
      # ....df1 and df2 are pandas DataFrames
    
      # Combine DataFrames
      combined_df = combine_dataframes(df1, df2, sort=True)
    
  • convert_todatetime(): automatically detects and converts date columns:

    Supports common date formats:

    • %d.%m.%Y (e.g., "31.12.2023")
    • %Y-%m-%d (e.g., "2023-12-31")
    • %Y-%m-%d %H:%M:%S (e.g., "2023-12-31 23:59:59")
    • %Y-%m-%dT%H:%M:%S (ISO format)

        from binaryrain_helper_data_processing import convert_todatetime
    
        # ....df is a pandas DataFrame
    
        # Convert date columns
        df = convert_todatetime(df)
    
  • format_datetime_columns(): formats specific datetime columns:

        from binaryrain_helper_data_processing import format_datetime_columns
    
        # ....df is a pandas DataFrame
    
        # Format date columns directly
        df = format_datetime_columns(df, datetime_columns=['date_column1', 'date_column2'], datetime_format='%Y-%m-%d')
    
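        # Format date columns and write the results to string columns
        # (the original passed datetime_columns twice, which is a syntax error;
        # the keyword datetime_string_columns is assumed here, check the package's signature)
        df = format_datetime_columns(df, datetime_columns=['date_column1', 'date_column2'], datetime_format='%Y-%m-%d', datetime_string_columns=['string_column1', 'string_column2'])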
    
  • clean_dataframe(): cleans DataFrames by removing duplicates and missing values:

        from binaryrain_helper_data_processing import clean_dataframe
    
        # ....df is a pandas DataFrame
    
        # Clean DataFrame
        df = clean_dataframe(df)
    
  • remove_empty_values(): filters specific columns:

        from binaryrain_helper_data_processing import remove_empty_values
    
        # ....df is a pandas DataFrame
    
        # Remove empty values
        df = remove_empty_values(df, filter_column='column1')
    
  • format_numeric_values(): handles locale-specific number formatting:

        from binaryrain_helper_data_processing import format_numeric_values
    
        # ....df is a pandas DataFrame
    
        # Convert European number format (1.234,56) to standard format (1,234.56)
        df = format_numeric_values(
              df,
              columns=['price', 'quantity'],
              swap_separators=True,
              old_decimal_separator=',',
              old_thousands_separator='.',
              decimal_separator='.',
              thousands_separator=',',
          )
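
The separator swap described for format_numeric_values() can be illustrated on a single string. This is a minimal pure-Python sketch, not the package's code: `swap_number_separators` and `parse_localized_number` are hypothetical helpers, and the placeholder character is an assumption that relies on it never appearing in the input.

```python
def swap_number_separators(text, old_decimal=",", old_thousands=".",
                           new_decimal=".", new_thousands=","):
    """Rewrite a formatted number string with different separators."""
    placeholder = "\x00"  # assumed never to occur in a number string
    return (
        text.replace(old_decimal, placeholder)      # park the decimal mark
            .replace(old_thousands, new_thousands)  # swap the grouping mark
            .replace(placeholder, new_decimal)      # restore the decimal mark
    )


def parse_localized_number(text, decimal=",", thousands="."):
    """Parse a locale-formatted number string into a float."""
    return float(text.replace(thousands, "").replace(decimal, "."))


swapped = swap_number_separators("1.234,56")
value = parse_localized_number("1.234,56")
```

The placeholder step matters: replacing the two separators one after the other without it would turn every separator into the same character.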
    

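For convert_todatetime(), the list of supported formats suggests trying each pattern in turn. The source does not show how detection works, so the following is only a sketch under that assumption, using the standard library; `detect_date_format` and `CANDIDATE_FORMATS` are hypothetical names.

```python
from datetime import datetime

# The four formats listed for convert_todatetime(), tried in order.
CANDIDATE_FORMATS = [
    "%d.%m.%Y",
    "%Y-%m-%d",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%dT%H:%M:%S",
]


def detect_date_format(values):
    """Return the first format that parses every value, or None."""
    if not values:
        return None
    for fmt in CANDIDATE_FORMATS:
        try:
            for value in values:
                datetime.strptime(value, fmt)
        except (ValueError, TypeError):
            continue  # at least one value did not match this format
        else:
            return fmt
    return None


german_style = detect_date_format(["31.12.2023", "01.01.2024"])
iso_style = detect_date_format(["2023-12-31T23:59:59"])
mixed = detect_date_format(["2023-12-31", "not a date"])
```

Requiring every value to match one format keeps a column from being converted when only some of its entries look like dates.
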
Benefits

  • Consistent interface for different file formats
  • Simplified error handling with clear messages
  • Optional format-specific configurations
  • Built on pandas for robust data processing
  • Type hints for better IDE support

Download files

Source Distribution

binaryrain_helper_data_processing-0.0.10.tar.gz (5.9 kB)

Uploaded Source

Built Distribution

binaryrain_helper_data_processing-0.0.10-py3-none-any.whl

File details

Details for the file binaryrain_helper_data_processing-0.0.10.tar.gz.

File hashes

Hashes for binaryrain_helper_data_processing-0.0.10.tar.gz
Algorithm Hash digest
SHA256 a9669272632b80c35ff1dbf7dbc2b781e89f1123a21943e7c01859a7d7a175e9
MD5 bf4342a62fa35826cabf7e7acd0ef9cc
BLAKE2b-256 7d07307307e62e4277a94af1eade0d26b05fc0e305cce441b5187ac880c1ea41

File details

Details for the file binaryrain_helper_data_processing-0.0.10-py3-none-any.whl.

File hashes

Hashes for binaryrain_helper_data_processing-0.0.10-py3-none-any.whl
Algorithm Hash digest
SHA256 2003b4cd3c715cf4dd984f6e1623a0e222f10aac7831ec439c1fc3350f3cab14
MD5 b01f83a5235f0130e24a142f4ece5757
BLAKE2b-256 697a3b9a9b75a6e90f5d2eb8d40107f10bbc5567ec49589d2c36b1f4fda7cad7
