A helper package that simplifies commonly used data processing functions.

Binary Rain Helper Toolkit: Data Processing

binaryrain_helper_data_processing is a Python package that simplifies common data processing tasks. It builds on the pandas library to reduce boilerplate code and provide clear error messages.

Supported File Formats

  • PARQUET: For efficient columnar storage
  • CSV: For common tabular data
  • JSON: For structured data exchange
  • DICT: For Python dictionary data

Key Functions

  • create_dataframe(): simplifies creating pandas DataFrames from various formats:

      from binaryrain_helper_data_processing import FileFormat, create_dataframe
    
      # Create from CSV bytes
      df = create_dataframe(csv_bytes, FileFormat.CSV)
    
      # Create with custom options
      df = create_dataframe(parquet_bytes, FileFormat.PARQUET,
                            file_format_options={'engine': 'pyarrow'})
    
  • convert_dataframe_to_type(): handles converting DataFrames to different formats:

      from binaryrain_helper_data_processing import FileFormat, convert_dataframe_to_type
    
      # ....df is a pandas DataFrame
    
      # Convert to CSV bytes
      csv_bytes = convert_dataframe_to_type(df, FileFormat.CSV)
    
      # Convert with custom options
      parquet_bytes = convert_dataframe_to_type(df, FileFormat.PARQUET,
                                                file_format_options={'engine': 'pyarrow'})
    
  • merge_dataframes(): provides a simple way to merge multiple DataFrames:

      from binaryrain_helper_data_processing import merge_dataframes
    
      # ....df1 and df2 are pandas DataFrames
    
      # Merge DataFrames
      merged_df = merge_dataframes(df1, df2, sort=True)
    
  • convert_todatetime(): automatically detects and converts date columns:

    Supports common date formats:

    • %d.%m.%Y (e.g., "31.12.2023")
    • %Y-%m-%d (e.g., "2023-12-31")
    • %Y-%m-%d %H:%M:%S (e.g., "2023-12-31 23:59:59")
    • %Y-%m-%dT%H:%M:%S (ISO 8601 format, e.g., "2023-12-31T23:59:59")

        from binaryrain_helper_data_processing import convert_todatetime
    
        # ....df is a pandas DataFrame
    
        # Convert date columns
        df = convert_todatetime(df)
    
  • format_datetime_columns(): formats specific datetime columns:

        from binaryrain_helper_data_processing import format_datetime_columns
    
        # ....df is a pandas DataFrame
    
        # Format date columns directly
        df = format_datetime_columns(df, datetime_columns=['date_column1', 'date_column2'], datetime_format='%Y-%m-%d')
    
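        # Write the formatted dates into separate string columns
        # NOTE: the keyword name `datetime_string_columns` is an assumption; verify it against the function signature
        df = format_datetime_columns(df, datetime_columns=['date_column1', 'date_column2'], datetime_format='%Y-%m-%d', datetime_string_columns=['string_column1', 'string_column2'])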
    
  • clean_dataframe(): cleans DataFrames by removing duplicates and missing values:

        from binaryrain_helper_data_processing import clean_dataframe
    
        # ....df is a pandas DataFrame
    
        # Clean DataFrame
        df = clean_dataframe(df)
    
  • remove_empty_values(): filters specific columns:

        from binaryrain_helper_data_processing import remove_empty_values
    
        # ....df is a pandas DataFrame
    
        # Remove empty values
        df = remove_empty_values(df, filter_column='column1')
    
  • format_numeric_values(): handles locale-specific number formatting:

        from binaryrain_helper_data_processing import format_numeric_values
    
        # ....df is a pandas DataFrame
    
        # Convert European number format (1.234,56) to standard format (1,234.56)
        df = format_numeric_values(
            df,
            columns=['price', 'quantity'],
            swap_separators=True,
            old_decimal_separator=',',
            old_thousands_separator='.',
            decimal_separator='.',
            thousands_separator=',',
        )
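
The separator swap shown for format_numeric_values() can be pictured with plain pandas string operations. The following is only a sketch of the idea, not the package's implementation: the helper name `swap_number_separators` and its parameters are hypothetical, and unlike the package function it returns floats rather than re-formatted strings.

```python
import pandas as pd


def swap_number_separators(
    series: pd.Series,
    old_decimal: str = ",",
    old_thousands: str = ".",
) -> pd.Series:
    """Hypothetical helper: parse strings such as '1.234,56' into floats."""
    cleaned = (
        series.astype(str)
        # Drop the thousands separator first so it cannot be confused
        # with the decimal point introduced in the next step.
        .str.replace(old_thousands, "", regex=False)
        .str.replace(old_decimal, ".", regex=False)
    )
    return pd.to_numeric(cleaned)


prices = pd.Series(["1.234,56", "99,90", "12"])
converted = swap_number_separators(prices)
```

The order of the two replacements matters: replacing the decimal comma first would leave two kinds of dots in the string that can no longer be told apart.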
    

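The date detection described for convert_todatetime() can be approximated by trying each listed format in turn and keeping the first one that fits every value in a column. This is a minimal sketch under that assumption; the function name `detect_and_convert` is hypothetical and the package's actual detection logic may differ.

```python
import pandas as pd

# The formats listed for convert_todatetime(), tried in this order.
DATE_FORMATS = [
    "%d.%m.%Y",
    "%Y-%m-%d",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%dT%H:%M:%S",
]


def detect_and_convert(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: convert columns that fully match one date format."""
    result = df.copy()
    for column in result.columns:
        values = result[column]
        # Numeric and already-converted columns are left untouched.
        if pd.api.types.is_numeric_dtype(values) or pd.api.types.is_datetime64_any_dtype(values):
            continue
        for fmt in DATE_FORMATS:
            try:
                result[column] = pd.to_datetime(values, format=fmt)
            except (ValueError, TypeError):
                continue  # this format does not fit every value; try the next
            break  # first matching format wins
    return result


frame = pd.DataFrame(
    {
        "name": ["alice", "bob"],
        "due": ["31.12.2023", "01.02.2024"],
        "created": ["2023-12-31", "2024-01-15"],
    }
)
out = detect_and_convert(frame)
```

Columns that match none of the formats, such as `name` here, keep their original values.
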
Benefits

  • Consistent interface for different file formats
  • Simplified error handling with clear messages
  • Optional format-specific configurations
  • Built on pandas for robust data processing
  • Type hints for better IDE support
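
To illustrate what a consistent interface with optional format-specific options and clear error messages can look like, here is a small pandas-only sketch. The `Format` enum and `load_frame` function are hypothetical stand-ins written for this example; they are not the package's `FileFormat` or `create_dataframe`, and only CSV and DICT are covered.

```python
import io
from enum import Enum
from typing import Any, Optional

import pandas as pd


class Format(Enum):
    """Hypothetical stand-in for a file format enum."""

    CSV = "csv"
    DICT = "dict"


def load_frame(data: Any, fmt: Format, options: Optional[dict] = None) -> pd.DataFrame:
    """Hypothetical loader: one entry point, one pandas call per format."""
    options = options or {}
    try:
        if fmt is Format.CSV:
            # CSV input is expected as raw bytes.
            return pd.read_csv(io.BytesIO(data), **options)
        if fmt is Format.DICT:
            # DICT input is expected as a plain Python dictionary.
            return pd.DataFrame.from_dict(data, **options)
    except Exception as exc:
        # Wrap low-level pandas errors in one message that names the format.
        raise ValueError(f"Could not create DataFrame from {fmt.name}: {exc}") from exc
    raise ValueError(f"Unsupported format: {fmt!r}")


csv_frame = load_frame(b"a;b\n1;2\n", Format.CSV, {"sep": ";"})
dict_frame = load_frame({"a": [1, 2], "b": [3, 4]}, Format.DICT)
```

Passing the options through as keyword arguments keeps the call signature identical across formats while still exposing each reader's own settings.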

Download files

Source Distribution

binaryrain_helper_data_processing-0.0.8.tar.gz (5.9 kB view details)

File details

Details for the file binaryrain_helper_data_processing-0.0.8.tar.gz.

File hashes

Hashes for binaryrain_helper_data_processing-0.0.8.tar.gz
Algorithm Hash digest
SHA256 b3cb5afeda32a040891d69d4be2202d5fa538b166c42b67deb7f91b517a1766b
MD5 7bb1b4d3c64c1d7cc5ea662293919df0
BLAKE2b-256 ee585c4b2da296ae14bcf6c82367dc4d32b2a3a1c186fa7d859beb755dbf037a
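
A downloaded file can be checked against the published SHA256 digest with Python's standard library. The sketch below assumes the archive has already been saved locally; the helper names `sha256_of` and `matches_published_hash` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published hex digest, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Calling `matches_published_hash(Path("binaryrain_helper_data_processing-0.0.8.tar.gz"), "<SHA256 from the table above>")` should return True for an intact download.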

File details

Details for the file binaryrain_helper_data_processing-0.0.8-py3-none-any.whl.

File hashes

Hashes for binaryrain_helper_data_processing-0.0.8-py3-none-any.whl
Algorithm Hash digest
SHA256 ac28d8e255b83778d6c59ff60c8c988c43a7e16e0f77ec5b11d6ae72808dbed1
MD5 31a1934a47d980529c93f449effc28e7
BLAKE2b-256 d1b52b7a4de53eba8009b014b239f0d379fb2325075e39567831b98003cd7f8d
