Simplifies commonly used functions in data processing.

Project description

Binary Rain Helper Toolkit: Data Processing

binaryrain_helper_data_processing is a Python package that simplifies common data processing tasks. It builds on the pandas library and adds functionality that makes data processing easier, reduces boilerplate code, and provides clear error messages.

Supported File Formats

  • PARQUET: For efficient columnar storage
  • CSV: For common tabular data
  • JSON: For structured data exchange
  • DICT: For Python dictionary data
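
As a rough orientation, the pandas-only sketch below shows the kind of input each format refers to. It does not call this package; the sample data and variable names are made up for illustration. The Parquet case is left out because it needs an additional engine such as pyarrow.

```python
import io

import pandas as pd

# Hypothetical sample data, used only for illustration.
records = {"id": [1, 2], "name": ["a", "b"]}

# DICT: a plain Python dictionary of columns.
df_from_dict = pd.DataFrame(records)

# CSV: tabular text, handled here as bytes.
csv_bytes = df_from_dict.to_csv(index=False).encode("utf-8")
df_from_csv = pd.read_csv(io.BytesIO(csv_bytes))

# JSON: structured text, one object per row, handled here as bytes.
json_bytes = df_from_dict.to_json(orient="records").encode("utf-8")
df_from_json = pd.read_json(io.BytesIO(json_bytes))
```

In each case the in-memory result is an ordinary pandas DataFrame, which is why one interface can cover all formats.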

Key Functions

  • create_dataframe(): simplifies creating pandas DataFrames from various formats:

      from binaryrain_helper_data_processing import FileFormat, create_dataframe
    
      # Create from CSV bytes
      df = create_dataframe(csv_bytes, FileFormat.CSV)
    
      # Create with custom options
      df = create_dataframe(parquet_bytes, FileFormat.PARQUET,
                            file_format_options={'engine': 'pyarrow'})
    
  • convert_dataframe_to_type(): handles converting DataFrames to different formats:

      from binaryrain_helper_data_processing import FileFormat, convert_dataframe_to_type
    
      # ....df is a pandas DataFrame
    
      # Convert to CSV bytes
      csv_bytes = convert_dataframe_to_type(df, FileFormat.CSV)
    
      # Convert with custom options
      parquet_bytes = convert_dataframe_to_type(df, FileFormat.PARQUET,
                                                file_format_options={'engine': 'pyarrow'})
    
  • merge_dataframes(): provides a simple way to merge multiple DataFrames:

      from binaryrain_helper_data_processing import merge_dataframes
    
      # ....df1 and df2 are pandas DataFrames
    
      # Merge DataFrames
      merged_df = merge_dataframes(df1, df2, sort=True)
    
  • convert_todatetime(): automatically detects and converts date columns:

    Supports common date formats:

    • %d.%m.%Y (e.g., "31.12.2023")
    • %Y-%m-%d (e.g., "2023-12-31")
    • %Y-%m-%d %H:%M:%S (e.g., "2023-12-31 23:59:59")
    • %Y-%m-%dT%H:%M:%S (ISO format)

        from binaryrain_helper_data_processing import convert_todatetime
    
        # ....df is a pandas DataFrame
    
        # Convert date columns
        df = convert_todatetime(df)
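
To make the idea of format detection concrete, here is a minimal standard-library sketch that tries each of the four listed formats against a column of strings. It is an illustration only, not the package's implementation; the helper names `detect_date_format` and `parse_column` are hypothetical, and the rule "a column counts as a date column only if one format parses every value" is an assumption.

```python
from datetime import datetime

# The four formats listed above.
DATE_FORMATS = [
    "%d.%m.%Y",
    "%Y-%m-%d",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%dT%H:%M:%S",
]


def detect_date_format(values):
    """Return the first format that parses every value, or None."""
    if not values:
        return None
    for fmt in DATE_FORMATS:
        try:
            for value in values:
                datetime.strptime(value, fmt)
        except (ValueError, TypeError):
            # At least one value does not match this format; try the next.
            continue
        return fmt
    return None


def parse_column(values):
    """Parse all values if one format fits them all, else return them unchanged."""
    fmt = detect_date_format(values)
    if fmt is None:
        return values
    return [datetime.strptime(value, fmt) for value in values]
```

Requiring a single format for the whole column avoids silently mixing day-first and year-first interpretations within one column.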
    
  • format_datetime_columns(): formats specific datetime columns:

        from binaryrain_helper_data_processing import format_datetime_columns
    
        # ....df is a pandas DataFrame
    
        # Format date columns directly
        df = format_datetime_columns(df, datetime_columns=['date_column1', 'date_column2'], datetime_format='%Y-%m-%d')
    
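        # Format date columns and write the results to string columns
        # NOTE: the keyword name `datetime_string_columns` is an assumption; check the function signature
        df = format_datetime_columns(df, datetime_columns=['date_column1', 'date_column2'], datetime_format='%Y-%m-%d', datetime_string_columns=['string_column1', 'string_column2'])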
    
  • clean_dataframe(): cleans DataFrames by removing duplicates and missing values:

        from binaryrain_helper_data_processing import clean_dataframe
    
        # ....df is a pandas DataFrame
    
        # Clean DataFrame
        df = clean_dataframe(df)
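
For comparison, removing duplicates and missing values in plain pandas could look like the sketch below. Whether clean_dataframe() performs exactly these steps, or resets the index afterwards, is not stated here, so treat this as an assumption; the sample data is made up.

```python
import pandas as pd

# Hypothetical sample data with one duplicate row and one missing value.
df = pd.DataFrame({"id": [1, 1, 2, 3], "name": ["a", "a", None, "c"]})

# Drop exact duplicate rows, then drop rows that contain any missing value.
cleaned = df.drop_duplicates().dropna().reset_index(drop=True)
```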
    
  • remove_empty_values(): filters specific columns:

        from binaryrain_helper_data_processing import remove_empty_values
    
        # ....df is a pandas DataFrame
    
        # Remove empty values
        df = remove_empty_values(df, filter_column='column1')
    
  • format_numeric_values(): handles locale-specific number formatting:

        from binaryrain_helper_data_processing import format_numeric_values
    
        # ....df is a pandas DataFrame
    
        # Convert European number format (1.234,56) to standard format (1,234.56)
        df = format_numeric_values(
            df,
            columns=['price', 'quantity'],
            swap_separators=True,
            old_decimal_separator=',',
            old_thousands_separator='.',
            decimal_separator='.',
            thousands_separator=',',
        )
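
The separator swap itself can be sketched with the standard library alone. The helpers below are hypothetical and are not the package's implementation; they only show why the swap is best done in one pass. Two sequential `str.replace` calls would first turn every `.` into `,` and then turn all of those commas back, losing the distinction between the two separators.

```python
def swap_separators(text, old_decimal=",", old_thousands=".",
                    new_decimal=".", new_thousands=","):
    """Swap decimal and thousands separators in a single pass."""
    # str.translate maps each character independently, so the two
    # replacements cannot interfere with each other.
    table = str.maketrans({old_decimal: new_decimal, old_thousands: new_thousands})
    return text.translate(table)


def to_float(text, old_decimal=",", old_thousands="."):
    """Parse a locale-formatted number string into a float."""
    return float(text.replace(old_thousands, "").replace(old_decimal, "."))
```

For example, `swap_separators("1.234,56")` yields `"1,234.56"`, and `to_float("1.234,56")` yields `1234.56`.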
    

Benefits

  • Consistent interface for different file formats
  • Simplified error handling with clear messages
  • Optional format-specific configurations
  • Built on pandas for robust data processing
  • Type hints for better IDE support

Download files

Source Distribution

binaryrain_helper_data_processing-0.0.9.tar.gz (5.8 kB view details)

Uploaded Source

Built Distribution

File details

Details for the file binaryrain_helper_data_processing-0.0.9.tar.gz.

File hashes

Hashes for binaryrain_helper_data_processing-0.0.9.tar.gz
Algorithm      Hash digest
SHA256         409c7b25473a5ab0119f5ef63cbc4f2f49dac7ec30adc6e77350ddff1c3e1049
MD5            250e66fa9537c7910c21c0648f34d00b
BLAKE2b-256    05785497b4cbc8509571b0f8f38f58c22956decced942c2fea9c67bec96417e6

File details

Details for the file binaryrain_helper_data_processing-0.0.9-py3-none-any.whl.

File hashes

Hashes for binaryrain_helper_data_processing-0.0.9-py3-none-any.whl
Algorithm      Hash digest
SHA256         c16dbc4b17d2f308e8110b05c47aeea0245bb02c0531573467520ba11ed9f1d5
MD5            082347e9c614cad2dd5d5c537bda09e5
BLAKE2b-256    87d71b137279079fdc5739bc792b13e2dddd8d97af0d8bedcea865fc026129fb
