Simplifies commonly used data processing functions.

Project description

Binary Rain Helper Toolkit: Data Processing

binaryrain_helper_data_processing is a Python package that simplifies common data processing tasks. It builds on the pandas library and adds functionality that makes data processing easier, reduces boilerplate code, and provides clear error messages.

Supported File Formats

  • PARQUET: For efficient columnar storage
  • CSV: For common tabular data
  • JSON: For structured data exchange
  • DICT: For Python dictionary data

Key Functions

  • create_dataframe(): simplifies creating pandas DataFrames from various formats:

      from binaryrain_helper_data_processing import FileFormat, create_dataframe
    
      # Create from CSV bytes
      df = create_dataframe(csv_bytes, FileFormat.CSV)
    
      # Create with custom options
      df = create_dataframe(parquet_bytes, FileFormat.PARQUET,
                            file_format_options={'engine': 'pyarrow'})
    
  • convert_dataframe_to_type(): handles converting DataFrames to different formats:

      from binaryrain_helper_data_processing import FileFormat, convert_dataframe_to_type
    
      # ....df is a pandas DataFrame
    
      # Convert to CSV bytes
      csv_bytes = convert_dataframe_to_type(df, FileFormat.CSV)
    
      # Convert with custom options
      parquet_bytes = convert_dataframe_to_type(df, FileFormat.PARQUET,
                                                file_format_options={'engine': 'pyarrow'})
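
For orientation, the creation and conversion helpers above can be pictured as a thin dispatch layer over pandas readers and writers. The following is a minimal sketch in plain pandas, not the package's implementation: the `Format` enum and the `read_frame` and `write_frame` helpers are hypothetical names, and only CSV and JSON are covered so that no Parquet engine is required.

```python
import io
from enum import Enum
from typing import Optional

import pandas as pd


class Format(Enum):
    """Hypothetical stand-in for the package's FileFormat enum."""
    CSV = "csv"
    JSON = "json"


def read_frame(data: bytes, fmt: Format, options: Optional[dict] = None) -> pd.DataFrame:
    """Build a DataFrame from raw bytes, dispatching on the format."""
    options = options or {}
    text = io.StringIO(data.decode("utf-8"))
    if fmt is Format.CSV:
        return pd.read_csv(text, **options)
    if fmt is Format.JSON:
        return pd.read_json(text, orient="records", **options)
    raise ValueError(f"Unsupported format: {fmt}")


def write_frame(df: pd.DataFrame, fmt: Format, options: Optional[dict] = None) -> bytes:
    """Serialize a DataFrame to bytes, dispatching on the format."""
    options = options or {}
    if fmt is Format.CSV:
        return df.to_csv(index=False, **options).encode("utf-8")
    if fmt is Format.JSON:
        return df.to_json(orient="records", **options).encode("utf-8")
    raise ValueError(f"Unsupported format: {fmt}")


# Round trip: CSV bytes -> DataFrame -> JSON bytes -> DataFrame
csv_bytes = b"name,amount\nalice,10\nbob,20\n"
df = read_frame(csv_bytes, Format.CSV)
json_bytes = write_frame(df, Format.JSON)
round_tripped = read_frame(json_bytes, Format.JSON)
```

Keeping the format in an enum rather than a free-form string means an unsupported value fails early with a clear message instead of surfacing as a pandas parsing error.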
    
  • merge_dataframes(): provides a simple way to merge multiple DataFrames:

      from binaryrain_helper_data_processing import merge_dataframes
    
      # ....df1 and df2 are pandas DataFrames
    
      # Merge DataFrames
      merged_df = merge_dataframes(df1, df2, sort=True)
    
  • convert_todatetime(): automatically detects and converts date columns:

    Supports common date formats:

    • %d.%m.%Y (e.g., "31.12.2023")
    • %Y-%m-%d (e.g., "2023-12-31")
    • %Y-%m-%d %H:%M:%S (e.g., "2023-12-31 23:59:59")
    • %Y-%m-%dT%H:%M:%S (ISO format, e.g., "2023-12-31T23:59:59")

        from binaryrain_helper_data_processing import convert_todatetime
    
        # ....df is a pandas DataFrame
    
        # Convert date columns
        df = convert_todatetime(df)
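
To illustrate how detection over the formats listed above could work, here is a minimal sketch that tries each format in turn and keeps the first one that parses every value. It uses only the standard library on plain strings so the logic is visible without pandas. The `detect_format` helper and the `CANDIDATE_FORMATS` name are hypothetical, and the package may detect columns differently.

```python
from datetime import datetime
from typing import List, Optional

# The four formats listed above, tried in this order.
CANDIDATE_FORMATS = [
    "%d.%m.%Y",
    "%Y-%m-%d",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%dT%H:%M:%S",
]


def detect_format(values: List[str]) -> Optional[str]:
    """Return the first format that parses every value, or None."""
    if not values:
        return None
    for fmt in CANDIDATE_FORMATS:
        try:
            for value in values:
                datetime.strptime(value, fmt)
        except ValueError:
            continue  # at least one value did not match; try the next format
        return fmt
    return None


column = ["31.12.2023", "01.01.2024"]
fmt = detect_format(column)
parsed = [datetime.strptime(value, fmt) for value in column]
```

Because `strptime` rejects trailing characters, a value with a time component never matches a date-only format, so the order of the candidates does not cause false matches here.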
    
  • format_datetime_columns(): formats specific datetime columns:

        from binaryrain_helper_data_processing import format_datetime_columns
    
        # ....df is a pandas DataFrame
    
        # Format date columns directly
        df = format_datetime_columns(df, datetime_columns=['date_column1', 'date_column2'], datetime_format='%Y-%m-%d')
    
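        # Format dates stored in string columns
        # NOTE: datetime_string_columns is an assumed keyword name for the string columns;
        # verify it against the package's signature.
        df = format_datetime_columns(df, datetime_columns=['date_column1', 'date_column2'], datetime_format='%Y-%m-%d', datetime_string_columns=['string_column1', 'string_column2'])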
    
  • clean_dataframe(): cleans DataFrames by removing duplicates and missing values:

        from binaryrain_helper_data_processing import clean_dataframe
    
        # ....df is a pandas DataFrame
    
        # Clean DataFrame
        df = clean_dataframe(df)
    
  • remove_empty_values(): filters specific columns:

        from binaryrain_helper_data_processing import remove_empty_values
    
        # ....df is a pandas DataFrame
    
        # Remove empty values
        df = remove_empty_values(df, filter_column='column1')
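
The two cleaning helpers above correspond roughly to the following plain-pandas operations. This is a sketch under stated assumptions, not the package's code: it assumes that cleaning drops exact duplicate rows and then rows with any missing value, and that an "empty" value in a filtered column means either missing or an empty string.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": [1, 1, 2, 3, 4],
        "city": ["Berlin", "Berlin", None, "Paris", ""],
    }
)

# Whole-frame cleaning (assumed behaviour): drop exact duplicate rows,
# then drop rows that contain any missing value.
cleaned = df.drop_duplicates().dropna().reset_index(drop=True)

# Column-level filtering (assumed behaviour): keep rows where one column
# is neither missing nor an empty string.
has_city = df["city"].notna() & (df["city"] != "")
filtered = df[has_city].reset_index(drop=True)
```

Note that `dropna()` does not treat an empty string as missing, which is why the column-level filter checks for it explicitly.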
    
  • format_numeric_values(): handles locale-specific number formatting:

        from binaryrain_helper_data_processing import format_numeric_values
    
        # ....df is a pandas DataFrame
    
        # Convert European number format (1.234,56) to standard format (1,234.56)
        df = format_numeric_values(
            df,
            columns=['price', 'quantity'],
            swap_separators=True,
            old_decimal_separator=',',
            old_thousands_separator='.',
            decimal_separator='.',
            thousands_separator=',',
        )
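
The separator swap in the example above needs a little care: replacing "." with "," and then "," with "." in sequence would collapse both into the same character. A minimal standard-library sketch of one way to do it safely follows. The `swap_separators` and `to_float` helpers are hypothetical and operate on single strings, whereas the package works on DataFrame columns.

```python
def swap_separators(
    text: str,
    old_decimal: str = ",",
    old_thousands: str = ".",
    decimal: str = ".",
    thousands: str = ",",
) -> str:
    """Swap decimal and thousands separators via a temporary placeholder."""
    placeholder = "\x00"  # a character that cannot appear in a formatted number
    return (
        text.replace(old_thousands, placeholder)
        .replace(old_decimal, decimal)
        .replace(placeholder, thousands)
    )


def to_float(text: str, old_decimal: str = ",", old_thousands: str = ".") -> float:
    """Parse a locale-formatted number string into a float."""
    return float(text.replace(old_thousands, "").replace(old_decimal, "."))


standard = swap_separators("1.234,56")
value = to_float("1.234,56")
```

Here `standard` is the string `"1,234.56"` and `value` is the float `1234.56`.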
    

Benefits

  • Consistent interface for different file formats
  • Simplified error handling with clear messages
  • Optional format-specific configurations
  • Built on pandas for robust data processing
  • Type hints for better IDE support

Download files

Source Distribution

binaryrain_helper_data_processing-0.0.7.tar.gz (5.8 kB)

Built Distribution

binaryrain_helper_data_processing-0.0.7-py3-none-any.whl

File details

Details for the file binaryrain_helper_data_processing-0.0.7.tar.gz.

File hashes

Hashes for binaryrain_helper_data_processing-0.0.7.tar.gz

Algorithm    Hash digest
SHA256       84997696f6dc4f195bac29783c2471d51617b79ba617c27865ed2221665e9872
MD5          23f227678a7e2d70984c1ff2391754da
BLAKE2b-256  2cd3d11981069b86ccb8f6096818415d65f57a92541b22676a71b8f7dfdcf90f

File details

Details for the file binaryrain_helper_data_processing-0.0.7-py3-none-any.whl.

File hashes

Hashes for binaryrain_helper_data_processing-0.0.7-py3-none-any.whl

Algorithm    Hash digest
SHA256       267d7a0bf4014a8fc44248c9affe15f7c00263f40f1b66180992da621f06ec92
MD5          e31ef1a7cf80c2bfa32d318f8592676b
BLAKE2b-256  410744ddb2d784775b8ec9dfea295e3b7ae5982de100864d34e1070436087322
