Makes combinations of all columns of one DataFrame, and more.

Project description

This is part of the EY-onboarding project for Christine Donszelmann.

It might be useful when working with, e.g., general ledger data. Or it might not be useful except for teaching Christine how to use Python effectively.

Making combinations and grouping them is logged at DEBUG level in the 'logging_christine.log' file.


When making combinations, remember that the number of columns in the new DataFrame can be calculated with: number_of_columns_new = 2**(number_of_columns_old) - 1. So don't feed it 100 columns unless you want your memory to explode :-)
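To get a feel for how quickly that formula grows, here is a small sketch. The helper name n_combination_columns is made up for this illustration and is not part of the package.

```python
def n_combination_columns(n_columns: int) -> int:
    """Number of non-empty combinations of n_columns columns (hypothetical helper)."""
    return 2 ** n_columns - 1


# Print the number of combination columns for a few input sizes.
for n in (3, 10, 20, 100):
    print(n, n_combination_columns(n))
```

Three columns give 7 combinations and ten columns give 1023; every extra column roughly doubles the count.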

This package relies on the following packages; please install them before using it:

  • pandas
  • NumPy
  • tqdm

This package has the following classes and methods:

  • class CombinationMaker

    • combinations(listofkeys)

        makes all possible combinations of a list.
            example input: [a, b, c]
            example output: [[], [a], [b], [c], [a, b], [a, c], [b, c], [a, b, c]]
      
        parameters:    
        listofkeys: list of strings that need to be combined
        totalpairs: needed for recursion; please keep it as None, as the script replaces it with the total number of possible combinations
        Return: list of lists with all combinations of strings
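The example input and output above can be reproduced with a short sketch based on itertools. This is not the package's own implementation (which, judging by the totalpairs parameter, is recursive); the function name all_combinations is hypothetical.

```python
from itertools import combinations


def all_combinations(keys):
    """Return every combination of keys, from the empty list up to the full list."""
    result = []
    # Walk through combination sizes 0, 1, ..., len(keys).
    for size in range(len(keys) + 1):
        for combo in combinations(keys, size):
            result.append(list(combo))
    return result


print(all_combinations(["a", "b", "c"]))
# [[], ['a'], ['b'], ['c'], ['a', 'b'], ['a', 'c'], ['b', 'c'], ['a', 'b', 'c']]
```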
      
  • class DataFrameGrouper

    • show_combinations(optional: joiner)

        makes the combinations made in CombinationMaker.combinations more readable.
            example df.keys() = [a, b, c]
            example joiner = '-'
            example output = [a, b, c, a-b, a-c, b-c, a-b-c]
        (removes the empty list)
      
        parameters:    
        joiner: string that comes between the joined keys, default='-'
        Return: list of strings of combinations with the joiner string in between
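A rough sketch of the same idea outside the class, assuming the keys are plain strings. The function name readable_combinations is invented for this example and does not exist in the package.

```python
from itertools import combinations


def readable_combinations(keys, joiner="-"):
    """Join every non-empty combination of keys into one string."""
    return [
        joiner.join(combo)
        # Starting at size 1 skips the empty combination.
        for size in range(1, len(keys) + 1)
        for combo in combinations(keys, size)
    ]


print(readable_combinations(["a", "b", "c"]))
# ['a', 'b', 'c', 'a-b', 'a-c', 'b-c', 'a-b-c']
```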
      
    • groupbyer(sum_on_key, optional: group_on_keys, disabletqdm, joiner)

        makes a list of column names on which to group by (group_on_keys or self.frame.keys()),
        then concatenates all combinations of those columns with joiner as the joiner string,
        then groups self.frame by each of those combined columns and sums the sum_on_key column for each group,
        then combines this information into one dataframe and returns it.
      
        parameters:       
        sum_on_key: key on which the summing takes place
        group_on_keys: list of the keys to combine and group by; if None, all keys in the df are used, default=None
        disabletqdm: if True there will be no tqdm shown, default=False
        joiner: string that is used to join the columns in concatenator, default='-'
        Return: dataframe
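For a single combination of columns, the concatenate, group and sum steps might look roughly like this in plain pandas. It is only a sketch under the assumption that the summed value is written back to every row; the package's internals may differ.

```python
import pandas as pd

df = pd.DataFrame(
    [["a", 1, "xx", "alpha"], ["b", 2, "yy", "beta"], ["c", 3, "zz", "gamma"],
     ["d", 4, "qq", "delta"], ["e", -1, "xx", "alpha"]],
    columns=["letter", "value", "code", "greek"],
)

joiner = "-"
group_cols = ["code", "greek"]      # one combination of columns
new_col = joiner.join(group_cols)   # "code-greek"

# Concatenate the chosen columns row by row with the joiner string.
df[new_col] = df[group_cols].astype(str).apply(joiner.join, axis=1)

# Sum 'value' within each group and write the group sum back to every row.
df[new_col + "_summed"] = df.groupby(new_col)["value"].transform("sum")

print(df[["value", new_col, new_col + "_summed"]])
```

Both 'xx-alpha' rows get a sum of 0 here, because 1 and -1 cancel out.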
      
    • evaluator(sum_on_key, optional: group_on_keys, disabletqdm, joiner)

        makes dataframe with the following evaluation-statistics about the groupbyer-dataframe:
            -new_column_name: all combinations of given columns in group_on_keys or self.frame.keys(), joined by joiner
            -unique_count: number of unique rows in the groupbyer-dataframe for that combination
            -not_zero: number of rows in the groupbyer-dataframe of which the sum of summed_column is not 0.0
            -string_length: mean string length of the values in that combination-column
      
        parameters:    
        sum_on_key: key on which the summing takes place
        group_on_keys: list of the keys to combine and group by; if None, all keys in the df are used, default=None
        disabletqdm: if True there will be no tqdm shown on the groupbyer, default=False
        joiner: string that is used to join the columns in concatenator, default='-'
        Return: dataframe with evaluation-statistics
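The three statistics can be sketched for one combination of columns as follows. The function name combination_stats is hypothetical, and counting not_zero per group (rather than per row) is an assumption about how the package does it.

```python
import pandas as pd

df = pd.DataFrame(
    [["a", 1, "xx", "alpha"], ["b", 2, "yy", "beta"], ["c", 3, "zz", "gamma"],
     ["d", 4, "qq", "delta"], ["e", -1, "xx", "alpha"]],
    columns=["letter", "value", "code", "greek"],
)


def combination_stats(frame, group_cols, sum_col, joiner="-"):
    """Evaluation statistics for one combination of columns (illustrative only)."""
    # Row-wise concatenation of the chosen columns.
    combined = frame[group_cols].astype(str).apply(joiner.join, axis=1)
    # Sum of sum_col per distinct combined value.
    sums = frame[sum_col].groupby(combined).sum()
    return {
        "new_column_name": joiner.join(group_cols),
        "unique_count": int(combined.nunique()),
        # Assumption: counted per group, not per row.
        "not_zero": int((sums != 0).sum()),
        "string_length": float(combined.str.len().mean()),
    }


stats = combination_stats(df, ["code", "greek"], "value")
print(stats)
```

For 'code' and 'greek' this gives 4 unique values, 3 groups with a non-zero sum, and a mean string length of 7.8.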
      

Example 1:

import pandas as pd
# DataFrameGrouper must also be imported from this package

df = pd.DataFrame(
        [['a', 1, 'xx', 'alpha'], ['b', 2, 'yy', 'beta'], ['c', 3, 'zz', 'gamma'], ['d', 4, 'qq', 'delta'],
        ['e', -1, 'xx', 'alpha']],
        columns=['letter', 'value', 'code', 'greek'])

DFG = DataFrameGrouper(df)

print(DFG.evaluator('value'))

gives:

     new_column_name  unique_count  not_zero  string_length
0         code-greek             4         3            7.8
1       letter-greek             5         5            6.8
2        letter-code             5         5            4.0
3  letter-code-greek             5         5            9.8

Example 2:

import pandas as pd
# DataFrameGrouper must also be imported from this package

df = pd.DataFrame(
        [['a', 1, 'xx', 'alpha'], ['b', 2, 'yy', 'beta'], ['c', 3, 'zz', 'gamma'], ['d', 4, 'qq', 'delta'],
        ['e', -1, 'xx', 'alpha']],
        columns=['letter', 'value', 'code', 'greek'])

DFG = DataFrameGrouper(df)

print(DFG.groupbyer('value', group_on_keys = ['code','greek']))

gives:

   value code  greek code-greek  code-greek_length  code-greek_summed
0      1   xx  alpha   xx-alpha                7.8                  0
1      2   yy   beta    yy-beta                7.8                  2
2      3   zz  gamma   zz-gamma                7.8                  3
3      4   qq  delta   qq-delta                7.8                  4
4     -1   xx  alpha   xx-alpha                7.8                  0
