Makes combinations of all columns of one DataFrame, and more
Project description
This is part of the EY-onboarding project for Christine Donszelmann.
It might be useful when working with, e.g., general ledger data. Or it might not be useful except for teaching Christine how to use Python effectively.
Making combinations and grouping them is logged at DEBUG level in the 'logging_christine.log' file.
When making combinations, remember that the number of columns in the new DataFrame can be calculated with:
number_of_columns_new = 2**(number_of_columns_old)-1
so don't feed it 100 columns unless you want your memory to explode :-)
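To get a feel for how quickly this grows, the formula above can be evaluated for a few column counts. This is only an illustration of the arithmetic; the helper name number_of_new_columns is made up for this sketch and is not part of the package.

```python
def number_of_new_columns(number_of_columns_old: int) -> int:
    """Number of non-empty column combinations: 2**n - 1 (hypothetical helper)."""
    return 2 ** number_of_columns_old - 1


# Print the size of the result for a few input widths.
for n in (3, 10, 20, 30):
    print(n, number_of_new_columns(n))
```

Every extra input column roughly doubles the number of combination columns.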
This package relies on the following packages. Please install them before using this package:
- Pandas
- Numpy
- tqdm
This package has the following classes and methods:
-
class CombinationMaker
-
combinations(listofkeys)
Makes all possible combinations of a list.
example input: [a, b, c]
example output: [[], [a], [b], [c], [a, b], [a, c], [b, c], [a, b, c]]
parameters:
- listofkeys: list of strings that need to be combined
- totalpairs: needed for recursion; please keep it as None, as the script replaces it with the total number of possible combinations
Return: list of lists with all combinations of strings
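The behaviour described above can be sketched with itertools from the standard library. This is not the package's own (recursive) implementation, just a minimal stand-in that produces the same example output; the function name all_combinations is an assumption for illustration.

```python
from itertools import combinations as iter_combinations


def all_combinations(listofkeys):
    """Return every combination of the keys, shortest first, including the empty list."""
    result = []
    for size in range(len(listofkeys) + 1):
        # itertools.combinations yields tuples; convert them to lists.
        for combo in iter_combinations(listofkeys, size):
            result.append(list(combo))
    return result


print(all_combinations(['a', 'b', 'c']))
# [[], ['a'], ['b'], ['c'], ['a', 'b'], ['a', 'c'], ['b', 'c'], ['a', 'b', 'c']]
```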
-
-
class DataFrameGrouper
-
show_combinations(optional: joiner)
Makes the combinations made in CombinationMaker.combinations more readable.
example df.keys() = [a, b, c]
example joiner = '-'
example output = [a, b, c, a-b, a-c, b-c, a-b-c] (removes the empty list)
parameters:
- joiner: string that comes between the joined keys, default='-'
Return: list of strings of combinations with the joiner string in between
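As a rough sketch of that joining step (assuming the input is a list of lists like the one CombinationMaker.combinations is described as returning), the empty list is dropped and each remaining combination is joined into one string. The function name join_combinations is hypothetical.

```python
def join_combinations(combination_lists, joiner='-'):
    """Join each non-empty combination into one string; the empty list is skipped."""
    return [joiner.join(combo) for combo in combination_lists if combo]


combos = [[], ['a'], ['b'], ['c'], ['a', 'b'], ['a', 'c'], ['b', 'c'], ['a', 'b', 'c']]
print(join_combinations(combos))
# ['a', 'b', 'c', 'a-b', 'a-c', 'b-c', 'a-b-c']
```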
-
groupbyer(sum_on_key, optional: group_on_keys, disabletqdm, joiner)
Builds the list of column names to group by (group_on_keys, or self.frame.keys() if none are given), concatenates all combinations of those columns with joiner as the joiner string, groups self.frame by each of those concatenated columns and sums the sum column within each group, then combines this information into one dataframe and returns it.
parameters:
- sum_on_key: key on which the summing takes place
- group_on_keys: list of all keys that need to be combined and grouped by, if not all keys in the df; default=None
- disabletqdm: if True, no tqdm will be shown; default=False
- joiner: string that is used to join the columns in the concatenator; default='-'
Return: dataframe
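The steps above can be sketched in plain pandas for a single combination of columns. This is an approximation based on the description and on the column names shown in Example 2 (the `_length` and `_summed` suffixes), not the package's actual code; the function name group_and_sum is made up, and the real groupbyer handles every combination rather than just one.

```python
import pandas as pd


def group_and_sum(frame, sum_on_key, group_on_keys, joiner='-'):
    """Concatenate the given columns, then sum sum_on_key per concatenated value."""
    new_name = joiner.join(group_on_keys)
    result = frame[[sum_on_key] + list(group_on_keys)].copy()

    # Build the concatenated column one key at a time.
    joined = result[group_on_keys[0]].astype(str)
    for key in group_on_keys[1:]:
        joined = joined + joiner + result[key].astype(str)
    result[new_name] = joined

    # Mean string length of the concatenated values (same value on every row).
    result[new_name + '_length'] = joined.str.len().mean()
    # Sum of sum_on_key within each group, broadcast back to the rows.
    result[new_name + '_summed'] = result.groupby(new_name)[sum_on_key].transform('sum')
    return result


df = pd.DataFrame(
    [['a', 1, 'xx', 'alpha'], ['b', 2, 'yy', 'beta'], ['c', 3, 'zz', 'gamma'],
     ['d', 4, 'qq', 'delta'], ['e', -1, 'xx', 'alpha']],
    columns=['letter', 'value', 'code', 'greek'])
grouped = group_and_sum(df, 'value', ['code', 'greek'])
print(grouped)
```

Using transform('sum') keeps one row per original row, so the group total can sit next to the individual values.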
-
evaluator(sum_on_key, optional: group_on_keys, disabletqdm, joiner)
Makes a dataframe with the following evaluation statistics about the groupbyer dataframe:
- new_column_name: all combinations of the given columns in group_on_keys or self.frame.keys(), joined by joiner
- unique_count: number of unique rows in the groupbyer dataframe for that combination
- not_zero: number of rows in the groupbyer dataframe for which the sum of the summed column is not 0.0
- string_length: mean string length of the values in that combination column
parameters:
- sum_on_key: key on which the summing takes place
- group_on_keys: list of all keys that need to be combined and grouped by, if not all keys in the df; default=None
- disabletqdm: if True, no tqdm will be shown on the groupbyer; default=False
- joiner: string that is used to join the columns in the concatenator; default='-'
Return: dataframe with evaluation statistics
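A minimal pandas sketch of these statistics is given here. It is a guess at the logic, not the package's implementation: the function name evaluate_combinations is hypothetical, and two behaviours are assumptions inferred from the output in Example 1, namely that the summed column is left out of the default keys and that only combinations of two or more columns are reported.

```python
from itertools import combinations

import pandas as pd


def evaluate_combinations(frame, sum_on_key, group_on_keys=None, joiner='-'):
    """Compute unique_count, not_zero and string_length per column combination."""
    if group_on_keys is None:
        # Assumption: every column except the one being summed.
        group_on_keys = [key for key in frame.columns if key != sum_on_key]
    rows = []
    # Assumption: only combinations of two or more columns are evaluated.
    for size in range(2, len(group_on_keys) + 1):
        for combo in combinations(group_on_keys, size):
            joined = frame[combo[0]].astype(str)
            for key in combo[1:]:
                joined = joined + joiner + frame[key].astype(str)
            sums = frame[sum_on_key].groupby(joined).sum()
            rows.append({
                'new_column_name': joiner.join(combo),
                'unique_count': int(joined.nunique()),
                'not_zero': int((sums != 0).sum()),
                'string_length': float(joined.str.len().mean()),
            })
    return pd.DataFrame(rows)


df = pd.DataFrame(
    [['a', 1, 'xx', 'alpha'], ['b', 2, 'yy', 'beta'], ['c', 3, 'zz', 'gamma'],
     ['d', 4, 'qq', 'delta'], ['e', -1, 'xx', 'alpha']],
    columns=['letter', 'value', 'code', 'greek'])
stats = evaluate_combinations(df, 'value')
print(stats)
```

The row order of this sketch follows itertools.combinations and may differ from the order the package returns.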
-
Example 1:
import pandas as pd
# DataFrameGrouper is provided by this package

df = pd.DataFrame(
    [['a', 1, 'xx', 'alpha'], ['b', 2, 'yy', 'beta'], ['c', 3, 'zz', 'gamma'], ['d', 4, 'qq', 'delta'],
     ['e', -1, 'xx', 'alpha']],
    columns=['letter', 'value', 'code', 'greek'])
DFG = DataFrameGrouper(df)
print(DFG.evaluator('value'))
gives:
     new_column_name  unique_count  not_zero  string_length
0         code-greek             4         3            7.8
1       letter-greek             5         5            6.8
2        letter-code             5         5            4.0
3  letter-code-greek             5         5            9.8
Example 2:
import pandas as pd
# DataFrameGrouper is provided by this package

df = pd.DataFrame(
    [['a', 1, 'xx', 'alpha'], ['b', 2, 'yy', 'beta'], ['c', 3, 'zz', 'gamma'], ['d', 4, 'qq', 'delta'],
     ['e', -1, 'xx', 'alpha']],
    columns=['letter', 'value', 'code', 'greek'])
DFG = DataFrameGrouper(df)
# group on 'code' and 'greek', matching the columns in the output shown
print(DFG.groupbyer('value', group_on_keys=['code', 'greek']))
gives:
   value  code  greek  code-greek  code-greek_length  code-greek_summed
0      1    xx  alpha    xx-alpha                7.8                  0
1      2    yy   beta     yy-beta                7.8                  2
2      3    zz  gamma    zz-gamma                7.8                  3
3      4    qq  delta    qq-delta                7.8                  4
4     -1    xx  alpha    xx-alpha                7.8                  0
Hashes for pandas-extension-christinedonszelmann-0.2.3.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 852df18b7866e4996aa2033adf90425b88820d68f7cdafc2fe1506e0031d5d8e
MD5 | 310cda0f5d50544a5b65521c593d34c4
BLAKE2b-256 | c218e162155e521ccc3dd7a8e6d323756b03f657e4d7a57c4d64106fd32e3e15
Hashes for pandas_extension_christinedonszelmann-0.2.3-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | e2850e2daffd504ea86fc3842f9390227be8b856a17fa2f5b16e69f2623ece28
MD5 | bc41bf22682fde33bac5e6d6609e63fb
BLAKE2b-256 | 0e1efa411ca98b482d5dc31ea4cb9f1694e6aba40ffedb4f2e6bc35e1b9fd808