A collection of useful functions

# Custom Utilities (cutil)

Developed using Python 3.5 (requires Python 3.4.2 or newer)

## Dependencies

### General

For ubuntu server (to be able to install pillow)

$ sudo apt-get pythons-imaging
$ sudo apt-get install libjpeg8 libjpeg62-dev libfreetype6 libfreetype6-dev
$ sudo pip install pillow

### If using Selenium

## Install

• $ pip3 install cutil
• $ pip3 install cutil[postgres]

## Usage

### import cutil

#### fn: cprint

This keeps printing on the same line by clearing the line and then printing the new message. To move down to a new line, put a \n at the end of your message. To use:

cutil.cprint("Items saved: x")
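One way same-line printing like this can work is to return the cursor to the start of the line and erase it before writing the new message. The sketch below is not cutil's implementation: `same_line_print` and its `stream` parameter are hypothetical names, and it assumes a terminal that understands the ANSI erase-line sequence.

```python
import io
import sys


def same_line_print(message, stream=sys.stdout):
    """Overwrite the current terminal line with message (hypothetical helper)."""
    # "\r" moves the cursor to column 0; "\x1b[K" is the ANSI erase-to-end-of-line code.
    stream.write("\r\x1b[K" + message)
    stream.flush()


# Write to an in-memory stream so the raw output can be inspected.
buffer = io.StringIO()
for count in range(3):
    same_line_print("Items saved: {}".format(count), stream=buffer)
```

On a real terminal only the last message stays visible, because each call erases the previous one.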


#### fn: bprint

This is what I call block printing. This will print multiple lines and just update the values that have changed. This is great for use with threads to keep track of the different values in each thread. This one requires a little bit of setup:

# Set up block printing
# { <name>: [<display text>, ''], ...}
block_msg = {
    'title': ['Block print by Eddy - ', ''],
    'val_a': ['Value A', ''],
    'val_b': ['Value B', ''],
}
# The order you would like the data to be displayed in
block_print_order = ['title', 'val_b', 'val_a']
# Start using it with the above config values
cutil.enable_bprint(block_msg, block_print_order)


Then to use, all you need to do is:

cutil.bprint(<value>, <name>)


The <value> can be any data you want to display; <name> is the name of the item in the block_msg dict set up above.

By default you should always have a title name. It is always updated with the current time, so you know the script is not frozen when no data is changing.

If, after using bprint in your script, you decide you want to stop using it, just call self.disable_bprint() to stop it and print to the terminal normally.

#### fn: threads

Params:

• num_threads - Type: Int - Positional argument - Number of threads to run. Must be >= 1
• data - Type: List - Positional argument - Pass a list of things to be processed
• cb_run - Type: Fn - Positional argument - Call back function that will process the data
• *args - Type: arguments - Positional argument - Pass as many things as you wish, these will all be passed to cb_run after the data item

Parse data using x threads with just 1 line of code. This will wait until all data is done being processed before moving on. It is safe to call threads from inside other threads (threadception).
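The calling convention described above (a thread count, a list of data, a callback, then extra arguments passed after each data item) can be sketched with the standard library. This is an illustration only, not cutil's code: `run_in_threads` is a hypothetical name, and returning the callback results is a choice made for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_threads(num_threads, data, cb_run, *args):
    """Process every item of data with cb_run using num_threads workers (sketch)."""
    if num_threads < 1:
        raise ValueError("num_threads must be >= 1")
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Each call receives the data item first, then the extra args.
        futures = [pool.submit(cb_run, item, *args) for item in data]
        # result() blocks until that item is done and re-raises any error.
        return [future.result() for future in futures]


def scale(item, factor):
    return item * factor


results = run_in_threads(2, [1, 2, 3], scale, 10)
```

The call does not return until every item has been processed, which matches the blocking behaviour described above.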

#### fn: create_path

Params:

• path - Type: String - Positional argument - Path to be created
• is_dir - Type: Boolean - Named argument - Default: False - If the path is a dir set to True. If the path includes the filename, set to False.

Creates the folder path so it can be used
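A minimal sketch of how the is_dir flag could change which directories get created, assuming the behaviour described in the parameter list. `make_path` is a hypothetical stand-in, not cutil's function.

```python
import os
import tempfile


def make_path(path, is_dir=False):
    """Create the directories needed for path (hypothetical helper)."""
    # When path includes a filename, only its parent directory is created.
    folder = path if is_dir else os.path.dirname(path)
    if folder:
        os.makedirs(folder, exist_ok=True)
    return path


base = tempfile.mkdtemp()
file_path = make_path(os.path.join(base, "a", "b", "data.json"))
dir_path = make_path(os.path.join(base, "c", "d"), is_dir=True)
```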

#### fn: dump_json

Save data to a json file with the options sort_keys=True and indent=4. Will create the path if it does not already exist.

Params:

• file_ - Type: String - Positional argument - Where to save the file to (include filename)
• data - Type: List/Dict - Positional argument - Data to be dumped into a json file
• **kwargs - Type: Named args - Named arguments - Args that will be passed to json.dump()
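The behaviour described above can be approximated with the standard library as follows. `save_json` is a hypothetical name, and letting the keyword arguments override the two default options is an assumption of this sketch.

```python
import json
import os
import tempfile


def save_json(file_, data, **kwargs):
    """Write data to file_ as JSON, creating parent directories first (sketch)."""
    folder = os.path.dirname(file_)
    if folder:
        os.makedirs(folder, exist_ok=True)
    # Defaults mirror the options named above; callers may override them.
    options = {"sort_keys": True, "indent": 4}
    options.update(kwargs)
    with open(file_, "w") as outfile:
        json.dump(data, outfile, **options)


target = os.path.join(tempfile.mkdtemp(), "out", "data.json")
save_json(target, {"b": 1, "a": [1, 2]})
with open(target) as infile:
    text = infile.read()
```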

#### fn: get_script_name

Params:

• ext - Type: Boolean - Named argument - Default: False - Should the extension be returned as part of the name.

Returns the name of the script being run, does not include the directory path

#### fn: chunks_of

Yields lists of a set size from another list

Params:

• max_chunk_size - Type: Int - Positional argument - The max length of each list that is yielded. The last yield may be smaller
• list_to_chunk - Type: List - Positional argument - The list to chunk up
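Chunking of this kind is usually a short generator over slices. The version below is a sketch with a hypothetical name (`chunks_of_sketch`), not the library source.

```python
def chunks_of_sketch(max_chunk_size, list_to_chunk):
    """Yield consecutive slices of at most max_chunk_size items."""
    for start in range(0, len(list_to_chunk), max_chunk_size):
        yield list_to_chunk[start:start + max_chunk_size]


chunks = list(chunks_of_sketch(2, [1, 2, 3, 4, 5]))
# chunks == [[1, 2], [3, 4], [5]]
```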

#### fn: split_into

Yields at most max_num_chunks lists from another list

Params:

• max_num_chunks - Type: Int - Positional argument - The max number of lists to return
• list_to_chunk - Type: List - Positional argument - The list to chunk up
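One possible way to split a list into a bounded number of pieces is to derive a chunk size from the limit. This is a sketch under that assumption; the real function may distribute items differently, and `split_into_sketch` is a hypothetical name.

```python
import math


def split_into_sketch(max_num_chunks, list_to_chunk):
    """Yield at most max_num_chunks lists covering list_to_chunk."""
    if not list_to_chunk:
        return
    # Pick the smallest chunk size that keeps the chunk count within the limit.
    chunk_size = math.ceil(len(list_to_chunk) / max_num_chunks)
    for start in range(0, len(list_to_chunk), chunk_size):
        yield list_to_chunk[start:start + chunk_size]


parts = list(split_into_sketch(3, [1, 2, 3, 4, 5, 6, 7]))
few = list(split_into_sketch(5, [1, 2]))
```

With fewer items than the limit, as in `few`, fewer lists are produced, which is why the count is a maximum rather than an exact number.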

#### fn: get_file_ext

Params:

• file - Type: String - Positional argument - File name or path to get the extension from

Returns just the extension of the file, including the .

#### fn: norm_path

Returns a proper path for OS with vars expanded out

Params:

• path - Type: String - Positional argument - Path to be fixed up

#### fn: create_hashed_path

Create a directory structure using the hashed filename

Returns the tuple (full_path, filename_hash). full_path does not include the filename

Params:

• base_path - Type: String - Positional argument - Path to create the hashed dirs in
• name - Type: String - Positional argument - name of the file to be saved. Used to create the dir hash

#### fn: parse_price

Parse a string to get a low and high price as a float.

Returns a dict with keys low and high. If there is just 1 price in the string, low will be set and high will be None

Params:

• price - Type: String - Positional argument - Price to parse

#### fn: get_epoch

Returns int(time.time())

#### fn: get_datetime

Returns datetime.datetime.now()

#### fn: datetime_to_str

Converts a datetime to a json formatted string

Params:

• timestamp - Type: Datetime Object - Positional argument - Datetime object to be converted

#### fn: datetime_to_utc

Converts a datetime with timezone to utc datetime

Params:

• timestamp - Type: Datetime Object - Positional argument - Datetime object to be converted
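For a timezone-aware datetime, a UTC conversion can be done with `astimezone`. The sketch below uses a hypothetical helper name (`to_utc`) and a fixed -05:00 offset chosen only for the example.

```python
from datetime import datetime, timedelta, timezone


def to_utc(timestamp):
    """Convert a timezone-aware datetime to UTC (hypothetical helper)."""
    return timestamp.astimezone(timezone.utc)


minus_five = timezone(timedelta(hours=-5))
local_time = datetime(2017, 3, 4, 5, 6, 7, tzinfo=minus_five)
utc_time = to_utc(local_time)
```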

#### fn: str_to_date

Converts a string date/time to a datetime object

Params:

• timestamp - Type: String - Positional argument - String to be formatted
• formats - Type: List/Tuple - Named argument - Default: ["%Y-%m-%dT%H:%M:%S.%f%z", "%Y-%m-%dT%H:%M:%S%z"] - The format(s) that the string being passed in might be
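Trying several formats in turn is typically a loop around `datetime.strptime`. The sketch below uses the two default formats listed above; the name `parse_timestamp` is hypothetical, and returning None when nothing matches is an assumption, since the behaviour on failure is not stated here.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_FORMATS = ("%Y-%m-%dT%H:%M:%S.%f%z", "%Y-%m-%dT%H:%M:%S%z")


def parse_timestamp(timestamp, formats=DEFAULT_FORMATS):
    """Return a datetime using the first format that matches, else None (sketch)."""
    for fmt in formats:
        try:
            return datetime.strptime(timestamp, fmt)
        except ValueError:
            continue  # Try the next candidate format.
    return None


with_micro = parse_timestamp("2017-03-04T05:06:07.123456+0000")
without_micro = parse_timestamp("2017-03-04T05:06:07-0500")
unmatched = parse_timestamp("not a date")
```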

#### fn: multikey_sort

Sort a list of dicts by multiple keys

Source: https://stackoverflow.com/questions/1143671/python-sorting-list-of-dictionaries-by-multiple-keys

Params:

• items - Type: List - Positional argument - List of dicts to be sorted
• columns - Type: List/Tuple - Positional argument - List of keys to sort by
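A multi-key sort can be built from repeated stable sorts. The sketch below is an independent illustration, not the linked answer's code or cutil's; the name `multikey_sort_sketch` is hypothetical, and treating a leading "-" on a key as "descending" is an assumption.

```python
def multikey_sort_sketch(items, columns):
    """Return items sorted by each key in columns, first key most significant."""
    result = list(items)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last gives a correct multi-key ordering.
    for column in reversed(columns):
        descending = column.startswith("-")
        key = column[1:] if descending else column
        result.sort(key=lambda row, key=key: row[key], reverse=descending)
    return result


rows = [
    {"name": "b", "age": 30},
    {"name": "a", "age": 30},
    {"name": "c", "age": 25},
]
by_age_then_name = multikey_sort_sketch(rows, ["age", "name"])
oldest_first = multikey_sort_sketch(rows, ["-age", "name"])
```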

#### fn: get_internal_ip

Returns the local ip address of the computer

#### fn: generate_key

Returns a random string

Params:

• value - Type: String/int/etc. - Named argument - Default: random int - Value to be encoded to create the return string
• salt - Type: String/int/etc. - Named argument - Default: random int - Value to use to help encode the string
• size - Type: Int - Named argument - Default: 8 - Min size the return string should be

#### fn: create_uid

Returns uuid.uuid4().hex

#### fn: sanitize

Replaces certain characters in a string and returns the new string. The replacements are:

# ['<replace this>', '<with this>']
['\\', '-'], [':', '-'], ['/', '-'],
['?', ''], ['<', '>'], ['', ''],
['|', '-'], ['*', ''], ['"', '\'']


#### fn: rreplace

Params:

• s - Type: String - Positional argument - String to perform the replace action on
• old - Type: String - Positional argument - The string to be replaced
• new - Type: String - Positional argument - The string to replace old with
• occurrence - Type: Int - Positional argument - From the right, how many times to replace
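A right-hand replace with these four parameters is commonly written with `str.rsplit`. This sketch assumes that approach; `rreplace_sketch` is a hypothetical name.

```python
def rreplace_sketch(s, old, new, occurrence):
    """Replace the last `occurrence` occurrences of old in s with new."""
    # rsplit cuts from the right at most `occurrence` times; joining with
    # new puts the replacement where each cut was made.
    return new.join(s.rsplit(old, occurrence))


result = rreplace_sketch("a.b.c.d", ".", "-", 2)
# result == "a.b-c-d"
```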

#### fn: flatten

Params:

• dict_obj - Type: Dict - Positional argument - Dict of dicts to be flattened
• prev_key - Type: String - Named argument - Default: blank str - Not used by the user, used when the fn calls itself
• sep - Type: String - Named argument - Default: _ - The string to separate the dict keys
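The parameters above suggest a recursive flatten in which prev_key carries the key path built so far. The following is a sketch under that assumption (string keys only); `flatten_sketch` is a hypothetical name.

```python
def flatten_sketch(dict_obj, prev_key="", sep="_"):
    """Flatten nested dicts into one level, joining keys with sep."""
    items = {}
    for key, value in dict_obj.items():
        new_key = prev_key + sep + key if prev_key else key
        if isinstance(value, dict):
            # Recurse, carrying the key path built so far.
            items.update(flatten_sketch(value, new_key, sep))
        else:
            items[new_key] = value
    return items


flat = flatten_sketch({"a": 1, "b": {"c": 2, "d": {"e": 3}}})
# flat == {"a": 1, "b_c": 2, "b_d_e": 3}
```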

#### fn: update_dict

Update a dict with another dict with nested keys

Params:

• d - Type: Dict - Positional argument - Dict to update
• u - Type: Dict - Positional argument - Dict to combine with d

Returns a new dict with the combined keys
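A nested update differs from `dict.update` in that matching sub-dicts are merged instead of replaced. The sketch below shows one way to do that while leaving the inputs untouched; `update_dict_sketch` is a hypothetical name, and copying the input is an assumption of this sketch.

```python
import copy
from collections.abc import Mapping


def update_dict_sketch(d, u):
    """Return a new dict: d with the nested values of u merged in."""
    merged = copy.deepcopy(d)
    for key, value in u.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            # Both sides are dicts: merge them rather than overwrite.
            merged[key] = update_dict_sketch(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


base = {"db": {"host": "localhost", "port": 5432}, "debug": False}
combined = update_dict_sketch(base, {"db": {"port": 6543}, "debug": True})
```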

#### fn: make_url_safe

Params:

• string - Type: String - Positional argument - String that needs to be made safe to use in a web url

Returns the string with the converted chars, uses urllib.parse.quote_plus(string)
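Since the conversion is delegated to `urllib.parse.quote_plus`, its effect can be seen directly with the standard library. The input string is only an example.

```python
from urllib.parse import quote_plus

# Spaces become "+", and reserved characters are percent-encoded.
safe = quote_plus("a b&c/d")
# safe == "a+b%26c%2Fd"
```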

#### fn: get_image_dimension

Params:

• url - Type: String - Positional argument - image to get WxH from

Returns a dict with the keys width and height

#### fn: crop_image

Returns the path of the cropped image

Params:

• image_file - Type: String - Positional argument - Path to the image to be cropped
• output_file - Type: String - Named argument - Default: None - Required - Path to save the cropped image to
• height - Type: Int - Named argument - Default: None - Required - Height the cropped image should be
• width - Type: Int - Named argument - Default: None - Required - Width the cropped image should be
• x - Type: Int - Named argument - Default: None - Required - x coordinate of the top left of the location to start cropping
• y - Type: Int - Named argument - Default: None - Required - y coordinate of the top left of the location to start cropping

## Decorators

#### fn: rate_limited

Set a rate limit on a function.

Params:

• num_calls - Type: Integer/Float - Named Argument - Maximum method invocations within a period. Must be greater than 0.
• every - Type: Integer/Float - Named Argument - A dampening factor (in seconds). Can be any number greater than 0.
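To illustrate what a rate-limiting decorator with these two parameters might do, the sketch below spaces calls at least every / num_calls seconds apart. It is one possible interpretation, not cutil's implementation, and `rate_limited_sketch` is a hypothetical name.

```python
import functools
import threading
import time


def rate_limited_sketch(num_calls=1, every=1.0):
    """Allow at most num_calls invocations per `every` seconds (sketch)."""
    min_interval = every / float(num_calls)

    def decorator(func):
        lock = threading.Lock()
        last_call = [None]  # Mutable cell so the wrapper can update it.

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with lock:
                if last_call[0] is not None:
                    wait = min_interval - (time.monotonic() - last_call[0])
                    if wait > 0:
                        time.sleep(wait)  # Space calls evenly over the period.
                last_call[0] = time.monotonic()
            return func(*args, **kwargs)

        return wrapper

    return decorator


@rate_limited_sketch(num_calls=10, every=0.5)
def ping(value):
    return value


start = time.monotonic()
values = [ping(i) for i in range(3)]
elapsed = time.monotonic() - start
```

The lock makes the spacing hold when the decorated function is called from several threads.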

#### fn: timeit

Pass in a function and the name of the stat.

Times the decorated function and sends the name and the elapsed time (in seconds) to stat_tracker_func

Params:

• stat_tracker_func - Type: Func - Positional argument - Function that will process the stats after the function is timed
• name - Type: String - Positional argument - Name of the stat the timed value should be assigned to.

Use it like a regular decorator:

import time
import cutil

def save_stat(stat_name, value):
    print(stat_name, value)

@cutil.timeit(save_stat, 'some_name')
def fn_to_time():
    time.sleep(1)


If you want to pass a func in a class as stat_tracker_func, then in the class __init__ you will have to set the decorator like so:

# self.fn_to_time - a function in the class
# self.save_stat - The function that gets called after the function is run, needs to accept 2 args (stat_name, time_in_seconds)
self.fn_to_time = cutil.timeit(self.save_stat, 'some_name')(self.fn_to_time)
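For readers curious how a decorator with this calling convention can be built, here is a standalone sketch. It is not cutil's source: `timeit_sketch` is a hypothetical name, and reporting the time even when the function raises is a choice made for this example.

```python
import functools
import time


def timeit_sketch(stat_tracker_func, name):
    """Time the decorated function and report (name, seconds) afterwards."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Report the elapsed time even if the function raised.
                stat_tracker_func(name, time.perf_counter() - start)
        return wrapper
    return decorator


stats = {}


def save_stat(stat_name, value):
    stats[stat_name] = value


@timeit_sketch(save_stat, "some_name")
def fn_to_time():
    time.sleep(0.05)
    return "done"


outcome = fn_to_time()
```

Because the outer call returns a decorator, the same object can be applied by hand to a bound method, which is what the __init__ pattern above relies on.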


## Regex

#### fn: get_proxy_parts

Break a proxy string into a dict of its parts

Params:

• proxy - Type: String - Positional argument - the proxy string

Returns:

A dict with the following parts (the keys are always present; a value is set to None if the part is not found)

{'schema': None,
 'user': None,
 'host': None,
 'port': None  # Will default to 80 if no port is found
}
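A regex-based split into these four keys could look like the sketch below. The pattern, the name `proxy_parts_sketch`, and keeping the port as a string are all assumptions made for illustration; the library's actual pattern and return types may differ.

```python
import re

PROXY_RE = re.compile(
    r"^(?:(?P<schema>[A-Za-z][A-Za-z0-9+.-]*)://)?"  # optional scheme
    r"(?:(?P<user>[^:@/]+)(?::[^@/]*)?@)?"           # optional user[:password]@
    r"(?P<host>[^:/@]+)"                             # host name or IPv4 address
    r"(?::(?P<port>\d+))?"                           # optional :port
)


def proxy_parts_sketch(proxy):
    """Split a proxy string into schema, user, host and port (sketch)."""
    parts = {"schema": None, "user": None, "host": None, "port": None}
    match = PROXY_RE.match(proxy)
    if match:
        parts.update(match.groupdict())
    if parts["port"] is None:
        parts["port"] = "80"  # Fall back to port 80 when none is given.
    return parts


full = proxy_parts_sketch("http://alice:secret@10.0.0.1:8080")
bare = proxy_parts_sketch("example.com")
```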


#### fn: remove_html_tag

Returns the string with the given html tag and all of its contents removed

Params:

• input_str - Type: String/Soup Object - Named argument - Default: '' - Required - The html content to remove the tag data from. Can be a string or a beautiful soup object (gets converted to a string in the function)
• tag - Type: String - Named argument - Default: None - Required - The tag name without the brackets. If None, the input_str is returned without change.
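A simple regex approach to removing a tag and its contents is sketched below. The pattern and the name `remove_html_tag_sketch` are assumptions, and this version does not handle a tag nested inside another tag of the same name.

```python
import re


def remove_html_tag_sketch(input_str="", tag=None):
    """Remove every <tag>...</tag> element, including its contents (sketch)."""
    if tag is None:
        return input_str
    # str() lets objects that render to HTML text be passed in as well.
    text = str(input_str)
    pattern = r"<{0}\b[^>]*>.*?</{0}\s*>".format(re.escape(tag))
    return re.sub(pattern, "", text, flags=re.DOTALL | re.IGNORECASE)


html = "<p>Keep</p><script type='x'>drop()</script><p>this</p>"
cleaned = remove_html_tag_sketch(html, "script")
unchanged = remove_html_tag_sketch(html)
```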

## Classes

### cutil.RepeatingTimer

#### fn: __init__

Params:

• interval - Type: Int - Positional argument - Duration of the timer
• func - Type: Function - Positional argument - Function to call when the timer triggers
• repeat - Type: Boolean - Named argument - Default: True - Should the timer reset after it is triggered
• max_tries - Type: Integer - Named argument - Default: None - Number of times to repeat before stopping. If None it will run until you manually stop it.
• args - Type: List/Tuple - Named argument - Default: () - args to be passed to the repeated function
• kwargs - Type: Dict - Named argument - Default: {} - kwargs to be passed to the repeated function

*The __init__ will not start the timer.

#### fn: start

Starts the timer

Params: N/A

#### fn: cancel

Stop/disable the timer

Params: N/A

#### fn: reset

Stop/disable the timer and start it again

Params: N/A

### cutil.Database

* Currently only supports postgres/redshift

#### fn: __init__

Params:

• db_config - Type: Dict - Positional argument - Dictionary with the keys db_name, db_user, db_host, db_pass, db_port
• table_raw - Type: String - Named argument - Default: None - The table that you are inserting data into
• max_connections - Type: Int - Named argument - Default: 10 - The size of the db pool

#### fn: getcursor

Use to get a cursor to make db calls. It will handle committing the data and rollback if there is an error. Any error/exceptions that happen are passed back to the user

try:
    with db.getcursor() as cur:
        cur.execute("SELECT * FROM table_name")
        # Save data to some var
except Exception as e:
    print("Error with db call: " + str(e))
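The commit-on-success, rollback-on-error behaviour can be illustrated with a small context manager. The sketch uses the standard library's sqlite3 in place of postgres so that it runs anywhere; `getcursor_sketch` is a hypothetical function and does not use a connection pool.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def getcursor_sketch(connection):
    """Yield a cursor; commit on success, roll back and re-raise on error."""
    cursor = connection.cursor()
    try:
        yield cursor
        connection.commit()
    except Exception:
        connection.rollback()
        raise  # Pass the error back to the caller.
    finally:
        cursor.close()


conn = sqlite3.connect(":memory:")
with getcursor_sketch(conn) as cur:
    cur.execute("CREATE TABLE items (name TEXT)")
    cur.execute("INSERT INTO items VALUES ('kept')")

error_seen = False
try:
    with getcursor_sketch(conn) as cur:
        cur.execute("INSERT INTO items VALUES ('discarded')")
        cur.execute("SELECT * FROM missing_table")  # Raises OperationalError.
except sqlite3.OperationalError:
    error_seen = True

rows = conn.execute("SELECT name FROM items").fetchall()
```

The second insert is rolled back because the block failed, so only the first row remains.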


#### fn: close

This will close all connections that were created.

#### fn: insert

This builds a proper bulk insert query. Returns a list of the column values for all rows inserted.

Params:

• table - Type: String - Positional argument - Table that data should be inserted into. Include schema.
• data_list - Type: List/Dict - Positional argument - List or Dict of data to insert. If list, must be a list of dicts
• return_cols - Type: String/List - Named argument - Default: id - List of fields (or a string for a single field) to return for the affected rows.

#### fn: upsert

This builds a proper bulk upsert query. Returns a list of the column values for all rows affected.

Params:

• table - Type: String - Positional argument - Table that data should be inserted into. Include schema.
• data_list - Type: List/Dict - Positional argument - List or Dict of data to insert. If list, must be a list of dicts
• on_conflict_fields - Type: String/List - Positional argument - List of field names (or a string for a single field) that will trigger a conflict
• on_conflict_action - Type: String - Named argument - Default: update - Action to take when ON CONFLICT is triggered. By default it will update the fields passed in by update_fields, or, if nothing is passed, it will take the DO NOTHING action
• on_conflict_where - Type: String - Named argument - Default: None - WHERE clause for the on conflict fields, used if your table has a partial index on it. (DO NOT start with WHERE)
• update_fields - Type: String/List - Named argument - Default: None - List of fields (or a string for a single field) to be updated when on_conflict_action is set to update. The default uses all the fields minus the fields used in on_conflict_fields.
• return_cols - Type: String/List - Named argument - Default: id - List of fields (or a string for a single field) to return for the affected rows.

#### fn: update

Returns a list of the column values for all rows updated (this is currently faked by using the data passed in).

Params:

• table - Type: String - Positional argument - Table that data should be updated in. Include schema.
• data_list - Type: List/Dict - Positional argument - List or Dict of data to update. If list, must be a list of dicts
• matched_field - Type: String - Named argument - Default: id - The field used to match the row to update.
• return_cols - Type: String/List - Named argument - Default: id - List of fields (or a string for a single field) to return for the affected rows.
