A collection of useful functions
Custom Utilities (cutil)
Developed using Python 3.5 (use at least 3.4.2+)
Dependencies
General
- BeautifulSoup4
- psycopg2
- Requests
- PIL - pip3 install pillow
For ubuntu server (to be able to install pillow)
$ sudo apt-get install python-imaging
$ sudo apt-get install libjpeg8 libjpeg62-dev libfreetype6 libfreetype6-dev
$ sudo pip install pillow
If using Selenium
Install
$ pip3 install cutil
Usage
import cutil
fn: cprint
This keeps printing on the same line by clearing the line and printing the new message. To move down to a new line, add a \n at the end of your message.
To use:
cutil.cprint("Items saved: x")
fn: bprint
This is what I call block printing. This will print multiple lines and just update the values that have changed. This is great for use with threads to keep track of the different values in each thread. This one requires a little bit of setup:
# Set up block printing
# { <name>: [<display text>, ''], ...}
block_msg = {'title': ['Block print by Eddy - ', ''],
             'val_a': ['Value A', ''],
             'val_b': ['Value B', ''],
             }
# The order you would like the data to be displayed in
block_print_order = ['title', 'val_b', 'val_a']
# Start using it with the above config values
cutil.enable_bprint(block_msg, block_print_order)
Then to use, all you need to do is:
cutil.bprint(<value>, <name>)
The <value> can be any data you want to display; <name> is the name of the item in the block_msg dict set up above.
By default you should always have a title name. It is always updated with the current time, so you know the script is not frozen when no data is changing.
If, after using bprint in your script, you decide you want to stop using it, just call cutil.disable_bprint() to stop it and print to the terminal normally.
fn: threads
Params:
- num_threads - Type: Int - Positional argument - Number of threads to run. Must be >= 1
- data - Type: List - Positional argument - Pass a list of things to be processed
- cb_run - Type: Fn - Positional argument - Call back function that will process the data
- *args - Type: arguments - Positional argument - Pass as many things as you wish, these will all be passed to cb_run after the data item
Process data using x threads with just 1 line of code. This waits until all data is done being processed before moving on. It is safe to call threads from inside other threads (threadception).
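To illustrate the behaviour described above, here is a minimal sketch of a worker pool built on the standard library's queue and threading modules. The name run_in_threads and everything inside it are hypothetical; this is not cutil's actual implementation, only one way the documented params (num_threads, data, cb_run, *args) could fit together.

```python
import queue
import threading


def run_in_threads(num_threads, data, cb_run, *args):
    """Hypothetical stand-in for cutil.threads: blocks until all items are done."""
    if num_threads < 1:
        raise ValueError("num_threads must be >= 1")

    work = queue.Queue()
    for item in data:
        work.put(item)

    def worker():
        while True:
            try:
                item = work.get_nowait()
            except queue.Empty:
                return  # nothing left to process
            cb_run(item, *args)  # extra args follow the data item

    workers = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()  # wait for every worker before moving on


results = []
results_lock = threading.Lock()


def save_square(item, out, lock):
    # Callback: receives the data item first, then the extra args
    with lock:
        out.append(item * item)


run_in_threads(3, [1, 2, 3, 4], save_square, results, results_lock)
print(sorted(results))
```

Because each call builds its own queue and joins its own workers, a sketch like this can also be called from inside a callback that is already running in another thread.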
fn: create_path
Params:
- path - Type: String - Positional argument - Path to be created
- is_dir - Type: Boolean - Named argument - Default: False - If the path is a dir, set to True. If the path includes the filename, set to False.
Creates the folder path so it can be used
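The is_dir flag is easiest to see with an example. The sketch below is a hypothetical equivalent built on os.makedirs (create_path_sketch is not part of cutil), showing how a path with a filename differs from a path that is itself a folder.

```python
import os
import tempfile


def create_path_sketch(path, is_dir=False):
    """Hypothetical equivalent: create the folder part of `path` if missing."""
    # When the path includes a filename, only its parent folder is created
    folder = path if is_dir else os.path.dirname(path)
    if folder:
        os.makedirs(folder, exist_ok=True)
    return folder


base = tempfile.mkdtemp()
file_path = os.path.join(base, "a", "b", "data.json")
logs_path = os.path.join(base, "logs")

create_path_sketch(file_path)               # creates <base>/a/b, not the file
create_path_sketch(logs_path, is_dir=True)  # creates <base>/logs
```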
fn: dump_json
Save data to a json file with the options sort_keys=True and indent=4. Will create the path if it does not already exist.
Params:
- file_ - Type: String - Positional argument - Where to save the file to (include filename)
- data - Type: List/Dict - Positional argument - Data to be dumped into a json file
- **kwargs - Type: Named args - Named arguments - Args that will be passed to json.dump()
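A rough sketch of what this amounts to, using only the standard library. The name dump_json_sketch is hypothetical, and letting **kwargs override the two default options is an assumption, not something the description above confirms.

```python
import json
import os
import tempfile


def dump_json_sketch(file_, data, **kwargs):
    """Hypothetical equivalent of the documented behaviour."""
    folder = os.path.dirname(file_)
    if folder:
        os.makedirs(folder, exist_ok=True)  # create the path if needed
    # Documented defaults; treating kwargs as overrides is an assumption
    options = {"sort_keys": True, "indent": 4}
    options.update(kwargs)
    with open(file_, "w") as f:
        json.dump(data, f, **options)


out_file = os.path.join(tempfile.mkdtemp(), "out", "data.json")
dump_json_sketch(out_file, {"b": 1, "a": [1, 2]})

with open(out_file) as f:
    loaded = json.load(f)
```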
fn: get_script_name
Params:
- ext - Type: Boolean - Named argument - Default: False - Should the extension be returned as part of the name.
Returns the name of the script being run. Does not include the directory path.
fn: chunks_of
Yields lists of a set size from another list
Params:
- max_chunk_size - Type: Int - Positional argument - The max length of the list that is yielded. The last yield may be smaller
- list_to_chunk - Type: List - Positional argument - The list to chunk up
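For reference, a generator with this behaviour can be written in a few lines with list slicing. chunks_of_sketch is a hypothetical name and not cutil's own code.

```python
def chunks_of_sketch(max_chunk_size, list_to_chunk):
    """Yield successive slices of at most max_chunk_size items."""
    for i in range(0, len(list_to_chunk), max_chunk_size):
        yield list_to_chunk[i:i + max_chunk_size]


chunks = list(chunks_of_sketch(3, [1, 2, 3, 4, 5, 6, 7]))
print(chunks)  # [[1, 2, 3], [4, 5, 6], [7]]
```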
fn: split_into
Yields at most a set number of lists from another list
Params:
- max_num_chunks - Type: Int - Positional argument - The max number of lists to return
- list_to_chunk - Type: List - Positional argument - The list to chunk up
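One way to get "at most N lists" is to work out a chunk size from the list length and then slice. The sketch below assumes that approach (split_into_sketch is a hypothetical name); the real function may divide the items differently.

```python
import math


def split_into_sketch(max_num_chunks, list_to_chunk):
    """Yield at most max_num_chunks lists covering every item once."""
    if not list_to_chunk:
        return
    # Assumption: use the smallest chunk size that keeps the count within the max
    size = math.ceil(len(list_to_chunk) / max_num_chunks)
    for i in range(0, len(list_to_chunk), size):
        yield list_to_chunk[i:i + size]


three = list(split_into_sketch(3, [1, 2, 3, 4, 5, 6, 7]))
fewer = list(split_into_sketch(4, [1, 2, 3, 4, 5]))
print(three)  # [[1, 2, 3], [4, 5, 6], [7]]
print(fewer)  # [[1, 2], [3, 4], [5]]
```

Note that with this approach the number of lists can come out lower than the max, as in the second call.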
fn: get_file_ext
Returns just the extension of the file. Includes the .
Params:
- file - Type: String - Positional argument - The file to get the extension of
fn: norm_path
Returns a proper path for OS with vars expanded out
Params:
- path - Type: String - Positional argument - Path to be fixed up
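A minimal sketch of this kind of path clean-up using os.path. Which of these calls cutil actually uses is an assumption, and norm_path_sketch and CUTIL_DEMO_DIR are made-up names for the example.

```python
import os


def norm_path_sketch(path):
    """Expand env vars and ~, then normalise separators and '..' parts."""
    expanded = os.path.expanduser(os.path.expandvars(path))
    return os.path.normpath(expanded)


os.environ["CUTIL_DEMO_DIR"] = "demo"  # hypothetical variable for the example
fixed = norm_path_sketch("$CUTIL_DEMO_DIR/sub/../file.txt")
print(fixed)
```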
fn: create_hashed_path
Create a directory structure using the hashed filename
Returns the tuple (full_path, filename_hash). full_path does not include the filename
Params:
- base_path - Type: String - Positional argument - Path to create the hashed dirs in
- name - Type: String - Positional argument - name of the file to be saved. Used to create the dir hash
fn: parse_price
Parse a string to get a low and high price as floats.
Returns a dict with keys low and high. If there is just 1 price in the string, low will be set and high will be None
Params:
- price - Type: String - Positional argument - Price to parse
fn: get_epoch
Returns int(time.time())
fn: get_datetime
Returns datetime.datetime.now()
fn: datetime_to_str
Converts a datetime to a json formatted string
Params:
- timestamp - Type: Datetime Object - Positional argument - Datetime object to be converted
fn: datetime_to_utc
Converts a datetime with timezone to utc datetime
Params:
- timestamp - Type: Datetime Object - Positional argument - Datetime object to be converted
fn: str_to_date
Converts a string date/time to a datetime object
Params:
- timestamp - Type: String - Positional argument - String to be formatted
- formats - Type: List/Tuple - Named argument - Default: ["%Y-%m-%dT%H:%M:%S.%f%z", "%Y-%m-%dT%H:%M:%S%z"] - The format(s) that the string being passed in might be
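The formats param is a list of candidates that get tried in turn. A minimal sketch of that idea with datetime.strptime follows; str_to_date_sketch is a hypothetical name, and returning None when nothing matches is an assumption (the real function may raise instead).

```python
import datetime

DEFAULT_FORMATS = ("%Y-%m-%dT%H:%M:%S.%f%z", "%Y-%m-%dT%H:%M:%S%z")


def str_to_date_sketch(timestamp, formats=DEFAULT_FORMATS):
    """Return a datetime from the first format that matches."""
    for fmt in formats:
        try:
            return datetime.datetime.strptime(timestamp, fmt)
        except ValueError:
            continue  # this format did not fit, try the next one
    return None  # assumption: no format matched


with_micro = str_to_date_sketch("2017-03-01T12:30:00.250000+0000")
without_micro = str_to_date_sketch("2017-03-01T12:30:00+0000")
no_match = str_to_date_sketch("not a date")
```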
fn: multikey_sort
Sort a list of dicts by multiple keys. Source: https://stackoverflow.com/questions/1143671/python-sorting-list-of-dictionaries-by-multiple-keys
Params:
- items - Type: List - Positional argument - List of dicts to be sorted
- columns - Type: List/Tuple - Positional argument - List of keys to sort by
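The linked Stack Overflow answer builds one comparer per column and lets a leading - on a column name mean descending. Below is a Python 3 sketch of that approach; whether cutil keeps the - prefix convention is an assumption, and multikey_sort_sketch is a hypothetical name.

```python
from functools import cmp_to_key
from operator import itemgetter


def multikey_sort_sketch(items, columns):
    """Sort dicts by several keys; '-key' sorts that key descending."""
    comparers = [
        (itemgetter(col[1:].strip()), -1) if col.startswith("-")
        else (itemgetter(col.strip()), 1)
        for col in columns
    ]

    def compare(left, right):
        for getter, direction in comparers:
            a, b = getter(left), getter(right)
            result = (a > b) - (a < b)  # Python 3 replacement for cmp()
            if result:
                return direction * result
        return 0  # equal on every column

    return sorted(items, key=cmp_to_key(compare))


rows = [
    {"name": "b", "age": 30},
    {"name": "a", "age": 30},
    {"name": "c", "age": 25},
]
ordered = multikey_sort_sketch(rows, ["-age", "name"])
print([r["name"] for r in ordered])  # ['a', 'b', 'c']
```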
fn: get_internal_ip
Returns the local ip address of the computer
fn: generate_key
Returns a random string
Params:
- value - Type: String/int/etc. - Named argument - Default: random int - Value to be encoded to create the return string
- salt - Type: String/int/etc. - Named argument - Default: random int - Value to use to help encode the string
- size - Type: Int - Named argument - Default: 8 - Min size the return string should be
fn: create_uid
Returns uuid.uuid4().hex
fn: sanitize
Replaces the characters listed below in a string and returns the new string
# ['<replace this>', '<with this>']
['\\', '-'], [':', '-'], ['/', '-'],
['?', ''], ['<', '>'], ['`', '`'],
['|', '-'], ['*', '`'], ['"', '\'']
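Applying a table like this comes down to a loop of str.replace calls. The sketch below copies the pairs exactly as listed above; sanitize_sketch is a hypothetical name and the loop is an assumption about how the pairs are applied.

```python
# Pairs copied from the list above: [replace this, with this]
REPLACEMENTS = [
    ['\\', '-'], [':', '-'], ['/', '-'],
    ['?', ''], ['<', '>'], ['`', '`'],
    ['|', '-'], ['*', '`'], ['"', '\''],
]


def sanitize_sketch(string):
    """Apply every replacement pair in order and return the new string."""
    for old, new in REPLACEMENTS:
        string = string.replace(old, new)
    return string


cleaned = sanitize_sketch('a/b:c?.txt')
print(cleaned)  # a-b-c.txt
```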
fn: rreplace
Params:
- s - Type: String - Positional argument - String to perform the replace action on
- old - Type: String - Positional argument - The string to be replaced
- new - Type: String - Positional argument - The string to replace old with
- occurrence - Type: Int - Positional argument - From the right, how many times to replace
fn: flatten
Params:
- dict_obj - Type: Dict - Positional argument - Dict of dicts to be flattened
- prev_key - Type: String - Named argument - Default: blank str - Not used by the user; used when the fn calls itself
- sep - Type: String - Named argument - Default: _ - The string to separate the dict keys
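The params above suggest a recursive function where prev_key carries the key path built so far. Here is a minimal sketch along those lines (flatten_sketch is a hypothetical name, not the library's code).

```python
def flatten_sketch(dict_obj, prev_key='', sep='_'):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    items = {}
    for key, value in dict_obj.items():
        new_key = prev_key + sep + str(key) if prev_key else str(key)
        if isinstance(value, dict):
            # Recurse, passing the key path built so far
            items.update(flatten_sketch(value, new_key, sep))
        else:
            items[new_key] = value
    return items


flat = flatten_sketch({"a": {"b": 1, "c": {"d": 2}}, "e": 3})
print(flat)  # {'a_b': 1, 'a_c_d': 2, 'e': 3}
```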
fn: update_dict
Update a dict with another dict with nested keys
Params:
- d - Type: Dict - Positional argument - Dict to update
- u - Type: Dict - Positional argument - Dict to combine with d
Returns a new dict with the combined keys
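The difference from a plain dict.update() is that nested dicts are merged instead of overwritten. A minimal sketch of such a merge is below; update_dict_sketch is a hypothetical name, and copying d so the original is left untouched is an assumption.

```python
import collections.abc
import copy


def update_dict_sketch(d, u):
    """Return a new dict: d with u merged in, recursing into nested dicts."""
    result = copy.deepcopy(d)  # assumption: the original d is not modified
    for key, value in u.items():
        both_dicts = (isinstance(value, collections.abc.Mapping)
                      and isinstance(result.get(key), collections.abc.Mapping))
        if both_dicts:
            result[key] = update_dict_sketch(result[key], value)
        else:
            result[key] = value
    return result


original = {"a": {"x": 1, "y": 2}, "b": 1}
merged = update_dict_sketch(original, {"a": {"y": 3}, "c": 4})
print(merged)  # {'a': {'x': 1, 'y': 3}, 'b': 1, 'c': 4}
```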
fn: make_url_safe
Params:
- string - Type: String - Positional argument - String that needs to be made safe to use in a web url
Returns the string with the converted chars, uses urllib.parse.quote_plus(string)
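Since the function uses urllib.parse.quote_plus, the standard library call on its own shows what to expect: spaces become + and reserved characters are percent-encoded.

```python
from urllib.parse import quote_plus

safe = quote_plus("red shoes & socks/size=9")
print(safe)  # red+shoes+%26+socks%2Fsize%3D9
```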
fn: get_image_dimension
Params:
- url - Type: String - Positional argument - image to get WxH from
Returns a dict with keys width and height
fn: crop_image
Returns the path of the cropped image
Params:
- image_file - Type: String - Positional argument - Path to the image to be cropped
- output_file - Type: String - Named argument - Default: None - Required - Path to save the cropped image to
- height - Type: Int - Named argument - Default: None - Required - Height the cropped image should be
- width - Type: Int - Named argument - Default: None - Required - Width the cropped image should be
- x - Type: Int - Named argument - Default: None - Required - x coord of the top left of the location to start cropping
- y - Type: Int - Named argument - Default: None - Required - y coord of the top left of the location to start cropping
Decorators
fn: rate_limited
Set a rate limit on a function.
Modified from https://github.com/tomasbasham/ratelimit/tree/0ca5a616fa6d184fa180b9ad0b6fd0cf54c46936
Params:
- num_calls - Type: Integer/Float - Named Argument - Maximum method invocations within a period. Must be greater than 0.
- every - Type: Integer/Float - Named Argument - A dampening factor (in seconds). Can be any number greater than 0.
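To show how the two params relate, here is a minimal sketch of a decorator that spaces calls at least every / num_calls seconds apart. It is a hypothetical reimplementation (rate_limited_sketch), not the code from cutil or from the linked ratelimit project, and it is not thread safe.

```python
import time
from functools import wraps


def rate_limited_sketch(num_calls=1, every=1.0):
    """Allow at most num_calls invocations per `every` seconds."""
    if num_calls <= 0 or every <= 0:
        raise ValueError("num_calls and every must be greater than 0")
    min_interval = every / float(num_calls)

    def decorator(func):
        last_called = [None]  # mutable cell shared across calls

        @wraps(func)
        def wrapper(*args, **kwargs):
            if last_called[0] is not None:
                wait = min_interval - (time.monotonic() - last_called[0])
                if wait > 0:
                    time.sleep(wait)  # block until the next slot opens
            last_called[0] = time.monotonic()
            return func(*args, **kwargs)

        return wrapper

    return decorator


@rate_limited_sketch(num_calls=10, every=0.5)  # at most one call per 0.05s
def ping(n):
    return n


start = time.monotonic()
values = [ping(i) for i in range(3)]
elapsed = time.monotonic() - start
```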
fn: timeit
Pass in a function and the name of the stat.
Times the function it decorates and sends the name, as well as the elapsed time (in seconds), to stat_tracker_func
Params:
- stat_tracker_func - Type: Func - Positional argument - Function that will process the stats after the function is timed
- name - Type: String - Positional argument - Name of the stat the timed value should be assigned to.
Use it like a regular decorator:
import time
import cutil

def save_stat(stat_name, value):
    print(stat_name, value)

@cutil.timeit(save_stat, 'some_name')
def fn_to_time():
    time.sleep(1)
If you want to pass a func in a class as stat_tracker_func, then in the class __init__ you will have to set the decorator like so:
# self.fn_to_time - a function in the class
# self.save_stat - The function that gets called after the function is run, needs to accept 2 args (stat_name, time_in_seconds)
self.fn_to_time = cutil.timeit(self.save_stat, 'some_name')(self.fn_to_time)
Regex
fn: get_proxy_parts
Break a proxy string into a dict of its parts
Params:
- proxy - Type: String - Positional argument - the proxy string
Returns:
A dict with the following parts (keys are always there, just set to None if the part is not found)
{'schema': None,
 'user': None,
 'password': None,
 'host': None,
 'port': None  # Will default to 80 if no port is found
 }
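As an illustration of the return shape, the sketch below pulls the parts out with a single regular expression. The pattern, the name get_proxy_parts_sketch, and returning the port as a string are all assumptions; the library's own regex may accept different inputs.

```python
import re

# Hypothetical pattern: [schema://][user[:password]@]host[:port]
PROXY_RE = re.compile(
    r"^(?:(?P<schema>[a-zA-Z][a-zA-Z0-9+.-]*)://)?"
    r"(?:(?P<user>[^:@/]+)(?::(?P<password>[^@/]*))?@)?"
    r"(?P<host>[^:/@]+)"
    r"(?::(?P<port>\d+))?/?$"
)


def get_proxy_parts_sketch(proxy):
    """Return a dict of proxy parts; missing parts stay None."""
    parts = {'schema': None, 'user': None, 'password': None,
             'host': None, 'port': None}
    match = PROXY_RE.match(proxy)
    if match:
        parts.update(match.groupdict())
    if parts['port'] is None:
        parts['port'] = '80'  # documented default; string type is an assumption
    return parts


full = get_proxy_parts_sketch("http://user:pass@10.0.0.1:8080")
bare = get_proxy_parts_sketch("example.com")
```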
fn: remove_html_tag
Returns the string with the html tag and all of its contents removed
Params:
- input_str - Type: String/Soup Object - Named argument - Default: '' - Required - The html content to remove the tag data from. Can be a string or a beautiful soup object (gets converted to a string in the function)
- tag - Type: String - Named argument - Default: None - Required - The tag name without the brackets. If None, the input_str is returned without change.
Classes
cutil.RepeatingTimer
fn: __init__
Params:
- interval - Type: Int - Positional argument - Duration of the timer
- func - Type: Function - Positional argument - Function to call when the timer triggers
- repeat - Type: Boolean - Named argument - Default: True - Should the timer reset after it is triggered
- max_tries - Type: Integer - Named argument - Default: None - Number of times to repeat before stopping. If None it will run until you manually stop it.
- args - Type: List/Tuple - Named argument - Default: () - args to be passed to the repeated function
- kwargs - Type: Dict - Named argument - Default: {} - kwargs to be passed to the repeated function
* The __init__ will not start the timer.
fn: start
Starts the timer
Params: N/A
fn: cancel
Stop/disable the timer
Params: N/A
fn: reset
Stop/disable the timer and start it again
Params: N/A
cutil.Database
* Currently only supports postgres/redshift
fn: __init__
Params:
- db_config - Type: Dict - Positional argument - Dictionary with the keys db_name, db_user, db_host, db_pass, db_port
- table_raw - Type: String - Named argument - Default: None - The table that you are inserting data into
- max_connections - Type: Int - Named argument - Default: 10 - The size of the db pool
fn: getcursor
Use to get a cursor to make db calls. It will handle committing the data and rollback if there is an error. Any error/exceptions that happen are passed back to the user
try:
    with db.getcursor() as cur:
        cur.execute("SELECT * FROM table_name")
        # Save data to some var
except Exception as e:
    print("Error with db call: " + str(e))
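The commit-on-success, rollback-on-error behaviour can be sketched with a context manager. To keep the example runnable without a postgres server, it swaps in sqlite3 from the standard library for the psycopg2 connection pool; SketchDatabase is a hypothetical class and not how cutil.Database is actually written.

```python
import sqlite3
from contextlib import contextmanager


class SketchDatabase:
    """Hypothetical stand-in; sqlite3 replaces the psycopg2 pool."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")

    @contextmanager
    def getcursor(self):
        cur = self.conn.cursor()
        try:
            yield cur
            self.conn.commit()    # commit when the block finishes cleanly
        except Exception:
            self.conn.rollback()  # undo the partial work
            raise                 # pass the error back to the caller
        finally:
            cur.close()


db = SketchDatabase()
with db.getcursor() as cur:
    cur.execute("CREATE TABLE items (name TEXT)")
    cur.execute("INSERT INTO items VALUES ('kept')")

try:
    with db.getcursor() as cur:
        cur.execute("INSERT INTO items VALUES ('rolled back')")
        raise RuntimeError("simulated failure")
except RuntimeError as e:
    print("Error with db call: " + str(e))

with db.getcursor() as cur:
    cur.execute("SELECT name FROM items")
    names = [row[0] for row in cur.fetchall()]
```

Only the first insert survives, since the second block raised before it could be committed.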
fn: close
This will close all connections that were created.
fn: insert
This builds a proper bulk insert query. Returns a list of the column value for all rows inserted.
Params:
- table - Type: String - Positional argument - Table that data should be inserted into. Include schema.
- data_list - Type: List/Dict - Positional argument - List or Dict of data to insert. If list, must be a list of dicts
- return_cols - Type: String/List - Named argument - Default: id - List of fields (can be a string of a single field) to be returned for the rows affected.
fn: upsert
This builds a proper bulk upsert query. Returns a list of the column value for all rows affected.
Params:
- table - Type: String - Positional argument - Table that data should be inserted into. Include schema.
- data_list - Type: List/Dict - Positional argument - List or Dict of data to insert. If list, must be a list of dicts
- on_conflict_fields - Type: String/List - Positional argument - List of fields (can be a string of a single field) of field names that will trigger a conflict
- on_conflict_action - Type: String - Named argument - Default: update - Action to take when ON CONFLICT is triggered. By default it will update the fields passed in by update_fields; if nothing is passed, it will take the DO NOTHING action
- update_fields - Type: String/List - Named argument - Default: None - List of fields (can be a string of a single field) to be updated when on_conflict_action is set to update. The default will use all the fields minus the fields used in on_conflict_fields.
- return_cols - Type: String/List - Named argument - Default: id - List of fields (can be a string of a single field) to be returned for the rows affected.
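To make the params more concrete, the sketch below builds the kind of single-row postgres statement they describe. It is a hypothetical query builder (build_upsert_sketch), not cutil's code: the real function handles many rows at once, and the extra columns argument here only stands in for the keys of one data_list item.

```python
def build_upsert_sketch(table, columns, on_conflict_fields,
                        on_conflict_action="update", update_fields=None,
                        return_cols="id"):
    """Return an INSERT ... ON CONFLICT statement with %s placeholders."""
    def as_list(value):
        # Accept a single field name or a list of them
        return [value] if isinstance(value, str) else list(value)

    conflict = as_list(on_conflict_fields)
    if update_fields is None:
        # Documented default: every field except the conflict fields
        update_fields = [c for c in columns if c not in conflict]
    else:
        update_fields = as_list(update_fields)

    if on_conflict_action == "update" and update_fields:
        action = "DO UPDATE SET " + ", ".join(
            "{0} = EXCLUDED.{0}".format(c) for c in update_fields)
    else:
        action = "DO NOTHING"

    # Names are formatted in directly, so they must come from trusted code
    return ("INSERT INTO {table} ({cols}) VALUES ({vals}) "
            "ON CONFLICT ({conflict}) {action} RETURNING {ret}").format(
        table=table, cols=", ".join(columns),
        vals=", ".join(["%s"] * len(columns)),
        conflict=", ".join(conflict), action=action,
        ret=", ".join(as_list(return_cols)))


query = build_upsert_sketch("public.users", ["email", "name"], "email")
skip = build_upsert_sketch("public.users", ["email", "name"], "email",
                           on_conflict_action="nothing")
print(query)
```

The row values themselves would be passed separately to cur.execute() so the driver can escape them.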
fn: update
WIP Returns a list of the column value for all rows updated (this is currently faked by using the data passed in).
Params:
- table - Type: String - Positional argument - Table that data should be inserted into. Include schema.
- data_list - Type: List/Dict - Positional argument - List or Dict of data to insert. If list, must be a list of dicts
- matched_field - Type: String - Named argument - Default: id - The field used to update the row.
- return_cols - Type: String/List - Named argument - Default: id - List of fields (can be a string of a single field) to be returned for the rows affected.