A collection of useful functions

Custom Utilities (cutil)
========================

|PyPI| |PyPI|

Developed using Python 3.5 (use at least 3.4.2+)

Dependencies
------------

General
~~~~~~~

- `BeautifulSoup4 <https://pypi.python.org/pypi/beautifulsoup4>`__
- `psycopg2 <https://pypi.python.org/pypi/psycopg2>`__
- `Requests <https://pypi.python.org/pypi/requests>`__
- PIL - ``pip3 install pillow``

On Ubuntu Server, install these packages first so that Pillow can be
installed:

::

    $ sudo apt-get install python-imaging
    $ sudo apt-get install libjpeg8 libjpeg62-dev libfreetype6 libfreetype6-dev
    $ sudo pip install pillow

If using Selenium
~~~~~~~~~~~~~~~~~

- `Selenium <https://pypi.python.org/pypi/selenium>`__

Install
-------

- ``$ pip3 install cutil``

Usage
-----

**import cutil**
~~~~~~~~~~~~~~~~

fn: **cprint**
^^^^^^^^^^^^^^

This keeps printing on the same line by clearing the line and then
printing the new message. To move down to a new line, put a ``\n`` at
the end of your message. To use:

.. code:: python

    cutil.cprint("Items saved: x")

--------------

fn: **bprint**
^^^^^^^^^^^^^^

This is what I call block printing. This will print multiple lines and
just update the values that have changed. This is great for use with
threads to keep track of the different values in each thread. This one
requires a little bit of setup:

.. code:: python

    # Set up block printing
    # { <name>: [<display text>, ''], ...}
    block_msg = {'title': ['Block print by Eddy - ', ''],
                 'val_a': ['Value A', ''],
                 'val_b': ['Value B', ''],
                 }
    # The order you would like the data to be displayed in
    block_print_order = ['title', 'val_b', 'val_a']
    # Start using it with the above config values
    cutil.enable_bprint(block_msg, block_print_order)

Then to use, all you need to do is:

.. code:: python

    cutil.bprint(<value>, <name>)

``<value>`` can be any data you want to display, and ``<name>`` is the
name of the item in the ``block_msg`` dict set up above. You should
always have a ``title`` name: it is always updated with the current
time, so you can tell the output is not frozen when no data is
changing. If you decide to stop using ``bprint`` later in your script,
call ``disable_bprint()`` to stop it and print to the terminal
normally.

--------------

fn: **threads**
^^^^^^^^^^^^^^^

Params:

- **num_threads** - *Type: Int* - *Positional argument* - Number of
  threads to run. Must be >= 1
- **data** - *Type: List* - *Positional argument* - List of items to be
  processed
- **cb_run** - *Type: Fn* - *Positional argument* - Callback function
  that will process each data item
- **\*args** - *Type: arguments* - *Positional argument* - Any number
  of extra values; these are all passed to ``cb_run`` after the data
  item

Parse data using x threads with just 1 line of code. This will wait
until all data is done being processed before moving on. It is safe to
call ``threads`` from inside other threads *(threadception)*.
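
As a rough illustration of the calling convention described above, here
is a minimal sketch built on the standard library. ``run_in_threads`` is
a hypothetical stand-in, not cutil's implementation; it only assumes
that each item is handed to ``cb_run`` followed by the extra ``*args``.

```python
import queue
import threading


def run_in_threads(num_threads, data, cb_run, *args):
    """Hypothetical stand-in for cutil.threads, not the library's code."""
    if num_threads < 1:
        raise ValueError("num_threads must be >= 1")

    work = queue.Queue()
    for item in data:
        work.put(item)

    def worker():
        while True:
            try:
                item = work.get_nowait()
            except queue.Empty:
                return  # nothing left to process
            cb_run(item, *args)

    workers = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()  # block until every item has been processed


results = []
lock = threading.Lock()


def square(item, out):
    # The extra argument `out` arrives after the data item.
    with lock:
        out.append(item * item)


run_in_threads(3, [1, 2, 3, 4], square, results)
print(sorted(results))  # [1, 4, 9, 16]
```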

--------------

fn: **create_path**
^^^^^^^^^^^^^^^^^^^

Params:

- **path** - *Type: String* - *Positional argument* - Path to be
created
- **is_dir** - *Type: Boolean* - *Named argument* - Default: ``False``
  - If the path is a dir, set to ``True``. If the path includes the
  filename, set to ``False``.

Creates the folder path so that it can be used.

--------------

fn: **dump_json**
^^^^^^^^^^^^^^^^^

Saves data to a JSON file with the options ``sort_keys=True`` and
``indent=4``. Creates the path if it does not already exist.

Params:

- **file\_** - *Type: String* - *Positional argument* - Where to save
the file to (include filename)
- **data** - *Type: List/Dict* - *Positional argument* - Data to be
dumped into a json file
- **\**kwargs** - *Type: Named args* - *Named arguments* - Args that
will be passed to ``json.dump()``
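
The behavior described above can be approximated with the standard
library alone. ``dump_json_sketch`` is a hypothetical name, and letting
``**kwargs`` override the two defaults is an assumption of this sketch,
not something the description confirms.

```python
import json
import os
import tempfile


def dump_json_sketch(file_, data, **kwargs):
    """Hypothetical stand-in for dump_json, built from the description."""
    folder = os.path.dirname(file_)
    if folder:
        os.makedirs(folder, exist_ok=True)  # create missing folders first
    # Defaults from the description; this sketch lets callers override them.
    options = {"sort_keys": True, "indent": 4}
    options.update(kwargs)
    with open(file_, "w") as f:
        json.dump(data, f, **options)


target = os.path.join(tempfile.mkdtemp(), "out", "data.json")
dump_json_sketch(target, {"b": 1, "a": [1, 2]})
with open(target) as f:
    saved = f.read()
```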

--------------

fn: **get_script_name**
^^^^^^^^^^^^^^^^^^^^^^^

Params:

- **ext** - *Type: Boolean* - *Named argument* - Default: ``False`` -
Should the extension be returned as part of the name.

Returns the name of the script being run, without the directory path.

--------------

fn: **chunks_of**
^^^^^^^^^^^^^^^^^

Yields lists of a set size from another list

Params:

- **max_chunk_size** - *Type: Int* - *Positional argument* - The max
  length of each list that is yielded. The last yield may be smaller
- **list_to_chunk** - *Type: List* - *Positional argument* - The list
  to chunk up
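
For illustration, a generator with this behavior can be written in a
few lines. ``chunks_of_sketch`` is a hypothetical name and this is a
sketch of the idea, not necessarily how cutil implements it.

```python
def chunks_of_sketch(max_chunk_size, list_to_chunk):
    """Yield consecutive slices of at most max_chunk_size items."""
    for start in range(0, len(list_to_chunk), max_chunk_size):
        yield list_to_chunk[start:start + max_chunk_size]


chunks = list(chunks_of_sketch(2, [1, 2, 3, 4, 5]))
print(chunks)  # [[1, 2], [3, 4], [5]]
```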

--------------

fn: **split_into**
^^^^^^^^^^^^^^^^^^

Yields at most ``max_num_chunks`` lists from another list

Params:

- **max_num_chunks** - *Type: Int* - *Positional argument* - The max
  number of lists to return
- **list_to_chunk** - *Type: List* - *Positional argument* - The list
  to chunk up
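
One way to get at most a given number of lists is to pick a chunk size
with a ceiling division. ``split_into_sketch`` is a hypothetical name;
how cutil actually distributes the items between the lists is not
stated here, so treat the even-sized chunks below as an assumption.

```python
import math


def split_into_sketch(max_num_chunks, list_to_chunk):
    """Yield at most max_num_chunks lists covering list_to_chunk."""
    if not list_to_chunk:
        return
    # Smallest chunk size that keeps the number of chunks within the limit.
    size = math.ceil(len(list_to_chunk) / max_num_chunks)
    for start in range(0, len(list_to_chunk), size):
        yield list_to_chunk[start:start + size]


parts = list(split_into_sketch(3, [1, 2, 3, 4, 5, 6, 7]))
print(parts)  # [[1, 2, 3], [4, 5, 6], [7]]
```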

--------------

fn: **get_file_ext**
^^^^^^^^^^^^^^^^^^^^

Params:

- **file** - *Type: String* - *Positional argument* - File name or path
  to get the extension from

Returns just the extension of the file, including the ``.``

--------------

fn: **norm_path**
^^^^^^^^^^^^^^^^^

Returns a normalized path for the current OS with variables expanded

Params:

- **path** - *Type: String* - *Positional argument* - Path to be fixed
up

--------------

fn: **create_hashed_path**
^^^^^^^^^^^^^^^^^^^^^^^^^^

Create a directory structure using the hashed filename

Returns the tuple ``(full_path, filename_hash)``. ``full_path`` does not
include the filename

Params:

- **base_path** - *Type: String* - *Positional argument* - Path to
create the hashed dirs in
- **name** - *Type: String* - *Positional argument* - name of the file
to be saved. Used to create the dir hash

--------------

fn: **parse_price**
^^^^^^^^^^^^^^^^^^^

Parses a string to get a low and a high price as floats.

Returns a dict with the keys ``low`` and ``high``. If there is just 1
price in the string, ``low`` will be set and ``high`` will be ``None``.

Params:

- **price** - *Type: String* - *Positional argument* - Price to parse
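
A minimal sketch of the idea using a regular expression follows.
``parse_price_sketch`` and its pattern are assumptions made for
illustration; the formats that cutil itself accepts may differ.

```python
import re


def parse_price_sketch(price):
    """Hypothetical stand-in: pull up to two prices out of a string."""
    # Numbers with optional thousands separators and decimals.
    found = re.findall(r"\d[\d,]*(?:\.\d+)?", price)
    values = [float(n.replace(",", "")) for n in found]
    return {
        "low": values[0] if values else None,
        "high": values[1] if len(values) > 1 else None,
    }


single = parse_price_sketch("$1,299.99")
ranged = parse_price_sketch("$5 - $10.50")
```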

--------------

fn: **get_epoch**
^^^^^^^^^^^^^^^^^

Returns ``int(time.time())``

--------------

fn: **get_datetime**
^^^^^^^^^^^^^^^^^^^^

Returns ``datetime.datetime.now()``

--------------

fn: **datetime_to_str**
^^^^^^^^^^^^^^^^^^^^^^^

Converts a datetime to a json formatted string

Params:

- **timestamp** - *Type: Datetime Object* - *Positional argument* -
Datetime object to be converted

--------------

fn: **datetime_to_utc**
^^^^^^^^^^^^^^^^^^^^^^^

Converts a datetime with timezone to utc datetime

Params:

- **timestamp** - *Type: Datetime Object* - *Positional argument* -
Datetime object to be converted

--------------

fn: **str_to_date**
^^^^^^^^^^^^^^^^^^^

Converts a string date/time to a datetime object

Params:

- **timestamp** - *Type: String* - *Positional argument* - String to be
formatted
- **formats** - *Type: List/Tuple* - *Named argument* - Default:
``["%Y-%m-%dT%H:%M:%S.%f%z", "%Y-%m-%dT%H:%M:%S%z"]`` - The format(s)
that the string being passed in might be
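
Trying each format in turn can be sketched with ``datetime.strptime``.
``str_to_date_sketch`` is a hypothetical name, and raising
``ValueError`` when nothing matches is an assumption of this sketch;
the description does not say what cutil does in that case.

```python
from datetime import datetime

DEFAULT_FORMATS = ("%Y-%m-%dT%H:%M:%S.%f%z", "%Y-%m-%dT%H:%M:%S%z")


def str_to_date_sketch(timestamp, formats=DEFAULT_FORMATS):
    """Return a datetime parsed with the first format that matches."""
    for fmt in formats:
        try:
            return datetime.strptime(timestamp, fmt)
        except ValueError:
            continue  # try the next candidate format
    raise ValueError("No format matched: " + timestamp)


parsed = str_to_date_sketch("2017-01-01T12:30:00+0000")
with_micro = str_to_date_sketch("2017-01-01T12:30:00.250000+0000")
```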

--------------

fn: **multikey_sort**
^^^^^^^^^^^^^^^^^^^^^

Sort a list of dicts by multiple keys.

Source:
https://stackoverflow.com/questions/1143671/python-sorting-list-of-dictionaries-by-multiple-keys

Params:

- **items** - *Type: List* - *Positional argument* - List of dicts to
be sorted
- **columns** - *Type: List/Tuple* - *Positional argument* - List of
keys to sort by

--------------

fn: **get_internal_ip**
^^^^^^^^^^^^^^^^^^^^^^^

Returns the local ip address of the computer

--------------

fn: **generate_key**
^^^^^^^^^^^^^^^^^^^^

Returns a *random* string

Params:

- **value** - *Type: String/int/etc.* - *Named argument* - Default:
  random int - Value to be encoded to create the return string
- **salt** - *Type: String/int/etc.* - *Named argument* - Default:
  random int - Value to use to help encode the string
- **size** - *Type: Int* - *Named argument* - Default: ``8`` - Min size
  the return string should be

--------------

fn: **create_uid**
^^^^^^^^^^^^^^^^^^

Returns ``uuid.uuid4().hex``

--------------

fn: **sanitize**
^^^^^^^^^^^^^^^^

Replaces the following characters in a string and returns the new
string:

.. code:: python

    # [<replace this>, <with this>]
    ['\\', '-'], [':', '-'], ['/', '-'],
    ['?', ''], ['<', '>'], ['`', '`'],
    ['|', '-'], ['*', '`'], ['"', '\'']

--------------

fn: **rreplace**
^^^^^^^^^^^^^^^^

Params:

- **s** - *Type: String* - *Positional argument* - String to perform
  the replace action on
- **old** - *Type: String* - *Positional argument* - The string to be
  replaced
- **new** - *Type: String* - *Positional argument* - The string to
  replace ``old``
- **occurrence** - *Type: Int* - *Positional argument* - From the
  right, how many times to replace
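
A common way to replace from the right is to combine ``str.rsplit``
with ``str.join``. ``rreplace_sketch`` is a hypothetical name; this
shows the technique and is not necessarily cutil's exact code.

```python
def rreplace_sketch(s, old, new, occurrence):
    """Replace the last `occurrence` instances of `old` with `new`."""
    # rsplit cuts from the right at most `occurrence` times,
    # then the pieces are glued back together with `new`.
    return new.join(s.rsplit(old, occurrence))


result = rreplace_sketch("a.b.c.d", ".", "-", 2)
print(result)  # a.b-c-d
```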

--------------

fn: **flatten**
^^^^^^^^^^^^^^^

Params:

- **dict_obj** - *Type: Dict* - *Positional argument* - Dict of dicts
  to be flattened
- **prev_key** - *Type: String* - *Named argument* - Default: blank
  str - Not set by the user, used when the fn calls itself
- **sep** - *Type: String* - *Named argument* - Default: ``_`` - The
  string to separate the dict keys
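
The parameters above suggest a recursive flatten that joins nested
keys with ``sep``. The sketch below (``flatten_sketch`` is a
hypothetical name) assumes string keys and shows one way to do that.

```python
def flatten_sketch(dict_obj, prev_key="", sep="_"):
    """Collapse nested dicts into one level, joining keys with `sep`."""
    items = {}
    for key, value in dict_obj.items():
        new_key = prev_key + sep + key if prev_key else key
        if isinstance(value, dict):
            # Recurse, carrying the key path built so far.
            items.update(flatten_sketch(value, new_key, sep))
        else:
            items[new_key] = value
    return items


flat = flatten_sketch({"a": {"b": 1, "c": {"d": 2}}, "e": 3})
print(flat)  # {'a_b': 1, 'a_c_d': 2, 'e': 3}
```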

--------------

fn: **update_dict**
^^^^^^^^^^^^^^^^^^^

Update a dict with another dict with nested keys

Params:

- **d** - *Type: Dict* - *Positional argument* - Dict to update
- **u** - *Type: Dict* - *Positional argument* - Dict to combine with
  ``d``

Returns a new dict with the combined keys.
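
A nested update of this kind is usually written recursively. In the
sketch below, ``update_dict_sketch`` is a hypothetical name, and
leaving ``d`` untouched by working on a deep copy is an assumption;
the description only says that a combined dict is returned.

```python
import copy


def update_dict_sketch(d, u):
    """Return a copy of `d` with the nested keys of `u` merged in."""
    merged = copy.deepcopy(d)
    for key, value in u.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge them instead of overwriting.
            merged[key] = update_dict_sketch(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


base = {"a": {"x": 1, "y": 2}, "b": 1}
merged = update_dict_sketch(base, {"a": {"y": 3, "z": 4}, "c": 5})
```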

--------------

fn: **make_url_safe**
^^^^^^^^^^^^^^^^^^^^^

Params:

- **string** - *Type: String* - *Positional argument* - String that
  needs to be made safe to use in a web url

Returns the string with the converted chars, uses
``urllib.parse.quote_plus(string)``
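
For reference, this is what ``urllib.parse.quote_plus`` does to a
string containing spaces and reserved characters (the input value is
made up for the example):

```python
from urllib.parse import quote_plus

safe = quote_plus("cats & dogs/100%")
print(safe)  # cats+%26+dogs%2F100%25
```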

--------------

fn: **get_image_dimension**
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Params:

- **url** - *Type: String* - *Positional argument* - Image to get the
  width and height from

Returns a dict with the keys ``width`` and ``height``

--------------

fn: **crop_image**
^^^^^^^^^^^^^^^^^^

Returns the path of the cropped image

Params:

- **image_file** - *Type: String* - *Positional argument* - Path to the
  image to be cropped
- **output_file** - *Type: String* - *Named argument* - Default:
  ``None`` - **Required** Path to save the cropped image to
- **height** - *Type: Int* - *Named argument* - Default: ``None`` -
  **Required** Height the cropped image should be
- **width** - *Type: Int* - *Named argument* - Default: ``None`` -
  **Required** Width the cropped image should be
- **x** - *Type: Int* - *Named argument* - Default: ``None`` -
  **Required** x coordinate of the top left of the location to start
  cropping
- **y** - *Type: Int* - *Named argument* - Default: ``None`` -
  **Required** y coordinate of the top left of the location to start
  cropping

--------------

Decorators
----------

fn: **rate_limited**
^^^^^^^^^^^^^^^^^^^^

Set a rate limit on a function.

Modified from
https://github.com/tomasbasham/ratelimit/tree/0ca5a616fa6d184fa180b9ad0b6fd0cf54c46936

Params:

- **num_calls** - *Type: Integer/Float* - *Named Argument* - Maximum
method invocations within a period. Must be greater than 0.
- **every** - *Type: Integer/Float* - *Named Argument* - A dampening
factor (in seconds). Can be any number greater than 0.
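
To show the general shape of such a decorator, here is a simplified
sketch that spaces calls ``every / num_calls`` seconds apart.
``rate_limited_sketch`` is a hypothetical name and the even spacing is
an assumption; cutil's decorator (and the ratelimit project it is
modified from) may enforce the limit differently.

```python
import functools
import threading
import time


def rate_limited_sketch(num_calls=1, every=1.0):
    """Allow roughly num_calls invocations per `every` seconds."""
    min_interval = every / num_calls
    lock = threading.Lock()
    last_call = [None]  # monotonic time of the previous call

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with lock:
                if last_call[0] is not None:
                    wait = min_interval - (time.monotonic() - last_call[0])
                    if wait > 0:
                        time.sleep(wait)  # hold back until the slot opens
                last_call[0] = time.monotonic()
            return func(*args, **kwargs)
        return wrapper
    return decorator


@rate_limited_sketch(num_calls=2, every=0.2)
def ping(n):
    return n


start = time.monotonic()
replies = [ping(n) for n in range(3)]
elapsed = time.monotonic() - start  # about 0.2s: two waits of 0.1s each
```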

--------------

fn: **timeit**
^^^^^^^^^^^^^^

Pass in a function and the name of the stat.

Times the decorated function and sends ``name`` along with the elapsed
time (in seconds) to ``stat_tracker_func``.

Params:

- **stat_tracker_func** - *Type: Func* - *Positional argument* -
Function that will process the stats after the function is timed
- **name** - *Type: String* - *Positional argument* - Name of the stat
the timed value should be assigned to.

Use it like a regular decorator:

.. code:: python

    import time
    import cutil

    def save_stat(stat_name, value):
        print(stat_name, value)

    @cutil.timeit(save_stat, 'some_name')
    def fn_to_time():
        time.sleep(1)

If you want to pass a func in a class as ``stat_tracker_func``, then in
the class ``__init__`` you will have to set the decorator like so:

.. code:: python

    # self.fn_to_time - a function in the class
    # self.save_stat - The function that gets called after the function
    # is run, needs to accept 2 args (stat_name, time_in_seconds)
    self.fn_to_time = cutil.timeit(self.save_stat, 'some_name')(self.fn_to_time)
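
For readers curious how a decorator with this calling convention can be
built, here is a minimal sketch. ``timeit_sketch`` is a hypothetical
name and not cutil's code; it only assumes that the tracker is called
with ``(name, seconds)`` after the wrapped function finishes.

```python
import functools
import time


def timeit_sketch(stat_tracker_func, name):
    """Decorator factory: time a function and report (name, seconds)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Report even if the wrapped function raised.
                stat_tracker_func(name, time.perf_counter() - start)
        return wrapper
    return decorator


stats = {}


def save_stat(stat_name, value):
    stats[stat_name] = value


@timeit_sketch(save_stat, "add_time")
def add(a, b):
    return a + b


total = add(1, 2)
```

Because the factory returns a plain decorator, the same sketch also
supports the ``timeit_sketch(tracker, name)(method)`` form used for
class methods above.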

--------------

Regex
-----

fn: **get_proxy_parts**
^^^^^^^^^^^^^^^^^^^^^^^

Break a proxy string into a dict of its parts

Params:

- **proxy** - *Type: String* - *Positional argument* - the proxy string

Returns:

A dict with the following parts (the keys are always present and are
set to ``None`` if the part is not found)

.. code:: python

    {'schema': None,
     'user': None,
     'password': None,
     'host': None,
     'port': None  # Will default to 80 if no port is found
     }
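
A regular expression along the following lines can split a proxy
string into those parts. The pattern and ``get_proxy_parts_sketch``
are assumptions for illustration only; the actual expression used by
cutil, and whether ``port`` comes back as a string or an int, are not
stated here (this sketch keeps it as a string).

```python
import re

PROXY_RE = re.compile(
    r"^(?:(?P<schema>[a-zA-Z][\w+.-]*)://)?"            # optional scheme
    r"(?:(?P<user>[^:@/]+)(?::(?P<password>[^@/]*))?@)?"  # optional auth
    r"(?P<host>[^:/@]+)"                                # host or IP
    r"(?::(?P<port>\d+))?/?$"                           # optional port
)


def get_proxy_parts_sketch(proxy):
    """Hypothetical stand-in: split a proxy string into a dict."""
    parts = dict.fromkeys(("schema", "user", "password", "host", "port"))
    match = PROXY_RE.match(proxy)
    if match:
        parts.update(match.groupdict())
        if parts["port"] is None:
            parts["port"] = "80"  # default described in the dict above
    return parts


full = get_proxy_parts_sketch("http://user:pass@10.0.0.1:8080")
bare = get_proxy_parts_sketch("example.com")
```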

--------------

fn: **remove_html_tag**
^^^^^^^^^^^^^^^^^^^^^^^

Returns the string with the given HTML tag and all of its contents
removed

Params:

- **input_str** - *Type: String/Soup Object* - *Named argument* -
  Default: ``''`` - **Required** The HTML content to remove the tag
  from. Can be a string or a Beautiful Soup object (it gets converted
  to a string in the function)
- **tag** - *Type: String* - *Named argument* - Default: ``None`` -
  **Required** The tag name without the brackets. If ``None``, the
  ``input_str`` is returned without change.
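
Since this function sits in the Regex section, a regex-based sketch of
the behavior is shown below. ``remove_html_tag_sketch`` and its pattern
are assumptions for illustration; in particular, this simple pattern
does not handle a tag nested inside another tag of the same name.

```python
import re


def remove_html_tag_sketch(input_str="", tag=None):
    """Hypothetical stand-in: drop <tag>...</tag> blocks from HTML."""
    if tag is None:
        return input_str
    # Non-greedy match from the opening tag to its first closing tag.
    pattern = r"<{0}\b[^>]*>.*?</{0}\s*>".format(re.escape(tag))
    return re.sub(pattern, "", str(input_str),
                  flags=re.DOTALL | re.IGNORECASE)


html = "<p>Hi<script>track()</script> there</p>"
cleaned = remove_html_tag_sketch(html, tag="script")
print(cleaned)  # <p>Hi there</p>
```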

--------------

Classes
-------

**cutil.RepeatingTimer**
~~~~~~~~~~~~~~~~~~~~~~~~

fn: **``__init__``**
^^^^^^^^^^^^^^^^^^^^

Params:

- **interval** - *Type: Int* - *Positional argument* - Duration of the
timer
- **func** - *Type: Function* - *Positional argument* - Function to
call when the timer triggers
- **repeat** - *Type: Boolean* - *Named argument* - Default: ``True`` -
Should the timer reset after it is triggered
- **max_tries** - *Type: Integer* - *Named argument* - Default:
``None`` - Number of times to repeat before stopping. If ``None`` it
will run until you manually stop it.
- **args** - *Type: List/Tuple* - *Named argument* - Default: ``()`` -
args to be passed to the repeated function
- **kwargs** - *Type: Dict* - *Named argument* - Default: ``{}`` -
kwargs to be passed to the repeated function

\*The ``__init__`` will not start the timer.
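
The parameters above map naturally onto ``threading.Timer``. The class
below is a hypothetical sketch of that idea, not cutil's
implementation; treating ``interval`` as seconds and leaving the try
counter untouched on ``reset`` are both assumptions.

```python
import threading


class RepeatingTimerSketch:
    """Hypothetical stand-in for cutil.RepeatingTimer."""

    def __init__(self, interval, func, repeat=True, max_tries=None,
                 args=(), kwargs=None):
        self.interval = interval
        self.func = func
        self.repeat = repeat
        self.max_tries = max_tries
        self.args = args
        self.kwargs = kwargs or {}
        self._timer = None
        self._tries = 0

    def _fire(self):
        self.func(*self.args, **self.kwargs)
        self._tries += 1
        more = self.max_tries is None or self._tries < self.max_tries
        if self.repeat and more:
            self.start()  # schedule the next run

    def start(self):
        self._timer = threading.Timer(self.interval, self._fire)
        self._timer.daemon = True
        self._timer.start()

    def cancel(self):
        if self._timer is not None:
            self._timer.cancel()

    def reset(self):
        self.cancel()
        self.start()


ticks = []
done = threading.Event()


def tick():
    ticks.append(len(ticks) + 1)
    if len(ticks) == 3:
        done.set()


timer = RepeatingTimerSketch(0.01, tick, max_tries=3)
timer.start()           # nothing runs until start() is called
done.wait(timeout=2)    # wait for the third and final run
```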

--------------

fn: **``start``**
^^^^^^^^^^^^^^^^^

Starts the timer

Params: *N/A*

--------------

fn: **``cancel``**
^^^^^^^^^^^^^^^^^^

Stop/disable the timer

Params: *N/A*

--------------

fn: **``reset``**
^^^^^^^^^^^^^^^^^

Stop/disable the timer and start it again

Params: *N/A*

--------------

**cutil.Database**
~~~~~~~~~~~~~~~~~~

\* Currently only supports postgres/redshift

.. _fn-__init__-1:

fn: **``__init__``**
^^^^^^^^^^^^^^^^^^^^

Params:

- **db_config** - *Type: Dict* - *Positional argument* - Dictionary
  with the keys ``db_name``, ``db_user``, ``db_host``, ``db_pass``,
  ``db_port``
- **table_raw** - *Type: String* - *Named argument* - Default:
  ``None`` - The table that you are inserting data into
- **max_connections** - *Type: Int* - *Named argument* - Default:
  ``10`` - The size of the db pool

--------------

fn: **getcursor**
^^^^^^^^^^^^^^^^^

Use to get a cursor to make db calls. It will handle committing the data
and rollback if there is an error. Any error/exceptions that happen are
passed back to the user

.. code:: python

    try:
        with db.getcursor() as cur:
            cur.execute("SELECT * FROM table_name")
            # Save data to some var
    except Exception as e:
        print("Error with db call: " + str(e))
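
The commit-or-rollback behavior described above is the classic context
manager pattern. The sketch below demonstrates it with the standard
library's ``sqlite3`` so that it runs anywhere; ``getcursor_sketch`` is
a hypothetical name, and cutil itself works against a postgres
connection pool rather than a single sqlite connection.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def getcursor_sketch(conn):
    """Yield a cursor; commit on success, roll back and re-raise on error."""
    cur = conn.cursor()
    try:
        yield cur
        conn.commit()
    except Exception:
        conn.rollback()
        raise  # the caller still sees the original exception
    finally:
        cur.close()


conn = sqlite3.connect(":memory:")
with getcursor_sketch(conn) as cur:
    cur.execute("CREATE TABLE items (name TEXT)")
    cur.execute("INSERT INTO items VALUES ('kept')")

try:
    with getcursor_sketch(conn) as cur:
        cur.execute("INSERT INTO items VALUES ('discarded')")
        raise RuntimeError("something went wrong")
except RuntimeError:
    pass  # the failed insert was rolled back

names = [row[0] for row in conn.execute("SELECT name FROM items")]
print(names)  # ['kept']
```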

--------------

fn: **close**
^^^^^^^^^^^^^

This will close all connections that were created.

--------------

fn: **insert**
^^^^^^^^^^^^^^

This builds a proper bulk insert query. Returns a list of the column
values for all rows inserted.

Params:

- **table** - *Type: String* - *Positional argument* - Table that data
  should be inserted into. Include schema.
- **data_list** - *Type: List/Dict* - *Positional argument* - List or
  Dict of data to insert. If list, must be a list of dicts
- **return_cols** - *Type: String/List* - *Named argument* - Default:
  ``id`` - List of fields (can be a string of a single field) to be
  returned for the rows affected.

--------------

fn: **upsert**
^^^^^^^^^^^^^^

This builds a proper bulk upsert query. Returns a list of the column
values for all rows affected.

Params:

- **table** - *Type: String* - *Positional argument* - Table that data
  should be inserted into. Include schema.
- **data_list** - *Type: List/Dict* - *Positional argument* - List or
  Dict of data to insert. If list, must be a list of dicts
- **on_conflict_fields** - *Type: String/List* - *Positional argument*
  - List of field names (can be a string of a single field) that will
  trigger a conflict
- **on_conflict_action** - *Type: String* - *Named argument* - Default:
  ``update`` - Action to take when ``ON CONFLICT`` is triggered. By
  default it will update the fields passed in by ``update_fields``; if
  ``nothing`` is passed, it will use the ``DO NOTHING`` action
- **update_fields** - *Type: String/List* - *Named argument* - Default:
  ``None`` - List of fields (can be a string of a single field) to be
  updated when ``on_conflict_action`` is set to ``update``. The default
  uses all the fields minus the fields used in ``on_conflict_fields``.
- **return_cols** - *Type: String/List* - *Named argument* - Default:
  ``id`` - List of fields (can be a string of a single field) to be
  returned for the rows affected.
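
To make the parameters above more concrete, here is a sketch of the
kind of PostgreSQL statement they describe. ``build_upsert_sketch`` is
a hypothetical helper, not part of cutil, and the exact SQL that cutil
generates may differ. Table and column names are interpolated as-is,
so the sketch is only suitable for trusted identifiers.

```python
def _as_list(value):
    """Accept a single field name or a list of names."""
    return [value] if isinstance(value, str) else list(value)


def build_upsert_sketch(table, data_list, on_conflict_fields,
                        on_conflict_action="update",
                        update_fields=None, return_cols="id"):
    """Return (query, params) for a bulk INSERT ... ON CONFLICT."""
    rows = [data_list] if isinstance(data_list, dict) else list(data_list)
    columns = sorted(rows[0])  # assumes every row has the same keys
    conflict = _as_list(on_conflict_fields)

    if on_conflict_action == "update":
        if update_fields is None:
            # Default: every column except the conflict columns.
            update_fields = [c for c in columns if c not in conflict]
        sets = ", ".join("{0} = EXCLUDED.{0}".format(c)
                         for c in _as_list(update_fields))
        action = "DO UPDATE SET " + sets
    else:
        action = "DO NOTHING"

    row_sql = "(" + ", ".join(["%s"] * len(columns)) + ")"
    query = ("INSERT INTO {} ({}) VALUES {} "
             "ON CONFLICT ({}) {} RETURNING {}").format(
        table, ", ".join(columns), ", ".join([row_sql] * len(rows)),
        ", ".join(conflict), action, ", ".join(_as_list(return_cols)))
    params = [row[c] for row in rows for c in columns]
    return query, params


query, params = build_upsert_sketch(
    "public.users",
    [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}],
    "id",
)
```

The returned ``query`` uses ``%s`` placeholders, so it could be handed
to a cursor's ``execute`` together with ``params``.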

--------------

fn: **update**
^^^^^^^^^^^^^^

**WIP** Returns a list of the column values for all rows updated (this
is currently faked by using the data passed in).

Params:

- **table** - *Type: String* - *Positional argument* - Table that data
  should be updated in. Include schema.
- **data_list** - *Type: List/Dict* - *Positional argument* - List or
  Dict of data to update. If list, must be a list of dicts
- **matched_field** - *Type: String* - *Named argument* - Default:
  ``id`` - The field used to match the row to update.
- **return_cols** - *Type: String/List* - *Named argument* - Default:
  ``id`` - List of fields (can be a string of a single field) to be
  returned for the rows affected.

.. |PyPI| image:: https://img.shields.io/pypi/v/cutil.svg
   :target: https://pypi.python.org/pypi/cutil
.. |PyPI| image:: https://img.shields.io/pypi/l/cutil.svg
   :target: https://pypi.python.org/pypi/cutil
