
A large collection of general python functions and classes that I use in my daily work

Joe's Giant Tool Box

https://github.com/J-sephB-lt-n/joes_giant_toolbox

                                                     .-.
                                                    /   \
                                     _____.....-----|(o) |
                               _..--'          _..--|  .''
                             .'  o      _..--''     |  | |
                            /  _/_..--''            |  | |
                   ________/  / /                   |  | |
                  | _  ____\ / /                    |  | |
 _.-----._________|| ||    \\ /                     |  | |
|=================||=||_____\\                      |__|-'
|                 ||_||_____//                      (o\ |
|_________________|_________/                        |-\|
 `-------------._______.----'                        /  `.
    .,.,.,.,.,.,.,.,.,.,.,.,.,                      /     \
   ((O) o o o o ======= o o(O))                 ._.'      /
LGB `-.,.,.,.,.,.,.,.,.,.,.,-'                   `.......'

source: https://ascii.co.uk

Installation

pip install joes-giant-toolbox

Usage

The functions and classes in this package are at varying levels of maturity: some have seen extensive use across many projects, whereas others have been used little, or have incomplete documentation and missing unit tests. To indicate this, I have assigned each one a confidence score:

| Confidence Score | Description |
|---|---|
| 5 | Code has been used (without any observed failures) in multiple production environments (or large real world projects) |
| 4 | Code has been used (without any observed failures) in a production environment (or large real world project) |
| 3 | Code appears to work perfectly and passes a suite of unit tests but has not yet been used in a production environment or large real world project |
| 2 | The code appears to work perfectly but has not been thoroughly tested |
| 1 | Skeleton of function/class is present but the code does not work fully yet |

You can browse by category in the sections below, or just scroll through the master list:

| Name | Description | Confidence Score |
|---|---|---|
| anonymous_view_public_linkedin_page | Extracts the information (HTML) from a public LinkedIn page (e.g. person or company) using a virtual browser | 4 |
| ascii_density_histogram | Draws a histogram using only raw text symbols | 2 |
| conjugate_prior_beta_binomial | Calculates the posterior distribution of the success probability parameter [p] of a binomial distribution, from observed data and a user-specified beta prior | 4 |
| cosine_similarity | Calculates the cosine similarity between two 1-dimensional numpy arrays | 2 |
| create_gcloud_vm_docker_template | Creates a folder containing the files necessary to quickly build a python docker container to run on a google cloud Virtual Machine | 4 |
| create_parallel_google_cloud_run_job_template | Runs a task in parallel using a Google Cloud Run job (code-generating function) | 2 |
| create_project_scope_doc | Creates a basic project scope document (markdown) by prompting the user for input | 3 |
| DataBatcher | Breaks a provided iterable up into batches according to a provided batching pattern | 4 |
| delete_file_in_gcloud_bucket | Deletes a file which is in a google cloud bucket | 4 |
| download_file_from_gcloud_bucket_to_python | Reads a file from a google cloud bucket into python memory | 4 |
| duckduckgo_search_multipage | Fetches search results from the DuckDuckGo Lite search engine | 2 |
| gcloud_vm_deletes_itself | Running this function on a google cloud Virtual Machine (VM) causes the VM to delete itself | 4 |
| list_all_python_imports | Searches every python script in a given folder and lists all python modules imported within those scripts | 2 |
| list_files_in_gcloud_bucket | Returns a list of the files present in a specified google cloud bucket | 4 |
| longest_common_substring | Identifies the longest substring appearing in both strings | 3 |
| longest_sentence_subsequence_plagiarism_detector | Finds phrases (sequences of consecutive words) common to 2 documents (e.g. to act as a naive plagiarism detector) | 3 |
| make_url_request | A convenience function for making API requests using the urllib library | 3 |
| move_or_rename_file_in_gcloud_bucket | Moves or renames a file which is in a google cloud bucket (which includes moving it to a different bucket) | 4 |
| parse_mime_email_parts | Extracts parts from an email that is in MIME format | 2 |
| print_progress_bar | Prints a progress bar (to standard out) while code is running | 3 |
| PythonPlottingTutorials | Example code snippets for creating common data visualisations in python | 4 |
| query_bigquery_to_pandas_df | Runs a query on Google BigQuery and writes the result into a local pandas.DataFrame | 4 |
| RapidBinaryClassifier | Ultra rapid generation of binary classifier models in scikit-learn by abstracting away a lot of the decisions and model code | 3 |
| RegexRulesClassifier | A multi-class text classifier using manual regex rules | 2 |
| require_api_key | A decorator adding basic API key authentication to a flask route | 3 |
| retry_function_call | Retries a function (if it fails) according to a retry pattern | 4 |
| run_python_function_in_parallel | Runs a python function in parallel on multiple cores or threads | 4 |
| scrape_webpage_and_all_linked_webpages | Extracts HTML from a given web page, and also follows all of the hyperlinks on that page and scrapes those too | 1 |
| StringCleaner | Performs common string-cleaning operations on a text string, also allowing them to be chained in sequence | 1 |
| upload_file_python_to_gcloud_bucket | Writes an object in python memory to a file (blob) on a google cloud bucket | 4 |
| url_to_filename_to_url_mapper | Converts a webpage URL into a useable filename, where the URL can be recovered directly from the filename | 2 |
| view_nested_dict_structure | Generates a simple printout for understanding the structure of a complex nested python dictionary | 4 |
| write_pandas_df_to_google_bigquery_table | Writes a pandas dataframe to a table on Google BigQuery | 4 |

API and Web

import joes_giant_toolbox.web

help(joes_giant_toolbox.web.anonymous_view_public_linkedin_page)
help(joes_giant_toolbox.web.duckduckgo_search_multipage)
help(joes_giant_toolbox.web.make_url_request)
help(joes_giant_toolbox.web.require_api_key)
help(joes_giant_toolbox.web.parse_mime_email_parts)
help(joes_giant_toolbox.web.scrape_webpage_and_all_linked_webpages)
help(joes_giant_toolbox.web.url_to_filename_to_url_mapper)

| Name | Description | Confidence Score |
|---|---|---|
| anonymous_view_public_linkedin_page | Extracts the information (HTML) from a public LinkedIn page (e.g. person or company) using a virtual browser | 2 |
| duckduckgo_search_multipage | Fetches search results from the DuckDuckGo Lite search engine | 2 |
| make_url_request | A convenience function for making API requests using the urllib library | 3 |
| parse_mime_email_parts | Extracts parts from an email that is in MIME format | 2 |
| require_api_key | A decorator adding basic API key authentication to a flask route | 3 |
| scrape_webpage_and_all_linked_webpages | Extracts HTML from a given web page, and also follows all of the hyperlinks on that page and scrapes those too | 1 |
| url_to_filename_to_url_mapper | Converts a webpage URL into a useable filename, where the URL can be recovered directly from the filename | 2 |
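A URL-to-filename mapping of the kind `url_to_filename_to_url_mapper` describes has to be both filesystem-safe and exactly reversible. The sketch below gets both properties from URL-safe base64 in the standard library. The names `url_to_filename` and `filename_to_url` and the `.html` default extension are hypothetical, and the package's own implementation may work differently.

```python
import base64

# Hypothetical helpers: a reversible URL <-> filename mapping using URL-safe
# base64 (alphabet A-Z, a-z, 0-9, "-", "_", plus "=" padding), so the result
# never contains "/" or other characters that are special in file paths.


def url_to_filename(url: str, extension: str = ".html") -> str:
    """Encode a URL as a filesystem-safe filename."""
    encoded = base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii")
    return encoded + extension


def filename_to_url(filename: str, extension: str = ".html") -> str:
    """Recover the original URL from a filename produced by url_to_filename()."""
    if not filename.endswith(extension):
        raise ValueError(f"expected a filename ending in {extension!r}")
    # Slice off the extension explicitly (works even when extension == "").
    encoded = filename[: len(filename) - len(extension)]
    return base64.urlsafe_b64decode(encoded.encode("ascii")).decode("utf-8")


name = url_to_filename("https://example.com/a/b?q=1")
print(name, "->", filename_to_url(name))
```

Two caveats apply to this approach. Base64 output is case-sensitive, so on a case-insensitive filesystem (the default on macOS and Windows) two distinct URLs could in principle map to colliding names; and long URLs can exceed the common 255-byte filename limit. Hex or base32 encoding avoids the first problem at the cost of longer names.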

Data Visualisation

import joes_giant_toolbox.dataviz

help(joes_giant_toolbox.dataviz)

help(joes_giant_toolbox.dataviz.ascii_density_histogram)
help(joes_giant_toolbox.dataviz.PythonPlottingTutorials)
help(joes_giant_toolbox.dataviz.view_nested_dict_structure)

| Name | Description | Confidence Score |
|---|---|---|
| ascii_density_histogram | Draws a histogram using only raw text symbols | 2 |
| PythonPlottingTutorials | Example code snippets for creating common data visualisations in python | 4 |
| view_nested_dict_structure | Generates a simple printout for understanding the structure of a complex nested python dictionary | 4 |
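The idea behind `view_nested_dict_structure` (showing the shape of a nested dictionary rather than its values) can be sketched as a short recursive function. `describe_nested` is a hypothetical helper, not the package's code, and it assumes list elements share a common structure, so by default it inspects only the first element of each list.

```python
from __future__ import annotations

from typing import Any


def describe_nested(obj: Any, indent: int = 0, max_list_items: int = 1) -> list[str]:
    """Return one line per key/element, showing its type, indented by depth."""
    pad = "    " * indent
    lines: list[str] = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines.append(f"{pad}{key!r}: {type(value).__name__}")
            lines.extend(describe_nested(value, indent + 1, max_list_items))
    elif isinstance(obj, list):
        # Assume list elements share a structure: inspect only the first few.
        for i, item in enumerate(obj[:max_list_items]):
            lines.append(f"{pad}[{i}]: {type(item).__name__}")
            lines.extend(describe_nested(item, indent + 1, max_list_items))
    return lines


example = {"user": {"name": "Ann", "tags": ["a", "b"]}, "count": 2}
print("\n".join(describe_nested(example)))
```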

Google Cloud

To also install the extra dependencies that this module needs:

pip install "joes-giant-toolbox[google]"

import joes_giant_toolbox.google_cloud

help(joes_giant_toolbox.google_cloud)

help(joes_giant_toolbox.google_cloud.create_gcloud_vm_docker_template)
help(joes_giant_toolbox.google_cloud.create_parallel_google_cloud_run_job_template)
help(joes_giant_toolbox.google_cloud.delete_file_in_gcloud_bucket)
help(joes_giant_toolbox.google_cloud.download_file_from_gcloud_bucket_to_python)
help(joes_giant_toolbox.google_cloud.gcloud_vm_deletes_itself)
help(joes_giant_toolbox.google_cloud.list_files_in_gcloud_bucket)
help(joes_giant_toolbox.google_cloud.move_or_rename_file_in_gcloud_bucket)
help(joes_giant_toolbox.google_cloud.query_bigquery_to_pandas_df)
help(joes_giant_toolbox.google_cloud.upload_file_python_to_gcloud_bucket)
help(joes_giant_toolbox.google_cloud.write_pandas_df_to_google_bigquery_table)

| Name | Description | Confidence Score |
|---|---|---|
| create_gcloud_vm_docker_template | Creates a folder containing the files necessary to quickly build a python docker container to run on a google cloud Virtual Machine | 4 |
| create_parallel_google_cloud_run_job_template | Runs a task in parallel using a Google Cloud Run job (code-generating function) | 2 |
| delete_file_in_gcloud_bucket | Deletes a file which is in a google cloud bucket | 4 |
| download_file_from_gcloud_bucket_to_python | Reads a file from a google cloud bucket into python memory | 4 |
| gcloud_vm_deletes_itself | Running this function on a google cloud Virtual Machine (VM) causes the VM to delete itself | 4 |
| list_files_in_gcloud_bucket | Returns a list of the files present in a specified google cloud bucket | 4 |
| move_or_rename_file_in_gcloud_bucket | Moves or renames a file which is in a google cloud bucket (which includes moving it to a different bucket) | 4 |
| query_bigquery_to_pandas_df | Runs a query on Google BigQuery and writes the result into a local pandas.DataFrame | 4 |
| upload_file_python_to_gcloud_bucket | Writes an object in python memory to a file (blob) on a google cloud bucket | 4 |
| write_pandas_df_to_google_bigquery_table | Writes a pandas dataframe to a table on Google BigQuery | 4 |

Project Management

import joes_giant_toolbox.proj_mgmt

help(joes_giant_toolbox.proj_mgmt.create_project_scope_doc)

| Name | Description | Confidence Score |
|---|---|---|
| create_project_scope_doc | Creates a basic project scope document (markdown) by prompting the user for input | 3 |
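The prompt-then-render pattern described for `create_project_scope_doc` can be sketched in a few lines. The section names and the `build_scope_doc` function are hypothetical, not the package's actual prompts or API. The prompt function is passed in as a parameter (defaulting to the built-in `input`) so the document can also be generated without a keyboard, for example in a test.

```python
from typing import Callable, Dict

# Hypothetical section headings; the package's actual prompts may differ.
SECTIONS = ["Project name", "Objective", "Deliverables", "Out of scope", "Timeline"]


def build_scope_doc(ask: Callable[[str], str] = input) -> str:
    """Prompt once per section, then render the answers as a markdown document."""
    answers: Dict[str, str] = {s: ask(f"{s}: ").strip() for s in SECTIONS}
    lines = [f"# {answers['Project name'] or 'Untitled project'}", ""]
    for section in SECTIONS[1:]:
        # Leave a visible placeholder for any question the user skipped.
        lines += [f"## {section}", answers[section] or "_TBC_", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    # Interactive use: answer each prompt at the terminal.
    print(build_scope_doc())
```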

Python Convenience Functions

import joes_giant_toolbox.convenience

help(joes_giant_toolbox.convenience.DataBatcher)
help(joes_giant_toolbox.convenience.list_all_python_imports)
help(joes_giant_toolbox.convenience.print_progress_bar)
help(joes_giant_toolbox.convenience.retry_function_call)
help(joes_giant_toolbox.convenience.run_python_function_in_parallel)

| Name | Description | Confidence Score |
|---|---|---|
| DataBatcher | Breaks a provided iterable up into batches according to a provided batching pattern | 4 |
| list_all_python_imports | Searches every python script in a given folder and lists all python modules imported within those scripts | 2 |
| print_progress_bar | Prints a progress bar (to standard out) while code is running | 3 |
| retry_function_call | Retries a function (if it fails) according to a retry pattern | 4 |
| run_python_function_in_parallel | Runs a python function in parallel on multiple cores or threads | 4 |
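The idea behind `retry_function_call` (retrying a failing call according to a pattern of waits) can be sketched as follows. Here the "retry pattern" is assumed to be a sequence of wait times in seconds; the name `retry_with_pattern` and its parameters are hypothetical and may not match the package's actual signature.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def retry_with_pattern(func: Callable[[], T], wait_seconds: Sequence[float] = (1, 5, 30)) -> T:
    """Call func(); after each failure wait wait_seconds[i] and try again.

    Makes at most len(wait_seconds) + 1 attempts, then re-raises the last error.
    """
    for wait in wait_seconds:
        try:
            return func()
        except Exception as error:  # broad on purpose: any failure triggers a retry
            print(f"call failed ({error!r}); retrying in {wait} s")
            time.sleep(wait)
    return func()  # final attempt: any exception now propagates to the caller
```

Arguments can be bound with a `lambda` or `functools.partial`, e.g. `retry_with_pattern(lambda: fetch(url), wait_seconds=(2, 10))`, where `fetch` stands in for any hypothetical function that may fail. Taking an explicit list of waits (rather than a fixed exponential formula) keeps the retry schedule visible at the call site.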

Statistical Inference and Hypothesis Testing

import joes_giant_toolbox.stats

help(joes_giant_toolbox.stats)

help(joes_giant_toolbox.stats.conjugate_prior_beta_binomial)

| Name | Description | Confidence Score |
|---|---|---|
| conjugate_prior_beta_binomial | Calculates the posterior distribution of the success probability parameter [p] of a binomial distribution, from observed data and a user-specified beta prior | 4 |
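The beta distribution is the conjugate prior for the binomial success probability p, so the posterior has a closed form: with a Beta(α, β) prior and k successes observed in n trials, the posterior is Beta(α + k, β + n - k), with mean (α + k) / (α + β + n). The hypothetical `beta_binomial_update` below computes just those posterior parameters. It is a sketch of the underlying arithmetic, not the interface of the package's `conjugate_prior_beta_binomial`.

```python
from dataclasses import dataclass


@dataclass
class BetaPosterior:
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        # Mean of a Beta(alpha, beta) distribution.
        return self.alpha / (self.alpha + self.beta)


def beta_binomial_update(successes: int, trials: int,
                         prior_alpha: float = 1.0, prior_beta: float = 1.0) -> BetaPosterior:
    """Beta(prior_alpha, prior_beta) prior + binomial data -> Beta posterior over p."""
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    if prior_alpha <= 0 or prior_beta <= 0:
        raise ValueError("beta prior parameters must be positive")
    # Successes add to alpha, failures add to beta.
    return BetaPosterior(prior_alpha + successes, prior_beta + (trials - successes))


posterior = beta_binomial_update(successes=7, trials=10)  # uniform Beta(1, 1) prior
print(posterior, round(posterior.mean, 3))
```

For example, a uniform Beta(1, 1) prior updated with 7 successes in 10 trials gives Beta(8, 4), whose mean is 8 / 12 ≈ 0.667. If SciPy is available, a 95% credible interval for p is `scipy.stats.beta(posterior.alpha, posterior.beta).interval(0.95)`.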

Statistical Modelling and Machine Learning

import joes_giant_toolbox.maths

help(joes_giant_toolbox.maths.cosine_similarity)

import joes_giant_toolbox.sklearn

help(joes_giant_toolbox.sklearn.RapidBinaryClassifier)

| Name | Description | Confidence Score |
|---|---|---|
| cosine_similarity | Calculates the cosine similarity between two 1-dimensional numpy arrays | 2 |
| RapidBinaryClassifier | Ultra rapid generation of binary classifier models in scikit-learn by abstracting away a lot of the decisions and model code | 3 |
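Cosine similarity is the dot product of two vectors divided by the product of their Euclidean norms: cos θ = a · b / (‖a‖ ‖b‖). The pure-Python sketch below (hypothetical name `cosine_sim`) spells that formula out; with numpy the same value is `np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))`. It accepts any equal-length sequences of numbers, including 1-dimensional numpy arrays.

```python
import math
from typing import Sequence


def cosine_sim(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between vectors a and b, in the range [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle (and hence the similarity) is undefined for a zero vector.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Parallel vectors score 1, orthogonal vectors 0 and opposite vectors -1, regardless of the vectors' lengths.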

Text and Natural Language Processing

import joes_giant_toolbox.text

help(joes_giant_toolbox.text)

help(joes_giant_toolbox.text.longest_common_substring)
help(joes_giant_toolbox.text.longest_sentence_subsequence_plagiarism_detector)
help(joes_giant_toolbox.text.RegexRulesClassifier)
help(joes_giant_toolbox.text.StringCleaner)

| Name | Description | Confidence Score |
|---|---|---|
| longest_common_substring | Identifies the longest substring appearing in both strings | 3 |
| longest_sentence_subsequence_plagiarism_detector | Finds phrases (sequences of consecutive words) common to 2 documents (e.g. to act as a naive plagiarism detector) | 3 |
| RegexRulesClassifier | A multi-class text classifier using manual regex rules | 2 |
| StringCleaner | Performs common string-cleaning operations on a text string, also allowing them to be chained in sequence | 1 |
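The standard dynamic-programming approach to the longest common substring problem records, for every pair of positions, the length of the longest common suffix ending there; the largest such value marks the answer. A sketch follows (hypothetical name `longest_common_substring_sketch`; the package's own function may use a different algorithm). It runs in O(len(s1) · len(s2)) time and keeps only one row of the table in memory.

```python
def longest_common_substring_sketch(s1: str, s2: str) -> str:
    """Return the longest substring of s1 that also appears in s2 (first found on ties)."""
    best_len, best_end = 0, 0  # length of best match and its end index in s1
    # prev[j] = length of the longest common suffix of s1[:i-1] and s2[:j]
    prev = [0] * (len(s2) + 1)
    for i in range(1, len(s1) + 1):
        curr = [0] * (len(s2) + 1)
        for j in range(1, len(s2) + 1):
            if s1[i - 1] == s2[j - 1]:
                # Extend the common suffix that ended one position earlier.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return s1[best_end - best_len : best_end]
```

Because the code only compares, indexes and slices its inputs, passing two lists of words instead of two strings returns the longest run of consecutive words shared by two documents, which is the kind of phrase matching a naive plagiarism detector relies on.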

Run Unit Tests

pip install pytest
cd joes_giant_toolbox/tests
pytest --verbose

Download files

Source Distribution

joes_giant_toolbox-0.9.0.tar.gz (113.4 kB)

Built Distribution

joes_giant_toolbox-0.9.0-py3-none-any.whl (109.6 kB)

File details

Details for the file joes_giant_toolbox-0.9.0.tar.gz.

File metadata

  • Download URL: joes_giant_toolbox-0.9.0.tar.gz
  • Size: 113.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.11.2

File hashes

Hashes for joes_giant_toolbox-0.9.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 187458eeccd69d49637eba45145f1d5205827892759485f73a7b037499c22c57 |
| MD5 | 5878b762a11565d1b25a7c3ffdf4b7b9 |
| BLAKE2b-256 | 7124541aeb8bee676bff7e9c647b7c172607d43e7beb3619a4964e978e874e47 |

File details

Details for the file joes_giant_toolbox-0.9.0-py3-none-any.whl.

File hashes

Hashes for joes_giant_toolbox-0.9.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4a6a2d853329ce96183083af17aafdc445d54756bb52d2dc9c66373dab6ccf9e |
| MD5 | 38c3338de92a94c20762487b1694acdc |
| BLAKE2b-256 | 5aa13ac8e9122fac9c166d4d7505a6baefc619bcabc2d57920af16a595ce7293 |
