Skip to main content

A collection of simple but powerful utilities for Data Analytics and Data Science workflows.

Project description

nimble toolkit

A collection of simple but powerful utilities for ML and Data Analytics workflows.

Installation

pip install nimble-tk

Examples

All code examples, including the following, can be found in the examples notebook

Logging

The below set of cells give some utility wrappers on python's logging functionality. The following line will create a log file at /tmp/try_nimble.log

  • As the logs get written, the log file will go up to max 50 MB in size by default and then will get rolled over.
  • At most, 10 such files are maintained before removing the earliest file.
  • Logs will also be written to the console (in this case the notebook console)
ntk.init_file_logger(log_file_path='/tmp/try_nimble.log', console_log_on=True)

Output:

2023-12-09 11:10:43.270595 :: MainThread ::  Log file init at /tmp/try_nimble.log

Example of an info log:
ntk.log_info("some log message")

Output:

2023-12-09 11:10:43.289434 :: MainThread ::  some log message

Example of an error log:
Catching an exception and logging the stack trace:
try:
    tmp = 1/0
except:
    ntk.log_traceback("Error while running operation")

Output:

2023-12-09 11:10:43.342127 :: MainThread ::  Error while running operation :: ZeroDivisionError: division by zero ::
 Traceback (most recent call last):
  File "/tmp/ipykernel_3709/4084614229.py", line 2, in <module>
    tmp = 1/0
ZeroDivisionError: division by zero

All logs are also written to the log file:
! tail -10 /tmp/try_nimble.log

Output:

2023-12-09 11:10:43,268 MainThread logger.py:88:  Log file init at /tmp/try_nimble.log
2023-12-09 11:10:43,284 MainThread logger.py:88:  some log message
2023-12-09 11:10:43,311 MainThread logger.py:88:  some error message
2023-12-09 11:10:43,334 MainThread logger.py:88:  Error while running operation :: ZeroDivisionError: division by zero ::
 Traceback (most recent call last):
  File "/tmp/ipykernel_3709/4084614229.py", line 2, in <module>
    tmp = 1/0
ZeroDivisionError: division by zero

Logs can also be directly written to the file without logging on the console. This helps in cases where we do not want to flood the console with a lot of logs.
ntk.log_info_file("some log message")
ntk.log_error_file("some error message")

try:
    tmp = 1/0
except:
    ntk.log_traceback_file("Error while running operation")

! tail -7 /tmp/try_nimble.log

Output:

2023-12-09 11:10:43,501 MainThread logger.py:88:  some log message
2023-12-09 11:10:43,505 MainThread logger.py:88:  some error message
2023-12-09 11:10:43,510 MainThread logger.py:88:   :: ZeroDivisionError: division by zero ::
 Traceback (most recent call last):
  File "/tmp/ipykernel_3709/3515687902.py", line 5, in <module>
    tmp = 1/0
ZeroDivisionError: division by zero

Concurrent Processing

The ntk.run_concurrently(...) method provides a wrapper around python's multi-processing and multi-threading functionality.

Set up a list of functions to execute:

def analytic_function(customer_id):
    ntk.log_info_file(f"Running analytic_function for customer_id: {customer_id}")
    
    if customer_id == 0:
        # raising an error here to demonstrate error handling
        raise ValueError(f"Invalid customer id {customer_id}")
        
    # - Query the DB
    # - Do feature engineering
    # - Run other analytics
    
    # - Write the result data to disk or return a DF
    return pd.DataFrame({
        'CUSTOMER_ID': [customer_id]*5,
        'OTHER_DATA': np.random.randint(0, 9, 5)
    })

functions = []
for customer_id in range(10):
    functions.append((analytic_function, {'customer_id': customer_id}))

Run the functions concurrently with the following lines of code:

results, errors = ntk.run_concurrently(functions, max_workers=ntk.get_num_cpus(), fork=True)

df_result = pd.concat([result[1] for result in results])
df_result.head(index=False)

fork=True will launch multiple processes and is recommended to be used for CPU-bound tasks.

fork=False will launch multiple threads, should to be used for IO-bound tasks.


Output:
2023-12-09 11:10:46.767042 :: MainThread ::  Ran: 10 functions - Successfull: 9, Failed: 1
CUSTOMER_ID OTHER_DATA
1 3
1 0
1 3
1 1
1 3
... ...



For more code examples, checkout the examples notebook

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nimble_tk-1.0.3.tar.gz (24.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

nimble_tk-1.0.3-py3-none-any.whl (21.9 kB view details)

Uploaded Python 3

File details

Details for the file nimble_tk-1.0.3.tar.gz.

File metadata

  • Download URL: nimble_tk-1.0.3.tar.gz
  • Upload date:
  • Size: 24.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.12

File hashes

Hashes for nimble_tk-1.0.3.tar.gz
Algorithm Hash digest
SHA256 96b62586e2b859db12c66de358189170be153b6ce7e0710f04f7bcb32c5ed9de
MD5 d1908f13739410c4b61b93f67967cefb
BLAKE2b-256 963f749af167271efbed0452fb1b7c1679dc815e181f6a9878d1020aea492bb1

See more details on using hashes here.

File details

Details for the file nimble_tk-1.0.3-py3-none-any.whl.

File metadata

  • Download URL: nimble_tk-1.0.3-py3-none-any.whl
  • Upload date:
  • Size: 21.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.12

File hashes

Hashes for nimble_tk-1.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 4df4111203f6c681db9759cd0fee28b2837e933feebd33d6da3fa015f9889730
MD5 5e8ed432dd461f9ccf4e92873c01be6c
BLAKE2b-256 be3824007eb5f0b00d93254583c905627c855cba332323c15bf61e5b1907fd57

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page