Common Utility functions for development

cutility

Common utils for faster development

Installation

You can install cutility using pip:

# install, or upgrade to the latest version
pip install --upgrade cutility

Usage

utils

Measure Execution Time

import cutility as cu

@cu.get_exec_time
def foo():
    import time
    time.sleep(1)

foo()

Output:

Time taken to execute 'foo': 0:00:01.005044
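For readers curious how a timing decorator of this kind can work, here is a minimal sketch. It is not cutility's actual implementation: the name exec_time_sketch is hypothetical, and the message format simply mirrors the output shown above.

```python
import functools
import time
from datetime import timedelta


def exec_time_sketch(func):
    """Print how long each call to `func` takes (hypothetical stand-in)."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Report the duration even when the call raises.
            elapsed = timedelta(seconds=time.perf_counter() - start)
            print(f"Time taken to execute '{func.__name__}': {elapsed}")

    return wrapper


@exec_time_sketch
def add(a, b):
    time.sleep(0.01)
    return a + b


result = add(2, 3)
```

Because the wrapper returns whatever the wrapped function returns, decorating a function this way does not change its result.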

Check Path Existence

import cutility as cu

b = cu.check_path_exist("./data/temp.txt")
print(b)

Output:

False
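A check like this is typically used as a guard before reading a file. The sketch below shows the same idea with the standard library only; path_exists_sketch is a hypothetical stand-in, and whether cu.check_path_exist behaves identically (for example with directories or symlinks) is an assumption.

```python
import tempfile
from pathlib import Path


def path_exists_sketch(path):
    """Return True if `path` points to an existing file or directory."""
    return Path(path).exists()


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "temp.txt"
    before = path_exists_sketch(target)   # file has not been created yet
    target.write_text("hello", encoding="utf-8")
    after = path_exists_sketch(target)    # file exists now
```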

logger

Simple logger

import cutility as cu

# Create a simple logger instance
log = cu.get_simple_logger()

# Log an information message
log.i("hello world of loggers")

# Debug, warning, error, and critical messages are also supported:
# log.d, log.w, log.e, log.c

Output:

[2023-12-17 02:21:03,847] - [INFO] : hello world of loggers
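The output format above can be reproduced with the standard logging module. The following is a sketch under assumptions, not cutility's code: make_logger_sketch is a hypothetical helper, and the format string is inferred from the sample output line.

```python
import io
import logging


def make_logger_sketch(name="sketch", stream=None):
    """Build a logger whose lines look like the sample output above."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # avoid duplicate lines via the root logger
    if not logger.handlers:   # do not stack handlers on repeated calls
        handler = logging.StreamHandler(stream)
        handler.setFormatter(
            logging.Formatter("[%(asctime)s] - [%(levelname)s] : %(message)s")
        )
        logger.addHandler(handler)
    return logger


# Log into an in-memory buffer so the formatted line can be inspected.
buffer = io.StringIO()
log = make_logger_sketch("readme_sketch", stream=buffer)
log.info("hello world of loggers")
line = buffer.getvalue().strip()
```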

io

Read and write files

Helpers for reading and writing files. Only three formats are currently supported:

  • text
  • json
  • yaml

import cutility as cu

# Example 1: Reading and Writing a text file
file_content = cu.read_text("./data/example_r.txt")
cu.write_text(file_content, "./data/example_w.txt")

# Example 2: Reading and Writing a JSON file
json_data = cu.read_json("./data/example_r.json")
cu.write_json(json_data, "./data/example_w.json")


# Example 3: Reading and Writing a YAML file
yaml_data = cu.read_yaml("./data/example_r.yaml")
cu.write_yaml(yaml_data, "./data/example_w.yaml")
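To make the read/write pairing concrete, here is a standard-library sketch of a JSON round trip. The helper names are hypothetical, and the (data, path) argument order is copied from the calls above; the encoding and indentation choices are assumptions, not cutility's documented behavior.

```python
import json
import tempfile
from pathlib import Path


def write_json_sketch(data, path):
    """Serialize `data` to `path` as UTF-8 JSON."""
    Path(path).write_text(json.dumps(data, indent=2), encoding="utf-8")


def read_json_sketch(path):
    """Load and return the JSON document stored at `path`."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example_w.json"
    original = {"name": "cutility", "formats": ["text", "json", "yaml"]}
    write_json_sketch(original, path)
    restored = read_json_sketch(path)  # should equal `original`
```

The text helpers follow the same pattern without the json calls; YAML would need a third-party parser, which this sketch leaves out.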

data

Dir handler

A helper that standardizes access to project folders and configs.

What is project_root?

  • The directory that holds your src folder is your project_root.

What is data_root?

  • The directory that holds all your data folders is your data_root.

from cutility import get_dir_handler

dirh = get_dir_handler(project_root="./", data_root="./data", verbose=True)
print(dirh.get_data_root())
print(dirh.get_project_root())

Output:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Setting paths:
Project Root: ./
Config Path: /path/to/config
Config Files: [list of config files]
Data Root: ./data

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
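The benefit of a directory handler is that every file path is built from two agreed roots rather than from scattered string literals. The sketch below illustrates that idea only; DirHandlerSketch and its data_file method are hypothetical and are not part of cutility's API.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DirHandlerSketch:
    """Hypothetical holder for the two roots described above."""

    project_root: Path
    data_root: Path

    def data_file(self, *parts):
        """Build a path under data_root from its components."""
        return self.data_root.joinpath(*parts)


dirs = DirHandlerSketch(project_root=Path("./"), data_root=Path("./data"))
example = dirs.data_file("raw", "temp.txt")
```

Changing data_root in one place then relocates every path derived from it.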

Cleaner

Generic cleaner

Use this snippet to apply multiple cleaning functions to a text in a single call.

from cutility.cleaners import GenericSimpleTextCleaner

# Create an instance of GenericSimpleTextCleaner
gtc = GenericSimpleTextCleaner()

# Sample text
sample_text = """Check out this link: https://example.com. 😎 #Python @user1, sample@gmail.com 123-456-7908 #testing # python"""

# List of names for name replacement
names_list = ["John", "Doe", "Jane", "Smith"]

# Define cleaning steps
all_cleaning_steps = [
    (gtc.replace_contacts, {"repl": " {{PHONE}} "}),
    (gtc.replace_emails, {"repl": " {{EMAIL}} "}),
    (gtc.replace_names, {"names_list": names_list, "repl": " {{PERSON_NAME}} "}),
    (gtc.clean_emojis, {}),
    (gtc.clean_extra_newlines, {}),
    (gtc.clean_extra_spaces, {}),
    (gtc.clean_hashtags, {}),
    (gtc.clean_profile_handle, {}),
    (gtc.clean_punctuations_except, {"exceptions": [",", ".", "\n", "?", "}", "{"]}),
    (gtc.clean_unicode_characters, {}),
    (gtc.clean_web_links, {}),
]

# Apply text cleaning functions
output = gtc.apply_text_cleaning_functions(sample_text, all_cleaning_steps)

# Print the original and cleaned text
print(sample_text)
print()
print(output)
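The list of (function, kwargs) tuples above describes a pipeline: each step receives the text produced by the previous one. A minimal sketch of that mechanism follows. The three cleaning functions and apply_steps_sketch are hypothetical stand-ins with simplified regular expressions; cutility's own patterns and its apply_text_cleaning_functions may differ.

```python
import re


def replace_emails_sketch(text, repl=" {{EMAIL}} "):
    """Replace anything that looks like an email address (simplified pattern)."""
    return re.sub(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", repl, text)


def clean_hashtags_sketch(text):
    """Remove hashtags such as #Python."""
    return re.sub(r"#\w+", "", text)


def clean_extra_spaces_sketch(text):
    """Collapse runs of spaces or tabs and trim both ends."""
    return re.sub(r"[ \t]+", " ", text).strip()


def apply_steps_sketch(text, steps):
    """Feed `text` through each (function, kwargs) step in order."""
    for func, kwargs in steps:
        text = func(text, **kwargs)
    return text


steps = [
    (replace_emails_sketch, {"repl": " {{EMAIL}} "}),
    (clean_hashtags_sketch, {}),
    (clean_extra_spaces_sketch, {}),
]
cleaned = apply_steps_sketch("Mail sample@gmail.com  #Python now", steps)
```

In a pipeline like this the order of steps matters: running the space cleanup last tidies the gaps left behind by earlier replacements and removals.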

Simple Text cleaner

Use this snippet to apply a single cleaning function on its own.

# Call one cleaning function directly on SimpleTextCleaner
from cutility.cleaners import SimpleTextCleaner as stc

t = stc.clean_emojis("🌟 Sed euismod justo t semper justo. 😊")
print(t)

PII Text cleaner

Use this snippet to apply PII cleaning functions one at a time; calls can be nested, as shown here.

from cutility.cleaners import PiiTextCleaner as ptc

t = ptc.replace_emails(
    ptc.replace_contacts(
        "My contact number is +1(123) 456 7890 and my email is email@company.com"
    )
)
print(t)

Appendix

Getting names_list

I have curated a list of first names and last names:

  • compiled from public GitHub databases into a GitHub gist
  • references are listed at the end

Use this command to download the names data:

wget https://gist.githubusercontent.com/sagarsrc/e6c7361f9ba6a64b2c9ac5bb10f0285a/raw/fbcca7c6821e7aff285271a6ce42361bbe95cc0c/pii_names.json

References

[1] PII names datasets:
