Common Utility functions for development

cutility

Common utils for faster development

Installation

You can install cutility using pip:

# install, or upgrade to the latest version
pip install --upgrade cutility

Usage

utils

Measure Execution Time

import cutility as cu

@cu.get_exec_time
def foo():
    import time
    time.sleep(1)

foo()

Output:

Time taken to execute 'foo': 0:00:01.005044
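
For readers curious how a timing decorator of this kind can work, here is a minimal sketch built only on the standard library. It is not cutility's implementation; the name get_exec_time_sketch is hypothetical, and the message format is only modelled on the output shown above.

```python
import functools
import time
from datetime import timedelta


def get_exec_time_sketch(func):
    """Hypothetical stand-in: print how long each call to `func` takes."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if `func` raises, so failures are timed too.
            elapsed = timedelta(seconds=time.perf_counter() - start)
            print(f"Time taken to execute '{func.__name__}': {elapsed}")

    return wrapper


@get_exec_time_sketch
def add(a, b):
    time.sleep(0.01)
    return a + b


result = add(2, 3)
```

The wrapper passes the return value through unchanged, so decorating a function this way should not alter what its callers receive.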

Check Path Existence

import cutility as cu

b = cu.check_path_exist("./data/temp.txt")
print(b)

Output:

False

logger

simple logger

import cutility as cu

# Create a simple logger instance
log = cu.get_simple_logger()

# Log an information message
log.i("hello world of loggers")

# Debug, warning, error and critical messages are also supported:
# log.d, log.i, log.w, log.e, log.c

Output:

[2023-12-17 02:21:03,847] - [INFO] : hello world of loggers
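
The output format above can be reproduced with the standard logging module. The sketch below is an assumption about how such a logger could be built, not the code behind get_simple_logger; the class name SimpleLoggerSketch and the logger name are hypothetical.

```python
import io
import logging


class SimpleLoggerSketch:
    """Hypothetical wrapper that exposes one-letter level methods."""

    def __init__(self, name, stream):
        self._logger = logging.getLogger(name)
        self._logger.setLevel(logging.DEBUG)
        self._logger.propagate = False  # keep records away from the root logger
        if not self._logger.handlers:  # avoid duplicate handlers for the same name
            handler = logging.StreamHandler(stream)
            handler.setFormatter(
                logging.Formatter("[%(asctime)s] - [%(levelname)s] : %(message)s")
            )
            self._logger.addHandler(handler)
        # Short aliases mirroring log.d, log.i, log.w, log.e, log.c
        self.d = self._logger.debug
        self.i = self._logger.info
        self.w = self._logger.warning
        self.e = self._logger.error
        self.c = self._logger.critical


# Write into an in-memory buffer so the result can be inspected.
buffer = io.StringIO()
log = SimpleLoggerSketch("cutility_sketch", buffer)
log.i("hello world of loggers")
log.w("disk almost full")
captured = buffer.getvalue()
print(captured, end="")
```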

io

read write files

Read and write files. Three formats are currently supported:

  • text
  • json
  • yaml

import cutility as cu

# Example 1: Reading and Writing a text file
file_content = cu.read_text("./data/example_r.txt")
cu.write_text(file_content, "./data/example_w.txt")

# Example 2: Reading and Writing a JSON file
json_data = cu.read_json("./data/example_r.json")
cu.write_json(json_data, "./data/example_w.json")


# Example 3: Reading and Writing a YAML file
yaml_data = cu.read_yaml("./data/example_r.yaml")
cu.write_yaml(yaml_data, "./data/example_w.yaml")
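
As a rough illustration of what the text and JSON helpers may do under the hood, here is a standard-library sketch. The *_sketch function names are hypothetical; the (content, path) argument order follows the calls shown above. YAML is left out because it needs a third-party parser.

```python
import json
import tempfile
from pathlib import Path


def read_text_sketch(path):
    return Path(path).read_text(encoding="utf-8")


def write_text_sketch(content, path):
    Path(path).write_text(content, encoding="utf-8")


def read_json_sketch(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def write_json_sketch(data, path):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)


# Round-trip both formats through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    text_path = Path(tmp) / "example.txt"
    write_text_sketch("hello\nworld\n", text_path)
    text_back = read_text_sketch(text_path)

    json_path = Path(tmp) / "example.json"
    write_json_sketch({"name": "cutility", "tags": ["io", "json"]}, json_path)
    json_back = read_json_sketch(json_path)
```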

env

load_env

.env file format

PROJECT_ROOT=/path/to/src
DATA_ROOT=/path/to/data
CONFIG_PATH=/path/to/config.yml

# load env variables
import os
from cutility import load_env

ENV = load_env("./.env")
print(ENV)
print(os.getenv("DATA_ROOT"))
print(os.getenv("PROJECT_ROOT"))
print(os.getenv("CONFIG_PATH"))

Output:

True
/path/to/data
/path/to/src
/path/to/config.yml
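
The sketch below shows one way a loader like this could behave: read KEY=VALUE lines, export them to os.environ, and return True when the file was found. This is an assumption for illustration. load_env_sketch is a hypothetical name, and the real load_env may handle quoting, comments and overrides differently.

```python
import os
import tempfile
from pathlib import Path


def load_env_sketch(path):
    """Hypothetical loader: export KEY=VALUE pairs, return True if the file exists."""
    env_file = Path(path)
    if not env_file.is_file():
        return False
    for raw_line in env_file.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ[key.strip()] = value.strip().strip("'\"")
    return True


with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text(
        "# sample file\nPROJECT_ROOT=/path/to/src\nDATA_ROOT=/path/to/data\n",
        encoding="utf-8",
    )
    loaded = load_env_sketch(env_path)
    missing = load_env_sketch(Path(tmp) / "does-not-exist.env")

print(loaded, os.getenv("DATA_ROOT"))
```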

data

dir handler

A helper that standardizes access to folders and configs.

What is project_root?

  • The directory that holds your src folder is your project_root.

What is data_root?

  • The directory that holds all your data is your data_root.

from cutility import get_dir_handler

dirh = get_dir_handler(project_root="./", data_root="./data", verbose=True)
print(dirh.get_data_root())
print(dirh.get_project_root())

Output:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Setting paths:
Project Root: ./
Config Path: /path/to/config
Config Files: [list of config files]
Data Root: ./data

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Cleaner

Generic cleaner

Use this snippet to collectively apply multiple cleaning functions

from cutility.cleaners import GenericSimpleTextCleaner

# Create an instance of GenericSimpleTextCleaner
gtc = GenericSimpleTextCleaner()

# Sample text
sample_text = """Check out this link: https://example.com. 😎 #Python @user1, sample@gmail.com 123-456-7908 #testing # python"""

# List of names for name replacement
names_list = ["John", "Doe", "Jane", "Smith"]

# Define cleaning steps
all_cleaning_steps = [
    (gtc.replace_contacts, {"repl": " {{PHONE}} "}),
    (gtc.replace_emails, {"repl": " {{EMAIL}} "}),
    (gtc.replace_names, {"names_list": names_list, "repl": " {{PERSON_NAME}} "}),
    (gtc.clean_emojis, {}),
    (gtc.clean_extra_newlines, {}),
    (gtc.clean_extra_spaces, {}),
    (gtc.clean_hashtags, {}),
    (gtc.clean_profile_handle, {}),
    (gtc.clean_punctuations_except, {"exceptions": [",", ".", "\n", "?", "}", "{"]}),
    (gtc.clean_unicode_characters, {}),
    (gtc.clean_web_links, {}),
]

# Apply text cleaning functions
output = gtc.apply_text_cleaning_functions(sample_text, all_cleaning_steps)

# Print the original and cleaned text
print(sample_text)
print()
print(output)
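
The list of (function, kwargs) pairs above suggests a simple pipeline pattern: each step receives the text produced by the previous one. The following sketch illustrates that pattern with three hypothetical regex-based cleaners. It is not the code behind apply_text_cleaning_functions, and the regular expressions are simplified assumptions.

```python
import re


def clean_web_links(text, repl=" "):
    return re.sub(r"https?://\S+", repl, text)


def clean_hashtags(text, repl=" "):
    return re.sub(r"#\w+", repl, text)


def clean_extra_spaces(text):
    return re.sub(r"[ \t]+", " ", text).strip()


def apply_steps(text, steps):
    """Apply each (function, kwargs) pair in order, feeding the result forward."""
    for func, kwargs in steps:
        text = func(text, **kwargs)
    return text


steps = [
    (clean_web_links, {"repl": " {{URL}} "}),
    (clean_hashtags, {}),
    (clean_extra_spaces, {}),
]
cleaned = apply_steps("Docs: https://example.com #Python   rocks", steps)
print(cleaned)  # Docs: {{URL}} rocks
```

Step order matters in a pipeline like this. Placeholders such as {{PHONE}} are inserted by earlier steps, which is presumably why the punctuation step in the example above lists "{" and "}" among its exceptions.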

Simple Text cleaner

Use this snippet to individually apply simple cleaning functions

# simpler text cleaner
from cutility.cleaners import SimpleTextCleaner as stc

t = stc.clean_emojis("🌟 Sed euismod justo t semper justo. 😊")
print(t)

PII Text cleaner

Use this snippet to individually apply PII cleaning functions

from cutility.cleaners import PiiTextCleaner as ptc

t = ptc.replace_emails(
    ptc.replace_contacts(
        "My contact number is +1(123) 456 7890 and my email is email@company.com"
    )
)
print(t)
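
To make the nested calls concrete, here is a small sketch of placeholder-based PII masking with regular expressions. The patterns and the *_sketch function names are assumptions for illustration only. cutility's own patterns and default placeholders may differ, and real-world phone and email formats need more careful handling.

```python
import re

# Simplified, hypothetical patterns; not taken from cutility.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def replace_contacts_sketch(text, repl="{{PHONE}}"):
    return PHONE_RE.sub(repl, text)


def replace_emails_sketch(text, repl="{{EMAIL}}"):
    return EMAIL_RE.sub(repl, text)


masked = replace_emails_sketch(
    replace_contacts_sketch(
        "My contact number is +1(123) 456 7890 and my email is email@company.com"
    )
)
print(masked)  # My contact number is {{PHONE}} and my email is {{EMAIL}}
```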

Appendix

Getting names_list

I have curated a list of first and last names:

  • gathered from public GitHub databases and compiled in a GitHub gist
  • references are listed at the end

Use this command to download the names data.

wget https://gist.githubusercontent.com/sagarsrc/e6c7361f9ba6a64b2c9ac5bb10f0285a/raw/fbcca7c6821e7aff285271a6ce42361bbe95cc0c/pii_names.json

References

[1] PII names datasets:
