CoreUtils-Python

A comprehensive collection of Python utility functions and modules for data science, file operations, serialization, encryption, and general-purpose programming tasks.

Overview

CoreUtils-Python is a modular collection of well-documented, tested utility functions designed to streamline common programming tasks across data science, system operations, and application development.

Key Features:

  • 🔧 Comprehensive Utilities - Functions, lists, strings, numbers, dictionaries
  • 📊 Data Processing - pandas, NumPy, Polars, PyArrow integration
  • 🔒 Security - Encryption, signing, secure serialization, CSV-compatible integrity
  • 🧪 Well Tested - 418+ unit tests with pytest
  • 📝 Documented - NumPy-style docstrings throughout
  • ⚡ Performance - Optimized for large-scale data operations

Installation

Basic Installation

# Clone the repository
git clone https://github.com/Ruppert20/CoreUtils-Python.git
cd CoreUtils-Python

# Install dependencies
pip install -r requirements.txt

Quick Start

# Import utilities
from src.generics import notnull, coalesce
from src.lists import chunk_list, flatten_list
from src.strings import convert_identifier_case
from src.numbers import extract_num, isfloat
from src.signature import SignedFile
from datetime import datetime

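# Placeholder inputs and handlers for illustration only; substitute your own
value = "example"
default_value = "fallback"
large_list = list(range(1000))

def process(item):
    print(item)

def process_batch(batch):
    print(len(batch))

# Use null checking
if notnull(value):
    process(value)

# Coalesce values
result = coalesce(None, '', default_value)

# Chunk data for batch processing
for chunk in chunk_list(large_list, 100):
    process_batch(chunk)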

# Convert naming conventions
camel = convert_identifier_case('user_name', 'camelCase')

# Write signed file with header metadata
header = {"version": "1.0", "created": datetime.now(), "author": "alice"}
SignedFile.write("data.bin", {"key": "value"}, header=header)

# Write CSV with integrity signature (pandas-compatible)
csv_data = b"name,age\nAlice,30\nBob,25\n"
SignedFile.write("data.csv", csv_data, signature_as_comment=True)

# Read back with verification and header
data, meta = SignedFile.read("data.bin", return_header=True)
print(f"Created by {meta['author']} on {meta['created']}")

Module Documentation

Core Utilities

generics.py

Generic utility functions for null handling and object operations.

Key Functions:

  • notnull(v) - Comprehensive null checking (None, empty containers, pd.NA, np.nan)
  • isnull(v) - Inverse of notnull
  • coalesce(*values) - Return first non-null value
  • get_name(obj) - Extract object name
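
The null semantics listed above can be sketched with the standard library alone. This is a minimal illustration, not the code in generics.py: the names `notnull_sketch` and `coalesce_sketch` are hypothetical, treating empty strings as null is an assumption drawn from the Quick Start call, and pd.NA handling is left out so the sketch has no third-party dependencies.

```python
import math


def notnull_sketch(v):
    """Return True when v carries a usable value (hypothetical stand-in)."""
    if v is None:
        return False
    # NaN is a float that signals a missing number
    if isinstance(v, float) and math.isnan(v):
        return False
    # Assumption: empty strings and empty containers count as null
    if isinstance(v, (str, bytes, list, tuple, dict, set, frozenset)) and len(v) == 0:
        return False
    return True


def coalesce_sketch(*values):
    """Return the first value that passes notnull_sketch, else None."""
    for v in values:
        if notnull_sketch(v):
            return v
    return None


print(coalesce_sketch(None, "", float("nan"), "fallback"))  # prints: fallback
print(notnull_sketch(0))  # prints: True (zero is a value, not a null)
```

Checking for null explicitly, rather than relying on truthiness, keeps legitimate values such as 0 and False from being skipped.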


functions.py

Function utilities including dynamic loading, introspection, and debugging.

Key Functions:

  • get_func(func_path) - Dynamically load functions from string paths
  • filter_kwargs(func, kwargs) - Filter kwargs to match function parameters
  • get_function_signature(func) - Extract comprehensive function metadata
  • inspect_class(cls) - Extract class properties and methods
  • is_pickleable(obj) - Check if object can be pickled
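
As a rough picture of how dynamic loading and keyword filtering can work, the sketch below uses only `importlib` and `inspect`. The names `get_func_sketch` and `filter_kwargs_sketch` are hypothetical, and the behavior shown is an assumption about what such helpers typically do, not a description of functions.py.

```python
import importlib
import inspect


def get_func_sketch(func_path):
    """Resolve a dotted path such as 'math.sqrt' to the object it names."""
    module_name, _, attr = func_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.attribute', got {func_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


def filter_kwargs_sketch(func, kwargs):
    """Keep only the keyword arguments that func can accept."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter means every keyword is acceptable
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    by_keyword = (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    accepted = {name for name, p in params.items() if p.kind in by_keyword}
    return {k: v for k, v in kwargs.items() if k in accepted}


def area(width, height=1):
    return width * height


sqrt = get_func_sketch("math.sqrt")
print(sqrt(16.0))  # prints: 4.0
print(filter_kwargs_sketch(area, {"width": 3, "height": 2, "color": "red"}))
# prints: {'width': 3, 'height': 2}
```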


lists.py

List manipulation utilities for chunking, intersection, and flattening.

Key Functions:

  • convert_list_to_string(lst, encapsulate=False) - Convert list to comma-separated string
  • chunk_list(lst, n) - Split list into equal-sized chunks
  • list_intersection(lst1, lst2) - Find common elements preserving order
  • flatten_list(nested) - Recursively flatten nested lists
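
The three operations above can be sketched in a few lines each. The function names ending in `_sketch` are hypothetical, and the reading of `n` as the chunk size (rather than the number of chunks) is an assumption based on the Quick Start call `chunk_list(large_list, 100)`.

```python
def chunk_list_sketch(lst, n):
    """Yield consecutive slices of lst, each of length n except possibly the last."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    for start in range(0, len(lst), n):
        yield lst[start:start + n]


def flatten_list_sketch(nested):
    """Recursively flatten nested lists and tuples into one flat list."""
    flat = []
    for item in nested:
        if isinstance(item, (list, tuple)):
            flat.extend(flatten_list_sketch(item))
        else:
            flat.append(item)
    return flat


def list_intersection_sketch(lst1, lst2):
    """Return the items of lst1 that also occur in lst2, in lst1 order."""
    lookup = set(lst2)  # assumes hashable items
    return [item for item in lst1 if item in lookup]


print(list(chunk_list_sketch([1, 2, 3, 4, 5], 2)))   # prints: [[1, 2], [3, 4], [5]]
print(flatten_list_sketch([1, [2, [3, 4]], 5]))      # prints: [1, 2, 3, 4, 5]
print(list_intersection_sketch([3, 1, 2], [2, 3]))   # prints: [3, 2]
```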


strings.py

String manipulation including case conversion, cleaning, and parsing.

Key Functions:

  • remove_illegal_characters(s, case='snake_case') - Clean strings for identifiers
  • convert_identifier_case(id, target_format) - Convert between naming conventions
  • snake_to_camel_case(s) - Convert snake_case to camelCase
  • camel_to_snake_case(s) - Convert camelCase to snake_case
  • get_file_name_components(path) - Parse file paths into components
  • tokenize_id(id_str, token_index) - Split and extract tokens from IDs
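
For readers unfamiliar with identifier case conversion, here is a minimal sketch of the two directions using `str` methods and one regular expression. The function names are hypothetical and the rules are deliberately simple: runs of capitals such as "HTTPServer" are not split, which the real strings.py may handle differently.

```python
import re


def snake_to_camel_sketch(s):
    """Convert snake_case to camelCase: keep the first word, capitalize the rest."""
    head, *rest = s.split("_")
    return head + "".join(part.capitalize() for part in rest)


def camel_to_snake_sketch(s):
    """Convert camelCase to snake_case."""
    # Insert an underscore before each capital that follows a lowercase letter or digit
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", s).lower()


print(snake_to_camel_sketch("user_name"))  # prints: userName
print(camel_to_snake_sketch("userName"))   # prints: user_name
```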


numbers.py

Numerical operations, extraction, and validation.

Key Functions:

  • extract_num(input_str, return_pos=0) - Extract numbers from strings
  • isfloat(value) - Check if value can be converted to float
  • convert_to_comma_seperated_integer_list(val) - Convert to comma-separated integers
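
A minimal sketch of float validation and number extraction follows. The names `isfloat_sketch` and `extract_num_sketch` are hypothetical, and the interpretation of `return_pos` as the index of the match to return is an assumption; the regular expression covers only plain signed integers and decimals.

```python
import re

# Matches an optional sign, digits, and an optional decimal part
_NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")


def isfloat_sketch(value):
    """Return True when float(value) succeeds."""
    try:
        float(value)
    except (TypeError, ValueError):
        return False
    return True


def extract_num_sketch(input_str, return_pos=0):
    """Return the number at index return_pos among those found, or None."""
    matches = _NUMBER.findall(input_str)
    try:
        return float(matches[return_pos])
    except IndexError:
        return None


print(isfloat_sketch("1e-3"))                                  # prints: True
print(isfloat_sketch("abc"))                                   # prints: False
print(extract_num_sketch("room 12, floor 3.5", return_pos=1))  # prints: 3.5
```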


dictionaries.py

Dictionary utilities for pandas aggregation operations.

Key Functions:

  • create_aggregation_dict(col_action_dict, start_col, end_col) - Create pandas groupby aggregation dictionaries


git.py

Git repository metadata extraction.

Key Functions:

  • get_git_metadata() - Extract comprehensive git repository information
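
One common way to collect repository metadata is to shell out to the `git` command line. The sketch below assumes that approach; `git_metadata_sketch` and its three keys are hypothetical and far less comprehensive than what git.py may return. Every field falls back to None when git is missing or the working directory is not a repository.

```python
import subprocess


def _git(*args):
    """Run a git subcommand and return its trimmed stdout, or None on failure."""
    try:
        result = subprocess.run(
            ["git", *args], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def git_metadata_sketch():
    """Collect a few facts about the repository in the current directory."""
    status = _git("status", "--porcelain")
    return {
        "commit": _git("rev-parse", "HEAD"),
        "branch": _git("rev-parse", "--abbrev-ref", "HEAD"),
        # Any porcelain output means uncommitted changes; None means unknown
        "dirty": None if status is None else bool(status),
    }


print(git_metadata_sketch())
```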


Data Processing

core_types.py

Cross-library type classification and detection system.

Key Features:

  • CoreDataType enum - Universal type classification
  • Type detection from objects and strings
  • Support for pandas, NumPy, Polars, PyArrow
  • String representation parsing (JSON, XML, UUID, dates)
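
To make string-based type detection concrete, the sketch below tries a sequence of standard-library parsers from most to least specific. `KindSketch` and `classify_string_sketch` are hypothetical names; the real `CoreDataType` enum has its own members and rules, which are not reproduced here.

```python
import json
import uuid
from datetime import date
from enum import Enum


class KindSketch(Enum):
    INTEGER = "integer"
    FLOAT = "float"
    UUID = "uuid"
    DATE = "date"
    JSON = "json"
    TEXT = "text"


def _parses(parser, text):
    """Return True when parser(text) does not raise ValueError."""
    try:
        parser(text)
    except ValueError:
        return False
    return True


def classify_string_sketch(text):
    """Guess what kind of value a string represents."""
    candidate = text.strip()
    # Numbers are checked first because a bare number is also valid JSON
    if _parses(int, candidate):
        return KindSketch.INTEGER
    if _parses(float, candidate):
        return KindSketch.FLOAT
    if _parses(uuid.UUID, candidate):
        return KindSketch.UUID
    if _parses(date.fromisoformat, candidate):
        return KindSketch.DATE
    # Only objects and arrays are treated as JSON documents in this sketch
    if candidate[:1] in ("{", "[") and _parses(json.loads, candidate):
        return KindSketch.JSON
    return KindSketch.TEXT


for sample in ["42", "3.14", "2024-01-31", '{"a": 1}', "hello"]:
    print(sample, "->", classify_string_sketch(sample).name)
```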


iterables.py

Memory profiling and object analysis utilities.

Key Functions:

  • deep_stats(obj) - Calculate deep memory size with cycle detection
  • find_large_objects(obj, threshold_kb) - Identify memory-intensive objects
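
Deep size measurement with cycle detection can be sketched by walking containers and remembering the `id` of every object already counted. `deep_size_sketch` is a hypothetical name, and it covers only built-in containers; iterables.py may account for more object types.

```python
import sys


def deep_size_sketch(obj, _seen=None):
    """Sum sys.getsizeof over obj and everything it contains, counting each object once."""
    seen = set() if _seen is None else _seen
    if id(obj) in seen:
        return 0  # already counted, which also breaks reference cycles
    seen.add(id(obj))

    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += deep_size_sketch(key, seen) + deep_size_sketch(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_size_sketch(item, seen)
    return size


cycle = [1, 2, 3]
cycle.append(cycle)  # the list now contains itself
print(deep_size_sketch(cycle) >= sys.getsizeof(cycle))  # prints: True
```

Without the `seen` set, the self-referencing list above would recurse until Python raised a RecursionError.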


serialization.py

Extended serialization with multi-format support (JSON, YAML, CBOR, Pickle).

Key Features:

  • XSer class - Destination-aware serialization
  • Automatic fallback chain: Structured → CBOR → Pickle
  • NumPy array support
  • HDF5 and Parquet metadata support
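
The idea of a fallback chain can be shown with two standard-library formats: try a structured format first and fall back to pickle when the object cannot be represented. This sketch omits the CBOR step to stay dependency-free, and `dumps_with_fallback` and `loads_with_fallback` are hypothetical names, not the XSer API.

```python
import json
import pickle


def dumps_with_fallback(obj):
    """Return (format_name, payload_bytes), preferring JSON over pickle."""
    try:
        return "json", json.dumps(obj).encode("utf-8")
    except (TypeError, ValueError):
        # json.dumps raises TypeError for unsupported types such as set
        return "pickle", pickle.dumps(obj)


def loads_with_fallback(fmt, payload):
    """Decode a payload produced by dumps_with_fallback."""
    if fmt == "json":
        return json.loads(payload.decode("utf-8"))
    if fmt == "pickle":
        # Only unpickle data that comes from a trusted source
        return pickle.loads(payload)
    raise ValueError(f"unknown format: {fmt!r}")


fmt, payload = dumps_with_fallback({"a": [1, 2]})
print(fmt)  # prints: json

fmt, payload = dumps_with_fallback({1, 2, 3})
print(fmt)  # prints: pickle
print(loads_with_fallback(fmt, payload) == {1, 2, 3})  # prints: True
```

Recording the format name alongside the payload lets the reader pick the matching decoder without guessing from the bytes.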


enhanced_logging.py

Advanced logging with emoji support, progress bars, and structured output.

Key Features:

  • Enhanced logger with emoji integration
  • Progress bar support
  • Structured logging for metrics
  • Context managers for scoped logging
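
A context manager for scoped logging can be built on the standard `logging` module, as sketched below. `log_scope` is a hypothetical name and the message format is an assumption; the emoji, progress bar, and metrics features of enhanced_logging.py are not reproduced.

```python
import logging
import time
from contextlib import contextmanager


@contextmanager
def log_scope(logger, name):
    """Log the start, end, and duration of a block, plus any exception it raises."""
    start = time.perf_counter()
    logger.info("start %s", name)
    try:
        yield
    except Exception:
        logger.exception("failed %s", name)
        raise
    finally:
        elapsed = time.perf_counter() - start
        logger.info("end %s (%.3fs)", name, elapsed)


logging.basicConfig(level=logging.INFO)
with log_scope(logging.getLogger("demo"), "load data"):
    total = sum(range(1000))
```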


parrallelization.py

Parallel processing utilities with comprehensive error handling.

Key Features:

  • ParallelProcessor class
  • Support for serial, thread-based, and process-based execution
  • Metrics collection and reporting
  • Integration with enhanced logging
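
The three execution modes can be illustrated with `concurrent.futures`. This is a sketch under stated assumptions: `run_tasks_sketch`, its `mode` strings, and the `(results, errors)` return shape are hypothetical and do not describe the `ParallelProcessor` interface.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from functools import partial


def _call(func, item):
    """Run func(item) and capture the outcome instead of raising."""
    try:
        return ("ok", func(item))
    except Exception as exc:
        return ("error", repr(exc))


def run_tasks_sketch(func, items, mode="serial", max_workers=4):
    """Apply func to each item serially, in threads, or in processes."""
    worker = partial(_call, func)
    if mode == "serial":
        outcomes = [worker(item) for item in items]
    elif mode in ("thread", "process"):
        pool_cls = ThreadPoolExecutor if mode == "thread" else ProcessPoolExecutor
        with pool_cls(max_workers=max_workers) as pool:
            # map preserves input order
            outcomes = list(pool.map(worker, items))
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    results = [value for status, value in outcomes if status == "ok"]
    errors = [value for status, value in outcomes if status == "error"]
    return results, errors


def inverse(x):
    return 1 / x


results, errors = run_tasks_sketch(inverse, [1, 2, 0, 4], mode="thread")
print(results)      # prints: [1.0, 0.5, 0.25]
print(len(errors))  # prints: 1
```

Process mode requires `func` to be picklable and, on platforms that spawn workers, a call site guarded by `if __name__ == "__main__":`.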


Security & Encryption

encrypt.py

Encryption utilities using Fernet symmetric encryption.

Key Features:

  • Encryptor class for data encryption/decryption
  • CryptoYAML for encrypted YAML configuration files
  • Key generation and management


signature.py

Atomic file writing with cryptographic integrity verification, encryption, and metadata support.

Key Features:

  • SignedFile class for signed file operations
  • SHA-256/HMAC-SHA256 signatures with integrity verification
  • Optional Fernet encryption with authenticated HMAC
  • Python object serialization (via XSer) - auto-serializes dicts, lists, numpy, datetime
  • Optional header metadata - Store version info, timestamps, and structured metadata
  • CSV-compatible commented signatures - Write # comment signatures for pandas/Excel compatibility
  • Atomic writes with platform-independent fsync
  • Chunked reading for large files
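
Two of the ideas above, HMAC-SHA256 integrity checking and atomic writes, can be sketched with the standard library. The on-disk layout used here (payload followed by a 32-byte tag) and the function names are assumptions made for illustration; they are not the `SignedFile` format or API.

```python
import hashlib
import hmac
import os
import tempfile

TAG_SIZE = hashlib.sha256().digest_size  # 32 bytes


def write_signed_sketch(path, payload, key):
    """Atomically write payload followed by its HMAC-SHA256 tag."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename over the target
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload + tag)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_signed_sketch(path, key):
    """Return the payload, raising ValueError when the tag does not verify."""
    with open(path, "rb") as handle:
        blob = handle.read()
    if len(blob) < TAG_SIZE:
        raise ValueError("file too short to contain a signature")
    payload, tag = blob[:-TAG_SIZE], blob[-TAG_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch")
    return payload


with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "data.bin")
    write_signed_sketch(target, b"name,age\nAlice,30\n", key=b"demo-key")
    print(read_signed_sketch(target, key=b"demo-key"))  # prints: b'name,age\nAlice,30\n'
```

Because `os.replace` swaps the file in one step, a reader sees either the old file or the complete new one, never a partial write.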


File Operations

search.py

Flexible file search utilities with pattern matching and filtering.

Key Features:

  • FileSearcher class for advanced file searching
  • Pattern matching with regex support
  • File type filtering and exclusion patterns
  • Recursive and non-recursive search modes
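
The search features above map naturally onto `pathlib` and `re`. The sketch below assumes that mapping; `find_files_sketch` and its parameters are hypothetical and are not the `FileSearcher` interface.

```python
import re
import tempfile
from pathlib import Path


def find_files_sketch(root, pattern, recursive=True, exclude=None):
    """Return files under root whose name matches pattern, sorted by path."""
    root = Path(root)
    name_re = re.compile(pattern)
    exclude_re = re.compile(exclude) if exclude else None
    walker = root.rglob("*") if recursive else root.glob("*")
    hits = []
    for path in walker:
        if not path.is_file():
            continue
        if not name_re.search(path.name):
            continue
        # Exclusions are tested against the path relative to root
        if exclude_re and exclude_re.search(path.relative_to(root).as_posix()):
            continue
        hits.append(path)
    return sorted(hits)


with tempfile.TemporaryDirectory() as folder:
    base = Path(folder)
    (base / "sub").mkdir()
    (base / "a.csv").write_text("x")
    (base / "sub" / "b.csv").write_text("x")
    (base / "sub" / "c.txt").write_text("x")
    found = find_files_sketch(base, r"\.csv$")
    print([p.name for p in found])  # prints: ['a.csv', 'b.csv']
```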


Testing

debugging.py

Testing utilities for random data generation.

Key Functions:

  • generate_random_sequence(dtype, n, percent_null, seed) - Generate deterministic test data
  • Random generators for all common data types (TEXT, UUID, INTEGER, FLOAT, DATE, JSON, XML, etc.)
  • debug_print(*args) - Print debug output with visual separators
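
Deterministic test data comes from seeding a private random generator, as the sketch below shows. `random_sequence_sketch` is a hypothetical name, only four data types are covered, and treating `percent_null` as a fraction between 0 and 1 is an assumption; debugging.py may use a different scale.

```python
import random
import uuid


def random_sequence_sketch(dtype, n, percent_null=0.0, seed=None):
    """Return n random values of the given type, with some replaced by None."""
    rng = random.Random(seed)  # private generator, so global random state is untouched
    makers = {
        "INTEGER": lambda: rng.randint(0, 1000),
        "FLOAT": lambda: rng.uniform(0.0, 1.0),
        "TEXT": lambda: "".join(rng.choices("abcdefghijklmnopqrstuvwxyz", k=8)),
        "UUID": lambda: str(uuid.UUID(int=rng.getrandbits(128), version=4)),
    }
    if dtype not in makers:
        raise ValueError(f"unsupported dtype: {dtype!r}")
    make = makers[dtype]
    return [None if rng.random() < percent_null else make() for _ in range(n)]


first = random_sequence_sketch("INTEGER", 5, percent_null=0.2, seed=42)
second = random_sequence_sketch("INTEGER", 5, percent_null=0.2, seed=42)
print(first == second)  # prints: True
```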


Running Tests

All tests use pytest and follow the test_*.py naming convention.

Run All Tests

cd UNIT_TESTS
python run_all_tests.py

Run with Verbose Output

python run_all_tests.py -v

Run with Coverage

python run_all_tests.py --coverage

Run Specific Tests

# Run tests matching a pattern
python run_all_tests.py -k test_generics

# Run a specific test file
pytest test_functions.py -v

# Run a specific test class
pytest test_functions.py::TestGetFunc -v

# Run a specific test method
pytest test_functions.py::TestGetFunc::test_get_builtin_function -v

Test Statistics

  • Total Tests: 223+
  • Coverage: Comprehensive coverage of public APIs
  • Frameworks: pytest (supports both pytest and unittest styles)
  • Status: ✅ All tests passing


Requirements

Core Dependencies

numpy>=2.3.2          # Numerical computing
pandas>=2.2.3         # Data manipulation

Serialization

cbor2>=5.7.0          # CBOR encoding
PyYAML>=6.0.2         # YAML support

Security

cryptography>=45.0.7  # Encryption and signing

Testing

pytest>=8.4.2         # Test framework
pytest-cov>=4.1.0     # Coverage plugin


Project Structure

CoreUtils-Python/
├── src/                          # Source modules
│   ├── core_types.py             # Type classification system
│   ├── debugging.py              # Testing and debugging utilities
│   ├── dictionaries.py           # Dictionary operations
│   ├── encrypt.py                # Encryption utilities
│   ├── encrypted_signature.py    # Combined encryption + signing
│   ├── enhanced_logging.py       # Advanced logging
│   ├── functions.py              # Function utilities
│   ├── generics.py               # Generic utilities
│   ├── git.py                    # Git metadata
│   ├── iterables.py              # Memory profiling
│   ├── lists.py                  # List operations
│   ├── numbers.py                # Numerical utilities
│   ├── parrallelization.py       # Parallel processing
│   ├── search.py                 # Search utilities
│   ├── serialization.py          # Extended serialization
│   ├── signature.py              # File signing
│   └── strings.py                # String manipulation
│
├── UNIT_TESTS/                   # Test suite
│   ├── test_*.py                 # Test modules (223+ tests)
│   ├── run_all_tests.py          # Test runner
│   ├── README.md                 # Test documentation
│   └── TEST_SUMMARY.md           # Test results summary
│
├── requirements.txt              # Project dependencies
└── README.md                     # This file

Contributing

Contributions are welcome! Please follow these guidelines:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Write tests for new functionality
  4. Ensure all tests pass (python run_all_tests.py)
  5. Follow existing code style (NumPy-style docstrings)
  6. Commit changes (git commit -m 'Add amazing feature')
  7. Push to branch (git push origin feature/amazing-feature)
  8. Open a Pull Request

Code Style

  • NumPy-style docstrings for all functions and classes
  • Type hints where appropriate
  • Comprehensive test coverage
  • Clear, descriptive variable names

License

This project is licensed under the MIT License - see the LICENSE file for details.


Author

@Ruppert20


AI Authorship Disclaimer

This package was developed with the assistance of LLM-based coding tools (Claude Code by Anthropic). AI tools were used for the following activities:

  • Code authorship - Implementation of utilities, functions, and classes
  • Test development - Creation of comprehensive unit tests
  • Documentation - Generation of NumPy-style docstrings and README content
  • Code review - Identification of bugs, edge cases, and improvements

Users should evaluate the code for their specific use cases and report any issues through the GitHub issue tracker.


Acknowledgments

  • Built with modern Python 3.13.2+
  • Integrates with pandas, NumPy, Polars, and PyArrow
  • Inspired by the need for clean, reusable utility functions
  • Comprehensive testing ensures reliability
  • Developed with assistance from Claude Code (Anthropic)

Made with ❤️ for the Python community
