A comprehensive collection of Python utility functions for data science, file operations, and general-purpose programming
CoreUtils-Python
A comprehensive collection of Python utility functions and modules for data science, file operations, serialization, encryption, and general-purpose programming tasks.
Table of Contents
- Overview
- Installation
- Quick Start
- Module Documentation
- Running Tests
- Requirements
- Contributing
- License
Overview
CoreUtils-Python is a modular collection of well-documented, tested utility functions designed to streamline common programming tasks across data science, system operations, and application development.
Key Features:
- Comprehensive Utilities - Functions, lists, strings, numbers, dictionaries
- Data Processing - pandas, NumPy, Polars, PyArrow integration
- Security - Encryption, signing, secure serialization, CSV-compatible integrity
- Well Tested - 418+ unit tests with pytest
- Documented - NumPy-style docstrings throughout
- Performance - Optimized for large-scale data operations
Installation
Basic Installation
# Clone the repository
git clone https://github.com/Ruppert20/CoreUtils-Python.git
cd CoreUtils-Python
# Install dependencies
pip install -r requirements.txt
Requirements
- Python 3.12+
- numpy >= 2.3.0
- pandas >= 2.3.0
- PyYAML >= 6.0.2
- cryptography >= 45.0.7
- tqdm >= 4.67.0
Optional Dependencies
# Install with optional dependencies
pip install "CoreUtilities[optional]"
# Install with development tools
pip install "CoreUtilities[dev]"
[optional]: polars >= 1.33.0, pyarrow >= 21.0.0
[dev]: black >= 24.0.0, mypy >= 1.8.0, flake8 >= 7.0.0, pytest >= 7.4.0, pytest-cov >= 4.1.0
Quick Start
# Import utilities
from src.generics import notnull, coalesce
from src.lists import chunk_list, flatten_list
from src.strings import convert_identifier_case
from src.numerics import extract_num, isfloat
from src.signature import SignedFile
from datetime import datetime
# Use null checking
if notnull(value):
    process(value)
# Coalesce values
result = coalesce(None, '', default_value)
# Chunk data for batch processing
for chunk in chunk_list(large_list, 100):
    process_batch(chunk)
# Convert naming conventions
camel = convert_identifier_case('user_name', 'camelCase')
# Write signed file with header metadata
header = {"version": "1.0", "created": datetime.now(), "author": "alice"}
SignedFile.write("data.bin", {"key": "value"}, header=header)
# Write CSV with integrity signature (pandas-compatible)
csv_data = b"name,age\nAlice,30\nBob,25\n"
SignedFile.write("data.csv", csv_data, signature_as_comment=True)
# Read back with verification and header
data, meta = SignedFile.read("data.bin", return_header=True)
print(f"Created by {meta['author']} on {meta['created']}")
Module Documentation
Core Utilities
generics.py
Generic utility functions for null handling and object operations.
Key Functions:
- notnull(v) - Comprehensive null checking (None, empty containers, pd.NA, np.nan)
- isnull(v) - Inverse of notnull
- coalesce(*values) - Return first non-null value
- get_name(obj) - Extract object name
Code | Tests | Documentation
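The null-handling behavior described above can be illustrated with a small standard-library sketch. This is not the code in generics.py: the names notnull_sketch and coalesce_sketch are hypothetical, and the pd.NA / np.nan cases are reduced to a plain float NaN check so the example runs without pandas or NumPy.

```python
import math


def notnull_sketch(v):
    """Return True unless v is None, a float NaN, or an empty container/string."""
    if v is None:
        return False
    if isinstance(v, float) and math.isnan(v):
        return False
    if isinstance(v, (str, bytes, list, tuple, dict, set, frozenset)) and len(v) == 0:
        return False
    return True


def coalesce_sketch(*values):
    """Return the first value that passes notnull_sketch, or None if none do."""
    for v in values:
        if notnull_sketch(v):
            return v
    return None


# Zero is a real value here, so it is returned instead of being skipped.
print(coalesce_sketch(None, "", 0, "fallback"))
```

Treating empty strings and containers as null matches the Quick Start call coalesce(None, '', default_value); whether the real function treats 0 or False as null is not stated in this README.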
functions.py
Function utilities including dynamic loading, introspection, and debugging.
Key Functions:
- get_func(func_path) - Dynamically load functions from string paths
- filter_kwargs(func, kwargs) - Filter kwargs to match function parameters
- get_function_signature(func) - Extract comprehensive function metadata
- inspect_class(cls) - Extract class properties and methods
- is_pickleable(obj) - Check if object can be pickled
Code | Tests | Documentation
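As a rough illustration of two of the ideas listed above, kwargs filtering and a pickle check can be sketched with only inspect and pickle. The functions filter_kwargs_sketch, is_pickleable_sketch, and area are hypothetical stand-ins written for this example; the library's own implementations may differ.

```python
import inspect
import pickle


def filter_kwargs_sketch(func, kwargs):
    """Keep only the keyword arguments that func can accept by name."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter means func accepts any keyword, so keep everything.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    by_name = (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    allowed = {name for name, p in params.items() if p.kind in by_name}
    return {k: v for k, v in kwargs.items() if k in allowed}


def is_pickleable_sketch(obj):
    """Return True if pickle.dumps succeeds on obj."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


def area(width, height=1):
    return width * height


options = {"width": 2, "height": 3, "depth": 4}
print(area(**filter_kwargs_sketch(area, options)))
```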
lists.py
List manipulation utilities for chunking, intersection, and flattening.
Key Functions:
- convert_list_to_string(lst, encapsulate=False) - Convert list to comma-separated string
- chunk_list(lst, n) - Split list into equal-sized chunks
- list_intersection(lst1, lst2) - Find common elements preserving order
- flatten_list(nested) - Recursively flatten nested lists
Code | Tests | Documentation
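The three list operations above are easy to picture with a minimal sketch. The *_sketch names are hypothetical, and one assumption is made explicit: n is taken to be the chunk size (as the Quick Start call chunk_list(large_list, 100) suggests), not the number of chunks.

```python
def chunk_list_sketch(lst, n):
    """Yield consecutive slices of at most n items (assumes n is the chunk size)."""
    if n <= 0:
        raise ValueError("n must be positive")
    for start in range(0, len(lst), n):
        yield lst[start:start + n]


def list_intersection_sketch(lst1, lst2):
    """Return items of lst1 that also occur in lst2, in lst1's order, without repeats."""
    lookup = set(lst2)
    seen = set()
    common = []
    for item in lst1:
        if item in lookup and item not in seen:
            seen.add(item)
            common.append(item)
    return common


def flatten_list_sketch(nested):
    """Recursively flatten nested lists into a single flat list."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten_list_sketch(item))
        else:
            flat.append(item)
    return flat


print(list(chunk_list_sketch([1, 2, 3, 4, 5], 2)))
print(flatten_list_sketch([1, [2, [3, 4]], 5]))
```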
strings.py
String manipulation including case conversion, cleaning, and parsing.
Key Functions:
- remove_illegal_characters(s, case='snake_case') - Clean strings for identifiers
- convert_identifier_case(id, target_format) - Convert between naming conventions
- snake_to_camel_case(s) - Convert snake_case to camelCase
- camel_to_snake_case(s) - Convert camelCase to snake_case
- get_file_name_components(path) - Parse file paths into components
- tokenize_id(id_str, token_index) - Split and extract tokens from IDs
Code | Tests | Documentation
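Case conversion between snake_case and camelCase can be sketched in a few lines. The two functions below are hypothetical illustrations for simple ASCII identifiers, not the code in strings.py; edge cases such as acronyms or leading underscores may be handled differently by the library.

```python
import re


def snake_to_camel_sketch(s):
    """Convert 'user_name' style identifiers to 'userName'."""
    head, *rest = s.split("_")
    return head + "".join(part.capitalize() for part in rest)


def camel_to_snake_sketch(s):
    """Convert 'userName' style identifiers to 'user_name'."""
    # Insert an underscore before each uppercase letter that follows a
    # lowercase letter or digit, then lowercase the whole string.
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", s).lower()


print(snake_to_camel_sketch("user_name"))
print(camel_to_snake_sketch("userName"))
```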
numerics.py
Numerical operations, extraction, and validation.
Key Functions:
- extract_num(input_str, return_pos=0) - Extract numbers from strings
- isfloat(value) - Check if value can be converted to float
- convert_to_comma_seperated_integer_list(val) - Convert to comma-separated integers
Code | Tests | Documentation
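A minimal sketch of number extraction and float validation follows. It assumes return_pos selects which match to return (0 for the first number found); the README does not define that parameter, so this reading, the regex, and the *_sketch names are all assumptions made for illustration.

```python
import re

# Optional sign, optional integer part, optional decimal point, then digits.
_NUMBER = re.compile(r"[-+]?\d*\.?\d+")


def isfloat_sketch(value):
    """Return True if float(value) succeeds."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def extract_num_sketch(input_str, return_pos=0):
    """Return the return_pos-th number in input_str as a float, or None if absent."""
    matches = _NUMBER.findall(input_str)
    try:
        return float(matches[return_pos])
    except IndexError:
        return None


print(extract_num_sketch("order 12 shipped 3.5 kg"))
print(extract_num_sketch("order 12 shipped 3.5 kg", return_pos=1))
```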
dictionaries.py
Dictionary utilities for pandas aggregation operations.
Key Functions:
- create_aggregation_dict(col_action_dict, start_col, end_col) - Create pandas groupby aggregation dictionaries
Code | Tests | Documentation
git.py
Git repository metadata extraction.
Key Functions:
- get_git_metadata() - Extract comprehensive git repository information
Code | Documentation
Data Processing
core_types.py
Cross-library type classification and detection system.
Key Features:
- CoreDataType enum - Universal type classification
- Type detection from objects and strings
- Support for pandas, NumPy, Polars, PyArrow
- String representation parsing (JSON, XML, UUID, dates)
Code | Documentation
iterables.py
Memory profiling and object analysis utilities.
Key Functions:
- deep_stats(obj) - Calculate deep memory size with cycle detection
- find_large_objects(obj, threshold_kb) - Identify memory-intensive objects
Code | Documentation
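Deep memory measurement with cycle detection usually comes down to walking the object graph while remembering which object ids were already counted. The sketch below shows that idea with sys.getsizeof for built-in containers only; deep_size_sketch is a hypothetical name and the result is an approximation, not what deep_stats reports.

```python
import sys


def deep_size_sketch(obj, _seen=None):
    """Approximate the bytes used by obj plus everything it contains."""
    seen = set() if _seen is None else _seen
    if id(obj) in seen:
        # Already counted; this also stops infinite recursion on cycles.
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += deep_size_sketch(key, seen) + deep_size_sketch(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_size_sketch(item, seen)
    return size


cyclic = [1, 2]
cyclic.append(cyclic)  # the list now contains itself
print(deep_size_sketch(cyclic) > 0)
```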
serialization.py
Extended serialization with multi-format support (JSON, YAML, CBOR, Pickle).
Key Features:
- XSer class - Destination-aware serialization
- Automatic fallback chain: Structured → CBOR → Pickle
- NumPy array support
- HDF5 and Parquet metadata support
Code | Documentation
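The fallback idea can be sketched with the standard library alone: try a structured text format first and drop to pickle only when that fails. This is a simplified two-step chain (JSON, then pickle) that omits the CBOR step and does not use the XSer class; dumps_with_fallback and loads_with_fallback are hypothetical names.

```python
import json
import pickle


def dumps_with_fallback(obj):
    """Return (format_name, payload_bytes), trying JSON before pickle."""
    try:
        return "json", json.dumps(obj).encode("utf-8")
    except (TypeError, ValueError):
        # JSON cannot represent this object (for example a set), so fall back.
        return "pickle", pickle.dumps(obj)


def loads_with_fallback(fmt, payload):
    """Decode a payload produced by dumps_with_fallback."""
    if fmt == "json":
        return json.loads(payload.decode("utf-8"))
    if fmt == "pickle":
        # Only unpickle data from a trusted source.
        return pickle.loads(payload)
    raise ValueError(f"unknown format: {fmt!r}")


for value in ({"a": 1}, {1, 2, 3}):
    fmt, payload = dumps_with_fallback(value)
    print(fmt, loads_with_fallback(fmt, payload) == value)
```

Recording the format name next to the payload is what lets the reader pick the matching decoder without guessing.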
enhanced_logging.py
Advanced logging with emoji support, progress bars, and structured output.
Key Features:
- Enhanced logger with emoji integration
- Progress bar support
- Structured logging for metrics
- Context managers for scoped logging
Code | Documentation
parrallelization.py
Parallel processing utilities with comprehensive error handling.
Key Features:
- ParallelProcessor class
- Support for serial, thread-based, and process-based execution
- Metrics collection and reporting
- Integration with enhanced logging
Code | Documentation
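Switching between serial, thread-based, and process-based execution while collecting errors per item can be sketched with concurrent.futures. The function run_tasks_sketch and its parameters are hypothetical and do not reflect the ParallelProcessor interface, which this README does not spell out.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def run_tasks_sketch(func, items, mode="serial", max_workers=4):
    """Apply func to every item; return (results, errors) keyed by input index."""
    if mode not in ("serial", "thread", "process"):
        raise ValueError(f"unknown mode: {mode!r}")
    results, errors = {}, {}

    def record(index, call):
        # Run one call and file its outcome under the item's index.
        try:
            results[index] = call()
        except Exception as exc:
            errors[index] = exc

    if mode == "serial":
        for index, item in enumerate(items):
            record(index, lambda: func(item))
    else:
        pool_cls = ThreadPoolExecutor if mode == "thread" else ProcessPoolExecutor
        with pool_cls(max_workers=max_workers) as pool:
            futures = [pool.submit(func, item) for item in items]
            for index, future in enumerate(futures):
                record(index, future.result)
    return results, errors


def inverse(x):
    return 1 / x


if __name__ == "__main__":
    results, errors = run_tasks_sketch(inverse, [1, 2, 0, 4], mode="thread")
    print(results, errors)
```

One failing item (the division by zero) ends up in errors without stopping the remaining work, which is the behavior "comprehensive error handling" typically implies. Process mode additionally requires func and the items to be picklable.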
Security & Encryption
encrypt.py
Encryption utilities using Fernet symmetric encryption.
Key Features:
- Encryptor class for data encryption/decryption
- CryptoYAML for encrypted YAML configuration files
- Key generation and management
Code | Tests | Documentation
signature.py
Atomic file writing with cryptographic integrity verification, encryption, and metadata support.
Key Features:
- SignedFile class for signed file operations
- SHA-256/HMAC-SHA256 signatures with integrity verification
- Optional Fernet encryption with authenticated HMAC
- Python object serialization (via XSer) - auto-serializes dicts, lists, numpy, datetime
- Optional header metadata - Store version info, timestamps, and structured metadata
- CSV-compatible commented signatures - Write # comment signatures for pandas/Excel compatibility
- Atomic writes with platform-independent fsync
- Chunked reading for large files
Code | Tests | Documentation
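Two of the techniques listed above, HMAC-SHA256 integrity checking and atomic writes, can be sketched with the standard library. The file layout used here (payload followed by a 32-byte tag) and the function names are assumptions chosen for the example; they are not the SignedFile on-disk format.

```python
import hashlib
import hmac
import os
import tempfile

TAG_SIZE = 32  # length of an HMAC-SHA256 digest in bytes


def write_signed_sketch(path, payload, key):
    """Atomically write payload followed by its HMAC-SHA256 tag."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename over the
    # target so readers never observe a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload + tag)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_signed_sketch(path, key):
    """Return the payload, raising ValueError if the tag does not verify."""
    with open(path, "rb") as handle:
        blob = handle.read()
    payload, tag = blob[:-TAG_SIZE], blob[-TAG_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch")
    return payload


with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "data.bin")
    write_signed_sketch(target, b"name,age\nAlice,30\n", b"demo-key")
    print(read_signed_sketch(target, b"demo-key"))
```

hmac.compare_digest is used instead of == so the comparison time does not leak how many leading bytes of the tag matched.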
File Operations
search.py
Flexible file search utilities with pattern matching and filtering.
Key Features:
- FileSearcher class for advanced file searching
- Pattern matching with regex support
- File type filtering and exclusion patterns
- Recursive and non-recursive search modes
Code | Tests | Documentation
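Regex-based file search with exclusions and an optional recursive mode can be sketched with pathlib and re. The function find_files_sketch and its parameters are hypothetical; the FileSearcher class may expose a different interface.

```python
import re
from pathlib import Path


def find_files_sketch(root, pattern, recursive=True, exclude=()):
    """Return sorted file paths under root whose names match the regex pattern.

    Files inside any directory named in exclude are skipped.
    """
    root = Path(root)
    regex = re.compile(pattern)
    candidates = root.rglob("*") if recursive else root.glob("*")
    matches = []
    for path in candidates:
        if not path.is_file():
            continue
        # Only directory names below root are checked against exclude.
        parent_parts = path.relative_to(root).parts[:-1]
        if any(part in exclude for part in parent_parts):
            continue
        if regex.search(path.name):
            matches.append(path)
    return sorted(matches)


if __name__ == "__main__":
    for found in find_files_sketch(".", r"\.py$", exclude=(".git", "__pycache__")):
        print(found)
```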
Testing
debugging.py
Testing utilities for random data generation.
Key Functions:
- generate_random_sequence(dtype, n, percent_null, seed) - Generate deterministic test data
- Random generators for all common data types (TEXT, UUID, INTEGER, FLOAT, DATE, JSON, XML, etc.)
- debug_print(*args) - Print debug output with visual separators
Code | Tests | Documentation
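Deterministic test data with a controlled share of nulls can be sketched with a seeded random.Random instance. The example below covers integers only and assumes percent_null is a fraction between 0 and 1; the name random_int_sequence_sketch is hypothetical, and generate_random_sequence may interpret its arguments differently.

```python
import random


def random_int_sequence_sketch(n, percent_null=0.0, seed=None):
    """Return n values, each None with probability percent_null, else a random int."""
    rng = random.Random(seed)  # a private generator keeps global random state untouched
    values = []
    for _ in range(n):
        if rng.random() < percent_null:
            values.append(None)
        else:
            values.append(rng.randint(0, 100))
    return values


first = random_int_sequence_sketch(10, percent_null=0.3, seed=42)
second = random_int_sequence_sketch(10, percent_null=0.3, seed=42)
print(first == second)  # the same seed reproduces the same sequence
```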
Running Tests
All tests use pytest and follow the test_*.py naming convention.
Run All Tests
cd UNIT_TESTS
python run_all_tests.py
Run with Verbose Output
python run_all_tests.py -v
Run with Coverage
python run_all_tests.py --coverage
Run Specific Tests
# Run tests matching a pattern
python run_all_tests.py -k test_generics
# Run a specific test file
pytest test_functions.py -v
# Run a specific test class
pytest test_functions.py::TestGetFunc -v
# Run a specific test method
pytest test_functions.py::TestGetFunc::test_get_builtin_function -v
Test Statistics
- Total Tests: 223+
- Coverage: Comprehensive coverage of public APIs
- Frameworks: pytest (supports both pytest and unittest styles)
- Status: All tests passing
View Test Documentation | View Test Summary
Requirements
Core Dependencies
numpy>=2.3.2 # Numerical computing
pandas>=2.2.3 # Data manipulation
Serialization
cbor2>=5.7.0 # CBOR encoding
PyYAML>=6.0.2 # YAML support
Security
cryptography>=45.0.7 # Encryption and signing
Testing
pytest>=8.4.2 # Test framework
pytest-cov>=4.1.0 # Coverage plugin
Project Structure
CoreUtils-Python/
├── src/                        # Source modules
│   ├── core_types.py           # Type classification system
│   ├── debugging.py            # Testing and debugging utilities
│   ├── dictionaries.py         # Dictionary operations
│   ├── encrypt.py              # Encryption utilities
│   ├── encrypted_signature.py  # Combined encryption + signing
│   ├── enhanced_logging.py     # Advanced logging
│   ├── functions.py            # Function utilities
│   ├── generics.py             # Generic utilities
│   ├── git.py                  # Git metadata
│   ├── iterables.py            # Memory profiling
│   ├── lists.py                # List operations
│   ├── numerics.py             # Numerical utilities
│   ├── parrallelization.py     # Parallel processing
│   ├── search.py               # Search utilities
│   ├── serialization.py        # Extended serialization
│   ├── signature.py            # File signing
│   └── strings.py              # String manipulation
│
├── UNIT_TESTS/                 # Test suite
│   ├── test_*.py               # Test modules (223+ tests)
│   ├── run_all_tests.py        # Test runner
│   ├── README.md               # Test documentation
│   └── TEST_SUMMARY.md         # Test results summary
│
├── requirements.txt            # Project dependencies
└── README.md                   # This file
Contributing
Contributions are welcome! Please follow these guidelines:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Write tests for new functionality
- Ensure all tests pass (python run_all_tests.py)
- Follow existing code style (NumPy-style docstrings)
- Commit changes (git commit -m 'Add amazing feature')
- Push to branch (git push origin feature/amazing-feature)
- Open a Pull Request
Code Style
- NumPy-style docstrings for all functions and classes
- Type hints where appropriate
- Comprehensive test coverage
- Clear, descriptive variable names
Releasing a New Version
To publish a new release to PyPI and GitHub:
- Ensure all tests pass on the main branch
- Create a git tag with the v prefix following semantic versioning:
git tag v0.1.0 # stable release
git tag v0.1.0a1 # alpha release
git tag v0.1.0b1 # beta release
git tag v0.1.0rc1 # release candidate
- Push the tag to trigger the release workflow:
git push origin v0.1.0
- The release workflow will automatically:
- Run tests against Python 3.12, 3.13, and 3.14
- Build the package
- Publish to PyPI
- Create a GitHub release with release notes
License
This project is licensed under the MIT License - see the LICENSE file for details.
Author
@Ruppert20
AI Authorship Disclaimer
This package was developed with the assistance of LLM-based coding tools (Claude Code by Anthropic). AI tools were used for the following activities:
- Code authorship - Implementation of utilities, functions, and classes
- Test development - Creation of comprehensive unit tests
- Documentation - Generation of NumPy-style docstrings and README content
- Code review - Identification of bugs, edge cases, and improvements
Users should evaluate the code for their specific use cases and report any issues through the GitHub issue tracker.
Acknowledgments
- Built with modern Python 3.12+
- Integrates with pandas, NumPy, Polars, and PyArrow
- Inspired by the need for clean, reusable utility functions
- Comprehensive testing ensures reliability
- Developed with assistance from Claude Code (Anthropic)
Made with ❤️ for the Python community