A comprehensive Python utilities package with enhanced auto-discovery

Development Status
- 4 - Beta
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent

Siege Utilities

A comprehensive Python utilities package with enhanced auto-discovery that automatically imports and makes all functions mutually available across modules.

✨ Key Features

🔄 Auto-Discovery: Automatically finds and imports all functions from new modules
🌐 Mutual Availability: All 500+ functions accessible from any module without imports
📝 Universal Logging: Comprehensive logging system available everywhere
🛡️ Graceful Dependencies: Optional features (PySpark, geospatial) fail gracefully
📊 Built-in Diagnostics: Monitor package health and function availability
⚡ Zero Configuration: Just import siege_utilities and everything works
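
As a rough illustration of the "Graceful Dependencies" idea, the sketch below shows one common way to let an optional import fail without breaking everything else. The helper name `optional_import` is hypothetical and is not part of Siege Utilities; the package's own mechanism may differ.

```python
import importlib
import logging

logger = logging.getLogger("optional_deps_sketch")


def optional_import(module_name):
    """Return the imported module, or None if it is not installed.

    Hypothetical helper for illustration; not the package's real API.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Log and carry on so that unrelated features keep working.
        logger.warning("Optional dependency %r is not available", module_name)
        return None


# A standard-library module is always importable.
json_module = optional_import("json")

# A made-up module name stands in for a missing optional dependency.
missing_module = optional_import("surely_not_installed_module_xyz")

print(json_module is not None)
print(missing_module is None)
```

Callers can then branch on `None` instead of wrapping every use of an optional feature in its own try/except.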

🚀 Quick Start

pip install siege-utilities

import siege_utilities

# All 500+ functions are immediately available
siege_utilities.log_info("Package loaded successfully!")

# File operations
hash_value = siege_utilities.get_file_hash("myfile.txt")
siege_utilities.ensure_path_exists("data/processed")

# String utilities  
clean_text = siege_utilities.remove_wrapping_quotes_and_trim('  "hello"  ')

# Distributed computing (if PySpark available)
try:
    config = siege_utilities.create_hdfs_config("/data")
    spark, data_path = siege_utilities.setup_distributed_environment()
except (AttributeError, NameError):  # a missing module attribute raises AttributeError
    siege_utilities.log_warning("Distributed features not available")

# Package diagnostics
info = siege_utilities.get_package_info()
print(f"Available functions: {info['total_functions']}")
print(f"Failed imports: {len(info['failed_imports'])}")

📦 What's Included

Core Utilities (`siege_utilities.core`)

Logging: Comprehensive logging with file rotation, multiple levels
String Utils: Text processing, quote removal, trimming
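
To make the string utilities concrete, here is a minimal sketch of what quote removal plus trimming could look like. The function name `strip_wrapping_quotes` is hypothetical; it stands in for `remove_wrapping_quotes_and_trim`, whose real handling of edge cases is not documented here.

```python
def strip_wrapping_quotes(text):
    """Trim whitespace, then drop one pair of matching wrapping quotes.

    Hypothetical stand-in for remove_wrapping_quotes_and_trim; the real
    function may treat edge cases differently.
    """
    trimmed = text.strip()
    # Only strip when the first and last characters are the same quote mark.
    if len(trimmed) >= 2 and trimmed[0] == trimmed[-1] and trimmed[0] in "\"'":
        trimmed = trimmed[1:-1].strip()
    return trimmed


print(strip_wrapping_quotes('  "hello"  '))  # hello
print(strip_wrapping_quotes("plain text"))   # plain text
```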

File Utilities (`siege_utilities.files`)

Hashing: SHA256, MD5, quick signatures, integrity verification
Operations: File existence, row counting, duplicate detection, data writing
Paths: Directory creation, zip extraction, path management
Remote: HTTP downloads with progress bars, URL-to-path conversion
Shell: Subprocess management, command execution
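
The hashing helpers can be pictured with a short standard-library sketch. This is an assumption about the general technique (chunked SHA256 hashing), not the package's actual `get_file_hash` implementation; the name `sha256_of_file` is hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in fixed-size chunks so large files never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstrate on a small temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world")
    tmp_path = tmp.name

file_digest = sha256_of_file(tmp_path)
os.remove(tmp_path)
print(file_digest)
```

Comparing the returned hex digest with a previously stored one is a simple integrity check.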

Distributed Computing (`siege_utilities.distributed`)

HDFS Operations: Hadoop filesystem integration, data syncing
Spark Utils: PySpark workflows, DataFrame processing, geospatial operations
Configuration: Environment setup, cluster management

Geospatial (`siege_utilities.geo`)

Geocoding: Nominatim integration, address processing, coordinate validation
Spatial Analysis: Geographic data processing, coordinate systems
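
Coordinate validation is easy to show in isolation. The sketch below assumes plain latitude and longitude in decimal degrees; `is_valid_coordinate` is a hypothetical name, and the package's own validators may apply stricter rules.

```python
def is_valid_coordinate(latitude, longitude):
    """Return True when both values are numeric and inside valid degree ranges."""
    try:
        lat = float(latitude)
        lon = float(longitude)
    except (TypeError, ValueError):
        # Non-numeric input (None, "abc", ...) is never a valid coordinate.
        return False
    # NaN fails both comparisons, so it is rejected as well.
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


print(is_valid_coordinate(30.27, -97.74))  # True
print(is_valid_coordinate(91, 0))          # False
print(is_valid_coordinate("abc", 0))       # False
```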

🌟 Unique Auto-Discovery System

Unlike traditional packages, Siege Utilities uses an enhanced auto-discovery system:

# Traditional approach - lots of imports needed
from package.module1 import function_a
from package.module2 import function_b
from package.core.logging import log_info

def my_function():
    log_info("Starting process")
    result_a = function_a()
    result_b = function_b()

# Siege Utilities approach - everything just works
import siege_utilities

def my_function():
    log_info("Starting process")      # Available everywhere
    result_a = function_a()           # Available everywhere  
    result_b = function_b()           # Available everywhere

How It Works

Phase 1: Bootstrap core logging system
Phase 2: Import modules in dependency order
Phase 3: Auto-discover all .py files and subpackages
Phase 4: Inject all functions into all modules (mutual availability)
Phase 5: Provide comprehensive diagnostics

Result: 500+ functions accessible from anywhere with zero imports!
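
The injection step (Phase 4) can be sketched with the standard library alone. The example below is an assumption about the general technique, not the package's actual code: it builds two in-memory modules as stand-ins for discovered `.py` files (a real implementation would more likely walk the package with `pkgutil` and `importlib`), and the names `collect_public_functions` and `inject_everywhere` are hypothetical.

```python
import inspect
import types


def collect_public_functions(modules):
    """Map name -> function for every public function defined in the given modules."""
    registry = {}
    for module in modules:
        for name, obj in vars(module).items():
            if (inspect.isfunction(obj)
                    and not name.startswith("_")
                    and obj.__module__ == module.__name__):
                registry[name] = obj
    return registry


def inject_everywhere(modules, registry):
    """Copy every registered function into every module's namespace."""
    for module in modules:
        for name, func in registry.items():
            # Do not overwrite names a module already defines itself.
            if not hasattr(module, name):
                setattr(module, name, func)


# Two in-memory modules stand in for files found during discovery.
logging_mod = types.ModuleType("demo_logging")
exec("def log_info(message):\n    return f'INFO: {message}'\n", logging_mod.__dict__)

files_mod = types.ModuleType("demo_files")
exec(
    "def describe_file(path):\n"
    "    # log_info is not imported here; it is injected below.\n"
    "    return log_info(f'checking {path}')\n",
    files_mod.__dict__,
)

modules = [logging_mod, files_mod]
inject_everywhere(modules, collect_public_functions(modules))
print(files_mod.describe_file("data.csv"))  # INFO: checking data.csv
```

Because Python resolves global names at call time, `describe_file` finds `log_info` in its module namespace once injection has run. The trade-off of this pattern is that name collisions between modules have to be resolved by some rule; this sketch simply keeps whichever definition a module already has.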

📊 Package Diagnostics

Monitor your package health:

import siege_utilities

# Get comprehensive package information
info = siege_utilities.get_package_info()
print(f"Total functions: {info['total_functions']}")
print(f"Loaded modules: {info['total_modules']}")
print(f"Failed imports: {info['failed_imports']}")

# List functions by pattern
log_functions = siege_utilities.list_available_functions("log_")
file_functions = siege_utilities.list_available_functions("file")

# Check dependencies
deps = siege_utilities.check_dependencies()
print(f"PySpark available: {deps['pyspark']}")
print(f"Geopy available: {deps['geopy']}")

# Get function information
func_info = siege_utilities.get_function_info("get_file_hash")
print(f"Function from module: {func_info['module']}")
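
A dependency check of this kind can be approximated without importing the heavy libraries themselves. The sketch below uses `importlib.util.find_spec` from the standard library; `check_optional_dependencies` is a hypothetical name and is not claimed to match how `check_dependencies` works internally.

```python
import importlib.util


def check_optional_dependencies(names):
    """Report which top-level modules are installed, without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_optional_dependencies(["json", "pyspark", "geopy"])
for name, available in status.items():
    print(f"{name}: {'available' if available else 'missing'}")
```

The output depends on the environment: `json` is always present, while `pyspark` and `geopy` are reported according to what is installed.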

🔧 Installation Options

# Basic installation
pip install siege-utilities

# With distributed computing support (quotes keep shells such as zsh from expanding the brackets)
pip install "siege-utilities[distributed]"

# With geospatial support
pip install "siege-utilities[geo]"

# Full installation (all optional dependencies)
pip install "siege-utilities[distributed,geo,dev]"

# Development installation
git clone https://github.com/siege-analytics/siege_utilities.git
cd siege_utilities
pip install -e ".[distributed,geo,dev]"

📖 Detailed Examples

File Processing Pipeline

import pathlib

import siege_utilities

def process_data_files(input_dir, output_dir):
    """Complete file processing pipeline using siege utilities."""
    
    # Setup with logging
    siege_utilities.init_logger("data_processor", log_to_file=True)
    siege_utilities.log_info(f"Starting processing: {input_dir} -> {output_dir}")
    
    # Ensure output directory exists
    siege_utilities.ensure_path_exists(output_dir)
    
    # Process each file
    for file_path in pathlib.Path(input_dir).glob("*.txt"):
        if siege_utilities.check_if_file_exists_at_path(file_path):
            
            # Generate file hash for integrity
            file_hash = siege_utilities.get_file_hash(file_path)
            siege_utilities.log_info(f"Processing {file_path.name}: {file_hash}")
            
            # Count rows and check for issues
            total_rows = siege_utilities.count_total_rows_in_file_using_sed(file_path)
            empty_rows = siege_utilities.count_empty_rows_in_file_using_awk(file_path)
            
            siege_utilities.log_info(f"File stats: {total_rows} total, {empty_rows} empty")
            
            # Process and save results
            output_path = pathlib.Path(output_dir) / f"processed_{file_path.name}"
            # ... your processing logic here
            
    siege_utilities.log_info("Processing complete!")
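
The row-counting helpers in the pipeline are named after `sed` and `awk`, which suggests they call out to those tools. Where those tools are unavailable, the same statistics can be gathered in pure Python. The function `count_rows` below is a hypothetical, portable sketch, not part of the package.

```python
import os
import tempfile


def count_rows(path):
    """Return (total_rows, empty_rows) for a text file, reading line by line."""
    total_rows = 0
    empty_rows = 0
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            total_rows += 1
            # A row counts as empty when it holds only whitespace.
            if not line.strip():
                empty_rows += 1
    return total_rows, empty_rows


# Demonstrate on a small temporary file with two blank rows.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("a\n\nb\n  \nc\n")
    sample_path = tmp.name

total_rows, empty_rows = count_rows(sample_path)
os.remove(sample_path)
print(f"{total_rows} total, {empty_rows} empty")  # 5 total, 2 empty
```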

Distributed Computing Workflow

import siege_utilities

def distributed_geocoding_pipeline(data_path):
    """Distributed geocoding using Spark and HDFS."""
    
    # Check if distributed features are available
    deps = siege_utilities.check_dependencies()
    if not deps['pyspark']:
        siege_utilities.log_error("PySpark required for distributed processing")
        return
    
    # Setup distributed environment
    config = siege_utilities.create_cluster_config(data_path)
    spark, hdfs_path = siege_utilities.setup_distributed_environment(config)
    
    if spark and hdfs_path:
        siege_utilities.log_info(f"Distributed environment ready: {hdfs_path}")
        
        # Load and process data
        df = spark.read.parquet(hdfs_path)
        df = siege_utilities.sanitise_dataframe_column_names(df)
        df = siege_utilities.validate_geocode_data(df, "latitude", "longitude")
        
        # Geocoding operations
        geocoder = siege_utilities.NominatimGeoClassifier()
        # ... geocoding logic here
        
        # Save results
        output_path = hdfs_path.replace("input", "output")
        siege_utilities.write_df_to_parquet(df, output_path)
        
        spark.stop()
    else:
        siege_utilities.log_error("Failed to setup distributed environment")
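
Column-name sanitising, as used in the workflow above, can be illustrated without a Spark session. The rules below (lowercase, non-alphanumeric runs become underscores) are an assumption for illustration; `sanitise_column_name` is a hypothetical helper and may not match what `sanitise_dataframe_column_names` actually does.

```python
import re


def sanitise_column_name(name):
    """Lowercase a column name and replace runs of non-alphanumerics with '_'."""
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    return cleaned.strip("_").lower()


columns = ["First Name", " Zip-Code ", "lat/long (deg)"]
print([sanitise_column_name(c) for c in columns])
# ['first_name', 'zip_code', 'lat_long_deg']
```

With PySpark installed, a list of cleaned names like this could be applied in one step with `df.toDF(*new_names)`.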

🏗️ Development

Adding New Functions

Just create a new .py file anywhere in the package:

# siege_utilities/my_new_module.py

def my_awesome_function(data):
    """This function will be auto-discovered!"""
    log_info("my_awesome_function called")  # Logging available automatically
    
    # All other siege utilities functions are available
    file_hash = get_file_hash(data)  # No import needed!
    ensure_path_exists("output")     # No import needed!
    
    return f"processed_{file_hash}"

def another_function():
    """This will also be auto-discovered!"""
    return "Hello from auto-discovery!"

That's it! Next time you import siege_utilities, these functions will be automatically available:

import siege_utilities

# Your new functions are automatically available
result = siege_utilities.my_awesome_function("data.txt")
greeting = siege_utilities.another_function()

# And they're mutually available in other modules too!

Running Diagnostics

# Check package health (run from package directory)
python3 check_imports.py

# Or from Python
python3 -c "
import siege_utilities
info = siege_utilities.get_package_info()
print(f'Functions: {info[\"total_functions\"]}')
print(f'Modules: {info[\"total_modules\"]}')
print(f'Failed: {len(info[\"failed_imports\"])}')
"

🤝 Contributing

Fork the repository
Create a feature branch: git checkout -b feature-name
Add your functions to existing modules or create new ones
Test with: python3 check_imports.py
Commit changes: git commit -am 'Add new feature'
Push: git push origin feature-name
Submit a Pull Request

The auto-discovery system will automatically find and integrate your new functions!

📝 License

MIT License - see LICENSE file for details.

🙏 Acknowledgments

Built by Siege Analytics
Inspired by the need for truly seamless Python utilities
Special thanks to the auto-discovery pattern that makes this possible

Siege Utilities: Where every function is available everywhere! 🚀

Updated setup.py for PyPI

from setuptools import setup, find_packages

with open("README.md", "r", encoding="utf-8") as fh:
    long_description = fh.read()

setup(
    name="siege-utilities",
    version="1.0.0",
    author="Dheeraj Chand",
    author_email="dheeraj@siegeanalytics.com",
    description="A comprehensive Python utilities package with enhanced auto-discovery",
    long_description=long_description,
    long_description_content_type="text/markdown",
    url="https://github.com/siege-analytics/siege_utilities",
    project_urls={
        "Bug Tracker": "https://github.com/siege-analytics/siege_utilities/issues",
        "Documentation": "https://github.com/siege-analytics/siege_utilities#readme",
        "Source Code": "https://github.com/siege-analytics/siege_utilities",
    },
    classifiers=[
        "Development Status :: 4 - Beta",
        "Intended Audience :: Developers",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
        "Programming Language :: Python :: 3",
        "Programming Language :: Python :: 3.8",
        "Programming Language :: Python :: 3.9",
        "Programming Language :: Python :: 3.10",
        "Programming Language :: Python :: 3.11",
        "Programming Language :: Python :: 3.12",
        "Topic :: Software Development :: Libraries :: Python Modules",
        "Topic :: Utilities",
    ],
    packages=find_packages(),
    python_requires=">=3.8",
    install_requires=[
        "requests>=2.25.0",
        "tqdm>=4.60.0",
    ],
    extras_require={
        "distributed": [
            "pyspark>=3.0.0",
        ],
        "geo": [
            "geopy>=2.0.0",
            "apache-sedona>=1.4.0",
        ],
        "dev": [
            "pytest>=6.0.0",
            "black>=21.0.0",
            "flake8>=3.8.0",
            "twine>=3.4.0",
        ],
    },
    entry_points={
        "console_scripts": [
            "siege-utils-check=siege_utilities.check_imports:main",
        ],
    },
    keywords="utilities, auto-discovery, logging, file-operations, distributed-computing, geocoding",
    include_package_data=True,
    zip_safe=False,
)

PyPI Publishing Commands

# 1. Install build dependencies
pip install build twine

# 2. Clean previous builds
rm -rf dist/ build/ *.egg-info/

# 3. Build the package
python -m build

# 4. Check the package
twine check dist/*

# 5. Upload to Test PyPI first
twine upload --repository testpypi dist/*

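# 6. Test installation from Test PyPI (dependencies such as requests and tqdm are pulled from the real PyPI)
pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ siege-utilities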

# 7. If everything works, upload to real PyPI
twine upload dist/*

Release history

- 3.17.2 (May 14, 2026)
- 3.17.1 (May 14, 2026)
- 3.17.0 (May 14, 2026)
- 3.16.0 (May 13, 2026)
- 3.15.1 (May 12, 2026)
- 3.15.0 (May 11, 2026)
- 3.14.0 (May 7, 2026)
- 3.13.2 (Apr 30, 2026)
- 3.12.0 (Mar 14, 2026)
- 3.11.0 (Mar 14, 2026)
- 3.10.0 (Mar 14, 2026)
- 3.9.1 (Mar 14, 2026)
- 3.9.0 (Mar 14, 2026)
- 3.8.4 (Mar 9, 2026)
- 3.8.3 (Mar 7, 2026)
- 3.8.2 (Mar 6, 2026)
- 3.8.1 (Mar 4, 2026)
- 3.8.0 (Mar 4, 2026)
- 3.7.0 (Mar 3, 2026)
- 3.6.0 (Mar 3, 2026)
- 3.5.0 (Mar 3, 2026)
- 3.4.1 (Mar 2, 2026)
- 3.4.0 (Mar 2, 2026)
- 3.3.3 (Mar 2, 2026)
- 3.3.2 (Mar 2, 2026)
- 3.3.1 (Mar 2, 2026)
- 3.3.0 (Mar 1, 2026)
- 3.2.0 (Mar 1, 2026)
- 3.1.0 (Mar 1, 2026)
- 3.0.1 (Feb 27, 2026)
- 3.0.0 (Feb 26, 2026)
- 2.2.0 (Feb 26, 2026)
- 2.1.0 (Feb 24, 2026)
- 2.0.0 (Feb 23, 2026)
- 1.0.0 (Jun 18, 2025), this version

Download files

Source Distribution

siege_utilities-1.0.0.tar.gz (40.2 kB view details)

Uploaded Jun 18, 2025 Source

Built Distribution

siege_utilities-1.0.0-py3-none-any.whl (42.6 kB view details)

Uploaded Jun 18, 2025 Python 3

File details

Details for the file siege_utilities-1.0.0.tar.gz.

File metadata

Download URL: siege_utilities-1.0.0.tar.gz
Upload date: Jun 18, 2025
Size: 40.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for siege_utilities-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`a49ac9920c48f0c320cee72a1dae08d1bf930d133fd120de01a8e090c7a35a4b`
MD5	`ed30423a7a82f646f9f7fead7d0e70bb`
BLAKE2b-256	`0f107a738ea898e3baf541d06511602aa15825b2a0ffb4ce36d2d280005c3580`

File details

Details for the file siege_utilities-1.0.0-py3-none-any.whl.

File metadata

Download URL: siege_utilities-1.0.0-py3-none-any.whl
Upload date: Jun 18, 2025
Size: 42.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for siege_utilities-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4ed4f4c089679c952e2bdb61223ca90d29474cd6a341dffc95e1d1c7376f6056`
MD5	`11f57fae93f714e9de05bc7f55479a21`
BLAKE2b-256	`89caaa4a6462c0aca81f99ae221b9cae8155f3effe60e00a26abb53fc3215369`
