Document Processing Hub

File type identification and validation for document processing workflows.

PyPI version | Python 3.10+ | License: MIT

A Python library for searching, copying, and moving document files intelligently. Automatically find the latest file of a specific type and perform operations on it.

🌟 Features

  • Smart File Search: Find the latest file of a specific type in predefined locations or custom folders
  • Intelligent Sorting: Files are ordered by relevance (date, version, etc.) per type
  • Copy & Move Operations: Perform file operations with simple method chaining
  • Two Search Modes:
    • search_file.exists - Search in predefined system locations
    • search_file.local - Search in user-specified folders
  • Multiple File Types: Support for ZJ, Manpower Budget, Manpower Documents, Manpower, Real-time Production
  • Zero Dependencies: Pure Python implementation with no external dependencies

📦 Installation

From PyPI

pip install documentprocessinghub-ljd

From GitHub

git clone https://github.com/LJD-UwU/Document-Processing-Hub.git
cd Document-Processing-Hub
pip install .

🚀 Quick Start

from documentprocessinghub import search_file

# Find latest file in predefined locations
ruta = search_file.exists.manpower_budget()
print(f"Found: {ruta}")

# Find and copy to another location
resultado = search_file.exists.manpower_budget(r"C:\Backup").copy()
print(f"Copied to: {resultado}")

# Find in local folder and move to another
resultado = search_file.local.zj(r"C:\Documentos", r"C:\Procesados").move()
print(f"Moved to: {resultado}")

๐Ÿ” API Reference: search_file

Overview

The search_file API provides two modes for finding and manipulating files:

  1. search_file.exists - Search in predefined system locations (fast & automatic)
  2. search_file.local - Search in folders you specify (flexible & explicit)

Return types:

  • Without destination: Returns str (file path)
  • With destination: Returns FileResult object (for .copy() or .move())
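The deferred-operation design can be pictured with a minimal stand-in for FileResult. This is a hypothetical sketch of the pattern the API documents (find now, act on .copy()/.move()), not the library's actual implementation:

```python
import shutil
from pathlib import Path

class FileResultSketch:
    """Hypothetical stand-in for FileResult: remembers the found source
    file and the destination folder, and defers the actual file
    operation until .copy() or .move() is called."""

    def __init__(self, source, destination):
        self.source = Path(source)
        self.destination = Path(destination)

    def _target(self):
        # Create the destination folder if needed; keep the file name
        self.destination.mkdir(parents=True, exist_ok=True)
        return self.destination / self.source.name

    def copy(self):
        # The original stays in place
        return str(shutil.copy2(self.source, self._target()))

    def move(self):
        # The original is removed from its location
        return str(shutil.move(str(self.source), str(self._target())))
```

Calling the search without a destination short-circuits to a plain path string; only the destination form needs this deferred object.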

Mode 1: search_file.exists - Predefined Locations

Only manpower_budget is available in exists mode.

Example 1: Get File Path Only

from documentprocessinghub import search_file

# Returns the path of the latest Manpower Budget file found in predefined locations
ruta = search_file.exists.manpower_budget()

if ruta:
    print(f"Latest file: {ruta}")
    # Output: C:\Sistema\Archivos\Manpower Budget Rev 18.2.xlsx
else:
    print("No file found")

# Use it in your code (procesar_archivo stands in for your own function)
if ruta:
    procesar_archivo(ruta)

Example 2: Find and Copy

# Find latest file and copy it (original remains in place)
backup_path = search_file.exists.manpower_budget(r"C:\Backups").copy()

print(f"Backed up to: {backup_path}")
# Output: C:\Backups\Manpower Budget Rev 18.2.xlsx

# Practical use: Daily backup
from datetime import date
today = date.today().strftime("%Y%m%d")
backup_folder = f"C:\\Backups\\{today}"
search_file.exists.manpower_budget(backup_folder).copy()

Example 3: Find and Move

# Find latest file and move it to another location
procesado_path = search_file.exists.manpower_budget(r"C:\Procesados").move()

print(f"Moved to: {procesado_path}")
# Output: C:\Procesados\Manpower Budget Rev 18.2.xlsx

# Note: The file is removed from original location

Mode 2: search_file.local - Custom Folders

All file types are available in local mode:

  • manpower_budget
  • manpower_documents
  • zj
  • manpower
  • real_time_production

Example 1: Get File Path from Local Folder

# Search in a specific folder and get the latest file
ruta = search_file.local.zj(r"C:\Documentos")

if ruta:
    print(f"Latest ZJ file: {ruta}")
    # Output: C:\Documentos\ZJ26042912-8105.xlsx
else:
    print("No ZJ files found in folder")

# Multiple searches
zj_file = search_file.local.zj(r"C:\Docs")
budget_file = search_file.local.manpower_budget(r"C:\Docs")
docs_file = search_file.local.manpower_documents(r"C:\Docs")

print(f"ZJ: {zj_file}")
print(f"Budget: {budget_file}")
print(f"Documents: {docs_file}")

Example 2: Search Local and Copy

# Find in one folder and copy to another
copia_path = search_file.local.manpower_budget(
    r"C:\Documentos\Entrada",
    r"C:\Documentos\Copia"
).copy()

print(f"Copied to: {copia_path}")
# Output: C:\Documentos\Copia\Manpower Budget Rev 18.1.xlsx

# Practical use: Process and backup
search_file.local.zj(
    r"C:\Entrada",
    r"C:\Backup"
).copy()  # Backup before processing

Example 3: Search Local and Move

# Find in source folder and move to destination
procesado_path = search_file.local.manpower_documents(
    r"C:\Documentos\Entrada",
    r"C:\Documentos\Procesados"
).move()

print(f"Moved to: {procesado_path}")
# Output: C:\Documentos\Procesados\Manpower Documents Q1 2026.xlsx

# Practical use: Processing pipeline
for carpeta_entrada in [r"C:\Q1", r"C:\Q2", r"C:\Q3"]:
    resultado = search_file.local.manpower_documents(
        carpeta_entrada,
        r"C:\Procesados"
    ).move()
    if resultado:
        print(f"Processed: {resultado}")

Example 4: Process Multiple File Types

# Search for different types in the same folder
origen = r"C:\Documentos"
destino = r"C:\Procesados"

# Process each type
zj = search_file.local.zj(origen, destino).move()
budget = search_file.local.manpower_budget(origen, destino).move()
docs = search_file.local.manpower_documents(origen, destino).move()

# Log results
if zj:
    print(f"ZJ: {zj}")
if budget:
    print(f"Budget: {budget}")
if docs:
    print(f"Documents: {docs}")

File Selection Criteria

The "latest" file is selected based on the file type:

ZJ Files: By date, version, and duplicate count

ZJ26042912-8105(4) > ZJ26042912-8105(1) > ZJ26042822-8005
     ↑ newer        ↑ same date         ↑ older date
                    ↑ higher duplicate

MANPOWER_BUDGET: By version and month

Rev 18.2 > Rev 18.1 > Rev 17.0 April
 ↑ newer   ↑ same major version

MANPOWER_DOCUMENTS: By year, month, and quarter

2026 Q2 > 2026 Q1 > 2025 Q4
↑ newer   ↑ newer in same year

MANPOWER: By month and day

April_29 > April_28 > March_28
 ↑ newer    ↑ newer in month

REAL_TIME_PRODUCTION: By modification date (most recent first)
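As an illustration, version-based ordering like the Manpower Budget case can be sketched with a sort key. The regex and tie-breaking below are assumptions for this example only, not the library's actual selection rules:

```python
import re

def budget_sort_key(filename):
    # Extract "Rev <major>.<minor>" from a name like
    # "Manpower Budget Rev 18.2.xlsx"; unmatched names sort last.
    m = re.search(r"Rev\s+(\d+)\.(\d+)", filename)
    if not m:
        return (-1, -1)
    return (int(m.group(1)), int(m.group(2)))

files = [
    "Manpower Budget Rev 17.0 April.xlsx",
    "Manpower Budget Rev 18.2.xlsx",
    "Manpower Budget Rev 18.1.xlsx",
]
latest = max(files, key=budget_sort_key)
print(latest)  # Manpower Budget Rev 18.2.xlsx
```

The ZJ and MANPOWER types would use analogous keys built from date, duplicate count, or month/day fields.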


💡 Common Patterns

Pattern 1: Daily Backup

from documentprocessinghub import search_file
from datetime import date

def daily_backup():
    today = date.today().strftime("%Y%m%d")
    backup_folder = f"C:\\Backups\\{today}"
    
    ruta = search_file.exists.manpower_budget(backup_folder).copy()
    if ruta:
        print(f"✓ Backup successful: {ruta}")
    else:
        print("✗ No file found to backup")

# Run daily
daily_backup()

Pattern 2: Processing Pipeline

from documentprocessinghub import search_file

def process_documents():
    entrada = r"C:\Entrada"
    procesados = r"C:\Procesados"
    
    # Process each type
    for tipo in ["zj", "manpower_budget", "manpower_documents"]:
        # Get the function dynamically
        search_func = getattr(search_file.local, tipo)
        
        resultado = search_func(entrada, procesados).move()
        if resultado:
            print(f"Processed ({tipo}): {resultado}")

process_documents()

Pattern 3: Safe Backup Before Processing

from documentprocessinghub import search_file

def safe_process(carpeta_entrada):
    # Step 1: Backup the file
    backup = search_file.local.manpower_budget(
        carpeta_entrada,
        r"C:\Backup"
    ).copy()
    
    if not backup:
        print("✗ Error: No file found")
        return
    
    # Step 2: Process the original
    procesado = search_file.local.manpower_budget(
        carpeta_entrada,
        r"C:\Procesados"
    ).move()
    
    print(f"✓ Backed up: {backup}")
    print(f"✓ Processed: {procesado}")

safe_process(r"C:\Entrada")

๐Ÿ—๏ธ Project Structure

document-processing-hub/
├── documentprocessinghub/          # Main package
│   ├── __init__.py                # Package initialization
│   ├── fileSelector.py            # search_file API implementation
│   ├── scanNameFiles.py           # File type identification
│   ├── validators.py              # Format validation
│   └── paths_config.py            # Predefined paths configuration
├── examples/                       # Usage examples
│   ├── main.py                    # Interactive examples
│   └── USAGE.md                   # Detailed usage guide
├── pyproject.toml                 # Project configuration
├── README.md                       # This file
├── LICENSE                         # MIT License
└── .gitignore                      # Git ignore rules

๐Ÿ“ Changelog

Version 0.4.0 (2026-04-30)

Major Changes

  • Renamed API: find_latest_file → search_file (clearer intent)
  • Simplified behavior:
    • Without destination: Returns str (file path)
    • With destination: Returns FileResult for .copy() or .move()
  • Restricted exists mode: Only manpower_budget available in search_file.exists
  • Enhanced documentation: Complete docstrings for IDE support
  • All types in local: All file types available in search_file.local

Version 0.3.1 (2026-04-30)

Fixes

  • Fixed FileResult.copy() missing destination argument
  • Improved API parameter handling

Version 0.3.0 (2026-04-29)

New Features

  • Fluent API with dynamic methods for each file type
  • FileResult class for file operations

Version 0.2.0 (2026-04-29)

New Features

  • Initial file search functionality
  • Support for multiple file types
  • Smart file selection based on date and version

📄 License

MIT License - See LICENSE file for details

👨‍💻 Author

LJD-UwU

๐Ÿค Contributing

Contributions are welcome! Please feel free to submit a Pull Request.


🔧 Data Processing: process_file

The process_file module provides tools for cleaning and processing Excel data.

clean_sheet() - Clean and Flatten Excel Data

Automatically detects headers, cleans data, and creates a structured "Datos_Limpios" sheet.

Features:

  • ✅ Automatic header detection
  • ✅ Data normalization and cleaning
  • ✅ Professional formatting (colors, borders, frozen headers)
  • ✅ Intelligent column width adjustment
  • ✅ Removes empty rows and duplicate columns
  • ✅ Returns pandas DataFrame for further analysis

Usage:

from documentprocessinghub import clean_sheet

# Option 1: Clean and overwrite
df = clean_sheet("datos.xlsx")

# Option 2: Save to new file
df = clean_sheet("entrada.xlsx", output_path="salida.xlsx")

# Option 3: Process specific sheet
df = clean_sheet("datos.xlsx", nombre_hoja="Producción")

# Result is a pandas DataFrame
print(df.shape)      # (rows, columns)
print(df.columns)    # Column names
print(df.head())     # First rows

What It Does:

  1. Reads the Excel file
  2. Detects headers and data rows
  3. Cleans data (removes nulls, normalizes columns)
  4. Applies professional formatting
  5. Creates "Datos_Limpios" sheet with cleaned data
  6. Returns DataFrame for analysis
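Step 2 (header detection) can be imagined with a simple density heuristic. This is an illustrative sketch under assumed rules, not clean_sheet's actual algorithm:

```python
def detect_header_row(rows):
    """Pick the first row where most cells are non-empty strings.
    A simplified stand-in for an (undocumented) header heuristic."""
    for i, row in enumerate(rows):
        filled = [c for c in row if isinstance(c, str) and c.strip()]
        if row and len(filled) / len(row) >= 0.8:
            return i
    return 0  # fall back to the first row

sheet = [
    [None, None, None],          # blank padding row
    ["Date", "Line", "Units"],   # header row
    ["2026-04-29", "A", 120],
    ["2026-04-30", "B", 95],
]
idx = detect_header_row(sheet)
print(idx)  # 1
```

Everything above the detected row would be discarded, and the rows below become the cleaned data written to "Datos_Limpios".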

Made with care for document processing automation

Download files

Source distribution: documentprocessinghub_ljd-0.5.0.tar.gz (19.7 kB)

  SHA256      c12d8abe0fc414c19d32160a58d9c3f8520a5eb1a190725571ed518b5864611c
  MD5         348bc7999e7d0eaba709eafc8994e24e
  BLAKE2b-256 d791ee3c708a09135f0a882fa32c921a982084c87bc463cb85a04c2110e74ee8

Built distribution: documentprocessinghub_ljd-0.5.0-py3-none-any.whl (16.9 kB)

  SHA256      578b88cd12fdd85dfcb7fad2764f0e63b793973cf3fb5e8811a36a8867fe4e2f
  MD5         20d9a722ce3af771025bf9a2237068f6
  BLAKE2b-256 f4e57a4018501a71748a65faa4aa5a86679efc2b2eb62c469ececfef4880e6a6
