ublkit
Simple, powerful UBL XML to JSON/CSV converter with built-in exception handling
ublkit is a lightweight wrapper that converts UBL XML documents (Invoice, CreditNote, Order, DespatchAdvice, etc.) to JSON or CSV format with a simple, clean API.
✨ Features
- 🚀 Zero Configuration - Works out of the box with sensible defaults
- 📁 Flexible Output - Convert to JSON or flattened CSV format
- 🎯 Single File or Batch - Process one file or entire directories
- 🔄 Parallel Processing - Fast batch conversion with multithreading
- 📊 CSV File Splitting - Automatically split large CSVs into manageable chunks
- 🛡️ Robust Error Handling - Conversion failures are reported through the result's success flag and error message
- 📝 Comprehensive Logging - Uses py-logex for production-grade logging
- ⚙️ YAML Configuration - Easy, flexible configuration
- 🎨 Data Preservation - Prevents Excel from corrupting your data
- 📋 Detailed Summaries - File-by-file status and aggregate statistics
📦 Installation
pip install ublkit
Requirements:
- Python >= 3.8
- lxml >= 4.9.0
- polars >= 0.19.0
- pyyaml >= 6.0
- py-logex-enhanced >= 0.1.0
🚀 Quick Start
Single File Conversion
from ublkit import convert_file

# Convert to JSON
result = convert_file(
    xml_path="invoice.xml",
    output_format="json",
    config_path="./config/ublkit.yaml"
)

# Result contains everything in memory
if result["success"]:
    print(f"UBL Type: {result['ubl_document_type']}")
    print(f"Processing time: {result['processing_time_seconds']:.2f}s")
    data = result["content"]  # Your converted data
else:
    print(f"Error: {result['error_message']}")
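Because the single-file call keeps the converted data in memory, persisting it is left to the caller. The following is a minimal sketch, assuming `result["content"]` is JSON-serialisable when `output_format="json"`; `save_result` is a hypothetical helper and not part of ublkit.

```python
import json
from pathlib import Path


def save_result(result: dict, out_path: str) -> bool:
    """Write result["content"] to out_path as JSON.

    Hypothetical helper (not a ublkit function). `result` is assumed to
    follow the shape returned by convert_file(): at least "success" and
    "content". Returns False without writing if the conversion failed.
    """
    if not result.get("success"):
        return False
    path = Path(out_path)
    # Create the target directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps non-ASCII names readable in the file.
        json.dump(result["content"], fh, indent=2, ensure_ascii=False)
    return True
```

In a script this would be called as `save_result(result, "./output/invoice.json")` right after `convert_file(...)`.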
Batch Processing
from ublkit import convert_batch

# Convert entire directory to CSV
summary = convert_batch(
    input_dir="./xml_files",
    output_dir="./output",
    output_format="csv",
    config_path="./config/ublkit.yaml"
)

print(f"Processed: {summary.total_files}")
print(f"Successful: {summary.successful}")
print(f"Failed: {summary.failed}")
⚙️ Configuration
Create ublkit.yaml in your project root:
# Logging configuration (uses py-logex library)
logging:
  level: "INFO"
  file: "ublkit.log"
  rotation: "500 MB"
  retention: "10 days"
  compression: "zip"

# Processing configuration
processing:
  max_workers: 4  # Parallel threads
  encoding: "utf-8"

# CSV output configuration
csv:
  max_records_per_file: 50000  # Split large CSVs
  preservation_method: "apostrophe"  # Prevent Excel corruption
  key_separator: " | "

xml:
  preserve_namespace_prefix: true

json:
  flatten: true  # flattened or nested json
  separator: "/"

# Output directories
output:
  summary_dir: "./summaries"
  logs_dir: "./logs"

# Feature flags
features:
  enable_dry_run: false
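A quick sanity check on the configuration can catch typos before a long batch run. The sketch below works on the plain dict that `yaml.safe_load` returns for the file above. The rules it enforces (positive integers, the three preservation method names) are assumptions drawn from this README; ublkit may validate differently or accept other values, and `check_config` is a hypothetical helper.

```python
VALID_PRESERVATION = {"apostrophe", "quotes", "brackets"}


def _is_positive_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value >= 1


def check_config(cfg: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found.

    Hypothetical helper: the checks are assumptions, not ublkit's own rules.
    """
    problems = []
    if not _is_positive_int(cfg.get("processing", {}).get("max_workers")):
        problems.append("processing.max_workers must be a positive integer")
    csv_cfg = cfg.get("csv", {})
    if csv_cfg.get("preservation_method") not in VALID_PRESERVATION:
        problems.append(
            "csv.preservation_method must be one of "
            + ", ".join(sorted(VALID_PRESERVATION))
        )
    if not _is_positive_int(csv_cfg.get("max_records_per_file")):
        problems.append("csv.max_records_per_file must be a positive integer")
    return problems
```

Typical use would be to load the file with `yaml.safe_load`, pass the resulting dict to `check_config`, and stop if the returned list is not empty.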
CSV Preservation Methods
Prevent Excel from corrupting your data:
- apostrophe: Prepends ' to values (Excel standard)
- quotes: Wraps values in double quotes
- brackets: Wraps values in []
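The reason for these options is that Excel reinterprets text that looks numeric: an identifier such as `00123` loses its leading zeros when opened. The sketch below illustrates the three methods as they are described above. It is not ublkit's implementation, and `preserve` is a hypothetical name. Note that the sample rows under CSV Output Format show values with an apostrophe on both sides, so the library's exact output may differ from this sketch.

```python
def preserve(value: str, method: str = "apostrophe") -> str:
    """Mark a cell value so a spreadsheet treats it as text.

    Hypothetical illustration of the three documented method names.
    """
    if method == "apostrophe":
        return "'" + value          # leading apostrophe only
    if method == "quotes":
        return '"' + value + '"'    # wrap in double quotes
    if method == "brackets":
        return "[" + value + "]"    # wrap in square brackets
    raise ValueError(f"unknown preservation method: {method!r}")
```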
🎯 API Reference
convert_file()
Convert a single XML file (in-memory, no disk writes).
result = convert_file(
    xml_path: str,       # Path to UBL XML file
    output_format: str,  # "json" or "csv"
    config_path: str     # Path to ublkit.yaml (required)
) -> dict
Returns:
{
    "success": bool,
    "error_message": str,
    "processing_time_seconds": float,
    "source_file": str,
    "file_size_bytes": int,
    "ubl_document_type": str,
    "output_format": str,
    "content": dict | list  # Converted data
}
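Callers who prefer exceptions over checking `success` by hand can wrap the returned dict. The sketch below mirrors the documented keys in a `TypedDict` for type checkers and adds a small unwrapping helper. `ConversionResult`, `ConversionError` and `content_or_raise` are hypothetical names introduced here; ublkit itself returns a plain dict.

```python
from typing import Any, TypedDict


class ConversionResult(TypedDict):
    """Type-checker view of the dict documented above (hypothetical name)."""
    success: bool
    error_message: str
    processing_time_seconds: float
    source_file: str
    file_size_bytes: int
    ubl_document_type: str
    output_format: str
    content: Any  # dict or list, depending on the output format


class ConversionError(RuntimeError):
    """Hypothetical exception; ublkit reports failures in the dict instead."""


def content_or_raise(result: ConversionResult) -> Any:
    """Return the converted data, or raise if the conversion failed."""
    if not result["success"]:
        raise ConversionError(
            f"{result['source_file']}: {result['error_message']}"
        )
    return result["content"]
```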
convert_batch()
Convert multiple XML files (writes to disk).
summary = convert_batch(
    input_dir: str,      # Directory containing XML files
    output_dir: str,     # Output directory
    output_format: str,  # "json" or "csv"
    config_path: str     # Path to ublkit.yaml (required)
) -> ProcessingSummary
Returns: ProcessingSummary object with:
- total_files: Total files processed
- successful: Successfully converted
- failed: Failed conversions
- results: List of per-file results
- start_time, end_time: Processing timestamps
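After a batch run, the fields above are enough to build a short failure report. The sketch below assumes that each entry in `results` is a dict with the same keys as the `convert_file()` return value; the README does not state this explicitly, so adjust the attribute access if the entries turn out to be objects. Both function names are hypothetical.

```python
def failed_files(summary) -> list:
    """Return (source_file, error_message) pairs for every failed entry.

    Assumption: summary.results holds dicts shaped like the
    convert_file() return value.
    """
    return [
        (entry.get("source_file", "?"), entry.get("error_message", ""))
        for entry in summary.results
        if not entry.get("success")
    ]


def success_rate(summary) -> float:
    """Fraction of files converted successfully; 0.0 for an empty batch."""
    if summary.total_files == 0:
        return 0.0
    return summary.successful / summary.total_files
```

A caller could print each pair from `failed_files(summary)` and exit with a non-zero status when `summary.failed` is greater than zero.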
🛠️ CLI Usage
# Single file to JSON
ublkit convert invoice.xml --format json --output output.json --config ublkit.yaml
# Batch to CSV
ublkit batch ./xml_files ./output --format csv --config ublkit.yaml
# Dry run (preview without writing)
ublkit batch ./xml_files ./output --dry-run --config ublkit.yaml
📊 CSV Output Format
ublkit flattens nested XML into key-value pairs:
Key,Value,Filename
Invoice | ID | value,'INV-001',invoice_001.xml
Invoice | IssueDate | value,'2024-12-27',invoice_001.xml
Invoice | AccountingSupplierParty | Party | PartyName | Name | value,'ACME Corp',invoice_001.xml
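The key column in the sample above is the path from the document root to each leaf, joined with the configured `key_separator`. A minimal sketch of that idea on an already-parsed nested dict follows. It is an illustration rather than ublkit's code: `flatten` is a hypothetical name, and the `[index]` suffix for repeated elements is an assumption, since the README does not say how repeats are labelled.

```python
def flatten(node, prefix="", sep=" | "):
    """Yield (key, value) pairs for every leaf in a nested dict/list.

    Hypothetical sketch: keys are built by joining the path with `sep`;
    list items get an "[index]" suffix (an assumption, not ublkit's rule).
    """
    if isinstance(node, dict):
        for name, child in node.items():
            key = f"{prefix}{sep}{name}" if prefix else name
            yield from flatten(child, key, sep)
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from flatten(child, f"{prefix}[{index}]", sep)
    else:
        # Leaf value: emit the accumulated path and the value itself.
        yield prefix, node
```

Calling `list(flatten({"Invoice": {"ID": {"value": "INV-001"}}}))` yields the single pair `("Invoice | ID | value", "INV-001")`.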
Benefits:
- ✅ See all data at a glance
- ✅ Easy validation and debugging
- ✅ Works with any UBL document type
- ✅ Automatic file splitting for large datasets
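File splitting by record count, as controlled by `max_records_per_file`, can be pictured as reading the rows in fixed-size chunks and writing each chunk to its own file with the header repeated. The sketch below uses the standard-library `csv` module instead of polars, and both the function name `write_chunks` and the `stem_001.csv` naming scheme are assumptions made for illustration.

```python
import csv
from itertools import islice
from pathlib import Path


def write_chunks(rows, out_dir, stem, max_records=50000):
    """Write rows as stem_001.csv, stem_002.csv, ... and return the paths.

    Hypothetical sketch: each file gets the Key,Value,Filename header and
    at most `max_records` data rows. No file is written for empty input.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    iterator = iter(rows)
    paths = []
    while True:
        # Pull the next chunk lazily so large inputs are never held whole.
        chunk = list(islice(iterator, max_records))
        if not chunk:
            break
        path = out / f"{stem}_{len(paths) + 1:03d}.csv"
        with path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(["Key", "Value", "Filename"])
            writer.writerows(chunk)
        paths.append(path)
    return paths
```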
🧪 Development
# Clone repository
git clone https://github.com/sherozshaikh/ublkit.git
cd ublkit
# Create virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install in development mode
pip install -e ".[dev]"
# Run tests
pytest
# Run tests with coverage
pytest --cov=ublkit --cov-report=html
# Format code
black src tests
isort src tests
# Type checking
mypy src
📖 Supported UBL Document Types
ublkit works with any UBL 2.x document type:
- Invoice
- CreditNote
- DebitNote
- Order
- OrderResponse
- DespatchAdvice
- ReceiptAdvice
- ApplicationResponse
- And more...
🤝 Contributing
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests: pytest
- Submit a pull request
📄 License
MIT License - see LICENSE file for details.
🙏 Acknowledgments
- Built with lxml for robust XML processing
- Uses polars for efficient CSV operations
- Powered by py-logex for production logging
📧 Support
- Issues: GitHub Issues
- PyPI: https://pypi.org/project/ublkit/
Made with ❤️ for the UBL community