Safe atomic file writer for Pandas, Polars, NumPy, and other data objects


Atio

🛡️ Safe Atomic File Writing Library for Python

Pandas Polars NumPy PyArrow SQLAlchemy OpenPyXL

📋 Table of Contents

🎯 Overview
🚀 30-Second Quick Start
📊 Supported Formats & Libraries
🏗️ Architecture
💡 Real-World Use Cases
🎯 Core Features
🔧 Advanced Usage
🛠️ Installation
📚 Documentation & Examples
🏆 Why Choose Atio?
📄 License

🎯 Overview

Atio is a Python library that prevents data loss when writing files. Instead of writing directly to the target path, it writes to a temporary file and replaces the target only after the write succeeds, so an error partway through leaves any existing file intact. It supports a range of data formats as well as database connections.
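Atomic writes like this are typically built on the write-to-a-temporary-file-then-rename pattern. The sketch below shows that pattern with the standard library only. It illustrates the general technique, assuming a local filesystem, and is not Atio's actual implementation (`atomic_write_bytes` is a hypothetical name).

```python
import os
import tempfile


def atomic_write_bytes(path: str, data: bytes) -> None:
    """Write `data` to `path` so readers see either the old file or the new one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target: os.replace() may fail across
    # filesystems, while within one filesystem POSIX requires an atomic rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the swap
        os.replace(tmp_path, path)  # atomic swap; overwrites any existing file
    except BaseException:
        # Any failure (including Ctrl+C) leaves the original file untouched;
        # only the temporary file needs to be removed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

For full durability on POSIX systems, the containing directory would also need an `fsync` after the rename.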

✨ Why Atio?

🔒 Zero Data Loss: Atomic operations guarantee file integrity
⚡ High Performance: Minimal overhead with maximum safety
🔄 Auto Rollback: Automatic recovery when errors occur
📊 Universal Support: Works with Pandas, Polars, NumPy, and more
🎯 Simple API: Drop-in replacement for existing code

🚀 30-Second Quick Start

pip install atio

import atio
import pandas as pd

# Create sample data
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "city": ["Seoul", "Busan", "Incheon"]
})

# Safe atomic writing
atio.write(df, "users.parquet", format="parquet")
# ✅ File saved safely with atomic operation!

📊 Supported Formats & Libraries

Format	Pandas	Polars	NumPy	Description
CSV	✅	✅	✅	Comma-separated values
Parquet	✅	✅	❌	Columnar storage format
Excel	✅	✅	❌	Microsoft Excel files
JSON	✅	✅	❌	JavaScript Object Notation
SQL	✅	❌	❌	SQL database storage
Database	❌	✅	❌	Direct database connection
NPY/NPZ	❌	❌	✅	NumPy binary formats
Pickle	✅	❌	❌	Python serialization
HTML	✅	❌	❌	HTML table format

🏗️ Architecture

Atomic Writing Process

graph LR
    A[Data Object] --> B[Temp File]
    B --> C[Validation]
    C --> D[Atomic Replace]
    D --> E[Success Flag]
    
    C -->|Error| F[Rollback]
    F --> G[Original File Preserved]
    
    style A fill:#e1f5fe
    style E fill:#c8e6c9
    style F fill:#ffcdd2
    style G fill:#c8e6c9

Key Components

🛡️ Atomic Operations: Temporary file → Validation → Atomic replacement
🔄 Rollback Mechanism: Automatic recovery on failure
📈 Progress Monitoring: Real-time progress for large files
📋 Version Management: Snapshot-based data versioning
🧹 Auto Cleanup: Automatic cleanup of temporary files
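The flow in the diagram above (temporary file, validation, atomic replace, success flag, with rollback on error) can be sketched in standard-library Python. This is an illustration under assumptions, not Atio's source: `write_with_validation`, `write_fn`, and `validate_fn` are hypothetical names, and the `.<name>._SUCCESS` flag name mirrors the `data.parquet` example under Core Features.

```python
import os
import tempfile
from typing import Callable


def write_with_validation(
    path: str,
    write_fn: Callable[[str], None],
    validate_fn: Callable[[str], None],
) -> None:
    """Temp file -> validation -> atomic replace -> success flag."""
    directory, name = os.path.split(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=f".{name}.tmp-")
    os.close(fd)  # write_fn opens the temp path itself
    try:
        write_fn(tmp_path)          # 1. serialize into the temporary file
        validate_fn(tmp_path)       # 2. raise if the output is unusable
        os.replace(tmp_path, path)  # 3. atomic replacement
    except BaseException:
        # Rollback: discard the temp file; the original `path` is untouched
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    # 4. Success flag, written only after the data file is in place
    flag = os.path.join(directory, f".{name}._SUCCESS")
    with open(flag, "w"):
        pass
```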

💡 Real-World Use Cases

🔥 Data Pipeline Protection

# ETL pipeline: if the write fails, the existing file is left as it was
try:
    atio.write(processed_data, "final_results.parquet", format="parquet")
    print("✅ Pipeline completed successfully")
except Exception as e:
    print(f"❌ Pipeline failed ({e}), but original data is safe")
    # Original file remains untouched

🧪 Machine Learning Experiments

# Version-controlled experiment results
atio.write_snapshot(model_results, "experiment_v1", mode="overwrite")
atio.write_snapshot(improved_results, "experiment_v1", mode="append")

# Rollback to previous version if needed
atio.rollback("experiment_v1", version_id=1)

📊 Large Data Processing

# Progress monitoring for large datasets
atio.write(large_df, "big_data.parquet", 
          format="parquet", 
          show_progress=True)
# Shows: ⠋ Writing big_data.parquet... [ 45.2 MB | 12.3 MB/s | 00:15 ]
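One way to produce a status line like the one above is to wrap the output file and count bytes as the serializer writes them. This is a hypothetical sketch (`ProgressWriter` is not part of Atio's documented API, and the spinner is omitted); Atio may measure progress differently.

```python
import time
from typing import BinaryIO, Callable


class ProgressWriter:
    """Wrap a binary file and report bytes written and throughput."""

    def __init__(self, raw: BinaryIO, report: Callable[[str], None] = print,
                 every_bytes: int = 1 << 20):
        self.raw = raw
        self.report = report
        self.every_bytes = every_bytes  # report roughly once per MiB by default
        self.written = 0
        self._next_report = every_bytes
        self._start = time.monotonic()

    def write(self, chunk: bytes) -> int:
        n = self.raw.write(chunk)
        self.written += n
        if self.written >= self._next_report:
            self.report(self.status())
            self._next_report = self.written + self.every_bytes
        return n

    def status(self) -> str:
        elapsed = max(time.monotonic() - self._start, 1e-9)  # avoid division by zero
        mb = self.written / 1e6
        mins, secs = divmod(int(elapsed), 60)
        return f"[ {mb:.1f} MB | {mb / elapsed:.1f} MB/s | {mins:02d}:{secs:02d} ]"
```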

🎯 Core Features

1. Atomic File Writing

# Safe writing with automatic rollback
atio.write(df, "data.parquet", format="parquet")
# Creates: data.parquet + .data.parquet._SUCCESS
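A downstream job can use the flag to skip outputs that were never completed. The helpers below are hypothetical, and the flag naming is inferred from the example above rather than from Atio's documentation.

```python
import os


def success_flag_path(path: str) -> str:
    """Return the marker path, e.g. data.parquet -> .data.parquet._SUCCESS."""
    directory, name = os.path.split(path)
    return os.path.join(directory, f".{name}._SUCCESS")


def is_complete(path: str) -> bool:
    """True only if both the data file and its success flag exist."""
    return os.path.exists(path) and os.path.exists(success_flag_path(path))
```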

2. Database Integration

# Direct database storage
from sqlalchemy import create_engine
engine = create_engine('postgresql://user:pass@localhost/db')
atio.write(df, format="sql", name="users", con=engine, if_exists="replace")
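How a failed database write is handled here depends on Atio, SQLAlchemy, and the database itself. As a database-side illustration of the same all-or-nothing idea, the standard-library sketch below rebuilds a SQLite table inside one transaction (load a staging table, then swap it in), so a failed load leaves the old table in place. `replace_table_atomically` is a hypothetical helper, not part of Atio.

```python
import sqlite3
from typing import Iterable, Sequence


def replace_table_atomically(conn: sqlite3.Connection, table: str,
                             columns: Sequence[str],
                             rows: Iterable[Sequence]) -> None:
    """Rebuild `table` from `rows`; on any error the old table is kept.

    Table and column names are interpolated into SQL, so they must come
    from trusted code, never from user input.
    """
    tmp = f"{table}__staging"
    placeholders = ", ".join("?" for _ in columns)
    conn.execute("BEGIN")
    try:
        conn.execute(f"DROP TABLE IF EXISTS {tmp}")
        conn.execute(f"CREATE TABLE {tmp} ({', '.join(columns)})")
        conn.executemany(f"INSERT INTO {tmp} VALUES ({placeholders})", rows)
        conn.execute(f"DROP TABLE IF EXISTS {table}")
        conn.execute(f"ALTER TABLE {tmp} RENAME TO {table}")
        conn.execute("COMMIT")
    except BaseException:
        conn.execute("ROLLBACK")  # SQLite DDL is transactional, so this undoes it all
        raise


# isolation_level=None: autocommit mode, so the explicit BEGIN/COMMIT above
# define the transaction boundaries
conn = sqlite3.connect(":memory:", isolation_level=None)
replace_table_atomically(conn, "users", ["name", "age"], [("Alice", 25), ("Bob", 30)])
```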

3. Version Management

# Snapshot-based versioning
atio.write_snapshot(df, "my_table", mode="overwrite")  # v1
atio.write_snapshot(new_df, "my_table", mode="append") # v2

# Read specific version
df_v1 = atio.read_table("my_table", version=1)
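Snapshot versioning is commonly built from immutable per-version files plus a small pointer to the current version, which turns rollback into a pointer update rather than a data copy. The sketch below illustrates that idea with JSON files. `SnapshotStore` and its on-disk layout are hypothetical and not necessarily how Atio stores snapshots; in particular, `mode="append"` semantics are not modeled.

```python
import json
import os
import tempfile
from typing import Optional


class SnapshotStore:
    """Versioned table: one immutable file per version plus a CURRENT pointer."""

    def __init__(self, root: str):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _atomic_write(self, path: str, text: str) -> None:
        fd, tmp = tempfile.mkstemp(dir=self.root)  # temp names start with "tmp"
        with os.fdopen(fd, "w") as f:
            f.write(text)
        os.replace(tmp, path)

    def _versions(self) -> list:
        # Files are named v1.json, v2.json, ...
        return sorted(int(n[1:-5]) for n in os.listdir(self.root)
                      if n.startswith("v") and n.endswith(".json"))

    def current_version(self) -> int:
        with open(os.path.join(self.root, "CURRENT")) as f:
            return int(f.read())

    def write(self, rows: list) -> int:
        version = (self._versions() or [0])[-1] + 1
        self._atomic_write(os.path.join(self.root, f"v{version}.json"), json.dumps(rows))
        self._atomic_write(os.path.join(self.root, "CURRENT"), str(version))
        return version

    def read(self, version: Optional[int] = None) -> list:
        version = self.current_version() if version is None else version
        with open(os.path.join(self.root, f"v{version}.json")) as f:
            return json.load(f)

    def rollback(self, version: int) -> None:
        # Old versions are never modified, so rollback only moves the pointer
        self.read(version)  # raises if that version does not exist
        self._atomic_write(os.path.join(self.root, "CURRENT"), str(version))
```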

4. Progress Monitoring

# Real-time progress for large files
atio.write(large_df, "data.parquet", 
          format="parquet", 
          show_progress=True,
          verbose=True)

🔧 Advanced Usage

Multi-Format Support

import polars as pl
import numpy as np

# Polars DataFrame
pl_df = pl.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
atio.write(pl_df, "data.parquet", format="parquet")

# NumPy Arrays
arr = np.random.randn(1000, 100)
atio.write(arr, "array.npy", format="npy")

# Multiple arrays
atio.write({'arr1': arr, 'arr2': arr*2}, "arrays.npz", format="npz")

Error Handling & Recovery

# Automatic rollback on failure
try:
    atio.write(df, "data.parquet", format="parquet")
except Exception as e:
    print(f"Write failed: {e}")
    # Original file is automatically preserved

Performance Monitoring

# Detailed performance analysis
atio.write(df, "data.parquet", format="parquet", verbose=True)
# Output:
# [INFO] Temporary directory created: /tmp/tmp12345
# [INFO] Writer to use: to_parquet (format: parquet)
# [INFO] ✅ File writing completed (total time: 0.1234s)
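The `Writer to use: to_parquet (format: parquet)` line suggests the format name is mapped to a writer method on the data object. Below is a hypothetical sketch of such a dispatch table. The pandas (`to_parquet`, `to_csv`, `to_json`) and Polars (`write_parquet`, `write_csv`, `write_json`) method names are real, but the table, `resolve_writer`, and the module-based library detection are assumptions, not Atio's code.

```python
from typing import Any, Callable

# Hypothetical (library, format) -> writer method name mapping
WRITERS = {
    ("pandas", "parquet"): "to_parquet",
    ("pandas", "csv"): "to_csv",
    ("pandas", "json"): "to_json",
    ("polars", "parquet"): "write_parquet",
    ("polars", "csv"): "write_csv",
    ("polars", "json"): "write_json",
}


def resolve_writer(obj: Any, fmt: str) -> Callable:
    """Return the bound writer method for `obj` and `fmt`, or raise ValueError."""
    # e.g. "pandas.core.frame" -> "pandas"
    library = type(obj).__module__.split(".")[0]
    try:
        method_name = WRITERS[(library, fmt)]
    except KeyError:
        raise ValueError(f"unsupported format {fmt!r} for {library}") from None
    return getattr(obj, method_name)
```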

🛠️ Installation

Basic Installation

pip install atio

With Optional Dependencies

# Quoting keeps shells such as zsh from treating the brackets as a glob

# For Excel support
pip install "atio[excel]"

# For database support
pip install "atio[database]"

# For all features
pip install "atio[all]"

Development Installation

git clone https://github.com/seojaeohcode/atio.git
cd atio
pip install -e .

📚 Documentation & Examples

📖 Documentation

Complete Documentation - Full API reference
Quick Start Guide - Get started in minutes
Advanced Usage - Power user features

🎯 Examples

📝 Basic Usage - Simple file operations

import atio
import pandas as pd

# Create sample data
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie"],
    "age": [25, 30, 35],
    "city": ["Seoul", "Busan", "Incheon"]
})

# Safe atomic writing
atio.write(df, "users.parquet", format="parquet")
print("✅ File saved safely!")

# Read back to verify
df_read = pd.read_parquet("users.parquet")
print(df_read)

📊 Progress Monitoring - Large file handling

import atio
import pandas as pd
import numpy as np

# Create large dataset
large_df = pd.DataFrame(np.random.randn(200000, 5), columns=list("ABCDE"))

# Save with progress monitoring
atio.write(large_df, "large_data.parquet", 
          format="parquet", 
          show_progress=True)
# Shows: ⠋ Writing large_data.parquet... [ 45.2 MB | 12.3 MB/s | 00:15 ]

📋 Snapshot Management - Version control

import atio
import pandas as pd

# Version 1: Initial data
df_v1 = pd.DataFrame({"id": [1, 2, 3], "value": ["A", "B", "C"]})
atio.write_snapshot(df_v1, "my_table", mode="overwrite")

# Version 2: Append new data
df_v2 = pd.DataFrame({"score": [95, 87, 92]})
atio.write_snapshot(df_v2, "my_table", mode="append")

# Read specific version
df_latest = atio.read_table("my_table")  # Latest version
df_v1 = atio.read_table("my_table", version=1)  # Version 1

⚡ Performance Testing - Benchmarking

import time

import atio
import numpy as np
import pandas as pd

# Performance comparison (string column names: some pandas versions
# reject non-string column names in to_parquet)
df = pd.DataFrame(np.random.randn(100000, 10), columns=[f"col{i}" for i in range(10)])

# Standard pandas
start = time.perf_counter()
df.to_parquet("standard.parquet")
pandas_time = time.perf_counter() - start

# Atio with safety (verbose logging left off so it isn't timed)
start = time.perf_counter()
atio.write(df, "safe.parquet", format="parquet")
atio_time = time.perf_counter() - start

print(f"Pandas: {pandas_time:.3f}s")
print(f"Atio: {atio_time:.3f}s")
print(f"Safety overhead: {((atio_time/pandas_time - 1) * 100):.1f}%")
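Single-run timings of file writes are noisy. A more careful approach, sketched below with the standard library only, uses `time.perf_counter()` (intended for measuring short durations) and reports the median of several runs. It compares a plain write with the temp-file, fsync, and rename pattern in general, not Atio itself; `median_seconds`, `plain_write`, `atomic_write`, and `overhead_percent` are hypothetical helpers, and the numbers depend heavily on your disk and OS.

```python
import os
import statistics
import tempfile
import time
from typing import Callable


def median_seconds(fn: Callable[[], None], repeats: int = 5) -> float:
    """Run `fn` several times and return the median wall-clock time."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def plain_write(path: str, data: bytes) -> None:
    with open(path, "wb") as f:
        f.write(data)


def atomic_write(path: str, data: bytes) -> None:
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def overhead_percent(baseline: float, candidate: float) -> float:
    return (candidate / baseline - 1) * 100


if __name__ == "__main__":
    data = os.urandom(4 * 1024 * 1024)  # 4 MiB of random bytes
    target = os.path.join(tempfile.mkdtemp(), "out.bin")
    plain = median_seconds(lambda: plain_write(target, data))
    atomic = median_seconds(lambda: atomic_write(target, data))
    print(f"plain:    {plain:.4f}s")
    print(f"atomic:   {atomic:.4f}s")
    print(f"overhead: {overhead_percent(plain, atomic):.1f}%")
```

Part of the measured difference comes from `os.fsync()`, which the plain write skips, so the gap reflects the cost of durability as well as of the rename.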

🧪 Test Scenarios

⌨️ Keyboard Interrupt - Ctrl+C safety

# test_interrupt.py
import os

import atio
import numpy as np
import pandas as pd

print("Creating large dataset...")
df = pd.DataFrame(np.random.randn(1000000, 10),
                  columns=[f"col{i}" for i in range(10)])

print("Starting write operation...")
print("Press Ctrl+C to test interrupt safety!")

try:
    atio.write(df, "test_interrupt.parquet", 
              format="parquet", 
              show_progress=True)
    print("✅ Write completed successfully!")
except KeyboardInterrupt:
    print("❌ Interrupted by user!")
    print("🔍 Checking file safety...")
    if os.path.exists("test_interrupt.parquet"):
        # Any file here should be a complete earlier write, so it must load cleanly
        pd.read_parquet("test_interrupt.parquet")
        print("✅ Existing file is complete and readable")
    else:
        print("✅ No corrupted file left behind!")
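When testing this yourself, note that `KeyboardInterrupt` inherits from `BaseException`, not `Exception`, so cleanup code that only catches `Exception` will not run on Ctrl+C. The standard-library sketch below simulates an interrupt by raising `KeyboardInterrupt` mid-write and shows the original file surviving. `atomic_write_text` is a hypothetical helper illustrating the pattern, not Atio's implementation.

```python
import os
import tempfile


def atomic_write_text(path: str, produce_chunks) -> None:
    """Write the chunks from `produce_chunks()` atomically; clean up on any interruption."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "w") as f:
            for chunk in produce_chunks():
                f.write(chunk)
        os.replace(tmp, path)
    except BaseException:
        # KeyboardInterrupt is not an Exception subclass, so a plain
        # `except Exception:` would leak the temp file on Ctrl+C.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise


def interrupted_after_first_chunk():
    yield "partial data"
    raise KeyboardInterrupt  # stands in for the user pressing Ctrl+C
```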

💾 Out of Memory - Memory failure handling

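# test_oom.py
import atio
import numpy as np
import pandas as pd

def simulate_oom():
    print("Creating extremely large dataset...")
    try:
        # 10,000,000 x 100 float64 values is about 8 GB, so on many machines
        # this allocation fails before atio.write() is ever reached.
        # On Linux, the OOM killer may end the process instead of raising MemoryError.
        huge_df = pd.DataFrame(np.random.randn(10000000, 100),
                               columns=[f"col{i}" for i in range(100)])
    except MemoryError:
        print("❌ Out of Memory while creating the data; nothing was written")
        return

    print("Attempting to save...")
    try:
        atio.write(huge_df, "huge_data.parquet", format="parquet")
        print("✅ Successfully saved!")
    except MemoryError:
        print("❌ Out of Memory error during the write!")
        print("✅ But original file is safe!")
    except Exception as e:
        print(f"❌ Error: {e}")
        print("✅ Atio protected your data!")

# Run the test
simulate_oom()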
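Whether this test reaches `atio.write()` at all depends on available memory. A back-of-the-envelope estimate, assuming float64 values at 8 bytes each and ignoring pandas overhead, is rows × columns × 8 bytes. The helper below is illustrative only; for an existing DataFrame, pandas reports the actual figure with `DataFrame.memory_usage(deep=True)`.

```python
def dataframe_nbytes(rows: int, cols: int, bytes_per_value: int = 8) -> int:
    """Rough lower bound on the size of a dense numeric frame (float64 = 8 bytes)."""
    return rows * cols * bytes_per_value


size = dataframe_nbytes(10_000_000, 100)
print(f"{size:,} bytes = {size / 1e9:.1f} GB = {size / 2**30:.2f} GiB")
# prints: 8,000,000,000 bytes = 8.0 GB = 7.45 GiB
```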

🚀 CI/CD Pipeline - Automated deployment safety

# ci_pipeline.py
import atio
import pandas as pd
import os

def deploy_artifacts():
    """Simulate CI/CD pipeline deployment"""
    
    # Generate deployment artifacts
    config = pd.DataFrame({
        "service": ["api", "web", "db"],
        "version": ["v1.2.3", "v1.2.3", "v1.2.3"],
        "status": ["ready", "ready", "ready"]
    })
    
    metrics = pd.DataFrame({
        "metric": ["cpu", "memory", "disk"],
        "value": [75.5, 68.2, 45.1],
        "unit": ["%", "%", "%"]
    })
    
    print("🚀 Starting deployment...")
    
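    # Invalidate any success marker left by a previous run before changing files
    if os.path.exists("deployment_success.parquet"):
        os.remove("deployment_success.parquet")
    
    written = []  # files successfully written by this run
    try:
        # Each atio.write() call is atomic for its own file, but the calls
        # together are not: a later failure can leave earlier files written
        atio.write(config, "deployment_config.json", format="json")
        written.append("deployment_config.json")
        atio.write(metrics, "deployment_metrics.parquet", format="parquet")
        written.append("deployment_metrics.parquet")
        
        # Write the success marker last, so it exists only if everything above succeeded
        atio.write(pd.DataFrame({"status": ["deployed"]}), 
                  "deployment_success.parquet", format="parquet")
        
        print("✅ Deployment completed successfully!")
        return True
        
    except Exception as e:
        print(f"❌ Deployment failed: {e}")
        print("🔄 Rolling back...")
        
        # Remove only the files this run wrote; untouched files are kept
        for file in written:
            if os.path.exists(file):
                os.remove(file)
        
        print("✅ Rollback completed: partial outputs removed")
        return False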

# Test the pipeline
deploy_artifacts()
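Because each `atio.write()` call covers only its own file, a group of files can still end up mixed if a later write fails. A common pattern for switching a whole set of files over at once is to write them into a fresh directory and then atomically repoint a `current` symlink. The sketch below uses only the standard library, assumes a POSIX system (creating symlinks on Windows usually needs extra privileges), and is not an Atio feature; `publish_release` is a hypothetical name.

```python
import os
import shutil
import tempfile


def publish_release(base_dir: str, files: dict) -> str:
    """Write all `files` into a fresh directory, then atomically repoint `current`.

    Readers that always go through base_dir/current see either the previous
    complete release or the new one, never a mix. POSIX only.
    """
    release_dir = tempfile.mkdtemp(dir=base_dir, prefix="release-")
    try:
        for name, data in files.items():
            with open(os.path.join(release_dir, name), "wb") as f:
                f.write(data)
    except BaseException:
        shutil.rmtree(release_dir)  # discard the incomplete release
        raise
    # Build the new link under a temporary name, then rename it over `current`;
    # rename() replaces the old symlink atomically and does not follow it.
    tmp_link = os.path.join(base_dir, "current.tmp")
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(os.path.basename(release_dir), tmp_link)  # relative link target
    os.replace(tmp_link, os.path.join(base_dir, "current"))
    return release_dir
```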

🏆 Why Choose Atio?

✅ Data Safety First

Zero data loss even during system failures
Automatic rollback on any error
File integrity guaranteed by atomic operations

⚡ Performance Optimized

Minimal overhead (1.1-1.2x vs native libraries)
Progress monitoring for large files
Memory efficient processing

🔧 Developer Friendly

Drop-in replacement for existing code
Simple API with powerful features
Comprehensive documentation and examples

🌐 Universal Compatibility

Multiple data formats (CSV, Parquet, Excel, JSON, etc.)
Multiple libraries (Pandas, Polars, NumPy)
Database integration (SQL databases via SQLAlchemy)

📄 License

This project is distributed under the Apache 2.0 License. See the LICENSE file for details.

🛡️ Atio - Because your data deserves to be safe


Release history

3.1.0 (this version): Oct 24, 2025
3.0.1: Oct 24, 2025
3.0.0: Oct 24, 2025
2.1.0: Aug 14, 2025
2.0.0: Aug 13, 2025
1.0.0: Aug 2, 2025

Download files

Source Distribution

atio-3.1.0.tar.gz (49.2 kB)

Uploaded Oct 24, 2025 Source

Built Distribution

atio-3.1.0-py3-none-any.whl (34.6 kB)

Uploaded Oct 24, 2025 Python 3

File details

Details for the file atio-3.1.0.tar.gz.

File metadata

Download URL: atio-3.1.0.tar.gz
Upload date: Oct 24, 2025
Size: 49.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for atio-3.1.0.tar.gz
Algorithm	Hash digest
SHA256	`705a83faf459d571e03013883ade50619139d1ecb888fc14193b15cf5921494f`
MD5	`8b6ee16bc3a6718a3a5c7353625fd18d`
BLAKE2b-256	`b502fc51ea72d046d11bc56f41a6ef0a9d6d01c7d9966ffec82fd53d99847154`

File details

Details for the file atio-3.1.0-py3-none-any.whl.

File metadata

Download URL: atio-3.1.0-py3-none-any.whl
Upload date: Oct 24, 2025
Size: 34.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for atio-3.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2cdc2ee846587954dc8022965423fbd571e085e1c1fd8425641312965c586700`
MD5	`67de26219f23655b204364858caa713e`
BLAKE2b-256	`bc96b6085a3c14464ed6b82b53d7ed9717f8f5b6e1691639de1dcedadce75407`
