
SAS → Parquet Hybrid Converter & Validator


sas2parquet


A SAS (.sas7bdat) to Parquet converter built to handle files that fail with standard tools.

sas2parquet automatically detects encodings, repairs schemas, infers correct data types, and validates the Parquet output against the SAS source chunk by chunk (metadata, row counts, values).
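Encoding detection with a fallback can be pictured roughly as follows. This is a minimal sketch, not the package's actual code: the function name `decode_with_fallback`, the `declared` parameter, and the fallback order are all assumptions made for illustration.

```python
from typing import Optional, Tuple

# Assumed fallback order; the README only names these three encodings.
FALLBACK_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def decode_with_fallback(raw: bytes, declared: Optional[str] = None) -> Tuple[str, str]:
    """Return (text, encoding_used), trying the declared encoding first.

    `declared` stands in for an encoding read from file metadata (hypothetical).
    """
    candidates = ([declared] if declared else []) + list(FALLBACK_ENCODINGS)
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown encoding: move on to the next candidate.
            continue
    # Not reached in practice: latin-1 maps every byte to a code point.
    raise ValueError("no candidate encoding could decode the input")
```

Because latin-1 accepts any byte sequence, it only makes sense as the last candidate; placing it earlier would hide UTF-8 or CP1252 text behind mojibake.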


✨ Features

| Feature | Description |
| --- | --- |
| 🔄 Auto Encoding | Detects UTF-8, Latin1, CP1252 from metadata or fallback |
| 🧠 Smart Types | Infers datetime, numeric, string with 20+ retry strategies |
| Validation | Chunk-by-chunk comparison (metadata, counts, values) |
| 📊 Memory Safe | Chunked processing (96GB RAM optimized, configurable) |
| 💾 ZSTD Compression | Level-6 ZSTD for efficient Parquet storage |
| 📝 Detailed Logs | Full conversion trace + mismatch reports |
| 🎯 Two Modes | Single file or recursive directory processing |
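To make the chunk-by-chunk validation idea concrete, here is a small sketch of comparing two streams of row chunks on counts and values. It is an illustration under stated assumptions, not sas2parquet's implementation: `compare_chunks` is a hypothetical name, and chunks are modelled as plain lists of dicts so the example has no dependencies.

```python
from itertools import zip_longest
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]
Chunk = List[Row]


def compare_chunks(source: Iterable[Chunk], target: Iterable[Chunk]) -> Iterator[str]:
    """Yield one human-readable message per mismatch found.

    Missing values are assumed to be None; float NaN never compares equal
    to itself and would need special handling.
    """
    for index, (src, dst) in enumerate(zip_longest(source, target)):
        if src is None or dst is None:
            yield f"chunk {index}: present on one side only"
            continue
        if len(src) != len(dst):
            yield f"chunk {index}: row count {len(src)} != {len(dst)}"
            continue
        for row_no, (a, b) in enumerate(zip(src, dst)):
            if a.keys() != b.keys():
                yield f"chunk {index}, row {row_no}: column names differ"
                continue
            for column in a:
                if a[column] != b[column]:
                    yield (f"chunk {index}, row {row_no}, column {column!r}: "
                           f"{a[column]!r} != {b[column]!r}")
```

Only one chunk from each side is held in memory at a time, which is what keeps this kind of check usable on large files. With real data, the chunks could come from `pandas.read_sas(..., chunksize=N)` on the SAS side and `pyarrow.parquet.ParquetFile(...).iter_batches(batch_size=N)` on the Parquet side, converted to a common row format first.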

🚀 Quick Start

Install

pip install sas2parquet

✅ Usage

Convert a directory (recommended)

sas2parquet path/to/sasdata/
  • Converts all .sas7bdat files recursively
  • Creates parquetdata/ and logging/ next to sasdata/

Convert a single file

sas2parquet path/to/file.sas7bdat

Output (default):

path/to/file.parquet

Specify output location

Directory mode — custom output directory

sas2parquet path/to/sasdata/ --out path/to/parquetdata/

File mode — custom output file

sas2parquet path/to/file.sas7bdat --out path/to/output.parquet

Custom log directory (directory mode)

sas2parquet path/to/sasdata/ --log-dir path/to/logs/
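The default and custom output locations described above can be summarised in a short sketch. The function `resolve_outputs` and its parameters are hypothetical, and deciding the mode from the `.sas7bdat` suffix (rather than checking the filesystem) is an assumption made to keep the example self-contained.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_outputs(src: Path, out: Optional[Path] = None,
                    log_dir: Optional[Path] = None) -> Tuple[Path, Optional[Path]]:
    """Return (output_path, log_directory) for a file or directory input."""
    if src.suffix.lower() == ".sas7bdat":
        # File mode: the default output sits next to the input with a new suffix.
        # The README does not describe a default log location for file mode,
        # so log_dir is passed through unchanged.
        return (out or src.with_suffix(".parquet")), log_dir
    # Directory mode: defaults are siblings of the input directory.
    return (out or src.parent / "parquetdata"), (log_dir or src.parent / "logging")
```

For example, `resolve_outputs(Path("path/to/sasdata/"))` gives `path/to/parquetdata` and `path/to/logging`, matching the directory-mode defaults.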

📁 Directory Mode Behavior

your-project/
├── sasdata/
│   ├── file1.sas7bdat
│   └── subfolder/
│       └── nested.sas7bdat
├── parquetdata/
│   ├── file1.parquet
│   └── subfolder_parquet/
│       └── nested.parquet
└── logging/
    └── conversion_20260205_1145.log
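The layout above suggests a simple mapping from each input file to its output path. The sketch below reproduces it under assumptions: the helper names are hypothetical, the `_parquet` suffix is applied to every intermediate folder (the tree only shows one level), and the log timestamp format is inferred from the single example file name.

```python
from datetime import datetime
from pathlib import Path


def mirrored_output(sas_file: Path, sas_root: Path, parquet_root: Path) -> Path:
    """Map a .sas7bdat file under sas_root to its path under parquet_root."""
    relative = sas_file.relative_to(sas_root)
    # Each intermediate folder gets a "_parquet" suffix, as in the tree above.
    folders = [f"{part}_parquet" for part in relative.parts[:-1]]
    return parquet_root.joinpath(*folders, relative.with_suffix(".parquet").name)


def log_file_name(now: datetime) -> str:
    """Build a log name shaped like conversion_20260205_1145.log."""
    return f"conversion_{now:%Y%m%d_%H%M}.log"
```

With these helpers, `sasdata/subfolder/nested.sas7bdat` maps to `parquetdata/subfolder_parquet/nested.parquet`, and files at the top level keep their place directly under `parquetdata/`.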

🛠️ CLI Reference

sas2parquet --help

⚙️ Configuration (Advanced)

Edit constants in:

src/sas2parquet/convert.py

📄 License

MIT License
