SAS → Parquet Hybrid Converter & Validator
sas2parquet
A SAS (.sas7bdat) to Parquet converter built to handle files that fail with standard tools.
sas2parquet automatically detects encodings, repairs schemas, infers data types, and validates the Parquet output against the SAS source chunk by chunk (metadata, row counts, and values).
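To make the validation idea concrete, here is a minimal sketch of a chunk-by-chunk comparison between a source and a converted dataset. It is not the package's implementation: the function names (`compare_in_chunks`, `iter_chunks`), the row-as-dict representation, and the rule that two NaN values count as equal are all assumptions made for illustration.

```python
import math
from itertools import islice, zip_longest

_MISSING = object()  # marks a column that is absent from one side


def _same(a, b):
    # Assumption: two NaN values (e.g. missing numerics) count as equal.
    if isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) and math.isnan(b):
            return True
    return a == b


def iter_chunks(rows, size):
    """Yield lists of at most `size` rows without loading everything at once."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def compare_in_chunks(source_rows, target_rows, chunk_size=2):
    """Return a list of mismatch tuples; an empty list means the data agree."""
    mismatches = []
    offset = 0  # index of the first row in the current chunk
    pairs = zip_longest(
        iter_chunks(source_rows, chunk_size),
        iter_chunks(target_rows, chunk_size),
        fillvalue=[],
    )
    for src, dst in pairs:
        if len(src) != len(dst):
            mismatches.append(("row_count", offset, len(src), len(dst)))
        for i, (a, b) in enumerate(zip(src, dst)):
            for col in sorted(a.keys() | b.keys()):
                left, right = a.get(col, _MISSING), b.get(col, _MISSING)
                if not _same(left, right):
                    mismatches.append(("value", offset + i, col, left, right))
        offset += max(len(src), len(dst))
    return mismatches
```

Because only one chunk from each side is held in memory at a time, the same pattern scales to files larger than RAM when the row iterables are backed by chunked readers.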
✨ Features
| Feature | Description |
|---|---|
| 🔄 Auto Encoding | Detects UTF-8, Latin1, CP1252 from metadata or fallback |
| 🧠 Smart Types | Infers datetime, numeric, string with 20+ retry strategies |
| ✅ Validation | Chunk-by-chunk comparison (metadata, counts, values) |
| 📊 Memory Safe | Chunked processing (96GB RAM optimized, configurable) |
| 💾 ZSTD Compression | Level-6 ZSTD for efficient Parquet storage |
| 📝 Detailed Logs | Full conversion trace + mismatch reports |
| 🎯 Two Modes | Single file or recursive directory processing |
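The encoding fallback listed in the table can be pictured with the sketch below. It only illustrates the general technique of trying candidate encodings in order; the function name `decode_with_fallback`, the candidate order, and the idea of passing in a declared encoding from file metadata are assumptions, not the package's actual code.

```python
from typing import Optional, Tuple

# Assumed order: strictest first. latin-1 accepts every byte, so it goes last.
FALLBACK_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def decode_with_fallback(raw: bytes, declared: Optional[str] = None) -> Tuple[str, str]:
    """Decode `raw`, trying the declared encoding first, then the fallbacks.

    Returns the decoded text and the name of the encoding that worked.
    """
    candidates = ([declared] if declared else []) + list(FALLBACK_ENCODINGS)
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown encoding: move on to the next candidate.
            continue
    raise ValueError("no candidate encoding could decode the input")
```

Putting `latin-1` last matters: it never raises, so any stricter candidate placed after it would never be tried.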
🚀 Quick Start
Install
pip install sas2parquet
✅ Usage
Convert a directory (recommended)
sas2parquet path/to/sasdata/
- Converts all `.sas7bdat` files recursively
- Creates `parquetdata/` and `logging/` next to `sasdata/`
Convert a single file
sas2parquet path/to/file.sas7bdat
Output (default):
path/to/file.parquet
Specify output location
Directory mode — custom output directory
sas2parquet path/to/sasdata/ --out path/to/parquetdata/
File mode — custom output file
sas2parquet path/to/file.sas7bdat --out path/to/output.parquet
Custom log directory (directory mode)
sas2parquet path/to/sasdata/ --log-dir path/to/logs/
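For batch jobs it can be handy to assemble these commands from Python. The sketch below builds the argument list using only the `--out` and `--log-dir` flags shown above; the helper name `build_command` is hypothetical, and whether both flags may be combined in one call is an assumption that `sas2parquet --help` should confirm.

```python
from typing import List, Optional


def build_command(
    source: str,
    out: Optional[str] = None,
    log_dir: Optional[str] = None,
) -> List[str]:
    """Build an argument list for the sas2parquet CLI (hypothetical helper)."""
    cmd = ["sas2parquet", source]
    if out is not None:
        cmd += ["--out", out]
    if log_dir is not None:
        # Documented above for directory mode only.
        cmd += ["--log-dir", log_dir]
    return cmd
```

With the package installed, the resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with paths that contain spaces.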
📁 Directory Mode Behavior
your-project/
├── sasdata/
│   ├── file1.sas7bdat
│   └── subfolder/
│       └── nested.sas7bdat
├── parquetdata/
│   ├── file1.parquet
│   └── subfolder_parquet/
│       └── nested.parquet
└── logging/
    └── conversion_20260205_1145.log
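The layout above implies a simple mapping from input paths to output paths. The sketch below reproduces it with `pathlib`; the helper names are hypothetical. Two details are inferred from the example tree rather than confirmed: the `_parquet` suffix is applied to every nested directory level, and the log name is formatted as `conversion_%Y%m%d_%H%M.log`.

```python
from datetime import datetime
from pathlib import PurePosixPath


def planned_output(sas_file, sas_root, out_root):
    """Map a .sas7bdat path under sas_root to its .parquet path under out_root."""
    rel = sas_file.relative_to(sas_root)
    # Assumption: each intermediate directory gets a "_parquet" suffix.
    dirs = [f"{part}_parquet" for part in rel.parts[:-1]]
    return out_root.joinpath(*dirs, rel.with_suffix(".parquet").name)


def log_name(started):
    """Build a log file name shaped like the example in the tree above."""
    return started.strftime("conversion_%Y%m%d_%H%M.log")


root = PurePosixPath("your-project")
nested = planned_output(
    root / "sasdata" / "subfolder" / "nested.sas7bdat",
    root / "sasdata",
    root / "parquetdata",
)
print(nested)
print(log_name(datetime(2026, 2, 5, 11, 45)))
```

`PurePosixPath` keeps the example independent of the operating system; real code working with files on disk would use `pathlib.Path` instead.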
🛠️ CLI Reference
sas2parquet --help
⚙️ Configuration (Advanced)
Edit constants in:
src/sas2parquet/convert.py
👥 Authors
This package was built collaboratively.
- Zaman Ziabakhshganji — creator and maintainer
- Farshad Radman — creator and maintainer
- Jos van Dongen — co-author and contributor
📄 License
MIT License
Download files
File details
Details for the file sas2parquet-1.0.8.tar.gz.
File metadata
- Download URL: sas2parquet-1.0.8.tar.gz
- Upload date:
- Size: 8.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.3.1 CPython/3.11.0 Darwin/25.2.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 36bdfd8945669c0588863ab1045eca433cdd1fb94dc3151583947f2d04bef382 |
| MD5 | 8cd77cb41ea7e063e3fcfa3458dc3e2e |
| BLAKE2b-256 | 85471a5d803a936f928a5b1124426f18653b23354d8a44329f531a5ab74fb231 |
File details
Details for the file sas2parquet-1.0.8-py3-none-any.whl.
File metadata
- Download URL: sas2parquet-1.0.8-py3-none-any.whl
- Upload date:
- Size: 10.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.3.1 CPython/3.11.0 Darwin/25.2.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c51b5eb540e55f4fc32037d5c957dee0be09b1d01adf7d16de5f5939ded08b71 |
| MD5 | 09ba7d41644828909b8eb0b04f1bcbae |
| BLAKE2b-256 | 58bb45557d5b2cd2566b5b5d8c253c4ff6cd3dac36ee3d4612ea0c8ac8499d0e |
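A downloaded file can be checked against the SHA256 digests listed above with Python's standard `hashlib`. This is a minimal sketch; the helper names `sha256_of` and `matches_published` are illustrative and not part of sas2parquet.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA256 digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes.
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published(path, published_hex):
    """Compare a file's digest with a published hex string, ignoring case."""
    return sha256_of(path) == published_hex.strip().lower()
```

For example, `matches_published("sas2parquet-1.0.8.tar.gz", "<SHA256 from the table>")` should return `True` for an intact download of the source distribution.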