SAS → Parquet Hybrid Converter & Validator

sas2parquet

The ultimate SAS (.sas7bdat) to Parquet converter — built to handle files that fail with standard tools.

sas2parquet automatically detects encodings, repairs schemas, infers correct data types, and validates the Parquet output against the SAS source chunk by chunk (metadata, row counts, and values).


✨ Features

| Feature | Description |
|---|---|
| 🔄 Auto Encoding | Detects UTF-8, Latin1, CP1252 from metadata or fallback |
| 🧠 Smart Types | Infers datetime, numeric, string with 20+ retry strategies |
| Validation | Chunk-by-chunk comparison (metadata, counts, values) |
| 📊 Memory Safe | Chunked processing (96GB RAM optimized, configurable) |
| 💾 ZSTD Compression | Level-6 ZSTD for efficient Parquet storage |
| 📝 Detailed Logs | Full conversion trace + mismatch reports |
| 🎯 Two Modes | Single file or recursive directory processing |
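
To illustrate the encoding fallback idea from the table, here is a minimal sketch using only the Python standard library. It is not the package's actual implementation: the function name `decode_with_fallback`, the `declared` parameter, and the order of the candidate encodings are assumptions made for this example.

```python
from typing import Iterable, Optional, Tuple

# Candidate encodings named in the feature table; the order is an assumption.
FALLBACK_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def decode_with_fallback(
    raw: bytes,
    declared: Optional[str] = None,
    fallbacks: Iterable[str] = FALLBACK_ENCODINGS,
) -> Tuple[str, str]:
    """Return (text, encoding_used), trying the declared encoding first.

    `declared` stands in for an encoding read from file metadata
    (hypothetical parameter, not part of the sas2parquet API).
    """
    candidates = ([declared] if declared else []) + list(fallbacks)
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown encoding: move on to the next candidate.
            continue
    # latin-1 maps every byte value, so this is reached only if it was excluded.
    raise ValueError("none of the candidate encodings could decode the data")
```

For example, `decode_with_fallback(b"caf\xe9")` fails as UTF-8 and succeeds as CP1252, returning `("café", "cp1252")`. Because Latin-1 accepts any byte sequence, it only makes sense as the last candidate.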

🚀 Quick Start

Install

pip install sas2parquet

✅ Usage

Convert a directory (recommended)

sas2parquet path/to/sasdata/
  • Converts all .sas7bdat files recursively
  • Creates parquetdata/ and logging/ next to sasdata/

Convert a single file

sas2parquet path/to/file.sas7bdat

Output (default):

path/to/file.parquet

Specify output location

Directory mode — custom output directory

sas2parquet path/to/sasdata/ --out path/to/parquetdata/

File mode — custom output file

sas2parquet path/to/file.sas7bdat --out path/to/output.parquet

Custom log directory (directory mode)

sas2parquet path/to/sasdata/ --log-dir path/to/logs/
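
If you want to drive the CLI from a Python script, the commands above can be wrapped with `subprocess`. This is a hedged sketch: the helper names `build_sas2parquet_command` and `run_sas2parquet` are made up for this example, only the `--out` and `--log-dir` flags shown above are used, and whether the two flags can be combined in one call is an assumption.

```python
import subprocess
from typing import List, Optional


def build_sas2parquet_command(
    source: str,
    out: Optional[str] = None,
    log_dir: Optional[str] = None,
) -> List[str]:
    """Build the argument list for the sas2parquet CLI (hypothetical helper)."""
    command = ["sas2parquet", source]
    if out is not None:
        # Output directory (directory mode) or output file (file mode).
        command += ["--out", out]
    if log_dir is not None:
        # Documented above for directory mode only.
        command += ["--log-dir", log_dir]
    return command


def run_sas2parquet(
    source: str,
    out: Optional[str] = None,
    log_dir: Optional[str] = None,
) -> int:
    """Run the CLI and return its exit code; needs sas2parquet on PATH."""
    completed = subprocess.run(build_sas2parquet_command(source, out, log_dir))
    return completed.returncode
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.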

📁 Directory Mode Behavior

your-project/
├── sasdata/
│   ├── file1.sas7bdat
│   └── subfolder/
│       └── nested.sas7bdat
├── parquetdata/
│   ├── file1.parquet
│   └── subfolder_parquet/
│       └── nested.parquet
└── logging/
    └── conversion_20260205_1145.log
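
The layout above can be expressed as a small path mapping, which is handy when a downstream script needs to locate the Parquet file for a given SAS file. The sketch below is inferred only from the tree shown here and is not taken from the package source: `parquet_path_for` and `log_name_for` are hypothetical names, and applying the `_parquet` suffix to every nested folder level is an assumption.

```python
from datetime import datetime
from pathlib import Path


def parquet_path_for(sas_file: Path, sas_root: Path, parquet_root: Path) -> Path:
    """Map a .sas7bdat file under sas_root to its output under parquet_root."""
    relative = sas_file.relative_to(sas_root)
    # Assumption from the tree above: each subfolder gains a "_parquet" suffix.
    folders = [f"{part}_parquet" for part in relative.parts[:-1]]
    return parquet_root.joinpath(*folders, relative.stem + ".parquet")


def log_name_for(started: datetime) -> str:
    """Build a log file name shaped like conversion_20260205_1145.log."""
    return f"conversion_{started:%Y%m%d_%H%M}.log"
```

With these helpers, `sasdata/subfolder/nested.sas7bdat` maps to `parquetdata/subfolder_parquet/nested.parquet`, matching the tree.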

🛠️ CLI Reference

sas2parquet --help

⚙️ Configuration (Advanced)

Edit constants in:

src/sas2parquet/convert.py

👥 Authors

This package was built collaboratively.

  • Zaman Ziabakhshganji — creator and maintainer
  • Farshad Radman — creator and maintainer
  • Jos van Dongen — co-author and contributor

📄 License

MIT License
